From dmessina at wustl.edu Sun Jul 1 01:38:48 2007 From: dmessina at wustl.edu (David Messina) Date: Sun, 1 Jul 2007 00:38:48 -0500 Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn repository] In-Reply-To: <46869226.70203@sheffield.ac.uk> References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu> <18051.44281.831316.749586@almost.alerce.com> <18051.61992.627473.323346@almost.alerce.com> <4684AF3D.5090907@sheffield.ac.uk> <843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu> <468628AC.9060200@sheffield.ac.uk> <461F64B9-87FD-458A-8945-8238E7076109@wustl.edu> <46869226.70203@sheffield.ac.uk> Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu> > [Nath] > I think the list of seq formats recognised by Bioperl in Bio::SeqIO > and > Bio::AlignIO would be a good start. As these are likely to be the ones > that are sensitive to file format recognition and thus could break > tests > if renamed. Sounds good to me. I will do a quick tour of the rest of the repo looking for other common or important file extensions, but I don't expect there to be many if any. > [still Nath] > I think a lot of people have used "." in file names as an > alternative to > a space. I think it would be beneficial to use an underscore "_" in > these cases and leave the "." to represent the beginning of the file > extension. That's a great idea. > [Chris] > Do we need to define every filetype extension, or can there be a > fallback (eg if it isn't on the list or has no extension it's plain > text)? For every file that's added, svn takes a peek to see if it's human- readable. If not, it's tagged with the generic MIME type application/ octet-stream. 
(It does this so it knows not to try to do diffs and merges on a binary file.) So the default for a human-readable file is no MIME type, which I believe is essentially the same thing as text/plain. And then regardless of the outcome of svn's peek, any matching auto- props are then applied, overriding svn's choice. So if we don't define every extension, I think we'll be fine. It'd be nice to have everything tagged with a MIME type, though. For one thing, Apache will use it to do the right thing when people browse the repo over the web. And two, because metadata is cool. :) One more thing: in the course of reading up on this, I learned that my earlier expectation about multiple auto-prop matches was incorrect. It's true that multiple unrelated matches means that multiple properties are set on the file. But when a file matches multiple *conflicting* auto-property patterns, there's no telling which value it'll get. Dave From hartzell at alerce.com Sun Jul 1 12:29:29 2007 From: hartzell at alerce.com (George Hartzell) Date: Sun, 1 Jul 2007 09:29:29 -0700 Subject: [Bioperl-l] First cut svn repository In-Reply-To: References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <4683A7D1.8070403@sendu.me.uk> <18051.48684.996884.134046@almost.alerce.com> <4683C385.3050904@sendu.me.uk> <18051.63674.685297.426813@almost.alerce.com> <18052.3946.224905.415905@almost.alerce.com> <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net> Message-ID: <18055.54889.677775.868974@almost.alerce.com> Hilmar Lapp writes: > It turns out that both files are also present on the release-0-9-3, > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add > > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ > 
home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/HUMBETGLOA.fasta
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/HUMBETGLOA.fasta
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/HUMBETGLOA.fasta
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/HUMBETGLOA.fasta
>
> to the post-processing commands.
> [...]

Will do.

Thanks for working out the incantations!

g.

From cjfields at uiuc.edu Mon Jul 2 09:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run across
some stuff in bugzilla that needs to be added as well.

Should we, as a convention, start adding data sequestered to a folder
with the test name, within t/data? This might make life easier in the
long run (keep track of files, get rid of old files, etc), and may make
it easier for wrapping up the correct data with tests if we start
submitting single module CPAN updates.

chris

From cjfields at uiuc.edu Mon Jul 2 09:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as a convention, start adding data sequestered to a folder
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes
> reused amongst multiple different test scripts, and when looking
> for data to reuse it's easier to spot it in a single directory
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,
>> get rid of old files, etc), and may make it easier for wrapping up
>> the correct data with tests if we start submitting single module
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would
> read the test script and see what input files it uses, copying
> those into the archive. So, just be sure to standardise on using
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it
> now, since test script names (and therefore the name of the
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script
> t/BioZombieKitten.t which stores its test data in
> t/data/BioZombieKitten/input.file but the script gets the path to
> this file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the
> subdir of data corresponding to the script name if we're dealing
> with the 900-modules-in-a-package checkout-type situation, but just
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files
> directly into t/data for now and reference them without any subdirs
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?
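The lookup order Sendu describes in the quoted text can be sketched in plain Perl. This is only an illustration of the idea, not BioPerl's actual implementation: the `data_dir` and `script` options are hypothetical additions that make the sketch easy to try outside a real checkout, and the `.t` suffix handling is an assumption about how the script name maps to a directory name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Sketch of test_input_file(): prefer t/data/<ScriptName>/<file> when it
# exists, otherwise fall back to t/data/<file>.
# The 'data_dir' and 'script' options are hypothetical, for illustration only.
sub test_input_file {
    my ($name, %opt) = @_;
    my $data_dir = $opt{data_dir} || File::Spec->catdir('t', 'data');
    my $script   = $opt{script}   || $0;

    # t/BioZombieKitten.t -> BioZombieKitten (basename matches '.t' literally)
    my $subdir = basename($script, '.t');

    my $nested = File::Spec->catfile($data_dir, $subdir, $name);
    return $nested if -e $nested;

    return File::Spec->catfile($data_dir, $name);
}

my $input_file = test_input_file('input.file');
print "$input_file\n";
```

Run from a script named t/BioZombieKitten.t, the call resolves to t/data/BioZombieKitten/input.file when that file exists relative to the current directory, and to t/data/input.file otherwise, so test scripts would not need to change if the data files were later moved into per-script directories.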
chris

From bix at sendu.me.uk Mon Jul 2 10:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk> <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files.
(I.e. it would be nice to have a good idea what a file was for without
having to study the test script that uses it.)

> BioZombieKitten?!?

I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84

From bix at sendu.me.uk Mon Jul 2 09:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across
> some stuff in bugzilla that needs to be added as well.
>
> Should we, as a convention, start adding data sequestered to a folder with
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused
amongst multiple different test scripts, and when looking for data to
reuse it's easier to spot it in a single directory compared to searching
through multiple directories.

> This might make life easier in the long
> run (keep track of files, get rid of old files, etc), and may make it
> easier for wrapping up the correct data with tests if we start
> submitting single module CPAN updates.

I don't think that will be an issue.
The automated process would read the test script and see what input
files it uses, copying those into the archive. So, just be sure to
standardise on using test_input_file() to make that possible.

That said, I wouldn't mind especially either way. Just don't do it now,
since test script names (and therefore the name of the directory you'd
want to store the input files in) might all change.

In fact we can imagine that we have a test script t/BioZombieKitten.t
which stores its test data in t/data/BioZombieKitten/input.file but the
script gets the path to this file by:

my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir
of data corresponding to the script name if we're dealing with the
900-modules-in-a-package checkout-type situation, but just in t/data if
we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly into
t/data for now and reference them without any subdirs in the call to
test_input_file().

From hlapp at gmx.net Mon Jul 2 16:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID:

Just FYI, after applying the changes I've been sending, I was able to
check out the repository in its entirety.

-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository. I've done a better
> job of setting svn:keywords and svn:eol-style on various files. The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake.
If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape. In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From wrp at virginia.edu Mon Jul 2 16:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>

Course announcement - Application deadline, July 15, 2007
================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 200
Application Deadline: July 15, 2007

INSTRUCTORS:
Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes -

This course presents a comprehensive overview of the theory and practice
of computational methods for extracting the maximum amount of
information from protein and DNA sequence similarity through sequence
database searches, statistical analysis, multiple sequence alignment,
and genome-scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases. The course combines
lectures with hands-on exercises; students are encouraged to pose
challenging sequence analysis problems using their own data.
The course makes extensive use of local WWW pages to present problem sets and the computing tools to solve them. Students use Windows and Mac workstations attached to a UNIX server. The course is designed for biologists seeking advanced training in biological sequence analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis and comparative genomics. The primary focus of the Computational and Comparative Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Cold Spring Harbor also offers a "Programming for Biology" course, which focuses more on software development. For additional information and the lecture schedule and problem sets for the 2006 course, see: http://fasta.bioch.virginia.edu/cshl06 ================================================================ To apply to the course, fill out and send in the form at: http://meetings.cshl.edu/courses/courseapplication.asp ================================================================ Bill Pearson From niels at genomics.dk Mon Jul 2 16:45:07 2007 From: niels at genomics.dk (Niels Larsen) Date: Mon, 02 Jul 2007 22:45:07 +0200 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <468963D3.3000007@genomics.dk> I write hoping someone could show me how to create a PrimarySeq object without parsing features and all first. The lines below return "Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16." whereas calling Bio::SeqIO-> gives no error, but a too big object. The GenBank record after the __END__ is the "1.gb" file. I could not find out how from the tutorial or the Bio::PrimarySeq description. 
Niels L #!/usr/bin/env perl use strict; use warnings FATAL => qw ( all ); use Data::Dumper; use Bio::Seq; use Bio::SeqIO; my ( $seq_h, $seq ); $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' ); $seq = $seq_h->next_seq(); # print Dumper( $seq ); __END__ LOCUS X60065 9 bp mRNA linear MAM 14-NOV-2006 DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. ACCESSION X60065 REGION: 1..9 VERSION X60065.1 GI:5 KEYWORDS beta-2 glycoprotein I. SOURCE Bos taurus (cattle) ORGANISM Bos taurus Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Bovinae; Bos. REFERENCE 1 AUTHORS Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and Kristensen,T. TITLE Complete primary structure of bovine beta 2-glycoprotein I: localization of the disulfide bridges JOURNAL Biochemistry 31 (14), 3611-3617 (1992) PUBMED 1567819 REFERENCE 2 (bases 1 to 9) AUTHORS Kristensen,T. TITLE Direct Submission JOURNAL Submitted (11-JUN-1991) T. 
Kristensen, Dept of Mol Biology, University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C, DENMARK FEATURES Location/Qualifiers source 1..9 /organism="Bos taurus" /mol_type="mRNA" /db_xref="taxon:9913" /clone="pBB2I" /tissue_type="liver" gene <1..>9 /gene="beta-2-gpI" CDS <1..>9 /gene="beta-2-gpI" /codon_start=1 /product="beta-2-glycoprotein I" /protein_id="CAA42669.1" /db_xref="GI:6" /db_xref="GOA:P17690" /db_xref="UniProtKB/Swiss-Prot:P17690" /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT DASDVKPC" sig_peptide <1..>9 /gene="beta-2-gpI" ORIGIN 1 ccagcgctc // From Kevin.M.Brown at asu.edu Mon Jul 2 17:35:12 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 2 Jul 2007 14:35:12 -0700 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <468963D3.3000007@genomics.dk> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> Start by having a look at the following link: http://bioperl.org/cgi-bin/deob_interface.cgi SeqIO is how one reads or writes sequences to/from files. Bio::PrimarySeq is just an object that holds information about a sequence obtained from a file. 

As for how to parse a Genbank file into a list of features:

use strict;
use warnings;
use Bio::SeqIO;

# SeqIO parses the file; each call to next_seq() returns one sequence object
my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
my @sorted_features;
while (my $seq = $file->next_seq())
{
    my @features = $seq->all_SeqFeatures;
    # keep only the features whose primary tag is 'CDS'
    for my $f (@features)
    {
        my $tag = $f->primary_tag;
        if ($tag eq 'CDS')
        {
            # @sorted_features holds the CDS feature objects
            # obtained from the genbank file
            push @sorted_features, $f;
        }
    }
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Niels Larsen
> Sent: Monday, July 02, 2007 1:45 PM
> Cc: bioperl-l List
> Subject: [Bioperl-l] simple PrimarySeq question
>
> I write hoping someone could show me how to create a
> PrimarySeq object without parsing features and all first. The
> lines below return
>
> "Can't locate object method "next_seq" via package
> "Bio::PrimarySeq" at ./tst2 line 16."
>
> whereas calling Bio::SeqIO-> gives no error, but a too big object.
> The GenBank record after the __END__ is the "1.gb" file. I
> could not find out how from the tutorial or the
> Bio::PrimarySeq description.
>
> Niels L
>
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings FATAL => qw ( all );
>
> use Data::Dumper;
>
> use Bio::Seq;
> use Bio::SeqIO;
>
> my ( $seq_h, $seq );
>
> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format =>
> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb",
> -format => 'genbank' );
>
> $seq = $seq_h->next_seq();
>
> # print Dumper( $seq );
>
> __END__
>
> LOCUS       X60065 9 bp mRNA linear
> MAM 14-NOV-2006
> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
> ACCESSION   X60065 REGION: 1..9
> VERSION     X60065.1 GI:5
> KEYWORDS    beta-2 glycoprotein I.
> SOURCE      Bos taurus (cattle)
>   ORGANISM  Bos taurus
>             Eukaryota; Metazoa; Chordata; Craniata;
> Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Laurasiatheria;
> Cetartiodactyla; Ruminantia;
>             Pecora; Bovidae; Bovinae; Bos.
> REFERENCE 1 > AUTHORS Bendixen,E., Halkier,T., Magnusson,S., > Sottrup-Jensen,L. and > Kristensen,T. > TITLE Complete primary structure of bovine beta > 2-glycoprotein I: > localization of the disulfide bridges > JOURNAL Biochemistry 31 (14), 3611-3617 (1992) > PUBMED 1567819 > REFERENCE 2 (bases 1 to 9) > AUTHORS Kristensen,T. > TITLE Direct Submission > JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of > Mol Biology, > University of Aarhus, C F Mollers Alle 130, > DK-8000 Aarhus C, > DENMARK > FEATURES Location/Qualifiers > source 1..9 > /organism="Bos taurus" > /mol_type="mRNA" > /db_xref="taxon:9913" > /clone="pBB2I" > /tissue_type="liver" > gene <1..>9 > /gene="beta-2-gpI" > CDS <1..>9 > /gene="beta-2-gpI" > /codon_start=1 > /product="beta-2-glycoprotein I" > /protein_id="CAA42669.1" > /db_xref="GI:6" > /db_xref="GOA:P17690" > /db_xref="UniProtKB/Swiss-Prot:P17690" > > /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI > > VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT > > ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN > > SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN > > PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER > > VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT > DASDVKPC" > sig_peptide <1..>9 > /gene="beta-2-gpI" > ORIGIN > 1 ccagcgctc > // > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From niels at genomics.dk Mon Jul 2 20:41:24 2007 From: niels at genomics.dk (niels at genomics.dk) Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST) Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> Message-ID: 
<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
gets entries from a file, and from those large parsed entries I can get
a simplified primary_seq object. But the SeqIO object includes feature
and annotation objects etc. that take time to make, and I wish to know
if there is a way to get a primary_seq object without this overhead. I
apologize if I overlooked it in the docs.

Niels

> Start by having a look at the following link:
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> SeqIO is how one reads or writes sequences to/from files.
> Bio::PrimarySeq is just an object that holds information about a
> sequence obtained from a file.
>
> As for how to parse a Genbank file into a list of features:
>
> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
> while (my $seq = $file->next_seq())
> {
>     @features = $seq->all_SeqFeatures;
>     # sort features by their primary tags
>     for my $f (@features)
>     {
>         my $tag = $f->primary_tag;
>         if ($tag eq 'CDS')
>         {
>             # @sorted_features holds all the Bio::PrimarySeq
> features obtained from the genbank file
>             push @sorted_features, $f;
>         }
>     }
> }
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Niels Larsen
>> Sent: Monday, July 02, 2007 1:45 PM
>> Cc: bioperl-l List
>> Subject: [Bioperl-l] simple PrimarySeq question
>>
>> I write hoping someone could show me how to create a
>> PrimarySeq object without parsing features and all first. The
>> lines below return
>>
>> "Can't locate object method "next_seq" via package
>> "Bio::PrimarySeq" at ./tst2 line 16."
>>
>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>> The GenBank record after the __END__ is the "1.gb" file. I
>> could not find out how from the tutorial or the
>> Bio::PrimarySeq description.
>> >> Niels L >> >> >> #!/usr/bin/env perl >> >> use strict; >> use warnings FATAL => qw ( all ); >> >> use Data::Dumper; >> >> use Bio::Seq; >> use Bio::SeqIO; >> >> my ( $seq_h, $seq ); >> >> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => >> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", >> -format => 'genbank' ); >> >> $seq = $seq_h->next_seq(); >> >> # print Dumper( $seq ); >> >> __END__ >> >> LOCUS X60065 9 bp mRNA linear >> MAM 14-NOV-2006 >> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. >> ACCESSION X60065 REGION: 1..9 >> VERSION X60065.1 GI:5 >> KEYWORDS beta-2 glycoprotein I. >> SOURCE Bos taurus (cattle) >> ORGANISM Bos taurus >> Eukaryota; Metazoa; Chordata; Craniata; >> Vertebrata; Euteleostomi; >> Mammalia; Eutheria; Laurasiatheria; >> Cetartiodactyla; Ruminantia; >> Pecora; Bovidae; Bovinae; Bos. >> REFERENCE 1 >> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., >> Sottrup-Jensen,L. and >> Kristensen,T. >> TITLE Complete primary structure of bovine beta >> 2-glycoprotein I: >> localization of the disulfide bridges >> JOURNAL Biochemistry 31 (14), 3611-3617 (1992) >> PUBMED 1567819 >> REFERENCE 2 (bases 1 to 9) >> AUTHORS Kristensen,T. >> TITLE Direct Submission >> JOURNAL Submitted (11-JUN-1991) T. 
Kristensen, Dept of >> Mol Biology, >> University of Aarhus, C F Mollers Alle 130, >> DK-8000 Aarhus C, >> DENMARK >> FEATURES Location/Qualifiers >> source 1..9 >> /organism="Bos taurus" >> /mol_type="mRNA" >> /db_xref="taxon:9913" >> /clone="pBB2I" >> /tissue_type="liver" >> gene <1..>9 >> /gene="beta-2-gpI" >> CDS <1..>9 >> /gene="beta-2-gpI" >> /codon_start=1 >> /product="beta-2-glycoprotein I" >> /protein_id="CAA42669.1" >> /db_xref="GI:6" >> /db_xref="GOA:P17690" >> /db_xref="UniProtKB/Swiss-Prot:P17690" >> >> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI >> >> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT >> >> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN >> >> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN >> >> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER >> >> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT >> DASDVKPC" >> sig_peptide <1..>9 >> /gene="beta-2-gpI" >> ORIGIN >> 1 ccagcgctc >> // >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Mon Jul 2 22:36:19 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 2 Jul 2007 22:36:19 -0400 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net> Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have examples for what you want to do: use Bio::SeqIO; # usually you won't instantiate 
this yourself - a SeqIO object - # you will have one already my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank"); my $builder = $seqin->sequence_builder(); # if you need only sequence, id, and description (e.g. for # conversion to FASTA format): $builder->want_none(); $builder->add_wanted_slot('display_id','desc','seq'); # if you want everything except the sequence and features $builder->want_all(1); # this is the default if it's untouched $builder->add_unwanted_slot('seq','features'); Let us know if that doesn't answer your question. Note that this is currently only implemented for Genbank format. -hilmar On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote: > Kevin, > > Thanks, but I didnt put the question very clearly sorry .. yes, SeqIO > gets entries from file, and from those large parsed entries I can > get a > simplified primary_seq object. But the SeqIO object includes feature > and annotation objects etc that takes time to make, and I wish to know > if there is a way to get a primari_seq object without this overhead. I > apologize if I overlooked it in the docs. > > Niels > > > > >> Start by having a look at the following link: >> http://bioperl.org/cgi-bin/deob_interface.cgi >> >> SeqIO is how one reads or writes sequences to/from files. >> Bio::PrimarySeq is just an object that holds information about a >> sequence obtained from a file. 
>> >> As for how to parse a Genbank file into a list of features: >> >> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb"); >> while (my $seq = $file->next_seq()) >> { >> @features = $seq->all_SeqFeatures; >> # sort features by their primary tags >> for my $f (@features) >> { >> my $tag = $f->primary_tag; >> if ($tag eq 'CDS') >> { >> # @sorted_features holds all the Bio::PrimarySeq >> features obtained from the genbank file >> push @sorted_features, $f; >> } >> } >> } >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Niels Larsen >>> Sent: Monday, July 02, 2007 1:45 PM >>> Cc: bioperl-l List >>> Subject: [Bioperl-l] simple PrimarySeq question >>> >>> I write hoping someone could show me how to create a >>> PrimarySeq object without parsing features and all first. The >>> lines below return >>> >>> "Can't locate object method "next_seq" via package >>> "Bio::PrimarySeq" at ./tst2 line 16." >>> >>> whereas calling Bio::SeqIO-> gives no error, but a too big object. >>> The GenBank record after the __END__ is the "1.gb" file. I >>> could not find out how from the tutorial or the >>> Bio::PrimarySeq description. >>> >>> Niels L >>> >>> >>> #!/usr/bin/env perl >>> >>> use strict; >>> use warnings FATAL => qw ( all ); >>> >>> use Data::Dumper; >>> >>> use Bio::Seq; >>> use Bio::SeqIO; >>> >>> my ( $seq_h, $seq ); >>> >>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => >>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", >>> -format => 'genbank' ); >>> >>> $seq = $seq_h->next_seq(); >>> >>> # print Dumper( $seq ); >>> >>> __END__ >>> >>> LOCUS X60065 9 bp mRNA linear >>> MAM 14-NOV-2006 >>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. >>> ACCESSION X60065 REGION: 1..9 >>> VERSION X60065.1 GI:5 >>> KEYWORDS beta-2 glycoprotein I. 
>>> SOURCE Bos taurus (cattle) >>> ORGANISM Bos taurus >>> Eukaryota; Metazoa; Chordata; Craniata; >>> Vertebrata; Euteleostomi; >>> Mammalia; Eutheria; Laurasiatheria; >>> Cetartiodactyla; Ruminantia; >>> Pecora; Bovidae; Bovinae; Bos. >>> REFERENCE 1 >>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., >>> Sottrup-Jensen,L. and >>> Kristensen,T. >>> TITLE Complete primary structure of bovine beta >>> 2-glycoprotein I: >>> localization of the disulfide bridges >>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992) >>> PUBMED 1567819 >>> REFERENCE 2 (bases 1 to 9) >>> AUTHORS Kristensen,T. >>> TITLE Direct Submission >>> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of >>> Mol Biology, >>> University of Aarhus, C F Mollers Alle 130, >>> DK-8000 Aarhus C, >>> DENMARK >>> FEATURES Location/Qualifiers >>> source 1..9 >>> /organism="Bos taurus" >>> /mol_type="mRNA" >>> /db_xref="taxon:9913" >>> /clone="pBB2I" >>> /tissue_type="liver" >>> gene <1..>9 >>> /gene="beta-2-gpI" >>> CDS <1..>9 >>> /gene="beta-2-gpI" >>> /codon_start=1 >>> /product="beta-2-glycoprotein I" >>> /protein_id="CAA42669.1" >>> /db_xref="GI:6" >>> /db_xref="GOA:P17690" >>> /db_xref="UniProtKB/Swiss-Prot:P17690" >>> >>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI >>> >>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT >>> >>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN >>> >>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN >>> >>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER >>> >>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT >>> DASDVKPC" >>> sig_peptide <1..>9 >>> /gene="beta-2-gpI" >>> ORIGIN >>> 1 ccagcgctc >>> // >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> 
http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From ewijaya at gmail.com Tue Jul 3 02:56:30 2007 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 3 Jul 2007 14:56:30 +0800 Subject: [Bioperl-l] Problem with GD.pm version 2.35 Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> Dear all, I was trying to perform check with this command: $ perl -MGD -e 'print $GD::VERSION'; And it gave: GD object version 2.32 does not match $GD::VERSION 2.35 at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253. Compilation failed in require. BEGIN failed--compilation aborted. Similarly my script that uses GD.pm doesn't execute. I have installed the latest version of libgd version 2.0.35 downloaded from http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29 Can anybody suggest how can I resolve my problem? This is my Perl version: This is perl, v5.8.8 built for i386-linux-thread-multi -- Edward
From lstein at cshl.edu Tue Jul 3 10:41:26 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 3 Jul 2007 10:40:26 -0401 Subject: [Bioperl-l] Problem with GD.pm version 2.35 In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com> This happens when there is a mismatch between the compiled (.so) portion of GD and the perl (.pm) version. Typically it occurs when you have installed GD incorrectly by, e.g., copying the .pm file into position rather than using the make file. Solution: Uninstall old versions of GD by manually finding all occurrences of GD.so and GD.pm and removing them. Then reinstall the correct way. Lincoln On 7/3/07, Edward Wijaya wrote: > > Dear all, > I was trying to perform check with this command: > > $ perl -MGD -e 'print $GD::VERSION'; > > And it gave: > > GD object version 2.32 does not match $GD::VERSION 2.35 at > /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253. > Compilation failed in require. > BEGIN failed--compilation aborted. > > Similarly my script that uses GD.pm doesn't execute. > > > I have installed the latest version of libgd version 2.0.35 downloaded > from > http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29 > > Can anybody suggest how can I resolve my problem? > > This is my Perl version: > This is perl, v5.8.8 built for i386-linux-thread-multi > > -- > Edward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Jul 4 01:45:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 00:45:16 -0500 Subject: [Bioperl-l] genbank2gff3 - Name attribute? Message-ID: I noticed that genbank2gff3.pl doesn't have an explicitly defined way of converting the gene/locus/etc name to a Name tag (for, say, GBrowse). Any particular reason? Should I stick with GFF2 for now? chris From bix at sendu.me.uk Wed Jul 4 06:00:31 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 04 Jul 2007 11:00:31 +0100 Subject: [Bioperl-l] Splitting Bioperl Message-ID: <468B6FBF.1070708@sendu.me.uk> To summarise some previous threads: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409 # Bioperl is currently one monolithic distribution of ~900 modules # There is some desire to split it up into smaller functional groups # There are some problems with that proposal # An extreme variant of that proposal is to make the groups individual modules Following this discussion: http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html (especially Adam Kennedy's postings of 4/07, soon to appear in that archive), the extreme variant doesn't seem like a good idea. I'm now suggesting that Steve's original split idea, as modified/expanded by Adam's driver and other ideas, is the best choice. The problems I previously identified can be solved in the same way they were solved in my extreme variant: the splits are done by Build.PL automation working on a single repository/code-base, not by splitting things up at the repository level. 
As I see it, the way forward now is for someone interested enough to decide on the specifics of how things will be split and offer them up to the group for discussion. I don't mean vague possibilities of what might work as a split, but rather some real thought should go into it to make sure the split makes sense and will actually work in practice. Following that, the splits can be implemented by some automated dist action of Build.PL. If there isn't sufficient interest to make this happen, I don't see that as a terrible thing. There are benefits to keeping Bioperl monolithic, and some of the problems (eg. lack of updates) can be solved without changing its nature. From cjfields at uiuc.edu Wed Jul 4 10:53:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 09:53:45 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <468B6FBF.1070708@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote: > To summarise some previous threads: > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/ > focus=15409 > > # Bioperl is currently one monolithic distribution of ~900 modules > # There is some desire to split it up into smaller functional groups > # There are some problems with that proposal > # An extreme variant of that proposal is to make the groups individual > modules > > > Following this discussion: > http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html > (especially Adam Kennedy's postings of 4/07, soon to appear in that > archive), the extreme variant doesn't seem like a good idea. brian d foy made some sound arguments against it as well. > I'm now suggesting that Steve's original split idea, as > modified/expanded by Adam's driver and other ideas, is the best > choice. 
> The problems I previously identified can be solved in the same way > they > were solved in my extreme variant: the splits are done by Build.PL > automation working on a single repository/code-base, not by splitting > things up at the repository level. > > As I see it, the way forward now is for someone interested enough to > decide on the specifics of how things will be split and offer them > up to > the group for discussion. I don't mean vague possibilities of what > might > work as a split, but rather some real thought should go into it to > make > sure the split makes sense and will actually work in practice. We've already identified a few (SearchIO, Tools, GBrowse-related, etc). ... > If there isn't sufficient interest to make this happen, I don't see > that > as a terrible thing. There are benefits to keeping Bioperl monolithic, > and some of the problems (eg. lack of updates) can be solved without > changing its nature. If so, proposals that solve this problem need to be made as well. If we stay monolithic, then here's mine: we start having fixed, regularly timed dev releases like Parrot, monthly or bimonthly (quite common on CPAN), with brief release reports on which bugs have been fixed, code has been added, so on. Not every bug has to be fixed per dev release; if that were true there would never be releases for some of the XML parser packages. No RCs for dev releases (it's a dev release!). These would be 1.x.y. We can then, every once in a while, have a bug-squashing session, hackathon, etc, and have regular non-dev release (1.x) that all core devs accept and that passes a particular milestone. As for the advantage of a split approach, as mentioned previously it is to focus modules/tests/scripts into groups with related functions. 
Even just splitting off ones with external reqs (XML parsers, GD, etc) into an 'aux' release would be an advantage, as it doesn't confront a new user with the burden of installing a large list of dependencies, some of which may be complicated for a perl newbie to either install from scratch (DBD::mysql, GD) or to get the latest bug-fixed prereq release for their OS (the recent XML::SAX::Expat debacle comes to mind; the fixed release wasn't immediately available for win32 as a PPM). I'm fairly open to any approach as long as it's reasonably thought out, though I am admittedly a bit biased towards the split approach. I do think some change is in order; I worry about there ever being a 1.6 release at this point. chris From davila at ioc.fiocruz.br Wed Jul 4 13:11:20 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Wed, 04 Jul 2007 14:11:20 -0300 Subject: [Bioperl-l] ESTs in EST format Message-ID: <468BD4B8.5050105@ioc.fiocruz.br> Dear All, I am trying to get all ESTs from a given species (eg: Trypanosoma brucei) from Genbank in EST format (eg: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)... while using Entrez I can "display" individual EST entries in EST format, this "EST format" is not an option in the main "display" menu for batch download ... I don't see the EST format listed (http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO deals with, so I wonder whether there would be another BioPerl module to do this? Any tips would be greatly appreciated ;-) Kindest regards, Alberto From jason at bioperl.org Wed Jul 4 13:52:59 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 10:52:59 -0700 Subject: [Bioperl-l] ESTs in EST format In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br> References: <468BD4B8.5050105@ioc.fiocruz.br> Message-ID: Currently we don't support this format; as far as I know it isn't a published standard, nor is it a format that NCBI distributes this data in as flat files (i.e. genbank dumps).
Is there any reason why you can't get what you need from the GenBank format? http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb -jason On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote: > Dear All, > > I am trying to get all ESTs from a given species (eg: Trypanosoma > brucei) from Genbank in EST format (eg: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? > db=nucest&id=10280980)... > while using Entrez I can "display" individual EST entries in EST > format, > this "EST format" is not an option in the main "display" menu for > batch > download ... > > I dont see the EST format listed > (http://www.bioperl.org/wiki/Sequence_formats) among the ones that > SeqIO > deal with, so wonder there would another BioPerl module to do > this ? any > tips, would be greatly appreciated ;-) > > Kindest regards, Alberto > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From dmessina at wustl.edu Wed Jul 4 14:37:22 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 4 Jul 2007 13:37:22 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > we start having fixed, > regularly timed dev releases like Parrot, monthly or bimonthly (quite > common on CPAN), with brief release reports on which bugs have been > fixed, code has been added, so on. Not every bug has to be fixed per > dev release; if that were true there would never be releases for some > of the XML parser packages. No RCs for dev releases (it's a dev > release!). These would be 1.x.y. 
We can then, every once in a > while, have a bug-squashing session, hackathon, etc, and have regular > non-dev release (1.x) that all core devs accept and that passes a > particular milestone. Regardless of whether we split or don't, I think these ideas of adding a little more structure to BioPerl's development cycles -- especially having bug-squashing and hacking sessions, where we all band together and commit some time to cranking through a bunch of to- dos -- would be beneficial, particularly as a means to keeping a certain basal level of momentum in BioPerl. Dave From jason at bioperl.org Wed Jul 4 15:45:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 12:45:29 -0700 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I definitely agree - we can live up to the unstable "living on the edge" nature of dev releases a bit more perhaps? On Jul 4, 2007, at 11:37 AM, David Messina wrote: > > On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > >> we start having fixed, >> regularly timed dev releases like Parrot, monthly or bimonthly (quite >> common on CPAN), with brief release reports on which bugs have been >> fixed, code has been added, so on. Not every bug has to be fixed per >> dev release; if that were true there would never be releases for some >> of the XML parser packages. No RCs for dev releases (it's a dev >> release!). These would be 1.x.y. We can then, every once in a >> while, have a bug-squashing session, hackathon, etc, and have regular >> non-dev release (1.x) that all core devs accept and that passes a >> particular milestone. 
> > > Regardless of whether we split or don't, I think these ideas of > adding a little more structure to BioPerl's development cycles -- > especially having bug-squashing and hacking sessions, where we all > band together and commit some time to cranking through a bunch of to- > dos -- would be beneficial, particularly as a means to keeping a > certain basal level of momentum in BioPerl. > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Wed Jul 4 16:54:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 15:54:14 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I think what's partially responsible for slowing down releases is the expectation that each dev release is supposed to have all bugs fixed, work for every OS, etc. In other words, act like a stable release. A developer release by nature is living on the edge, so why not have regular dev releases? We keep telling users to update to using bioperl-live whenever something breaks, anyway. We could decide to split stuff off along the way into more 'stable' sections if there were more demand for it, and have the more API-volatile code (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the 'dev' tag until we feel it's ready for prime time. chris On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > I definitely agree - we can live up to the unstable "living on the > edge" nature of dev releases a bit more perhaps? 
> > > On Jul 4, 2007, at 11:37 AM, David Messina wrote: > >> >> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: >> >>> we start having fixed, >>> regularly timed dev releases like Parrot, monthly or bimonthly >>> (quite >>> common on CPAN), with brief release reports on which bugs have been >>> fixed, code has been added, so on. Not every bug has to be fixed >>> per >>> dev release; if that were true there would never be releases for >>> some >>> of the XML parser packages. No RCs for dev releases (it's a dev >>> release!). These would be 1.x.y. We can then, every once in a >>> while, have a bug-squashing session, hackathon, etc, and have >>> regular >>> non-dev release (1.x) that all core devs accept and that passes a >>> particular milestone. >> >> >> Regardless of whether we split or don't, I think these ideas of >> adding a little more structure to BioPerl's development cycles -- >> especially having bug-squashing and hacking sessions, where we all >> band together and commit some time to cranking through a bunch of to- >> dos -- would be beneficial, particularly as a means to keeping a >> certain basal level of momentum in BioPerl. >> >> Dave >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Thu Jul 5 04:09:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 09:09:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: <468CA721.4020804@sheffield.ac.uk> Chris Fields wrote: > I think what's partially responsible for slowing down releases is the > expectation that each dev release is supposed to have all bugs fixed, > work for every OS, etc. In other words, act like a stable release. > > A developer release by nature is living on the edge, so why not have > regular dev releases? We keep telling users to update to using > bioperl-live whenever something breaks, anyway. We could decide to > split stuff off along the way into more 'stable' sections if there > were more demand for it, and have the more API-volatile code > (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the > 'dev' tag until we feel it's ready for prime time. > > chris > > On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > > -- snip -- I agree, although would the dev releases still need to pass all the tests? I'm thinking of people installing via CPAN. I also agree with what was said in a previous post about bringing bioperl-run (and some others) back into the same repository as bioperl-core (after a successful move over to svn) and having Build.PL deal with creating the packages etc for CPAN. This would hopefully help keep the run package (and others) up to speed with the core package. I also agree with previous posts about organising and/or having some naming convention for test data files. I think an approach whereby data files are organised into directory trees (1 - 3 deep), with names that allude to the type of data in that subtree/file rather than the tests that use it, would work well.
For example:

t/data
|__ formats
|   |__ seq
|   |   |__ legal_fasta
|   |   |   |__ extension.fas
|   |   |   |__ extension.fasta
|   |   |   |__ extension.foo
|   |   |   |__ extension.bar
|   |   |   |__ no_extension
|   |   |   |__ interleaved.fas
|   |   |   |__ non_interleaved.fas
|   |   |   |__ single_seq.fas
|   |   |   |__ multiple_seq.fas
|   |   |   |__ desc_line1.fas
|   |   |   |__ desc_line2.fas
|   |   |
|   |   |__ illegal_fasta
|   |   |   |__ illegal_chars.fas
|   |   |   |__ some_other_illegal_alternative.fas
|   |   |
|   |   |__ legal_genbank
|   |   |   |__ etc etc
|   |   |
|   |   |__ illegal_genbank
|   |       |__ etc etc
|   |
|   |__ aln
|   |__ blast
|   |   |__ legal_blastx
|   |   |
|   |   |__ legal_blastp
|   |   |
|   |   |__ legal_tblastx
|   |   |
|   |   |__ legal_blastpsi
|   |   |
|   |   |__ legal_wublast
|   |__ foo
|   |__ bar
|
|__ misc
|
|__ etc

This type of setup might lend itself to having a test script simply try to parse all the files in a directory, to ensure that nothing fails for legal file formats and that everything fails for illegal ones. Naming of the file paths would help test authors to identify a suitable data file for their own tests before adding their own to the t/data dir. It might also help to identify areas where example test data is currently lacking.

Thinking about this a little more, I think it would be a good idea to include Test::Exception in t/lib. We should also be testing that warnings and exceptions are generated when expected - e.g. illegal characters in seq files etc etc. Without these sorts of tests we are only getting half the story. The absence of such tests might account for a large chunk of the poor test coverage, particularly when it comes to branches in the code.

Anyway, this type of reorganisation couldn't take place until the svn repo is up and working. I'd appreciate any comments on the above!
Nath From bix at sendu.me.uk Thu Jul 5 04:55:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 09:55:25 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <468CB1FD.7060301@sendu.me.uk> Nathan S. Haigh wrote: > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Yes, they'd all have to pass. 'Developer release' should never have the connotation of 'broken release'. However, getting all tests to pass is a lot easier than fixing all bugs in bugzilla. (... which actually goes to show how poor our tests are) Worst case, if we were forced to stick to a schedule but couldn't fix a failing test, we could always make it a 'todo' test. > I also agree with what was said in a previous post about bringing back > bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) Agree (with myself essentially). > I also agree with previous posts about organising and/or having some > naming convention for test data files. I think an approach whereby data > files were organised into directory trees (1 - 3 deep) with names that > elude to the type of data in that subtree/file rather than the tests > that use it etc. For example: > > t/data > |__ formats > | |__ seq > | | |__ legal_fasta > | | | |__ extension.fas [snip] At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them. > This type of setup, might lend itself to having a test script simply try > to parse all the files in a directory to ensure nothing fails (for legal > file formats) and fails for illegal formats. Great idea. 
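[Editorial sketch] The "parse every file in a directory" idea discussed above might look roughly like the following. This is only an illustration under stated assumptions: parse_fasta_or_die and unexpected_results are hypothetical names, the toy FASTA check stands in for a real parser (it is not Bio::SeqIO), and the demo builds throwaway directories rather than touching t/data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Toy stand-in for a real parser (hypothetical): dies unless the file
# looks like FASTA. A real test would call the format's actual parser.
sub parse_fasta_or_die {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    my $seen_header = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) { $seen_header = 1; next }
        die "sequence before header in $path\n" unless $seen_header;
        die "illegal character in $path\n" unless $line =~ /^[A-Za-z*-]+$/;
    }
    die "no records in $path\n" unless $seen_header;
    return 1;
}

# Run $parser over every plain file in $dir and return the names whose
# outcome differs from $expect_ok (true = must parse, false = must fail).
sub unexpected_results {
    my ($dir, $parser, $expect_ok) = @_;
    opendir my $dh, $dir or die "cannot read $dir: $!";
    my @files = sort grep { my $p = File::Spec->catfile($dir, $_); -f $p } readdir $dh;
    closedir $dh;
    my @wrong;
    for my $name (@files) {
        my $ok = eval { $parser->(File::Spec->catfile($dir, $name)); 1 } ? 1 : 0;
        push @wrong, $name if $ok != ($expect_ok ? 1 : 0);
    }
    return @wrong;
}

# Demo: throwaway directories mirroring legal_fasta / illegal_fasta.
my $root = tempdir(CLEANUP => 1);
my %demo = (
    legal_fasta   => { 'single_seq.fas'    => ">s1 demo\nACGT\n" },
    illegal_fasta => { 'illegal_chars.fas' => ">s1 demo\nAC1T\n" },
);
for my $sub (sort keys %demo) {
    my $dir = File::Spec->catdir($root, $sub);
    mkdir $dir or die "mkdir $dir: $!";
    for my $name (sort keys %{ $demo{$sub} }) {
        open my $out, '>', File::Spec->catfile($dir, $name) or die "write $name: $!";
        print {$out} $demo{$sub}{$name};
        close $out;
    }
}

my @bad_legal   = unexpected_results(File::Spec->catdir($root, 'legal_fasta'),   \&parse_fasta_or_die, 1);
my @bad_illegal = unexpected_results(File::Spec->catdir($root, 'illegal_fasta'), \&parse_fasta_or_die, 0);
print "unexpected outcomes: ", scalar(@bad_legal) + scalar(@bad_illegal), "\n";
```

Keeping the parser as a code reference means the same walker could be pointed at each legal_*/illegal_* pair with a different parser, which is one way the directory names could drive the tests.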
> Thinking about this a little more, I think it would be a good idea to > include Test::Exception in t/lib. Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > Anyway, this type of reorganisation couldn't take place until the svn > repo is up and working. Agree. From bix at sendu.me.uk Thu Jul 5 05:39:10 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 10:39:10 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <468CBC3E.1020408@sendu.me.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Thinking about this a little more, I think it would be a good idea to >> include Test::Exception in t/lib. > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. I've now done that: BioperlTest loads Test::Exception, from the copy in t/lib if necessary. So, in BioperlTest-using scripts you now have access to the methods dies_ok, lives_ok, throws_ok and lives_and. From N.Haigh at sheffield.ac.uk Thu Jul 5 06:01:04 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 5 Jul 2007 11:01:04 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk> Quoting Sendu Bala : -- snip -- > > > > I also agree with previous posts about organising and/or having some > > naming convention for test data files. 
I think an approach whereby data > > files were organised into directory trees (1 - 3 deep) with names that > > elude to the type of data in that subtree/file rather than the tests > > that use it etc. For example: > > > > t/data > > |__ formats > > | |__ seq > > | | |__ legal_fasta > > | | | |__ extension.fas > [snip] > > At that level, files don't need extensions and can have fully > informative names that explain what's interesting or special about them. > You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format. -- snip -- From bix at sendu.me.uk Thu Jul 5 06:04:16 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:04:16 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> Message-ID: <468CC220.804@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Sendu Bala : > > -- snip -- >> >>> I also agree with previous posts about organising and/or having >>> some naming convention for test data files. I think an approach >>> whereby data files were organised into directory trees (1 - 3 >>> deep) with names that elude to the type of data in that >>> subtree/file rather than the tests that use it etc. 
For example: >>> >>> t/data |__ formats | |__ seq | | |__ >>> legal_fasta | | | |__ extension.fas >>> >> [snip] >> >> At that level, files don't need extensions and can have fully >> informative names that explain what's interesting or special about >> them. >> > > You may be correct in most cases, however, isn't there a method for > detecting the file format from the file extension and failing that it > peeks inside the file? Therefore there should be a file extension for > each of these to get good code coverage as well as each format not > having an extension to check that the peek inside the file correctly > determines the format. Yes, you're quite correct. From bix at sendu.me.uk Thu Jul 5 06:47:12 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:47:12 +0100 Subject: [Bioperl-l] Warnings Message-ID: <468CCC30.90406@sendu.me.uk> I'm trying to get Test::Warn to work with Bioperl warnings as produced by Bio::Root::RootI::warn(). However, afaict the warnings must be generated with CORE::warn(), not print STDERR. Is there any particular reason RootI::warn is done with print and not CORE::warn ? Can I change it to a warn? From bix at sendu.me.uk Thu Jul 5 09:04:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:04:50 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> Message-ID: <468CEC72.4090909@sendu.me.uk> Heikki Lehvaslaiho wrote: > My guess is that using 'print STDERR' avoids showing sometimes annoying > errordescription at programname line NN > syntax being used. Afaik, CORE::warn "anything\n"; never includes the line number: messages with a new line always disable that feature. Bio::Root::RootI::warn /always/ puts new lines into the message, so they /never/ have the line number. 
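[Editorial sketch] The newline behaviour described above can be checked with core Perl alone. This is a minimal illustration, not BioPerl code; the message texts are made up, and a local __WARN__ handler is used only to capture what warn() emits.

```perl
use strict;
use warnings;

# Collect whatever warn() emits instead of letting it reach STDERR.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

warn "no trailing newline";      # perl appends " at <file> line <N>.\n"
warn "with trailing newline\n";  # emitted exactly as written

print "1: $caught[0]";
print "2: $caught[1]";
```

The first captured message carries the file and line suffix; the second does not, which is the case that applies to a message that always ends in a newline.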
> On the other hand, the main reason we need to set verbosity to 1 in BioPerl > objects is to find where warnings are coming from. Maybe extra text in > warnings leads to easier debugging. > > I favour changing it. So it's my understanding there will be absolutely no difference in behaviour following this change (except that warnings can be caught by Test::Warn). I just wanted to confirm my understanding. From hlapp at gmx.net Thu Jul 5 09:07:27 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 5 Jul 2007 09:07:27 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I think what's partially responsible for slowing down releases is the >> expectation that each dev release is supposed to have all bugs fixed, >> work for every OS, etc. In other words, act like a stable release. >> It doesn't. A stable release has a stable API that will be supported until the next stable release through point releases. >> A developer release by nature is living on the edge, so why not have >> regular dev releases? There's no problem with regular dev releases, but tests will need to pass. There was never a stipulation that all bugs need to have been fixed. But all tests need to pass, so in an ideal world (in which everything is being tested) all tests passing would imply all (known) bugs fixed. Obviously, we don't live in an ideal world ... If not everything passes then what is the big difference to a code snapshot? If using cvs (or svn) is too difficult for most people, we can consider creating a mechanism that puts up nightly snapshots for download.
> -- snip -- > > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. For example, that's another point. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From heikki at sanbi.ac.za Thu Jul 5 09:12:37 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 15:12:37 +0200 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <200707051512.38185.heikki@sanbi.ac.za> One more suggestion: It would be extemaly useful if we had a standard way of testing that a when a file is read into a bioperl object and then written out again into a same format, the input and output files are identical. If not, the test should show where the the differences start (showing all the differences would just clutter the screen). This standard method/subroutine should be used to test all sequence and other text file IO. Any takers? -Heikki On Thursday 05 July 2007 11:39:10 Sendu Bala wrote: > Sendu Bala wrote: > > Nathan S. Haigh wrote: > >> Thinking about this a little more, I think it would be a good idea to > >> include Test::Exception in t/lib. > > > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Thu Jul 5 08:58:59 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 14:58:59 +0200 Subject: [Bioperl-l] Warnings In-Reply-To: <468CCC30.90406@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> Message-ID: <200707051458.59921.heikki@sanbi.ac.za> My guess is that using 'print STDERR' avoids showing sometimes annoying errordescription at programname line NN syntax being used. On the other hand, the main reason we need to set verbosity to 1 in BioPerl objects is to find where warnings are coming from. Maybe extra text in warnings leads to easier debugging. I favour changing it. -Heikki On Thursday 05 July 2007 12:47:12 Sendu Bala wrote: > I'm trying to get Test::Warn to work with Bioperl warnings as produced > by Bio::Root::RootI::warn(). However, afaict the warnings must be > generated with CORE::warn(), not print STDERR. > > Is there any particular reason RootI::warn is done with print and not > CORE::warn ? Can I change it to a warn? 
-- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From bix at sendu.me.uk Thu Jul 5 09:44:08 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:44:08 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <468CF5A8.7040402@sendu.me.uk> Heikki Lehvaslaiho wrote: > One more suggestion: > > It would be extremely useful if we had a standard way of testing that > when a file is read into a bioperl object and then written out > again into the same format, the input and output files are identical. As Hilmar has pointed out in the past, Bioperl doesn't aim for the files to be identical, only for none of the information to be lost and to be output in the correct format. So a round-trip test should read in the original, store all the parsed data, write it out, then read in the written version and see if the new parsed data matches the original. For simpler or ultra-strict file formats, though... > If not, the test should show where the differences start (showing > all the differences would just clutter the screen). > > This standard method/subroutine should be used to test all sequence > and other text file IO. > > Any takers? There's already something along these lines in t/SeqIO.t (the section that uses Algorithm::Diff).
I copied that over from the old testformats.pl script but haven't really taken the time to see if it's a good way of doing the test. Is it? Can someone come up with something better? Can someone generalise it if necessary? I imagine you could just read the files into arrays and use Test::More::is_deeply(). If that would be satisfactory I could easily add a little method to BioperlTest that did that. From n.haigh at sheffield.ac.uk Thu Jul 5 09:47:24 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 05 Jul 2007 14:47:24 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <468CF66C.2070907@sheffield.ac.uk> Heikki Lehvaslaiho wrote: > One more suggestion: > > It would be extremely useful if we had a standard way of testing that when a > file is read into a bioperl object and then written out again into the same > format, the input and output files are identical. If not, the test should > show where the differences start (showing all the differences would just > clutter the screen). > > This standard method/subroutine should be used to test all sequence and other > text file IO. > > Any takers? > > -Heikki > Wouldn't this require info about the formatting of the file to be stored in the object as well, such that the same formatting could be used when writing the file? Wouldn't a better approach be to read the contents of file1 into obj1, write obj1 to file2 in the same format, and then read file2 into obj2 and compare obj1 to obj2 to ensure we have all the same data?
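The file1 -> obj1 -> file2 -> obj2 comparison can be sketched with core modules only. This is an illustration under stated assumptions, not Bioperl's implementation: `read_fasta`, `write_fasta` and `round_trip` are toy stand-ins invented here for a real parser and writer, and the "object" is just a list of hashes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More tests => 1;

# Toy FASTA reader (stand-in for a real parser): returns an array ref
# of { id, desc, seq } hashes.
sub read_fasta {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @records;
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    close $in;
    return \@records;
}

# Toy FASTA writer: uses its own line width, so the file it writes
# need not be byte-identical to the file that was read.
sub write_fasta {
    my ($path, $records, $width) = @_;
    $width ||= 60;
    open my $out, '>', $path or die "Cannot write $path: $!";
    for my $rec (@$records) {
        my $header = ">$rec->{id}";
        $header .= " $rec->{desc}" if length $rec->{desc};
        print {$out} "$header\n";
        print {$out} "$1\n" while $rec->{seq} =~ /(.{1,$width})/g;
    }
    close $out or die "Cannot close $path: $!";
}

# file1 -> obj1 -> file2 -> obj2; the caller compares obj1 with obj2.
sub round_trip {
    my ($file1) = @_;
    my $obj1 = read_fasta($file1);
    my ($tmp_fh, $file2) = tempfile(UNLINK => 1);
    close $tmp_fh;
    write_fasta($file2, $obj1);
    my $obj2 = read_fasta($file2);
    return ($obj1, $obj2);
}

# An input file whose layout differs from what write_fasta produces.
my ($fh, $input) = tempfile(UNLINK => 1);
print {$fh} ">seq1 first example\nACGTAC\nGTACGT\n\n>seq2\nTTGGCC\n";
close $fh;

my ($obj1, $obj2) = round_trip($input);
is_deeply($obj2, $obj1, 'parsed data survives a write/read round trip');
```

Comparing parsed data rather than file contents means layout differences (line width, blank lines) do not cause failures, and when the structures do differ `is_deeply` reports the first point of difference rather than everything.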
Nath From cjfields at uiuc.edu Thu Jul 5 09:52:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 08:52:12 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote: > ... > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Remains to be decided. All current tests (net and non-net) should pass. Any bug fixes should try to have added tests if possible, with in-process stuff as TODOs. Network tests are left up to user discretion, so if they fail for any particular reason there is a way around them. > I also agree with what was said in a previous post about bringing > back bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) and have > Build.PL deal with creating the packages etc for CPAN. This would > hopefully help keep the run package (and others) up to speed with > the core package. It's up to how we want to have everything split. I don't think it's immediately pressing (there are more important priorities, i.e. bugs, svn) but I would say folding everything back into live and 'splitting' them out using an automated Build process is a viable option. > I also agree with previous posts about organising and/or having > some naming convention for test data files. I think an approach > whereby data files were organised into directory trees (1 - 3 deep) > with names that allude to the type of data in that subtree/file > rather than the tests that use it etc.
For example:
>
> t/data
>   |__ formats
>   |     |__ seq
>   |     |     |__ legal_fasta
>   |     |     |     |__ extension.fas
>   |     |     |     |__ extension.fasta
>   |     |     |     |__ extension.foo
>   |     |     |     |__ extension.bar
>   |     |     |     |__ no_extension
>   |     |     |     |__ interleaved.fas
>   |     |     |     |__ non_interleaved.fas
>   |     |     |     |__ single_seq.fas
>   |     |     |     |__ multiple_seq.fas
>   |     |     |     |__ desc_line1.fas
>   |     |     |     |__ desc_line2.fas
>   |     |     |
>   |     |     |__ illegal_fasta
>   |     |     |     |__ illegal_chars.fas
>   |     |     |     |__ some_other_illegal_alternative.fas
>   |     |     |
>   |     |     |__ legal_genbank
>   |     |     |     |__ etc etc
>   |     |     |
>   |     |     |__ illegal_genank
>   |     |           |__ etc etc
>   |     |
>   |     |__ aln
>   |     |__ blast
>   |     |     |__ legal_blastx
>   |     |     |
>   |     |     |__ legal_blastp
>   |     |     |
>   |     |     |__ legal_tblastx
>   |     |     |
>   |     |     |__ legal_plastpsi
>   |     |     |
>   |     |     |__ legal_wublast
>   |     |__ foo
>   |     |__ bar
>   |     |__ misc
>   |
>   |__ etc
>
> This type of setup, might lend itself to having a test script
> simply try to parse all the files in a directory to ensure nothing
> fails (for legal file formats) and fails for illegal formats.
> Naming of the file paths would help test authors to identify a
> suitable data file for their own tests before adding their own to
> the t/data dir. It might also help to identify areas where example
> test data is currently lacking.
...

This seems like more of a 'guess sequence' and format validation issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence parsing should be separate issues and therefore in separate classes (with parsing optionally preceded by validation), but that's something for another discussion.

> Thinking about this a little more, I think it would be a good idea
> to include Test::Exception in t/lib. We should also be testing that
> warnings and exceptions are generated when expected - e.g. illegal
> characters in seq files etc etc. Without these sorts of tests we
> are only getting half the story.
This testing might account for a > large chunk of the poor test coverage, particularly when it comes > to branches in the code. > > Anyway, this type of reorganisation couldn't take place until the > svn repo is up and working. > > I'd appreciate any comments on the above! > Nath chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:08:29 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 05 Jul 2007 15:08:29 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CF5A8.7040402@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> Message-ID: <468CFB5D.6080406@sheffield.ac.uk> Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests. It would be desirable to be able to install all these modules in order to complete the whole test suite - any ideas if/how this can be done? Cheers Nath From bix at sendu.me.uk Thu Jul 5 10:15:34 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 15:15:34 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> Message-ID: <468CFD06.3080604@sendu.me.uk> Nathan S. Haigh wrote: > Is there a way to install all the modules that are used in the tests? I > mean there are cases where tests are skipped and pass if the required > module for testing is not installed. Therefore, missing out a chunk of > the tests.
It would be desirable to be able to install all these modules > in order to complete the whole test suite - any ideas if/how this can > be done? Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you. Everything should be in Build.PL already. If I missed something, please add it. From cjfields at uiuc.edu Thu Jul 5 10:18:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:08 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> Message-ID: On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote: > Is there a way to install all the modules that are used in the > tests? I > mean there are cases where tests are skipped and pass if the required > module for testing is not installed. Therefore, missing out a chunk of > the tests. It would be desirable to be able to install all these > modules > in order to complete the whole test suite - any ideas if/how this can > be done? > > Cheers > Nath That's optionally done upon 'perl Build.PL', correct? So if you choose not to install a particular prereq (e.g. XML::SAX), you shouldn't be forced to install it later just for tests. Or am I misunderstanding you?
chris From cjfields at uiuc.edu Thu Jul 5 10:18:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:23 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CC220.804@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > Nathan S. Haigh wrote: >> Quoting Sendu Bala : >>> ... >>> At that level, files don't need extensions and can have fully >>> informative names that explain what's interesting or special about >>> them. >>> >> >> You may be correct in most cases, however, isn't there a method for >> detecting the file format from the file extension and failing that it >> peeks inside the file? Therefore there should be a file extension for >> each of these to get good code coverage as well as each format not >> having an extension to check that the peek inside the file correctly >> determines the format. > > Yes, you're quite correct. I actually like Sendu's idea more, or the idea of each test suite having its own directory. Tests which need to guess/validate the format are probably best left sequestered to a specific suite focused on format guessing/validation, at least in my opinion. chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:22:40 2007 From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 05 Jul 2007 15:22:40 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFD06.3080604@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> Message-ID: <468CFEB0.80201@sheffield.ac.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Is there a way to install all the modules that are used in the tests? >> I mean there are cases where tests are skipped and pass if the >> required module for testing is not installed. Therefore, missing out a >> chunk of the tests. It would be desirable to be able to install all >> these modules in order to complete the whole test suite - any ideas >> if/how this can be done? > > Yes, add them as recommended (or perhaps 'build_requires') modules in > Build.PL, then run Build.PL and install the modules when it asks you. > > Everything should be in Build.PL already. If I missed something, please > add it. > OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there may be few systems with all these modules installed in order to conduct a complete test. These are the modules I'm referring to. Nath From n.haigh at sheffield.ac.uk Thu Jul 5 10:30:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 05 Jul 2007 15:30:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: <468D006D.6050806@sheffield.ac.uk> Chris Fields wrote: > > On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > >> Nathan S. Haigh wrote: >>> Quoting Sendu Bala : >>>> ... >>>> At that level, files don't need extensions and can have fully >>>> informative names that explain what's interesting or special about >>>> them. >>>> >>> >>> You may be correct in most cases, however, isn't there a method for >>> detecting the file format from the file extension and failing that it >>> peeks inside the file? Therefore there should be a file extension for >>> each of these to get good code coverage as well as each format not >>> having an extension to check that the peek inside the file correctly >>> determines the format. >> >> Yes, you're quite correct. > > I actually like Sendu's idea more, or the idea of each test suite having > its own directory. > > Tests which need to guess/validate the format are probably best left > sequestered to a specific suite focused on format guessing/validation, > at least in my opinion. > > chris How easily would this lend itself to using the same data for multiple tests, or is it likely to lead to/exacerbate a culture of adding duplicate data files in each "test suite" rather than reusing?
Nath From cjfields at uiuc.edu Thu Jul 5 10:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:33:46 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote: > On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > >> Chris Fields wrote: >>> I think what's partially responsible for slowing down releases is >>> the >>> expectation that each dev release is supposed to have all bugs >>> fixed, >>> work for every OS, etc. In other words, act like a stable release. > > It doesn't. A stable release has a stable API that will be > supported until the next stable release through point releases. I agree, but I think there is still an expectation that 1.5.2 and beyond are more like true 'stable' releases even though we still designate them as 'developer.' We unfortunately reinforce that when we tell users they need to update to v. 1.5.2 or bioperl-live to fix a particular bug in the 1.4 release. There's nothing we can do about that now (hindsight is always 20/20, and 1.4 is just too old). We (pumpkin, core devs) can try correcting that by ensuring any bug fixes be committed to any new stable branch as well as to live, at least until it becomes too problematic to maintain that particular stable branch (at which point we would go about getting ready for the next 'stable' and repeat the cycle over again). >>> A developer release by nature is living on the edge, so why not have >>> regular dev releases? > > There's no problem with regular dev releases, but tests will need > to pass. There was never a stipulation that all bugs need to have > been fixed. 
But all tests need to pass, so in an ideal world (in > which everything is being tested) all tests passing would imply all > (known) bugs fixed. Obviously, we don't live in an ideal world ... ...particularly when it comes to network-related tests and remote server problems (but those are by default not run, so there is a way around test fails there). I agree here as well (all tests must pass). As for the bug fixes, we can just stipulate which ones were fixed with the release (in a RELEASE_NOTES or similar), and maybe have TODO's in the test suite designating they are being worked on. Basically, at regular intervals, maybe with a few weeks of lead time, the pumpkin would announce an impending dev. release. Go through rounds of tests, bug fixes, etc. When all tests pass post it on CPAN as a dev. release. If we have a stable release branch with relevant bug fixes we can post that as well, again to the point where it becomes too problematic. Would we just take a snapshot of MAIN and any relevant stable branch at that particular point for the CPAN release, just increasing the version number (1.x.y)? Would it make sense to have a 1.x.y branch for each release (I don't think so, but maybe others disagree)? > If not everything passes then what is the big difference to a code > snapshot? If using cvs (or svn) is too difficult for most people, > we can consider creating a mechanism that puts up nightly snapshots > for download. If we feel a nightly snapshot is warranted we could do that though. I personally don't think there is a need, particularly since we have several means to obtain the latest code at any point in time (including the browsable CVS 'Download tarball'). We could state the next dev/stable CPAN release (pending on date dd/mm/yy) will have the bug fix, and if they want it immediately then pick it up from CVS. >> -- snip -- >> >> I agree, although would the dev releases still need to pass all the >> tests? I'm thinking of people installing via CPAN. 
> > For example, that's another point. > > -hilmar Yes, I agree. As an aside, I don't think dev. releases pop up when you run a simple 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know the answer to that. chris From cjfields at uiuc.edu Thu Jul 5 10:34:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:34:22 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > One more suggestion: > > It would be extremely useful if we had a standard way of testing > that when a > file is read into a bioperl object and then written out again into > the same > format, the input and output files are identical. If not, the test > should > show where the differences start (showing all the differences > would just > clutter the screen). > > This standard method/subroutine should be used to test all sequence > and other > text file IO. > > Any takers? > > -Heikki ... I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t that do some checking, I think, but something like this would be of use. However, what if the test file is old (as many in t/data are) and the format has changed? GenBank and EMBL, for instance, have gone through several changes to format. chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:43:51 2007 From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 05 Jul 2007 15:43:51 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <468D03A7.3090408@sheffield.ac.uk> Chris Fields wrote: -- snip -- >>> >>> I agree, although would the dev releases still need to pass all the >>> tests? I'm thinking of people installing via CPAN. >> >> For example, that's another point. >> >> -hilmar > > Yes, I agree. > > As an aside, I don't think dev. releases pop up when you run a simple > 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know > the answer to that. > > chris That's right, it'll only install the non-developer releases (1.4 currently). If you want to install the developer release from CPAN you need to know the path to the archive and then do: cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz as detailed on the wiki: http://www.bioperl.org/wiki/Release_1.5.2 Nath From cjfields at uiuc.edu Thu Jul 5 10:49:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:49:33 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFEB0.80201@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > Sendu Bala wrote: >> ... >> Yes, add them as recommended (or perhaps 'build_requires') modules in >> Build.PL, then run Build.PL and install the modules when it asks you. >> >> Everything should be in Build.PL already.
If I missed something, >> please >> add it. >> > > OK, to clarify using the test file Sendu mentioned in a previous post: > t/SeqIO.t > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > IO::String > are not installed (the first two are not mentioned in Build.PL). > However, if there are a lot of such skips in the whole test suite then > there may be few systems with all these modules installed in order to > conduct a complete test. These are the modules I'm referring to. > > Nath If they are only necessary for tests, work for all OSs, and are pure Perl they should be added to t/lib, like Test::More and the rest. If they only work for some OSs they could be added to t/lib and skip based on OS, but they still must be pure Perl. I would avoid anything that requires any compiling for XS or Inline altogether (I don't want to go down the nightmare road of OS-dependent compiler issues for a few tests). Finally, if they are needed for core modules (not just tests) then they should be added to the core prereqs in Build. chris From cjfields at uiuc.edu Thu Jul 5 10:52:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:52:58 -0500 Subject: [Bioperl-l] Warnings In-Reply-To: <468CEC72.4090909@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > ... > > So it's my understanding there will be absolutely no difference in > behaviour following this change (except that warnings can be caught by > Test::Warn). I just wanted to confirm my understanding. You can always just try it out and run tests. Might be interesting to see if anything breaks. chris From N.Haigh at sheffield.ac.uk Thu Jul 5 10:58:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 5 Jul 2007 15:58:30 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > > > > One more suggestion: > > > > It would be extremely useful if we had a standard way of testing > > that when a > > file is read into a bioperl object and then written out again into > > the same > > format, the input and output files are identical. If not, the test > > should > > show where the differences start (showing all the differences > > would just > > clutter the screen). > > > > This standard method/subroutine should be used to test all sequence > > and other > > text file IO. > > > > Any takers? > > > > -Heikki > ... > > I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t > that do some checking, I think, but something like this would be of > use. However, what if the test file is old (as many in t/data are) > and the format has changed? GenBank and EMBL, for instance, have > gone through several changes to format. > > chris > > Is there any way to tell the variants apart other than just layout, e.g. a version number or the like? Nath From N.Haigh at sheffield.ac.uk Thu Jul 5 11:04:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 5 Jul 2007 16:04:30 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > > > Sendu Bala wrote: > >> ... > >> Yes, add them as recommended (or perhaps 'build_requires') modules in > >> Build.PL, then run Build.PL and install the modules when it asks you. > >> > >> Everything should be in Build.PL already. If I missed something, > >> please > >> add it. > >> > > > > OK, to clarify using the test file Sendu mentioned in a previous post: > > t/SeqIO.t > > > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > > IO::String > > are not installed (the first two are not mentioned in Build.PL). > > However, if there are a lot of such skips in the whole test suite then > > there may be few systems with all these modules installed in order to > > conduct a complete test. These are the modules I'm referring to. > > > > Nath > > If they are only necessary for tests, work for all OSs, and are pure > Perl they should be added to t/lib, like Test::More and the rest. If > they only work for some OSs they could be added to t/lib and skip > based on OS, but they still must be pure Perl. I would avoid > anything that requires any compiling for XS or Inline altogether (I > don't want to go down the nightmare road of OS-dependent compiler > issues for a few tests). If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!?
> > Finally, if they are needed for core modules (not just tests) then > they should be added to the core prereqs in Build. > > chris > From bix at sendu.me.uk Thu Jul 5 11:13:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:13:35 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: <468D0A9F.4010709@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Chris Fields : >>> OK, to clarify using the test file Sendu mentioned in a previous >>> post: t/SeqIO.t >>> >>> This test skips tests if Algorithm::Diff, IO::ScalarArray or >>> IO::String are not installed >> >> If they are only necessary for tests, work for all OSs, and are >> pure Perl they should be added to t/lib, like Test::More and the >> rest. If they only work for some OSs they could be added to t/lib >> and skip based on OS, but they still must be pure Perl. I would >> avoid anything that requires any compiling for XS or Inline >> altogether (I don't want to go down the nightmare road of >> OS-dependent compiler issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? That skip in SeqIO.t is new and I simply didn't think of them as important enough to make anyone install them or include them in t/lib. I'd go ahead and add those modules, but like I say, it may make more sense just to use is_deeply(), removing the dependency on Algorithm::Diff and IO::ScalarArray completely. 
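A line-by-line comparison built on `is_deeply()` might look like the sketch below. It uses only core modules; `file_lines` and `files_identical_ok` are hypothetical names chosen for this illustration and are not existing BioperlTest methods.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More tests => 1;

# Hypothetical helper: read a file into an array ref of chomped lines.
sub file_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Hypothetical helper of the kind that could live in a shared test
# module: passes only if both files hold the same lines in the same order.
sub files_identical_ok {
    my ($got_file, $expected_file, $name) = @_;
    # Report failures at the caller's line rather than inside this helper.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return is_deeply(file_lines($got_file), file_lines($expected_file), $name);
}

# Two throwaway files with the same content stand in for an original
# data file and the file written back out.
my ($fh_a, $file_a) = tempfile(UNLINK => 1);
my ($fh_b, $file_b) = tempfile(UNLINK => 1);
print {$fh_a} "LOCUS  demo\nORIGIN\n//\n";
print {$fh_b} "LOCUS  demo\nORIGIN\n//\n";
close $fh_a;
close $fh_b;

files_identical_ok($file_b, $file_a, 'written file matches the original line for line');
```

On failure `is_deeply` reports only the first element at which the two arrays differ, so the output stays short even for long files.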
From cjfields at uiuc.edu Thu Jul 5 11:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:35:41 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
 <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
 <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
 <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
 <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
 <1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID:

On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote:

> ...
>> If they are only necessary for tests, work for all OSs, and are pure
>> Perl they should be added to t/lib, like Test::More and the rest. If
>> they only work for some OSs they could be added to t/lib and skip
>> based on OS, but they still must be pure Perl. I would avoid
>> anything that requires any compiling for XS or Inline altogether (I
>> don't want to go down the nightmare road of OS-dependent compiler
>> issues for a few tests).
>
> If this is the case, there surely is no need to skip the tests if
> they should be provided in the t/lib dir. Am I missing something!?

No, you are correct, but these are currently not in t/lib (unless
someone snuck them in....)

Of the modules you listed above, only one (IO::String) is required by
the core modules. The others are not. Users shouldn't be forced to
install Algorithm::Diff or IO::ScalarArray just to run tests, so
anything not required should go into t/lib if at all possible. If
there are any reasons (OS issues, list of prereqs) which preclude
adding these to t/lib, we need to ask ourselves (1) why are we using
that module in the first place? And, if there is a good reason, (2)
can we skip the tests if the modules aren't present? Both of those
options are already available.
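A minimal sketch of option (2), skipping tests when an optional module is absent, assuming only Test::More's SKIP mechanism. It is not taken from the BioPerl test suite, and Algorithm::Diff is used purely as an example of a test-only dependency.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# A test that needs nothing beyond core Perl always runs.
ok(1, 'core test runs everywhere');

SKIP: {
    # Try to load the optional module; if that fails, skip the one test in
    # this block instead of failing the whole script.
    eval { require Algorithm::Diff; 1 }
        or skip('Algorithm::Diff not installed', 1);

    my @old = qw(a b c);
    my @new = qw(a x c);
    my @lcs = Algorithm::Diff::LCS(\@old, \@new);
    is_deeply(\@lcs, [qw(a c)], 'diff-based check runs when the module is present');
}
```

Either way the script reports two tests, so the plan holds whether or not the module is installed.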
chris From cjfields at uiuc.edu Thu Jul 5 11:50:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:50:55 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468D006D.6050806@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> <468D006D.6050806@sheffield.ac.uk> Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu> On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote: > ... >> I actually like Sendu's idea more, or the idea of each test suite >> having it's own directory. >> Tests which need to guess/validate the format are probably best >> left sequestered to a specific suite focused on format guessing/ >> validation, at least in my opinion. >> chris > > > How easily would this lend itself to using the same data for > multiple tests, or is it likely to lead to/exacerbate a culture of > adding duplicate data files in each "test suite" rather than reusing? > > Nath If there is a group of test data used for more than one test suite we can group those together into a common use folder, or we can go by format. I'm pretty open to anything, really, as long as it is more organized. My point is really concerned more with validation/guessing. I think we should limit those tests to their respective specific test suites, or even to sections within a particular test suite (for instance, genbank.t), but not to force sequence guessing or validation in other cases. To me validation, guessing, and parsing are three distinct issues (much like XML parsers handle things), so they require three distinct tests. As for true sequence validation, there is no official format validation scheme yet in BioPerl. 
It's sort of unofficially integrated into the sequence parsers
themselves (something which I find to be problematic for several
reasons too long to outline here).

chris

From cjfields at uiuc.edu Thu Jul 5 11:54:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:54:42 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
 <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
 <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
 <1183647510.468d07168963c@webmail.shef.ac.uk>
Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu>

On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing
>>> that when a file is read into a bioperl object and then written
>>> out again in the same format, the input and output files are
>>> identical. If not, the test should show where the differences
>>> start (showing all the differences would just clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other text file IO.
>>>
>>> Any takers?
>>>
>>> -Heikki
>> ...
>>
>> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use. However, what if the test file is old (as many in t/data are)
>> and the format has changed? GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to distinguish variants other than just layout,
> e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue
(i.e. does the record fit certain specifications).
There are examples of seq records from different sources which bioperl is expected to parse, for example Ensembl GenBank records. Some of those have feature tags or annotation fields which may not appear in output when using write_seq(). I don't think it's as important to replicate the output data exactly like the input as much as it's important to have the data represented in a Bio::Seq object (or any other Bio* instance) in a consistent manner and have the ability to incorporate new fields (such as the recent addition of genome projects) transparently. The latter is hard to do with the current genbank parser (you have to specifically code for it), but it is a bit easier to do with the driver-handler model I'm working on. chris From bix at sendu.me.uk Thu Jul 5 11:56:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:56:29 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <468D14AD.8050007@sendu.me.uk> Sendu Bala wrote: > Sendu Bala wrote: >> Nathan S. Haigh wrote: >>> Thinking about this a little more, I think it would be a good idea to >>> include Test::Exception in t/lib. >> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. And I've also now added in support for Test::Warn, giving you warning_is, warnings_are, warning_like and warnings_like. 
I've updated the HOWTO as well: http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests You can see these things in action in t/seq_quality.t From bix at sendu.me.uk Thu Jul 5 11:57:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:57:23 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> Message-ID: <468D14E3.6030104@sendu.me.uk> Chris Fields wrote: > > On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > >> ... >> >> So its my understanding there will be absolutely no difference in >> behaviour following this change (except that warning can be caught by >> Test::Warn). I just wanted to confirm my understanding. > > You can always just try it out and run tests. Might be interesting to > see if anything breaks. I've made the change. Everything seems ok as far as I can tell. From dmessina at wustl.edu Thu Jul 5 12:02:26 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:02:26 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 9:33 AM, Chris Fields wrote: > I agree, but I think there is still an expectation that 1.5.2 and > beyond are more like true 'stable' releases even though we still > designate them as 'developer.' We unfortunately reinforce that when > we tell users they need to update to v. 1.5.2 or bioperl-live to fix > a particular bug in the 1.4 release. 
I know this has been discussed before, but while we're talking about future release plans, it might be worth revisiting the BioPerl policy of designating only even-numbered releases as 'stable'. It's taking so long to get from 1.4 to 1.6. While the principle of keeping a stable API between 'stable' releases is valid in the ideal case, I think that continuing to label 1.5.2 (or whatever the latest 'dev' release is) as a developer release (which implies potentially unstable or bleeding-edge code) is highly misleading since we would never ever tell anyone to get 1.4 instead. Alternatively, if we adopt a more aggressive release schedule as Chris proposed a couple days ago, then perhaps we could agree to push out an even-numbered release once a year or so, so that there is a 'stable' release we could recommend. > If we feel a nightly snapshot is warranted we could do that though. > I personally don't think there is a need, particularly since we have > several means to obtain the latest code at any point in time > (including the browsable CVS 'Download tarball'). We could state the > next dev/stable CPAN release (pending on date dd/mm/yy) will have the > bug fix, and if they want it immediately then pick it up from CVS. To make it easier for people to obtain the latest tarball, we could put the 'download tarball' link directly on the 'Getting_BioPerl' wiki page instead of only a link to the viewcvs interface. That way they wouldn't have to navigate the source tree to figure out which tarball they want (which is almost always going to be the bioperl- live tarball). 
I think the actual URL underlying the 'Download tarball' link on
viewcvs is stable:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1

Dave

From cjfields at uiuc.edu Thu Jul 5 12:13:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:13:30 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk>
 <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
 <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
 <468CA721.4020804@sheffield.ac.uk>
 <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
Message-ID:

On Jul 5, 2007, at 11:02 AM, David Messina wrote:

> ...
> I know this has been discussed before, but while we're talking
> about future release plans, it might be worth revisiting the
> BioPerl policy of designating only even-numbered releases as
> 'stable'. It's taking so long to get from 1.4 to 1.6. While the
> principle of keeping a stable API between 'stable' releases is
> valid in the ideal case, I think that continuing to label 1.5.2 (or
> whatever the latest 'dev' release is) as a developer release (which
> implies potentially unstable or bleeding-edge code) is highly
> misleading since we would never ever tell anyone to get 1.4 instead.
>
> Alternatively, if we adopt a more aggressive release schedule as
> Chris proposed a couple days ago, then perhaps we could agree to
> push out an even-numbered release once a year or so, so that there
> is a 'stable' release we could recommend.

I think the idea of 'stable' is best summarized back in Hilmar's post
(i.e. we support a particular API for that release). The 1.5 releases
I believe break some aspects of the 1.4 API (some of the
Feature/Annotation stuff introduced before the official 1.5 release).
We still need to address some of those issues before a 1.6, which
seems to be the only real stumbling block, but they are unfortunately
not well documented and are somewhat interwoven with GMOD code.

> ...
> To make it easier for people to obtain the latest tarball, we could > put the 'download tarball' link directly on the 'Getting_BioPerl' > wiki page instead of only a link to the viewcvs interface. That way > they wouldn't have to navigate the source tree to figure out which > tarball they want (which is almost always going to be the bioperl- > live tarball). > > I think the actual URL underlying the 'Download tarball' link on > viewcvs is stable: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- > live.tar.gz?tarball=1 > > Dave Sounds reasonable enough. Do you want to do the honors? chris From dmessina at wustl.edu Thu Jul 5 12:44:28 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:44:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> > [Chris] > The 1.5 releases I believe break some aspects of 1.4 API Yes, this is true. I question, though, whether it's relevant given that virtually no one uses 1.4 anymore. In any case, I would venture that the number of people who would be bitten by the 1.4->1.5 API change is much smaller than the number of people who download 1.4 and then ask us why it doesn't work. I think that, rather than continuing to call 1.5.x the developer release in order to adhere to the API guarantee, it would be much clearer to users if we state clearly that everyone should download 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API changes. >> [me] >> we could put the 'download tarball' link directly on the >> 'Getting_BioPerl' wiki page > > [Chris] > Sounds reasonable enough. Do you want to do the honors? Done. 
Dave From cjfields at uiuc.edu Thu Jul 5 12:57:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:57:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: On Jul 5, 2007, at 11:44 AM, David Messina wrote: > >> [Chris] >> The 1.5 releases I believe break some aspects of 1.4 API > > Yes, this is true. > > I question, though, whether it's relevant given that virtually no > one uses 1.4 anymore. In any case, I would venture that the number > of people who would be bitten by the 1.4->1.5 API change is much > smaller than the number of people who download 1.4 and then ask us > why it doesn't work. > > I think that, rather than continuing to call 1.5.x the developer > release in order to adhere to the API guarantee, it would be much > clearer to users if we state clearly that everyone should download > 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API > changes. You'd be surprised how many are still using bioperl 1.2.3 (Ensembl) and 1.4 (any admin too scared to go with a 'dev' release). The real answer is to get out a stable 1.6 ASAP. The problem we currently have is (horrible Texas pun) 'too many pokers in the fire.' We have svn migration, major changes in the test suite, talk about splitting bioperl, a lot of bugs to sort through, new code to add or work on, etc. Not to mention our $jobs! I think we should just bite the bullet and proceed with pulling out the controversial operator overloading in Bio::Annotation*, deprecate the tag methods in AnnotatableI, and go about fixing everything up. 
If that occurs (which seems to be the major impediment) and we get
GMOD/GBrowse playing well with BioPerl then we can aim for a new
stable release, and then institute a regular release cycle.

chris

From bpederse at gmail.com Thu Jul 5 13:58:24 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Thu, 5 Jul 2007 10:58:24 -0700
Subject: [Bioperl-l] slippy map for genomic features.
Message-ID:

hi,
here's a side project i've been tinkering on in googlecode svn that
may be useful to some.
http://code.google.com/p/genome-browser/
it's a simple hack on top of OpenLayers (openlayers.org) to provide a
javascript slippy map interface and API to view and browse genomic
features. It can be used with any image generation program that can
accept &xmin= and &xmax= parameters through the url. -- though i
haven't had it working with bioperl, as bioperl generates images of
different height depending on the number of tracks.

there's a live example of the code in SVN here:
http://toxic.berkeley.edu/bpederse/genome-browser/
with images generated by a colleague's modules on first request. those
images are then cached by a simple perl script included in the SVN
repo. all subsequent requests are returned from the cache.
an image request (automatically generated by the javascript) looks like:
http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
but any implementation need only implement xmin and xmax. all other
parameters will be used for caching but are not required.

if anyone is interested in getting this going with bioperl image
generation--or improving the project in any way, let me know and i'll
add you as a committer and provide any javascript support that i can.
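A minimal sketch of the caching idea described above, assuming a tile cache keyed on the request parameters; it is not the tiler.pl script from the repository. The names tile_cache_path and /tmp/tiles, and the .png suffix, are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a cache file name from the request parameters. xmin and xmax are
# required; any other parameters only make the cache key more specific.
# (tile_cache_path, the cache directory and the .png suffix are assumptions.)
sub tile_cache_path {
    my ($cache_dir, %param) = @_;
    for my $required (qw(xmin xmax)) {
        die "missing required parameter '$required'\n"
            unless defined $param{$required};
    }
    # Sort the keys so the same request always maps to the same file,
    # whatever order the parameters arrived in.
    my $key = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$cache_dir/" . md5_hex($key) . '.png';
}

my $path = tile_cache_path('/tmp/tiles',
    chr => 1, layers => 'mRNA', xmin => 491520, xmax => 499712, width => 512);
print "$path\n";
```

A tiler built this way would generate the image only when no file exists at the returned path and serve the cached file otherwise.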
-brent tar ball download: http://genome-browser.googlecode.com/files/genome-browser-0.02.tar From dmessina at wustl.edu Thu Jul 5 14:39:16 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 13:39:16 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: > The real answer is to get out a stable 1.6 ASAP. The problem we > currently have is (horrible Texas pun) 'too many pokers in the > fire.' We have svn migration, major changes in the test suite, > talk about splitting bioperl, a lot of bugs to sort through, new > code to add or work on, etc. Not to mention our $jobs! Yep, I hear ya. > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, > deprecate the tag methods in AnnotatableI, and go about fixing > everything up. If that occurs (which seems to be the major > impediment) and we get GMOD/GBrowse playing well with BioPerl then > we can aim for a new stable release, and then institute a regular > release cycle. That's a great plan. You're right -- better to devote energy to 1.6 than to interim solutions. Alright, I give, I give! :) Dave From glauberwagner at yahoo.com.br Thu Jul 5 15:56:43 2007 From: glauberwagner at yahoo.com.br (Glauber Wagner) Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART) Subject: [Bioperl-l] Bio::DB::Query::GenBank failures In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com> Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com> Dear All, I have a problem if Bio::DB::Query::GenBank module. I am trying to count the number of protein sequences and the module did not return the expected number by count object. 
  use Bio::DB::GenBank;
  use Bio::DB::Query::GenBank;

  $query_string = "Trypanosoma cruzi[Organism]";

  my $query = Bio::DB::Query::GenBank->new(-db=>'protein',
                                           -query=>$query_string);
  my $count = $query->count;
  my @ids = $query->ids;

  print "$count\n";

Thanks.
Glauber

From cjfields at uiuc.edu Thu Jul 5 16:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment. I'm getting
'Internal Server Error' at this time. Try back again at a later point.

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem if Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences and
> the module did not return the expected number by count
> object.
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query_string = "Trypanosoma cruzi[Organism]";
>
> my $query =
> Bio::DB::Query::GenBank->new(-db=>'protein',
>
> -query=>$query_string);
> my $count = $query->count;
> my @ids = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mitch_skinner at berkeley.edu Thu Jul 5 17:22:38 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 05 Jul 2007 14:22:38 -0700 Subject: [Bioperl-l] slippy map for genomic features. In-Reply-To: References: Message-ID: <468D611E.7020904@berkeley.edu> Hi, FWIW, we've been working on something similar: http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html based on GBrowse/Bio::Graphics and javascript that Andrew wrote from scratch (with the prototype library). When our project was starting up (fall 05) Andrew looked but didn't find openlayers; I'm not sure if it was public back then but their current svn only goes back to 2006. I think that things like layout (bumping) ought to be done in advance on a chromosome-wide basis; otherwise it's difficult to keep features from ending up at different heights on neighboring tiles. And it would be difficult for the server to know what was being clicked on. So we've been doing some up-front work to either do layout or to just render all the tiles in advance: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup which is driven by this script: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup Or you could just not bump at all, I guess. I think of that as important functionality but I'd be interested in hearing about use cases where it's not necessary. It's not just bumping, though; things like text labels also make it difficult to predict exactly what pixels a feature will span if you only have its genomic coordinates. To make features clickable we've been using imagemaps; it simplifies the server code but it bogs down the client quite a bit. I'd certainly be interested in seeing if there are ways we could work together; if you're at Berkeley maybe we could meet. 
Regards, Mitch Brent Pedersen wrote: > hi, > here's a side project i've been tinkering on in googlecode svn that > may be useful to some. > http://code.google.com/p/genome-browser/ > it's a simple hack on top of OpenLayers (openlayers.org) to provide a > javascript slippy map interface and API to view and browse genomic > features. It can be used with any image generation program that can > accept &xmin= and &xmax= parameters through the url. -- though i > havent had it working it bioperl as bioperl generates images of > different height depending on the number of tracks. > > there's a live example of the code in SVN here: > http://toxic.berkeley.edu/bpederse/genome-browser/ > with images generated by a colleague's modules on first request. those > images are then cached by a simple perl script included in the SVN > repo. all subsequent requests are returned from the cache. > an image request (automatically generated by the javascript) looks like: > http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 > but any implementation need only implement xmin and xmax. all other > parameters will be used for caching but are not required. > > if anyone is interested in getting this going with bioperl image > generation--or improving the project in any way, let me know and i'll > add you as a committer and provide any javascript support that i can. 
> > -brent > > tar ball download: > http://genome-browser.googlecode.com/files/genome-browser-0.02.tar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jul 5 17:42:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 16:42:40 -0500 Subject: [Bioperl-l] Bio::DB::Query::GenBank failures In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> References: <839755.95349.qm@web36514.mail.mud.yahoo.com> <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu> Update: seems to be back up. Give it a try now. chris On Jul 5, 2007, at 3:21 PM, Chris Fields wrote: > NCBI esearch doesn't seem to be working at the moment. I'm getting > 'Internal Server Error' at this time. Try back again at a later > point. > > chris > > On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote: > >> Dear All, >> >> I have a problem if Bio::DB::Query::GenBank module. I >> am trying to count the number of protein sequences and >> the module did not return the expected number by count >> object. >> >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> >> $query_string = "Trypanosoma cruzi[Organism]"; >> >> my $query = >> Bio::DB::Query::GenBank->new(-db=>'protein', >> >> -query=>$query_string); >> my $count = $query->count; >> my @ids = $query->ids; >> >> print "$count\n"; >> >> Thanks. >> Glauber >> >> >> >> >> _____________________________________________________________________ >> _ >> ______________ >> Novo Yahoo! Cad?? - Experimente uma nova busca. >> http://yahoo.com.br/oqueeuganhocomisso >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From n.haigh at sheffield.ac.uk Fri Jul 6 03:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>
 <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
 <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
 <468CA721.4020804@sheffield.ac.uk>
 <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
 <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one
> uses 1.4 anymore. In any case, I would venture that the number of
> people who would be bitten by the 1.4->1.5 API change is much smaller
> than the number of people who download 1.4 and then ask us why it
> doesn't work.
>

I'm not really up to speed with how the API should remain stable etc.
Is the idea that the API should be stable from 1.4 through the 1.5 dev
releases, and that the next stable release can then change that API?
So any stable-to-stable upgrade could involve an API change, while a
stable-to-dev upgrade should have the same API? Does a stable API mean
that the same method calls are available in a newer release? What
about adding new methods to a newer release?

How are these API changes currently tracked?
It seems to me that Test::More might be able to help in testing the
API:

  can_ok($module, @methods);

Nath

From n.haigh at sheffield.ac.uk Fri Jul 6 07:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:
  my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:
  $obj->label();

I thought it would probably be:
  'inframe'

However you get:
  'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit
B8DxDViDOcx2gTFjSwQ2kNg=
=SroY
-----END PGP SIGNATURE-----

From n.haigh at sheffield.ac.uk Fri Jul 6 08:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
>
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
>
> What do you expect the following to return:
> $obj->label();
>
> I thought it would probably be:
> 'inframe'
>
> However you get:
> 'inframe, deletion'
>
> Can anyone in the know explain what behaviour would be expected?
>
> Cheers
> Nath

Following on from this, AAChange has the following two methods:
add_Allele() and allele_mut()

It appears that allele_mut() is only capable of remembering one allele
at a time, whereas add_Allele() is provided to add support for
multiple alleles - is that correct? However, add_Allele() also calls
allele_mut(), such that multiple calls to add_Allele() will result in
the overwriting of the allele being remembered by allele_mut().

Things are further complicated by the fact that label() uses
allele_mut() to decide on the label to return. Shouldn't label() know
about multiple alleles set by multiple calls to add_Allele()?

It may be my lack of understanding of alleles and of what these
classes are intended to do, but trying to rewrite the test scripts to
improve code coverage has left me a little confused!

Thanks
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I
b8ZOENvDDDIxphAoxeKg8/E=
=f/sa
-----END PGP SIGNATURE-----

From tanzeem.mb at gmail.com Thu Jul 5 02:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in
 webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>

I found it myself. Run Apache as root and disable SELinux, and the
problem will not recur.

tanzeem wrote:
>
> I have a program which uses the BioPerl RemoteBlast module, which
> compares an amino acid FASTA file with the Swiss-Prot database. The
> submit_blast() method works successfully when run from the command
> line. But when the program is run from a web browser it returns -1.
> I was trying to adapt the code from the RemoteBlast synopsis for my
> need.
> -- View this message in context: http://www.nabble.com/Problem-working-with-remoteblast-submit-method-in-webbrowser.-tf3919886.html#a11441586 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Fri Jul 6 09:00:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 06 Jul 2007 09:00:32 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: <1183726832.2566.34.camel@localhost.localdomain> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: > > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, deprecate > the tag methods in AnnotatableI, and go about fixing everything up. > If that occurs (which seems to be the major impediment) and we get > GMOD/GBrowse playing well with BioPerl then we can aim for a new > stable release, and then institute a regular release cycle. > I think this sounds like a good idea to me too. I'm planning on having a GMOD hackathon at the end of the summer; if I had a new API by then, we could focus on fixing anything that gets broken by the changes. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070706/d77c2d90/attachment.bin From cjfields at uiuc.edu Fri Jul 6 09:10:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 6 Jul 2007 08:10:41 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote: > David Messina wrote: >>> [Chris] >>> The 1.5 releases I believe break some aspects of 1.4 API >>> >> >> Yes, this is true. >> >> I question, though, whether it's relevant given that virtually no one >> uses 1.4 anymore. In any case, I would venture that the number of >> people who would be bitten by the 1.4->1.5 API change is much smaller >> than the number of people who download 1.4 and then ask us why it >> doesn't work. >> > > I'm not really up-to-speed with how the API should remain stable > etc. Is > the idea that the API should be stable from 1.4 though the 1.5 dev and > then the next stale release can change that API? So any stable to > stable > upgrade could involve an API change while a stable to dev upgrade > should > have the same API? Does a stable API mean that the same method > calls are > available in a newer release....what about adding new methods to a > newer > release? > > How are these API changes currently tracked? 
It seems to me that > Test::More might be able to help in testing the API: > > can_ok($module, @methods); > > > Nath It's basically a 'contract' of sorts between the devs (us) and users (us/them) that the API won't change for the extent of that release series, thus ensuring any scripts out there generating tons of data won't break down if they attempt to call a renamed method. We try to maintain the API state anyway for those reasons, but in a dev release series we might decide to change some method names for consistency and deprecate older ambiguously-named methods (see below). For a stable release it's critical the API remain intact. There are a few methods which are considered deprecated or will be deprecated. For instance, we recently talked about changes to method names which use case to specify whether you're receiving an object (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested list, or whether to use each_* vs next_* for iterators. Consistency is nice! chris From heikki at sanbi.ac.za Fri Jul 6 09:20:26 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 6 Jul 2007 15:20:26 +0200 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E3B89.3090202@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> Message-ID: <200707061520.27000.heikki@sanbi.ac.za> Hi Nat, These modules have not been touched for a while and were developed for a specific task. A review is definitely in order. The way RNAChange->label was written, it should return 'inframe' when given no alleles, but 'no change' would actually be better. The multiple alleles were originally thought to be a good idea, but the vocabulary for labels was developed for a single allele only. The use of the module ended up being limited to a single allele, so add_allele() behaviour was conveniently ignored but not removed. :( -Heikki On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: > Nathan S.
Haigh wrote: > > I'm taking a look at the tests for Bio::Variation::RNAChange. > > > > If you create a new oject without arguments: > > my $obj = Bio::Variation::RNAChange->new(); > > > > What do you expect the following to return: > > $obj->label(); > > > > I thought it would probably be: > > 'inframe' > > > > However you get: > > 'inframe, deletion' > > > > Can anyone in the know explain what behaviour would be expected? > > > > Cheers > > Nath > > Following on from this, AAChange has the following two methods: > add_Allele() and allele_mut() > > It appears that allele_mut is only capable of remembering 1 allele at a > time, whereas add_Allele() is provided to add support for mutliple > alleles - is that correct? > > However, add_Allele() also calls allele_mut(), such that mutliple calls > to add_Allele will result in the overwriting of the allele being > remembered by allele_mut(). Things are further complicated by the fact > that label() uses allele_mut() to decide on the label to return. > Shouldn't label know aout multiple alleles set by multiple calls to > add_Allele? > > It may be my lack of understanding alleles and what these classes are > intending to do, but trying to rewrite the test scripts to improve code > coverage has let me a little confused! 
> > Thanks > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From schlesi at ebi.ac.uk Fri Jul 6 10:24:05 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Fri, 6 Jul 2007 15:24:05 +0100 Subject: [Bioperl-l] Unrooting a tree Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Hi, I am reading a rooted tree in newick format from a string (i.e. a bifurcation at the root) and would like to unroot it (i.e. a trifurcation at the root). I tried getting a grandchild of the root and adding it as a direct child, but that does not seem to work (the root still only has two descendents and the tree structure gets messed up). Is there a nice way to do this directly in bioperl? Doing it on the newick string is possible of course, but not nice. Thanks Felix From n.haigh at sheffield.ac.uk Fri Jul 6 11:37:19 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:37:19 +0100 Subject: [Bioperl-l] API Changes In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: <468E61AF.9040106@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Fields wrote: > > On Jul 6, 2007, at 2:09 AM, Nathan S. 
Haigh wrote: > >> David Messina wrote: >>>> [Chris] >>>> The 1.5 releases I believe break some aspects of 1.4 API >>>> >>> >>> Yes, this is true. >>> >>> I question, though, whether it's relevant given that virtually no one >>> uses 1.4 anymore. In any case, I would venture that the number of >>> people who would be bitten by the 1.4->1.5 API change is much smaller >>> than the number of people who download 1.4 and then ask us why it >>> doesn't work. >>> >> >> I'm not really up-to-speed with how the API should remain stable etc. Is >> the idea that the API should be stable from 1.4 though the 1.5 dev and >> then the next stale release can change that API? So any stable to stable >> upgrade could involve an API change while a stable to dev upgrade should >> have the same API? Does a stable API mean that the same method calls are >> available in a newer release....what about adding new methods to a newer >> release? >> >> How are these API changes currently tracked? It seems to me that >> Test::More might be able to help in testing the API: >> >> can_ok($module, @methods); >> >> >> Nath > > It's basically a 'contract' of sorts between the devs (us) and users > (us/them) that the API won't change for the extent of that release > series, thus ensuring any scripts out there generating tons of data > won't break down if they attempt to call a renamed method. We try to > maintain the API state anyway for those reasons, but in a dev release > series we might decide to change some method names for consistency and > deprecate older ambiguously-named methods (see below). For a stable > release it's critical the API remain intact. Hmm, still not 100% clear - it is Friday! So, someone running a script that was designed when 1.4 was released should still be able to run their script for all future releases. So all changes need to be backward compatible? 
So you have several situations regarding method names: 1) Adding new methods should be fine since past scripts don't know about them and won't have used them 2) Removing methods would break past scripts that used them 3) Renamed methods would break past scripts that used the old name A stable API, to me, means the same method calls should still be able to accept the same arguments (inc the constructor) and return the same object/data etc. What if a module is pretty outdated and would benefit from a rewrite - should all the old method names be included, and what if this makes coding difficult? > > There are a few methods which are considered deprecated or will be > deprecated. For instance, we recently talked about changes to method > names which use case to specify whether you're receiving an object > (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested > list, or whether to use each_* vs next_* for iterators. Consistency is > nice! > You mean the use of case to signify objects vs data being returned is to be deprecated or encouraged? What was the outcome of the each_* vs next_*? Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk kAWH1zVa1ycopijl761cvkQ= =fppH -----END PGP SIGNATURE----- From n.haigh at sheffield.ac.uk Fri Jul 6 11:43:41 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:43:41 +0100 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za> References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> Message-ID: <468E632D.4090801@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Heikki Lehvaslaiho wrote: > Hi Nat, > > These modules have not been touched for a while and were developed for a > specific task. A review is definitely in order.
> > The way RNAChange->label was written, it should return 'inframe' when given no > alleles, but 'no change' would actually be better. Wouldn't this effectively be changing the API since past scripts "could" expect "inframe" to be returned. > > The multiple alleles were originally though to be a good idea, but the > vocabulary for labels was developed for single allele, only, The use of the > module ended up being limited to single allele, so add_allele() behaviour was > conveniently ignored but not removed. :( So add_Allele() and each_Allele() should be deprecated in favour of allele_mut()? - From my post about API's.....how should the capitalisation of add_Allele() and each_Allele() be changed? Cheers Nath > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: >> Nathan S. Haigh wrote: >>> I'm taking a look at the tests for Bio::Variation::RNAChange. >>> >>> If you create a new oject without arguments: >>> my $obj = Bio::Variation::RNAChange->new(); >>> >>> What do you expect the following to return: >>> $obj->label(); >>> >>> I thought it would probably be: >>> 'inframe' >>> >>> However you get: >>> 'inframe, deletion' >>> >>> Can anyone in the know explain what behaviour would be expected? >>> >>> Cheers >>> Nath >> Following on from this, AAChange has the following two methods: >> add_Allele() and allele_mut() >> >> It appears that allele_mut is only capable of remembering 1 allele at a >> time, whereas add_Allele() is provided to add support for mutliple >> alleles - is that correct? >> >> However, add_Allele() also calls allele_mut(), such that mutliple calls >> to add_Allele will result in the overwriting of the allele being >> remembered by allele_mut(). Things are further complicated by the fact >> that label() uses allele_mut() to decide on the label to return. >> Shouldn't label know aout multiple alleles set by multiple calls to >> add_Allele? 
>> >> It may be my lack of understanding alleles and what these classes are >> intending to do, but trying to rewrite the test scripts to improve code >> coverage has let me a little confused! >> >> Thanks >> Nath >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue GBHuSHfsesX1ko55s+ME2Zc= =tkG8 -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sat Jul 7 16:57:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 15:57:37 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <1183726832.2566.34.camel@localhost.localdomain> Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu> We'll prob. get a start soon, then. I'll let you know when we start. chris On Jul 6, 2007, at 8:00 AM, Scott Cain wrote: > On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: >> >> I think we should just bite the bullet and proceed with pulling out >> the controversial operator overloading in Bio::Annotation*, deprecate >> the tag methods in AnnotatableI, and go about fixing everything up. >> If that occurs (which seems to be the major impediment) and we get >> GMOD/GBrowse playing well with BioPerl then we can aim for a new >> stable release, and then institute a regular release cycle. >> > I think this sounds like a good idea to me too. 
I'm planning on > having > a GMOD hackathon at the end of the summer; if I had a new API by then, > we could focus on fixing anything that gets broken by the changes. > > Scott > > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Jul 7 17:17:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 16:17:14 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468E61AF.9040106@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> <468E61AF.9040106@sheffield.ac.uk> Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu> On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote: > ... > Hmm, still not 100% clear - it is Friday! > > So, someone running a script that was designed when 1.4 was released > should still be able to run their script for all future releases. > So all > changes need to be backward compatible? It helps. For instance, if we change method names (rename each_Foo as next_Foo), we should have each_Foo delegate to next_Foo for the time being. If we plan on deprecating the old method altogether we would add a warning message when it's called, then delegate. It's a better solution than just changing the method outright, which means the user has to search through docs to find the renamed method. 
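A minimal sketch of the delegation described above, for illustration only. The package My::FooCollection and its each_Foo/next_Foo methods are hypothetical placeholders rather than actual BioPerl code, and the warning uses the core Carp module instead of any BioPerl-specific deprecation helper:

```perl
use strict;
use warnings;

# Hypothetical class for illustration; not part of BioPerl.
package My::FooCollection;

use Carp qw(carp);

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# New, preferred name: returns the next item, or undef when exhausted.
sub next_Foo {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

# Old name kept for backward compatibility: warn, then delegate.
sub each_Foo {
    my ($self, @args) = @_;
    carp "each_Foo() is deprecated; please use next_Foo() instead";
    return $self->next_Foo(@args);
}

package main;

my $coll = My::FooCollection->new(qw(a b c));

# An old script calling the old name still works, with a warning on STDERR.
my $first = $coll->each_Foo;
print "old name returned: $first\n";

# New code uses the new name directly.
while (defined(my $item = $coll->next_Foo)) {
    print "new name returned: $item\n";
}
```

Because the old method only warns and forwards its arguments, existing scripts keep running unchanged while the warning points users at the new name.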
> So you have several situations regarding method names: > 1) Adding new methods should be fine since past scripts don't know > about > them and won't have used them > 2) Removing methods would break past scripts that used them > 3) Renamed methods would break past scripts that used the old name > > A stable API to me, means the same method calls should still be > able to > accept the same arguments (inc the constructor) and return the same > object/data etc. Yes. > What if a module is pretty outdated and would benefit from a rewrite - > should all the old method names be included, what if this makes coding > difficult? It depends on the module. If a complete rewrite is needed then maybe starting with a new module/interface is best, and we could deprecate the older module completely. That has been done already with Bio::Tools::BPLite (in favor of SearchIO) and a few other modules. >> There are a few methods which are considered deprecated or will be >> deprecated. For instance, we recently talked about changes to method >> names which use case to specify whether you're receiving an object >> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. >> nested >> list, or whether to use each_* vs next_* for iterators. >> Consistency is >> nice! >> > > You mean the use of case to signify objects vs data being returned are > to be deprecated or encouraged? What was the outcome of the each_* vs > next_*? > > Nath Here's the section I added to the wiki (it started in a thread a few weeks or so ago, so it's a summary really): http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names Feel free to add to it or make suggestions. BTW, Hilmar mentioned there was a movement to rename methods in old code to follow these recommendations but it was never completed. It should be taken up again at some point but the recommendations are mainly here for newer code.
chris From heikki at sanbi.ac.za Sun Jul 8 03:32:21 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sun, 8 Jul 2007 09:32:21 +0200 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E632D.4090801@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> <468E632D.4090801@sheffield.ac.uk> Message-ID: <200707080932.21818.heikki@sanbi.ac.za> On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote: > Heikki Lehvaslaiho wrote: > > Hi Nat, > > > > These modules have not been touched for a while and were developed for a > > specific task. A review is definitely in order. > > > > The way RNAChange->label was written, it should return 'inframe' when > > given no alleles, but 'no change' would actually be better. > > Wouldn't this effectively be changing the API since past scripts "could" > expect "inframe" to be returned. Checking the actual usage and what happens when you change a nucleotide to itself, you get the label 'silent'. I guess that would be a valid label value even when the alleles are not initialised, too. > > The multiple alleles were originally thought to be a good idea, but the > > vocabulary for labels was developed for a single allele only. The use of > > the module ended up being limited to a single allele, so add_allele() > > behaviour was conveniently ignored but not removed. :( > > So add_Allele() and each_Allele() should be deprecated in favour of > allele_mut()? Yes. > From my post about API's.....how should the capitalisation of > add_Allele() and each_Allele() be changed? Definitely, keep the current ones as deprecated alternatives. -Heikki > Cheers > Nath > > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: > >> Nathan S. Haigh wrote: > >>> I'm taking a look at the tests for Bio::Variation::RNAChange.
> >>> > >>> If you create a new oject without arguments: > >>> my $obj = Bio::Variation::RNAChange->new(); > >>> > >>> What do you expect the following to return: > >>> $obj->label(); > >>> > >>> I thought it would probably be: > >>> 'inframe' > >>> > >>> However you get: > >>> 'inframe, deletion' > >>> > >>> Can anyone in the know explain what behaviour would be expected? > >>> > >>> Cheers > >>> Nath > >> > >> Following on from this, AAChange has the following two methods: > >> add_Allele() and allele_mut() > >> > >> It appears that allele_mut is only capable of remembering 1 allele at a > >> time, whereas add_Allele() is provided to add support for mutliple > >> alleles - is that correct? > >> > >> However, add_Allele() also calls allele_mut(), such that mutliple calls > >> to add_Allele will result in the overwriting of the allele being > >> remembered by allele_mut(). Things are further complicated by the fact > >> that label() uses allele_mut() to decide on the label to return. > >> Shouldn't label know aout multiple alleles set by multiple calls to > >> add_Allele? > >> > >> It may be my lack of understanding alleles and what these classes are > >> intending to do, but trying to rewrite the test scripts to improve code > >> coverage has let me a little confused! 
> >> > >> Thanks > >> Nath > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From xing.y.hu at gmail.com Mon Jul 9 02:26:40 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Mon, 09 Jul 2007 14:26:40 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? Message-ID: <4691D520.60700@gmail.com> Hi friends, I wrote a script for getting genomic sequence file from GenBank. To fulfill that target, I used DB::GenBank module to get the sequence via get_Seq_by_acc, and it works well. But this time, facing enormous amount of ESTs, I have no idea how to download them swiftly and elegantly. PROBLEM DESCRIPTION: goal: download all EST files of a specific species from GenBank, say Arabidopsis Thaliana or Oryza sativa(rice). other: whether all of ESTs are in a single file or separatedly placed does not matter. Can I use a bioperl script to achieve that? And How? I really appreciate. Xing. From akozik at atgc.org Mon Jul 9 08:25:14 2007 From: akozik at atgc.org (Alexander Kozik) Date: Mon, 09 Jul 2007 05:25:14 -0700 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <4691D520.60700@gmail.com> References: <4691D520.60700@gmail.com> Message-ID: <4692292A.1080900@atgc.org> To download genomic sequences or ESTs for any organism (in various formats) you can use NCBI Taxonomy Browser: http://www.ncbi.nlm.nih.gov/Taxonomy/ you can use taxonomy id to access different organisms, Arabidopsis for example (3702): http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 or by direct web link: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 assembled genomes can be accessed via ftp: ftp://ftp.ncbi.nih.gov/genomes/ To download large amount of selected sequences (ESTs for example) you can use batch Entrez: http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide (select EST for EST, it's critical) It seems, to solve the problem you describe, you don't need to use bioperl. NCBI GenBank Entrez provides all necessary tools to work on these simple and frequent tasks. -Alex -- Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 East Health Sciences Drive University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email#1: akozik at atgc.org email#2: akozik at gmail.com web: http://www.atgc.org/ Xing Hu wrote: > Hi friends, > > I wrote a script for getting genomic sequence file from GenBank. To > fulfill that target, I used DB::GenBank module to get the sequence via > get_Seq_by_acc, and it works well. But this time, facing enormous amount > of ESTs, I have no idea how to download them swiftly and elegantly. > > PROBLEM DESCRIPTION: > goal: download all EST files of a specific species from GenBank, say > Arabidopsis Thaliana or Oryza sativa(rice). > other: whether all of ESTs are in a single file or separatedly > placed does not matter. > > Can I use a bioperl script to achieve that? And How? I really > appreciate. > > Xing. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Jul 9 10:17:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Jul 2007 09:17:23 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <4692292A.1080900@atgc.org> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Caveat: if you have millions of ESTs please consider NOT using my eutil script below or NCBI Batch Entrez, which would repeatedly hit the NCBI server thousands of times. At least try looking for other ways to retrieve the data you want (ftp, organism-specific resources like Ensembl, so on), or run any scripts or data retrieval in off hours so you don't overtax the NCBI server. There is a way you can use BioPerl if you don't mind living on the bleeding edge by using bioperl-live (core code from CVS). I have been working on a set of modules for the last year (Bio::DB::EUtilities) which interact with all the various eutils for building data pipelines which uses the NCBI CGI interface. You could possibly retrieve all relevant ESTs using a variation of the example script here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch Note that the code examples do NOT work with rel. 1.5.2 code as the API has changed quite a bit; I'm working to rectify some of that. The script I would use is below. It retrieves batches of 500 sequences (in fasta format) at a time, for a total of 10000 max seq records, saving the raw record data directly to a file (appending as you go along). I added an eval block to check the server status and redo the call up to 4 times before giving up completely. Using eval this way hasn't been extensively tested but should work. 
---------------------------------------
use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil          => 'esearch',
                                       -db             => 'nucest',
                                       -term           => 'txid3702',
                                       -usehistory     => 'y',
                                       -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
    print "History returned\n";
    # note db carries over from above
    $factory->set_parameters(-eutil   => 'efetch',
                             -rettype => 'fasta',
                             -history => $hist);

    my ($retmax, $retstart) = (500,0);
    my $retry = 1;
    my $maxcount = $count < 10000 ? $count : 10000; # set max # seq records to return

    RETRIEVE_SEQS:
    while ($retstart < $maxcount) {
        print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
        $factory->set_parameters(-retmax   => $retmax,
                                 -retstart => $retstart);
        # check in case of server error
        eval{
            $factory->get_Response(-file => ">>ESTs.fas");
        };
        if ($@) {
            die "Server error: $@. Try again later" if $retry == 5;
            print STDERR "Server error, redo #$retry\n";
            $retry++ && redo RETRIEVE_SEQS;
        }
        $retstart += $retmax;
    }
}
---------------------------------------
chris On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: > To download genomic sequences or ESTs for any organism (in various > formats) you can use NCBI Taxonomy Browser: > http://www.ncbi.nlm.nih.gov/Taxonomy/ > > you can use taxonomy id to access different organisms, Arabidopsis for > example (3702): > http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 > > or by direct web link: > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?
> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 > > assembled genomes can be accessed via ftp: > ftp://ftp.ncbi.nih.gov/genomes/ > > To download large amount of selected sequences (ESTs for example) you > can use batch Entrez: > http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html > http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide > (select EST for EST, it's critical) > > It seems, to solve the problem you describe, you don't need to use > bioperl. NCBI GenBank Entrez provides all necessary tools to work on > these simple and frequent tasks. > > -Alex > > -- > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 East Health Sciences Drive > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email#1: akozik at atgc.org > email#2: akozik at gmail.com > web: http://www.atgc.org/ > > > > Xing Hu wrote: >> Hi friends, >> >> I wrote a script for getting genomic sequence file from >> GenBank. To >> fulfill that target, I used DB::GenBank module to get the sequence >> via >> get_Seq_by_acc, and it works well. But this time, facing enormous >> amount >> of ESTs, I have no idea how to download them swiftly and elegantly. >> >> PROBLEM DESCRIPTION: >> goal: download all EST files of a specific species from >> GenBank, say >> Arabidopsis Thaliana or Oryza sativa(rice). >> other: whether all of ESTs are in a single file or separatedly >> placed does not matter. >> >> Can I use a bioperl script to achieve that? And How? I really >> appreciate. >> >> Xing. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Mon Jul 9 14:08:07 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 9 Jul 2007 11:08:07 -0700 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> I don't think there is a function for this yet but it would be a good one to have. I assume you don't really want to take a shot at writing it though? To make this work I think you have to create a new node which contains the trifurcation and this node is what the root is set to. -jason On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From lstein at cshl.edu Mon Jul 9 17:35:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 9 Jul 2007 17:35:49 -0400 Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com> Hi Folks, Sorry for the job spam. We're looking for a manager of the Cold Spring Harbor Laboratory bioinformatics core facility. 
This is a semi-independent staff position supporting CSHL scientific
researchers by providing consultation, data mining and software
development activities. You will have a software staff of two, a nice
salary, good health benefits, and an exciting and dynamic environment to
work in.

I'm looking for someone with a strong bioinformatics background, at least
five years' experience programming Perl, Java or Python in an academic or
commercial environment, and management experience.

If you are interested, please send your CV and cover letter to me.

Thanks,

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From stewarta at nmrc.navy.mil Mon Jul 9 18:16:12 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Mon, 9 Jul 2007 18:16:12 -0400
Subject: [Bioperl-l] rpsblast
Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil>

When I run...

$result = $factory->rpsblast($seq);

... where $seq is a Bio::Seq object, it seems to simply copy the $seq
object to $result.

When I run something similar...

$result = $factory->rpsblast('/path/to/myFile');

... the value of $result then becomes '/path/to/myFile'.

Has anyone else encountered this?

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852
email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From jason_stajich at berkeley.edu Mon Jul 9 21:36:10 2007
From: jason_stajich at berkeley.edu (Jason Stajich)
Date: Mon, 9 Jul 2007 18:36:10 -0700
Subject: [Bioperl-l] BOSC2007
Message-ID:

I posted a quick note about meeting up at BOSC/ISMB this year. If you are
attending, please sign your name on the page or at least let us know
whether you are interested in a BoF.
We'll try to discuss some of the current topics in BioPerl development as
well as use the time to coordinate any development that benefits from
face-to-face time.

http://bioperl.org/wiki/BOSC2007_Meetup
http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/

-jason
--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From schlesi at ebi.ac.uk Tue Jul 10 08:58:00 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Tue, 10 Jul 2007 13:58:00 +0100
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
	<22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com>

Hi,

> I don't think there is a function for this yet but it would be a good one
> to have.
> I assume you don't really want to take a shot at writing it though?
> To make this work I think you have to create a new node which contains the
> trifurcation and this node is what the root is set to.

Creating a new root is fine, but what would the (3) children of that
node be?
I took a different approach now, where I iterate over all (indirect)
descendents of the root, find the first one which does not have the
root as its direct ancestor and move it up the tree, i.e.

foreach my $d ($root->get_all_Descendents){
    if ($d->ancestor != $root){
        $d->ancestor->remove_Descendent($d);
        if ($root->add_Descendent($d, 1) == 3){
            last;
        }
    }
}

This will make the old root a trifurcation. It does the right thing
for what I am trying to do, but I do not believe it is general (for
example, it currently does not take branch lengths into account). Also,
instead of taking the first such descendent, moving the most distant
possible subtree of a clade up to the root might be better.
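
To make the branch-length point concrete, here is a minimal sketch of one possible more general unrooting step. It is only an illustration under stated assumptions: it uses a toy tree of plain Perl hashes rather than Bio::Tree::Node objects, and the names node, unroot, children and len are made up for this example. Instead of moving a single grandchild up, it dissolves one internal child of the root, attaches that child's children to the root, and adds the dissolved branch length to the root's other child, so that path lengths between leaves stay the same.

```perl
use strict;
use warnings;

# Toy node (hypothetical layout, not a BioPerl object):
# { name => ..., len => branch length to parent, children => [...] }
sub node {
    my ($name, $len, @children) = @_;
    return { name => $name, len => $len, children => [@children] };
}

# Turn a bifurcating root into a trifurcation (or higher) by dissolving
# one internal child of the root. Modifies the root in place and returns it.
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{children} };
    return $root if @kids != 2;              # root is not a bifurcation

    # pick the first child of the root that is itself an internal node
    my ($inner) = grep { @{ $_->{children} } } @kids;
    return $root unless $inner;              # both children are leaves
    my ($other) = grep { $_ != $inner } @kids;

    # merge the dissolved branch into the branch of the remaining child
    $other->{len} += $inner->{len};
    $root->{children} = [ @{ $inner->{children} }, $other ];
    return $root;
}

# ((A:1,B:2):0.5,C:3); becomes (A:1,B:2,C:3.5);
my $tree = node('root', 0,
    node('', 0.5, node('A', 1), node('B', 2)),
    node('C', 3));
unroot($tree);
print join(',', map { "$_->{name}:$_->{len}" } @{ $tree->{children} }), "\n";
```

The same idea should carry over to Bio::Tree::Node using each_Descendent, add_Descendent, remove_Descendent and branch_length, but that mapping is untested here.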
Felix > On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From xing.y.hu at gmail.com Tue Jul 10 09:29:36 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Tue, 10 Jul 2007 21:29:36 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Message-ID: <469389C0.5060303@gmail.com> Thanks you guys. I had to confess that how stupid I was. The easiest way seems to be the way using NCBI Taxonomy Browser which suggested by alex. As a matter of fact, I knew that but I thought it was necessary to have all items selected before pressing save to launch download. So I was desperate to find a button that could achieve that without hundreds of thousands of clicking by me. "What about select none of those items at all?" -- This idea finally came to me after days of struggling and the problem was solved. Xing Chris Fields wrote: > Caveat: if you have millions of ESTs please consider NOT using my > eutil script below or NCBI Batch Entrez, which would repeatedly hit > the NCBI server thousands of times. 
At least try looking for other > ways to retrieve the data you want (ftp, organism-specific resources > like Ensembl, so on), or run any scripts or data retrieval in off > hours so you don't overtax the NCBI server. > > There is a way you can use BioPerl if you don't mind living on the > bleeding edge by using bioperl-live (core code from CVS). I have been > working on a set of modules for the last year (Bio::DB::EUtilities) > which interact with all the various eutils for building data pipelines > which uses the NCBI CGI interface. You could possibly retrieve all > relevant ESTs using a variation of the example script here: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch > > Note that the code examples do NOT work with rel. 1.5.2 code as the > API has changed quite a bit; I'm working to rectify some of that. > > The script I would use is below. It retrieves batches of 500 > sequences (in fasta format) at a time, for a total of 10000 max seq > records, saving the raw record data directly to a file (appending as > you go along). I added an eval block to check the server status and > redo the call up to 4 times before giving up completely. Using eval > this way hasn't been extensively tested but should work. > > --------------------------------------- > > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'nucest', > -term => 'txid3702', > -usehistory => 'y', > -keep_histories => 1); > > my $count = $factory->get_count; > > print "Count: $count\n"; > > if (my $hist = $factory->next_History) { > print "History returned\n"; > # note db carries over from above > $factory->set_parameters(-eutil => 'efetch', > -rettype => 'fasta', > -history => $hist); > my ($retmax, $retstart) = (500,0); > my $retry = 1; > my $maxcount = $count < 10000 ? 
$count : 10000; # set max # seq > records to return > RETRIEVE_SEQS: > while ($retstart < $maxcount) { > print "Returning from ",$retstart+1," to > ",$retstart+$retmax,"\n"; > $factory->set_parameters(-retmax => $retmax, > -retstart => $retstart); > # check in case of server error > eval{ > $factory->get_Response(-file => ">>ESTs.fas"); > }; > if ($@) { > die "Server error: $@. Try again later" if $retry == 5; > print STDERR "Server error, redo #$retry\n"; > $retry++ && redo RETRIEVE_SEQS; > } > $retstart += $retmax; > } > } > > > --------------------------------------- > > > chris > > On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: > >> To download genomic sequences or ESTs for any organism (in various >> formats) you can use NCBI Taxonomy Browser: >> http://www.ncbi.nlm.nih.gov/Taxonomy/ >> >> you can use taxonomy id to access different organisms, Arabidopsis for >> example (3702): >> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >> >> >> or by direct web link: >> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >> >> >> assembled genomes can be accessed via ftp: >> ftp://ftp.ncbi.nih.gov/genomes/ >> >> To download large amount of selected sequences (ESTs for example) you >> can use batch Entrez: >> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >> (select EST for EST, it's critical) >> >> It seems, to solve the problem you describe, you don't need to use >> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >> these simple and frequent tasks. 
>> >> -Alex >> >> --Alexander Kozik >> Bioinformatics Specialist >> Genome and Biomedical Sciences Facility >> 451 East Health Sciences Drive >> University of California >> Davis, CA 95616-8816 >> Phone: (530) 754-9127 >> email#1: akozik at atgc.org >> email#2: akozik at gmail.com >> web: http://www.atgc.org/ >> >> >> >> Xing Hu wrote: >>> Hi friends, >>> >>> I wrote a script for getting genomic sequence file from GenBank. To >>> fulfill that target, I used DB::GenBank module to get the sequence via >>> get_Seq_by_acc, and it works well. But this time, facing enormous >>> amount >>> of ESTs, I have no idea how to download them swiftly and elegantly. >>> >>> PROBLEM DESCRIPTION: >>> goal: download all EST files of a specific species from GenBank, >>> say >>> Arabidopsis Thaliana or Oryza sativa(rice). >>> other: whether all of ESTs are in a single file or separatedly >>> placed does not matter. >>> >>> Can I use a bioperl script to achieve that? And How? I really >>> appreciate. >>> >>> Xing. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From davila at ioc.fiocruz.br Tue Jul 10 09:58:29 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue, 10 Jul 2007 10:58:29 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <469389C0.5060303@gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> Message-ID: <46939085.40906@ioc.fiocruz.br> Hi Xing, Unfortunately that did not work for me... 
there are 5133 T. brucei ESTs (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) and 13971 from T. cruzi (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) that I cannot download at once in GenBank format... even when I select "GenBank" format in the Display menu I can only see and get/download 500 ESTs each time... I also downloaded all ESTs from GenBank (a pity there are not subsets of them !) but merging all them generate a file bigger than 120GB to be processed... Just asked Diogo (my student) to give a try to the script sent by Chris Fields.. so finger crossed ;-) Cheers, Alberto Xing Hu wrote: > Thanks you guys. > > I had to confess that how stupid I was. The easiest way seems to be the > way using NCBI Taxonomy Browser which suggested by alex. As a matter of > fact, I knew that but I thought it was necessary to have all items > selected before pressing save to launch download. So I was desperate to > find a button that could achieve that without hundreds of thousands of > clicking by me. "What about select none of those items at all?" -- This > idea finally came to me after days of struggling and the problem was solved. > > Xing > > > > Chris Fields wrote: >> Caveat: if you have millions of ESTs please consider NOT using my >> eutil script below or NCBI Batch Entrez, which would repeatedly hit >> the NCBI server thousands of times. At least try looking for other >> ways to retrieve the data you want (ftp, organism-specific resources >> like Ensembl, so on), or run any scripts or data retrieval in off >> hours so you don't overtax the NCBI server. >> >> There is a way you can use BioPerl if you don't mind living on the >> bleeding edge by using bioperl-live (core code from CVS). 
I have been >> working on a set of modules for the last year (Bio::DB::EUtilities) >> which interact with all the various eutils for building data pipelines >> which uses the NCBI CGI interface. You could possibly retrieve all >> relevant ESTs using a variation of the example script here: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch >> >> Note that the code examples do NOT work with rel. 1.5.2 code as the >> API has changed quite a bit; I'm working to rectify some of that. >> >> The script I would use is below. It retrieves batches of 500 >> sequences (in fasta format) at a time, for a total of 10000 max seq >> records, saving the raw record data directly to a file (appending as >> you go along). I added an eval block to check the server status and >> redo the call up to 4 times before giving up completely. Using eval >> this way hasn't been extensively tested but should work. >> >> --------------------------------------- >> >> use Bio::DB::EUtilities; >> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'nucest', >> -term => 'txid3702', >> -usehistory => 'y', >> -keep_histories => 1); >> >> my $count = $factory->get_count; >> >> print "Count: $count\n"; >> >> if (my $hist = $factory->next_History) { >> print "History returned\n"; >> # note db carries over from above >> $factory->set_parameters(-eutil => 'efetch', >> -rettype => 'fasta', >> -history => $hist); >> my ($retmax, $retstart) = (500,0); >> my $retry = 1; >> my $maxcount = $count < 10000 ? $count : 10000; # set max # seq >> records to return >> RETRIEVE_SEQS: >> while ($retstart < $maxcount) { >> print "Returning from ",$retstart+1," to >> ",$retstart+$retmax,"\n"; >> $factory->set_parameters(-retmax => $retmax, >> -retstart => $retstart); >> # check in case of server error >> eval{ >> $factory->get_Response(-file => ">>ESTs.fas"); >> }; >> if ($@) { >> die "Server error: $@. 
Try again later" if $retry == 5; >> print STDERR "Server error, redo #$retry\n"; >> $retry++ && redo RETRIEVE_SEQS; >> } >> $retstart += $retmax; >> } >> } >> >> >> --------------------------------------- >> >> >> chris >> >> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: >> >>> To download genomic sequences or ESTs for any organism (in various >>> formats) you can use NCBI Taxonomy Browser: >>> http://www.ncbi.nlm.nih.gov/Taxonomy/ >>> >>> you can use taxonomy id to access different organisms, Arabidopsis for >>> example (3702): >>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >>> >>> >>> or by direct web link: >>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >>> >>> >>> assembled genomes can be accessed via ftp: >>> ftp://ftp.ncbi.nih.gov/genomes/ >>> >>> To download large amount of selected sequences (ESTs for example) you >>> can use batch Entrez: >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >>> (select EST for EST, it's critical) >>> >>> It seems, to solve the problem you describe, you don't need to use >>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >>> these simple and frequent tasks. >>> >>> -Alex >>> >>> --Alexander Kozik >>> Bioinformatics Specialist >>> Genome and Biomedical Sciences Facility >>> 451 East Health Sciences Drive >>> University of California >>> Davis, CA 95616-8816 >>> Phone: (530) 754-9127 >>> email#1: akozik at atgc.org >>> email#2: akozik at gmail.com >>> web: http://www.atgc.org/ >>> >>> >>> >>> Xing Hu wrote: >>>> Hi friends, >>>> >>>> I wrote a script for getting genomic sequence file from GenBank. To >>>> fulfill that target, I used DB::GenBank module to get the sequence via >>>> get_Seq_by_acc, and it works well. 
But this time, facing enormous >>>> amount >>>> of ESTs, I have no idea how to download them swiftly and elegantly. >>>> >>>> PROBLEM DESCRIPTION: >>>> goal: download all EST files of a specific species from GenBank, >>>> say >>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>> other: whether all of ESTs are in a single file or separatedly >>>> placed does not matter. >>>> >>>> Can I use a bioperl script to achieve that? And How? I really >>>> appreciate. >>>> >>>> Xing. >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> From cjfields at uiuc.edu Tue Jul 10 10:05:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 09:05:43 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <46939085.40906@ioc.fiocruz.br> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> Just make sure you're using the latest from CVS. Let me know if it doesn't work and I'll look into it. chris On Jul 10, 2007, at 8:58 AM, Alberto Davila wrote: > Hi Xing, > > Unfortunately that did not work for me... there are 5133 T. brucei > ESTs > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691 > [Organism:exp]&cmd=Search&db=nucest&QueryKey=8) > and 13971 from T. 
cruzi > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693 > [Organism:exp]&cmd=Search&db=nucest&QueryKey=11) > that I cannot download at once in GenBank format... even when I > select > "GenBank" format in the Display menu I can only see and get/ > download 500 > ESTs each time... > > I also downloaded all ESTs from GenBank (a pity there are not > subsets of > them !) but merging all them generate a file bigger than 120GB to be > processed... > > Just asked Diogo (my student) to give a try to the script sent by > Chris > Fields.. so finger crossed ;-) > > Cheers, Alberto > > > Xing Hu wrote: >> Thanks you guys. >> >> I had to confess that how stupid I was. The easiest way seems to >> be the >> way using NCBI Taxonomy Browser which suggested by alex. As a >> matter of >> fact, I knew that but I thought it was necessary to have all items >> selected before pressing save to launch download. So I was >> desperate to >> find a button that could achieve that without hundreds of >> thousands of >> clicking by me. "What about select none of those items at all?" -- >> This >> idea finally came to me after days of struggling and the problem >> was solved. >> >> Xing >> >> >> >> Chris Fields wrote: >>> Caveat: if you have millions of ESTs please consider NOT using my >>> eutil script below or NCBI Batch Entrez, which would repeatedly hit >>> the NCBI server thousands of times. At least try looking for other >>> ways to retrieve the data you want (ftp, organism-specific resources >>> like Ensembl, so on), or run any scripts or data retrieval in off >>> hours so you don't overtax the NCBI server. >>> >>> There is a way you can use BioPerl if you don't mind living on the >>> bleeding edge by using bioperl-live (core code from CVS). I have >>> been >>> working on a set of modules for the last year (Bio::DB::EUtilities) >>> which interact with all the various eutils for building data >>> pipelines >>> which uses the NCBI CGI interface. 
You could possibly retrieve all >>> relevant ESTs using a variation of the example script here: >>> >>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-. >>> 3Eefetch >>> >>> Note that the code examples do NOT work with rel. 1.5.2 code as the >>> API has changed quite a bit; I'm working to rectify some of that. >>> >>> The script I would use is below. It retrieves batches of 500 >>> sequences (in fasta format) at a time, for a total of 10000 max seq >>> records, saving the raw record data directly to a file (appending as >>> you go along). I added an eval block to check the server status and >>> redo the call up to 4 times before giving up completely. Using eval >>> this way hasn't been extensively tested but should work. >>> >>> --------------------------------------- >>> >>> use Bio::DB::EUtilities; >>> >>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', >>> -db => 'nucest', >>> -term => 'txid3702', >>> -usehistory => 'y', >>> -keep_histories => 1); >>> >>> my $count = $factory->get_count; >>> >>> print "Count: $count\n"; >>> >>> if (my $hist = $factory->next_History) { >>> print "History returned\n"; >>> # note db carries over from above >>> $factory->set_parameters(-eutil => 'efetch', >>> -rettype => 'fasta', >>> -history => $hist); >>> my ($retmax, $retstart) = (500,0); >>> my $retry = 1; >>> my $maxcount = $count < 10000 ? $count : 10000; # set max # seq >>> records to return >>> RETRIEVE_SEQS: >>> while ($retstart < $maxcount) { >>> print "Returning from ",$retstart+1," to >>> ",$retstart+$retmax,"\n"; >>> $factory->set_parameters(-retmax => $retmax, >>> -retstart => $retstart); >>> # check in case of server error >>> eval{ >>> $factory->get_Response(-file => ">>ESTs.fas"); >>> }; >>> if ($@) { >>> die "Server error: $@. 
Try again later" if $retry == 5; >>> print STDERR "Server error, redo #$retry\n"; >>> $retry++ && redo RETRIEVE_SEQS; >>> } >>> $retstart += $retmax; >>> } >>> } >>> >>> >>> --------------------------------------- >>> >>> >>> chris >>> >>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: >>> >>>> To download genomic sequences or ESTs for any organism (in various >>>> formats) you can use NCBI Taxonomy Browser: >>>> http://www.ncbi.nlm.nih.gov/Taxonomy/ >>>> >>>> you can use taxonomy id to access different organisms, >>>> Arabidopsis for >>>> example (3702): >>>> http://www.ncbi.nlm.nih.gov/sites/entrez? >>>> db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >>>> >>>> >>>> or by direct web link: >>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi? >>>> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >>>> >>>> >>>> assembled genomes can be accessed via ftp: >>>> ftp://ftp.ncbi.nih.gov/genomes/ >>>> >>>> To download large amount of selected sequences (ESTs for >>>> example) you >>>> can use batch Entrez: >>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >>>> (select EST for EST, it's critical) >>>> >>>> It seems, to solve the problem you describe, you don't need to use >>>> bioperl. NCBI GenBank Entrez provides all necessary tools to >>>> work on >>>> these simple and frequent tasks. >>>> >>>> -Alex >>>> >>>> --Alexander Kozik >>>> Bioinformatics Specialist >>>> Genome and Biomedical Sciences Facility >>>> 451 East Health Sciences Drive >>>> University of California >>>> Davis, CA 95616-8816 >>>> Phone: (530) 754-9127 >>>> email#1: akozik at atgc.org >>>> email#2: akozik at gmail.com >>>> web: http://www.atgc.org/ >>>> >>>> >>>> >>>> Xing Hu wrote: >>>>> Hi friends, >>>>> >>>>> I wrote a script for getting genomic sequence file from >>>>> GenBank. 
To >>>>> fulfill that target, I used DB::GenBank module to get the >>>>> sequence via >>>>> get_Seq_by_acc, and it works well. But this time, facing enormous >>>>> amount >>>>> of ESTs, I have no idea how to download them swiftly and >>>>> elegantly. >>>>> >>>>> PROBLEM DESCRIPTION: >>>>> goal: download all EST files of a specific species from >>>>> GenBank, >>>>> say >>>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>>> other: whether all of ESTs are in a single file or separatedly >>>>> placed does not matter. >>>>> >>>>> Can I use a bioperl script to achieve that? And How? I really >>>>> appreciate. >>>>> >>>>> Xing. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From diogoat at gmail.com Tue Jul 10 10:15:20 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Tue, 10 Jul 2007 11:15:20 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear All,

I used the script below, and it works very well!
I only changed the query, and the script gave me the 5133 ESTs from
T. brucei.

#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Entrez query: all EST-division records for Trypanosoma brucei
my $query = Bio::DB::Query::GenBank->new
    (-query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
     -db    => 'nucleotide');
my $gb = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# note: records are appended in GenBank format, despite the .fasta file name
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq){
    $out->write_seq($seq);
}
####################################################################

Diogo Tschoeke/Fiocruz (Alberto's student)

From cjfields at uiuc.edu Tue Jul 10 10:35:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:35:03 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu> That will work as well; the key difference between my example and this one is that the seq stream retrieved using Bio::DB::GenBank passes through Bio::SeqIO while Bio::DB::EUtilities saves the raw seq record directly to a file (or callback or HTTP::Response) for optionally parsing later. If you have problems with Bio::SeqIO you can always use Bio::DB::EUtilities to get around the issue until we resolve it. chris On Jul 10, 2007, at 9:15 AM, Diogo Tschoeke wrote: > Deal All, > I use this script bellow, and it`s work very fine! > I only changed the query! And the script gave me the 5133 EST from T. > brucei. > > ###################################################################### > ########### > use strict; > use warnings; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > my $query = Bio::DB::Query::GenBank->new > (-query =>'gbdiv est[prop] AND > Trypanosoma > brucei [organism]', > db => 'nucleotide'); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > > my $out = Bio::SeqIO->new(-format => 'Genbank', > -file => '>>Tbrucei.EST.fasta'); > while (my $seq = $seqio->next_seq){ > $out->write_seq($seq); > } > #################################################################### > > Diogo Tschoeke/Fiocruz (Alberto`s Student) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hartzell at alerce.com Tue Jul 10 12:50:31 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 12:50:31 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <18067.47319.254632.538811@almost.alerce.com> Jason Stajich writes: > [...] > Do you know how to have svn commit messages generate summary emails > as well? I've made a local installation of the SVN::Notify bits in my home directory and set up its notification script. If folks are happy with it then I'll work on getting The Powers That Be to do a real install and we'll use it for the real repository. It's currently configured to include diffs inline in the message. I prefer them as an attachment, but the current configuration of the bioperl-guts-l list stalls messages w/ attachments and requires admin intervention. I have a support@ request going on it and will change it if/when we get the issue resolved. So, to review: svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ is the top of the repository and svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk will get you the main branch of bioperl-live. Remember that the repository is transient, don't put anything important in there.... Have at it, but remember that the entire world will see your commit messages. g. From xing.y.hu at gmail.com Tue Jul 10 13:08:35 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Wed, 11 Jul 2007 01:08:35 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <46939085.40906@ioc.fiocruz.br> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> Message-ID: <4693BD13.2070509@gmail.com> Hi Alberto, Yes, I know that there is only choice for showing no more than 500 entries on the NCBI website. However, I completely ignored that (doesn't mean that I have not seen that), and pulled down the "send to" and chose "file". Then a small window popped up, after saying yes to that, the downloading started. You might ask me how I know that it was not a batch of only 5 (default selection) or 500 ESTs? To be honest, I don't know at the first time. But the download has accumulated to millions bytes since then(due to my bad network condition, I have no idea when it will reach the end), and that doesn't look like a little batch of ESTs less than one thousand. Actually, I wrote a script to count the sequences within the temporary file and got a number much bigger than ten thousand. So I guess it works. BTW, I never thought Bio::DB::Genbank can do that! Again, thanks you guys! Xing Alberto Davila wrote: > Hi Xing, > > Unfortunately that did not work for me... there are 5133 T. brucei ESTs > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) > and 13971 from T. cruzi > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) > that I cannot download at once in GenBank format... even when I select > "GenBank" format in the Display menu I can only see and get/download 500 > ESTs each time... > > I also downloaded all ESTs from GenBank (a pity there are not subsets of > them !) but merging all them generate a file bigger than 120GB to be > processed... > > Just asked Diogo (my student) to give a try to the script sent by Chris > Fields.. 
so finger crossed ;-) > > Cheers, Alberto > > > Xing Hu wrote: > >> Thanks you guys. >> >> I had to confess that how stupid I was. The easiest way seems to be the >> way using NCBI Taxonomy Browser which suggested by alex. As a matter of >> fact, I knew that but I thought it was necessary to have all items >> selected before pressing save to launch download. So I was desperate to >> find a button that could achieve that without hundreds of thousands of >> clicking by me. "What about select none of those items at all?" -- This >> idea finally came to me after days of struggling and the problem was solved. >> >> Xing >> >> >> >> Chris Fields wrote: >> >>> Caveat: if you have millions of ESTs please consider NOT using my >>> eutil script below or NCBI Batch Entrez, which would repeatedly hit >>> the NCBI server thousands of times. At least try looking for other >>> ways to retrieve the data you want (ftp, organism-specific resources >>> like Ensembl, so on), or run any scripts or data retrieval in off >>> hours so you don't overtax the NCBI server. >>> >>> There is a way you can use BioPerl if you don't mind living on the >>> bleeding edge by using bioperl-live (core code from CVS). I have been >>> working on a set of modules for the last year (Bio::DB::EUtilities) >>> which interact with all the various eutils for building data pipelines >>> which uses the NCBI CGI interface. You could possibly retrieve all >>> relevant ESTs using a variation of the example script here: >>> >>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch >>> >>> Note that the code examples do NOT work with rel. 1.5.2 code as the >>> API has changed quite a bit; I'm working to rectify some of that. >>> >>> The script I would use is below. It retrieves batches of 500 >>> sequences (in fasta format) at a time, for a total of 10000 max seq >>> records, saving the raw record data directly to a file (appending as >>> you go along). 
I added an eval block to check the server status and >>> redo the call up to 4 times before giving up completely. Using eval >>> this way hasn't been extensively tested but should work. >>> >>> --------------------------------------- >>> >>> use Bio::DB::EUtilities; >>> >>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', >>> -db => 'nucest', >>> -term => 'txid3702', >>> -usehistory => 'y', >>> -keep_histories => 1); >>> >>> my $count = $factory->get_count; >>> >>> print "Count: $count\n"; >>> >>> if (my $hist = $factory->next_History) { >>> print "History returned\n"; >>> # note db carries over from above >>> $factory->set_parameters(-eutil => 'efetch', >>> -rettype => 'fasta', >>> -history => $hist); >>> my ($retmax, $retstart) = (500,0); >>> my $retry = 1; >>> my $maxcount = $count < 10000 ? $count : 10000; # set max # seq >>> records to return >>> RETRIEVE_SEQS: >>> while ($retstart < $maxcount) { >>> print "Returning from ",$retstart+1," to >>> ",$retstart+$retmax,"\n"; >>> $factory->set_parameters(-retmax => $retmax, >>> -retstart => $retstart); >>> # check in case of server error >>> eval{ >>> $factory->get_Response(-file => ">>ESTs.fas"); >>> }; >>> if ($@) { >>> die "Server error: $@. 
Try again later" if $retry == 5; >>> print STDERR "Server error, redo #$retry\n"; >>> $retry++ && redo RETRIEVE_SEQS; >>> } >>> $retstart += $retmax; >>> } >>> } >>> >>> >>> --------------------------------------- >>> >>> >>> chris >>> >>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: >>> >>> >>>> To download genomic sequences or ESTs for any organism (in various >>>> formats) you can use NCBI Taxonomy Browser: >>>> http://www.ncbi.nlm.nih.gov/Taxonomy/ >>>> >>>> you can use taxonomy id to access different organisms, Arabidopsis for >>>> example (3702): >>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >>>> >>>> >>>> or by direct web link: >>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >>>> >>>> >>>> assembled genomes can be accessed via ftp: >>>> ftp://ftp.ncbi.nih.gov/genomes/ >>>> >>>> To download large amount of selected sequences (ESTs for example) you >>>> can use batch Entrez: >>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >>>> (select EST for EST, it's critical) >>>> >>>> It seems, to solve the problem you describe, you don't need to use >>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >>>> these simple and frequent tasks. >>>> >>>> -Alex >>>> >>>> --Alexander Kozik >>>> Bioinformatics Specialist >>>> Genome and Biomedical Sciences Facility >>>> 451 East Health Sciences Drive >>>> University of California >>>> Davis, CA 95616-8816 >>>> Phone: (530) 754-9127 >>>> email#1: akozik at atgc.org >>>> email#2: akozik at gmail.com >>>> web: http://www.atgc.org/ >>>> >>>> >>>> >>>> Xing Hu wrote: >>>> >>>>> Hi friends, >>>>> >>>>> I wrote a script for getting genomic sequence file from GenBank. To >>>>> fulfill that target, I used DB::GenBank module to get the sequence via >>>>> get_Seq_by_acc, and it works well. 
But this time, facing enormous >>>>> amount >>>>> of ESTs, I have no idea how to download them swiftly and elegantly. >>>>> >>>>> PROBLEM DESCRIPTION: >>>>> goal: download all EST files of a specific species from GenBank, >>>>> say >>>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>>> other: whether all of ESTs are in a single file or separatedly >>>>> placed does not matter. >>>>> >>>>> Can I use a bioperl script to achieve that? And How? I really >>>>> appreciate. >>>>> >>>>> Xing. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Tue Jul 10 13:14:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 10 Jul 2007 18:14:29 +0100 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> Message-ID: <4693BE75.4090005@sendu.me.uk> George Hartzell wrote: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. 
> > It's currently configured to include diffs inline in the message. I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. Can I put a vote in that you don't? I search through email body text in my archive of guts to find certain diffs, so really like the diffs inline. Also, is there any way to get rid of the 'bioperl' in [bioperl revision] in the subject? Seems redundant and makes it harder to see what was changed in a small email client window. From aaron.j.mackey at gsk.com Tue Jul 10 13:20:15 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 10 Jul 2007 13:20:15 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> Message-ID: George, this is all very nice to finally have, thank you for your efforts! Any chance that the diff-as-attachment vs. diffs-inline question can be different for each subscriber? The utility of the "guts" mailing list (to me) is that it's an encyclopedia of browsable, skimmable, and searchable diffs, not just a date-stamped record of diffs (if so, why provide an attachment at all, just provide a URL to the diff in the respository). Thanks again, -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. > > It's currently configured to include diffs inline in the message. 
I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. > > So, to review: > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ > > is the top of the repository and > > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk > > will get you the main branch of bioperl-live. > > Remember that the repository is transient, don't put anything > important in there.... > > Have at it, but remember that the entire world will see your commit > messages. > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Jul 10 14:18:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 13:18:07 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote: > George Hartzell wrote: >> Jason Stajich writes: >>> [...] >>> Do you know how to have svn commit messages generate summary emails >>> as well? >> >> I've made a local installation of the SVN::Notify bits in my home >> directory and set up its notification script. If folks are happy >> with >> it then I'll work on getting The Powers That Be to do a real install >> and we'll use it for the real repository. >> >> It's currently configured to include diffs inline in the message. I >> prefer them as an attachment, but the current configuration of the >> bioperl-guts-l list stalls messages w/ attachments and requires admin >> intervention.
I have a support@ request going on it and will change >> it if/when we get the issue resolved. > > Can I put a vote in that you don't? I search through email body > text in > my archive of guts to find certain diffs, so really like the diffs > inline. > > Also, is there any way to get rid of the 'bioperl' in [bioperl > revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Agree on both counts; the devs have gotten used to seeing the diffs inline. We prob. need to schedule a specific day/time when the switchover would take place so we can announce (so everyone knows and no one can gripe). Did we ever resolve the svn->cvs issue? Jason pointed out some tools a while ago... chris From hartzell at alerce.com Tue Jul 10 16:09:09 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:09:09 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59237.519166.454578@almost.alerce.com> Sendu Bala writes: > George Hartzell wrote: > > Jason Stajich writes: > > > [...] > > > Do you know how to have svn commit messages generate summary emails > > > as well? > > > > I've made a local installation of the SVN::Notify bits in my home > > directory and set up its notification script. If folks are happy with > > it then I'll work on getting The Powers That Be to do a real install > > and we'll use it for the real repository. > > > > It's currently configured to include diffs inline in the message. I > > prefer them as an attachment, but the current configuration of the > > bioperl-guts-l list stalls messages w/ attachments and requires admin > > intervention. I have a support@ request going on it and will change > > it if/when we get the issue resolved. 
> > Can I put a vote in that you don't? I search through email body text in > my archive of guts to find certain diffs, so really like the diffs inline. Ok, three votes against attachments. Anyone want to vote in support, otherwise I'll just leave 'em inline. > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Sure. The default's just [RevisionNumber]. Does that work for folk? g. From hartzell at alerce.com Tue Jul 10 16:11:36 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:11:36 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59384.247108.463648@almost.alerce.com> Chris Fields writes: > [...] > We prob. need to schedule a specific day/time when the switchover > would take place so we can announce (so everyone knows and no one can > gripe). Did we ever resolve the svn->cvs issue? Jason pointed out > some tools a while ago... I haven't done anything about it. I think that we also need to have some input from the admin/support folk about access methods (https, etc...). Are we going to want to mirror the repository anywhere? g. From hartzell at alerce.com Wed Jul 11 09:17:08 2007 From: hartzell at alerce.com (George Hartzell) Date: Wed, 11 Jul 2007 09:17:08 -0400 Subject: [Bioperl-l] extra hook functionality for svn repos? Message-ID: <18068.55380.626778.486775@almost.alerce.com> There are a bunch of "contributed" hook scripts at http://subversion.tigris.org/tools_contrib.html#hook_scripts Given that many bioperl users depend on case-preserving but case-insensitive file systems, I'm wondering if hooking up the case-insensitive.py script might be worthwhile. 
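For anyone wondering what such a hook guards against: on a case-insensitive file system, two repository paths that differ only in case cannot both be checked out. The following is a minimal core-Perl sketch of that collision check. It is not the case-insensitive.py hook itself, and the sub name case_collisions and the file names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group paths by their lowercased form and return only the groups
# that contain more than one distinct spelling.
sub case_collisions {
    my @paths = @_;
    my %spellings;
    $spellings{ lc $_ }{$_} = 1 for @paths;
    my @groups;
    for my $folded ( sort keys %spellings ) {
        my @variants = sort keys %{ $spellings{$folded} };
        push @groups, \@variants if @variants > 1;
    }
    return @groups;
}

# Hypothetical file names, for illustration only.
my @paths = qw(
    Bio/SeqIO/fasta.pm
    Bio/SeqIO/FASTA.pm
    Bio/AlignIO/clustalw.pm
    t/data/test.fasta
);

for my $group ( case_collisions(@paths) ) {
    print "collision: @$group\n";
}
```

A real pre-commit hook would have to compare the paths being added against what is already in the repository; this sketch only shows the comparison step.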
Likewise, the check-mime-type.pl script might help us keep svn:mime-type and svn:eol-style properties up to date. There are others there, but none that I found interesting. How big-brother do we want the repository to be? g. From cjfields at uiuc.edu Wed Jul 11 09:40:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Jul 2007 08:40:54 -0500 Subject: [Bioperl-l] extra hook functionality for svn repos? In-Reply-To: <18068.55380.626778.486775@almost.alerce.com> References: <18068.55380.626778.486775@almost.alerce.com> Message-ID: On Jul 11, 2007, at 8:17 AM, George Hartzell wrote: > > There are a bunch of "contributed" hook scripts at > > http://subversion.tigris.org/tools_contrib.html#hook_scripts > > Given that many bioperl users depend on case-preserving but > case-insensitive file systems, I'm wondering if hooking up the > case-insensitive.py script might be worthwhile. I'm not sure how often we run into this, though. Anyone know? > Likewise, the check-mime-type.pl script might help us keep > svn:mime-type and svn:eol-style properties up to date. The latter two might be nice. I thought we planned on defaulting to a simple 'plain text' mime type on commits if it isn't specifically predefined, but maybe this way is better? > There are others there, but none that I found interesting. > > How big-brother do we want the repository to be? > > g. 'Friendly' big-brother, not 'dystopian' big-brother. chris From marian.thieme at lycos.de Wed Jul 11 05:05:18 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 09:05:18 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178019848@lycos-europe.com> An HTML attachment was scrubbed... 
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/eec1aa42/attachment.html From dmessina at wustl.edu Wed Jul 11 16:14:17 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 11 Jul 2007 15:14:17 -0500 Subject: [Bioperl-l] submitting code In-Reply-To: <188661178019848@lycos-europe.com> References: <188661178019848@lycos-europe.com> Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu> Hi Marian, Thanks so much for contributing! The best way would be to create a Bugzilla ticket and then attach the code to that ticket. One of the developers will check it in and give you feedback if there are any little tweaks that would be helpful*. Would you be able to include documentation and test cases with your module? Dave * For more info: http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F http://www.bioperl.org/wiki/Developer_Information http://www.bioperl.org/wiki/Becoming_a_developer http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html -- Dave Messina Senior Analyst, Assembly Group Genome Sequencing Center Washington University St. Louis, MO From marian.thieme at lycos.de Wed Jul 11 11:12:20 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 15:12:20 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178030343@lycos-europe.com> An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/c95991b8/attachment.html From e-just at northwestern.edu Thu Jul 12 10:37:03 2007 From: e-just at northwestern.edu (Eric Just) Date: Thu, 12 Jul 2007 09:37:03 -0500 Subject: [Bioperl-l] Job opening in Chicago Message-ID: Hello everyone, We have an opening at dictyBase (Northwestern University in Chicago) for a Bioinformatics Software Engineer. This job involves writing and maintaining software for a genome database using Chado/OO-Perl/Bioperl and many other state of the art technologies.
For more information please see: http://dictybase.org/dictybase_jobs.htm Thanks, Eric From cjfields at uiuc.edu Thu Jul 12 12:09:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Jul 2007 11:09:02 -0500 Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question Message-ID: I have been running into some GFF formatting issues where the attributes column is left undef (no '.'), which causes GFF3Loader::parse_attributes() to complain with a 'use of undefined string with split' warning. Would it be okay with the powers that be (Scott, Lincoln) to add a warning or exception there? I'm guessing a warning is better in this case, as just returning works fine. chris From jason at bioperl.org Fri Jul 13 13:30:05 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 13:30:05 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.59384.247108.463648@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> I'll try to look into this and other stuff with the migration in the next week or so. Maybe we'll make some time to talk it through during BOSC. I don't know yet when I'll actually have time to think about it properly. I am still worried about doing https because of the current system we have supporting user logins, because we didn't want to run a web server on the main repository machine, and because we'll have to install DAV on that machine. If svn+ssh is going to be a sufficient hurdle for people, note it was already a hurdle for them with CVS, but we'll have to think a bit more on it. We might be able to export the repository over NFS (or some other exported FS) to the webserver machine, but that may be a recipe for disaster.
-jason On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > Chris Fields writes: >> [...] >> We prob. need to schedule a specific day/time when the switchover >> would take place so we can announce (so everyone knows and no one can >> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >> some tools a while ago... > > I haven't done anything about it. > > I think that we also need to have some input from the admin/support > folk about access methods (https, etc...). > > Are we going to want to mirror the repository anywhere? > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Fri Jul 13 14:29:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 13:29:22 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu> I don't think there's a huge rush on this since BOSC is imminent. If devs really want https then we can try adding it after migration, but if it becomes too much of a headache (particularly for the web admins) I wouldn't worry about it. chris On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. 
> > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > > We might be able to do some sort of NFS (or other exported FS) but > exported to the webserver machine but that is may be a recipe for > disaster. > > -jason > On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > >> Chris Fields writes: >>> [...] >>> We prob. need to schedule a specific day/time when the switchover >>> would take place so we can announce (so everyone knows and no one >>> can >>> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >>> some tools a while ago... >> >> I haven't done anything about it. >> >> I think that we also need to have some input from the admin/support >> folk about access methods (https, etc...). >> >> Are we going to want to mirror the repository anywhere? >> >> g. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sheris at eps.berkeley.edu Fri Jul 13 14:42:32 2007 From: sheris at eps.berkeley.edu (Sheri Simmons) Date: Fri, 13 Jul 2007 11:42:32 -0700 Subject: [Bioperl-l] Problem with Bio::PopGen::Individual Message-ID: <200707131142.32366.sheris@eps.berkeley.edu> Hi, I have a collection of sequencing reads aligned with a consensus sequence that I input into a Bio::PopGen::Population object in order to calculate allele frequencies. The consensus sequence is included to force clustalw to give a better alignment. However, I need to remove the consensus sequence before calculating allele frequencies in the individual reads. I'm having trouble with this part of it. I get the following error message: "Can't locate object method "person_id" via package "Bio::PopGen::Individual" at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49." Here is the code snippet producing the error. $pop is a Bio::PopGen::Population object. my @consensus = "gene_consensus"; $pop->remove_Individuals(@consensus); I also tried: my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); $pop->remove_Individuals(@consensus); which produced the same error. Can anyone send me in the right direction? I suspect this is a simple problem. Sheri -- Sheri Simmons Department of Earth and Planetary Sciences University of California, Berkeley Berkeley, CA 94720-4767 From jason at bioperl.org Fri Jul 13 16:17:31 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 16:17:31 -0400 Subject: [Bioperl-l] Problem with Bio::PopGen::Individual In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu> References: <200707131142.32366.sheris@eps.berkeley.edu> Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org> Hi Sheri - Shoot - that was my fault - bug in the code where I was only using "Person" not Individuals for the code when I was testing. 
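The failure is a method-name mismatch: the removal code asked each individual for person_id, and the error message above shows that Bio::PopGen::Individual has no such method, while unique_id is the identifier Sheri's own get_Individuals call uses. Here is a minimal sketch of the removal pattern with the corrected accessor. Sketch::Individual and remove_by_unique_id are hypothetical stand-ins for illustration, not the BioPerl classes or their actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an individual; only unique_id matters here.
package Sketch::Individual;
sub new       { my ( $class, %args ) = @_; return bless {%args}, $class }
sub unique_id { return $_[0]->{unique_id} }

package main;

# Remove every individual whose unique_id is listed in @names.
# Indices are collected highest-first (unshift), so each splice leaves
# the positions that are still to be removed untouched.
sub remove_by_unique_id {
    my ( $individuals, @names ) = @_;
    my %namehash = map { $_ => 1 } @names;
    my @tosplice;
    my $i = 0;
    for my $ind (@$individuals) {
        unshift @tosplice, $i if $namehash{ $ind->unique_id };
        $i++;
    }
    splice @$individuals, $_, 1 for @tosplice;
    return scalar @tosplice;    # number of individuals removed
}

my @inds;
push @inds, Sketch::Individual->new( unique_id => $_ )
    for qw(read_1 gene_consensus read_2);

my $removed = remove_by_unique_id( \@inds, 'gene_consensus' );
print "removed $removed, kept: ",
    join( ', ', map { $_->unique_id } @inds ), "\n";
```

Splicing from the highest index down is what keeps the remaining indices valid when more than one individual matches.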
I've committed a bugfix to CVS. Do you need me to send you the updated file, or are you comfortable grabbing the code from CVS or http://code.open-bio.org? This is the change. You may have a different version of BioPerl than what is in CVS, so you may have to make the change on line 260 rather than 282, or you can upgrade to the latest code via CVS (although this is probably harder for you since you've got stuff installed in /usr/share):
RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
< unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
> unshift @tosplice, $i if( $namehash{$ind->unique_id} );
-jason On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote: > Hi, > I have a collection of sequencing reads aligned with a consensus > sequence that > I input into a Bio::PopGen::Population object in order to calculate > allele > frequencies. The consensus sequence is included to force clustalw > to give a > better alignment. However, I need to remove the consensus sequence > before > calculating allele frequencies in the individual reads. I'm having > trouble > with this part of it. I get the following error message: > > "Can't locate object method "person_id" via package > "Bio::PopGen::Individual" > at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line > 49." > > Here is the code snippet producing the error. $pop is a > Bio::PopGen::Population object. > > my @consensus = "gene_consensus"; > $pop->remove_Individuals(@consensus); > > I also tried: > my @consensus = $pop->get_Individuals(-unique_id => > "gene_consensus"); > $pop->remove_Individuals(@consensus); > > which produced the same error. Can anyone send me in the right > direction? I > suspect this is a simple problem.
> > Sheri > > -- > Sheri Simmons > Department of Earth and Planetary Sciences > University of California, Berkeley > Berkeley, CA 94720-4767 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From hartzell at alerce.com Fri Jul 13 16:34:14 2007 From: hartzell at alerce.com (George Hartzell) Date: Fri, 13 Jul 2007 16:34:14 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <18071.57798.130368.703488@almost.alerce.com> Jason Stajich writes: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. > > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > [...] How are you thinking about providing anonymous readonly non-dev access to the repository? svn+ssh using an anonymous/guest account (can it be screwed down tightly enough?) svn-mirror the repo onto the public machine and do DAV there w/out having to worry about authenticating the devs? g. 
From jason at bioperl.org Fri Jul 13 17:33:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 17:33:29 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18071.57798.130368.703488@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> <18071.57798.130368.703488@almost.alerce.com> Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org> On Jul 13, 2007, at 4:34 PM, George Hartzell wrote: > Jason Stajich writes: >> I'll try and look into this and other stuff with the migration in >> next week or so - maybe we'll make some time to talk it through >> during BOSC. I don't know yet when I'll actually have time to think >> about it properly. >> >> I am still worried about doing https because of the current system we >> have supporting user logins and that we didn't want to run a web >> server on the main repository machine and we'll have to install DAV >> on the main repository machine. if ssh+svn is going to be sufficient >> hurdle for people, note it was already a hurdle for them with CVS, >> but we'll have to think a bit more on it. >> [...] > > How are you thinking about providing anonymous readonly non-dev access > to the repository? svn+ssh using an anonymous/guest account (can it > be screwed down tightly enough?) svn-mirror the repo onto the public > machine and do DAV there w/out having to worry about authenticating > the devs? > We'll do svn on the public anonymous machine like we already do with CVS and with SVN See: http://code.open-bio.org AND http://code.open-bio.org/svnweb/ See blipkit. -jason > g. 
>
>
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From scrosson at uchicago.edu Fri Jul 13 18:15:30 2007
From: scrosson at uchicago.edu (Sean Crosson)
Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC)
Subject: [Bioperl-l] ace to fasta conversion
Message-ID:

I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
and it works great. We're now trying to convert a big (250 MB) .ace file to
fasta. The documentation suggests I can do this, but everytime I run the script
below, it outputs an empty .fas file. Does anyone have any suggestions on how
to make this script work? Does SeqIO really convert between these file types?
Thanks for your help.

#!/usr/bin/perl -w

use Bio::SeqIO;


$in = Bio::SeqIO->new(-file => "454Contigs.ace",
                      -format => 'ace');
$out = Bio::SeqIO->new(-file => ">454Contigs.fas",
                       -format => 'fasta');
while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }

From cvillamar at gmail.com Fri Jul 13 19:24:04 2007
From: cvillamar at gmail.com (Carlos Villacorta)
Date: Fri, 13 Jul 2007 16:24:04 -0700
Subject: [Bioperl-l] beginner problem with fasta headers
Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>

hi all,
I have a embl sequence file, when formatting to fasta with Seqio it
gives a long string header for each sequence that my following
phylogenetic software cannot handle...
Does anyone knows how to format those embl or genbank files to fasta
but retrieving in the headers just two or three fields (e.g. id | gene
| sp_name)?
Any advice with this problem would be very appreciated, thanks!

From j_martin at lbl.gov Fri Jul 13 20:05:45 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 13 Jul 2007 17:05:45 -0700
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To:
References:
Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org>

Hello,
the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a
phrap/consed ace file. They aren't related at all.
You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote: > I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta > and it works great. We're now trying to convert a big (250 MB) .ace file to > fasta. The documentation suggests I can do this, but everytime I run the script > below, it outputs an empty .fas file. Does anyone have any suggestions on how > to make this script work? Does SeqIO really convert between these file types? > Thanks for your help. > > #!/usr/bin/perl -w > > use Bio::SeqIO; > > > $in = Bio::SeqIO->new(-file => "454Contigs.ace", > -format => 'ace'); > $out = Bio::SeqIO->new(-file => ">454Contigs.fas", > -format => 'fasta'); > while ( $seq = $in->next_seq() ) {$out->write_seq($seq); } > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Sat Jul 14 00:06:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 23:06:27 -0500 Subject: [Bioperl-l] beginner problem with fasta headers In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu> Some reading material... http://www.bioperl.org/wiki/ FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files http://www.bioperl.org/wiki/ FAQ#I_would_like_to_make_my_own_custom_fasta_header_- _how_do_I_do_this.3F http://www.bioperl.org/wiki/FASTA_sequence_format#Note Quiz on Monday! chris On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote: > hi all, > I have a embl sequence file, when formatting to fasta with Seqio it > gives a long string header for each sequence that my following > phylogenetic software cannot handle... 
> Does anyone knows how to format those embl or genbank files to fasta > but retrieving in the headers just two or three fields (e.g. id | gene > | sp_name)? > Any advice with this problem would be very appreciated, thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scrosson at uchicago.edu Fri Jul 13 23:43:59 2007 From: scrosson at uchicago.edu (scrosson) Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT) Subject: [Bioperl-l] ace to fasta conversion In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org> References: <20070714000544.GB29841@eniac.jgi-psf.org> Message-ID: <11590811.post@talk.nabble.com> This problem now makes sense. I've been playing with Bio::Assembly::IO, which does indeed read phrap .ace files. Does anyone have an idea how to pull the assembled contigs out of a Bio::Assembly object and write them out as multi-fasta (or strings for that matter)? None of our workstations are running phrap/consed and I'd love to see these contigs. Sean Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all. You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel -- View this message in context: http://www.nabble.com/ace-to-fasta-conversion-tf4077370.html#a11590811 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bioperlanand at yahoo.com Sat Jul 14 13:55:53 2007 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT) Subject: [Bioperl-l] a question on obtain PDB records using bioperl Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com> Hi everybody, Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to Bio:Perl methods to retrieve EMBL or GenBank records. Thanks in advance, Anand --------------------------------- Moody friends. 
Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games. From johnsonm at gmail.com Tue Jul 17 14:23:58 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 17 Jul 2007 13:23:58 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? Message-ID: I'm tinkering with parsing iprscan reports with BioPerl. I noticed that this: my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro'); while (my $seq = $seqio->next_seq()) { ... } Does not work unless I first 'use XML::DOM::XPath'. I get this error: Can't locate object method "findnodes" via package "XML::DOM::Document" at bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line 30. I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to suck in XML::DOM::Xpath. I see that t/interpro.t requires XML::DOM::XPath: test_begin(-tests => 17, -requires_module => 'XML::DOM::XPath'); Is suppose the reason the test specs a require XML::DOM::XPath is so that tests can be skipped if XML::DOM::XPath is not available. Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? From sac at bioperl.org Tue Jul 17 15:49:32 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 17 Jul 2007 12:49:32 -0700 Subject: [Bioperl-l] Ohloh account for bioperl Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> I came across a web app that tracks various metrics for open source projects, noticed that bioperl wasn't listed, and added it: http://www.ohloh.net/projects/6685 Seems like an interesting resource that could help add some visibility. It creates metrics by directly processing the source code repository. I hooked it up to the CVS repos for bioperl-live, -db, -run, and -pipeline. It has yet to do its analysis at this point. Feel free to create Ohloh accounts for yourselves. 
When you add yourself as a contributor to Bioperl, you can indicate the username associated with your commits, but this requires that it first process the commit logs to figure out what the usernames are. You can still create an account, just update it later with your username. Steve From cjfields at uiuc.edu Tue Jul 17 17:04:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 17 Jul 2007 16:04:44 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? In-Reply-To: References: Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > I'm tinkering with parsing iprscan reports with BioPerl. I noticed > that this: > > my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > 'interpro'); > > while (my $seq = $seqio->next_seq()) { > ... > } > > Does not work unless I first 'use XML::DOM::XPath'. I get this error: > > Can't locate object method "findnodes" via package > "XML::DOM::Document" at > bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > 30. > > I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > suck in XML::DOM::Xpath. I see that t/interpro.t requires > XML::DOM::XPath: > > test_begin(-tests => 17, > -requires_module => 'XML::DOM::XPath'); > > Is suppose the reason the test specs a require XML::DOM::XPath is so > that tests can be skipped if XML::DOM::XPath is not available. > Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? You're right; I think tests passed b/c XML::DOM::XPath (if present), was eval'd as a required module. When I commented out the spot where it is eval'd in the test suite I can replicate this error. I have added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it passes fine. Thanks for the heads up! 
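
For reference, a minimal sketch of the pattern discussed here, assuming only core Perl. A test harness can probe for an optional module with an eval'd require, and a successful probe also loads the module into the running process, which is how a missing 'use' in the module under test can go unnoticed. The helper name have_module is hypothetical and is not a BioPerl function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if the named module
# can be loaded, 0 otherwise. A successful check also loads the module.
sub have_module {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# File::Spec ships with Perl; XML::DOM::XPath may or may not be installed.
for my $module ('File::Spec', 'XML::DOM::XPath') {
    printf "%-16s %s\n", $module,
        have_module($module) ? 'available' : 'missing';
}
```

A module that calls findnodes itself would still declare 'use XML::DOM::XPath;' at the top, as in the fix described above, rather than rely on a caller or test having loaded it first.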
chris From xianranli78 at yahoo.com.cn Wed Jul 18 01:55:19 2007 From: xianranli78 at yahoo.com.cn (Xianran Li) Date: Wed, 18 Jul 2007 13:55:19 +0800 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> Hi, I want to extract some infomation from the gff3 file like: 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? Thanks for your help. Xianran Li From georg.otto at tuebingen.mpg.de Wed Jul 18 05:32:26 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Wed, 18 Jul 2007 11:32:26 +0200 Subject: [Bioperl-l] run megablast Message-ID: Hi, is there a module to run megablast in a script (equivalent to ncbi blast in StandAloneBlast.pm)? Cheers, Georg From jeevitesh at ibab.ac.in Wed Jul 18 06:03:24 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D we need to find the shared path between AB and AC is 2. and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. 
With Thanks & regards jeevitesh From cain.cshl at gmail.com Wed Jul 18 09:10:40 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 09:10:40 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> Message-ID: <1184764240.2570.31.camel@localhost.localdomain> Hi Xianran Li, Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing as Bio::DB::GFF3), then you can use the attributes method to get anything in the ninth column: my ($name) = $gene->attributes('Name'); The parenthesis are needed around $name because the attributes method returns a list and the parens capture the first item of the list into $name. Scott On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > Hi, > > I want to extract some infomation from the gff3 file like: > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? > > Thanks for your help. > > > Xianran Li > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070718/c66ec18b/attachment.bin From johnsonm at gmail.com Wed Jul 18 16:53:00 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 18 Jul 2007 15:53:00 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? In-Reply-To: <469DB6C6.9010702@pasteur.fr> References: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> <469DB6C6.9010702@pasteur.fr> Message-ID: The output from InterProScan, invoked thusly: iprscan -cli -seqtype p -i input_file -o output_file -format xml On 7/18/07, Emmanuel Quevillon wrote: > Hi guys, > > I read your email and I wondered which iprscan file you've > been talking about? Is it the file produced by InterProScan > or the file called match.xml representing the whole uniprot > database against InterPro? Reading the xml parser > implemented into Bio::SeqIO::interpro, I guess it is the > second one? > In such case, I just want to let you know that the xml > schema changed and the file name also. It is now called > match_complete.xml. > I attached the DTD to be able to see the new structure. > Here is an example of the new data representation. > > > crc64="F1DD0C1042811B48"> > name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D" > status="T" evd="HMMPfam"> > type="Domain" /> > > > dbname="PANTHER" status="T" evd="not_rel"> > > > > > As you can see some time there is no interpro info (no ipr > element). > > I think it would be good to change also the interpro parser ? > > Regards > > Emmanuel > > Chris Fields wrote: > > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > > > >> I'm tinkering with parsing iprscan reports with BioPerl. I noticed > >> that this: > >> > >> my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > >> 'interpro'); > >> > >> while (my $seq = $seqio->next_seq()) { > >> ... 
> >> } > >> > >> Does not work unless I first 'use XML::DOM::XPath'. I get this error: > >> > >> Can't locate object method "findnodes" via package > >> "XML::DOM::Document" at > >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > >> 30. > >> > >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > >> suck in XML::DOM::Xpath. I see that t/interpro.t requires > >> XML::DOM::XPath: > >> > >> test_begin(-tests => 17, > >> -requires_module => 'XML::DOM::XPath'); > >> > >> Is suppose the reason the test specs a require XML::DOM::XPath is so > >> that tests can be skipped if XML::DOM::XPath is not available. > >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? > > > > You're right; I think tests passed b/c XML::DOM::XPath (if present), > > was eval'd as a required module. When I commented out the spot where > > it is eval'd in the test suite I can replicate this error. I have > > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it > > passes fine. > > > > Thanks for the heads up! > > > > chris > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cain.cshl at gmail.com Wed Jul 18 22:47:53 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 22:47:53 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> <1184764240.2570.31.camel@localhost.localdomain> <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> Message-ID: <1184813273.2570.96.camel@localhost.localdomain> [Please always reply to the mailing list so that answers can archived] Yes, because commas are not allowed in GFF3 in an unescaped form. 
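
A minimal sketch of why the unescaped comma matters, using only core Perl. It follows the GFF3 convention that ';' separates attributes, ',' separates multiple values of one tag, and reserved characters are percent-encoded. The helper names escape_value and parse_attributes are hypothetical, and this is not the Bio::DB::GFF parser.

```perl
use strict;
use warnings;

# Percent-encode characters that are reserved inside a GFF3 attribute value.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([%;=&,])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Split a ninth-column string into a hash of tag => [values].
sub parse_attributes {
    my ($column) = @_;
    my %attr;
    for my $pair (split /;/, $column) {
        my ($tag, $values) = split /=/, $pair, 2;
        next unless defined $values;
        for my $value (split /,/, $values) {
            $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # undo the encoding
            push @{ $attr{$tag} }, $value;
        }
    }
    return \%attr;
}

my $raw   = 'ID=12001.t00153;Name=receptor kinase ORK10, putative';
my $fixed = 'ID=12001.t00153;Name='
          . escape_value('receptor kinase ORK10, putative');

# The raw line yields two Name values; the escaped line yields one.
print scalar(@{ parse_attributes($raw)->{Name} }),   " Name value(s), raw\n";
print scalar(@{ parse_attributes($fixed)->{Name} }), " Name value(s), escaped: ",
      parse_attributes($fixed)->{Name}[0], "\n";
```
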
Essentially, you are doing this with your GFF3: Name=receptor kinase ORK10;Name= putative and when you do this: my ($name) = $gene->attributes('Name'); you are getting the first item in the list of names, and I suspect which one you get is random. To fix it, you need to replace the comma with %2C (the URL escape code for a comma). If you generated this GFF3, you will need to add a step to URI encode your attribute strings. If you got it from someone else, you should point out to them that their GFF is flawed. Scott On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote: > However, the $name return the string "putative" rather than "receptor kinase ORK10". Is any particular reason? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing > as Bio::DB::GFF3), then you can use the attributes method to get > anything in the ninth column: > > my ($name) = $gene->attributes('Name'); > > The parenthesis are needed around $name because the attributes method > returns a list and the parens capture the first item of the list into > $name. > > Scott > > > On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > > Hi, > > > > I want to extract some infomation from the gff3 file like: > > > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > > > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? > > > > Thanks for your help. 
> > > > > > Xianran Li > ----- Original Message ----- > From: "Scott Cain" > To: "Xianran Li" > Cc: > Sent: Wednesday, July 18, 2007 9:10 PM > Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070718/86cf671f/attachment.bin From acutter at eeb.utoronto.ca Thu Jul 19 22:25:08 2007 From: acutter at eeb.utoronto.ca (Asher Cutter) Date: Thu, 19 Jul 2007 22:25:08 -0400 Subject: [Bioperl-l] tree comparisons with bioperl Message-ID: <46A01D04.5040209@eeb.utoronto.ca> I was reading over the functions for working with trees in bioperl. I am looking for something that will compare two topologies and report back if they are equivalent. i.e. something like: does ((a,(b,c)) == ((A,B),C) ? (in this case, no) But of course in reality they would be more complicated topologies. This would be useful for simulating random trees to compare with some given topology of interest. I saw the methods for testing for monophyly and paraphyly, but not much beyond that...perhaps I have missed something? Any suggestions? Thanks, Asher -- ___________________________________ Asher D. Cutter Assistant Professor Department of Ecology & Evolutionary Biology University of Toronto 25 Harbord St.
Toronto, ON, M5S 3G5 tel: 416-978-4602 email: acutter at eeb.utoronto.ca http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130 ___________________________________ From jeevitesh at ibab.ac.in Fri Jul 20 00:25:22 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. With Thanks & regards jeevitesh From n.haigh at sheffield.ac.uk Sun Jul 22 07:34:58 2007 From: n.haigh at sheffield.ac.uk (Nathan S Haigh) Date: Sun, 22 Jul 2007 12:34:58 +0100 Subject: [Bioperl-l] Ohloh account for bioperl In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> Message-ID: <46A340E2.4040505@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Steve Chervitz wrote: > I came across a web app that tracks various metrics for open source > projects, noticed that bioperl wasn't listed, and added it: > > http://www.ohloh.net/projects/6685 > > Seems like an interesting resource that could help add some > visibility. It creates metrics by directly processing the source code > repository. I hooked it up to the CVS repos for bioperl-live, -db, > -run, and -pipeline. It has yet to do its analysis at this point. > > Feel free to create Ohloh accounts for yourselves. 
When you add > yourself as a contributor to Bioperl, you can indicate the username > associated with your commits, but this requires that it first process > the commit logs to figure out what the usernames are. You can still > create an account, just update it later with your username. > > Steve > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Nice to see the graphs of number of commits each developer has made over the last 5 years and how new developers have arisen while those more "seasoned" developers can relax a little more -proof of an excellent open source project! Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO 4JWvG5Gy+H/UqpeXYAcSCX0= =LrFt -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sun Jul 22 23:53:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 22 Jul 2007 22:53:48 -0500 Subject: [Bioperl-l] run megablast In-Reply-To: References: Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> StandAloneBlast runs the megablast executable directly, though I think you can specify a MegaBlast search using blastall with the '-n' flag. We could probably add this functionality in fairly easily since SearchIO can parse megablast output; no one's had the need to code it yet. chris On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > > Hi, > > is there a module to run megablast in a script (equivalent to ncbi > blast in StandAloneBlast.pm)? > > Cheers, > > Georg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jeevitesh at ibab.ac.in Mon Jul 23 06:34:36 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. and for AC and BD the shared path is 6. We need to find the shared distance as said above. Kindly helps us it will help our research a lot. With Thanks & regards jeevitesh From bix at sendu.me.uk Mon Jul 23 07:08:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 23 Jul 2007 12:08:23 +0100 Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Message-ID: <46A48C27.6060905@sendu.me.uk> jeevitesh at ibab.ac.in wrote: > Hi Friends, > > We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF > A TREE. Please stop sending this message. We heard you the first time. If no one answered, either no one knows the answer or no one understood you. > The Distance method of TreeIO in Bioperl module gives the total distance. > > But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as > illustrated > in figure. > > Suppose we have a tree > A C > \ / > \2 2/ > \__________/ > / 6 \ > /2 2\ > / \ > B D > > The shared path between AB and AC is 2. > and for AC and BD the shared path is 6. I don't follow. 
But if you already know how to work the answer out, describe the algorithm in words and maybe someone can code it up for you. From georg.otto at tuebingen.mpg.de Mon Jul 23 09:56:46 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Mon, 23 Jul 2007 15:56:46 +0200 Subject: [Bioperl-l] run megablast References: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> Message-ID: Thanks a lot! I guess I should have read the blast documentation more carefully.... Best, Georg Chris Fields writes: > StandAloneBlast runs the megablast executable directly, though I > think you can specify a MegaBlast search using blastall with the '-n' > flag. > > We could probably add this functionality in fairly easily since > SearchIO can parse megablast output; no one's had the need to code it > yet. > > chris > > On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > >> >> Hi, >> >> is there a module to run megablast in a script (equivalent to ncbi >> blast in StandAloneBlast.pm)? >> >> Cheers, >> >> Georg >> From cjfields at uiuc.edu Mon Jul 23 11:41:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 23 Jul 2007 10:41:35 -0500 Subject: [Bioperl-l] Bio::Assembly bug/feature? Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu> To all: I think I have found a major problem with Bio::Assembly; this was first noticed on Mac OS X in relation to bug 2320 and Bio::Assembly::IO. I am uncertain whether this is meant to be a feature or a bug but it certainly needs to be documented or fixed as it leads to subtle errors. I also can't see the advantage of this approach, but maybe I can be enlightened? Either way, I think it's worth a discussion for those willing to follow. I'll add as a bug later if needed. A bit of background: each instance of a Bio::Assembly::Contig has a Bio::SeqFeature::Collection instance attached to it; each Bio::SeqFeature::Collection itself has a tied DB_File handle attached which remains open during the lifetime of the Bio::SF::Collection object. 
When using Bio::Assembly one adds the various Contig objects to a Bio::Assembly::Scaffold. So, for instance, if one had ~1000 Contigs in a Scaffold, one would also have ~1000 open tied db handles, one per Contig instance. So far, so good. Unfortunately, when adding a ton of Contig objects to a Bio::Assembly::Scaffold one can run into a host of system-dependent issues based on resource usage limits (as one might expect). This script: ------------------------------ use Bio::Assembly::Scaffold; use Bio::Assembly::Contig; use Bio::SeqFeature::Generic; my $scaffold = Bio::Assembly::Scaffold->new(); for my $id (1..15000) { print "Contig #$id\n"; my $contig = Bio::Assembly::Contig->new(-id => $id); my $feat = Bio::SeqFeature::Generic->new(-start=>1, -end=>10, -strand=>1); $contig->add_features([$feat]); $scaffold->add_contig($contig); } ------------------------------ may fail on Mac OS X when one reaches the maximum number of open file descriptors possible on Mac OS X (on UNIX'y systems, this is 'ulimit - n'); the call to tie the DB_File handle in SF::Collection fails silently, so later on when called on you get the following: ... Contig #251 Contig #252 Contig #253 Contig #254 Can't call method "put" on an undefined value at /Users/cjfields/src/ bioperl-live/Bio/SeqFeature/Collection.pm line 225. I have added an exception to catch this. On Mac OS X you can increase the file descriptor limit using ulimit, at least to a certain point. However, when testing this out on dev.open-bio.org (Linux) the 'tie' sometimes fails (and the exception pops up), but it isn't dependent on 'ulimit -n'. This is what happens more often: ... Contig #10567 Contig #10568 Contig #10569 Contig #10570 Out of memory! Sometimes followed by a seg fault. Ick! Any ideas? For instance, should we set this up so that one SF::Collection is used for all the Contigs (since each one has a unique ID anyway)? Leave as is and document/track the issue as a bug? Both? 
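
A minimal sketch of the "fail loudly" option, assuming only core Perl: check the return value of tie and die at the point of failure instead of carrying an undefined handle forward. SDBM_File (bundled with Perl) stands in for DB_File here, and open_store is a hypothetical helper, not the Bio::SeqFeature::Collection code.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # stand-in for DB_File in this sketch
use File::Temp qw(tempdir);

# Hypothetical helper: tie a hash to an on-disk store, or die with the OS error.
sub open_store {
    my ($path) = @_;
    my %store;
    tie(%store, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "Could not tie '$path': $!\n";
    return \%store;
}

my $dir   = tempdir(CLEANUP => 1);
my $store = open_store("$dir/features");
$store->{contig1} = '1..10';
print "stored: $store->{contig1}\n";

# A path inside a directory that does not exist fails immediately,
# with a message, rather than later on the first write.
eval { open_store("$dir/no_such_dir/features") };
print "caught: $@" if $@;
```

Sharing one store across all contigs, keyed by contig ID, would keep the number of open handles constant; whether that suits Bio::Assembly is the open question raised above.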
chris

From ba6450 at wayne.edu Mon Jul 23 16:06:14 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu>

Hello everyone:

I am new to bioperl. I am running perl in Eclipse in Windows. Here is the code:

[code]
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file => 'NM_000034.CDSalign.paml');

my $aln = $alignio->next_aln;

my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
my $tree = $treeio->next_tree;

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();

$codeml->alignment($aln);
$codeml->tree($tree);

my ($rc,$parser) = $codeml->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();
print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
[/code]

It gives the following error when I try to compile:

[error]
------------ EXCEPTION: Bio::Root::Exception -------------
MSG: unable to find or run executable for 'codeml'
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
-----------------------------------------------------------
Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
[/error]

Any idea, guys?
Munirul Islam
PhD Student
Computer Science
Wayne State University

From arareko at campus.iztacala.unam.mx Mon Jul 23 17:19:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 23 Jul 2007 16:19:24 -0500
Subject: [Bioperl-l] error running codeml
In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx>

Apparently, your script isn't able to locate the codeml executable in your Windows environment. Do you have the PAML package installed? Instructions on how to install it are located here:

http://abacus.gene.ucl.ac.uk/software/paml.html

Regards,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From ba6450 at wayne.edu Mon Jul 23 19:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. I needed to add an environment variable for the PAML directory.

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin';

One question: I would like to save the temp files. What modification do I need to make so that $obj->save_tempfiles returns 1 within Codeml.pm?

Regards
Munir

From jason at bioperl.org Tue Jul 24 03:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu> <46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

When you initialize the Codeml object, just pass in

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do

$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well, and you can print out where the tempdir is located with

print $codeml->tempdir;

and I think you can get the temp outfile:

my $name = $codeml->outfile_name;
print "name is $name\n";

-jason

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From ba6450 at wayne.edu Tue Jul 24 17:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having a problem loading a sequence file from within a directory.

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");

while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/; # skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;

    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus', -file => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;

    foreach $taxa (@sequencenames) {
        print $taxa . "\n";
    }
}
#############################################################

Your suggestions please.
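A note on the directory loop above: readdir returns bare entry names, so rel2abs resolves each name against the current working directory rather than against "rundir". The following is a minimal sketch of building usable paths with core Perl only; the helper name paths_in_dir is hypothetical, and the AlignIO step is left out so the example runs without BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

my $dirname = 'rundir';

# readdir returns bare entry names, so join each one to the directory
# it was read from before handing it to anything that opens files.
sub paths_in_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't open $dir: $!";
    my @paths = map  { File::Spec->catfile($dir, $_) }
                grep { !/^\.\.?$/ }      # skip . and ..
                readdir($dh);
    closedir($dh);
    return @paths;
}

# Only list the directory if it actually exists where the script is run.
if (-d $dirname) {
    print "$_\n" for paths_in_dir($dirname);
}
```

Each returned path could then be passed as the -file argument in place of $abs_path.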
Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA

From bix at sendu.me.uk Tue Jul 24 18:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
>
> I am having problem loading a sequence file from within a directory.
>
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/; # skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method. You want File::Spec->catfile($dirname, $file).

From ba6450 at wayne.edu Tue Jul 24 20:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks. That worked nicely.

I need your suggestion to load codeml control data from a file.
Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(
    -save_tempfiles => 1,
    -params => {
        'noisy'     => 9,
        'verbose'   => 2,
        'runmode'   => 0,
        'seqtype'   => 1,
        'CodonFreq' => 2,
        'aaDist'    => 0,
        'model'     => 2,
        'NSsites'   => 2,
        'icode'     => 0
    });
-------------------------------------------------------------

I tried to modify it by passing a hash reference after loading the data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(
    -save_tempfiles => 1,
    -params => \%hashlist );
-------------------------------------------------------------

Still, that didn't work. Your suggestions please.

Munir

From ba6450 at wayne.edu Thu Jul 26 15:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt'). It runs fine when I directly run codeml.
But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format. Still, that didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?

Thanks,

Munir
PhD Student
Computer Science
Wayne State University

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0001.txt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0001.obj

From jason at bioperl.org Thu Jul 26 20:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try passing in -interleaved => 0 as another option when you init your AlignIO object.

On 7/26/07, Munirul Islam wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt'). It runs fine when I directly run codeml. But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file => 'seq.txt');
>
> I guess it's not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format. Still that didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
> > Thanks, > > Munir > PhD Student > Computer Science > Wayne State University > > 11 2202 > > human > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC 
TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > chimp > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA 
GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC 
TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > macaca > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG > CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC > GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC > ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT > ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA 
--- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG > CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC > GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG --- > --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG > CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG > AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > mouse > GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC > ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG > CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA > AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA > GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC > TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG > GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC > TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA 
GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC > GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC > CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG > TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC > CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC > CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC > TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT > TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG > AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA > AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC > ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC > TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG > TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT --- > --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG > CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT > GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG > AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC > TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA 
GCC TAT --- TTC TGC CAT GGC AAA TTC > TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG > GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT > rat > GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC > ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG > CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA > AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA > GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC > TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC > TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC > GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC > CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA > TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT > CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT > CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC > TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT > TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG > CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA > AGG CCT CCA 
GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC > ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG > TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT --- > --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG > CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT > GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG > AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC > TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC > TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG > GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT > rabbit > GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG > AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC > ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG > CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC > CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG > GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC > TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC 
GAC GAA GAG CTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC > CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG > TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC > CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC > GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC > TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA > GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC > TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG > CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT > --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG > ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT > ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG > TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA > GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG --- > --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG > CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG > GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC > AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG > GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC > ACA 
CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG > GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT > dog > GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG > AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC > ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG > CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC > TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT > GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC > TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT > CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT > GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC > CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG > TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC > CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC > CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC > ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC > TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT > TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG > CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT 
CCT CGC CCT GAA CCT GAG CCA > CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC > ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC > ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG > CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC > AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG --- > --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT > GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT > AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG > GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC > ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG > GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT > cow > GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA > CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC > ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG > CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG > AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG > GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC > CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG > ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT > GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC > TTT CCG CCT GGC AAA 
GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT > CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC > TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC > GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG > TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC > TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC > ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG > CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA > CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC > ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC > ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC > CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT > AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG --- > --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT > GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG > TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT > AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG > GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG 
TTC CCC GGG GTG CCC ATT AGC > ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC > TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG > GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT > elephant > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- > --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC > ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA > AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG > GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG > ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG > GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC > TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG > TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC > TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC > GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC > CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG > TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC > CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN 
NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN --- > --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- --- > --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN > NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- --- > --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN --- > --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN > NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG > GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC > ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT > opossum > GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA > --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC > ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA > AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC > GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG > GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC > CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG > ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG > ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT 
GGA CTC TTG GCT CAC GCT > TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT > CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC > TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC > CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC > TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC > CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC > CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC > ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC > TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA > GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC > TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG > CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG > GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC > AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC > ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC > ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG > CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT > CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC --- > --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG > CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA > GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG > CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC > AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA > GAC 
TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC > ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC > TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG > GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- --- > chicken > GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG > --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC > ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG > CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG > GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG > GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC > CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC > ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC > AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC > TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT > CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC > TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT > GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC > CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC > TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT > CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC > CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC > ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC > TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA > GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC > TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC 
AGC TAC GTC CAG GAC TTC CAG CTG > CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC > ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG --- > --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- --- > --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG > GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- --- > CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC > AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC > TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC > CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG > GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG > TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC > AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG > GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC > GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC > TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG > GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From ba6450 at wayne.edu Thu Jul 26 21:20:11 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT) Subject: [Bioperl-l] Finding the Sequence List in an Alignment Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu> Thanks. The error is removed now. I have a question. Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file? 
Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich"
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
>To: "Munirul Islam"
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>
From jason at bioperl.org Fri Jul 27 00:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign
object?

for my $seq ( $aln->each_seq ) {
    print $seq->display_id, "\n";
}

I'd appreciate it if you added some of your questions, with the answers,
to the FAQ or to other places on the wiki so that other people can
benefit from what you learn here.

On 7/26/07, Munirul Islam wrote:
>
> Thanks. The error is removed now.
>
> I have a question. Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich"
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
> >To: "Munirul Islam"
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
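[Editor's sketch] The two suggestions in this thread (pass -interleaved => 0 to AlignIO, then loop over each_seq) can be combined into one small script. This is only a sketch under stated assumptions: the 'phylip' format is a guess based on the codeml subject line, and the helper name sequence_ids is made up for illustration; each_seq, display_id and next_aln are the Bio::SimpleAlign / Bio::AlignIO methods already used or implied above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the display IDs of every sequence in an
# alignment. It accepts any object that provides each_seq(), such as
# a Bio::SimpleAlign.
sub sequence_ids {
    my ($aln) = @_;
    return map { $_->display_id } $aln->each_seq;
}

# BioPerl is only loaded when a file name is given on the command line.
if (@ARGV) {
    require Bio::AlignIO;
    my $in = Bio::AlignIO->new(
        -file        => $ARGV[0],
        -format      => 'phylip',   # assumption: PHYLIP-style input, as used with codeml
        -interleaved => 0,          # the option suggested earlier in the thread
    );
    while ( my $aln = $in->next_aln ) {
        print "$_\n" for sequence_ids($aln);
    }
}
```

Run it as "perl list_ids.pl alignment.phy" (both names are placeholders); it should print one sequence name per line.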
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason
From arareko at campus.iztacala.unam.mx Fri Jul 27 11:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM
From dhoworth at mrc-lmb.cam.ac.uk Fri Jul 27 12:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave
From spiros at lokku.com Fri Jul 27 12:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID:

On 7/27/07, Dave Howorth wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.
Not to mention that it requires registration (?). Who is behind the
survey ? I am on a number of Perl and Perl related lists and haven't
seen it being mentioned.

Spiros
From arareko at campus.iztacala.unam.mx Fri Jul 27 13:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To:
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and
birth year. As for my email, I always use the one I keep for all the
SPAM I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which
prevents spambots, or you yourself, from filling the DB multiple times
and thus screwing up the survey). As for who's behind it, its purpose,
privacy, etc., please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.
> Spiros
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM
From Alicia.Amadoz at uv.es Mon Jul 30 11:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a BioPerl script on Linux with StandAloneBlast
from a web server, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables, both directly in the shell and by adding some code to my
script:

BEGIN {
    $ENV{PATH} .= ':/usr/local/blast-2.2.16';
    $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
    $ENV{BLASTDATADIR} = '/usr/local/data/';
}

and with:

$local->executable('/usr/local/bin');
my $blast_report = $local->blastall($inputfilename);

I have also checked that the web server has read and execute permission
on all the BLAST executables and directories. But after trying all of
these things it keeps showing the same error as above. Any other ideas
for solving this problem? My script works well when I run it as a plain
script, and I've rebooted the system several times after making changes.
Thanks to anyone who is able to help me!
Regards,
Alicia
From gyang at plantbio.uga.edu Mon Jul 30 16:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running RemoteBlast with readmethod "xml", and I noticed that it
prints the output repeatedly, nonstop, as if it were stuck in a loop.
Has anybody noticed this before? Can anybody help me get out of this?
Thanks a lot,

Guojun Yang
University of Georgia
From grafman at graphcomp.com Sun Jul 29 17:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year,
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any
assistance in helping to add such features, I would enjoy the opportunity.
From torsten.seemann at infotech.monash.edu.au Mon Jul 30 19:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID:

Alicia,

> Hi, i'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
From cjfields at uiuc.edu Mon Jul 30 20:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID:

On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that
> it is printing the output repeatedly nonstop. It's like in a loop.
> Did anybody notice this before? Can anybody help me getting out of
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update
RemoteBlast.pm as this sounds similar to an issue that popped up
earlier in the spring.
chris
From torsten.seemann at infotech.monash.edu.au Tue Jul 31 02:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To:
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID:

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
From Alicia.Amadoz at uv.es Tue Jul 31 06:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To:
References:
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it, it works perfectly.
Thank you very much.

Regards,
Alicia

> Alicia,
>
> > Hi, i'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
>
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
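[Editor's sketch] The fix confirmed in this thread (point PATH and BLASTDIR at the bin/ subdirectory) can be made a little more defensive by checking where blastall actually is before setting the variables. This is only a sketch: the helper name find_blastall_dir is hypothetical, and the install prefix is the one quoted in the thread, so adjust it for your system. The BEGIN block mirrors the original script, on the assumption that the variables need to be in place before the BLAST wrapper module is loaded.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first directory in @dirs that holds an
# executable file named 'blastall', or undef if none of them does.
sub find_blastall_dir {
    my (@dirs) = @_;
    for my $dir (@dirs) {
        my $exe = File::Spec->catfile( $dir, 'blastall' );
        return $dir if -f $exe && -x _;
    }
    return;
}

BEGIN {
    # Install prefix taken from the thread; change it to match your machine.
    my $prefix = '/usr/local/blast-2.2.16';

    # Try bin/ first (where a standard install keeps blastall), then the prefix.
    my $bindir = find_blastall_dir( "$prefix/bin", $prefix );
    if ( defined $bindir ) {
        $ENV{BLASTDIR} = $bindir;
        $ENV{PATH} = defined $ENV{PATH} ? "$ENV{PATH}:$bindir" : $bindir;
    }
    else {
        warn "blastall not found under $prefix\n";
    }
}

# Load the BLAST wrapper only after the BEGIN block above, for example:
# use Bio::Tools::Run::StandAloneBlast;
```

Under a web server the warning ends up in the server's error log, which makes a wrong path easier to spot than the generic "cannot find path to blastall" message.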
> >
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
>
>
From jay at jays.net Tue Jul 31 08:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me that
> there might be some need for 3D rendering within your BioPerl project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this year,
> along with benchmarks that demonstrate that it performs comparably to C.
>
> If there's a need for 3D features within BioPerl, and if I can be of any
> assistance in helping to add such features, I would enjoy the opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to
read this:

http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially
note the "Software" section, where you'll find some of the
"competition". :)

There's some cool stuff out there. I don't know what all would or
wouldn't be time well spent in Perl / BioPerl.
HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
From cjfields at uiuc.edu Tue Jul 31 12:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mailing list.

You might want to run a full install, just in case. If I remember
correctly Sendu made some changes a while back in the BLAST-related
modules which may be related to this. At the very least install/upgrade
all modules in Bio::Tools::Run.

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got
> "can't locate the object method "retrieve_parameter"". Does this
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast
> with xml
>
>
>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
From cjfields at uiuc.edu Tue Jul 31 22:15:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 21:15:45 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
	<25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
Message-ID: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>

On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:

> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>> If this posting is inappropriate, please let me know - my apologies.
>
> Not at all. AFAIK this is the perfect place to discuss any
> contributions you're motivated to make to the BioPerl project.
>
>> I recently came across an article on BioPerl, and it occurred to me that
>> there might be some need for 3D rendering within your BioPerl project.
>>
>> I released a number of new/updated Perl OpenGL (POGL) modules this year,
>> along with benchmarks that demonstrate that it performs comparably to C.
>>
>> If there's a need for 3D features within BioPerl, and if I can be of any
>> assistance in helping to add such features, I would enjoy the opportunity.
>
> I know nothing about 3D modeling in biology, nor do I hang out with
> any protein structure folks, but 3D always sounds sexy. -grin-
>
> If you're new to bioinformatics (I certainly am) you might want to
> read this:
>
> http://en.wikipedia.org/wiki/Protein_structure
>
> Because that's probably where your 3D work would be used. Especially
> note the "Software" section, where you'll find some of the
> "competition". :)
>
> There's some cool stuff out there. I don't know what all would or
> wouldn't be time well spent in Perl / BioPerl.
>
> HTH,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

I agree that protein structure is the best place for something like this.
It's a wide open area as far as I'm concerned; in fact I would say that Bio::Structure is getting pretty dated, so if anyone wants to take it over, refactor the code, and so on I don't have a problem. chris From dmessina at wustl.edu Sun Jul 1 01:38:48 2007 From: dmessina at wustl.edu (David Messina) Date: Sun, 1 Jul 2007 00:38:48 -0500 Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn repository] In-Reply-To: <46869226.70203@sheffield.ac.uk> References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu> <18051.44281.831316.749586@almost.alerce.com> <18051.61992.627473.323346@almost.alerce.com> <4684AF3D.5090907@sheffield.ac.uk> <843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu> <468628AC.9060200@sheffield.ac.uk> <461F64B9-87FD-458A-8945-8238E7076109@wustl.edu> <46869226.70203@sheffield.ac.uk> Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu> > [Nath] > I think the list of seq formats recognised by Bioperl in Bio::SeqIO > and > Bio::AlignIO would be a good start. As these are likely to be the ones > that are sensitive to file format recognition and thus could break > tests > if renamed. Sounds good to me. I will do a quick tour of the rest of the repo looking for other common or important file extensions, but I don't expect there to be many if any. > [still Nath] > I think a lot of people have used "." in file names as an > alternative to > a space. I think it would be beneficial to use an underscore "_" in > these cases and leave the "." to represent the beginning of the file > extension. That's a great idea. > [Chris] > Do we need to define every filetype extension, or can there be a > fallback (eg if it isn't on the list or has no extension it's plain > text)? 
For every file that's added, svn takes a peek to see if it's human- readable. If not, it's tagged with the generic MIME type application/ octet-stream. (It does this so it knows not to try to do diffs and merges on a binary file.) So the default for a human-readable file is no MIME type, which I believe is essentially the same thing as text/plain. And then regardless of the outcome of svn's peek, any matching auto- props are then applied, overriding svn's choice. So if we don't define every extension, I think we'll be fine. It'd be nice to have everything tagged with a MIME type, though. For one thing, Apache will use it to do the right thing when people browse the repo over the web. And two, because metadata is cool. :) One more thing: in the course of reading up on this, I learned that my earlier expectation about multiple auto-prop matches was incorrect. It's true that multiple unrelated matches means that multiple properties are set on the file. But when a file matches multiple *conflicting* auto-property patterns, there's no telling which value it'll get. 
Dave From hartzell at alerce.com Sun Jul 1 12:29:29 2007 From: hartzell at alerce.com (George Hartzell) Date: Sun, 1 Jul 2007 09:29:29 -0700 Subject: [Bioperl-l] First cut svn repository In-Reply-To: References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <4683A7D1.8070403@sendu.me.uk> <18051.48684.996884.134046@almost.alerce.com> <4683C385.3050904@sendu.me.uk> <18051.63674.685297.426813@almost.alerce.com> <18052.3946.224905.415905@almost.alerce.com> <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net> Message-ID: <18055.54889.677775.868974@almost.alerce.com> Hilmar Lapp writes: > It turns out that both files are also present on the release-0-9-3, > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add > > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ > home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/ > HUMBETGLOA.fasta > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/ > HUMBETGLOA.fasta > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/ > HUMBETGLOA.fasta > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/ > HUMBETGLOA.fasta > > to the post-processing commands. > [...] Will do. Thanks for working out the incantations! g. 
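The four removals Hilmar lists differ only in the tag name, so the post-processing step could be generated rather than typed out. A minimal Perl sketch, assuming the repository URL pieces quoted above join without spaces where the mailer wrapped them; it only prints the commands (nothing is executed), and the helper name svn_rm_commands is made up for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Repository root and offending file, as quoted in the message above
# (assumption: the wrapped URL fragments join without spaces).
my $root = 'svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live';
my $file = 't/data/HUMBETGLOA.fasta';

# Tags that still carry the duplicate file.
my @tags = qw(release-0-9-3 bioperl-1-0-0 bioperl-1-0-alpha bioperl-1-0-alpha2-rc);

# Hypothetical helper: build one "svn rm" command line per tag.
sub svn_rm_commands {
    my ($root, $file, @tags) = @_;
    return map {
        my $tag = $_;
        qq{svn rm -m "Removing offending duplicate" $root/tags/$tag/$file};
    } @tags;
}

# Print the commands for review; run them by hand once they look right.
print "$_\n" for svn_rm_commands($root, $file, @tags);
```

Printing instead of calling svn directly keeps the sketch harmless to run against a live repository.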
From cjfields at uiuc.edu Mon Jul 2 09:26:06 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 2 Jul 2007 08:26:06 -0500 Subject: [Bioperl-l] test data Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> I am planning on adding test data to cvs for eutils and have run across some stuff in bugzilla that needs to be added as well. Should we, as convention, start adding data sequestered in a folder with the test name, within t/data? This might make life easier in the long run (keep track of files, get rid of old files, etc), and may make it easier for wrapping up the correct data with tests if we start submitting single module CPAN updates. chris From cjfields at uiuc.edu Mon Jul 2 09:52:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 2 Jul 2007 08:52:27 -0500 Subject: [Bioperl-l] test data In-Reply-To: <468901C1.8020505@sendu.me.uk> References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk> Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote: > Chris Fields wrote: >> I am planning on adding test data to cvs for eutils and have run >> across some stuff in bugzilla that needs to be added as well. >> Should we, as convention, start adding data sequestered in a folder >> with the test name, within t/data? > > I'd actually argue that this shouldn't be done: data is sometimes > reused amongst multiple different test scripts, and when looking > for data to reuse it's easier to spot it in a single directory > compared to searching through multiple directories. > > >> This might make life easier in the long run (keep track of files, >> get rid of old files, etc), and may make it easier for wrapping up >> the correct data with tests if we start submitting single module >> CPAN updates. > > I don't think that will be an issue. The automated process would > read the test script and see what input files it uses, copying > those into the archive.
So, just be sure to standardise on using > test_input_file() to make that possible. > > > That said, I wouldn't mind especially either way. Just don't do it > now, since test script names (and therefore the name of the > directory you'd want to store the input files in) might all change. > > > In fact we can imagine that we have a test script t/ > BioZombieKitten.t which stores its test data in t/data/ > BioZombieKitten/input.file but the script gets the path to this > file by: > my $input_file = test_input_file('input.file'); > > test_input_file() is then implemented to look for the file in the > subdir of data corresponding to the script name if we're dealing > with the 900-modules-in-a-package checkout-type situation, but just > in t/data if we're in the one-module-in-a-package situation. > > In any case, things will be most flexible if you drop files > directly into t/data for now and reference them without any subdirs > in the call to test_input_file(). Fine by me, I just find it very cluttered. BioZombieKitten?!? chris From bix at sendu.me.uk Mon Jul 2 10:00:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 02 Jul 2007 15:00:37 +0100 Subject: [Bioperl-l] test data In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu> References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk> <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu> Message-ID: <46890505.1070707@sendu.me.uk> Chris Fields wrote: > On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote: > Fine by me, I just find it very cluttered. Yes, I agree. I also wish we had a decent naming convention for files. (Ie. it would be nice to have a good idea what a file was for without having to study the test script that uses it.) > BioZombieKitten?!? 
I get Bio/perl/ and Bio/ware/ confused in my head ;) http://forums.bioware.com/viewtopic.html?topic=562916&forum=84 From bix at sendu.me.uk Mon Jul 2 09:46:41 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 02 Jul 2007 14:46:41 +0100 Subject: [Bioperl-l] test data In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> Message-ID: <468901C1.8020505@sendu.me.uk> Chris Fields wrote: > I am planning on adding test data to cvs for eutils and have run across > some stuff in bugzilla that needs to be added as well. > > Should we, as convention, start adding data sequestered in a folder with > the test name, within t/data? I'd actually argue that this shouldn't be done: data is sometimes reused amongst multiple different test scripts, and when looking for data to reuse it's easier to spot it in a single directory compared to searching through multiple directories. > This might make life easier in the long > run (keep track of files, get rid of old files, etc), and may make it > easier for wrapping up the correct data with tests if we start > submitting single module CPAN updates. I don't think that will be an issue. The automated process would read the test script and see what input files it uses, copying those into the archive. So, just be sure to standardise on using test_input_file() to make that possible. That said, I wouldn't mind especially either way. Just don't do it now, since test script names (and therefore the name of the directory you'd want to store the input files in) might all change.
In fact we can imagine that we have a test script t/BioZombieKitten.t which stores its test data in t/data/BioZombieKitten/input.file but the script gets the path to this file by: my $input_file = test_input_file('input.file'); test_input_file() is then implemented to look for the file in the subdir of data corresponding to the script name if we're dealing with the 900-modules-in-a-package checkout-type situation, but just in t/data if we're in the one-module-in-a-package situation. In any case, things will be most flexible if you drop files directly into t/data for now and reference them without any subdirs in the call to test_input_file(). From hlapp at gmx.net Mon Jul 2 16:02:37 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 2 Jul 2007 16:02:37 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18054.63942.316904.413911@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: Just FYI, after applying the changes I've been sending, I was able to check out the repository in its entirety. -hilmar On Jun 30, 2007, at 8:48 PM, George Hartzell wrote: > > There's a second cut at the subversion repository. I've done a better > job of setting svn:keywords and svn:eol-style on various files. The > defaults were more cautious and I used an auto-props file based on > the wiki version. > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2 > > The old repository's still around as > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1 > > I renamed it so that people wouldn't work with it by mistake. If, for > some hard-to-imagine reason, you have a working copy that you want to > run against it, you should be able to do an svn switch --relocate on > your working copy and be back in shape. In fact, it might be a good > time to give it a try.... > > g.
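The lookup Sendu describes for test_input_file() earlier in this thread could be sketched roughly as follows. This is only an illustration of the idea, not the actual BioPerl test framework: the extra $data_dir and $script parameters are assumptions, added so the search order can be tried outside a real t/ directory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(basename);

# Sketch only: resolve a test input file, preferring a per-script
# subdirectory of the data dir and falling back to the data dir itself.
#   $name     - file name as passed by the test script, e.g. 'input.file'
#   $data_dir - assumed location of the shared data directory (default t/data)
#   $script   - assumed path of the calling test script (default $0)
sub test_input_file {
    my ($name, $data_dir, $script) = @_;
    $data_dir = File::Spec->catdir('t', 'data') unless defined $data_dir;
    $script   = $0 unless defined $script;

    # t/BioZombieKitten.t -> BioZombieKitten
    my $subdir = basename($script, '.t');

    # Many-modules checkout: t/data/BioZombieKitten/input.file
    my $per_script = File::Spec->catfile($data_dir, $subdir, $name);
    return $per_script if -e $per_script;

    # Single-module package (or file not yet moved): t/data/input.file
    return File::Spec->catfile($data_dir, $name);
}

my $input_file = test_input_file('input.file');
print "$input_file\n";
```

Because the test script only ever passes a bare file name, the data could later be moved into per-script subdirectories without touching the scripts themselves.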
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From wrp at virginia.edu Mon Jul 2 16:08:04 2007 From: wrp at virginia.edu (William R. Pearson) Date: Mon, 2 Jul 2007 16:08:04 -0400 Subject: [Bioperl-l] Course: Computational and Comparative Genomics Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu> Course announcement - Application deadline, July 15, 2007 ================================================================ Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS November 7 - 13, 200 Application Deadline: July 15, 2007 INSTRUCTORS: Pearson, William, Ph.D., University of Virginia, Charlottesville, VA Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA Beyond BLAST and FASTA - Alignment: from proteins to genomes - This course presents a comprehensive overview of the theory and practice of computational methods for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, multiple sequence alignment, and genome-scale alignment. Additional topics include gene finding, identifying signals in unaligned sequences, and integration of genetic and sequence information in biological databases. The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course makes extensive use of local WWW pages to present problem sets and the computing tools to solve them. Students use Windows and Mac workstations attached to a UNIX server.
The course is designed for biologists seeking advanced training in biological sequence analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis and comparative genomics. The primary focus of the Computational and Comparative Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Cold Spring Harbor also offers a "Programming for Biology" course, which focuses more on software development. For additional information and the lecture schedule and problem sets for the 2006 course, see: http://fasta.bioch.virginia.edu/cshl06 ================================================================ To apply to the course, fill out and send in the form at: http://meetings.cshl.edu/courses/courseapplication.asp ================================================================ Bill Pearson From niels at genomics.dk Mon Jul 2 16:45:07 2007 From: niels at genomics.dk (Niels Larsen) Date: Mon, 02 Jul 2007 22:45:07 +0200 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <468963D3.3000007@genomics.dk> I write hoping someone could show me how to create a PrimarySeq object without parsing features and all first. The lines below return "Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16." whereas calling Bio::SeqIO-> gives no error, but a too big object. The GenBank record after the __END__ is the "1.gb" file. I could not find out how from the tutorial or the Bio::PrimarySeq description. 
Niels L #!/usr/bin/env perl use strict; use warnings FATAL => qw ( all ); use Data::Dumper; use Bio::Seq; use Bio::SeqIO; my ( $seq_h, $seq ); $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' ); $seq = $seq_h->next_seq(); # print Dumper( $seq ); __END__ LOCUS X60065 9 bp mRNA linear MAM 14-NOV-2006 DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. ACCESSION X60065 REGION: 1..9 VERSION X60065.1 GI:5 KEYWORDS beta-2 glycoprotein I. SOURCE Bos taurus (cattle) ORGANISM Bos taurus Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Bovinae; Bos. REFERENCE 1 AUTHORS Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and Kristensen,T. TITLE Complete primary structure of bovine beta 2-glycoprotein I: localization of the disulfide bridges JOURNAL Biochemistry 31 (14), 3611-3617 (1992) PUBMED 1567819 REFERENCE 2 (bases 1 to 9) AUTHORS Kristensen,T. TITLE Direct Submission JOURNAL Submitted (11-JUN-1991) T. 
Kristensen, Dept of Mol Biology, University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C, DENMARK FEATURES Location/Qualifiers source 1..9 /organism="Bos taurus" /mol_type="mRNA" /db_xref="taxon:9913" /clone="pBB2I" /tissue_type="liver" gene <1..>9 /gene="beta-2-gpI" CDS <1..>9 /gene="beta-2-gpI" /codon_start=1 /product="beta-2-glycoprotein I" /protein_id="CAA42669.1" /db_xref="GI:6" /db_xref="GOA:P17690" /db_xref="UniProtKB/Swiss-Prot:P17690" /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT DASDVKPC" sig_peptide <1..>9 /gene="beta-2-gpI" ORIGIN 1 ccagcgctc // From Kevin.M.Brown at asu.edu Mon Jul 2 17:35:12 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 2 Jul 2007 14:35:12 -0700 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <468963D3.3000007@genomics.dk> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> Start by having a look at the following link: http://bioperl.org/cgi-bin/deob_interface.cgi SeqIO is how one reads or writes sequences to/from files. Bio::PrimarySeq is just an object that holds information about a sequence obtained from a file. 
As for how to parse a GenBank file and collect its CDS features:

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
    my @sorted_features;
    while (my $seq = $file->next_seq()) {
        my @features = $seq->all_SeqFeatures;
        # pick out features by their primary tag
        for my $f (@features) {
            my $tag = $f->primary_tag;
            if ($tag eq 'CDS') {
                # @sorted_features collects the CDS feature objects
                # (sequence features, not Bio::PrimarySeq objects)
                push @sorted_features, $f;
            }
        }
    }

> -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Niels Larsen > Sent: Monday, July 02, 2007 1:45 PM > Cc: bioperl-l List > Subject: [Bioperl-l] simple PrimarySeq question > > I write hoping someone could show me how to create a > PrimarySeq object without parsing features and all first. The > lines below return > > "Can't locate object method "next_seq" via package > "Bio::PrimarySeq" at ./tst2 line 16." > > whereas calling Bio::SeqIO-> gives no error, but a too big object. > The GenBank record after the __END__ is the "1.gb" file. I > could not find out how from the tutorial or the > Bio::PrimarySeq description. > > Niels L > > > #!/usr/bin/env perl > > use strict; > use warnings FATAL => qw ( all ); > > use Data::Dumper; > > use Bio::Seq; > use Bio::SeqIO; > > my ( $seq_h, $seq ); > > $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => > 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", > -format => 'genbank' ); > > $seq = $seq_h->next_seq(); > > # print Dumper( $seq ); > > __END__ > > LOCUS X60065 9 bp mRNA linear > MAM 14-NOV-2006 > DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. > ACCESSION X60065 REGION: 1..9 > VERSION X60065.1 GI:5 > KEYWORDS beta-2 glycoprotein I. > SOURCE Bos taurus (cattle) > ORGANISM Bos taurus > Eukaryota; Metazoa; Chordata; Craniata; > Vertebrata; Euteleostomi; > Mammalia; Eutheria; Laurasiatheria; > Cetartiodactyla; Ruminantia; > Pecora; Bovidae; Bovinae; Bos.
> REFERENCE 1 > AUTHORS Bendixen,E., Halkier,T., Magnusson,S., > Sottrup-Jensen,L. and > Kristensen,T. > TITLE Complete primary structure of bovine beta > 2-glycoprotein I: > localization of the disulfide bridges > JOURNAL Biochemistry 31 (14), 3611-3617 (1992) > PUBMED 1567819 > REFERENCE 2 (bases 1 to 9) > AUTHORS Kristensen,T. > TITLE Direct Submission > JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of > Mol Biology, > University of Aarhus, C F Mollers Alle 130, > DK-8000 Aarhus C, > DENMARK > FEATURES Location/Qualifiers > source 1..9 > /organism="Bos taurus" > /mol_type="mRNA" > /db_xref="taxon:9913" > /clone="pBB2I" > /tissue_type="liver" > gene <1..>9 > /gene="beta-2-gpI" > CDS <1..>9 > /gene="beta-2-gpI" > /codon_start=1 > /product="beta-2-glycoprotein I" > /protein_id="CAA42669.1" > /db_xref="GI:6" > /db_xref="GOA:P17690" > /db_xref="UniProtKB/Swiss-Prot:P17690" > > /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI > > VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT > > ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN > > SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN > > PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER > > VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT > DASDVKPC" > sig_peptide <1..>9 > /gene="beta-2-gpI" > ORIGIN > 1 ccagcgctc > // > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From niels at genomics.dk Mon Jul 2 20:41:24 2007 From: niels at genomics.dk (niels at genomics.dk) Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST) Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> Message-ID: 
<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> Kevin, Thanks, but I didnt put the question very clearly sorry .. yes, SeqIO gets entries from file, and from those large parsed entries I can get a simplified primary_seq object. But the SeqIO object includes feature and annotation objects etc that takes time to make, and I wish to know if there is a way to get a primari_seq object without this overhead. I apologize if I overlooked it in the docs. Niels > Start by having a look at the following link: > http://bioperl.org/cgi-bin/deob_interface.cgi > > SeqIO is how one reads or writes sequences to/from files. > Bio::PrimarySeq is just an object that holds information about a > sequence obtained from a file. > > As for how to parse a Genbank file into a list of features: > > $file = Bio::SeqIO->new(-format => $format, -file => "id.gb"); > while (my $seq = $file->next_seq()) > { > @features = $seq->all_SeqFeatures; > # sort features by their primary tags > for my $f (@features) > { > my $tag = $f->primary_tag; > if ($tag eq 'CDS') > { > # @sorted_features holds all the Bio::PrimarySeq > features obtained from the genbank file > push @sorted_features, $f; > } > } > } > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Niels Larsen >> Sent: Monday, July 02, 2007 1:45 PM >> Cc: bioperl-l List >> Subject: [Bioperl-l] simple PrimarySeq question >> >> I write hoping someone could show me how to create a >> PrimarySeq object without parsing features and all first. The >> lines below return >> >> "Can't locate object method "next_seq" via package >> "Bio::PrimarySeq" at ./tst2 line 16." >> >> whereas calling Bio::SeqIO-> gives no error, but a too big object. >> The GenBank record after the __END__ is the "1.gb" file. I >> could not find out how from the tutorial or the >> Bio::PrimarySeq description. 
>> >> Niels L >> >> >> #!/usr/bin/env perl >> >> use strict; >> use warnings FATAL => qw ( all ); >> >> use Data::Dumper; >> >> use Bio::Seq; >> use Bio::SeqIO; >> >> my ( $seq_h, $seq ); >> >> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => >> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", >> -format => 'genbank' ); >> >> $seq = $seq_h->next_seq(); >> >> # print Dumper( $seq ); >> >> __END__ >> >> LOCUS X60065 9 bp mRNA linear >> MAM 14-NOV-2006 >> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. >> ACCESSION X60065 REGION: 1..9 >> VERSION X60065.1 GI:5 >> KEYWORDS beta-2 glycoprotein I. >> SOURCE Bos taurus (cattle) >> ORGANISM Bos taurus >> Eukaryota; Metazoa; Chordata; Craniata; >> Vertebrata; Euteleostomi; >> Mammalia; Eutheria; Laurasiatheria; >> Cetartiodactyla; Ruminantia; >> Pecora; Bovidae; Bovinae; Bos. >> REFERENCE 1 >> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., >> Sottrup-Jensen,L. and >> Kristensen,T. >> TITLE Complete primary structure of bovine beta >> 2-glycoprotein I: >> localization of the disulfide bridges >> JOURNAL Biochemistry 31 (14), 3611-3617 (1992) >> PUBMED 1567819 >> REFERENCE 2 (bases 1 to 9) >> AUTHORS Kristensen,T. >> TITLE Direct Submission >> JOURNAL Submitted (11-JUN-1991) T. 
Kristensen, Dept of >> Mol Biology, >> University of Aarhus, C F Mollers Alle 130, >> DK-8000 Aarhus C, >> DENMARK >> FEATURES Location/Qualifiers >> source 1..9 >> /organism="Bos taurus" >> /mol_type="mRNA" >> /db_xref="taxon:9913" >> /clone="pBB2I" >> /tissue_type="liver" >> gene <1..>9 >> /gene="beta-2-gpI" >> CDS <1..>9 >> /gene="beta-2-gpI" >> /codon_start=1 >> /product="beta-2-glycoprotein I" >> /protein_id="CAA42669.1" >> /db_xref="GI:6" >> /db_xref="GOA:P17690" >> /db_xref="UniProtKB/Swiss-Prot:P17690" >> >> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI >> >> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT >> >> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN >> >> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN >> >> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER >> >> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT >> DASDVKPC" >> sig_peptide <1..>9 >> /gene="beta-2-gpI" >> ORIGIN >> 1 ccagcgctc >> // >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Mon Jul 2 22:36:19 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 2 Jul 2007 22:36:19 -0400 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net> Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have examples for what you want to do: use Bio::SeqIO; # usually you won't instantiate 
this yourself - a SeqIO object - # you will have one already my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank"); my $builder = $seqin->sequence_builder(); # if you need only sequence, id, and description (e.g. for # conversion to FASTA format): $builder->want_none(); $builder->add_wanted_slot('display_id','desc','seq'); # if you want everything except the sequence and features $builder->want_all(1); # this is the default if it's untouched $builder->add_unwanted_slot('seq','features'); Let us know if that doesn't answer your question. Note that this is currently only implemented for Genbank format. -hilmar On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote: > Kevin, > > Thanks, but I didnt put the question very clearly sorry .. yes, SeqIO > gets entries from file, and from those large parsed entries I can > get a > simplified primary_seq object. But the SeqIO object includes feature > and annotation objects etc that takes time to make, and I wish to know > if there is a way to get a primari_seq object without this overhead. I > apologize if I overlooked it in the docs. > > Niels > > > > >> Start by having a look at the following link: >> http://bioperl.org/cgi-bin/deob_interface.cgi >> >> SeqIO is how one reads or writes sequences to/from files. >> Bio::PrimarySeq is just an object that holds information about a >> sequence obtained from a file. 
>> >> As for how to parse a Genbank file into a list of features: >> >> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb"); >> while (my $seq = $file->next_seq()) >> { >> @features = $seq->all_SeqFeatures; >> # sort features by their primary tags >> for my $f (@features) >> { >> my $tag = $f->primary_tag; >> if ($tag eq 'CDS') >> { >> # @sorted_features holds all the Bio::PrimarySeq >> features obtained from the genbank file >> push @sorted_features, $f; >> } >> } >> } >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Niels Larsen >>> Sent: Monday, July 02, 2007 1:45 PM >>> Cc: bioperl-l List >>> Subject: [Bioperl-l] simple PrimarySeq question >>> >>> I write hoping someone could show me how to create a >>> PrimarySeq object without parsing features and all first. The >>> lines below return >>> >>> "Can't locate object method "next_seq" via package >>> "Bio::PrimarySeq" at ./tst2 line 16." >>> >>> whereas calling Bio::SeqIO-> gives no error, but a too big object. >>> The GenBank record after the __END__ is the "1.gb" file. I >>> could not find out how from the tutorial or the >>> Bio::PrimarySeq description. >>> >>> Niels L >>> >>> >>> #!/usr/bin/env perl >>> >>> use strict; >>> use warnings FATAL => qw ( all ); >>> >>> use Data::Dumper; >>> >>> use Bio::Seq; >>> use Bio::SeqIO; >>> >>> my ( $seq_h, $seq ); >>> >>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => >>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", >>> -format => 'genbank' ); >>> >>> $seq = $seq_h->next_seq(); >>> >>> # print Dumper( $seq ); >>> >>> __END__ >>> >>> LOCUS X60065 9 bp mRNA linear >>> MAM 14-NOV-2006 >>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. >>> ACCESSION X60065 REGION: 1..9 >>> VERSION X60065.1 GI:5 >>> KEYWORDS beta-2 glycoprotein I. 
>>> SOURCE Bos taurus (cattle) >>> ORGANISM Bos taurus >>> Eukaryota; Metazoa; Chordata; Craniata; >>> Vertebrata; Euteleostomi; >>> Mammalia; Eutheria; Laurasiatheria; >>> Cetartiodactyla; Ruminantia; >>> Pecora; Bovidae; Bovinae; Bos. >>> REFERENCE 1 >>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., >>> Sottrup-Jensen,L. and >>> Kristensen,T. >>> TITLE Complete primary structure of bovine beta >>> 2-glycoprotein I: >>> localization of the disulfide bridges >>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992) >>> PUBMED 1567819 >>> REFERENCE 2 (bases 1 to 9) >>> AUTHORS Kristensen,T. >>> TITLE Direct Submission >>> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of >>> Mol Biology, >>> University of Aarhus, C F Mollers Alle 130, >>> DK-8000 Aarhus C, >>> DENMARK >>> FEATURES Location/Qualifiers >>> source 1..9 >>> /organism="Bos taurus" >>> /mol_type="mRNA" >>> /db_xref="taxon:9913" >>> /clone="pBB2I" >>> /tissue_type="liver" >>> gene <1..>9 >>> /gene="beta-2-gpI" >>> CDS <1..>9 >>> /gene="beta-2-gpI" >>> /codon_start=1 >>> /product="beta-2-glycoprotein I" >>> /protein_id="CAA42669.1" >>> /db_xref="GI:6" >>> /db_xref="GOA:P17690" >>> /db_xref="UniProtKB/Swiss-Prot:P17690" >>> >>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI >>> >>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT >>> >>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN >>> >>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN >>> >>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER >>> >>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT >>> DASDVKPC" >>> sig_peptide <1..>9 >>> /gene="beta-2-gpI" >>> ORIGIN >>> 1 ccagcgctc >>> // >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> 
http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From ewijaya at gmail.com Tue Jul 3 02:56:30 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 14:56:30 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>

Dear all,
I was trying to perform a check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly, my script that uses GD.pm doesn't execute.

I have installed the latest version of libgd, version 2.0.35, downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how I can resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward

From lstein at cshl.edu Tue Jul 3 10:41:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 3 Jul 2007 10:41:26 -0400
Subject: [Bioperl-l] Problem with GD.pm version 2.35
In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com>

This happens when there is a mismatch between the compiled (.so) portion
of GD and the perl (.pm) version. Typically it occurs when you have
installed GD incorrectly by, e.g., copying the .pm file into position
rather than using the make file.

Solution: Uninstall old versions of GD by manually finding all
occurrences of GD.so and GD.pm and removing them. Then reinstall the
correct way.

Lincoln

On 7/3/07, Edward Wijaya wrote:
>
> Dear all,
> I was trying to perform a check with this command:
>
> $ perl -MGD -e 'print $GD::VERSION';
>
> And it gave:
>
> GD object version 2.32 does not match $GD::VERSION 2.35 at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
> Compilation failed in require.
> BEGIN failed--compilation aborted.
>
> Similarly, my script that uses GD.pm doesn't execute.
>
>
> I have installed the latest version of libgd, version 2.0.35, downloaded
> from
> http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
>
> Can anybody suggest how I can resolve my problem?
>
> This is my Perl version:
> This is perl, v5.8.8 built for i386-linux-thread-multi
>
> --
> Edward
>

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Jul 4 01:45:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 00:45:16 -0500 Subject: [Bioperl-l] genbank2gff3 - Name attribute? Message-ID: I noticed that genbank2gff3.pl doesn't have an explicitly defined way of converting the gene/locus/etc name to a Name tag (for, say, GBrowse). Any particular reason? Should I stick with GFF2 for now? chris From bix at sendu.me.uk Wed Jul 4 06:00:31 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 04 Jul 2007 11:00:31 +0100 Subject: [Bioperl-l] Splitting Bioperl Message-ID: <468B6FBF.1070708@sendu.me.uk> To summarise some previous threads: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409 # Bioperl is currently one monolithic distribution of ~900 modules # There is some desire to split it up into smaller functional groups # There are some problems with that proposal # An extreme variant of that proposal is to make the groups individual modules Following this discussion: http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html (especially Adam Kennedy's postings of 4/07, soon to appear in that archive), the extreme variant doesn't seem like a good idea. I'm now suggesting that Steve's original split idea, as modified/expanded by Adam's driver and other ideas, is the best choice. The problems I previously identified can be solved in the same way they were solved in my extreme variant: the splits are done by Build.PL automation working on a single repository/code-base, not by splitting things up at the repository level. 
As I see it, the way forward now is for someone interested enough to decide on the specifics of how things will be split and offer them up to the group for discussion. I don't mean vague possibilities of what might work as a split, but rather some real thought should go into it to make sure the split makes sense and will actually work in practice. Following that, the splits can be implemented by some automated dist action of Build.PL. If there isn't sufficient interest to make this happen, I don't see that as a terrible thing. There are benefits to keeping Bioperl monolithic, and some of the problems (eg. lack of updates) can be solved without changing its nature. From cjfields at uiuc.edu Wed Jul 4 10:53:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 09:53:45 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <468B6FBF.1070708@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote: > To summarise some previous threads: > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/ > focus=15409 > > # Bioperl is currently one monolithic distribution of ~900 modules > # There is some desire to split it up into smaller functional groups > # There are some problems with that proposal > # An extreme variant of that proposal is to make the groups individual > modules > > > Following this discussion: > http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html > (especially Adam Kennedy's postings of 4/07, soon to appear in that > archive), the extreme variant doesn't seem like a good idea. brian d foy made some sound arguments against it as well. > I'm now suggesting that Steve's original split idea, as > modified/expanded by Adam's driver and other ideas, is the best > choice. 
> The problems I previously identified can be solved in the same way > they > were solved in my extreme variant: the splits are done by Build.PL > automation working on a single repository/code-base, not by splitting > things up at the repository level. > > As I see it, the way forward now is for someone interested enough to > decide on the specifics of how things will be split and offer them > up to > the group for discussion. I don't mean vague possibilities of what > might > work as a split, but rather some real thought should go into it to > make > sure the split makes sense and will actually work in practice. We've already identified a few (SearchIO, Tools, GBrowse-related, etc). ... > If there isn't sufficient interest to make this happen, I don't see > that > as a terrible thing. There are benefits to keeping Bioperl monolithic, > and some of the problems (eg. lack of updates) can be solved without > changing its nature. If so, proposals that solve this problem need to be made as well. If we stay monolithic, then here's mine: we start having fixed, regularly timed dev releases like Parrot, monthly or bimonthly (quite common on CPAN), with brief release reports on which bugs have been fixed, code has been added, so on. Not every bug has to be fixed per dev release; if that were true there would never be releases for some of the XML parser packages. No RCs for dev releases (it's a dev release!). These would be 1.x.y. We can then, every once in a while, have a bug-squashing session, hackathon, etc, and have regular non-dev release (1.x) that all core devs accept and that passes a particular milestone. As for the advantage of a split approach, as mentioned previously it is to focus modules/tests/scripts into groups with related functions. 
Even just splitting off ones with external reqs (XML parsers, GD, etc)
into an 'aux' release would be an advantage, as it doesn't confront a new
user with the burden of installing a large list of dependencies, some of
which may be complicated for a perl newbie to either install from scratch
(DBD::mysql, GD) or to get the latest bug-fixed prereq release for their
OS (the recent debacle with XML::SAX::Expat issues comes to mind, which
wasn't immediately available for win32 as a PPM).

I'm fairly open to any approach as long as it's reasonably thought out,
though I am admittedly a bit biased towards the split approach. I do
think some change is in order; I worry about there ever being a 1.6
release at this point.

chris

From davila at ioc.fiocruz.br Wed Jul 4 13:11:20 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Wed, 04 Jul 2007 14:11:20 -0300
Subject: [Bioperl-l] ESTs in EST format
Message-ID: <468BD4B8.5050105@ioc.fiocruz.br>

Dear All,

I am trying to get all ESTs from a given species (eg: Trypanosoma
brucei) from Genbank in EST format (eg:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)...
while using Entrez I can "display" individual EST entries in EST format,
this "EST format" is not an option in the main "display" menu for batch
download ...

I don't see the EST format listed
(http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO
deals with, so I wonder whether there is another BioPerl module that does
this? Any tips would be greatly appreciated ;-)

Kindest regards, Alberto

From jason at bioperl.org Wed Jul 4 13:52:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 10:52:59 -0700
Subject: [Bioperl-l] ESTs in EST format
In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br>
References: <468BD4B8.5050105@ioc.fiocruz.br>
Message-ID:

Currently we don't support this format. As far as I know it isn't a
published standard, nor is it a format in which NCBI distributes this
data as flat files (i.e. genbank dumps).
Is there any reason why you can't get what you need from the GenBank format? http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb -jason On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote: > Dear All, > > I am trying to get all ESTs from a given species (eg: Trypanosoma > brucei) from Genbank in EST format (eg: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? > db=nucest&id=10280980)... > while using Entrez I can "display" individual EST entries in EST > format, > this "EST format" is not an option in the main "display" menu for > batch > download ... > > I dont see the EST format listed > (http://www.bioperl.org/wiki/Sequence_formats) among the ones that > SeqIO > deal with, so wonder there would another BioPerl module to do > this ? any > tips, would be greatly appreciated ;-) > > Kindest regards, Alberto > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From dmessina at wustl.edu Wed Jul 4 14:37:22 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 4 Jul 2007 13:37:22 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > we start having fixed, > regularly timed dev releases like Parrot, monthly or bimonthly (quite > common on CPAN), with brief release reports on which bugs have been > fixed, code has been added, so on. Not every bug has to be fixed per > dev release; if that were true there would never be releases for some > of the XML parser packages. No RCs for dev releases (it's a dev > release!). These would be 1.x.y. 
We can then, every once in a > while, have a bug-squashing session, hackathon, etc, and have regular > non-dev release (1.x) that all core devs accept and that passes a > particular milestone. Regardless of whether we split or don't, I think these ideas of adding a little more structure to BioPerl's development cycles -- especially having bug-squashing and hacking sessions, where we all band together and commit some time to cranking through a bunch of to- dos -- would be beneficial, particularly as a means to keeping a certain basal level of momentum in BioPerl. Dave From jason at bioperl.org Wed Jul 4 15:45:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 12:45:29 -0700 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I definitely agree - we can live up to the unstable "living on the edge" nature of dev releases a bit more perhaps? On Jul 4, 2007, at 11:37 AM, David Messina wrote: > > On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > >> we start having fixed, >> regularly timed dev releases like Parrot, monthly or bimonthly (quite >> common on CPAN), with brief release reports on which bugs have been >> fixed, code has been added, so on. Not every bug has to be fixed per >> dev release; if that were true there would never be releases for some >> of the XML parser packages. No RCs for dev releases (it's a dev >> release!). These would be 1.x.y. We can then, every once in a >> while, have a bug-squashing session, hackathon, etc, and have regular >> non-dev release (1.x) that all core devs accept and that passes a >> particular milestone. 
> > > Regardless of whether we split or don't, I think these ideas of > adding a little more structure to BioPerl's development cycles -- > especially having bug-squashing and hacking sessions, where we all > band together and commit some time to cranking through a bunch of to- > dos -- would be beneficial, particularly as a means to keeping a > certain basal level of momentum in BioPerl. > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Wed Jul 4 16:54:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 15:54:14 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I think what's partially responsible for slowing down releases is the expectation that each dev release is supposed to have all bugs fixed, work for every OS, etc. In other words, act like a stable release. A developer release by nature is living on the edge, so why not have regular dev releases? We keep telling users to update to using bioperl-live whenever something breaks, anyway. We could decide to split stuff off along the way into more 'stable' sections if there were more demand for it, and have the more API-volatile code (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the 'dev' tag until we feel it's ready for prime time. chris On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > I definitely agree - we can live up to the unstable "living on the > edge" nature of dev releases a bit more perhaps? 
> > > On Jul 4, 2007, at 11:37 AM, David Messina wrote: > >> >> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: >> >>> we start having fixed, >>> regularly timed dev releases like Parrot, monthly or bimonthly >>> (quite >>> common on CPAN), with brief release reports on which bugs have been >>> fixed, code has been added, so on. Not every bug has to be fixed >>> per >>> dev release; if that were true there would never be releases for >>> some >>> of the XML parser packages. No RCs for dev releases (it's a dev >>> release!). These would be 1.x.y. We can then, every once in a >>> while, have a bug-squashing session, hackathon, etc, and have >>> regular >>> non-dev release (1.x) that all core devs accept and that passes a >>> particular milestone. >> >> >> Regardless of whether we split or don't, I think these ideas of >> adding a little more structure to BioPerl's development cycles -- >> especially having bug-squashing and hacking sessions, where we all >> band together and commit some time to cranking through a bunch of to- >> dos -- would be beneficial, particularly as a means to keeping a >> certain basal level of momentum in BioPerl. >> >> Dave >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Thu Jul 5 04:09:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh)
Date: Thu, 05 Jul 2007 09:09:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID: <468CA721.4020804@sheffield.ac.uk>

Chris Fields wrote:
> I think what's partially responsible for slowing down releases is the
> expectation that each dev release is supposed to have all bugs fixed,
> work for every OS, etc. In other words, act like a stable release.
>
> A developer release by nature is living on the edge, so why not have
> regular dev releases? We keep telling users to update to using
> bioperl-live whenever something breaks, anyway. We could decide to
> split stuff off along the way into more 'stable' sections if there
> were more demand for it, and have the more API-volatile code
> (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the
> 'dev' tag until we feel it's ready for prime time.
>
> chris
>
> On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:
>
> -- snip --

I agree, although would the dev releases still need to pass all the
tests? I'm thinking of people installing via CPAN.

I also agree with what was said in a previous post about bringing
bioperl-run (and some others) back into the same repository as
bioperl-core (after a successful move over to svn) and having Build.PL
deal with creating the packages etc for CPAN. This would hopefully help
keep the run package (and others) up to speed with the core package.

I also agree with previous posts about organising and/or having some
naming convention for test data files. I suggest organising data files
into directory trees (1 - 3 deep) with names that allude to the type of
data in that subtree/file rather than to the tests that use it.
For example:

t/data
|__ formats
|   |__ seq
|   |   |__ legal_fasta
|   |   |   |__ extension.fas
|   |   |   |__ extension.fasta
|   |   |   |__ extension.foo
|   |   |   |__ extension.bar
|   |   |   |__ no_extension
|   |   |   |__ interleaved.fas
|   |   |   |__ non_interleaved.fas
|   |   |   |__ single_seq.fas
|   |   |   |__ multiple_seq.fas
|   |   |   |__ desc_line1.fas
|   |   |   |__ desc_line2.fas
|   |   |
|   |   |__ illegal_fasta
|   |   |   |__ illegal_chars.fas
|   |   |   |__ some_other_illegal_alternative.fas
|   |   |
|   |   |__ legal_genbank
|   |   |   |__ etc etc
|   |   |
|   |   |__ illegal_genbank
|   |       |__ etc etc
|   |
|   |__ aln
|   |__ blast
|   |   |__ legal_blastx
|   |   |
|   |   |__ legal_blastp
|   |   |
|   |   |__ legal_tblastx
|   |   |
|   |   |__ legal_blastpsi
|   |   |
|   |   |__ legal_wublast
|   |__ foo
|   |__ bar
|   |__ misc
|   |__ etc

This type of setup might lend itself to having a test script simply try
to parse all the files in a directory, to ensure that nothing fails for
legal file formats and that parsing does fail for illegal formats. Naming
of the file paths would help test authors to identify a suitable data
file for their own tests before adding their own to the t/data dir. It
might also help to identify areas where example test data is currently
lacking.

Thinking about this a little more, I think it would be a good idea to
include Test::Exception in t/lib. We should also be testing that warnings
and exceptions are generated when expected - e.g. illegal characters in
seq files etc etc. Without these sorts of tests we are only getting half
the story. This testing might account for a large chunk of the poor test
coverage, particularly when it comes to branches in the code.

Anyway, this type of reorganisation couldn't take place until the svn
repo is up and working.

I'd appreciate any comments on the above!
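
A minimal sketch of the "parse every file in a directory" idea above, using only core Perl modules. The helper names (`files_under`, `unexpected_outcomes`) and the toy FASTA check are assumptions made for illustration; in a real test the parser callback would presumably wrap Bio::SeqIO, and the directory names follow the proposed (not yet existing) t/data layout.

```perl
use strict;
use warnings;
use File::Find;

# Collect all regular files whose containing directory matches a pattern.
sub files_under {
    my ($root, $dir_pattern) = @_;
    my @files;
    find(sub {
        push @files, $File::Find::name
            if -f $_ && $File::Find::dir =~ $dir_pattern;
    }, $root);
    return sort @files;
}

# Run a parser over each file and return the files whose outcome
# disagrees with what the directory name promises.
sub unexpected_outcomes {
    my ($parser, $should_parse, @files) = @_;
    my @bad;
    for my $file (@files) {
        my $parsed = eval { $parser->($file); 1 } ? 1 : 0;
        push @bad, $file if $parsed != ($should_parse ? 1 : 0);
    }
    return @bad;
}

# Stand-in parser (hypothetical): dies unless the first line starts with '>'.
my $toy_fasta_parser = sub {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $first = <$fh>;
    die "not FASTA: $file\n" unless defined $first && $first =~ /^>/;
    return 1;
};

# Only runs if the proposed layout is actually present.
my $root = 't/data/formats/seq';
if (-d $root) {
    my @bad = (
        unexpected_outcomes($toy_fasta_parser, 1,
            files_under($root, qr{/legal_fasta$})),
        unexpected_outcomes($toy_fasta_parser, 0,
            files_under($root, qr{/illegal_fasta$})),
    );
    print "unexpected result for $_\n" for @bad;
}
```

Keying the expectation off the directory name means a new data file is covered as soon as it is dropped into the right subtree, with no change to the test script.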
Nath From bix at sendu.me.uk Thu Jul 5 04:55:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 09:55:25 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <468CB1FD.7060301@sendu.me.uk> Nathan S. Haigh wrote: > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Yes, they'd all have to pass. 'Developer release' should never have the connotation of 'broken release'. However, getting all tests to pass is a lot easier than fixing all bugs in bugzilla. (... which actually goes to show how poor our tests are) Worst case, if we were forced to stick to a schedule but couldn't fix a failing test, we could always make it a 'todo' test. > I also agree with what was said in a previous post about bringing back > bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) Agree (with myself essentially). > I also agree with previous posts about organising and/or having some > naming convention for test data files. I think an approach whereby data > files were organised into directory trees (1 - 3 deep) with names that > elude to the type of data in that subtree/file rather than the tests > that use it etc. For example: > > t/data > |__ formats > | |__ seq > | | |__ legal_fasta > | | | |__ extension.fas [snip] At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them. > This type of setup, might lend itself to having a test script simply try > to parse all the files in a directory to ensure nothing fails (for legal > file formats) and fails for illegal formats. Great idea. 
> Thinking about this a little more, I think it would be a good idea to > include Test::Exception in t/lib. Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > Anyway, this type of reorganisation couldn't take place until the svn > repo is up and working. Agree. From bix at sendu.me.uk Thu Jul 5 05:39:10 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 10:39:10 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <468CBC3E.1020408@sendu.me.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Thinking about this a little more, I think it would be a good idea to >> include Test::Exception in t/lib. > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. I've now done that: BioperlTest loads Test::Exception, from the copy in t/lib if necessary. So, in BioperlTest-using scripts you now have access to the methods dies_ok, lives_ok, throws_ok and lives_and. From N.Haigh at sheffield.ac.uk Thu Jul 5 06:01:04 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 5 Jul 2007 11:01:04 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk> Quoting Sendu Bala : -- snip -- > > > > I also agree with previous posts about organising and/or having some > > naming convention for test data files. 
I think an approach whereby data > > files were organised into directory trees (1 - 3 deep) with names that > > elude to the type of data in that subtree/file rather than the tests > > that use it etc. For example: > > > > t/data > > |__ formats > > | |__ seq > > | | |__ legal_fasta > > | | | |__ extension.fas > [snip] > > At that level, files don't need extensions and can have fully > informative names that explain what's interesting or special about them. > You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format. -- snip -- From bix at sendu.me.uk Thu Jul 5 06:04:16 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:04:16 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> Message-ID: <468CC220.804@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Sendu Bala : > > -- snip -- >> >>> I also agree with previous posts about organising and/or having >>> some naming convention for test data files. I think an approach >>> whereby data files were organised into directory trees (1 - 3 >>> deep) with names that elude to the type of data in that >>> subtree/file rather than the tests that use it etc. 
For example: >>> >>> t/data |__ formats | |__ seq | | |__ >>> legal_fasta | | | |__ extension.fas >>> >> [snip] >> >> At that level, files don't need extensions and can have fully >> informative names that explain what's interesting or special about >> them. >> > > You may be correct in most cases, however, isn't there a method for > detecting the file format from the file extension and failing that it > peeks inside the file? Therefore there should be a file extension for > each of these to get good code coverage as well as each format not > having an extension to check that the peek inside the file correctly > determines the format. Yes, you're quite correct. From bix at sendu.me.uk Thu Jul 5 06:47:12 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:47:12 +0100 Subject: [Bioperl-l] Warnings Message-ID: <468CCC30.90406@sendu.me.uk> I'm trying to get Test::Warn to work with Bioperl warnings as produced by Bio::Root::RootI::warn(). However, afaict the warnings must be generated with CORE::warn(), not print STDERR. Is there any particular reason RootI::warn is done with print and not CORE::warn ? Can I change it to a warn? From bix at sendu.me.uk Thu Jul 5 09:04:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:04:50 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> Message-ID: <468CEC72.4090909@sendu.me.uk> Heikki Lehvaslaiho wrote: > My guess is that using 'print STDERR' avoids showing sometimes annoying > errordescription at programname line NN > syntax being used. Afaik, CORE::warn "anything\n"; never includes the line number: messages with a new line always disable that feature. Bio::Root::RootI::warn /always/ puts new lines into the message, so they /never/ have the line number. 
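
A small sketch of the newline behaviour described above, using plain `warn` rather than Bio::Root::RootI::warn. The helper name `captured_warning` is made up for this illustration; it records what `warn` would emit instead of printing it.

```perl
use strict;
use warnings;

# Return the text that warn() produces for a given message.
sub captured_warning {
    my ($message) = @_;
    my $seen = '';
    local $SIG{__WARN__} = sub { $seen .= $_[0] };
    warn $message;
    return $seen;
}

# A trailing newline suppresses the " at FILE line N." suffix.
my $with_newline    = captured_warning("something odd happened\n");
my $without_newline = captured_warning("something odd happened");

print "with newline:    $with_newline";
print "without newline: $without_newline";
```

Only the second message carries the file name and line number, which is why a message that always ends in a newline never shows them.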
> On the other hand, the main reason we need to set verbosity to 1 in BioPerl
> objects is to find where warnings are coming from. Maybe extra text in
> warnings leads to easier debugging.
>
> I favour changing it.

So it's my understanding that there will be absolutely no difference in
behaviour following this change (except that warnings can be caught by
Test::Warn). I just wanted to confirm my understanding.

From hlapp at gmx.net Thu Jul 5 09:07:27 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Jul 2007 09:07:27 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk>
Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>

On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I think what's partially responsible for slowing down releases is the
>> expectation that each dev release is supposed to have all bugs fixed,
>> work for every OS, etc. In other words, act like a stable release.
>>

It doesn't. A stable release has a stable API that will be supported
until the next stable release through point releases.

>> A developer release by nature is living on the edge, so why not have
>> regular dev releases?

There's no problem with regular dev releases, but tests will need to
pass. There was never a stipulation that all bugs need to have been
fixed. But all tests need to pass, so in an ideal world (in which
everything is being tested) all tests passing would imply all (known)
bugs fixed. Obviously, we don't live in an ideal world ...

If not everything passes, then what is the big difference from a code
snapshot? If using cvs (or svn) is too difficult for most people, we can
consider creating a mechanism that puts up nightly snapshots for
download.
> -- snip -- > > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. For example, that's another point. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From heikki at sanbi.ac.za Thu Jul 5 09:12:37 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 15:12:37 +0200 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <200707051512.38185.heikki@sanbi.ac.za> One more suggestion: It would be extremely useful if we had a standard way of testing that when a file is read into a bioperl object and then written out again into the same format, the input and output files are identical. If not, the test should show where the differences start (showing all the differences would just clutter the screen). This standard method/subroutine should be used to test all sequence and other text file IO. Any takers? -Heikki On Thursday 05 July 2007 11:39:10 Sendu Bala wrote: > Sendu Bala wrote: > > Nathan S. Haigh wrote: > >> Thinking about this a little more, I think it would be a good idea to > >> include Test::Exception in t/lib. > > > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and.
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Thu Jul 5 08:58:59 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 14:58:59 +0200 Subject: [Bioperl-l] Warnings In-Reply-To: <468CCC30.90406@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> Message-ID: <200707051458.59921.heikki@sanbi.ac.za> My guess is that using 'print STDERR' avoids showing sometimes annoying errordescription at programname line NN syntax being used. On the other hand, the main reason we need to set verbosity to 1 in BioPerl objects is to find where warnings are coming from. Maybe extra text in warnings leads to easier debugging. I favour changing it. -Heikki On Thursday 05 July 2007 12:47:12 Sendu Bala wrote: > I'm trying to get Test::Warn to work with Bioperl warnings as produced > by Bio::Root::RootI::warn(). However, afaict the warnings must be > generated with CORE::warn(), not print STDERR. > > Is there any particular reason RootI::warn is done with print and not > CORE::warn ? Can I change it to a warn? 
From bix at sendu.me.uk Thu Jul 5 09:44:08 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:44:08 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <468CF5A8.7040402@sendu.me.uk> Heikki Lehvaslaiho wrote: > One more suggestion: > > It would be extremely useful if we had a standard way of testing that > when a file is read into a bioperl object and then written out > again into the same format, the input and output files are identical. As Hilmar has pointed out in the past, Bioperl doesn't aim for the files to be identical, only for none of the information to be lost and for it to be output in the correct format. So a round-trip test should read in the original, store all the parsed data, write it out, then read in the written version and see if the new parsed data matches the original. For simpler or ultra-strict file formats, though... > If not, the test should show where the differences start (showing > all the differences would just clutter the screen). > > This standard method/subroutine should be used to test all sequence > and other text file IO. > > Any takers? There's already something along these lines in t/SeqIO.t (the section that uses Algorithm::Diff).
I copied that over from the old testformats.pl script but haven't really taken the time to see if it's a good way of doing the test. Is it? Can someone come up with something better? Can someone generalise it if necessary? I imagine you could just read the files into arrays and use Test::More::is_deeply(). If that would be satisfactory I could easily add a little method to BioperlTest that did that. From n.haigh at sheffield.ac.uk Thu Jul 5 09:47:24 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 05 Jul 2007 14:47:24 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <468CF66C.2070907@sheffield.ac.uk> Heikki Lehvaslaiho wrote: > One more suggestion: > > It would be extremely useful if we had a standard way of testing that when a > file is read into a bioperl object and then written out again into the same > format, the input and output files are identical. If not, the test should > show where the differences start (showing all the differences would just > clutter the screen). > > This standard method/subroutine should be used to test all sequence and other > text file IO. > > Any takers? > > -Heikki > Wouldn't this require info about the formatting of the file to be stored in the object as well, such that the same formatting could be used when writing the file? Wouldn't a better approach be to read the contents of file1 into obj1, write obj1 to file2 in the same format, and then read file2 into obj2 and compare obj1 to obj2 to ensure we have all the same data?
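The read, write, re-read and compare approach can be sketched roughly as follows. This is only an illustration under stated assumptions: `parse_fasta` and `write_fasta` are hypothetical stand-ins written for this example (a real test would presumably go through the Bioperl IO classes instead), and the data is an in-memory string rather than a file under t/data:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in parser: returns one hash per FASTA record.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Hypothetical stand-in writer: puts each sequence on a single line.
sub write_fasta {
    my ($records) = @_;
    my $out = '';
    for my $rec (@$records) {
        my $header = $rec->{desc} ne '' ? "$rec->{id} $rec->{desc}" : $rec->{id};
        $out .= ">$header\n$rec->{seq}\n";
    }
    return $out;
}

my $original = ">seq1 first test\nACGT\nACGT\n>seq2\nTTGA\n";

my $obj1    = parse_fasta($original);   # "file1" into obj1
my $written = write_fasta($obj1);       # obj1 out to "file2"
my $obj2    = parse_fasta($written);    # "file2" into obj2

isnt $written, $original, 'layout differs after the round trip';
is_deeply $obj2, $obj1, 'parsed data survives the round trip';

done_testing();
```

Comparing the parsed structures rather than the raw text means a change in line wrapping does not count as a failure, while lost or altered data still does.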
Nath From cjfields at uiuc.edu Thu Jul 5 09:52:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 08:52:12 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote: > ... > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Remains to be decided. All current tests (net and non-net) should pass. Any bug fixes should come with added tests if possible, with in-process stuff as TODOs. Network tests are left up to user discretion, so if they fail for any particular reason there is a way around them. > I also agree with what was said in a previous post about bringing > back bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) and have > Build.PL deal with creating the packages etc for CPAN. This would > hopefully help keep the run package (and others) up to speed with > the core package. It depends on how we want to have everything split. I don't think it's immediately pressing (there are more important priorities, i.e. bugs, svn) but I would say folding everything back into live and 'splitting' them out using an automated Build process is a viable option. > I also agree with previous posts about organising and/or having > some naming convention for test data files. I think an approach > whereby data files were organised into directory trees (1 - 3 deep) > with names that allude to the type of data in that subtree/file > rather than the tests that use it etc.
For example:
>
> t/data
> |__ formats
> |   |__ seq
> |   |   |__ legal_fasta
> |   |   |   |__ extension.fas
> |   |   |   |__ extension.fasta
> |   |   |   |__ extension.foo
> |   |   |   |__ extension.bar
> |   |   |   |__ no_extension
> |   |   |   |__ interleaved.fas
> |   |   |   |__ non_interleaved.fas
> |   |   |   |__ single_seq.fas
> |   |   |   |__ multiple_seq.fas
> |   |   |   |__ desc_line1.fas
> |   |   |   |__ desc_line2.fas
> |   |   |
> |   |   |__ illegal_fasta
> |   |   |   |__ illegal_chars.fas
> |   |   |   |__ some_other_illegal_alternative.fas
> |   |   |
> |   |   |__ legal_genbank
> |   |   |   |__ etc etc
> |   |   |
> |   |   |__ illegal_genank
> |   |       |__ etc etc
> |   |
> |   |__ aln
> |   |__ blast
> |   |   |__ legal_blastx
> |   |   |
> |   |   |__ legal_blastp
> |   |   |
> |   |   |__ legal_tblastx
> |   |   |
> |   |   |__ legal_plastpsi
> |   |   |
> |   |   |__ legal_wublast
> |   |__ foo
> |   |__ bar
> |   |__ misc
> |
> |__ etc
>
> This type of setup, might lend itself to having a test script
> simply try to parse all the files in a directory to ensure nothing
> fails (for legal file formats) and fails for illegal formats.
> Naming of the file paths would help test authors to identify a
> suitable data file for their own tests before adding their own to
> the t/data dir. It might also help to identify areas where example
> test data is currently lacking.
...

This seems like more of a 'guess sequence' and format validation issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence parsing should be separate issues and therefore in separate classes (with parsing optionally preceded by validation), but that's something for another discussion.

> Thinking about this a little more, I think it would be a good idea
> to include Test::Exception in t/lib. We should also be testing that
> warnings and exceptions are generated when expected - e.g. illegal
> characters in seq files etc etc. Without these sorts of tests we
> are only getting half the story.
This testing might account for a > large chunk of the poor test coverage, particularly when it comes > to branches in the code. > > Anyway, this type of reorganisation couldn't take place until the > svn repo is up and working. > > I'd appreciate any comments on the above! > Nath chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:08:29 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 05 Jul 2007 15:08:29 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CF5A8.7040402@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> Message-ID: <468CFB5D.6080406@sheffield.ac.uk> Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests. It would be desirable to be able to install all these modules in order to complete they whole test suite - any ideas if/how this can be done? Cheers Nath From bix at sendu.me.uk Thu Jul 5 10:15:34 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 15:15:34 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> Message-ID: <468CFD06.3080604@sendu.me.uk> Nathan S. Haigh wrote: > Is there a way to install all the modules that are used in the tests? I > mean there are cases where tests are skipped and pass if the required > module for testing is not installed. Therefore, missing out a chunk of > the tests. 
It would be desirable to be able to install all these modules > in order to complete the whole test suite - any ideas if/how this can > be done? Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you. Everything should be in Build.PL already. If I missed something, please add it. From cjfields at uiuc.edu Thu Jul 5 10:18:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:08 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> Message-ID: On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote: > Is there a way to install all the modules that are used in the > tests? I > mean there are cases where tests are skipped and pass if the required > module for testing is not installed. Therefore, missing out a chunk of > the tests. It would be desirable to be able to install all these > modules > in order to complete the whole test suite - any ideas if/how this can > be done? > > Cheers > Nath That's optionally done upon 'perl Build.PL', correct? So if you choose not to install a particular prereq (e.g. XML::SAX), you shouldn't be forced to install it later just for tests. Or am I misunderstanding you?
chris From cjfields at uiuc.edu Thu Jul 5 10:18:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:23 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CC220.804@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > Nathan S. Haigh wrote: >> Quoting Sendu Bala : >>> ... >>> At that level, files don't need extensions and can have fully >>> informative names that explain what's interesting or special about >>> them. >>> >> >> You may be correct in most cases, however, isn't there a method for >> detecting the file format from the file extension and failing that it >> peeks inside the file? Therefore there should be a file extension for >> each of these to get good code coverage as well as each format not >> having an extension to check that the peek inside the file correctly >> determines the format. > > Yes, you're quite correct. I actually like Sendu's idea more, or the idea of each test suite having it's own directory. Tests which need to guess/validate the format are probably best left sequestered to a specific suite focused on format guessing/ validation, at least in my opinion. chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:22:40 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:22:40 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFD06.3080604@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> Message-ID: <468CFEB0.80201@sheffield.ac.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Is there a way to install all the modules that are used in the tests? >> I mean there are cases where tests are skipped and pass if the >> required module for testing is not installed. Therefore, missing out a >> chunk of the tests. It would be desirable to be able to install all >> these modules in order to complete they whole test suite - any ideas >> if/how this can be done? > > Yes, add them as recommended (or perhaps 'build_requires') modules in > Build.PL, then run Build.PL and install the modules when it asks you. > > Everything should be in Build.PL already. If I missed something, please > add it. > OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there maybe few system with all these modules installed in order to conduct a complete test. These are the modules I'm referring to. Nath From n.haigh at sheffield.ac.uk Thu Jul 5 10:30:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:30:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: <468D006D.6050806@sheffield.ac.uk> Chris Fields wrote: > > On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > >> Nathan S. Haigh wrote: >>> Quoting Sendu Bala : >>>> ... >>>> At that level, files don't need extensions and can have fully >>>> informative names that explain what's interesting or special about >>>> them. >>>> >>> >>> You may be correct in most cases, however, isn't there a method for >>> detecting the file format from the file extension and failing that it >>> peeks inside the file? Therefore there should be a file extension for >>> each of these to get good code coverage as well as each format not >>> having an extension to check that the peek inside the file correctly >>> determines the format. >> >> Yes, you're quite correct. > > I actually like Sendu's idea more, or the idea of each test suite having > it's own directory. > > Tests which need to guess/validate the format are probably best left > sequestered to a specific suite focused on format guessing/validation, > at least in my opinion. > > chris How easily would this lend itself to using the same data for multiple tests, or is it likely to lead to/exacerbate a culture of adding duplicate data files in each "test suite" rather than reusing? 
Nath From cjfields at uiuc.edu Thu Jul 5 10:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:33:46 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote: > On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > >> Chris Fields wrote: >>> I think what's partially responsible for slowing down releases is >>> the >>> expectation that each dev release is supposed to have all bugs >>> fixed, >>> work for every OS, etc. In other words, act like a stable release. > > It doesn't. A stable release has a stable API that will be > supported until the next stable release through point releases. I agree, but I think there is still an expectation that 1.5.2 and beyond are more like true 'stable' releases even though we still designate them as 'developer.' We unfortunately reinforce that when we tell users they need to update to v. 1.5.2 or bioperl-live to fix a particular bug in the 1.4 release. There's nothing we can do about that now (hindsight is always 20/20, and 1.4 is just too old). We (pumpkin, core devs) can try correcting that by ensuring any bug fixes be committed to any new stable branch as well as to live, at least until it becomes too problematic to maintain that particular stable branch (at which point we would go about getting ready for the next 'stable' and repeat the cycle over again). >>> A developer release by nature is living on the edge, so why not have >>> regular dev releases? > > There's no problem with regular dev releases, but tests will need > to pass. There was never a stipulation that all bugs need to have > been fixed. 
But all tests need to pass, so in an ideal world (in > which everything is being tested) all tests passing would imply all > (known) bugs fixed. Obviously, we don't live in an ideal world ... ...particularly when it comes to network-related tests and remote server problems (but those are by default not run, so there is a way around test fails there). I agree here as well (all tests must pass). As for the bug fixes, we can just stipulate which ones were fixed with the release (in a RELEASE_NOTES or similar), and maybe have TODO's in the test suite designating they are being worked on. Basically, at regular intervals, maybe with a few weeks of lead time, the pumpkin would announce an impending dev. release. Go through rounds of tests, bug fixes, etc. When all tests pass post it on CPAN as a dev. release. If we have a stable release branch with relevant bug fixes we can post that as well, again to the point where it becomes too problematic. Would we just take a snapshot of MAIN and any relevant stable branch at that particular point for the CPAN release, just increasing the version number (1.x.y)? Would it make sense to have a 1.x.y branch for each release (I don't think so, but maybe others disagree)? > If not everything passes then what is the big difference to a code > snapshot? If using cvs (or svn) is too difficult for most people, > we can consider creating a mechanism that puts up nightly snapshots > for download. If we feel a nightly snapshot is warranted we could do that though. I personally don't think there is a need, particularly since we have several means to obtain the latest code at any point in time (including the browsable CVS 'Download tarball'). We could state the next dev/stable CPAN release (pending on date dd/mm/yy) will have the bug fix, and if they want it immediately then pick it up from CVS. >> -- snip -- >> >> I agree, although would the dev releases still need to pass all the >> tests? I'm thinking of people installing via CPAN. 
> > For example, that's another point. > > -hilmar Yes, I agree. As an aside, I don't think dev. releases pop up when you run a simple 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know the answer to that. chris From cjfields at uiuc.edu Thu Jul 5 10:34:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:34:22 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > One more suggestion: > > It would be extemaly useful if we had a standard way of testing > that a when a > file is read into a bioperl object and then written out again into > a same > format, the input and output files are identical. If not, the test > should > show where the the differences start (showing all the differences > would just > clutter the screen). > > This standard method/subroutine should be used to test all sequence > and other > text file IO. > > Any takers? > > -Heikki ... I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t that do some checking, I think, but something like this would be of use. However, what if the test file is old (as many in t/data are) and the format has changed? GenBank and EMBL, for instance, have gone through several changes to format. chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:43:51 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:43:51 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <468D03A7.3090408@sheffield.ac.uk> Chris Fields wrote: -- snip -- >>> >>> I agree, although would the dev releases still need to pass all the >>> tests? I'm thinking of people installing via CPAN. >> >> For example, that's another point. >> >> -hilmar > > Yes, I agree. > > As an aside, I don't think dev. releases pop up when you run a simple > 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know > the answer to that. > > chris Thats right, it'll only install the non-developer releases (1.4 currently). If you want to install the developer release from CPAN you need to know the path the archive and then do: cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz as detailed on the wiki: http://www.bioperl.org/wiki/Release_1.5.2 Nath From cjfields at uiuc.edu Thu Jul 5 10:49:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:49:33 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFEB0.80201@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > Sendu Bala wrote: >> ... >> Yes, add them as recommended (or perhaps 'build_requires') modules in >> Build.PL, then run Build.PL and install the modules when it asks you. >> >> Everything should be in Build.PL already. 
If I missed something, >> please >> add it. >> > > OK, to clarify using the test file Sendu mentioned in a previous post: > t/SeqIO.t > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > IO::String > are not installed (the first two are not mentioned in Build.PL). > However, if there are a lot of such skips in the whole test suite then > there maybe few system with all these modules installed in order to > conduct a complete test. These are the modules I'm referring to. > > Nath If they are only necessary for tests, work for all OSs, and are pure Perl they should be added to t/lib, like Test::More and the rest. If they only work for some OSs they could be added to t/lib and skip based on OS, but they still must be pure Perl. I would avoid anything that requires any compiling for XS or Inline altogether (I don't want to go down the nightmare road of OS-dependent compiler issues for a few tests). Finally, if they are needed for core modules (not just tests) then they should be added to the core prereqs in Build. chris From cjfields at uiuc.edu Thu Jul 5 10:52:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:52:58 -0500 Subject: [Bioperl-l] Warnings In-Reply-To: <468CEC72.4090909@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > ... > > So its my understanding there will be absolutely no difference in > behaviour following this change (except that warning can be caught by > Test::Warn). I just wanted to confirm my understanding. You can always just try it out and run tests. Might be interesting to see if anything breaks. chris From N.Haigh at sheffield.ac.uk Thu Jul 5 10:58:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 5 Jul 2007 15:58:30 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > > > > One more suggestion: > > > > It would be extemaly useful if we had a standard way of testing > > that a when a > > file is read into a bioperl object and then written out again into > > a same > > format, the input and output files are identical. If not, the test > > should > > show where the the differences start (showing all the differences > > would just > > clutter the screen). > > > > This standard method/subroutine should be used to test all sequence > > and other > > text file IO. > > > > Any takers? > > > > -Heikki > ... > > I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t > that do some checking, I think, but something like this would be of > use. However, what if the test file is old (as many in t/data are) > and the format has changed? GenBank and EMBL, for instance, have > gone through several changes to format. > > chris > > Is there any way to distinguish variants apart other than just layout? e.g. a version number of the likes? Nath From N.Haigh at sheffield.ac.uk Thu Jul 5 11:04:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 5 Jul 2007 16:04:30 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > > > Sendu Bala wrote: > >> ... > >> Yes, add them as recommended (or perhaps 'build_requires') modules in > >> Build.PL, then run Build.PL and install the modules when it asks you. > >> > >> Everything should be in Build.PL already. If I missed something, > >> please > >> add it. > >> > > > > OK, to clarify using the test file Sendu mentioned in a previous post: > > t/SeqIO.t > > > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > > IO::String > > are not installed (the first two are not mentioned in Build.PL). > > However, if there are a lot of such skips in the whole test suite then > > there maybe few system with all these modules installed in order to > > conduct a complete test. These are the modules I'm referring to. > > > > Nath > > If they are only necessary for tests, work for all OSs, and are pure > Perl they should be added to t/lib, like Test::More and the rest. If > they only work for some OSs they could be added to t/lib and skip > based on OS, but they still must be pure Perl. I would avoid > anything that requires any compiling for XS or Inline altogether (I > don't want to go down the nightmare road of OS-dependent compiler > issues for a few tests). If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!? 
> > Finally, if they are needed for core modules (not just tests) then > they should be added to the core prereqs in Build. > > chris > From bix at sendu.me.uk Thu Jul 5 11:13:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:13:35 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: <468D0A9F.4010709@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Chris Fields : >>> OK, to clarify using the test file Sendu mentioned in a previous >>> post: t/SeqIO.t >>> >>> This test skips tests if Algorithm::Diff, IO::ScalarArray or >>> IO::String are not installed >> >> If they are only necessary for tests, work for all OSs, and are >> pure Perl they should be added to t/lib, like Test::More and the >> rest. If they only work for some OSs they could be added to t/lib >> and skip based on OS, but they still must be pure Perl. I would >> avoid anything that requires any compiling for XS or Inline >> altogether (I don't want to go down the nightmare road of >> OS-dependent compiler issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? That skip in SeqIO.t is new and I simply didn't think of them as important enough to make anyone install them or include them in t/lib. I'd go ahead and add those modules, but like I say, it may make more sense just to use is_deeply(), removing the dependency on Algorithm::Diff and IO::ScalarArray completely. 
From cjfields at uiuc.edu Thu Jul 5 11:35:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:35:41 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote: > ... >> If they are only necessary for tests, work for all OSs, and are pure >> Perl they should be added to t/lib, like Test::More and the rest. If >> they only work for some OSs they could be added to t/lib and skip >> based on OS, but they still must be pure Perl. I would avoid >> anything that requires any compiling for XS or Inline altogether (I >> don't want to go down the nightmare road of OS-dependent compiler >> issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? No, you are correct, but these are currently not in t/lib (unless someone snuck them in....) Of the modules you listed above, only one (IO::String) is required by the core modules. The others are not. Users shouldn't be forced to install Algorithm::Diff or IO::ScalarArray just to run tests, so anything not required should go into t/lib if at all possible. If there any reasons (OS issues, list of prereqs) which preclude adding these to t/lib we need to ask ourselves (1) why we are using that module in the first place? And, if there is a good reason, (2) can we skip them if they aren't present? Both of those options are already available. 
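A sketch of option (2), skipping tests when an optional module is absent, using the standard SKIP block from Test::More. The test names and the comparison inside the block are illustrative only and are not copied from the BioPerl test suite:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# A test that needs nothing beyond core Perl always runs.
ok(1, 'core functionality test');

SKIP: {
    # Try to load the optional module at run time.
    my $have_diff = eval { require Algorithm::Diff; 1 };
    skip 'Algorithm::Diff not installed', 2 unless $have_diff;

    my @old = qw(a b c);
    my @new = qw(a b c);
    # LCS() returns the longest common subsequence as a list here.
    my @lcs = Algorithm::Diff::LCS(\@old, \@new);
    is(scalar @lcs, 3, 'LCS covers all elements');
    is_deeply(\@lcs, \@old, 'LCS equals the input when nothing changed');
}
```

The count passed to skip has to match the number of tests inside the block so that the plan still adds up on systems without the module.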
chris From cjfields at uiuc.edu Thu Jul 5 11:50:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:50:55 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468D006D.6050806@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> <468D006D.6050806@sheffield.ac.uk> Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu> On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote: > ... >> I actually like Sendu's idea more, or the idea of each test suite >> having it's own directory. >> Tests which need to guess/validate the format are probably best >> left sequestered to a specific suite focused on format guessing/ >> validation, at least in my opinion. >> chris > > > How easily would this lend itself to using the same data for > multiple tests, or is it likely to lead to/exacerbate a culture of > adding duplicate data files in each "test suite" rather than reusing? > > Nath If there is a group of test data used for more than one test suite we can group those together into a common use folder, or we can go by format. I'm pretty open to anything, really, as long as it is more organized. My point is really concerned more with validation/guessing. I think we should limit those tests to their respective specific test suites, or even to sections within a particular test suite (for instance, genbank.t), but not to force sequence guessing or validation in other cases. To me validation, guessing, and parsing are three distinct issues (much like XML parsers handle things), so they require three distinct tests. As for true sequence validation, there is no official format validation scheme yet in BioPerl. 
It's sort of unofficially integrated into the sequence parsers themselves (something which I find to be problematic for several reasons too long to outline here).

chris

From cjfields at uiuc.edu Thu Jul 5 11:54:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:54:42 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> <1183647510.468d07168963c@webmail.shef.ac.uk>
Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu>

On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing
>>> that when a file is read into a bioperl object and then written
>>> out again in the same format, the input and output files are
>>> identical. If not, the test should show where the differences
>>> start (showing all the differences would just clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all
>>> sequence and other text file IO.
>>>
>>> Any takers?
>>>
>>> -Heikki
>> ...
>>
>> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use. However, what if the test file is old (as many in t/data are)
>> and the format has changed? GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to distinguish variants apart other than just
> layout? e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue (i.e. does the record fit certain specifications).
There are examples of seq records from different sources which bioperl is expected to parse, for example Ensembl GenBank records. Some of those have feature tags or annotation fields which may not appear in output when using write_seq(). I don't think it's as important to replicate the output data exactly like the input as much as it's important to have the data represented in a Bio::Seq object (or any other Bio* instance) in a consistent manner and have the ability to incorporate new fields (such as the recent addition of genome projects) transparently. The latter is hard to do with the current genbank parser (you have to specifically code for it), but it is a bit easier to do with the driver-handler model I'm working on. chris From bix at sendu.me.uk Thu Jul 5 11:56:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:56:29 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <468D14AD.8050007@sendu.me.uk> Sendu Bala wrote: > Sendu Bala wrote: >> Nathan S. Haigh wrote: >>> Thinking about this a little more, I think it would be a good idea to >>> include Test::Exception in t/lib. >> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. And I've also now added in support for Test::Warn, giving you warning_is, warnings_are, warning_like and warnings_like. 
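A minimal sketch of how these functions read in a test script. It loads Test::Exception and Test::Warn directly rather than through BioperlTest, and parse_length is a hypothetical function invented for the example; the whole script is skipped if either module is missing:

```perl
use strict;
use warnings;
use Test::More;

BEGIN {
    # Import both modules at compile time, or skip the whole script.
    for my $module (qw(Test::Exception Test::Warn)) {
        eval "require $module; $module->import; 1"
            or plan skip_all => "$module not installed";
    }
}

# Hypothetical function under test: dies on bad input, warns on zero.
sub parse_length {
    my ($value) = @_;
    die "not a number: $value\n" unless $value =~ /^\d+$/;
    warn "zero length" if $value == 0;
    return $value + 0;
}

# Test::Exception
lives_ok  { parse_length(12) }    'valid input lives';
dies_ok   { parse_length('abc') } 'bad input dies';
throws_ok { parse_length('abc') } qr/not a number/, 'error message matches';
lives_and { is parse_length(12), 12 } 'returns the parsed value';

# Test::Warn
warning_is   { parse_length(0) } 'zero length',   'exact warning text';
warning_like { parse_length(0) } qr/zero/,        'warning matches pattern';
warnings_are { parse_length(0) } ['zero length'], 'exactly one warning';

done_testing();
```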
I've updated the HOWTO as well: http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests You can see these things in action in t/seq_quality.t From bix at sendu.me.uk Thu Jul 5 11:57:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:57:23 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> Message-ID: <468D14E3.6030104@sendu.me.uk> Chris Fields wrote: > > On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > >> ... >> >> So its my understanding there will be absolutely no difference in >> behaviour following this change (except that warning can be caught by >> Test::Warn). I just wanted to confirm my understanding. > > You can always just try it out and run tests. Might be interesting to > see if anything breaks. I've made the change. Everything seems ok as far as I can tell. From dmessina at wustl.edu Thu Jul 5 12:02:26 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:02:26 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 9:33 AM, Chris Fields wrote: > I agree, but I think there is still an expectation that 1.5.2 and > beyond are more like true 'stable' releases even though we still > designate them as 'developer.' We unfortunately reinforce that when > we tell users they need to update to v. 1.5.2 or bioperl-live to fix > a particular bug in the 1.4 release. 
I know this has been discussed before, but while we're talking about future release plans, it might be worth revisiting the BioPerl policy of designating only even-numbered releases as 'stable'. It's taking so long to get from 1.4 to 1.6. While the principle of keeping a stable API between 'stable' releases is valid in the ideal case, I think that continuing to label 1.5.2 (or whatever the latest 'dev' release is) as a developer release (which implies potentially unstable or bleeding-edge code) is highly misleading since we would never ever tell anyone to get 1.4 instead. Alternatively, if we adopt a more aggressive release schedule as Chris proposed a couple days ago, then perhaps we could agree to push out an even-numbered release once a year or so, so that there is a 'stable' release we could recommend. > If we feel a nightly snapshot is warranted we could do that though. > I personally don't think there is a need, particularly since we have > several means to obtain the latest code at any point in time > (including the browsable CVS 'Download tarball'). We could state the > next dev/stable CPAN release (pending on date dd/mm/yy) will have the > bug fix, and if they want it immediately then pick it up from CVS. To make it easier for people to obtain the latest tarball, we could put the 'download tarball' link directly on the 'Getting_BioPerl' wiki page instead of only a link to the viewcvs interface. That way they wouldn't have to navigate the source tree to figure out which tarball they want (which is almost always going to be the bioperl- live tarball). 
I think the actual URL underlying the 'Download tarball' link on viewcvs is stable: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- live.tar.gz?tarball=1 Dave From cjfields at uiuc.edu Thu Jul 5 12:13:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:13:30 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 11:02 AM, David Messina wrote: > ... > I know this has been discussed before, but while we're talking > about future release plans, it might be worth revisiting the > BioPerl policy of designating only even-numbered releases as > 'stable'. It's taking so long to get from 1.4 to 1.6. While the > principle of keeping a stable API between 'stable' releases is > valid in the ideal case, I think that continuing to label 1.5.2 (or > whatever the latest 'dev' release is) as a developer release (which > implies potentially unstable or bleeding-edge code) is highly > misleading since we would never ever tell anyone to get 1.4 instead. > > Alternatively, if we adopt a more aggressive release schedule as > Chris proposed a couple days ago, then perhaps we could agree to > push out an even-numbered release once a year or so, so that there > is a 'stable' release we could recommend. I think the idea of 'stable' is best summarized back in Hilmar's post (i.e. we support a particular API for that release). The 1.5 releases I believe break some aspects of 1.4 API (some of the Feature/ Annotation stuff introduced before the official 1.5 release). We still need to address some of those issues before a 1.6 which seems to be the only real stumbling block, but they are unfortunately not well-documented and are somewhat interwoven with GMOD code. > ... 
> To make it easier for people to obtain the latest tarball, we could > put the 'download tarball' link directly on the 'Getting_BioPerl' > wiki page instead of only a link to the viewcvs interface. That way > they wouldn't have to navigate the source tree to figure out which > tarball they want (which is almost always going to be the bioperl- > live tarball). > > I think the actual URL underlying the 'Download tarball' link on > viewcvs is stable: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- > live.tar.gz?tarball=1 > > Dave Sounds reasonable enough. Do you want to do the honors? chris From dmessina at wustl.edu Thu Jul 5 12:44:28 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:44:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> > [Chris] > The 1.5 releases I believe break some aspects of 1.4 API Yes, this is true. I question, though, whether it's relevant given that virtually no one uses 1.4 anymore. In any case, I would venture that the number of people who would be bitten by the 1.4->1.5 API change is much smaller than the number of people who download 1.4 and then ask us why it doesn't work. I think that, rather than continuing to call 1.5.x the developer release in order to adhere to the API guarantee, it would be much clearer to users if we state clearly that everyone should download 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API changes. >> [me] >> we could put the 'download tarball' link directly on the >> 'Getting_BioPerl' wiki page > > [Chris] > Sounds reasonable enough. Do you want to do the honors? Done. 
Dave From cjfields at uiuc.edu Thu Jul 5 12:57:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:57:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: On Jul 5, 2007, at 11:44 AM, David Messina wrote: > >> [Chris] >> The 1.5 releases I believe break some aspects of 1.4 API > > Yes, this is true. > > I question, though, whether it's relevant given that virtually no > one uses 1.4 anymore. In any case, I would venture that the number > of people who would be bitten by the 1.4->1.5 API change is much > smaller than the number of people who download 1.4 and then ask us > why it doesn't work. > > I think that, rather than continuing to call 1.5.x the developer > release in order to adhere to the API guarantee, it would be much > clearer to users if we state clearly that everyone should download > 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API > changes. You'd be surprised how many are still using bioperl 1.2.3 (Ensembl) and 1.4 (any admin too scared to go with a 'dev' release). The real answer is to get out a stable 1.6 ASAP. The problem we currently have is (horrible Texas pun) 'too many pokers in the fire.' We have svn migration, major changes in the test suite, talk about splitting bioperl, a lot of bugs to sort through, new code to add or work on, etc. Not to mention our $jobs! I think we should just bite the bullet and proceed with pulling out the controversial operator overloading in Bio::Annotation*, deprecate the tag methods in AnnotatableI, and go about fixing everything up. 
If that occurs (which seems to be the major impediment) and we get GMOD/GBrowse playing well with BioPerl then we can aim for a new stable release, and then institute a regular release cycle.

chris

From bpederse at gmail.com Thu Jul 5 13:58:24 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Thu, 5 Jul 2007 10:58:24 -0700
Subject: [Bioperl-l] slippy map for genomic features.
Message-ID:

hi,
here's a side project i've been tinkering on in googlecode svn that may be useful to some.
http://code.google.com/p/genome-browser/
it's a simple hack on top of OpenLayers (openlayers.org) to provide a javascript slippy map interface and API to view and browse genomic features. It can be used with any image generation program that can accept &xmin= and &xmax= parameters through the url, though i haven't had it working with bioperl, as bioperl generates images of different height depending on the number of tracks.

there's a live example of the code in SVN here:
http://toxic.berkeley.edu/bpederse/genome-browser/
with images generated by a colleague's modules on first request. those images are then cached by a simple perl script included in the SVN repo. all subsequent requests are returned from the cache.
an image request (automatically generated by the javascript) looks like:
http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
but any implementation need only implement xmin and xmax. all other parameters will be used for caching but are not required.

if anyone is interested in getting this going with bioperl image generation--or improving the project in any way, let me know and i'll add you as a committer and provide any javascript support that i can.
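A rough sketch of how a caching script might map a tile request to a cache file, assuming the parameter names from the example URL above. This is not taken from tiler.pl: the function name tile_cache_path, the use of Digest::MD5 and the cache directory layout are all assumptions made for illustration:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a cache file name from the request parameters. xmin and xmax
# are required; every other parameter only contributes to the key.
sub tile_cache_path {
    my ($cache_dir, %param) = @_;
    for my $required (qw(xmin xmax)) {
        die "missing required parameter: $required\n"
            unless defined $param{$required};
    }
    # Sort the keys so the same request always maps to the same file.
    my $key = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$cache_dir/" . md5_hex($key) . '.png';
}

# Hypothetical cache directory; parameter values from the example URL.
my $path = tile_cache_path(
    '/tmp/tiles',
    chr  => 1,      version => 6, layers => 'mRNA', organism => 'arabidopsis',
    xmin => 491520, xmax    => 499712, width => 512,
);
print "$path\n";
```

A script along these lines would serve the file at that path if it exists and otherwise generate the image once and store it there.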
-brent

tar ball download:
http://genome-browser.googlecode.com/files/genome-browser-0.02.tar

From dmessina at wustl.edu Thu Jul 5 14:39:16 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 13:39:16 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID:

> The real answer is to get out a stable 1.6 ASAP. The problem we
> currently have is (horrible Texas pun) 'too many pokers in the
> fire.' We have svn migration, major changes in the test suite,
> talk about splitting bioperl, a lot of bugs to sort through, new
> code to add or work on, etc. Not to mention our $jobs!

Yep, I hear ya.

> I think we should just bite the bullet and proceed with pulling out
> the controversial operator overloading in Bio::Annotation*,
> deprecate the tag methods in AnnotatableI, and go about fixing
> everything up. If that occurs (which seems to be the major
> impediment) and we get GMOD/GBrowse playing well with BioPerl then
> we can aim for a new stable release, and then institute a regular
> release cycle.

That's a great plan. You're right -- better to devote energy to 1.6 than to interim solutions.

Alright, I give, I give! :)

Dave

From glauberwagner at yahoo.com.br Thu Jul 5 15:56:43 2007
From: glauberwagner at yahoo.com.br (Glauber Wagner)
Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART)
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com>

Dear All,

I have a problem with the Bio::DB::Query::GenBank module. I am trying to count the number of protein sequences, and the count method does not return the expected number.

    use strict;
    use warnings;
    use Bio::DB::GenBank;
    use Bio::DB::Query::GenBank;

    # Entrez query: all protein records for this organism
    my $query_string = "Trypanosoma cruzi[Organism]";

    my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                             -query => $query_string);
    my $count = $query->count;   # number of matching records
    my @ids   = $query->ids;     # their identifiers

    print "$count\n";

Thanks.
Glauber

From cjfields at uiuc.edu Thu Jul 5 16:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment; I'm getting 'Internal Server Error'. Try again later.

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem with the Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences, and
> the count method does not return the expected number.
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query_string = "Trypanosoma cruzi[Organism]";
>
> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>                                          -query => $query_string);
> my $count = $query->count;
> my @ids = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mitch_skinner at berkeley.edu Thu Jul 5 17:22:38 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 05 Jul 2007 14:22:38 -0700 Subject: [Bioperl-l] slippy map for genomic features. In-Reply-To: References: Message-ID: <468D611E.7020904@berkeley.edu> Hi, FWIW, we've been working on something similar: http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html based on GBrowse/Bio::Graphics and javascript that Andrew wrote from scratch (with the prototype library). When our project was starting up (fall 05) Andrew looked but didn't find openlayers; I'm not sure if it was public back then but their current svn only goes back to 2006. I think that things like layout (bumping) ought to be done in advance on a chromosome-wide basis; otherwise it's difficult to keep features from ending up at different heights on neighboring tiles. And it would be difficult for the server to know what was being clicked on. So we've been doing some up-front work to either do layout or to just render all the tiles in advance: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup which is driven by this script: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup Or you could just not bump at all, I guess. I think of that as important functionality but I'd be interested in hearing about use cases where it's not necessary. It's not just bumping, though; things like text labels also make it difficult to predict exactly what pixels a feature will span if you only have its genomic coordinates. To make features clickable we've been using imagemaps; it simplifies the server code but it bogs down the client quite a bit. I'd certainly be interested in seeing if there are ways we could work together; if you're at Berkeley maybe we could meet. 
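To illustrate the point about doing layout (bumping) once for the whole chromosome, here is a minimal sketch of a greedy row assignment. It is not the algorithm used by Bio::Graphics or by TileGenerator.pm; the function name assign_rows and the [start, end] feature representation are assumptions made for the example:

```perl
use strict;
use warnings;

# Assign each feature [start, end] to the first row in which it does
# not overlap the feature most recently placed in that row. Returns a
# list of row numbers in the same order as the input features.
sub assign_rows {
    my @features = @_;
    # Process features from left to right, remembering input positions.
    my @order = sort { $features[$a][0] <=> $features[$b][0] } 0 .. $#features;
    my @row_end;   # rightmost end coordinate used so far in each row
    my @row_of;    # row assigned to each input feature
    for my $i (@order) {
        my ($start, $end) = @{ $features[$i] };
        my $row = 0;
        $row++ while $row < @row_end && $row_end[$row] >= $start;
        $row_end[$row] = $end;
        $row_of[$i]    = $row;
    }
    return @row_of;
}

# Four hypothetical features; the second overlaps the first.
my @rows = assign_rows([100, 200], [150, 300], [250, 400], [500, 600]);
print join(',', @rows), "\n";
```

Because the rows are computed once over all features, any tile that draws a given feature can place it at the same height, whichever tile boundary it crosses.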
Regards, Mitch Brent Pedersen wrote: > hi, > here's a side project i've been tinkering on in googlecode svn that > may be useful to some. > http://code.google.com/p/genome-browser/ > it's a simple hack on top of OpenLayers (openlayers.org) to provide a > javascript slippy map interface and API to view and browse genomic > features. It can be used with any image generation program that can > accept &xmin= and &xmax= parameters through the url. -- though i > havent had it working it bioperl as bioperl generates images of > different height depending on the number of tracks. > > there's a live example of the code in SVN here: > http://toxic.berkeley.edu/bpederse/genome-browser/ > with images generated by a colleague's modules on first request. those > images are then cached by a simple perl script included in the SVN > repo. all subsequent requests are returned from the cache. > an image request (automatically generated by the javascript) looks like: > http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 > but any implementation need only implement xmin and xmax. all other > parameters will be used for caching but are not required. > > if anyone is interested in getting this going with bioperl image > generation--or improving the project in any way, let me know and i'll > add you as a committer and provide any javascript support that i can. 
> > -brent > > tar ball download: > http://genome-browser.googlecode.com/files/genome-browser-0.02.tar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jul 5 17:42:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 16:42:40 -0500 Subject: [Bioperl-l] Bio::DB::Query::GenBank failures In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> References: <839755.95349.qm@web36514.mail.mud.yahoo.com> <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu> Update: seems to be back up. Give it a try now. chris On Jul 5, 2007, at 3:21 PM, Chris Fields wrote: > NCBI esearch doesn't seem to be working at the moment. I'm getting > 'Internal Server Error' at this time. Try back again at a later > point. > > chris > > On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote: > >> Dear All, >> >> I have a problem if Bio::DB::Query::GenBank module. I >> am trying to count the number of protein sequences and >> the module did not return the expected number by count >> object. >> >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> >> $query_string = "Trypanosoma cruzi[Organism]"; >> >> my $query = >> Bio::DB::Query::GenBank->new(-db=>'protein', >> >> -query=>$query_string); >> my $count = $query->count; >> my @ids = $query->ids; >> >> print "$count\n"; >> >> Thanks. >> Glauber >> >> >> >> >> _____________________________________________________________________ >> _ >> ______________ >> Novo Yahoo! Cad?? - Experimente uma nova busca. >> http://yahoo.com.br/oqueeuganhocomisso >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From n.haigh at sheffield.ac.uk Fri Jul 6 03:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one
> uses 1.4 anymore. In any case, I would venture that the number of
> people who would be bitten by the 1.4->1.5 API change is much smaller
> than the number of people who download 1.4 and then ask us why it
> doesn't work.
>

I'm not really up to speed with how the API is meant to remain stable. Is the idea that the API should be stable from 1.4 through the 1.5 dev releases, and that the next stable release can then change that API? So any stable-to-stable upgrade could involve an API change, while a stable-to-dev upgrade should have the same API? Does a stable API mean that the same method calls are available in a newer release? What about adding new methods to a newer release?

How are these API changes currently tracked?
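On the tracking question, a minimal sketch of how can_ok() from Test::More could pin down the methods a stable release promises. The class My::Demo::Seq and its method list are hypothetical stand-ins, not a real BioPerl module or its actual API:

```perl
use strict;
use warnings;
use Test::More tests => 2;

{
    # Hypothetical class standing in for a real BioPerl module.
    package My::Demo::Seq;
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub seq        { return $_[0]{seq} }
    sub display_id { return $_[0]{display_id} }
}

# The methods promised by the 'stable' API, kept in one list that is
# only allowed to grow between stable releases.
my @stable_api = qw(new seq display_id);

can_ok('My::Demo::Seq', @stable_api);
ok(!My::Demo::Seq->can('no_such_method'), 'unlisted method is absent');
```

A check like this only confirms that the methods exist; it says nothing about whether their arguments or return values have changed.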
It seems to me that Test::More might be able to help in testing the API:

    can_ok($module, @methods);

Nath

From n.haigh at sheffield.ac.uk Fri Jul 6 07:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:

    my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:

    $obj->label();

I thought it would probably be:
'inframe'

However you get:
'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit
B8DxDViDOcx2gTFjSwQ2kNg=
=SroY
-----END PGP SIGNATURE-----

From n.haigh at sheffield.ac.uk Fri Jul 6 08:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
>
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
>
> What do you expect the following to return:
> $obj->label();
>
> I thought it would probably be:
> 'inframe'
>
> However you get:
> 'inframe, deletion'
>
> Can anyone in the know explain what behaviour would be expected?
>
> Cheers
> Nath

Following on from this, AAChange has the following two methods: add_Allele() and allele_mut().

It appears that allele_mut() is only capable of remembering one allele at a time, whereas add_Allele() is provided to add support for multiple alleles. Is that correct?

However, add_Allele() also calls allele_mut(), such that multiple calls to add_Allele() will overwrite the allele being remembered by allele_mut().

Things are further complicated by the fact that label() uses allele_mut() to decide on the label to return. Shouldn't label() know about multiple alleles set by multiple calls to add_Allele()?

It may be my lack of understanding of alleles and of what these classes are intended to do, but trying to rewrite the test scripts to improve code coverage has left me a little confused!

Thanks
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I
b8ZOENvDDDIxphAoxeKg8/E=
=f/sa
-----END PGP SIGNATURE-----

From tanzeem.mb at gmail.com Thu Jul 5 02:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>

I found it myself. Run Apache as root and disable SELinux, and the problem will not recur.

tanzeem wrote:
>
> I have a program which uses the BioPerl RemoteBlast module, which
> compares an amino acid FASTA file with the Swiss-Prot database. The
> submit_blast() method works successfully when run from the command line. But
> when the program is run from a web browser it returns -1. I was trying to
> adapt the code from the RemoteBlast synopsis for my needs.
> -- View this message in context: http://www.nabble.com/Problem-working-with-remoteblast-submit-method-in-webbrowser.-tf3919886.html#a11441586 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Fri Jul 6 09:00:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 06 Jul 2007 09:00:32 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: <1183726832.2566.34.camel@localhost.localdomain> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: > > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, deprecate > the tag methods in AnnotatableI, and go about fixing everything up. > If that occurs (which seems to be the major impediment) and we get > GMOD/GBrowse playing well with BioPerl then we can aim for a new > stable release, and then institute a regular release cycle. > I think this sounds like a good idea to me too. I'm planning on having a GMOD hackathon at the end of the summer; if I had a new API by then, we could focus on fixing anything that gets broken by the changes. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Fri Jul 6 09:10:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 6 Jul 2007 08:10:41 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote: > David Messina wrote: >>> [Chris] >>> The 1.5 releases I believe break some aspects of 1.4 API >>> >> >> Yes, this is true. >> >> I question, though, whether it's relevant given that virtually no one >> uses 1.4 anymore. In any case, I would venture that the number of >> people who would be bitten by the 1.4->1.5 API change is much smaller >> than the number of people who download 1.4 and then ask us why it >> doesn't work. >> > > I'm not really up-to-speed with how the API should remain stable > etc. Is > the idea that the API should be stable from 1.4 though the 1.5 dev and > then the next stale release can change that API? So any stable to > stable > upgrade could involve an API change while a stable to dev upgrade > should > have the same API? Does a stable API mean that the same method > calls are > available in a newer release....what about adding new methods to a > newer > release? > > How are these API changes currently tracked? 
It seems to me that > Test::More might be able to help in testing the API: > > can_ok($module, @methods); > > > Nath It's basically a 'contract' of sorts between the devs (us) and users (us/them) that the API won't change for the extent of that release series, thus ensuring any scripts out there generating tons of data won't break down if they attempt to call a renamed method. We try to maintain the API state anyway for those reasons, but in a dev release series we might decide to change some method names for consistency and deprecate older ambiguously-named methods (see below). For a stable release it's critical the API remain intact. There are a few methods which are considered deprecated or will be deprecated. For instance, we recently talked about changes to method names which use case to specify whether you're receiving an object (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested list, or whether to use each_* vs next_* for iterators. Consistency is nice! chris From heikki at sanbi.ac.za Fri Jul 6 09:20:26 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 6 Jul 2007 15:20:26 +0200 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E3B89.3090202@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> Message-ID: <200707061520.27000.heikki@sanbi.ac.za> Hi Nat, These modules have not been touched for a while and were developed for a specific task. A review is definitely in order. The way RNAChange->label was written, it should return 'inframe' when given no alleles, but 'no change' would actually be better. The multiple alleles were originally thought to be a good idea, but the vocabulary for labels was developed for a single allele only. The use of the module ended up being limited to a single allele, so add_allele() behaviour was conveniently ignored but not removed. :( -Heikki On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: > Nathan S.
Haigh wrote: > > I'm taking a look at the tests for Bio::Variation::RNAChange. > > > > If you create a new oject without arguments: > > my $obj = Bio::Variation::RNAChange->new(); > > > > What do you expect the following to return: > > $obj->label(); > > > > I thought it would probably be: > > 'inframe' > > > > However you get: > > 'inframe, deletion' > > > > Can anyone in the know explain what behaviour would be expected? > > > > Cheers > > Nath > > Following on from this, AAChange has the following two methods: > add_Allele() and allele_mut() > > It appears that allele_mut is only capable of remembering 1 allele at a > time, whereas add_Allele() is provided to add support for mutliple > alleles - is that correct? > > However, add_Allele() also calls allele_mut(), such that mutliple calls > to add_Allele will result in the overwriting of the allele being > remembered by allele_mut(). Things are further complicated by the fact > that label() uses allele_mut() to decide on the label to return. > Shouldn't label know aout multiple alleles set by multiple calls to > add_Allele? > > It may be my lack of understanding alleles and what these classes are > intending to do, but trying to rewrite the test scripts to improve code > coverage has let me a little confused! 
> > Thanks > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From schlesi at ebi.ac.uk Fri Jul 6 10:24:05 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Fri, 6 Jul 2007 15:24:05 +0100 Subject: [Bioperl-l] Unrooting a tree Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Hi, I am reading a rooted tree in newick format from a string (i.e. a bifurcation at the root) and would like to unroot it (i.e. a trifurcation at the root). I tried getting a grandchild of the root and adding it as a direct child, but that does not seem to work (the root still only has two descendents and the tree structure gets messed up). Is there a nice way to do this directly in bioperl? Doing it on the newick string is possible of course, but not nice. Thanks Felix From n.haigh at sheffield.ac.uk Fri Jul 6 11:37:19 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:37:19 +0100 Subject: [Bioperl-l] API Changes In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: <468E61AF.9040106@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Fields wrote: > > On Jul 6, 2007, at 2:09 AM, Nathan S. 
Haigh wrote: > >> David Messina wrote: >>>> [Chris] >>>> The 1.5 releases I believe break some aspects of 1.4 API >>>> >>> >>> Yes, this is true. >>> >>> I question, though, whether it's relevant given that virtually no one >>> uses 1.4 anymore. In any case, I would venture that the number of >>> people who would be bitten by the 1.4->1.5 API change is much smaller >>> than the number of people who download 1.4 and then ask us why it >>> doesn't work. >>> >> >> I'm not really up-to-speed with how the API should remain stable etc. Is >> the idea that the API should be stable from 1.4 though the 1.5 dev and >> then the next stale release can change that API? So any stable to stable >> upgrade could involve an API change while a stable to dev upgrade should >> have the same API? Does a stable API mean that the same method calls are >> available in a newer release....what about adding new methods to a newer >> release? >> >> How are these API changes currently tracked? It seems to me that >> Test::More might be able to help in testing the API: >> >> can_ok($module, @methods); >> >> >> Nath > > It's basically a 'contract' of sorts between the devs (us) and users > (us/them) that the API won't change for the extent of that release > series, thus ensuring any scripts out there generating tons of data > won't break down if they attempt to call a renamed method. We try to > maintain the API state anyway for those reasons, but in a dev release > series we might decide to change some method names for consistency and > deprecate older ambiguously-named methods (see below). For a stable > release it's critical the API remain intact. Hmm, still not 100% clear - it is Friday! So, someone running a script that was designed when 1.4 was released should still be able to run their script for all future releases. So all changes need to be backward compatible? 
So you have several situations regarding method names:
1) Adding new methods should be fine, since past scripts don't know about them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name
A stable API, to me, means the same method calls should still be able to accept the same arguments (including the constructor) and return the same object/data etc. What if a module is pretty outdated and would benefit from a rewrite - should all the old method names be included, and what if this makes coding difficult? > > There are a few methods which are considered deprecated or will be > deprecated. For instance, we recently talked about changes to method > names which use case to specify whether you're receiving an object > (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested > list, or whether to use each_* vs next_* for iterators. Consistency is > nice! > You mean the use of case to signify objects vs data being returned is to be deprecated or encouraged? What was the outcome of the each_* vs next_*? Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk kAWH1zVa1ycopijl761cvkQ= =fppH -----END PGP SIGNATURE----- From n.haigh at sheffield.ac.uk Fri Jul 6 11:43:41 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:43:41 +0100 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za> References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> Message-ID: <468E632D.4090801@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Heikki Lehvaslaiho wrote: > Hi Nat, > > These modules have not been touched for a while and were developed for a > specific task. A review is definitely in order.
> > The way RNAChange->label was written, it should return 'inframe' when given no > alleles, but 'no change' would actually be better. Wouldn't this effectively be changing the API since past scripts "could" expect "inframe" to be returned. > > The multiple alleles were originally though to be a good idea, but the > vocabulary for labels was developed for single allele, only, The use of the > module ended up being limited to single allele, so add_allele() behaviour was > conveniently ignored but not removed. :( So add_Allele() and each_Allele() should be deprecated in favour of allele_mut()? - From my post about API's.....how should the capitalisation of add_Allele() and each_Allele() be changed? Cheers Nath > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: >> Nathan S. Haigh wrote: >>> I'm taking a look at the tests for Bio::Variation::RNAChange. >>> >>> If you create a new oject without arguments: >>> my $obj = Bio::Variation::RNAChange->new(); >>> >>> What do you expect the following to return: >>> $obj->label(); >>> >>> I thought it would probably be: >>> 'inframe' >>> >>> However you get: >>> 'inframe, deletion' >>> >>> Can anyone in the know explain what behaviour would be expected? >>> >>> Cheers >>> Nath >> Following on from this, AAChange has the following two methods: >> add_Allele() and allele_mut() >> >> It appears that allele_mut is only capable of remembering 1 allele at a >> time, whereas add_Allele() is provided to add support for mutliple >> alleles - is that correct? >> >> However, add_Allele() also calls allele_mut(), such that mutliple calls >> to add_Allele will result in the overwriting of the allele being >> remembered by allele_mut(). Things are further complicated by the fact >> that label() uses allele_mut() to decide on the label to return. >> Shouldn't label know aout multiple alleles set by multiple calls to >> add_Allele? 
>> >> It may be my lack of understanding alleles and what these classes are >> intending to do, but trying to rewrite the test scripts to improve code >> coverage has let me a little confused! >> >> Thanks >> Nath >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue GBHuSHfsesX1ko55s+ME2Zc= =tkG8 -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sat Jul 7 16:57:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 15:57:37 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <1183726832.2566.34.camel@localhost.localdomain> Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu> We'll prob. get a start soon, then. I'll let you know when we start. chris On Jul 6, 2007, at 8:00 AM, Scott Cain wrote: > On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: >> >> I think we should just bite the bullet and proceed with pulling out >> the controversial operator overloading in Bio::Annotation*, deprecate >> the tag methods in AnnotatableI, and go about fixing everything up. >> If that occurs (which seems to be the major impediment) and we get >> GMOD/GBrowse playing well with BioPerl then we can aim for a new >> stable release, and then institute a regular release cycle. >> > I think this sounds like a good idea to me too. 
I'm planning on > having > a GMOD hackathon at the end of the summer; if I had a new API by then, > we could focus on fixing anything that gets broken by the changes. > > Scott > > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Jul 7 17:17:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 16:17:14 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468E61AF.9040106@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> <468E61AF.9040106@sheffield.ac.uk> Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu> On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote: > ... > Hmm, still not 100% clear - it is Friday! > > So, someone running a script that was designed when 1.4 was released > should still be able to run their script for all future releases. > So all > changes need to be backward compatible? It helps. For instance, if we change method names (rename each_Foo as next_Foo), we should have each_Foo delegate to next_Foo for the time being. If we plan on deprecating the old method altogether we would add a warning message when it's called, then delegate. It's a better solution than just changing the method outright, which means the user has to search through docs to find the renamed method. 
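The delegate-and-warn approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: My::FooHolder is a hypothetical class (not a BioPerl module), each_Foo/next_Foo are the placeholder names from the paragraph above, and plain Carp is used for the warning to keep the example self-contained rather than whatever warning helper a real BioPerl class would use.

```perl
use strict;
use warnings;

{
    # Hypothetical class for illustration only; not part of BioPerl.
    package My::FooHolder;
    use Carp qw(carp);

    sub new {
        my ($class, @foos) = @_;
        return bless { foos => [@foos], pos => 0 }, $class;
    }

    # New, preferred name: returns the next item, or nothing when exhausted.
    sub next_Foo {
        my ($self) = @_;
        return if $self->{pos} >= @{ $self->{foos} };
        return $self->{foos}[ $self->{pos}++ ];
    }

    # Old name, kept so existing scripts keep working.
    # It warns once per call and then hands over to the new method.
    sub each_Foo {
        my ($self, @args) = @_;
        carp "each_Foo() is deprecated; please call next_Foo() instead";
        return $self->next_Foo(@args);
    }
}

my $holder = My::FooHolder->new(qw(a b c));
my $first  = $holder->each_Foo();   # warns on STDERR, then behaves like next_Foo()
my $second = $holder->next_Foo();   # no warning
print "$first $second\n";           # prints "a b"
```

Because the old name only forwards to the new one, both share the same iterator state, so a script can migrate one call site at a time.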
> So you have several situations regarding method names: > 1) Adding new methods should be fine since past scripts don't know > about > them and won't have used them > 2) Removing methods would break past scripts that used them > 3) Renamed methods would break past scripts that used the old name > > A stable API to me, means the same method calls should still be > able to > accept the same arguments (inc the constructor) and return the same > object/data etc. Yes. > What if a module is pretty outdated and would benefit from a rewrite - > should all the old method names be included, what if this makes coding > difficult? It depends on the module. If a complete rewrite is needed then maybe starting with a new module/interface is best, and we could deprecate the older module completely. That has been done already with Bio::Tools::BPLite (in favor of SearchIO) and a few other modules. >> There are a few methods which are considered deprecated or will be >> deprecated. For instance, we recently talked about changes to method >> names which use case to specify whether you're receiving an object >> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. >> nested >> list, or whether to use each_* vs next_* for iterators. >> Consistency is >> nice! >> > > You mean the use of case to signify objects vs data being returned are > to be deprecated or encouraged? What was the outcome of the each_* vs > next_*? > > Nath Here's the section I added to the wiki (it started in a thread a few weeks or so ago, so it's a summary really): http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names Feel free to add to it or make suggestions. BTW, Hilmar mentioned there was a movement to rename methods in old code to follow these recommendations, but it was never completed. It should be taken up again at some point, but the recommendations are mainly here for newer code.
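As a rough sketch of the can_ok() idea raised earlier in this thread, a test file could pin down the method names that scripts are allowed to rely on, so that a removed or renamed method shows up as a test failure. My::Change below is a hypothetical stand-in class with stub bodies; only the method names are borrowed from the discussion, and nothing here reflects how the real Bio::Variation classes are implemented.

```perl
use strict;
use warnings;
use Test::More tests => 2;

{
    # Hypothetical stand-in class; the bodies are stubs for illustration.
    package My::Change;
    sub new         { my $class = shift; return bless { alleles => [] }, $class }
    sub allele_mut  { my $self = shift; $self->{mut} = shift if @_; return $self->{mut} }
    sub add_Allele  { my ($self, $allele) = @_; push @{ $self->{alleles} }, $allele; return }
    sub each_Allele { my $self = shift; return @{ $self->{alleles} } }
    sub label       { return 'stub' }
}

# The public method names that existing scripts may depend on.
my @public_api = qw(new allele_mut add_Allele each_Allele label);

# Fails if any listed method has been removed or renamed.
my $api_ok = can_ok('My::Change', @public_api);

# Sanity check: a name that was never part of the API is not reported.
ok(!My::Change->can('no_such_method'), 'unknown methods are not reported');
```

A check like this only covers method names. It says nothing about whether arguments and return values stay the same, which would need ordinary behavioural tests.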
chris From heikki at sanbi.ac.za Sun Jul 8 03:32:21 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sun, 8 Jul 2007 09:32:21 +0200 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E632D.4090801@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> <468E632D.4090801@sheffield.ac.uk> Message-ID: <200707080932.21818.heikki@sanbi.ac.za> On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote: > Heikki Lehvaslaiho wrote: > > Hi Nat, > > > > These modules have not been touched for a while and were developed for a > > specific task. A review is definitely in order. > > > > The way RNAChange->label was written, it should return 'inframe' when > > given no alleles, but 'no change' would actually be better. > > Wouldn't this effectively be changing the API since past scripts "could" > expect "inframe" to be returned. Checking the actual usage and what happens when you change a nucleotide to itself, you get the label 'silent'. I guess that would be a valid label value even when the alleles are not initialised, too. > > The multiple alleles were originally thought to be a good idea, but the > > vocabulary for labels was developed for a single allele only. The use of > > the module ended up being limited to a single allele, so add_allele() > > behaviour was conveniently ignored but not removed. :( > > So add_Allele() and each_Allele() should be deprecated in favour of > allele_mut()? Yes. > From my post about API's.....how should the capitalisation of > add_Allele() and each_Allele() be changed? Definitely keep the current ones as deprecated alternatives. -Heikki > Cheers > Nath > > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: > >> Nathan S. Haigh wrote: > >>> I'm taking a look at the tests for Bio::Variation::RNAChange.
> >>> > >>> If you create a new oject without arguments: > >>> my $obj = Bio::Variation::RNAChange->new(); > >>> > >>> What do you expect the following to return: > >>> $obj->label(); > >>> > >>> I thought it would probably be: > >>> 'inframe' > >>> > >>> However you get: > >>> 'inframe, deletion' > >>> > >>> Can anyone in the know explain what behaviour would be expected? > >>> > >>> Cheers > >>> Nath > >> > >> Following on from this, AAChange has the following two methods: > >> add_Allele() and allele_mut() > >> > >> It appears that allele_mut is only capable of remembering 1 allele at a > >> time, whereas add_Allele() is provided to add support for mutliple > >> alleles - is that correct? > >> > >> However, add_Allele() also calls allele_mut(), such that mutliple calls > >> to add_Allele will result in the overwriting of the allele being > >> remembered by allele_mut(). Things are further complicated by the fact > >> that label() uses allele_mut() to decide on the label to return. > >> Shouldn't label know aout multiple alleles set by multiple calls to > >> add_Allele? > >> > >> It may be my lack of understanding alleles and what these classes are > >> intending to do, but trying to rewrite the test scripts to improve code > >> coverage has let me a little confused! 
> >> > >> Thanks > >> Nath > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From xing.y.hu at gmail.com Mon Jul 9 02:26:40 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Mon, 09 Jul 2007 14:26:40 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? Message-ID: <4691D520.60700@gmail.com> Hi friends, I wrote a script for getting genomic sequence file from GenBank. To fulfill that target, I used DB::GenBank module to get the sequence via get_Seq_by_acc, and it works well. But this time, facing enormous amount of ESTs, I have no idea how to download them swiftly and elegantly. PROBLEM DESCRIPTION: goal: download all EST files of a specific species from GenBank, say Arabidopsis Thaliana or Oryza sativa(rice). other: whether all of ESTs are in a single file or separatedly placed does not matter. Can I use a bioperl script to achieve that? And How? I really appreciate. Xing. From akozik at atgc.org Mon Jul 9 08:25:14 2007 From: akozik at atgc.org (Alexander Kozik) Date: Mon, 09 Jul 2007 05:25:14 -0700 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <4691D520.60700@gmail.com> References: <4691D520.60700@gmail.com> Message-ID: <4692292A.1080900@atgc.org> To download genomic sequences or ESTs for any organism (in various formats) you can use NCBI Taxonomy Browser: http://www.ncbi.nlm.nih.gov/Taxonomy/ you can use taxonomy id to access different organisms, Arabidopsis for example (3702): http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 or by direct web link: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 assembled genomes can be accessed via ftp: ftp://ftp.ncbi.nih.gov/genomes/ To download large amount of selected sequences (ESTs for example) you can use batch Entrez: http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide (select EST for EST, it's critical) It seems, to solve the problem you describe, you don't need to use bioperl. NCBI GenBank Entrez provides all necessary tools to work on these simple and frequent tasks. -Alex -- Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 East Health Sciences Drive University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email#1: akozik at atgc.org email#2: akozik at gmail.com web: http://www.atgc.org/ Xing Hu wrote: > Hi friends, > > I wrote a script for getting genomic sequence file from GenBank. To > fulfill that target, I used DB::GenBank module to get the sequence via > get_Seq_by_acc, and it works well. But this time, facing enormous amount > of ESTs, I have no idea how to download them swiftly and elegantly. > > PROBLEM DESCRIPTION: > goal: download all EST files of a specific species from GenBank, say > Arabidopsis Thaliana or Oryza sativa(rice). > other: whether all of ESTs are in a single file or separatedly > placed does not matter. > > Can I use a bioperl script to achieve that? And How? I really > appreciate. > > Xing. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Jul 9 10:17:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Jul 2007 09:17:23 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <4692292A.1080900@atgc.org> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Caveat: if you have millions of ESTs please consider NOT using my eutil script below or NCBI Batch Entrez, which would repeatedly hit the NCBI server thousands of times. At least try looking for other ways to retrieve the data you want (ftp, organism-specific resources like Ensembl, so on), or run any scripts or data retrieval in off hours so you don't overtax the NCBI server. There is a way you can use BioPerl if you don't mind living on the bleeding edge by using bioperl-live (core code from CVS). I have been working on a set of modules for the last year (Bio::DB::EUtilities) which interact with all the various eutils for building data pipelines which uses the NCBI CGI interface. You could possibly retrieve all relevant ESTs using a variation of the example script here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch Note that the code examples do NOT work with rel. 1.5.2 code as the API has changed quite a bit; I'm working to rectify some of that. The script I would use is below. It retrieves batches of 500 sequences (in fasta format) at a time, for a total of 10000 max seq records, saving the raw record data directly to a file (appending as you go along). I added an eval block to check the server status and redo the call up to 4 times before giving up completely. Using eval this way hasn't been extensively tested but should work. 
---------------------------------------
use strict;
use warnings;
use Bio::DB::EUtilities;

# Search the EST database for Arabidopsis (taxonomy id 3702) and
# keep the result set on the NCBI history server.
my $factory = Bio::DB::EUtilities->new(-eutil          => 'esearch',
                                       -db             => 'nucest',
                                       -term           => 'txid3702',
                                       -usehistory     => 'y',
                                       -keep_histories => 1);

my $count = $factory->get_count;
print "Count: $count\n";

if (my $hist = $factory->next_History) {
    print "History returned\n";
    # note db carries over from above
    $factory->set_parameters(-eutil   => 'efetch',
                             -rettype => 'fasta',
                             -history => $hist);

    my ($retmax, $retstart) = (500, 0);
    my $retry = 1;
    my $maxcount = $count < 10000 ? $count : 10000; # set max # seq records to return

    RETRIEVE_SEQS:
    while ($retstart < $maxcount) {
        print "Returning from ", $retstart + 1, " to ", $retstart + $retmax, "\n";
        $factory->set_parameters(-retmax   => $retmax,
                                 -retstart => $retstart);
        # check in case of server error
        eval {
            # append the raw records for this batch to the output file
            $factory->get_Response(-file => ">>ESTs.fas");
        };
        if ($@) {
            # give up on the fifth failure; otherwise repeat the same batch
            die "Server error: $@. Try again later" if $retry == 5;
            print STDERR "Server error, redo #$retry\n";
            $retry++ && redo RETRIEVE_SEQS;
        }
        $retstart += $retmax;
    }
}
---------------------------------------
chris On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: > To download genomic sequences or ESTs for any organism (in various > formats) you can use NCBI Taxonomy Browser: > http://www.ncbi.nlm.nih.gov/Taxonomy/ > > you can use taxonomy id to access different organisms, Arabidopsis for > example (3702): > http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 > > or by direct web link: > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?
> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 > > assembled genomes can be accessed via ftp: > ftp://ftp.ncbi.nih.gov/genomes/ > > To download large amount of selected sequences (ESTs for example) you > can use batch Entrez: > http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html > http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide > (select EST for EST, it's critical) > > It seems, to solve the problem you describe, you don't need to use > bioperl. NCBI GenBank Entrez provides all necessary tools to work on > these simple and frequent tasks. > > -Alex > > -- > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 East Health Sciences Drive > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email#1: akozik at atgc.org > email#2: akozik at gmail.com > web: http://www.atgc.org/ > > > > Xing Hu wrote: >> Hi friends, >> >> I wrote a script for getting genomic sequence file from >> GenBank. To >> fulfill that target, I used DB::GenBank module to get the sequence >> via >> get_Seq_by_acc, and it works well. But this time, facing enormous >> amount >> of ESTs, I have no idea how to download them swiftly and elegantly. >> >> PROBLEM DESCRIPTION: >> goal: download all EST files of a specific species from >> GenBank, say >> Arabidopsis Thaliana or Oryza sativa(rice). >> other: whether all of ESTs are in a single file or separatedly >> placed does not matter. >> >> Can I use a bioperl script to achieve that? And How? I really >> appreciate. >> >> Xing. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Mon Jul 9 14:08:07 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 9 Jul 2007 11:08:07 -0700 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> I don't think there is a function for this yet but it would be a good one to have. I assume you don't really want to take a shot at writing it though? To make this work I think you have to create a new node which contains the trifurcation and this node is what the root is set to. -jason On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From lstein at cshl.edu Mon Jul 9 17:35:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 9 Jul 2007 17:35:49 -0400 Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com> Hi Folks, Sorry for the job spam. We're looking for a manager of the Cold Spring Harbor Laboratory bioinformatics core facility. 
This is a semi-independent staff position supporting CSHL scientific researchers by providing consultation, data mining and software development activities. You will have a software staff of two, a nice salary, good health benefits, and an exciting and dynamic environment to work in. I'm looking for someone with a strong bioinformatics background, at least five years' experience programming Perl, Java or Python in an academic or commercial environment, and management experience. If you are interested, please send your CV and cover letter to me. Thanks, Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From stewarta at nmrc.navy.mil Mon Jul 9 18:16:12 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Mon, 9 Jul 2007 18:16:12 -0400 Subject: [Bioperl-l] rpsblast Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil>

When I run... $result = $factory->rpsblast($seq); ... where $seq is a Bio::Seq object, it seems to simply copy the $seq object to $result. When I run something similar... $rpsblast('/path/to/myFile'); ... the value of $result then becomes '/path/to/myFile'. Anyone else encounter this? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270

From jason_stajich at berkeley.edu Mon Jul 9 21:36:10 2007 From: jason_stajich at berkeley.edu (Jason Stajich) Date: Mon, 9 Jul 2007 18:36:10 -0700 Subject: [Bioperl-l] BOSC2007 Message-ID:

I posted a quick note about meeting up at BOSC/ISMB this year. If you are attending, please sign your name on the page or at least say whether you are interested in a BoF. 
We'll try and discuss some of the current topics in BioPerl development as well as try to use the time to coordinate any development that benefits from the face-to-face time. http://bioperl.org/wiki/BOSC2007_Meetup http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/ -jason -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/

From schlesi at ebi.ac.uk Tue Jul 10 08:58:00 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Tue, 10 Jul 2007 13:58:00 +0100 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com>

Hi,

> I don't think there is a function for this yet but it would be a good one
> to have.
> I assume you don't really want to take a shot at writing it though?
> To make this work I think you have to create a new node which contains the
> trifurcation and this node is what the root is set to.

Creating a new root is fine, but what would the (3) children of that node be? I took a different approach now, where I iterate over all (indirect) descendants of the root, find the first one which does not have the root as its direct ancestor and move it up the tree, i.e.

foreach my $d ($root->get_all_Descendents) {
    if ($d->ancestor != $root) {
        $d->ancestor->remove_Descendent($d);
        if ($root->add_Descendent($d, 1) == 3) {
            last;
        }
    }
}

This will make the old root a trifurcation. It does the right thing for what I am trying to do, but I believe it is not general (for example, it does not currently account for branch lengths). Also, instead of taking the first, taking the most distant possible subtree of a clade up to the root might be better.
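The branch-length caveat above can be illustrated with a small sketch. This is only an illustration under stated assumptions: it works on a plain nested-hash tree rather than on Bio::Tree objects, and the names unroot, newick, kids and len are made up for the example, not BioPerl API. Instead of moving one grandchild up, it collapses one internal child of a bifurcating root and adds that child's branch length to its sibling, so distances between leaves stay the same.

```perl
use strict;
use warnings;

# Hypothetical plain-hash node (NOT a Bio::Tree::Node):
#   { name => 'A', len => 1, kids => [ ... ] }
# 'len' is the branch length to the parent; leaves have no 'kids'.
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{kids} || [] };
    return $root unless @kids == 2;    # only a bifurcating root needs work

    # Pick the first root child that is itself an internal node.
    my ($inner) = grep { scalar @{ $_->{kids} || [] } } @kids;
    return $root unless $inner;        # two leaves: nothing to collapse
    my ($other) = grep { $_ != $inner } @kids;

    # The collapsed edge's length moves onto the sibling branch,
    # which preserves all leaf-to-leaf path lengths.
    $other->{len} = ($other->{len} || 0) + ($inner->{len} || 0);
    $root->{kids} = [ @{ $inner->{kids} }, $other ];
    return $root;
}

# Render a node (and its subtree) as a Newick fragment without the ';'.
sub newick {
    my ($node) = @_;
    my $kids = $node->{kids} || [];
    my $s = @$kids
        ? '(' . join(',', map { newick($_) } @$kids) . ')'
        : $node->{name};
    $s .= ":$node->{len}" if defined $node->{len};
    return $s;
}

# Rooted input, equivalent to ((A:1,B:2):3,C:4);
my $tree = {
    kids => [
        { len => 3, kids => [ { name => 'A', len => 1 },
                              { name => 'B', len => 2 } ] },
        { name => 'C', len => 4 },
    ],
};

print newick(unroot($tree)), ";\n";    # prints (A:1,B:2,C:7);
```

The A to C distance is 1 + 3 + 4 = 8 before and 1 + 7 = 8 after, which is the property the loop above does not yet guarantee. Porting the same idea to Bio::Tree nodes would need checking against how remove_Descendent and add_Descendent actually behave.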
Felix

From xing.y.hu at gmail.com Tue Jul 10 09:29:36 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Tue, 10 Jul 2007 21:29:36 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Message-ID: <469389C0.5060303@gmail.com>

Thanks, you guys.

I have to confess how stupid I was. The easiest way seems to be the NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I knew about it, but I thought it was necessary to have all items selected before pressing save to launch the download. So I was desperate to find a button that could do that without hundreds of thousands of clicks from me. "What about selecting none of the items at all?" This idea finally came to me after days of struggling, and the problem was solved.

Xing

From davila at ioc.fiocruz.br Tue Jul 10 09:58:29 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue, 10 Jul 2007 10:58:29 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <469389C0.5060303@gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> Message-ID: <46939085.40906@ioc.fiocruz.br>

Hi Xing, Unfortunately that did not work for me... 
there are 5133 T. brucei ESTs (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) and 13971 from T. cruzi (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) that I cannot download at once in GenBank format... even when I select "GenBank" format in the Display menu I can only see and get/download 500 ESTs each time...

I also downloaded all ESTs from GenBank (a pity there are no subsets of them!) but merging them all generates a file bigger than 120GB to be processed...

I just asked Diogo (my student) to give the script sent by Chris Fields a try, so fingers crossed ;-)

Cheers, Alberto

From cjfields at uiuc.edu Tue Jul 10 10:05:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 09:05:43 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <46939085.40906@ioc.fiocruz.br> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>

Just make sure you're using the latest from CVS. Let me know if it doesn't work and I'll look into it.

chris

Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From diogoat at gmail.com Tue Jul 10 10:15:20 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Tue, 10 Jul 2007 11:15:20 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear All,

I use the script below, and it works very well! I only changed the query, and the script gave me the 5133 ESTs from T. brucei.

#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;    # loaded explicitly for the query object below

# query for EST-division records from T. brucei
my $query = Bio::DB::Query::GenBank->new(
    -query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
    -db    => 'nucleotide');
my $gb    = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# records are appended in GenBank format, despite the .fasta suffix
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
####################################################################

Diogo Tschoeke/Fiocruz (Alberto's Student)

From cjfields at uiuc.edu Tue Jul 10 10:35:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 09:35:03 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu> That will work as well; the key difference between my example and this one is that the seq stream retrieved using Bio::DB::GenBank passes through Bio::SeqIO while Bio::DB::EUtilities saves the raw seq record directly to a file (or callback or HTTP::Response) for optionally parsing later. If you have problems with Bio::SeqIO you can always use Bio::DB::EUtilities to get around the issue until we resolve it. chris Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hartzell at alerce.com Tue Jul 10 12:50:31 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 12:50:31 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <18067.47319.254632.538811@almost.alerce.com> Jason Stajich writes: > [...] > Do you know how to have svn commit messages generate summary emails > as well? I've made a local installation of the SVN::Notify bits in my home directory and set up its notification script. If folks are happy with it then I'll work on getting The Powers That Be to do a real install and we'll use it for the real repository. It's currently configured to include diffs inline in the message. I prefer them as an attachment, but the current configuration of the bioperl-guts-l list stalls messages w/ attachments and requires admin intervention. I have a support@ request going on it and will change it if/when we get the issue resolved. So, to review: svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ is the top of the repository and svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk will get you the main branch of bioperl-live. Remember that the repository is transient, don't put anything important in there.... Have at it, but remember that the entire world will see your commit messages. g. From xing.y.hu at gmail.com Tue Jul 10 13:08:35 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Wed, 11 Jul 2007 01:08:35 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <46939085.40906@ioc.fiocruz.br> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> Message-ID: <4693BD13.2070509@gmail.com> Hi Alberto, Yes, I know that there is only choice for showing no more than 500 entries on the NCBI website. However, I completely ignored that (doesn't mean that I have not seen that), and pulled down the "send to" and chose "file". Then a small window popped up, after saying yes to that, the downloading started. You might ask me how I know that it was not a batch of only 5 (default selection) or 500 ESTs? To be honest, I don't know at the first time. But the download has accumulated to millions bytes since then(due to my bad network condition, I have no idea when it will reach the end), and that doesn't look like a little batch of ESTs less than one thousand. Actually, I wrote a script to count the sequences within the temporary file and got a number much bigger than ten thousand. So I guess it works. BTW, I never thought Bio::DB::Genbank can do that! Again, thanks you guys! Xing Alberto Davila wrote: > Hi Xing, > > Unfortunately that did not work for me... there are 5133 T. brucei ESTs > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) > and 13971 from T. cruzi > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) > that I cannot download at once in GenBank format... even when I select > "GenBank" format in the Display menu I can only see and get/download 500 > ESTs each time... > > I also downloaded all ESTs from GenBank (a pity there are not subsets of > them !) but merging all them generate a file bigger than 120GB to be > processed... > > Just asked Diogo (my student) to give a try to the script sent by Chris > Fields.. 
so finger crossed ;-) > > Cheers, Alberto > > > Xing Hu wrote: > >> Thanks you guys. >> >> I had to confess that how stupid I was. The easiest way seems to be the >> way using NCBI Taxonomy Browser which suggested by alex. As a matter of >> fact, I knew that but I thought it was necessary to have all items >> selected before pressing save to launch download. So I was desperate to >> find a button that could achieve that without hundreds of thousands of >> clicking by me. "What about select none of those items at all?" -- This >> idea finally came to me after days of struggling and the problem was solved. >> >> Xing >> >> >> >> Chris Fields wrote: >> >>> Caveat: if you have millions of ESTs please consider NOT using my >>> eutil script below or NCBI Batch Entrez, which would repeatedly hit >>> the NCBI server thousands of times. At least try looking for other >>> ways to retrieve the data you want (ftp, organism-specific resources >>> like Ensembl, so on), or run any scripts or data retrieval in off >>> hours so you don't overtax the NCBI server. >>> >>> There is a way you can use BioPerl if you don't mind living on the >>> bleeding edge by using bioperl-live (core code from CVS). I have been >>> working on a set of modules for the last year (Bio::DB::EUtilities) >>> which interact with all the various eutils for building data pipelines >>> which uses the NCBI CGI interface. You could possibly retrieve all >>> relevant ESTs using a variation of the example script here: >>> >>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch >>> >>> Note that the code examples do NOT work with rel. 1.5.2 code as the >>> API has changed quite a bit; I'm working to rectify some of that. >>> >>> The script I would use is below. It retrieves batches of 500 >>> sequences (in fasta format) at a time, for a total of 10000 max seq >>> records, saving the raw record data directly to a file (appending as >>> you go along). 
>>> I added an eval block to check the server status and
>>> redo the call up to 4 times before giving up completely. Using eval
>>> this way hasn't been extensively tested but should work.
>>>
>>> ---------------------------------------
>>>
>>> use Bio::DB::EUtilities;
>>>
>>> my $factory = Bio::DB::EUtilities->new(-eutil          => 'esearch',
>>>                                        -db             => 'nucest',
>>>                                        -term           => 'txid3702',
>>>                                        -usehistory     => 'y',
>>>                                        -keep_histories => 1);
>>>
>>> my $count = $factory->get_count;
>>>
>>> print "Count: $count\n";
>>>
>>> if (my $hist = $factory->next_History) {
>>>     print "History returned\n";
>>>     # note db carries over from above
>>>     $factory->set_parameters(-eutil   => 'efetch',
>>>                              -rettype => 'fasta',
>>>                              -history => $hist);
>>>     my ($retmax, $retstart) = (500,0);
>>>     my $retry = 1;
>>>     # set max # seq records to return
>>>     my $maxcount = $count < 10000 ? $count : 10000;
>>>     RETRIEVE_SEQS:
>>>     while ($retstart < $maxcount) {
>>>         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
>>>         $factory->set_parameters(-retmax   => $retmax,
>>>                                  -retstart => $retstart);
>>>         # check in case of server error
>>>         eval{
>>>             $factory->get_Response(-file => ">>ESTs.fas");
>>>         };
>>>         if ($@) {
>>>             die "Server error: $@. Try again later" if $retry == 5;
>>>             print STDERR "Server error, redo #$retry\n";
>>>             $retry++ && redo RETRIEVE_SEQS;
>>>         }
>>>         $retstart += $retmax;
>>>     }
>>> }
>>>
>>>
>>> ---------------------------------------
>>>
>>>
>>> chris
>>>
>>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>>>
>>>
>>>> To download genomic sequences or ESTs for any organism (in various
>>>> formats) you can use NCBI Taxonomy Browser:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>>>
>>>> you can use taxonomy id to access different organisms, Arabidopsis for
>>>> example (3702):
>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702
>>>>
>>>>
>>>> or by direct web link:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1
>>>>
>>>>
>>>> assembled genomes can be accessed via ftp:
>>>> ftp://ftp.ncbi.nih.gov/genomes/
>>>>
>>>> To download large amount of selected sequences (ESTs for example) you
>>>> can use batch Entrez:
>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>>>> (select EST for EST, it's critical)
>>>>
>>>> It seems, to solve the problem you describe, you don't need to use
>>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>>>> these simple and frequent tasks.
>>>>
>>>> -Alex
>>>>
>>>> --Alexander Kozik
>>>> Bioinformatics Specialist
>>>> Genome and Biomedical Sciences Facility
>>>> 451 East Health Sciences Drive
>>>> University of California
>>>> Davis, CA 95616-8816
>>>> Phone: (530) 754-9127
>>>> email#1: akozik at atgc.org
>>>> email#2: akozik at gmail.com
>>>> web: http://www.atgc.org/
>>>>
>>>>
>>>>
>>>> Xing Hu wrote:
>>>>
>>>>> Hi friends,
>>>>>
>>>>> I wrote a script for getting genomic sequence file from GenBank. To
>>>>> fulfill that target, I used DB::GenBank module to get the sequence via
>>>>> get_Seq_by_acc, and it works well.
But this time, facing enormous >>>>> amount >>>>> of ESTs, I have no idea how to download them swiftly and elegantly. >>>>> >>>>> PROBLEM DESCRIPTION: >>>>> goal: download all EST files of a specific species from GenBank, >>>>> say >>>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>>> other: whether all of ESTs are in a single file or separatedly >>>>> placed does not matter. >>>>> >>>>> Can I use a bioperl script to achieve that? And How? I really >>>>> appreciate. >>>>> >>>>> Xing. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Tue Jul 10 13:14:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 10 Jul 2007 18:14:29 +0100 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> Message-ID: <4693BE75.4090005@sendu.me.uk> George Hartzell wrote: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. 
> > It's currently configured to include diffs inline in the message. I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. Can I put a vote in that you don't? I search through email body text in my archive of guts to find certain diffs, so really like the diffs inline. Also, is there any way to get rid of the 'bioperl' in [bioperl revision] in the subject? Seems redundant and makes it harder to see what was changed in a small email client window. From aaron.j.mackey at gsk.com Tue Jul 10 13:20:15 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 10 Jul 2007 13:20:15 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> Message-ID: George, this is all very nice to finally have, thank you for your efforts! Any chance that the diff-as-attachment vs. diffs-inline question can be different for each subscriber? The utility of the "guts" mailing list (to me) is that it's an encyclopedia of browsable, skimmable, and searchable diffs, not just a date-stamped record of diffs (if so, why provide an attachment at all, just provide a URL to the diff in the respository). Thanks again, -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. > > It's currently configured to include diffs inline in the message. 
I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. > > So, to review: > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ > > is the top of the repository and > > svn co svn+ssh://dev.open-bio. > org/home/hartzell/bioperl_take2/bioperl-live/trunk > > will get you the main branch of bioperl-live. > > Remember that the repository is transient, don't put anything > important in there.... > > Have at it, but remember that the entire world will see your commit > messages. > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Jul 10 14:18:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 13:18:07 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote: > George Hartzell wrote: >> Jason Stajich writes: >>> [...] >>> Do you know how to have svn commit messages generate summary emails >>> as well? >> >> I've made a local installation of the SVN::Notify bits in my home >> directory and set up its notification script. If folks are happy >> with >> it then I'll work on getting The Powers That Be to do a real install >> and we'll use it for the real repository. >> >> It's currently configured to include diffs inline in the message. I >> prefer them as an attachment, but the current configuration of the >> bioperl-guts-l list stalls messages w/ attachments and requires admin >> intervention. 
I have a support@ request going on it and will change >> it if/when we get the issue resolved. > > Can I put a vote in that you don't? I search through email body > text in > my archive of guts to find certain diffs, so really like the diffs > inline. > > Also, is there any way to get rid of the 'bioperl' in [bioperl > revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Agree on both counts; the devs have gotten used to seeing the diffs inline. We prob. need to schedule a specific day/time when the switchover would take place so we can announce (so everyone knows and no one can gripe). Did we ever resolve the svn->cvs issue? Jason pointed out some tools a while ago... chris From hartzell at alerce.com Tue Jul 10 16:09:09 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:09:09 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59237.519166.454578@almost.alerce.com> Sendu Bala writes: > George Hartzell wrote: > > Jason Stajich writes: > > > [...] > > > Do you know how to have svn commit messages generate summary emails > > > as well? > > > > I've made a local installation of the SVN::Notify bits in my home > > directory and set up its notification script. If folks are happy with > > it then I'll work on getting The Powers That Be to do a real install > > and we'll use it for the real repository. > > > > It's currently configured to include diffs inline in the message. I > > prefer them as an attachment, but the current configuration of the > > bioperl-guts-l list stalls messages w/ attachments and requires admin > > intervention. I have a support@ request going on it and will change > > it if/when we get the issue resolved. 
> > Can I put a vote in that you don't? I search through email body text in > my archive of guts to find certain diffs, so really like the diffs inline. Ok, three votes against attachments. Anyone want to vote in support, otherwise I'll just leave 'em inline. > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Sure. The default's just [RevisionNumber]. Does that work for folk? g. From hartzell at alerce.com Tue Jul 10 16:11:36 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:11:36 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59384.247108.463648@almost.alerce.com> Chris Fields writes: > [...] > We prob. need to schedule a specific day/time when the switchover > would take place so we can announce (so everyone knows and no one can > gripe). Did we ever resolve the svn->cvs issue? Jason pointed out > some tools a while ago... I haven't done anything about it. I think that we also need to have some input from the admin/support folk about access methods (https, etc...). Are we going to want to mirror the repository anywhere? g. From hartzell at alerce.com Wed Jul 11 09:17:08 2007 From: hartzell at alerce.com (George Hartzell) Date: Wed, 11 Jul 2007 09:17:08 -0400 Subject: [Bioperl-l] extra hook functionality for svn repos? Message-ID: <18068.55380.626778.486775@almost.alerce.com> There are a bunch of "contributed" hook scripts at http://subversion.tigris.org/tools_contrib.html#hook_scripts Given that many bioperl users depend on case-preserving but case-insensitive file systems, I'm wondering if hooking up the case-insensitive.py script might be worthwhile. 
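As a rough illustration of the kind of check such a hook performs, here is a minimal sketch in plain Perl. It is not the contributed case-insensitive.py script, and it does not talk to Subversion at all; the function name case_collisions and the sample paths are made up for the example. The idea is simply to fold each path to lower case and report any group of paths that end up identical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Subversion or BioPerl):
# returns a list of array refs, one per group of paths that
# differ only by letter case.
sub case_collisions {
    my @paths = @_;
    my %by_folded;

    # Group the original paths under their lower-cased form.
    push @{ $by_folded{ lc $_ } }, $_ for @paths;

    # Keep only the groups with more than one member.
    return grep { @$_ > 1 } @by_folded{ sort keys %by_folded };
}

# Made-up sample paths; the first two would clash on a
# case-insensitive file system.
my @clashes = case_collisions(qw(Bio/SeqIO.pm Bio/seqio.pm t/data/test.fa));
print join(' <-> ', @$_), "\n" for @clashes;
```

A real pre-commit hook would have to get the candidate paths from the transaction being committed and compare them with what is already in the repository, which this sketch leaves out.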
Likewise, the check-mime-type.pl script might help us keep svn:mime-type and svn:eol-style properties up to date. There are others there, but none that I found interesting. How big-brother do we want the repository to be? g. From cjfields at uiuc.edu Wed Jul 11 09:40:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Jul 2007 08:40:54 -0500 Subject: [Bioperl-l] extra hook functionality for svn repos? In-Reply-To: <18068.55380.626778.486775@almost.alerce.com> References: <18068.55380.626778.486775@almost.alerce.com> Message-ID: On Jul 11, 2007, at 8:17 AM, George Hartzell wrote: > > There are a bunch of "contributed" hook scripts at > > http://subversion.tigris.org/tools_contrib.html#hook_scripts > > Given that many bioperl users depend on case-preserving but > case-insensitive file systems, I'm wondering if hooking up the > case-insensitive.py script might be worthwhile. I'm not sure how often we run into this, though. Anyone know? > Likewise, the check-mime-type.pl script might help us keep > svn:mime-type and svn:eol-style properties up to date. The latter two might be nice. I thought we planned on defaulting to a simple 'plain text' mime type on commits if it isn't specifically predefined, but maybe this way is better? > There are others there, but none that I found interesting. > > How big-brother do we want the repository to be? > > g. 'Friendly' big-brother, not 'dystopian' big-brother. chris From marian.thieme at lycos.de Wed Jul 11 05:05:18 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 09:05:18 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178019848@lycos-europe.com> An HTML attachment was scrubbed... 
URL:

From dmessina at wustl.edu Wed Jul 11 16:14:17 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 11 Jul 2007 15:14:17 -0500
Subject: [Bioperl-l] submitting code
In-Reply-To: <188661178019848@lycos-europe.com>
References: <188661178019848@lycos-europe.com>
Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu>

Hi Marian,

Thanks so much for contributing! The best way would be to create a Bugzilla ticket and then attach the code to that ticket. One of the developers will check it in and give you feedback if there are any little tweaks that would be helpful*.

Would you be able to include documentation and test cases with your module?

Dave

* For more info:
http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F
http://www.bioperl.org/wiki/Developer_Information
http://www.bioperl.org/wiki/Becoming_a_developer
http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html

--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO

From marian.thieme at lycos.de Wed Jul 11 11:12:20 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 15:12:20 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178030343@lycos-europe.com>

An HTML attachment was scrubbed...
URL:

From e-just at northwestern.edu Thu Jul 12 10:37:03 2007
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 12 Jul 2007 09:37:03 -0500
Subject: [Bioperl-l] Job opening in Chicago
Message-ID:

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago) for a Bioinformatics Software Engineer. This job involves writing and maintaining software for a genome database using Chado/OO-Perl/Bioperl and many other state of the art technologies.
For more information please see: http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric

From cjfields at uiuc.edu Thu Jul 12 12:09:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Jul 2007 11:09:02 -0500
Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question
Message-ID:

I have been running into some GFF formatting issues where the attributes column is left undef (no '.'), which causes GFF3Loader::parse_attributes() to complain with a 'use of undefined string with split' warning. Would it be okay with the powers that be (Scott, Lincoln) to add a warning or exception there? I'm guessing a warning is better in this case, as just returning works fine.

chris

From jason at bioperl.org Fri Jul 13 13:30:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 13:30:05 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.59384.247108.463648@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com>
Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>

I'll try and look into this and other stuff with the migration in the next week or so - maybe we'll make some time to talk it through during BOSC. I don't know yet when I'll actually have time to think about it properly.

I am still worried about doing https because of the current system we have for supporting user logins, because we didn't want to run a web server on the main repository machine, and because we'd have to install DAV on that machine. If svn+ssh is going to be enough of a hurdle for people, note that it was already a hurdle for them with CVS, but we'll have to think a bit more on it.

We might be able to do some sort of NFS (or other exported FS) export to the webserver machine, but that may be a recipe for disaster.
-jason On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > Chris Fields writes: >> [...] >> We prob. need to schedule a specific day/time when the switchover >> would take place so we can announce (so everyone knows and no one can >> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >> some tools a while ago... > > I haven't done anything about it. > > I think that we also need to have some input from the admin/support > folk about access methods (https, etc...). > > Are we going to want to mirror the repository anywhere? > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Fri Jul 13 14:29:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 13:29:22 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu> I don't think there's a huge rush on this since BOSC is imminent. If devs really want https then we can try adding it after migration, but if it becomes too much of a headache (particularly for the web admins) I wouldn't worry about it. chris On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. 
> > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > > We might be able to do some sort of NFS (or other exported FS) but > exported to the webserver machine but that is may be a recipe for > disaster. > > -jason > On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > >> Chris Fields writes: >>> [...] >>> We prob. need to schedule a specific day/time when the switchover >>> would take place so we can announce (so everyone knows and no one >>> can >>> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >>> some tools a while ago... >> >> I haven't done anything about it. >> >> I think that we also need to have some input from the admin/support >> folk about access methods (https, etc...). >> >> Are we going to want to mirror the repository anywhere? >> >> g. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sheris at eps.berkeley.edu Fri Jul 13 14:42:32 2007 From: sheris at eps.berkeley.edu (Sheri Simmons) Date: Fri, 13 Jul 2007 11:42:32 -0700 Subject: [Bioperl-l] Problem with Bio::PopGen::Individual Message-ID: <200707131142.32366.sheris@eps.berkeley.edu> Hi, I have a collection of sequencing reads aligned with a consensus sequence that I input into a Bio::PopGen::Population object in order to calculate allele frequencies. The consensus sequence is included to force clustalw to give a better alignment. However, I need to remove the consensus sequence before calculating allele frequencies in the individual reads. I'm having trouble with this part of it. I get the following error message: "Can't locate object method "person_id" via package "Bio::PopGen::Individual" at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49." Here is the code snippet producing the error. $pop is a Bio::PopGen::Population object. my @consensus = "gene_consensus"; $pop->remove_Individuals(@consensus); I also tried: my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); $pop->remove_Individuals(@consensus); which produced the same error. Can anyone send me in the right direction? I suspect this is a simple problem. Sheri -- Sheri Simmons Department of Earth and Planetary Sciences University of California, Berkeley Berkeley, CA 94720-4767 From jason at bioperl.org Fri Jul 13 16:17:31 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 16:17:31 -0400 Subject: [Bioperl-l] Problem with Bio::PopGen::Individual In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu> References: <200707131142.32366.sheris@eps.berkeley.edu> Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org> Hi Sheri - Shoot - that was my fault - bug in the code where I was only using "Person" not Individuals for the code when I was testing. 
I've committed a bugfix to CVS - do you need me to send you the updated file, or are you comfortable grabbing the code from CVS or http://code.open-bio.org?

This is the change - you may have a different version of BioPerl than what is in CVS, so you may have to make the change on line 260 rather than 282 -- or you can upgrade to the latest code via CVS (although this is probably harder for you since you've got stuff installed in /usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
< unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
> unshift @tosplice, $i if( $namehash{$ind->unique_id} );

-jason

On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus
> sequence that I input into a Bio::PopGen::Population object in order
> to calculate allele frequencies. The consensus sequence is included
> to force clustalw to give a better alignment. However, I need to
> remove the consensus sequence before calculating allele frequencies
> in the individual reads. I'm having trouble with this part of it. I
> get the following error message:
>
> "Can't locate object method "person_id" via package
> "Bio::PopGen::Individual"
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> my @consensus = "gene_consensus";
> $pop->remove_Individuals(@consensus);
>
> I also tried:
> my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus");
> $pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right
> direction? I suspect this is a simple problem.
> > Sheri > > -- > Sheri Simmons > Department of Earth and Planetary Sciences > University of California, Berkeley > Berkeley, CA 94720-4767 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From hartzell at alerce.com Fri Jul 13 16:34:14 2007 From: hartzell at alerce.com (George Hartzell) Date: Fri, 13 Jul 2007 16:34:14 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <18071.57798.130368.703488@almost.alerce.com> Jason Stajich writes: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. > > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > [...] How are you thinking about providing anonymous readonly non-dev access to the repository? svn+ssh using an anonymous/guest account (can it be screwed down tightly enough?) svn-mirror the repo onto the public machine and do DAV there w/out having to worry about authenticating the devs? g. 
From jason at bioperl.org Fri Jul 13 17:33:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 17:33:29 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18071.57798.130368.703488@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> <18071.57798.130368.703488@almost.alerce.com> Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org> On Jul 13, 2007, at 4:34 PM, George Hartzell wrote: > Jason Stajich writes: >> I'll try and look into this and other stuff with the migration in >> next week or so - maybe we'll make some time to talk it through >> during BOSC. I don't know yet when I'll actually have time to think >> about it properly. >> >> I am still worried about doing https because of the current system we >> have supporting user logins and that we didn't want to run a web >> server on the main repository machine and we'll have to install DAV >> on the main repository machine. if ssh+svn is going to be sufficient >> hurdle for people, note it was already a hurdle for them with CVS, >> but we'll have to think a bit more on it. >> [...] > > How are you thinking about providing anonymous readonly non-dev access > to the repository? svn+ssh using an anonymous/guest account (can it > be screwed down tightly enough?) svn-mirror the repo onto the public > machine and do DAV there w/out having to worry about authenticating > the devs? > We'll do svn on the public anonymous machine like we already do with CVS and with SVN See: http://code.open-bio.org AND http://code.open-bio.org/svnweb/ See blipkit. -jason > g. 
> > -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From scrosson at uchicago.edu Fri Jul 13 18:15:30 2007 From: scrosson at uchicago.edu (Sean Crosson) Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC) Subject: [Bioperl-l] ace to fasta conversion Message-ID: I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta and it works great. We're now trying to convert a big (250 MB) .ace file to fasta. The documentation suggests I can do this, but everytime I run the script below, it outputs an empty .fas file. Does anyone have any suggestions on how to make this script work? Does SeqIO really convert between these file types? Thanks for your help. #!/usr/bin/perl -w use Bio::SeqIO; $in = Bio::SeqIO->new(-file => "454Contigs.ace", -format => 'ace'); $out = Bio::SeqIO->new(-file => ">454Contigs.fas", -format => 'fasta'); while ( $seq = $in->next_seq() ) {$out->write_seq($seq); } From cvillamar at gmail.com Fri Jul 13 19:24:04 2007 From: cvillamar at gmail.com (Carlos Villacorta) Date: Fri, 13 Jul 2007 16:24:04 -0700 Subject: [Bioperl-l] beginner problem with fasta headers Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> hi all, I have a embl sequence file, when formatting to fasta with Seqio it gives a long string header for each sequence that my following phylogenetic software cannot handle... Does anyone knows how to format those embl or genbank files to fasta but retrieving in the headers just two or three fields (e.g. id | gene | sp_name)? Any advice with this problem would be very appreciated, thanks! From j_martin at lbl.gov Fri Jul 13 20:05:45 2007 From: j_martin at lbl.gov (Joel Martin) Date: Fri, 13 Jul 2007 17:05:45 -0700 Subject: [Bioperl-l] ace to fasta conversion In-Reply-To: References: Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org> Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all. 
You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote: > I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta > and it works great. We're now trying to convert a big (250 MB) .ace file to > fasta. The documentation suggests I can do this, but everytime I run the script > below, it outputs an empty .fas file. Does anyone have any suggestions on how > to make this script work? Does SeqIO really convert between these file types? > Thanks for your help. > > #!/usr/bin/perl -w > > use Bio::SeqIO; > > > $in = Bio::SeqIO->new(-file => "454Contigs.ace", > -format => 'ace'); > $out = Bio::SeqIO->new(-file => ">454Contigs.fas", > -format => 'fasta'); > while ( $seq = $in->next_seq() ) {$out->write_seq($seq); } > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Sat Jul 14 00:06:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 23:06:27 -0500 Subject: [Bioperl-l] beginner problem with fasta headers In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu> Some reading material... http://www.bioperl.org/wiki/ FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files http://www.bioperl.org/wiki/ FAQ#I_would_like_to_make_my_own_custom_fasta_header_- _how_do_I_do_this.3F http://www.bioperl.org/wiki/FASTA_sequence_format#Note Quiz on Monday! chris On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote: > hi all, > I have a embl sequence file, when formatting to fasta with Seqio it > gives a long string header for each sequence that my following > phylogenetic software cannot handle... 
> Does anyone knows how to format those embl or genbank files to fasta > but retrieving in the headers just two or three fields (e.g. id | gene > | sp_name)? > Any advice with this problem would be very appreciated, thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scrosson at uchicago.edu Fri Jul 13 23:43:59 2007 From: scrosson at uchicago.edu (scrosson) Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT) Subject: [Bioperl-l] ace to fasta conversion In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org> References: <20070714000544.GB29841@eniac.jgi-psf.org> Message-ID: <11590811.post@talk.nabble.com> This problem now makes sense. I've been playing with Bio::Assembly::IO, which does indeed read phrap .ace files. Does anyone have an idea how to pull the assembled contigs out of a Bio::Assembly object and write them out as multi-fasta (or strings for that matter)? None of our workstations are running phrap/consed and I'd love to see these contigs. Sean Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all. You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel -- View this message in context: http://www.nabble.com/ace-to-fasta-conversion-tf4077370.html#a11590811 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bioperlanand at yahoo.com Sat Jul 14 13:55:53 2007 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT) Subject: [Bioperl-l] a question on obtain PDB records using bioperl Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com> Hi everybody, Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to Bio:Perl methods to retrieve EMBL or GenBank records. Thanks in advance, Anand
From johnsonm at gmail.com Tue Jul 17 14:23:58 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 17 Jul 2007 13:23:58 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? Message-ID: I'm tinkering with parsing iprscan reports with BioPerl. I noticed that this: my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro'); while (my $seq = $seqio->next_seq()) { ... } Does not work unless I first 'use XML::DOM::XPath'. I get this error: Can't locate object method "findnodes" via package "XML::DOM::Document" at bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line 30. I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to suck in XML::DOM::Xpath. I see that t/interpro.t requires XML::DOM::XPath: test_begin(-tests => 17, -requires_module => 'XML::DOM::XPath'); Is suppose the reason the test specs a require XML::DOM::XPath is so that tests can be skipped if XML::DOM::XPath is not available. Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? From sac at bioperl.org Tue Jul 17 15:49:32 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 17 Jul 2007 12:49:32 -0700 Subject: [Bioperl-l] Ohloh account for bioperl Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> I came across a web app that tracks various metrics for open source projects, noticed that bioperl wasn't listed, and added it: http://www.ohloh.net/projects/6685 Seems like an interesting resource that could help add some visibility. It creates metrics by directly processing the source code repository. I hooked it up to the CVS repos for bioperl-live, -db, -run, and -pipeline. It has yet to do its analysis at this point. Feel free to create Ohloh accounts for yourselves.
When you add yourself as a contributor to Bioperl, you can indicate the username associated with your commits, but this requires that it first process the commit logs to figure out what the usernames are. You can still create an account, just update it later with your username. Steve From cjfields at uiuc.edu Tue Jul 17 17:04:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 17 Jul 2007 16:04:44 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? In-Reply-To: References: Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > I'm tinkering with parsing iprscan reports with BioPerl. I noticed > that this: > > my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > 'interpro'); > > while (my $seq = $seqio->next_seq()) { > ... > } > > Does not work unless I first 'use XML::DOM::XPath'. I get this error: > > Can't locate object method "findnodes" via package > "XML::DOM::Document" at > bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > 30. > > I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > suck in XML::DOM::Xpath. I see that t/interpro.t requires > XML::DOM::XPath: > > test_begin(-tests => 17, > -requires_module => 'XML::DOM::XPath'); > > Is suppose the reason the test specs a require XML::DOM::XPath is so > that tests can be skipped if XML::DOM::XPath is not available. > Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? You're right; I think tests passed b/c XML::DOM::XPath (if present), was eval'd as a required module. When I commented out the spot where it is eval'd in the test suite I can replicate this error. I have added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it passes fine. Thanks for the heads up! 
chris From xianranli78 at yahoo.com.cn Wed Jul 18 01:55:19 2007 From: xianranli78 at yahoo.com.cn (Xianran Li) Date: Wed, 18 Jul 2007 13:55:19 +0800 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> Hi, I want to extract some infomation from the gff3 file like: 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? Thanks for your help. Xianran Li From georg.otto at tuebingen.mpg.de Wed Jul 18 05:32:26 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Wed, 18 Jul 2007 11:32:26 +0200 Subject: [Bioperl-l] run megablast Message-ID: Hi, is there a module to run megablast in a script (equivalent to ncbi blast in StandAloneBlast.pm)? Cheers, Georg From jeevitesh at ibab.ac.in Wed Jul 18 06:03:24 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D we need to find the shared path between AB and AC is 2. and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. 
With Thanks & regards jeevitesh From jeevitesh at ibab.ac.in Wed Jul 18 03:15:33 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Wed, 18 Jul 2007 12:45:33 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <55933.192.168.1.125.1184742933.squirrel@webmail.ibab.ac.in> Hi Friends, we need your valuable help in finding the SHARED PATH BETWEEN TWO NODES OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES. Any comment on this will be greatly appreciated. With Thanks & regards jeevitesh From jeevitesh at ibab.ac.in Wed Jul 18 04:45:50 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Wed, 18 Jul 2007 14:15:50 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <43613.192.168.1.125.1184748350.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D we need to find the shared path between AB and AC is 2. and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. 
With Thanks & regards jeevitesh From cain.cshl at gmail.com Wed Jul 18 09:10:40 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 09:10:40 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> Message-ID: <1184764240.2570.31.camel@localhost.localdomain> Hi Xianran Li, Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing as Bio::DB::GFF3), then you can use the attributes method to get anything in the ninth column: my ($name) = $gene->attributes('Name'); The parenthesis are needed around $name because the attributes method returns a list and the parens capture the first item of the list into $name. Scott On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > Hi, > > I want to extract some infomation from the gff3 file like: > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? > > Thanks for your help. > > > Xianran Li > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From johnsonm at gmail.com Wed Jul 18 16:53:00 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 18 Jul 2007 15:53:00 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? 
In-Reply-To: <469DB6C6.9010702@pasteur.fr> References: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> <469DB6C6.9010702@pasteur.fr> Message-ID: The output from InterProScan, invoked thusly: iprscan -cli -seqtype p -i input_file -o output_file -format xml On 7/18/07, Emmanuel Quevillon wrote: > Hi guys, > > I read your email and I wondered which iprscan file you've > been talking about? Is it the file produced by InterProScan > or the file called match.xml representing the whole uniprot > database against InterPro? Reading the xml parser > implemented into Bio::SeqIO::interpro, I guess it is the > second one? > In such case, I just want to let you know that the xml > schema changed and the file name also. It is now called > match_complete.xml. > I attached the DTD to be able to see the new structure. > Here is an example of the new data representation. > > > crc64="F1DD0C1042811B48"> > name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D" > status="T" evd="HMMPfam"> > type="Domain" /> > > > dbname="PANTHER" status="T" evd="not_rel"> > > > > > As you can see some time there is no interpro info (no ipr > element). > > I think it would be good to change also the interpro parser ? > > Regards > > Emmanuel > > Chris Fields wrote: > > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > > > >> I'm tinkering with parsing iprscan reports with BioPerl. I noticed > >> that this: > >> > >> my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > >> 'interpro'); > >> > >> while (my $seq = $seqio->next_seq()) { > >> ... > >> } > >> > >> Does not work unless I first 'use XML::DOM::XPath'. I get this error: > >> > >> Can't locate object method "findnodes" via package > >> "XML::DOM::Document" at > >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > >> 30. > >> > >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > >> suck in XML::DOM::Xpath. 
I see that t/interpro.t requires > >> XML::DOM::XPath: > >> > >> test_begin(-tests => 17, > >> -requires_module => 'XML::DOM::XPath'); > >> > >> Is suppose the reason the test specs a require XML::DOM::XPath is so > >> that tests can be skipped if XML::DOM::XPath is not available. > >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? > > > > You're right; I think tests passed b/c XML::DOM::XPath (if present), > > was eval'd as a required module. When I commented out the spot where > > it is eval'd in the test suite I can replicate this error. I have > > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it > > passes fine. > > > > Thanks for the heads up! > > > > chris > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cain.cshl at gmail.com Wed Jul 18 22:47:53 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 22:47:53 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> <1184764240.2570.31.camel@localhost.localdomain> <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> Message-ID: <1184813273.2570.96.camel@localhost.localdomain> [Please always reply to the mailing list so that answers can archived] Yes, because commas are not allowed in GFF3 in an unescaped form. Essentially, you are doing this with your GFF3: Name=receptor kinase ORK10;Name= putative and when you do this: my ($name) = $gene->attributes('Name'); you are getting the first item in the list of names, and I suspect which one you get is random. To fix it, you need to replace the comma with %2C (the URL escape code for a comma). If you generated this GFF3, you will need to add a step to URI encode your attribute strings. 
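A minimal sketch of the escaping step described above, in plain Perl with no BioPerl dependency. The helper names escape_gff3_value and unescape_gff3_value are hypothetical, chosen only for illustration, and the set of escaped characters (percent, semicolon, equals, ampersand, comma, tab, newline, carriage return) is an assumption about which characters are reserved in column 9; it should be checked against the GFF3 specification before being relied on.

```perl
use strict;
use warnings;

# Percent-encode characters assumed to be reserved inside a GFF3
# attribute value (hypothetical helper, not part of BioPerl).
sub escape_gff3_value {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Reverse the encoding when reading the value back.
sub unescape_gff3_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Rebuild the ninth column from the example in this thread.
my $name       = 'receptor kinase ORK10, putative';
my $attributes = join ';',
    'ID=12001.t00153',
    'Name=' . escape_gff3_value($name);

print "$attributes\n";
# prints: ID=12001.t00153;Name=receptor kinase ORK10%2C putative
```

With the comma written as %2C, the Name attribute stays a single value, so a call such as $gene->attributes('Name') should no longer see two separate names.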
If you got it from someone else, you should point out to them that their GFF is flawed. Scott On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote: > However, the $name return the string "putative" rather than "receptor kinase ORK10". Is any particular reason? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing > as Bio::DB::GFF3), then you can use the attributes method to get > anything in the ninth column: > > my ($name) = $gene->attributes('Name'); > > The parenthesis are needed around $name because the attributes method > returns a list and the parens capture the first item of the list into > $name. > > Scott > > > On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > > Hi, > > > > I want to extract some infomation from the gff3 file like: > > > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > > > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? > > > > Thanks for your help. > > > > > > Xianran Li > ----- Original Message ----- > From: "Scott Cain" > To: "Xianran Li" > Cc: > Sent: Wednesday, July 18, 2007 9:10 PM > Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 fromgff3 file > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From acutter at eeb.utoronto.ca Thu Jul 19 22:25:08 2007 From: acutter at eeb.utoronto.ca (Asher Cutter) Date: Thu, 19 Jul 2007 22:25:08 -0400 Subject: [Bioperl-l] tree comparisons with bioperl Message-ID: <46A01D04.5040209@eeb.utoronto.ca> I was reading over the functions for working with trees in bioperl. I am looking for something that will compare two topologies and report back if they are equivalent. i.e. something like: does ((a,(b,c)) == ((A,B),C) ? (in this case, no) But of course in reality they would be more complicated topologies. This would be useful for simulating random trees to compare with some given topology of interest. I saw the methods for testing for monophyly and paraphyly, but not much beyond that...perhaps I have missed something? Any suggestions? Thanks, Asher -- ___________________________________ Asher D. Cutter Assistant Professor Department of Ecology & Evolutionary Biology University of Toronto 25 Harbord St. Toronto, ON, M5S 3G5 tel: 416-978-4602 email: acutter at eeb.utoronto.ca http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130 ___________________________________ From jeevitesh at ibab.ac.in Fri Jul 20 00:25:22 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. 
and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. With Thanks & regards jeevitesh From n.haigh at sheffield.ac.uk Sun Jul 22 07:34:58 2007 From: n.haigh at sheffield.ac.uk (Nathan S Haigh) Date: Sun, 22 Jul 2007 12:34:58 +0100 Subject: [Bioperl-l] Ohloh account for bioperl In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> Message-ID: <46A340E2.4040505@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Steve Chervitz wrote: > I came across a web app that tracks various metrics for open source > projects, noticed that bioperl wasn't listed, and added it: > > http://www.ohloh.net/projects/6685 > > Seems like an interesting resource that could help add some > visibility. It creates metrics by directly processing the source code > repository. I hooked it up to the CVS repos for bioperl-live, -db, > -run, and -pipeline. It has yet to do its analysis at this point. > > Feel free to create Ohloh accounts for yourselves. When you add > yourself as a contributor to Bioperl, you can indicate the username > associated with your commits, but this requires that it first process > the commit logs to figure out what the usernames are. You can still > create an account, just update it later with your username. > > Steve > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Nice to see the graphs of number of commits each developer has made over the last 5 years and how new developers have arisen while those more "seasoned" developers can relax a little more -proof of an excellent open source project! 
Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO 4JWvG5Gy+H/UqpeXYAcSCX0= =LrFt -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sun Jul 22 23:53:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 22 Jul 2007 22:53:48 -0500 Subject: [Bioperl-l] run megablast In-Reply-To: References: Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> StandAloneBlast runs the megablast executable directly, though I think you can specify a MegaBlast search using blastall with the '-n' flag. We could probably add this functionality in fairly easily since SearchIO can parse megablast output; no one's had the need to code it yet. chris On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > > Hi, > > is there a module to run megablast in a script (equivalent to ncbi > blast in StandAloneBlast.pm)? > > Cheers, > > Georg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jeevitesh at ibab.ac.in Mon Jul 23 06:34:36 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. and for AC and BD the shared path is 6. 
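One possible reading of the question, sketched in plain Perl: take the path between each pair of leaves and sum the lengths of the branches that lie on both paths. The tree is hard-coded from the figure, the internal node names X and Y are hypothetical, and the helpers path_edges and shared_distance are made up for illustration; nothing here uses Bio::TreeIO, and the interpretation itself is an assumption rather than a confirmed statement of what was asked.

```perl
use strict;
use warnings;

# Tree from the figure: A and B join at internal node X, C and D join
# at internal node Y, and X--Y is the central branch of length 6.
my %edges = (
    'A-X' => 2, 'B-X' => 2,
    'C-Y' => 2, 'D-Y' => 2,
    'X-Y' => 6,
);

# Build an adjacency table from the edge list.
my %adj;
for my $edge (keys %edges) {
    my ($u, $v) = split /-/, $edge;
    $adj{$u}{$v} = $adj{$v}{$u} = $edges{$edge};
}

# Breadth-first search; returns the edges on the path between two
# nodes as keys of %edges (endpoints in sorted order).
sub path_edges {
    my ($from, $to) = @_;
    my %prev  = ($from => undef);
    my @queue = ($from);
    while (@queue) {
        my $node = shift @queue;
        last if $node eq $to;
        for my $next (sort keys %{ $adj{$node} }) {
            next if exists $prev{$next};
            $prev{$next} = $node;
            push @queue, $next;
        }
    }
    my @path;
    for (my $n = $to; defined $prev{$n}; $n = $prev{$n}) {
        push @path, join '-', sort { $a cmp $b } ($n, $prev{$n});
    }
    return @path;
}

# Total length of the branches that occur on both paths.
sub shared_distance {
    my ($pair1, $pair2) = @_;
    my %on_first = map { $_ => 1 } path_edges(@$pair1);
    my $total = 0;
    $total += $edges{$_} for grep { $on_first{$_} } path_edges(@$pair2);
    return $total;
}

print shared_distance(['A', 'B'], ['A', 'C']), "\n";   # prints 2
print shared_distance(['A', 'C'], ['B', 'D']), "\n";   # prints 6
```

Under this reading the two numbers in the example come out as stated: AB and AC share only the branch leading to A, while AC and BD share only the central branch.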
We need to find the shared distance as said above. Kindly helps us it will help our research a lot. With Thanks & regards jeevitesh From bix at sendu.me.uk Mon Jul 23 07:08:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 23 Jul 2007 12:08:23 +0100 Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Message-ID: <46A48C27.6060905@sendu.me.uk> jeevitesh at ibab.ac.in wrote: > Hi Friends, > > We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF > A TREE. Please stop sending this message. We heard you the first time. If no one answered, either no one knows the answer or no one understood you. > The Distance method of TreeIO in Bioperl module gives the total distance. > > But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as > illustrated > in figure. > > Suppose we have a tree > A C > \ / > \2 2/ > \__________/ > / 6 \ > /2 2\ > / \ > B D > > The shared path between AB and AC is 2. > and for AC and BD the shared path is 6. I don't follow. But if you already know how to work the answer out, describe the algorithm in words and maybe someone can code it up for you. From georg.otto at tuebingen.mpg.de Mon Jul 23 09:56:46 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Mon, 23 Jul 2007 15:56:46 +0200 Subject: [Bioperl-l] run megablast References: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> Message-ID: Thanks a lot! I guess I should have read the blast documentation more carefully.... Best, Georg Chris Fields writes: > StandAloneBlast runs the megablast executable directly, though I > think you can specify a MegaBlast search using blastall with the '-n' > flag. > > We could probably add this functionality in fairly easily since > SearchIO can parse megablast output; no one's had the need to code it > yet. 
> > chris > > On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > >> >> Hi, >> >> is there a module to run megablast in a script (equivalent to ncbi >> blast in StandAloneBlast.pm)? >> >> Cheers, >> >> Georg >> From cjfields at uiuc.edu Mon Jul 23 11:41:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 23 Jul 2007 10:41:35 -0500 Subject: [Bioperl-l] Bio::Assembly bug/feature? Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu> To all: I think I have found a major problem with Bio::Assembly; this was first noticed on Mac OS X in relation to bug 2320 and Bio::Assembly::IO. I am uncertain whether this is meant to be a feature or a bug but it certainly needs to be documented or fixed as it leads to subtle errors. I also can't see the advantage of this approach, but maybe I can be enlightened? Either way, I think it's worth a discussion for those willing to follow. I'll add as a bug later if needed. A bit of background: each instance of a Bio::Assembly::Contig has a Bio::SeqFeature::Collection instance attached to it; each Bio::SeqFeature::Collection itself has a tied DB_File handle attached which remains open during the lifetime of the Bio::SF::Collection object. When using Bio::Assembly one adds the various Contig objects to a Bio::Assembly::Scaffold. So, for instance, if one had ~1000 Contigs in a Scaffold, one would also have ~1000 open tied db handles, one per Contig instance. So far, so good. Unfortunately, when adding a ton of Contig objects to a Bio::Assembly::Scaffold one can run into a host of system-dependent issues based on resource usage limits (as one might expect). 
This script: ------------------------------ use Bio::Assembly::Scaffold; use Bio::Assembly::Contig; use Bio::SeqFeature::Generic; my $scaffold = Bio::Assembly::Scaffold->new(); for my $id (1..15000) { print "Contig #$id\n"; my $contig = Bio::Assembly::Contig->new(-id => $id); my $feat = Bio::SeqFeature::Generic->new(-start=>1, -end=>10, -strand=>1); $contig->add_features([$feat]); $scaffold->add_contig($contig); } ------------------------------ may fail on Mac OS X when one reaches the maximum number of open file descriptors possible on Mac OS X (on UNIX'y systems, this is 'ulimit - n'); the call to tie the DB_File handle in SF::Collection fails silently, so later on when called on you get the following: ... Contig #251 Contig #252 Contig #253 Contig #254 Can't call method "put" on an undefined value at /Users/cjfields/src/ bioperl-live/Bio/SeqFeature/Collection.pm line 225. I have added an exception to catch this. On Mac OS X you can increase the file descriptor limit using ulimit, at least to a certain point. However, when testing this out on dev.open-bio.org (Linux) the 'tie' sometimes fails (and the exception pops up), but it isn't dependent on 'ulimit -n'. This is what happens more often: ... Contig #10567 Contig #10568 Contig #10569 Contig #10570 Out of memory! Sometimes followed by a seg fault. Ick! Any ideas? For instance, should we set this up so that one SF::Collection is used for all the Contigs (since each one has a unique ID anyway)? Leave as is and document/track the issue as a bug? Both? chris From ba6450 at wayne.edu Mon Jul 23 16:06:14 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT) Subject: [Bioperl-l] error running codeml Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu> Hello everyone: I am new to bioperl. I am running perl in Eclipse in Windows. 
Here is the code: [code] use Bio::Tools::Run::Phylo::PAML::Codeml; use Bio::AlignIO; use Bio::TreeIO; my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'NM_000034.CDSalign.paml'); my $aln = $alignio->next_aln; my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt'); my $tree = $treeio->next_tree; my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(); $codeml->alignment($aln); $codeml->tree($tree); my ($rc,$parser) = $codeml->run(); my $result = $parser->next_result; my $MLmatrix = $result->get_MLmatrix(); print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n"; print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n"; print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n"; [/code] It gives the following error when I try to compile: [error] ------------ EXCEPTION: Bio::Root::Exception ------------- MSG: unable to find or run executable for 'codeml' STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572 ----------------------------------------------------------- Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898 [/error] Any idea, guys? Munirul Islam Phd Student Computer Science Wayne State University From arareko at campus.iztacala.unam.mx Mon Jul 23 17:19:24 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 23 Jul 2007 16:19:24 -0500 Subject: [Bioperl-l] error running codeml In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu> References: <20070723160614.EEU90041@mirapointms6.wayne.edu> Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx> Apparently, your script isn't able to locate the codeml executable in your Windows environment. Do you have the PAML package installed? Instructions on how to install it are located here: http://abacus.gene.ucl.ac.uk/software/paml.html Regards, Mauricio. 
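A quick way to check the diagnosis above before rerunning the BioPerl script is to look for the codeml executable directly. The sketch below uses only core Perl; find_executable is a hypothetical helper, and searching the PAMLDIR environment variable first is an assumption based on the follow-up later in this thread, where setting PAMLDIR resolved the error.

```perl
use strict;
use warnings;
use File::Spec;

# Look for a program in a list of directories (hypothetical helper).
# On Windows the file may carry an .exe suffix, so both names are tried.
sub find_executable {
    my ($name, @dirs) = @_;
    my @candidates = $^O eq 'MSWin32' ? ("$name.exe", $name) : ($name);
    for my $dir (grep { defined $_ && length $_ } @dirs) {
        for my $file (@candidates) {
            my $path = File::Spec->catfile($dir, $file);
            return $path if -f $path && ($^O eq 'MSWin32' || -x $path);
        }
    }
    return;
}

# Search PAMLDIR first (if set), then every directory on the PATH.
my $codeml = find_executable('codeml', $ENV{PAMLDIR}, File::Spec->path);

print defined $codeml
    ? "codeml found at $codeml\n"
    : "codeml not found; install PAML or set PAMLDIR to its bin directory\n";
```

If this reports that codeml is missing, the exception from Bio::Tools::Run::Phylo::PAML::Codeml is expected, and the fix lies in the PAML installation or the environment rather than in the script.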

Munirul Islam wrote:
> Hello everyone:
>
> I am new to bioperl. I am running perl in Eclipse in Windows. Here is the code:
>
> [code]
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::AlignIO;
> use Bio::TreeIO;
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file => 'NM_000034.CDSalign.paml');
>
> my $aln = $alignio->next_aln;
>
> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> my $tree = $treeio->next_tree;
>
> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
>
> $codeml->alignment($aln);
> $codeml->tree($tree);
>
> my ($rc,$parser) = $codeml->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> [/code]
>
> It gives the following error when I try to compile:
>
> [error]
> ------------ EXCEPTION: Bio::Root::Exception -------------
> MSG: unable to find or run executable for 'codeml'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> -----------------------------------------------------------
> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> [/error]
>
> Any idea, guys?
>
> Munirul Islam
> Phd Student
> Computer Science
> Wayne State University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From ba6450 at wayne.edu Mon Jul 23 19:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. I needed to add an environment variable for the paml directory.

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin';

One question ... I would like to save the temp files. So, what
modification do I need to make such that $obj->save_tempfiles returns 1
within codeml.pm?

Regards
Munir

---- Original message ----
>Date: Mon, 23 Jul 2007 16:19:24 -0500
>From: Mauricio Herrera Cuadra
>Subject: Re: [Bioperl-l] error running codeml
>To: Munirul Islam
>Cc: bioperl-l at lists.open-bio.org
>
>Apparently, your script isn't able to locate the codeml executable in
>your Windows environment. Do you have the PAML package installed?
>Instructions on how to install it are located here:
>
>http://abacus.gene.ucl.ac.uk/software/paml.html
>
>Regards,
>Mauricio.
>
>Munirul Islam wrote:
>> Hello everyone:
>>
>> I am new to bioperl. I am running perl in Eclipse in Windows.
>> Here is the code:
>>
>> [code]
>> use Bio::Tools::Run::Phylo::PAML::Codeml;
>> use Bio::AlignIO;
>> use Bio::TreeIO;
>>
>> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>>                                 -file => 'NM_000034.CDSalign.paml');
>>
>> my $aln = $alignio->next_aln;
>>
>> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
>> my $tree = $treeio->next_tree;
>>
>> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
>>
>> $codeml->alignment($aln);
>> $codeml->tree($tree);
>>
>> my ($rc,$parser) = $codeml->run();
>> my $result = $parser->next_result;
>> my $MLmatrix = $result->get_MLmatrix();
>> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
>> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
>> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
>> [/code]
>>
>> It gives the following error when I try to compile:
>>
>> [error]
>> ------------ EXCEPTION: Bio::Root::Exception -------------
>> MSG: unable to find or run executable for 'codeml'
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
>> -----------------------------------------------------------
>> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
>> [/error]
>>
>> Any idea, guys?
>>
>> Munirul Islam
>> Phd Student
>> Computer Science
>> Wayne State University
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>--
>MAURICIO HERRERA CUADRA
>arareko at campus.iztacala.unam.mx
>Laboratorio de Genética
>Unidad de Morfofisiología y Función
>Facultad de Estudios Superiores Iztacala, UNAM
>

From jason at bioperl.org Tue Jul 24 03:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
	<46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

When you initialize the Codeml object, just pass in

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do

$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well, and you can print out where the
tempdir is located with

print $codeml->tempdir;

and I think you can get the temp outfile:

my $name = $codeml->outfile_name;
print "name is $name\n";

-jason

On 7/23/07, Mauricio Herrera Cuadra wrote:
>
> Apparently, your script isn't able to locate the codeml executable in
> your Windows environment. Do you have the PAML package installed?
> Instructions on how to install it are located here:
>
> http://abacus.gene.ucl.ac.uk/software/paml.html
>
> Regards,
> Mauricio.
>
>
> Munirul Islam wrote:
> > Hello everyone:
> >
> > I am new to bioperl. I am running perl in Eclipse in Windows.
> > Here is the code:
> >
> > [code]
> > use Bio::Tools::Run::Phylo::PAML::Codeml;
> > use Bio::AlignIO;
> > use Bio::TreeIO;
> >
> > my $alignio = Bio::AlignIO->new(-format => 'phylip',
> >                                 -file => 'NM_000034.CDSalign.paml');
> >
> > my $aln = $alignio->next_aln;
> >
> > my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> > my $tree = $treeio->next_tree;
> >
> > my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
> >
> > $codeml->alignment($aln);
> > $codeml->tree($tree);
> >
> > my ($rc,$parser) = $codeml->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> > print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> > print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> > print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> > [/code]
> >
> > It gives the following error when I try to compile:
> >
> > [error]
> > ------------ EXCEPTION: Bio::Root::Exception -------------
> > MSG: unable to find or run executable for 'codeml'
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> > STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> > -----------------------------------------------------------
> > Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> > [/error]
> >
> > Any idea, guys?
> >
> > Munirul Islam
> > Phd Student
> > Computer Science
> > Wayne State University
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From ba6450 at wayne.edu Tue Jul 24 17:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having problem loading a sequence file from within a directory.

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");

while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/; # skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;

    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus',
                                    -file => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;
    foreach $taxa (@sequencenames) {
        print $taxa . "\n";
    }
}
#############################################################

Your suggestions please.

Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA

From bix at sendu.me.uk Tue Jul 24 18:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
>
> I am having problem loading a sequence file from within a directory.
>
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/; # skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method.
You want File::Spec->catfile($dirname, $file).

From ba6450 at wayne.edu Tue Jul 24 20:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks. That worked nicely.

I need your suggestion to load codeml control data from a file.
Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
    -params => {'noisy'     => 9,
                'verbose'   => 2,
                'runmode'   => 0,
                'seqtype'   => 1,
                'CodonFreq' => 2,
                'aaDist'    => 0,
                'model'     => 2,
                'NSsites'   => 2,
                'icode'     => 0
               });
-------------------------------------------------------------

I tried to modify it by passing a hash reference after loading the data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
    -params => \%hashlist
    );
-------------------------------------------------------------

Still that didn't work.

Your suggestions pls.

Munir

---- Original message ----
>Date: Tue, 24 Jul 2007 23:39:33 +0100
>From: Sendu Bala
>Subject: Re: [Bioperl-l] error loading sequence
>To: Munirul Islam
>Cc: bioperl-l at lists.open-bio.org
>
>Munirul Islam wrote:
>> Hello everyone:
>>
>> I am having problem loading a sequence file from within a directory.
>>
>> #############################################################
>> $dirname = "rundir";
>> opendir (DIR, $dirname) || die("can't open $dirname");
>>
>> while (defined($file = readdir(DIR))) {
>>     next if $file =~ /^\.\.?$/; # skip . and ..
>>     $abs_path = File::Spec->rel2abs( $file ) ;
>>
>>     # gives a file not found exception for the following code
>
>This isn't a Bioperl problem. You're using the wrong File::Spec method.
>You want File::Spec->catfile($dirname, $file).

From ba6450 at wayne.edu Thu Jul 26 15:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt'). It runs fine when I directly run codeml.
But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format. Still that didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?

Thanks,

Munir
PhD Student
Computer Science
Wayne State University
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
URL:

From jason at bioperl.org Thu Jul 26 20:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try and pass in -interleaved => 0 as another option when you
init your AlignIO object.

On 7/26/07, Munirul Islam wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt'). It runs fine when I directly run codeml. But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file => 'seq.txt');
>
> I guess its not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format. Still that didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
> > Thanks, > > Munir > PhD Student > Computer Science > Wayne State University > > 11 2202 > > human > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC 
TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > chimp > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA 
GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC 
TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > macaca > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG > CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC > GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC > ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT > ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA 
--- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG > CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC > GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG --- > --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG > CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG > AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > mouse > GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC > ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG > CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA > AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA > GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC > TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG > GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC > TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA 
GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC > GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC > CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG > TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC > CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC > CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC > TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT > TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG > AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA > AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC > ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC > TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG > TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT --- > --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG > CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT > GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG > AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC > TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA 
GCC TAT --- TTC TGC CAT GGC AAA TTC > TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG > GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT > rat > GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC > ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG > CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA > AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA > GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC > TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC > TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC > GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC > CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA > TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT > CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT > CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC > TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT > TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG > CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA > AGG CCT CCA 
GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC > ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG > TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT --- > --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG > CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT > GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG > AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC > TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC > TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG > GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT > rabbit > GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG > AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC > ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG > CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC > CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG > GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC > TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC 
GAC GAA GAG CTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC > CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG > TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC > CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC > GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC > TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA > GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC > TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG > CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT > --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG > ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT > ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG > TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA > GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG --- > --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG > CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG > GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC > AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG > GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC > ACA 
CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG > GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT > dog > GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG > AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC > ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG > CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC > TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT > GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC > TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT > CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT > GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC > CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG > TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC > CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC > CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC > ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC > TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT > TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG > CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT 
CCT CGC CCT GAA CCT GAG CCA > CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC > ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC > ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG > CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC > AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG --- > --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT > GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT > AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG > GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC > ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG > GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT > cow > GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA > CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC > ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG > CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG > AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG > GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC > CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG > ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT > GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC > TTT CCG CCT GGC AAA 
GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT > CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC > TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC > GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG > TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC > TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC > ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG > CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA > CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC > ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC > ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC > CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT > AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG --- > --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT > GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG > TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT > AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG > GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG 
TTC CCC GGG GTG CCC ATT AGC > ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC > TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG > GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT > elephant > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- > --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC > ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA > AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG > GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG > ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG > GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC > TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG > TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC > TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC > GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC > CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG > TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC > CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN 
NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN --- > --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- --- > --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN > NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- --- > --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN --- > --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN > NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG > GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC > ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT > opossum > GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA > --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC > ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA > AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC > GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG > GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC > CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG > ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG > ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT 
GGA CTC TTG GCT CAC GCT > TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT > CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC > TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC > CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC > TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC > CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC > CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC > ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC > TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA > GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC > TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG > CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG > GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC > AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC > ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC > ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG > CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT > CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC --- > --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG > CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA > GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG > CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC > AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA > GAC 
TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC > ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC > TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG > GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- --- > chicken > GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG > --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC > ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG > CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG > GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG > GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC > CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC > ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC > AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC > TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT > CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC > TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT > GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC > CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC > TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT > CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC > CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC > ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC > TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA > GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC > TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC 
AGC TAC GTC CAG GAC TTC CAG CTG > CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC > ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG --- > --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- --- > --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG > GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- --- > CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC > AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC > TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC > CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG > GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG > TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC > AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG > GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC > GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC > TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG > GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From ba6450 at wayne.edu Thu Jul 26 21:20:11 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT) Subject: [Bioperl-l] Finding the Sequence List in an Alignment Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu> Thanks. The error is removed now. I have a question. Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file? 
Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich"
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
>To: "Munirul Islam"
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>

From jason at bioperl.org Fri Jul 27 00:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign object?

    for my $seq ( $aln->each_seq ) {
        print $seq->display_id, "\n";
    }

I'd appreciate if you added some of your questions with the answers to
the FAQ or to other places on the wiki so that other people can benefit
from your learning here.

On 7/26/07, Munirul Islam wrote:
>
> Thanks. The error is removed now.
>
> I have a question. Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich"
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
> >To: "Munirul Islam"
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
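
The two hints in this thread can be combined into one short script: pass
-interleaved => 0 when constructing the Bio::AlignIO object, then walk
each_seq on the alignment it returns. What follows is only a sketch. It
assumes the input is a sequential PHYLIP file (the thread does not say so
explicitly), and the sub name sequence_ids and the command-line handling
are made up for illustration; they are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the display IDs of all
# sequences in the first alignment found in $file.
sub sequence_ids {
    my ($file) = @_;

    # Loaded at run time so the script still compiles where BioPerl is absent.
    require Bio::AlignIO;

    # Assumption: the file is PHYLIP; -interleaved => 0 asks the parser to
    # treat it as sequential, as suggested earlier in the thread.
    my $in = Bio::AlignIO->new(
        -file        => $file,
        -format      => 'phylip',
        -interleaved => 0,
    );

    my $aln = $in->next_aln
        or die "No alignment could be read from $file\n";

    # each_seq returns the sequence objects held by the Bio::SimpleAlign.
    return map { $_->display_id } $aln->each_seq;
}

# Usage: perl list_ids.pl alignment.phy
if (@ARGV) {
    print "$_\n" for sequence_ids( $ARGV[0] );
}
```

Run against an alignment like the one posted earlier, this should print one
name per line (human, chimp, and so on), provided the parser accepts the file.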
--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From arareko at campus.iztacala.unam.mx Fri Jul 27 11:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From dhoworth at mrc-lmb.cam.ac.uk Fri Jul 27 12:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave

From spiros at lokku.com Fri Jul 27 12:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx> <46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID:

On 7/27/07, Dave Howorth wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.

Not to mention that it requires registration (?). Who is behind the
survey? I am on a number of Perl and Perl-related lists and haven't
seen it mentioned.

Spiros

From arareko at campus.iztacala.unam.mx Fri Jul 27 13:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To:
References: <46AA0CDF.1030503@campus.iztacala.unam.mx> <46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and
birth year. As for my email, I always use the one I have for all the SPAM
I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which
prevents spambots, or yourself, from filling the DB multiple times and
thus skewing the survey). As for who's behind it, its purpose, privacy,
etc., please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.

> Spiros

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From Alicia.Amadoz at uv.es Mon Jul 30 11:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a BioPerl script on Linux with StandAloneBlast
from a web server, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables both directly in the shell and by adding some code to my
script:

    BEGIN {
        $ENV{PATH} .= ':/usr/local/blast-2.2.16';
        $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
        $ENV{BLASTDATADIR} = '/usr/local/data/';
    }

and with:

    $local->executable('/usr/local/bin');
    my $blast_report = $local->blastall($inputfilename);

I have also checked that the web server has read and execute permission
on all BLAST executables and directories. But after trying all of these
things it keeps showing the same error as above. Any other ideas for
solving this problem? My script works well when I run it as a simple
script, and I have rebooted the system several times after making
changes. Thanks to anyone who will be able to help me!

Regards,
Alicia

From gyang at plantbio.uga.edu Mon Jul 30 16:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running remoteblast and using readmethod "xml", I noticed that it is
printing the output repeatedly nonstop. It's like in a loop. Did anybody
notice this before? Can anybody help me getting out of this?

Thanks a lot,

Guojun Yang
University of Georgia

From grafman at graphcomp.com Sun Jul 29 17:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year,
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any
assistance in helping to add such features, I would enjoy the opportunity.

From torsten.seemann at infotech.monash.edu.au Mon Jul 30 19:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID:

Alicia,

> Hi, i'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From cjfields at uiuc.edu Mon Jul 30 20:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID:

On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that
> it is printing the output repeatedly nonstop. It's like in a loop.
> Did anybody notice this before? Can anybody help me getting out of
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update
RemoteBlast.pm as this sounds similar to an issue that popped up
earlier in the spring.

chris

From torsten.seemann at infotech.monash.edu.au Tue Jul 31 02:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To:
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID:

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From Alicia.Amadoz at uv.es Tue Jul 31 06:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To:
References:
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it, it works perfectly.
Thank you very much.

Regards,
Alicia

> Alicia,
>
> > Hi, i'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
>
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
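
For anyone hitting the same warning, the fix reported here amounts to
pointing both PATH and BLASTDIR at the bin/ subdirectory. A minimal sketch
of that setup follows. The install root and data directory are the ones
from the original post; the helper name blast_env and the final -x check
are additions for illustration only and are not part of
Bio::Tools::Run::StandAloneBlast.

```perl
use strict;
use warnings;

# Hypothetical helper: build the environment settings for a BLAST install
# rooted at $root, pointing at its bin/ subdirectory.
sub blast_env {
    my ($root, $data_dir) = @_;
    $root =~ s{/+$}{};    # drop any trailing slash
    return (
        BLASTDIR     => "$root/bin",
        BLASTDATADIR => $data_dir,
        PATH         => ( $ENV{PATH} // '' ) . ":$root/bin",
    );
}

# Set the variables at compile time, before any BLAST wrapper is loaded.
BEGIN {
    my %env = blast_env( '/usr/local/blast-2.2.16', '/usr/local/data/' );
    @ENV{ keys %env } = values %env;
}

# Optional sanity check: warn early if the binary is not where we expect.
my $blastall = "$ENV{BLASTDIR}/blastall";
warn "blastall is not executable at $blastall\n" unless -x $blastall;
```

Under a web server the process usually inherits a much smaller environment
than a login shell, which may be why the script only failed there; setting
the variables inside the script avoids depending on it.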
>
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
>

From jay at jays.net Tue Jul 31 08:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me
> that there might be some need for 3D rendering within your BioPerl
> project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this
> year, along with benchmarks that demonstrate that it performs
> comparably to C.
>
> If there's a need for 3D features within BioPerl, and if I can be
> of any assistance in helping to add such features, I would enjoy the
> opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to
read this:

http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially
note the "Software" section, where you'll find some of the
"competition". :)

There's some cool stuff out there. I don't know what all would or
wouldn't be time well spent in Perl / BioPerl.

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From cjfields at uiuc.edu Tue Jul 31 12:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mail list.

You might want to run a full install, just in case. If I remember
correctly Sendu made some changes a while back in the BLAST-related
modules which may be related to this. At the very least install/upgrade
all modules in Bio::Tools::Run.

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got
> "can't locate the object method "retrieve_parameter"". Does this
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast
> with xml
>
>
>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Jul 31 22:15:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 21:15:45 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE> <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
Message-ID: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>

On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:

> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>> If this posting is inappropriate, please let me know - my apologies.
>
> Not at all. AFAIK this is the perfect place to discuss any
> contributions you're motivated to make to the BioPerl project.
>
>> I recently came across an article on BioPerl, and it occurred to me
>> that there might be some need for 3D rendering within your BioPerl
>> project.
>>
>> I released a number of new/updated Perl OpenGL (POGL) modules this
>> year, along with benchmarks that demonstrate that it performs
>> comparably to C.
>>
>> If there's a need for 3D features within BioPerl, and if I can be
>> of any assistance in helping to add such features, I would enjoy the
>> opportunity.
>
> I know nothing about 3D modeling in biology, nor do I hang out with
> any protein structure folks, but 3D always sounds sexy. -grin-
>
> If you're new to bioinformatics (I certainly am) you might want to
> read this:
>
> http://en.wikipedia.org/wiki/Protein_structure
>
> Because that's probably where your 3D work would be used. Especially
> note the "Software" section, where you'll find some of the
> "competition". :)
>
> There's some cool stuff out there. I don't know what all would or
> wouldn't be time well spent in Perl / BioPerl.
>
> HTH,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

I agree that protein structure is the best place for something like this.
It's a wide open area as far as I'm concerned; in fact I would say
that Bio::Structure is getting pretty dated, so if anyone wants to take
it over, refactor the code, and so on I don't have a problem.

chris
From cjfields at uiuc.edu Mon Jul 2 09:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run across some stuff in bugzilla that needs to be added as well.

Should we, as convention, start adding data sequestered to a folder with the test name, within t/data? This might make life easier in the long run (keep track of files, get rid of old files, etc), and may make it easier for wrapping up the correct data with tests if we start submitting single module CPAN updates.

chris

From cjfields at uiuc.edu Mon Jul 2 09:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as convention, start adding data sequestered to a folder
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes
> reused amongst multiple different test scripts, and when looking
> for data to reuse it's easier to spot it in a single directory
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,
>> get rid of old files, etc), and may make it easier for wrapping up
>> the correct data with tests if we start submitting single module
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would
> read the test script and see what input files it uses, copying
> those into the archive. So, just be sure to standardise on using
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it
> now, since test script names (and therefore the name of the
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script
> t/BioZombieKitten.t which stores its test data in
> t/data/BioZombieKitten/input.file but the script gets the path to
> this file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the
> subdir of data corresponding to the script name if we're dealing
> with the 900-modules-in-a-package checkout-type situation, but just
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files
> directly into t/data for now and reference them without any subdirs
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?

chris

From bix at sendu.me.uk Mon Jul 2 10:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk> <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files. (I.e., it would be nice to have a good idea what a file was for without having to study the test script that uses it.)

> BioZombieKitten?!?
I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84

From bix at sendu.me.uk Mon Jul 2 09:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across
> some stuff in bugzilla that needs to be added as well.
>
> Should we, as convention, start adding data sequestered to a folder with
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused amongst multiple different test scripts, and when looking for data to reuse it's easier to spot it in a single directory compared to searching through multiple directories.

> This might make life easier in the long
> run (keep track of files, get rid of old files, etc), and may make it
> easier for wrapping up the correct data with tests if we start
> submitting single module CPAN updates.

I don't think that will be an issue. The automated process would read the test script and see what input files it uses, copying those into the archive. So, just be sure to standardise on using test_input_file() to make that possible.

That said, I wouldn't mind especially either way. Just don't do it now, since test script names (and therefore the name of the directory you'd want to store the input files in) might all change.
In fact we can imagine that we have a test script t/BioZombieKitten.t which stores its test data in t/data/BioZombieKitten/input.file but the script gets the path to this file by:

    my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir of data corresponding to the script name if we're dealing with the 900-modules-in-a-package checkout-type situation, but just in t/data if we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly into t/data for now and reference them without any subdirs in the call to test_input_file().

From hlapp at gmx.net Mon Jul 2 16:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID:

Just FYI, after applying the changes I've been sending, I was able to check out the repository in its entirety.

-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository. I've done a better
> job of setting svn:keywords and svn:eol-style on various files. The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake. If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape. In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From wrp at virginia.edu Mon Jul 2 16:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>

Course announcement - Application deadline, July 15, 2007
================================================================

Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS

November 7 - 13, 200
Application Deadline: July 15, 2007

INSTRUCTORS:
Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes -

This course presents a comprehensive overview of the theory and practice of computational methods for extracting the maximum amount of information from protein and DNA sequence similarity through sequence database searches, statistical analysis, multiple sequence alignment, and genome scale alignment. Additional topics include gene finding, identifying signals in unaligned sequences, and integration of genetic and sequence information in biological databases. The course combines lectures with hands-on exercises; students are encouraged to pose challenging sequence analysis problems using their own data. The course makes extensive use of local WWW pages to present problem sets and the computing tools to solve them. Students use Windows and Mac workstations attached to a UNIX server.
The course is designed for biologists seeking advanced training in biological sequence analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis and comparative genomics. The primary focus of the Computational and Comparative Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Cold Spring Harbor also offers a "Programming for Biology" course, which focuses more on software development. For additional information and the lecture schedule and problem sets for the 2006 course, see: http://fasta.bioch.virginia.edu/cshl06 ================================================================ To apply to the course, fill out and send in the form at: http://meetings.cshl.edu/courses/courseapplication.asp ================================================================ Bill Pearson From niels at genomics.dk Mon Jul 2 16:45:07 2007 From: niels at genomics.dk (Niels Larsen) Date: Mon, 02 Jul 2007 22:45:07 +0200 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <468963D3.3000007@genomics.dk> I write hoping someone could show me how to create a PrimarySeq object without parsing features and all first. The lines below return "Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16." whereas calling Bio::SeqIO-> gives no error, but a too big object. The GenBank record after the __END__ is the "1.gb" file. I could not find out how from the tutorial or the Bio::PrimarySeq description. 
Niels L #!/usr/bin/env perl use strict; use warnings FATAL => qw ( all ); use Data::Dumper; use Bio::Seq; use Bio::SeqIO; my ( $seq_h, $seq ); $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' ); $seq = $seq_h->next_seq(); # print Dumper( $seq ); __END__ LOCUS X60065 9 bp mRNA linear MAM 14-NOV-2006 DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. ACCESSION X60065 REGION: 1..9 VERSION X60065.1 GI:5 KEYWORDS beta-2 glycoprotein I. SOURCE Bos taurus (cattle) ORGANISM Bos taurus Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Bovinae; Bos. REFERENCE 1 AUTHORS Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and Kristensen,T. TITLE Complete primary structure of bovine beta 2-glycoprotein I: localization of the disulfide bridges JOURNAL Biochemistry 31 (14), 3611-3617 (1992) PUBMED 1567819 REFERENCE 2 (bases 1 to 9) AUTHORS Kristensen,T. TITLE Direct Submission JOURNAL Submitted (11-JUN-1991) T. 
Kristensen, Dept of Mol Biology, University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C, DENMARK FEATURES Location/Qualifiers source 1..9 /organism="Bos taurus" /mol_type="mRNA" /db_xref="taxon:9913" /clone="pBB2I" /tissue_type="liver" gene <1..>9 /gene="beta-2-gpI" CDS <1..>9 /gene="beta-2-gpI" /codon_start=1 /product="beta-2-glycoprotein I" /protein_id="CAA42669.1" /db_xref="GI:6" /db_xref="GOA:P17690" /db_xref="UniProtKB/Swiss-Prot:P17690" /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT DASDVKPC" sig_peptide <1..>9 /gene="beta-2-gpI" ORIGIN 1 ccagcgctc // From Kevin.M.Brown at asu.edu Mon Jul 2 17:35:12 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 2 Jul 2007 14:35:12 -0700 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <468963D3.3000007@genomics.dk> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> Start by having a look at the following link: http://bioperl.org/cgi-bin/deob_interface.cgi SeqIO is how one reads or writes sequences to/from files. Bio::PrimarySeq is just an object that holds information about a sequence obtained from a file. 
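The distinction drawn above can be sketched as follows. This is a minimal illustration, assuming BioPerl is installed: Bio::PrimarySeq is built directly from values already in hand (as the error message in the question shows, it has no next_seq method, and it does not read files), while anything that comes from a file goes through Bio::SeqIO. The id, description and nine-base sequence are simply copied from the X60065 record in the question; nothing is read from disk here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Bio::PrimarySeq;

# Build a lightweight sequence object from values we already have.
# No features or annotations are created, because none are passed in.
my $seq = Bio::PrimarySeq->new(
    -seq              => 'ccagcgctc',
    -display_id       => 'X60065',
    -accession_number => 'X60065',
    -desc             => 'B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.',
    -alphabet         => 'dna',
);

# Report the id, the length in bases, and the sequence string.
printf "%s %d bp %s\n", $seq->display_id, $seq->length, $seq->seq;
```

This does not by itself avoid the parsing cost when the data lives in a GenBank file; it only shows what a bare PrimarySeq object is and how it is constructed.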
As for how to parse a Genbank file into a list of features:

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
    my @sorted_features;
    while (my $seq = $file->next_seq())
    {
        # keep only the features whose primary tag is CDS
        for my $f ($seq->all_SeqFeatures)
        {
            if ($f->primary_tag eq 'CDS')
            {
                # @sorted_features holds the CDS feature objects
                # (sequence features, not Bio::PrimarySeq objects)
                # obtained from the genbank file
                push @sorted_features, $f;
            }
        }
    }

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Niels Larsen
> Sent: Monday, July 02, 2007 1:45 PM
> Cc: bioperl-l List
> Subject: [Bioperl-l] simple PrimarySeq question
>
> I write hoping someone could show me how to create a
> PrimarySeq object without parsing features and all first. The
> lines below return
>
> "Can't locate object method "next_seq" via package
> "Bio::PrimarySeq" at ./tst2 line 16."
>
> whereas calling Bio::SeqIO-> gives no error, but a too big object.
> The GenBank record after the __END__ is the "1.gb" file. I
> could not find out how from the tutorial or the
> Bio::PrimarySeq description.
>
> Niels L
>
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings FATAL => qw ( all );
>
> use Data::Dumper;
>
> use Bio::Seq;
> use Bio::SeqIO;
>
> my ( $seq_h, $seq );
>
> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
> # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );
>
> $seq = $seq_h->next_seq();
>
> # print Dumper( $seq );
>
> __END__
>
> LOCUS       X60065 9 bp mRNA linear MAM 14-NOV-2006
> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
> ACCESSION   X60065 REGION: 1..9
> VERSION     X60065.1 GI:5
> KEYWORDS    beta-2 glycoprotein I.
> SOURCE      Bos taurus (cattle)
>   ORGANISM  Bos taurus
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>             Pecora; Bovidae; Bovinae; Bos.
> REFERENCE 1 > AUTHORS Bendixen,E., Halkier,T., Magnusson,S., > Sottrup-Jensen,L. and > Kristensen,T. > TITLE Complete primary structure of bovine beta > 2-glycoprotein I: > localization of the disulfide bridges > JOURNAL Biochemistry 31 (14), 3611-3617 (1992) > PUBMED 1567819 > REFERENCE 2 (bases 1 to 9) > AUTHORS Kristensen,T. > TITLE Direct Submission > JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of > Mol Biology, > University of Aarhus, C F Mollers Alle 130, > DK-8000 Aarhus C, > DENMARK > FEATURES Location/Qualifiers > source 1..9 > /organism="Bos taurus" > /mol_type="mRNA" > /db_xref="taxon:9913" > /clone="pBB2I" > /tissue_type="liver" > gene <1..>9 > /gene="beta-2-gpI" > CDS <1..>9 > /gene="beta-2-gpI" > /codon_start=1 > /product="beta-2-glycoprotein I" > /protein_id="CAA42669.1" > /db_xref="GI:6" > /db_xref="GOA:P17690" > /db_xref="UniProtKB/Swiss-Prot:P17690" > > /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI > > VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT > > ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN > > SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN > > PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER > > VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT > DASDVKPC" > sig_peptide <1..>9 > /gene="beta-2-gpI" > ORIGIN > 1 ccagcgctc > // > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From niels at genomics.dk Mon Jul 2 20:41:24 2007 From: niels at genomics.dk (niels at genomics.dk) Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST) Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> Message-ID: 
<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> Kevin, Thanks, but I didnt put the question very clearly sorry .. yes, SeqIO gets entries from file, and from those large parsed entries I can get a simplified primary_seq object. But the SeqIO object includes feature and annotation objects etc that takes time to make, and I wish to know if there is a way to get a primari_seq object without this overhead. I apologize if I overlooked it in the docs. Niels > Start by having a look at the following link: > http://bioperl.org/cgi-bin/deob_interface.cgi > > SeqIO is how one reads or writes sequences to/from files. > Bio::PrimarySeq is just an object that holds information about a > sequence obtained from a file. > > As for how to parse a Genbank file into a list of features: > > $file = Bio::SeqIO->new(-format => $format, -file => "id.gb"); > while (my $seq = $file->next_seq()) > { > @features = $seq->all_SeqFeatures; > # sort features by their primary tags > for my $f (@features) > { > my $tag = $f->primary_tag; > if ($tag eq 'CDS') > { > # @sorted_features holds all the Bio::PrimarySeq > features obtained from the genbank file > push @sorted_features, $f; > } > } > } > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Niels Larsen >> Sent: Monday, July 02, 2007 1:45 PM >> Cc: bioperl-l List >> Subject: [Bioperl-l] simple PrimarySeq question >> >> I write hoping someone could show me how to create a >> PrimarySeq object without parsing features and all first. The >> lines below return >> >> "Can't locate object method "next_seq" via package >> "Bio::PrimarySeq" at ./tst2 line 16." >> >> whereas calling Bio::SeqIO-> gives no error, but a too big object. >> The GenBank record after the __END__ is the "1.gb" file. I >> could not find out how from the tutorial or the >> Bio::PrimarySeq description. 
>> >> Niels L >> >> >> #!/usr/bin/env perl >> >> use strict; >> use warnings FATAL => qw ( all ); >> >> use Data::Dumper; >> >> use Bio::Seq; >> use Bio::SeqIO; >> >> my ( $seq_h, $seq ); >> >> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => >> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", >> -format => 'genbank' ); >> >> $seq = $seq_h->next_seq(); >> >> # print Dumper( $seq ); >> >> __END__ >> >> LOCUS X60065 9 bp mRNA linear >> MAM 14-NOV-2006 >> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. >> ACCESSION X60065 REGION: 1..9 >> VERSION X60065.1 GI:5 >> KEYWORDS beta-2 glycoprotein I. >> SOURCE Bos taurus (cattle) >> ORGANISM Bos taurus >> Eukaryota; Metazoa; Chordata; Craniata; >> Vertebrata; Euteleostomi; >> Mammalia; Eutheria; Laurasiatheria; >> Cetartiodactyla; Ruminantia; >> Pecora; Bovidae; Bovinae; Bos. >> REFERENCE 1 >> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., >> Sottrup-Jensen,L. and >> Kristensen,T. >> TITLE Complete primary structure of bovine beta >> 2-glycoprotein I: >> localization of the disulfide bridges >> JOURNAL Biochemistry 31 (14), 3611-3617 (1992) >> PUBMED 1567819 >> REFERENCE 2 (bases 1 to 9) >> AUTHORS Kristensen,T. >> TITLE Direct Submission >> JOURNAL Submitted (11-JUN-1991) T. 
Kristensen, Dept of >> Mol Biology, >> University of Aarhus, C F Mollers Alle 130, >> DK-8000 Aarhus C, >> DENMARK >> FEATURES Location/Qualifiers >> source 1..9 >> /organism="Bos taurus" >> /mol_type="mRNA" >> /db_xref="taxon:9913" >> /clone="pBB2I" >> /tissue_type="liver" >> gene <1..>9 >> /gene="beta-2-gpI" >> CDS <1..>9 >> /gene="beta-2-gpI" >> /codon_start=1 >> /product="beta-2-glycoprotein I" >> /protein_id="CAA42669.1" >> /db_xref="GI:6" >> /db_xref="GOA:P17690" >> /db_xref="UniProtKB/Swiss-Prot:P17690" >> >> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI >> >> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT >> >> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN >> >> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN >> >> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER >> >> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT >> DASDVKPC" >> sig_peptide <1..>9 >> /gene="beta-2-gpI" >> ORIGIN >> 1 ccagcgctc >> // >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Mon Jul 2 22:36:19 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 2 Jul 2007 22:36:19 -0400 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk> Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net> Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have examples for what you want to do: use Bio::SeqIO; # usually you won't instantiate 
# this yourself - a SeqIO object -
# you will have one already
my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
my $builder = $seqin->sequence_builder();

# if you need only sequence, id, and description (e.g. for
# conversion to FASTA format):
$builder->want_none();
$builder->add_wanted_slot('display_id','desc','seq');

# if you want everything except the sequence and features
$builder->want_all(1); # this is the default if it's untouched
$builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question. Note that this is currently only implemented for Genbank format.

-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry .. yes, SeqIO
> gets entries from file, and from those large parsed entries I can get a
> simplified primary_seq object. But the SeqIO object includes feature
> and annotation objects etc that take time to make, and I wish to know
> if there is a way to get a primary_seq object without this overhead. I
> apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>> >> As for how to parse a Genbank file into a list of features: >> >> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb"); >> while (my $seq = $file->next_seq()) >> { >> @features = $seq->all_SeqFeatures; >> # sort features by their primary tags >> for my $f (@features) >> { >> my $tag = $f->primary_tag; >> if ($tag eq 'CDS') >> { >> # @sorted_features holds all the Bio::PrimarySeq >> features obtained from the genbank file >> push @sorted_features, $f; >> } >> } >> } >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Niels Larsen >>> Sent: Monday, July 02, 2007 1:45 PM >>> Cc: bioperl-l List >>> Subject: [Bioperl-l] simple PrimarySeq question >>> >>> I write hoping someone could show me how to create a >>> PrimarySeq object without parsing features and all first. The >>> lines below return >>> >>> "Can't locate object method "next_seq" via package >>> "Bio::PrimarySeq" at ./tst2 line 16." >>> >>> whereas calling Bio::SeqIO-> gives no error, but a too big object. >>> The GenBank record after the __END__ is the "1.gb" file. I >>> could not find out how from the tutorial or the >>> Bio::PrimarySeq description. >>> >>> Niels L >>> >>> >>> #!/usr/bin/env perl >>> >>> use strict; >>> use warnings FATAL => qw ( all ); >>> >>> use Data::Dumper; >>> >>> use Bio::Seq; >>> use Bio::SeqIO; >>> >>> my ( $seq_h, $seq ); >>> >>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => >>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", >>> -format => 'genbank' ); >>> >>> $seq = $seq_h->next_seq(); >>> >>> # print Dumper( $seq ); >>> >>> __END__ >>> >>> LOCUS X60065 9 bp mRNA linear >>> MAM 14-NOV-2006 >>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I. >>> ACCESSION X60065 REGION: 1..9 >>> VERSION X60065.1 GI:5 >>> KEYWORDS beta-2 glycoprotein I. 
>>> SOURCE Bos taurus (cattle) >>> ORGANISM Bos taurus >>> Eukaryota; Metazoa; Chordata; Craniata; >>> Vertebrata; Euteleostomi; >>> Mammalia; Eutheria; Laurasiatheria; >>> Cetartiodactyla; Ruminantia; >>> Pecora; Bovidae; Bovinae; Bos. >>> REFERENCE 1 >>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., >>> Sottrup-Jensen,L. and >>> Kristensen,T. >>> TITLE Complete primary structure of bovine beta >>> 2-glycoprotein I: >>> localization of the disulfide bridges >>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992) >>> PUBMED 1567819 >>> REFERENCE 2 (bases 1 to 9) >>> AUTHORS Kristensen,T. >>> TITLE Direct Submission >>> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of >>> Mol Biology, >>> University of Aarhus, C F Mollers Alle 130, >>> DK-8000 Aarhus C, >>> DENMARK >>> FEATURES Location/Qualifiers >>> source 1..9 >>> /organism="Bos taurus" >>> /mol_type="mRNA" >>> /db_xref="taxon:9913" >>> /clone="pBB2I" >>> /tissue_type="liver" >>> gene <1..>9 >>> /gene="beta-2-gpI" >>> CDS <1..>9 >>> /gene="beta-2-gpI" >>> /codon_start=1 >>> /product="beta-2-glycoprotein I" >>> /protein_id="CAA42669.1" >>> /db_xref="GI:6" >>> /db_xref="GOA:P17690" >>> /db_xref="UniProtKB/Swiss-Prot:P17690" >>> >>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI >>> >>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT >>> >>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN >>> >>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN >>> >>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER >>> >>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT >>> DASDVKPC" >>> sig_peptide <1..>9 >>> /gene="beta-2-gpI" >>> ORIGIN >>> 1 ccagcgctc >>> // >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> 
http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From ewijaya at gmail.com Tue Jul 3 02:56:30 2007 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 3 Jul 2007 14:56:30 +0800 Subject: [Bioperl-l] Problem with GD.pm version 2.35 Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> Dear all, I was trying to perform check with this command: $ perl -MGD -e 'print $GD::VERSION'; And it gave: GD object version 2.32 does not match $GD::VERSION 2.35 at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253. Compilation failed in require. BEGIN failed--compilation aborted. Similarly my script that uses GD.pm doesn't execute. I have installed the latest version of libgd version 2.0.35 downloaded from http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29 Can anybody suggest how can I resolve my problem? This is my Perl version: This is perl, v5.8.8 built for i386-linux-thread-multi -- Edward From lstein at cshl.edu Tue Jul 3 10:41:26 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 3 Jul 2007 10:40:26 -0401 Subject: [Bioperl-l] Problem with GD.pm version 2.35 In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com> This happens when there is a mismatch between the compiled (.so) portion of GD and the perl (.pm) version. Typically it occurs when you have installed GD incorrectly by, e.g., copying the .pm file into position rather than using the make file. Solution: Uninstall old versions of GD by manually finding all occurrences of GD.so and GD.pm and removing them. Then reinstall the correct way. Lincoln On 7/3/07, Edward Wijaya wrote: > > Dear all, > I was trying to perform check with this command: > > $ perl -MGD -e 'print $GD::VERSION'; > > And it gave: > > GD object version 2.32 does not match $GD::VERSION 2.35 at > /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253. > Compilation failed in require. > BEGIN failed--compilation aborted. > > Similarly my script that uses GD.pm doesn't execute. > > > I have installed the latest version of libgd version 2.0.35 downloaded > from > http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29 > > Can anybody suggest how can I resolve my problem? > > This is my Perl version: > This is perl, v5.8.8 built for i386-linux-thread-multi > > -- > Edward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Jul 4 01:45:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 00:45:16 -0500 Subject: [Bioperl-l] genbank2gff3 - Name attribute? Message-ID: I noticed that genbank2gff3.pl doesn't have an explicitly defined way of converting the gene/locus/etc name to a Name tag (for, say, GBrowse). Any particular reason? Should I stick with GFF2 for now? chris From bix at sendu.me.uk Wed Jul 4 06:00:31 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 04 Jul 2007 11:00:31 +0100 Subject: [Bioperl-l] Splitting Bioperl Message-ID: <468B6FBF.1070708@sendu.me.uk> To summarise some previous threads: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409 # Bioperl is currently one monolithic distribution of ~900 modules # There is some desire to split it up into smaller functional groups # There are some problems with that proposal # An extreme variant of that proposal is to make the groups individual modules Following this discussion: http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html (especially Adam Kennedy's postings of 4/07, soon to appear in that archive), the extreme variant doesn't seem like a good idea. I'm now suggesting that Steve's original split idea, as modified/expanded by Adam's driver and other ideas, is the best choice. The problems I previously identified can be solved in the same way they were solved in my extreme variant: the splits are done by Build.PL automation working on a single repository/code-base, not by splitting things up at the repository level. 
As I see it, the way forward now is for someone interested enough to decide on the specifics of how things will be split and offer them up to the group for discussion. I don't mean vague possibilities of what might work as a split, but rather some real thought should go into it to make sure the split makes sense and will actually work in practice. Following that, the splits can be implemented by some automated dist action of Build.PL. If there isn't sufficient interest to make this happen, I don't see that as a terrible thing. There are benefits to keeping Bioperl monolithic, and some of the problems (eg. lack of updates) can be solved without changing its nature. From cjfields at uiuc.edu Wed Jul 4 10:53:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 09:53:45 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <468B6FBF.1070708@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote: > To summarise some previous threads: > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/ > focus=15409 > > # Bioperl is currently one monolithic distribution of ~900 modules > # There is some desire to split it up into smaller functional groups > # There are some problems with that proposal > # An extreme variant of that proposal is to make the groups individual > modules > > > Following this discussion: > http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html > (especially Adam Kennedy's postings of 4/07, soon to appear in that > archive), the extreme variant doesn't seem like a good idea. brian d foy made some sound arguments against it as well. > I'm now suggesting that Steve's original split idea, as > modified/expanded by Adam's driver and other ideas, is the best > choice. 
> The problems I previously identified can be solved in the same way > they > were solved in my extreme variant: the splits are done by Build.PL > automation working on a single repository/code-base, not by splitting > things up at the repository level. > > As I see it, the way forward now is for someone interested enough to > decide on the specifics of how things will be split and offer them > up to > the group for discussion. I don't mean vague possibilities of what > might > work as a split, but rather some real thought should go into it to > make > sure the split makes sense and will actually work in practice. We've already identified a few (SearchIO, Tools, GBrowse-related, etc). ... > If there isn't sufficient interest to make this happen, I don't see > that > as a terrible thing. There are benefits to keeping Bioperl monolithic, > and some of the problems (eg. lack of updates) can be solved without > changing its nature. If so, proposals that solve this problem need to be made as well. If we stay monolithic, then here's mine: we start having fixed, regularly timed dev releases like Parrot, monthly or bimonthly (quite common on CPAN), with brief release reports on which bugs have been fixed, code has been added, so on. Not every bug has to be fixed per dev release; if that were true there would never be releases for some of the XML parser packages. No RCs for dev releases (it's a dev release!). These would be 1.x.y. We can then, every once in a while, have a bug-squashing session, hackathon, etc, and have regular non-dev release (1.x) that all core devs accept and that passes a particular milestone. As for the advantage of a split approach, as mentioned previously it is to focus modules/tests/scripts into groups with related functions. 
Even just splitting off ones with external reqs (XML parsers, GD, etc) into an 'aux' release would be an advantage, as it doesn't confront a new user with the burden of installing a large list of dependencies, some of which may be complicated for a perl newbie to either install from scratch (DBD::mysql, GD) or to get the latest bug-fixed prereq release for their OS (the recent debacle with XML::SAX::Expat issues comes to mind, which wasn't immediately available for win32 as a PPM). I'm fairly open to any approach as long as it's reasonably thought out, though I am admittedly a bit biased towards the split approach. I do think some change is in order; I worry about there ever being a 1.6 release at this point. chris From davila at ioc.fiocruz.br Wed Jul 4 13:11:20 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Wed, 04 Jul 2007 14:11:20 -0300 Subject: [Bioperl-l] ESTs in EST format Message-ID: <468BD4B8.5050105@ioc.fiocruz.br> Dear All, I am trying to get all ESTs from a given species (eg: Trypanosoma brucei) from Genbank in EST format (eg: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)... while using Entrez I can "display" individual EST entries in EST format, this "EST format" is not an option in the main "display" menu for batch download ... I don't see the EST format listed (http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO deals with, so I wonder whether there is another BioPerl module to do this? Any tips would be greatly appreciated ;-) Kindest regards, Alberto From jason at bioperl.org Wed Jul 4 13:52:59 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 10:52:59 -0700 Subject: [Bioperl-l] ESTs in EST format In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br> References: <468BD4B8.5050105@ioc.fiocruz.br> Message-ID: Currently we don't support this format. As far as I know, it isn't a published standard, nor is it a format in which NCBI distributes this data as flat files (i.e. GenBank dumps).
Is there any reason why you can't get what you need from the GenBank format? http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb -jason On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote: > Dear All, > > I am trying to get all ESTs from a given species (eg: Trypanosoma > brucei) from Genbank in EST format (eg: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? > db=nucest&id=10280980)... > while using Entrez I can "display" individual EST entries in EST > format, > this "EST format" is not an option in the main "display" menu for > batch > download ... > > I dont see the EST format listed > (http://www.bioperl.org/wiki/Sequence_formats) among the ones that > SeqIO > deal with, so wonder there would another BioPerl module to do > this ? any > tips, would be greatly appreciated ;-) > > Kindest regards, Alberto > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From dmessina at wustl.edu Wed Jul 4 14:37:22 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 4 Jul 2007 13:37:22 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > we start having fixed, > regularly timed dev releases like Parrot, monthly or bimonthly (quite > common on CPAN), with brief release reports on which bugs have been > fixed, code has been added, so on. Not every bug has to be fixed per > dev release; if that were true there would never be releases for some > of the XML parser packages. No RCs for dev releases (it's a dev > release!). These would be 1.x.y.
We can then, every once in a > while, have a bug-squashing session, hackathon, etc, and have regular > non-dev release (1.x) that all core devs accept and that passes a > particular milestone. Regardless of whether we split or don't, I think these ideas of adding a little more structure to BioPerl's development cycles -- especially having bug-squashing and hacking sessions, where we all band together and commit some time to cranking through a bunch of to-dos -- would be beneficial, particularly as a means of keeping a certain basal level of momentum in BioPerl. Dave From jason at bioperl.org Wed Jul 4 15:45:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 12:45:29 -0700 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I definitely agree - we can live up to the unstable "living on the edge" nature of dev releases a bit more perhaps? On Jul 4, 2007, at 11:37 AM, David Messina wrote: > > On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > >> we start having fixed, >> regularly timed dev releases like Parrot, monthly or bimonthly (quite >> common on CPAN), with brief release reports on which bugs have been >> fixed, code has been added, so on. Not every bug has to be fixed per >> dev release; if that were true there would never be releases for some >> of the XML parser packages. No RCs for dev releases (it's a dev >> release!). These would be 1.x.y. We can then, every once in a >> while, have a bug-squashing session, hackathon, etc, and have regular >> non-dev release (1.x) that all core devs accept and that passes a >> particular milestone.
> > > Regardless of whether we split or don't, I think these ideas of > adding a little more structure to BioPerl's development cycles -- > especially having bug-squashing and hacking sessions, where we all > band together and commit some time to cranking through a bunch of to- > dos -- would be beneficial, particularly as a means to keeping a > certain basal level of momentum in BioPerl. > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Wed Jul 4 16:54:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 15:54:14 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I think what's partially responsible for slowing down releases is the expectation that each dev release is supposed to have all bugs fixed, work for every OS, etc. In other words, act like a stable release. A developer release by nature is living on the edge, so why not have regular dev releases? We keep telling users to update to using bioperl-live whenever something breaks, anyway. We could decide to split stuff off along the way into more 'stable' sections if there were more demand for it, and have the more API-volatile code (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the 'dev' tag until we feel it's ready for prime time. chris On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > I definitely agree - we can live up to the unstable "living on the > edge" nature of dev releases a bit more perhaps? 
> > > On Jul 4, 2007, at 11:37 AM, David Messina wrote: > >> >> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: >> >>> we start having fixed, >>> regularly timed dev releases like Parrot, monthly or bimonthly >>> (quite >>> common on CPAN), with brief release reports on which bugs have been >>> fixed, code has been added, so on. Not every bug has to be fixed >>> per >>> dev release; if that were true there would never be releases for >>> some >>> of the XML parser packages. No RCs for dev releases (it's a dev >>> release!). These would be 1.x.y. We can then, every once in a >>> while, have a bug-squashing session, hackathon, etc, and have >>> regular >>> non-dev release (1.x) that all core devs accept and that passes a >>> particular milestone. >> >> >> Regardless of whether we split or don't, I think these ideas of >> adding a little more structure to BioPerl's development cycles -- >> especially having bug-squashing and hacking sessions, where we all >> band together and commit some time to cranking through a bunch of to- >> dos -- would be beneficial, particularly as a means to keeping a >> certain basal level of momentum in BioPerl. >> >> Dave >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Thu Jul 5 04:09:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 09:09:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: <468CA721.4020804@sheffield.ac.uk> Chris Fields wrote: > I think what's partially responsible for slowing down releases is the > expectation that each dev release is supposed to have all bugs fixed, > work for every OS, etc. In other words, act like a stable release. > > A developer release by nature is living on the edge, so why not have > regular dev releases? We keep telling users to update to using > bioperl-live whenever something breaks, anyway. We could decide to > split stuff off along the way into more 'stable' sections if there > were more demand for it, and have the more API-volatile code > (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the > 'dev' tag until we feel it's ready for prime time. > > chris > > On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > > -- snip -- I agree, although would the dev releases still need to pass all the tests? I'm thinking of people installing via CPAN. I also agree with what was said in a previous post about bringing bioperl-run (and some others) back into the same repository as bioperl-core (after a successful move over to svn) and having Build.PL deal with creating the packages etc for CPAN. This would hopefully help keep the run package (and others) up to speed with the core package. I also agree with previous posts about organising and/or having some naming convention for test data files. I'm thinking of an approach whereby data files are organised into directory trees (1 - 3 deep) with names that allude to the type of data in that subtree/file rather than to the tests that use it.
For example:

t/data
|__ formats
|   |__ seq
|   |   |__ legal_fasta
|   |   |   |__ extension.fas
|   |   |   |__ extension.fasta
|   |   |   |__ extension.foo
|   |   |   |__ extension.bar
|   |   |   |__ no_extension
|   |   |   |__ interleaved.fas
|   |   |   |__ non_interleaved.fas
|   |   |   |__ single_seq.fas
|   |   |   |__ multiple_seq.fas
|   |   |   |__ desc_line1.fas
|   |   |   |__ desc_line2.fas
|   |   |
|   |   |__ illegal_fasta
|   |   |   |__ illegal_chars.fas
|   |   |   |__ some_other_illegal_alternative.fas
|   |   |
|   |   |__ legal_genbank
|   |   |   |__ etc etc
|   |   |
|   |   |__ illegal_genbank
|   |       |__ etc etc
|   |
|   |__ aln
|   |__ blast
|   |   |__ legal_blastx
|   |   |
|   |   |__ legal_blastp
|   |   |
|   |   |__ legal_tblastx
|   |   |
|   |   |__ legal_plastpsi
|   |   |
|   |   |__ legal_wublast
|   |__ foo
|   |__ bar
|   |__ misc
|   |__ etc

This type of setup might lend itself to having a test script simply try to parse all the files in a directory, to ensure that parsing succeeds for legal file formats and fails for illegal ones. Naming of the file paths would help test authors to identify a suitable data file for their own tests before adding their own to the t/data dir. It might also help to identify areas where example test data is currently lacking.

Thinking about this a little more, I think it would be a good idea to include Test::Exception in t/lib. We should also be testing that warnings and exceptions are generated when expected - e.g. illegal characters in seq files etc etc. Without these sorts of tests we are only getting half the story. The absence of this testing might account for a large chunk of the poor test coverage, particularly when it comes to branches in the code.

Anyway, this type of reorganisation couldn't take place until the svn repo is up and working. I'd appreciate any comments on the above!
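A minimal sketch of the directory-driven test described above, using only core Perl modules. Everything named here is an assumption for illustration: `parse_fasta` is a hypothetical stand-in for a real parser call (such as one made through Bio::SeqIO), and the temporary tree only mimics the proposed legal_*/illegal_* layout; none of it is existing BioPerl test code.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Test::More;

# Hypothetical stand-in for a real parser: returns the number of
# records and dies on malformed input.
sub parse_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my ($records, $in_record) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) { $records++; $in_record = 1; next }
        die "Sequence data before first header in $path\n" unless $in_record;
        die "Illegal character in $path\n" if $line =~ /[^A-Za-z*-]/;
    }
    close $fh;
    die "No records in $path\n" unless $records;
    return $records;
}

# Build a throwaway tree that mimics the proposed layout.
my $root  = tempdir(CLEANUP => 1);
my %files = (
    'legal_fasta/single_seq.fas'      => ">seq1 test\nACGTACGT\n",
    'legal_fasta/multiple_seq.fas'    => ">seq1\nACGT\n>seq2\nTTGA\n",
    'legal_fasta/no_extension'        => ">seq1\nACGT\n",
    'illegal_fasta/illegal_chars.fas' => ">seq1\nAC!GT\n",
    'illegal_fasta/no_header.fas'     => "ACGT\n",
);
for my $rel (sort keys %files) {
    my ($dir, $name) = split m{/}, $rel;
    my $dir_path = File::Spec->catdir($root, $dir);
    mkdir $dir_path unless -d $dir_path;
    open my $out, '>', File::Spec->catfile($dir_path, $name)
        or die "Cannot write $rel: $!\n";
    print {$out} $files{$rel};
    close $out;
}

# Every file under legal_* must parse; every file under illegal_* must die.
my %outcome;
opendir my $dh, $root or die "Cannot read $root: $!\n";
my @dirs = sort grep { !/^\./ } readdir $dh;
closedir $dh;
for my $dir (@dirs) {
    my $expect_ok = $dir =~ /^legal_/ ? 1 : 0;
    my $dir_path  = File::Spec->catdir($root, $dir);
    opendir my $sub, $dir_path or die "Cannot read $dir_path: $!\n";
    for my $file (sort grep { !/^\./ } readdir $sub) {
        my $path   = File::Spec->catfile($dir_path, $file);
        my $parsed = eval { parse_fasta($path); 1 } ? 1 : 0;
        $outcome{"$dir/$file"} = $parsed;
        is($parsed, $expect_ok,
            "$dir/$file " . ($expect_ok ? 'parses' : 'is rejected'));
    }
    closedir $sub;
}
done_testing();
```

With Test::Exception available, the `eval` wrapper could presumably be replaced by `lives_ok` and `dies_ok`; the plain `eval` is used here only to keep the sketch free of non-core dependencies.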
Nath From bix at sendu.me.uk Thu Jul 5 04:55:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 09:55:25 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <468CB1FD.7060301@sendu.me.uk> Nathan S. Haigh wrote: > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Yes, they'd all have to pass. 'Developer release' should never have the connotation of 'broken release'. However, getting all tests to pass is a lot easier than fixing all bugs in bugzilla. (... which actually goes to show how poor our tests are) Worst case, if we were forced to stick to a schedule but couldn't fix a failing test, we could always make it a 'todo' test. > I also agree with what was said in a previous post about bringing back > bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) Agree (with myself essentially). > I also agree with previous posts about organising and/or having some > naming convention for test data files. I think an approach whereby data > files were organised into directory trees (1 - 3 deep) with names that > elude to the type of data in that subtree/file rather than the tests > that use it etc. For example: > > t/data > |__ formats > | |__ seq > | | |__ legal_fasta > | | | |__ extension.fas [snip] At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them. > This type of setup, might lend itself to having a test script simply try > to parse all the files in a directory to ensure nothing fails (for legal > file formats) and fails for illegal formats. Great idea. 
> Thinking about this a little more, I think it would be a good idea to > include Test::Exception in t/lib. Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > Anyway, this type of reorganisation couldn't take place until the svn > repo is up and working. Agree. From bix at sendu.me.uk Thu Jul 5 05:39:10 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 10:39:10 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <468CBC3E.1020408@sendu.me.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Thinking about this a little more, I think it would be a good idea to >> include Test::Exception in t/lib. > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. I've now done that: BioperlTest loads Test::Exception, from the copy in t/lib if necessary. So, in BioperlTest-using scripts you now have access to the methods dies_ok, lives_ok, throws_ok and lives_and. From N.Haigh at sheffield.ac.uk Thu Jul 5 06:01:04 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 5 Jul 2007 11:01:04 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk> Quoting Sendu Bala : -- snip -- > > > > I also agree with previous posts about organising and/or having some > > naming convention for test data files. 
I think an approach whereby data > > files were organised into directory trees (1 - 3 deep) with names that > > elude to the type of data in that subtree/file rather than the tests > > that use it etc. For example: > > > > t/data > > |__ formats > > | |__ seq > > | | |__ legal_fasta > > | | | |__ extension.fas > [snip] > > At that level, files don't need extensions and can have fully > informative names that explain what's interesting or special about them. > You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format. -- snip -- From bix at sendu.me.uk Thu Jul 5 06:04:16 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:04:16 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> Message-ID: <468CC220.804@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Sendu Bala : > > -- snip -- >> >>> I also agree with previous posts about organising and/or having >>> some naming convention for test data files. I think an approach >>> whereby data files were organised into directory trees (1 - 3 >>> deep) with names that elude to the type of data in that >>> subtree/file rather than the tests that use it etc. 
For example: >>> >>> t/data |__ formats | |__ seq | | |__ >>> legal_fasta | | | |__ extension.fas >>> >> [snip] >> >> At that level, files don't need extensions and can have fully >> informative names that explain what's interesting or special about >> them. >> > > You may be correct in most cases, however, isn't there a method for > detecting the file format from the file extension and failing that it > peeks inside the file? Therefore there should be a file extension for > each of these to get good code coverage as well as each format not > having an extension to check that the peek inside the file correctly > determines the format. Yes, you're quite correct. From bix at sendu.me.uk Thu Jul 5 06:47:12 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:47:12 +0100 Subject: [Bioperl-l] Warnings Message-ID: <468CCC30.90406@sendu.me.uk> I'm trying to get Test::Warn to work with Bioperl warnings as produced by Bio::Root::RootI::warn(). However, afaict the warnings must be generated with CORE::warn(), not print STDERR. Is there any particular reason RootI::warn is done with print and not CORE::warn ? Can I change it to a warn? From bix at sendu.me.uk Thu Jul 5 09:04:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:04:50 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> Message-ID: <468CEC72.4090909@sendu.me.uk> Heikki Lehvaslaiho wrote: > My guess is that using 'print STDERR' avoids showing sometimes annoying > errordescription at programname line NN > syntax being used. Afaik, CORE::warn "anything\n"; never includes the line number: messages with a new line always disable that feature. Bio::Root::RootI::warn /always/ puts new lines into the message, so they /never/ have the line number. 
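The behaviour described here can be checked with core Perl alone. The sketch below installs a `$SIG{__WARN__}` handler, which is assumed to be the hook that warning-catching test modules such as Test::Warn rely on (worth confirming against that module's documentation). The subs `noisy_print`, `noisy_warn` and `bare_warn` are hypothetical stand-ins, not the actual Bio::Root::RootI code.

```perl
use strict;
use warnings;

# Collect everything that reaches Perl's warning hook.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

# Hypothetical stand-ins for the different ways a message can be emitted.
sub noisy_print { print STDERR "print: bypasses the handler\n" }
sub noisy_warn  { CORE::warn("warn: newline-terminated message\n") }
sub bare_warn   { CORE::warn("warn: no trailing newline") }

noisy_print();   # written straight to STDERR, never reaches the handler
noisy_warn();    # captured
bare_warn();     # captured

# Only the two CORE::warn calls were captured.
printf "caught %d warning(s)\n", scalar @caught;

# A trailing newline suppresses the " at FILE line N." suffix ...
print "newline form: $caught[0]";

# ... while without one Perl appends the location itself.
print "bare form:    $caught[1]";
```

If this holds, a message that always ends in a newline should look the same on STDERR whether it is printed or warned; the visible difference is only that the warned form can be intercepted.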
> On the other hand, the main reason we need to set verbosity to 1 in BioPerl > objects is to find where warnings are coming from. Maybe extra text in > warnings leads to easier debugging. > > I favour changing it. So it's my understanding there will be absolutely no difference in behaviour following this change (except that warnings can be caught by Test::Warn). I just wanted to confirm my understanding. From hlapp at gmx.net Thu Jul 5 09:07:27 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 5 Jul 2007 09:07:27 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I think what's partially responsible for slowing down releases is the >> expectation that each dev release is supposed to have all bugs fixed, >> work for every OS, etc. In other words, act like a stable release. >> It doesn't. A stable release has a stable API that will be supported until the next stable release through point releases. >> A developer release by nature is living on the edge, so why not have >> regular dev releases? There's no problem with regular dev releases, but tests will need to pass. There was never a stipulation that all bugs need to have been fixed. But all tests need to pass, so in an ideal world (in which everything is being tested) all tests passing would imply all (known) bugs fixed. Obviously, we don't live in an ideal world ... If not everything passes, then how is it really different from a code snapshot? If using cvs (or svn) is too difficult for most people, we can consider creating a mechanism that puts up nightly snapshots for download.
> -- snip -- > > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. For example, that's another point. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From heikki at sanbi.ac.za Thu Jul 5 09:12:37 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 15:12:37 +0200 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <200707051512.38185.heikki@sanbi.ac.za> One more suggestion: It would be extremely useful if we had a standard way of testing that, when a file is read into a bioperl object and then written out again in the same format, the input and output files are identical. If not, the test should show where the differences start (showing all the differences would just clutter the screen). This standard method/subroutine should be used to test all sequence and other text file IO. Any takers? -Heikki On Thursday 05 July 2007 11:39:10 Sendu Bala wrote: > Sendu Bala wrote: > > Nathan S. Haigh wrote: > >> Thinking about this a little more, I think it would be a good idea to > >> include Test::Exception in t/lib. > > > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and.
--
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Associate Professor   skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From heikki at sanbi.ac.za Thu Jul 5 08:58:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 14:58:59 +0200
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CCC30.90406@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
Message-ID: <200707051458.59921.heikki@sanbi.ac.za>

My guess is that using 'print STDERR' avoids showing the sometimes annoying 'errordescription at programname line NN' syntax being used.

On the other hand, the main reason we need to set verbosity to 1 in BioPerl objects is to find where warnings are coming from. Maybe extra text in warnings leads to easier debugging.

I favour changing it.

-Heikki

On Thursday 05 July 2007 12:47:12 Sendu Bala wrote:
> I'm trying to get Test::Warn to work with Bioperl warnings as produced by Bio::Root::RootI::warn(). However, afaict the warnings must be generated with CORE::warn(), not print STDERR.
>
> Is there any particular reason RootI::warn is done with print and not CORE::warn ? Can I change it to a warn?
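[Editorial illustration] The difference Sendu describes can be seen with core Perl alone. The sketch below does not use Test::Warn or Bio::Root::RootI; it installs a $SIG{__WARN__} handler, which is a general way of intercepting warnings in Perl, and the two report_* subs are hypothetical stand-ins for the two ways a warn() method could be written.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every warning that Perl's warn() machinery delivers.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

# Two hypothetical reporters: one prints to STDERR, one uses CORE::warn.
sub report_with_print { print STDERR "print-style warning: $_[0]\n" }
sub report_with_warn  { CORE::warn("warn-style warning: $_[0]\n") }

report_with_print('first');    # written straight to STDERR, handler never sees it
report_with_warn('second');    # delivered to the handler

# Without a trailing newline, Perl appends " at FILE line N." to the message.
CORE::warn("no trailing newline");

printf "handler saw %d warning(s)\n", scalar @caught;
# prints: handler saw 2 warning(s)
```

The last call shows where the "at programname line NN" text comes from: it is added only when the message does not end in a newline.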
From bix at sendu.me.uk Thu Jul 5 09:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that when a file is read into a bioperl object and then written out again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files to be identical, only for none of the information to be lost and for it to be output in the correct format. So a round-trip test should read in the original, store all the parsed data, write it out, then read in the written version and see if the new parsed data matches the original.

For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing all the differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence and other text file IO.
>
> Any takers?

There's already something along these lines in t/SeqIO.t (the section that uses Algorithm::Diff).
I copied that over from the old testformats.pl script but haven't really taken the time to see if it's a good way of doing the test. Is it? Can someone come up with something better? Can someone generalise it if necessary?

I imagine you could just read the files into arrays and use Test::More::is_deeply(). If that would be satisfactory I could easily add a little method to BioperlTest that did that.

From n.haigh at sheffield.ac.uk Thu Jul 5 09:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that when a file is read into a bioperl object and then written out again in the same format, the input and output files are identical. If not, the test should show where the differences start (showing all the differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence and other text file IO.
>
> Any takers?
>
> -Heikki
>

Wouldn't this require info about the formatting of the file to be stored in the object as well, such that the same formatting could be used when writing the file?

Wouldn't a better approach be to read the contents of file1 into obj1, write obj1 to file2 in the same format, and then read file2 into obj2 and compare obj1 to obj2 to ensure we have all the same data?
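[Editorial illustration] The read, write, re-read comparison proposed here can be sketched without BioPerl at all. In the example below, read_fasta() and write_fasta() are toy, hypothetical routines standing in for a real parser and writer; only Test::More (core) is used. The writer deliberately uses a different line width from the input, so the text differs while the parsed data does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 2;

# Toy FASTA reader: returns a reference to a list of { id, seq } records.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)/) {
            push @records, { id => $1, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Toy FASTA writer: always wraps sequence lines at 60 columns.
sub write_fasta {
    my ($records) = @_;
    my $out = '';
    for my $rec (@$records) {
        $out .= ">$rec->{id}\n";
        my $seq = $rec->{seq};
        $out .= "$1\n" while $seq =~ /(.{1,60})/g;
    }
    return $out;
}

# Input wrapped at 6 columns, so the rewritten text will not be identical.
my $original = ">seq1\nACGTAC\nGTACGT\n>seq2\nTTTT\n";

my $first  = read_fasta($original);    # file1 -> obj1
my $output = write_fasta($first);      # obj1  -> file2
my $second = read_fasta($output);      # file2 -> obj2

isnt $output, $original, 'layout differs after rewriting';
is_deeply $second, $first, 'parsed data survives the round trip';
```

A byte-for-byte comparison would fail here even though no information was lost, which is the point made in the replies above.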
Nath

From cjfields at uiuc.edu Thu Jul 5 09:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk>
Message-ID:

On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the tests? I'm thinking of people installing via CPAN.

Remains to be decided. All current tests (net and non-net) should pass. Any bug fixes should try to have added tests if possible, with in-process stuff as TODO's. Network tests are left up to user discretion, so if they fail for any particular reason there is a way around them.

> I also agree with what was said in a previous post about bringing bioperl-run (and some others) back into the same repository as bioperl-core (after a successful move over to svn) and have Build.PL deal with creating the packages etc for CPAN. This would hopefully help keep the run package (and others) up to speed with the core package.

It's up to how we want to have everything split. I don't think it's immediately prescient (there are more important priorities, i.e. bugs, svn) but I would say folding everything back into live and 'splitting' them out using an automated Build process is a viable option.

> I also agree with previous posts about organising and/or having some naming convention for test data files. I think an approach whereby data files were organised into directory trees (1 - 3 deep) with names that allude to the type of data in that subtree/file rather than the tests that use it etc. For example:
>
> t/data
>   |__ formats
>   |     |__ seq
>   |     |     |__ legal_fasta
>   |     |     |     |__ extension.fas
>   |     |     |     |__ extension.fasta
>   |     |     |     |__ extension.foo
>   |     |     |     |__ extension.bar
>   |     |     |     |__ no_extension
>   |     |     |     |__ interleaved.fas
>   |     |     |     |__ non_interleaved.fas
>   |     |     |     |__ single_seq.fas
>   |     |     |     |__ multiple_seq.fas
>   |     |     |     |__ desc_line1.fas
>   |     |     |     |__ desc_line2.fas
>   |     |     |
>   |     |     |__ illegal_fasta
>   |     |     |     |__ illegal_chars.fas
>   |     |     |     |__ some_other_illegal_alternative.fas
>   |     |     |
>   |     |     |__ legal_genbank
>   |     |     |     |__ etc etc
>   |     |     |
>   |     |     |__ illegal_genbank
>   |     |           |__ etc etc
>   |     |
>   |     |__ aln
>   |     |__ blast
>   |     |     |__ legal_blastx
>   |     |     |
>   |     |     |__ legal_blastp
>   |     |     |
>   |     |     |__ legal_tblastx
>   |     |     |
>   |     |     |__ legal_plastpsi
>   |     |     |
>   |     |     |__ legal_wublast
>   |     |__ foo
>   |     |__ bar
>   |     |__ misc
>   |
>   |__ etc
>
> This type of setup might lend itself to having a test script simply try to parse all the files in a directory to ensure nothing fails (for legal file formats) and fails for illegal formats. Naming of the file paths would help test authors to identify a suitable data file for their own tests before adding their own to the t/data dir. It might also help to identify areas where example test data is currently lacking.

...

This seems like more of a 'guess sequence' and format validation issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence parsing should be separate issues and therefore in separate classes (with parsing optionally preceded by validation), but that's something for another discussion.

> Thinking about this a little more, I think it would be a good idea to include Test::Exception in t/lib. We should also be testing that warnings and exceptions are generated when expected - e.g. illegal characters in seq files etc etc. Without these sorts of tests we are only getting half the story.
> This testing might account for a large chunk of the poor test coverage, particularly when it comes to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath

chris

From n.haigh at sheffield.ac.uk Thu Jul 5 10:08:29 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:08:29 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CF5A8.7040402@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk>
Message-ID: <468CFB5D.6080406@sheffield.ac.uk>

Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests. It would be desirable to be able to install all these modules in order to complete the whole test suite - any ideas if/how this can be done?

Cheers
Nath

From bix at sendu.me.uk Thu Jul 5 10:15:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 15:15:34 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <468CFD06.3080604@sendu.me.uk>

Nathan S. Haigh wrote:
> Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests.
> It would be desirable to be able to install all these modules in order to complete the whole test suite - any ideas if/how this can be done?

Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you.

Everything should be in Build.PL already. If I missed something, please add it.

From cjfields at uiuc.edu Thu Jul 5 10:18:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:08 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID:

On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote:

> Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests. It would be desirable to be able to install all these modules in order to complete the whole test suite - any ideas if/how this can be done?
>
> Cheers
> Nath

That's optionally done upon 'perl Build.PL', correct? So if you choose not to install a particular prereq (e.g. XML::SAX), you shouldn't be forced to install it later just for tests. Or am I misunderstanding you?
chris

From cjfields at uiuc.edu Thu Jul 5 10:18:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:23 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CC220.804@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk>
Message-ID:

On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:

> Nathan S. Haigh wrote:
>> Quoting Sendu Bala:
>>> ...
>>> At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them.
>>>
>>
>> You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format.
>
> Yes, you're quite correct.

I actually like Sendu's idea more, or the idea of each test suite having its own directory.

Tests which need to guess/validate the format are probably best left sequestered to a specific suite focused on format guessing/validation, at least in my opinion.

chris

From n.haigh at sheffield.ac.uk Thu Jul 5 10:22:40 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:22:40 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFD06.3080604@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk>
Message-ID: <468CFEB0.80201@sheffield.ac.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests. It would be desirable to be able to install all these modules in order to complete the whole test suite - any ideas if/how this can be done?
>
> Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you.
>
> Everything should be in Build.PL already. If I missed something, please add it.
>

OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t

This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there may be few systems with all these modules installed in order to conduct a complete test. These are the modules I'm referring to.

Nath

From n.haigh at sheffield.ac.uk Thu Jul 5 10:30:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:30:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk>
Message-ID: <468D006D.6050806@sheffield.ac.uk>

Chris Fields wrote:
>
> On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:
>
>> Nathan S. Haigh wrote:
>>> Quoting Sendu Bala:
>>>> ...
>>>> At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them.
>>>>
>>>
>>> You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format.
>>
>> Yes, you're quite correct.
>
> I actually like Sendu's idea more, or the idea of each test suite having its own directory.
>
> Tests which need to guess/validate the format are probably best left sequestered to a specific suite focused on format guessing/validation, at least in my opinion.
>
> chris

How easily would this lend itself to using the same data for multiple tests, or is it likely to lead to/exacerbate a culture of adding duplicate data files in each "test suite" rather than reusing?
Nath From cjfields at uiuc.edu Thu Jul 5 10:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:33:46 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote: > On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > >> Chris Fields wrote: >>> I think what's partially responsible for slowing down releases is >>> the >>> expectation that each dev release is supposed to have all bugs >>> fixed, >>> work for every OS, etc. In other words, act like a stable release. > > It doesn't. A stable release has a stable API that will be > supported until the next stable release through point releases. I agree, but I think there is still an expectation that 1.5.2 and beyond are more like true 'stable' releases even though we still designate them as 'developer.' We unfortunately reinforce that when we tell users they need to update to v. 1.5.2 or bioperl-live to fix a particular bug in the 1.4 release. There's nothing we can do about that now (hindsight is always 20/20, and 1.4 is just too old). We (pumpkin, core devs) can try correcting that by ensuring any bug fixes be committed to any new stable branch as well as to live, at least until it becomes too problematic to maintain that particular stable branch (at which point we would go about getting ready for the next 'stable' and repeat the cycle over again). >>> A developer release by nature is living on the edge, so why not have >>> regular dev releases? > > There's no problem with regular dev releases, but tests will need > to pass. There was never a stipulation that all bugs need to have > been fixed. 
But all tests need to pass, so in an ideal world (in > which everything is being tested) all tests passing would imply all > (known) bugs fixed. Obviously, we don't live in an ideal world ... ...particularly when it comes to network-related tests and remote server problems (but those are by default not run, so there is a way around test fails there). I agree here as well (all tests must pass). As for the bug fixes, we can just stipulate which ones were fixed with the release (in a RELEASE_NOTES or similar), and maybe have TODO's in the test suite designating they are being worked on. Basically, at regular intervals, maybe with a few weeks of lead time, the pumpkin would announce an impending dev. release. Go through rounds of tests, bug fixes, etc. When all tests pass post it on CPAN as a dev. release. If we have a stable release branch with relevant bug fixes we can post that as well, again to the point where it becomes too problematic. Would we just take a snapshot of MAIN and any relevant stable branch at that particular point for the CPAN release, just increasing the version number (1.x.y)? Would it make sense to have a 1.x.y branch for each release (I don't think so, but maybe others disagree)? > If not everything passes then what is the big difference to a code > snapshot? If using cvs (or svn) is too difficult for most people, > we can consider creating a mechanism that puts up nightly snapshots > for download. If we feel a nightly snapshot is warranted we could do that though. I personally don't think there is a need, particularly since we have several means to obtain the latest code at any point in time (including the browsable CVS 'Download tarball'). We could state the next dev/stable CPAN release (pending on date dd/mm/yy) will have the bug fix, and if they want it immediately then pick it up from CVS. >> -- snip -- >> >> I agree, although would the dev releases still need to pass all the >> tests? I'm thinking of people installing via CPAN. 
>
> For example, that's another point.
>
> -hilmar

Yes, I agree.

As an aside, I don't think dev. releases pop up when you run a simple 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know the answer to that.

chris

From cjfields at uiuc.edu Thu Jul 5 10:34:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:34:22 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>

On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:

>
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that when a file is read into a bioperl object and then written out again in the same format, the input and output files are identical. If not, the test should show where the differences start (showing all the differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence and other text file IO.
>
> Any takers?
>
> -Heikki

...

I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t that do some checking, I think, but something like this would be of use. However, what if the test file is old (as many in t/data are) and the format has changed? GenBank and EMBL, for instance, have gone through several changes to format.

chris

From n.haigh at sheffield.ac.uk Thu Jul 5 10:43:51 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:43:51 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
Message-ID: <468D03A7.3090408@sheffield.ac.uk>

Chris Fields wrote:

-- snip --

>>>
>>> I agree, although would the dev releases still need to pass all the tests? I'm thinking of people installing via CPAN.
>>
>> For example, that's another point.
>>
>> -hilmar
>
> Yes, I agree.
>
> As an aside, I don't think dev. releases pop up when you run a simple 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know the answer to that.
>
> chris

That's right, it'll only install the non-developer releases (1.4 currently). If you want to install the developer release from CPAN you need to know the path to the archive and then do:

cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz

as detailed on the wiki:
http://www.bioperl.org/wiki/Release_1.5.2

Nath

From cjfields at uiuc.edu Thu Jul 5 10:49:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:49:33 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFEB0.80201@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>

On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:

> Sendu Bala wrote:
>> ...
>> Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you.
>>
>> Everything should be in Build.PL already.
>> If I missed something, please add it.
>>
>
> OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t
>
> This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there may be few systems with all these modules installed in order to conduct a complete test. These are the modules I'm referring to.
>
> Nath

If they are only necessary for tests, work for all OSs, and are pure Perl they should be added to t/lib, like Test::More and the rest. If they only work for some OSs they could be added to t/lib and skip based on OS, but they still must be pure Perl. I would avoid anything that requires any compiling for XS or Inline altogether (I don't want to go down the nightmare road of OS-dependent compiler issues for a few tests).

Finally, if they are needed for core modules (not just tests) then they should be added to the core prereqs in Build.

chris

From cjfields at uiuc.edu Thu Jul 5 10:52:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:52:58 -0500
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CEC72.4090909@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk>
Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>

On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:

> ...
>
> So it's my understanding there will be absolutely no difference in behaviour following this change (except that warnings can be caught by Test::Warn). I just wanted to confirm my understanding.

You can always just try it out and run tests. Might be interesting to see if anything breaks.

chris

From N.Haigh at sheffield.ac.uk Thu Jul 5 10:58:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 5 Jul 2007 15:58:30 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk>

Quoting Chris Fields:

>
> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>
> >
> > One more suggestion:
> >
> > It would be extremely useful if we had a standard way of testing that when a file is read into a bioperl object and then written out again in the same format, the input and output files are identical. If not, the test should show where the differences start (showing all the differences would just clutter the screen).
> >
> > This standard method/subroutine should be used to test all sequence and other text file IO.
> >
> > Any takers?
> >
> > -Heikki
> ...
>
> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t that do some checking, I think, but something like this would be of use. However, what if the test file is old (as many in t/data are) and the format has changed? GenBank and EMBL, for instance, have gone through several changes to format.
>
> chris
>
>

Is there any way to tell variants apart other than just layout? e.g. a version number or the like?

Nath

From N.Haigh at sheffield.ac.uk Thu Jul 5 11:04:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 5 Jul 2007 16:04:30 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk>

Quoting Chris Fields:

>
> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:
>
> > Sendu Bala wrote:
> >> ...
> >> Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you.
> >>
> >> Everything should be in Build.PL already. If I missed something, please add it.
> >>
> >
> > OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t
> >
> > This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there may be few systems with all these modules installed in order to conduct a complete test. These are the modules I'm referring to.
> >
> > Nath
>
> If they are only necessary for tests, work for all OSs, and are pure Perl they should be added to t/lib, like Test::More and the rest. If they only work for some OSs they could be added to t/lib and skip based on OS, but they still must be pure Perl. I would avoid anything that requires any compiling for XS or Inline altogether (I don't want to go down the nightmare road of OS-dependent compiler issues for a few tests).

If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!?
> > Finally, if they are needed for core modules (not just tests) then > they should be added to the core prereqs in Build. > > chris > From bix at sendu.me.uk Thu Jul 5 11:13:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:13:35 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: <468D0A9F.4010709@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Chris Fields : >>> OK, to clarify using the test file Sendu mentioned in a previous >>> post: t/SeqIO.t >>> >>> This test skips tests if Algorithm::Diff, IO::ScalarArray or >>> IO::String are not installed >> >> If they are only necessary for tests, work for all OSs, and are >> pure Perl they should be added to t/lib, like Test::More and the >> rest. If they only work for some OSs they could be added to t/lib >> and skip based on OS, but they still must be pure Perl. I would >> avoid anything that requires any compiling for XS or Inline >> altogether (I don't want to go down the nightmare road of >> OS-dependent compiler issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? That skip in SeqIO.t is new and I simply didn't think of them as important enough to make anyone install them or include them in t/lib. I'd go ahead and add those modules, but like I say, it may make more sense just to use is_deeply(), removing the dependency on Algorithm::Diff and IO::ScalarArray completely. 
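[Editorial illustration] The is_deeply() idea mentioned above, combined with the earlier request to show only where differences start, might look like the sketch below. It uses only Test::More (core); first_difference() is a hypothetical helper, not an existing BioperlTest method, and the two "files" are in-memory strings so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: 1-based number of the first differing line,
# or 0 if the two line arrays are identical.
sub first_difference {
    my ($got, $expected) = @_;
    my $max = @$got > @$expected ? @$got : @$expected;
    for my $i (0 .. $max - 1) {
        return $i + 1 if $i > $#$got || $i > $#$expected;    # one file is shorter
        return $i + 1 if $got->[$i] ne $expected->[$i];
    }
    return 0;
}

# In a real test these arrays would be read from the input and output files.
my @input  = split /\n/, "LOCUS  X\nDEFINITION  a\nORIGIN\n//";
my @output = split /\n/, "LOCUS  X\nDEFINITION  b\nORIGIN\n//";
my @copy   = @input;

is_deeply \@copy, \@input, 'identical line arrays compare equal';
is first_difference(\@output, \@input), 2, 'first difference is on line 2';
```

When is_deeply() itself fails, its diagnostics report where the two structures begin to differ, which may already be enough without a separate helper.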
From cjfields at uiuc.edu Thu Jul 5 11:35:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:35:41 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote: > ... >> If they are only necessary for tests, work for all OSs, and are pure >> Perl they should be added to t/lib, like Test::More and the rest. If >> they only work for some OSs they could be added to t/lib and skip >> based on OS, but they still must be pure Perl. I would avoid >> anything that requires any compiling for XS or Inline altogether (I >> don't want to go down the nightmare road of OS-dependent compiler >> issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? No, you are correct, but these are currently not in t/lib (unless someone snuck them in....) Of the modules you listed above, only one (IO::String) is required by the core modules. The others are not. Users shouldn't be forced to install Algorithm::Diff or IO::ScalarArray just to run tests, so anything not required should go into t/lib if at all possible. If there any reasons (OS issues, list of prereqs) which preclude adding these to t/lib we need to ask ourselves (1) why we are using that module in the first place? And, if there is a good reason, (2) can we skip them if they aren't present? Both of those options are already available. 
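The two options above (bundle the module in t/lib, or skip when it is absent) can be sketched as follows. This is only an illustration using core Test::More; the module_available() helper and the choice to append t/lib to @INC (so an installed copy wins over a bundled one) are assumptions, not how BioperlTest.pm actually does it.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Assumption: pure-Perl test-only modules may be bundled under t/lib.
# Appending (rather than prepending) means an installed copy is preferred.
push @INC, 't/lib';

# Hypothetical helper: true if the named module can be loaded.
sub module_available {
    my ($module) = @_;
    return eval "require $module; 1" ? 1 : 0;
}

SKIP: {
    # Skip exactly one test when the optional module is missing.
    skip 'Algorithm::Diff not installed', 1
        unless module_available('Algorithm::Diff');
    ok(Algorithm::Diff->can('diff'), 'Algorithm::Diff provides diff()');
}

# A core module is always found, so this test always runs.
ok(module_available('File::Spec'), 'core module File::Spec is available');
```

With this pattern the test count stays fixed whether or not the optional module is present, which keeps the plan valid on every system.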
chris From cjfields at uiuc.edu Thu Jul 5 11:50:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:50:55 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468D006D.6050806@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> <468D006D.6050806@sheffield.ac.uk> Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu> On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote: > ... >> I actually like Sendu's idea more, or the idea of each test suite >> having it's own directory. >> Tests which need to guess/validate the format are probably best >> left sequestered to a specific suite focused on format guessing/ >> validation, at least in my opinion. >> chris > > > How easily would this lend itself to using the same data for > multiple tests, or is it likely to lead to/exacerbate a culture of > adding duplicate data files in each "test suite" rather than reusing? > > Nath If there is a group of test data used for more than one test suite we can group those together into a common use folder, or we can go by format. I'm pretty open to anything, really, as long as it is more organized. My point is really concerned more with validation/guessing. I think we should limit those tests to their respective specific test suites, or even to sections within a particular test suite (for instance, genbank.t), but not to force sequence guessing or validation in other cases. To me validation, guessing, and parsing are three distinct issues (much like XML parsers handle things), so they require three distinct tests. As for true sequence validation, there is no official format validation scheme yet in BioPerl. 
It's sort of unofficially intergrated into the sequence parsers themselves (something which I find to be problematic for several reasons too long to outline here). chris From cjfields at uiuc.edu Thu Jul 5 11:54:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:54:42 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> <1183647510.468d07168963c@webmail.shef.ac.uk> Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu> On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote: > Quoting Chris Fields : > >> >> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: >> >>> >>> One more suggestion: >>> >>> It would be extemaly useful if we had a standard way of testing >>> that a when a >>> file is read into a bioperl object and then written out again into >>> a same >>> format, the input and output files are identical. If not, the test >>> should >>> show where the the differences start (showing all the differences >>> would just >>> clutter the screen). >>> >>> This standard method/subroutine should be used to test all sequence >>> and other >>> text file IO. >>> >>> Any takers? >>> >>> -Heikki >> ... >> >> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t >> that do some checking, I think, but something like this would be of >> use. However, what if the test file is old (as many in t/data are) >> and the format has changed? GenBank and EMBL, for instance, have >> gone through several changes to format. >> >> chris >> >> > > Is there any way to distinguish variants apart other than just > layout? e.g. a version number of the likes? > > Nath I don't think so; this veers back into the whole validation issue (i.e. does the record fit certain specifications). 
There are examples of seq records from different sources which bioperl is expected to parse, for example Ensembl GenBank records. Some of those have feature tags or annotation fields which may not appear in output when using write_seq(). I don't think it's as important to replicate the output data exactly like the input as much as it's important to have the data represented in a Bio::Seq object (or any other Bio* instance) in a consistent manner and have the ability to incorporate new fields (such as the recent addition of genome projects) transparently. The latter is hard to do with the current genbank parser (you have to specifically code for it), but it is a bit easier to do with the driver-handler model I'm working on. chris From bix at sendu.me.uk Thu Jul 5 11:56:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:56:29 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <468D14AD.8050007@sendu.me.uk> Sendu Bala wrote: > Sendu Bala wrote: >> Nathan S. Haigh wrote: >>> Thinking about this a little more, I think it would be a good idea to >>> include Test::Exception in t/lib. >> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. And I've also now added in support for Test::Warn, giving you warning_is, warnings_are, warning_like and warnings_like. 
I've updated the HOWTO as well: http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests You can see these things in action in t/seq_quality.t From bix at sendu.me.uk Thu Jul 5 11:57:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:57:23 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> Message-ID: <468D14E3.6030104@sendu.me.uk> Chris Fields wrote: > > On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > >> ... >> >> So its my understanding there will be absolutely no difference in >> behaviour following this change (except that warning can be caught by >> Test::Warn). I just wanted to confirm my understanding. > > You can always just try it out and run tests. Might be interesting to > see if anything breaks. I've made the change. Everything seems ok as far as I can tell. From dmessina at wustl.edu Thu Jul 5 12:02:26 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:02:26 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 9:33 AM, Chris Fields wrote: > I agree, but I think there is still an expectation that 1.5.2 and > beyond are more like true 'stable' releases even though we still > designate them as 'developer.' We unfortunately reinforce that when > we tell users they need to update to v. 1.5.2 or bioperl-live to fix > a particular bug in the 1.4 release. 
I know this has been discussed before, but while we're talking about future release plans, it might be worth revisiting the BioPerl policy of designating only even-numbered releases as 'stable'. It's taking so long to get from 1.4 to 1.6. While the principle of keeping a stable API between 'stable' releases is valid in the ideal case, I think that continuing to label 1.5.2 (or whatever the latest 'dev' release is) as a developer release (which implies potentially unstable or bleeding-edge code) is highly misleading since we would never ever tell anyone to get 1.4 instead. Alternatively, if we adopt a more aggressive release schedule as Chris proposed a couple days ago, then perhaps we could agree to push out an even-numbered release once a year or so, so that there is a 'stable' release we could recommend. > If we feel a nightly snapshot is warranted we could do that though. > I personally don't think there is a need, particularly since we have > several means to obtain the latest code at any point in time > (including the browsable CVS 'Download tarball'). We could state the > next dev/stable CPAN release (pending on date dd/mm/yy) will have the > bug fix, and if they want it immediately then pick it up from CVS. To make it easier for people to obtain the latest tarball, we could put the 'download tarball' link directly on the 'Getting_BioPerl' wiki page instead of only a link to the viewcvs interface. That way they wouldn't have to navigate the source tree to figure out which tarball they want (which is almost always going to be the bioperl- live tarball). 
I think the actual URL underlying the 'Download tarball' link on viewcvs is stable: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- live.tar.gz?tarball=1 Dave From cjfields at uiuc.edu Thu Jul 5 12:13:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:13:30 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 11:02 AM, David Messina wrote: > ... > I know this has been discussed before, but while we're talking > about future release plans, it might be worth revisiting the > BioPerl policy of designating only even-numbered releases as > 'stable'. It's taking so long to get from 1.4 to 1.6. While the > principle of keeping a stable API between 'stable' releases is > valid in the ideal case, I think that continuing to label 1.5.2 (or > whatever the latest 'dev' release is) as a developer release (which > implies potentially unstable or bleeding-edge code) is highly > misleading since we would never ever tell anyone to get 1.4 instead. > > Alternatively, if we adopt a more aggressive release schedule as > Chris proposed a couple days ago, then perhaps we could agree to > push out an even-numbered release once a year or so, so that there > is a 'stable' release we could recommend. I think the idea of 'stable' is best summarized back in Hilmar's post (i.e. we support a particular API for that release). The 1.5 releases I believe break some aspects of 1.4 API (some of the Feature/ Annotation stuff introduced before the official 1.5 release). We still need to address some of those issues before a 1.6 which seems to be the only real stumbling block, but they are unfortunately not well-documented and are somewhat interwoven with GMOD code. > ... 
> To make it easier for people to obtain the latest tarball, we could > put the 'download tarball' link directly on the 'Getting_BioPerl' > wiki page instead of only a link to the viewcvs interface. That way > they wouldn't have to navigate the source tree to figure out which > tarball they want (which is almost always going to be the bioperl- > live tarball). > > I think the actual URL underlying the 'Download tarball' link on > viewcvs is stable: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- > live.tar.gz?tarball=1 > > Dave Sounds reasonable enough. Do you want to do the honors? chris From dmessina at wustl.edu Thu Jul 5 12:44:28 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:44:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> > [Chris] > The 1.5 releases I believe break some aspects of 1.4 API Yes, this is true. I question, though, whether it's relevant given that virtually no one uses 1.4 anymore. In any case, I would venture that the number of people who would be bitten by the 1.4->1.5 API change is much smaller than the number of people who download 1.4 and then ask us why it doesn't work. I think that, rather than continuing to call 1.5.x the developer release in order to adhere to the API guarantee, it would be much clearer to users if we state clearly that everyone should download 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API changes. >> [me] >> we could put the 'download tarball' link directly on the >> 'Getting_BioPerl' wiki page > > [Chris] > Sounds reasonable enough. Do you want to do the honors? Done. 
Dave From cjfields at uiuc.edu Thu Jul 5 12:57:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:57:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: On Jul 5, 2007, at 11:44 AM, David Messina wrote: > >> [Chris] >> The 1.5 releases I believe break some aspects of 1.4 API > > Yes, this is true. > > I question, though, whether it's relevant given that virtually no > one uses 1.4 anymore. In any case, I would venture that the number > of people who would be bitten by the 1.4->1.5 API change is much > smaller than the number of people who download 1.4 and then ask us > why it doesn't work. > > I think that, rather than continuing to call 1.5.x the developer > release in order to adhere to the API guarantee, it would be much > clearer to users if we state clearly that everyone should download > 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API > changes. You'd be surprised how many are still using bioperl 1.2.3 (Ensembl) and 1.4 (any admin too scared to go with a 'dev' release). The real answer is to get out a stable 1.6 ASAP. The problem we currently have is (horrible Texas pun) 'too many pokers in the fire.' We have svn migration, major changes in the test suite, talk about splitting bioperl, a lot of bugs to sort through, new code to add or work on, etc. Not to mention our $jobs! I think we should just bite the bullet and proceed with pulling out the controversial operator overloading in Bio::Annotation*, deprecate the tag methods in AnnotatableI, and go about fixing everything up. 
If that occurs (which seems to be the major impediment) and we get GMOD/GBrowse playing well with BioPerl then we can aim for a new stable release, and then institute a regular release cycle. chris From bpederse at gmail.com Thu Jul 5 13:58:24 2007 From: bpederse at gmail.com (Brent Pedersen) Date: Thu, 5 Jul 2007 10:58:24 -0700 Subject: [Bioperl-l] slippy map for genomic features. Message-ID: hi, here's a side project i've been tinkering on in googlecode svn that may be useful to some. http://code.google.com/p/genome-browser/ it's a simple hack on top of OpenLayers (openlayers.org) to provide a javascript slippy map interface and API to view and browse genomic features. It can be used with any image generation program that can accept &xmin= and &xmax= parameters through the url. -- though i havent had it working it bioperl as bioperl generates images of different height depending on the number of tracks. there's a live example of the code in SVN here: http://toxic.berkeley.edu/bpederse/genome-browser/ with images generated by a colleague's modules on first request. those images are then cached by a simple perl script included in the SVN repo. all subsequent requests are returned from the cache. an image request (automatically generated by the javascript) looks like: http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 but any implementation need only implement xmin and xmax. all other parameters will be used for caching but are not required. if anyone is interested in getting this going with bioperl image generation--or improving the project in any way, let me know and i'll add you as a committer and provide any javascript support that i can. 
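As a rough sketch of the request scheme described above (xmin and xmax required, every other parameter used only for caching), the following Perl shows one way a tile server could derive a cache file name and the resolution of a tile. The function names and the .png suffix are hypothetical; this is not the tiler.pl from the repository.

```perl
use strict;
use warnings;

# Hypothetical helper: build a cache file name from the request
# parameters. xmin and xmax are mandatory; anything else simply becomes
# part of the key, in sorted order so the key is stable.
sub tile_cache_key {
    my (%param) = @_;
    die "xmin and xmax are required\n"
        unless defined $param{xmin} && defined $param{xmax};
    return join('_', map { "$_-$param{$_}" } sort keys %param) . '.png';
}

# Hypothetical helper: base pairs covered by one pixel of the tile.
sub bp_per_pixel {
    my (%param) = @_;
    return ($param{xmax} - $param{xmin}) / $param{width};
}

# Parameters taken from the example URL in the message above.
my %request = (
    chr      => 1,
    version  => 6,
    layers   => 'mRNA',
    organism => 'arabidopsis',
    xmin     => 491520,
    xmax     => 499712,
    width    => 512,
);

print tile_cache_key(%request), "\n";
print bp_per_pixel(%request), "\n";
```

A real implementation would also need to sanitise the parameter values before using them in a file name.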
-brent

tar ball download:
http://genome-browser.googlecode.com/files/genome-browser-0.02.tar

From dmessina at wustl.edu Thu Jul 5 14:39:16 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 13:39:16 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID:

> The real answer is to get out a stable 1.6 ASAP. The problem we
> currently have is (horrible Texas pun) 'too many pokers in the
> fire.' We have svn migration, major changes in the test suite,
> talk about splitting bioperl, a lot of bugs to sort through, new
> code to add or work on, etc. Not to mention our $jobs!

Yep, I hear ya.

> I think we should just bite the bullet and proceed with pulling out
> the controversial operator overloading in Bio::Annotation*,
> deprecate the tag methods in AnnotatableI, and go about fixing
> everything up. If that occurs (which seems to be the major
> impediment) and we get GMOD/GBrowse playing well with BioPerl then
> we can aim for a new stable release, and then institute a regular
> release cycle.

That's a great plan. You're right -- better to devote energy to 1.6 than to interim solutions. Alright, I give, I give! :)

Dave

From glauberwagner at yahoo.com.br Thu Jul 5 15:56:43 2007
From: glauberwagner at yahoo.com.br (Glauber Wagner)
Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART)
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com>

Dear All,

I have a problem with the Bio::DB::Query::GenBank module. I am trying to count the number of protein sequences and the module did not return the expected number via the count method.

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query_string = "Trypanosoma cruzi[Organism]";

my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                         -query => $query_string);
my $count = $query->count;   # number of records matching the query
my @ids   = $query->ids;     # identifiers of the matching records

print "$count\n";

Thanks.
Glauber

From cjfields at uiuc.edu Thu Jul 5 16:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment. I'm getting 'Internal Server Error' at this time. Try back again at a later point.

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem with the Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences and
> the module did not return the expected number via the
> count method.
>
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query_string = "Trypanosoma cruzi[Organism]";
>
> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>                                          -query => $query_string);
> my $count = $query->count;
> my @ids   = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mitch_skinner at berkeley.edu Thu Jul 5 17:22:38 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 05 Jul 2007 14:22:38 -0700 Subject: [Bioperl-l] slippy map for genomic features. In-Reply-To: References: Message-ID: <468D611E.7020904@berkeley.edu> Hi, FWIW, we've been working on something similar: http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html based on GBrowse/Bio::Graphics and javascript that Andrew wrote from scratch (with the prototype library). When our project was starting up (fall 05) Andrew looked but didn't find openlayers; I'm not sure if it was public back then but their current svn only goes back to 2006. I think that things like layout (bumping) ought to be done in advance on a chromosome-wide basis; otherwise it's difficult to keep features from ending up at different heights on neighboring tiles. And it would be difficult for the server to know what was being clicked on. So we've been doing some up-front work to either do layout or to just render all the tiles in advance: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup which is driven by this script: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup Or you could just not bump at all, I guess. I think of that as important functionality but I'd be interested in hearing about use cases where it's not necessary. It's not just bumping, though; things like text labels also make it difficult to predict exactly what pixels a feature will span if you only have its genomic coordinates. To make features clickable we've been using imagemaps; it simplifies the server code but it bogs down the client quite a bit. I'd certainly be interested in seeing if there are ways we could work together; if you're at Berkeley maybe we could meet. 
Regards, Mitch Brent Pedersen wrote: > hi, > here's a side project i've been tinkering on in googlecode svn that > may be useful to some. > http://code.google.com/p/genome-browser/ > it's a simple hack on top of OpenLayers (openlayers.org) to provide a > javascript slippy map interface and API to view and browse genomic > features. It can be used with any image generation program that can > accept &xmin= and &xmax= parameters through the url. -- though i > havent had it working it bioperl as bioperl generates images of > different height depending on the number of tracks. > > there's a live example of the code in SVN here: > http://toxic.berkeley.edu/bpederse/genome-browser/ > with images generated by a colleague's modules on first request. those > images are then cached by a simple perl script included in the SVN > repo. all subsequent requests are returned from the cache. > an image request (automatically generated by the javascript) looks like: > http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 > but any implementation need only implement xmin and xmax. all other > parameters will be used for caching but are not required. > > if anyone is interested in getting this going with bioperl image > generation--or improving the project in any way, let me know and i'll > add you as a committer and provide any javascript support that i can. 
>
> -brent
>
> tar ball download:
> http://genome-browser.googlecode.com/files/genome-browser-0.02.tar

From cjfields at uiuc.edu Thu Jul 5 17:42:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 16:42:40 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com> <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu>

Update: seems to be back up. Give it a try now.

chris

On Jul 5, 2007, at 3:21 PM, Chris Fields wrote:

> NCBI esearch doesn't seem to be working at the moment. I'm getting
> 'Internal Server Error' at this time. Try back again at a later
> point.
>
> chris
>
> On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:
>
>> Dear All,
>>
>> I have a problem with the Bio::DB::Query::GenBank module. I
>> am trying to count the number of protein sequences and
>> the module did not return the expected number via the
>> count method.
>>
>> use strict;
>> use warnings;
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>>
>> my $query_string = "Trypanosoma cruzi[Organism]";
>>
>> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>>                                          -query => $query_string);
>> my $count = $query->count;
>> my @ids   = $query->ids;
>>
>> print "$count\n";
>>
>> Thanks.
>> Glauber
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From n.haigh at sheffield.ac.uk Fri Jul 6 03:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one
> uses 1.4 anymore. In any case, I would venture that the number of
> people who would be bitten by the 1.4->1.5 API change is much smaller
> than the number of people who download 1.4 and then ask us why it
> doesn't work.
>

I'm not really up-to-speed with how the API should remain stable etc. Is the idea that the API should be stable from 1.4 through the 1.5 dev and then the next stable release can change that API? So any stable to stable upgrade could involve an API change while a stable to dev upgrade should have the same API? Does a stable API mean that the same method calls are available in a newer release....what about adding new methods to a newer release?

How are these API changes currently tracked?
It seems to me that Test::More might be able to help in testing the API:

can_ok($module, @methods);

Nath

From n.haigh at sheffield.ac.uk Fri Jul 6 07:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:
my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:
$obj->label();

I thought it would probably be:
'inframe'

However you get:
'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath

From n.haigh at sheffield.ac.uk Fri Jul 6 08:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
>
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
>
> What do you expect the following to return:
> $obj->label();
>
> I thought it would probably be:
> 'inframe'
>
> However you get:
> 'inframe, deletion'
>
> Can anyone in the know explain what behaviour would be expected?
>
> Cheers
> Nath

Following on from this, AAChange has the following two methods: add_Allele() and allele_mut().

It appears that allele_mut() is only capable of remembering one allele at a time, whereas add_Allele() is provided to add support for multiple alleles - is that correct? However, add_Allele() also calls allele_mut(), such that multiple calls to add_Allele() will overwrite the allele being remembered by allele_mut().

Things are further complicated by the fact that label() uses allele_mut() to decide on the label to return. Shouldn't label() know about multiple alleles set by multiple calls to add_Allele()?

It may be my lack of understanding of alleles and of what these classes are intended to do, but trying to rewrite the test scripts to improve code coverage has left me a little confused!

Thanks
Nath

From tanzeem.mb at gmail.com Thu Jul 5 02:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>

I found it myself. Run Apache as root and disable SELinux, and the problem will not recur.

tanzeem wrote:
>
> I have a program which uses the BioPerl RemoteBlast module to
> compare an amino acid FASTA file with the Swiss-Prot database. The
> submit_blast() method works successfully when run from the command
> line, but when the program is run from a web browser it returns -1.
> I was trying to adapt the code from the RemoteBlast synopsis for my needs.
> -- View this message in context: http://www.nabble.com/Problem-working-with-remoteblast-submit-method-in-webbrowser.-tf3919886.html#a11441586 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Fri Jul 6 09:00:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 06 Jul 2007 09:00:32 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: <1183726832.2566.34.camel@localhost.localdomain> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: > > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, deprecate > the tag methods in AnnotatableI, and go about fixing everything up. > If that occurs (which seems to be the major impediment) and we get > GMOD/GBrowse playing well with BioPerl then we can aim for a new > stable release, and then institute a regular release cycle. > I think this sounds like a good idea to me too. I'm planning on having a GMOD hackathon at the end of the summer; if I had a new API by then, we could focus on fixing anything that gets broken by the changes. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Fri Jul 6 09:10:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 6 Jul 2007 08:10:41 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote: > David Messina wrote: >>> [Chris] >>> The 1.5 releases I believe break some aspects of 1.4 API >>> >> >> Yes, this is true. >> >> I question, though, whether it's relevant given that virtually no one >> uses 1.4 anymore. In any case, I would venture that the number of >> people who would be bitten by the 1.4->1.5 API change is much smaller >> than the number of people who download 1.4 and then ask us why it >> doesn't work. >> > > I'm not really up-to-speed with how the API should remain stable > etc. Is > the idea that the API should be stable from 1.4 though the 1.5 dev and > then the next stale release can change that API? So any stable to > stable > upgrade could involve an API change while a stable to dev upgrade > should > have the same API? Does a stable API mean that the same method > calls are > available in a newer release....what about adding new methods to a > newer > release? > > How are these API changes currently tracked? 
It seems to me that
> Test::More might be able to help in testing the API:
>
> can_ok($module, @methods);
>
>
> Nath

It's basically a 'contract' of sorts between the devs (us) and users (us/them) that the API won't change for the extent of that release series, thus ensuring any scripts out there generating tons of data won't break down if they attempt to call a renamed method. We try to maintain the API state anyway for those reasons, but in a dev release series we might decide to change some method names for consistency and deprecate older ambiguously-named methods (see below). For a stable release it's critical the API remain intact.

There are a few methods which are considered deprecated or will be deprecated. For instance, we recently talked about changes to method names which use case to specify whether you're receiving an object (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested list, or whether to use each_* vs next_* for iterators. Consistency is nice!

chris

From heikki at sanbi.ac.za Fri Jul 6 09:20:26 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 6 Jul 2007 15:20:26 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E3B89.3090202@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk>
Message-ID: <200707061520.27000.heikki@sanbi.ac.za>

Hi Nat,

These modules have not been touched for a while and were developed for a specific task. A review is definitely in order.

The way RNAChange->label was written, it should return 'inframe' when given no alleles, but 'no change' would actually be better.

The multiple alleles were originally thought to be a good idea, but the vocabulary for labels was developed for a single allele only. The use of the module ended up being limited to a single allele, so the add_Allele() behaviour was conveniently ignored but not removed. :(

-Heikki

On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> Nathan S.
Haigh wrote: > > I'm taking a look at the tests for Bio::Variation::RNAChange. > > > > If you create a new oject without arguments: > > my $obj = Bio::Variation::RNAChange->new(); > > > > What do you expect the following to return: > > $obj->label(); > > > > I thought it would probably be: > > 'inframe' > > > > However you get: > > 'inframe, deletion' > > > > Can anyone in the know explain what behaviour would be expected? > > > > Cheers > > Nath > > Following on from this, AAChange has the following two methods: > add_Allele() and allele_mut() > > It appears that allele_mut is only capable of remembering 1 allele at a > time, whereas add_Allele() is provided to add support for mutliple > alleles - is that correct? > > However, add_Allele() also calls allele_mut(), such that mutliple calls > to add_Allele will result in the overwriting of the allele being > remembered by allele_mut(). Things are further complicated by the fact > that label() uses allele_mut() to decide on the label to return. > Shouldn't label know aout multiple alleles set by multiple calls to > add_Allele? > > It may be my lack of understanding alleles and what these classes are > intending to do, but trying to rewrite the test scripts to improve code > coverage has let me a little confused! 
> > Thanks > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From schlesi at ebi.ac.uk Fri Jul 6 10:24:05 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Fri, 6 Jul 2007 15:24:05 +0100 Subject: [Bioperl-l] Unrooting a tree Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Hi, I am reading a rooted tree in newick format from a string (i.e. a bifurcation at the root) and would like to unroot it (i.e. a trifurcation at the root). I tried getting a grandchild of the root and adding it as a direct child, but that does not seem to work (the root still only has two descendents and the tree structure gets messed up). Is there a nice way to do this directly in bioperl? Doing it on the newick string is possible of course, but not nice. Thanks Felix From n.haigh at sheffield.ac.uk Fri Jul 6 11:37:19 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:37:19 +0100 Subject: [Bioperl-l] API Changes In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: <468E61AF.9040106@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Fields wrote: > > On Jul 6, 2007, at 2:09 AM, Nathan S. 
Haigh wrote: > >> David Messina wrote: >>>> [Chris] >>>> The 1.5 releases I believe break some aspects of 1.4 API >>>> >>> >>> Yes, this is true. >>> >>> I question, though, whether it's relevant given that virtually no one >>> uses 1.4 anymore. In any case, I would venture that the number of >>> people who would be bitten by the 1.4->1.5 API change is much smaller >>> than the number of people who download 1.4 and then ask us why it >>> doesn't work. >>> >> >> I'm not really up-to-speed with how the API should remain stable etc. Is >> the idea that the API should be stable from 1.4 though the 1.5 dev and >> then the next stale release can change that API? So any stable to stable >> upgrade could involve an API change while a stable to dev upgrade should >> have the same API? Does a stable API mean that the same method calls are >> available in a newer release....what about adding new methods to a newer >> release? >> >> How are these API changes currently tracked? It seems to me that >> Test::More might be able to help in testing the API: >> >> can_ok($module, @methods); >> >> >> Nath > > It's basically a 'contract' of sorts between the devs (us) and users > (us/them) that the API won't change for the extent of that release > series, thus ensuring any scripts out there generating tons of data > won't break down if they attempt to call a renamed method. We try to > maintain the API state anyway for those reasons, but in a dev release > series we might decide to change some method names for consistency and > deprecate older ambiguously-named methods (see below). For a stable > release it's critical the API remain intact. Hmm, still not 100% clear - it is Friday! So, someone running a script that was designed when 1.4 was released should still be able to run their script for all future releases. So all changes need to be backward compatible? 
So you have several situations regarding method names:
1) Adding new methods should be fine since past scripts don't know about them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name

A stable API, to me, means the same method calls should still be able to accept the same arguments (including the constructor) and return the same object/data etc.

What if a module is pretty outdated and would benefit from a rewrite - should all the old method names be included? What if this makes coding difficult?

>
> There are a few methods which are considered deprecated or will be
> deprecated. For instance, we recently talked about changes to method
> names which use case to specify whether you're receiving an object
> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
> list, or whether to use each_* vs next_* for iterators. Consistency is
> nice!
>

Do you mean that the use of case to signify objects vs data being returned is to be deprecated, or encouraged? What was the outcome of the each_* vs next_* discussion?

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk
kAWH1zVa1ycopijl761cvkQ=
=fppH
-----END PGP SIGNATURE-----

From n.haigh at sheffield.ac.uk Fri Jul 6 11:43:41 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:43:41 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za>
References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za>
Message-ID: <468E632D.4090801@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heikki Lehvaslaiho wrote:
> Hi Nat,
>
> These modules have not been touched for a while and were developed for a
> specific task. A review is definitely in order.
> > The way RNAChange->label was written, it should return 'inframe' when given no > alleles, but 'no change' would actually be better. Wouldn't this effectively be changing the API since past scripts "could" expect "inframe" to be returned. > > The multiple alleles were originally though to be a good idea, but the > vocabulary for labels was developed for single allele, only, The use of the > module ended up being limited to single allele, so add_allele() behaviour was > conveniently ignored but not removed. :( So add_Allele() and each_Allele() should be deprecated in favour of allele_mut()? - From my post about API's.....how should the capitalisation of add_Allele() and each_Allele() be changed? Cheers Nath > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: >> Nathan S. Haigh wrote: >>> I'm taking a look at the tests for Bio::Variation::RNAChange. >>> >>> If you create a new oject without arguments: >>> my $obj = Bio::Variation::RNAChange->new(); >>> >>> What do you expect the following to return: >>> $obj->label(); >>> >>> I thought it would probably be: >>> 'inframe' >>> >>> However you get: >>> 'inframe, deletion' >>> >>> Can anyone in the know explain what behaviour would be expected? >>> >>> Cheers >>> Nath >> Following on from this, AAChange has the following two methods: >> add_Allele() and allele_mut() >> >> It appears that allele_mut is only capable of remembering 1 allele at a >> time, whereas add_Allele() is provided to add support for mutliple >> alleles - is that correct? >> >> However, add_Allele() also calls allele_mut(), such that mutliple calls >> to add_Allele will result in the overwriting of the allele being >> remembered by allele_mut(). Things are further complicated by the fact >> that label() uses allele_mut() to decide on the label to return. >> Shouldn't label know aout multiple alleles set by multiple calls to >> add_Allele? 
>> >> It may be my lack of understanding alleles and what these classes are >> intending to do, but trying to rewrite the test scripts to improve code >> coverage has let me a little confused! >> >> Thanks >> Nath >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue GBHuSHfsesX1ko55s+ME2Zc= =tkG8 -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sat Jul 7 16:57:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 15:57:37 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <1183726832.2566.34.camel@localhost.localdomain> Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu> We'll prob. get a start soon, then. I'll let you know when we start. chris On Jul 6, 2007, at 8:00 AM, Scott Cain wrote: > On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: >> >> I think we should just bite the bullet and proceed with pulling out >> the controversial operator overloading in Bio::Annotation*, deprecate >> the tag methods in AnnotatableI, and go about fixing everything up. >> If that occurs (which seems to be the major impediment) and we get >> GMOD/GBrowse playing well with BioPerl then we can aim for a new >> stable release, and then institute a regular release cycle. >> > I think this sounds like a good idea to me too. 
I'm planning on > having > a GMOD hackathon at the end of the summer; if I had a new API by then, > we could focus on fixing anything that gets broken by the changes. > > Scott > > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Jul 7 17:17:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 16:17:14 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468E61AF.9040106@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> <468E61AF.9040106@sheffield.ac.uk> Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu> On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote: > ... > Hmm, still not 100% clear - it is Friday! > > So, someone running a script that was designed when 1.4 was released > should still be able to run their script for all future releases. > So all > changes need to be backward compatible? It helps. For instance, if we change method names (rename each_Foo as next_Foo), we should have each_Foo delegate to next_Foo for the time being. If we plan on deprecating the old method altogether we would add a warning message when it's called, then delegate. It's a better solution than just changing the method outright, which means the user has to search through docs to find the renamed method. 
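
A minimal sketch of the delegate-and-warn approach described above. My::Collection is a hypothetical class (not a real BioPerl module), and Carp's carp is an assumption standing in for whatever warning mechanism the real code would use:

```perl
use strict;
use warnings;

package My::Collection;   # hypothetical class, for illustration only
use Carp qw(carp);

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# New, preferred iterator name: returns the next item, or nothing when done.
sub next_Foo {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

# Old name kept for backward compatibility: warn, then delegate.
sub each_Foo {
    my $self = shift;
    carp "each_Foo() is deprecated; use next_Foo() instead";
    return $self->next_Foo(@_);
}

package main;

my $c = My::Collection->new(qw(a b));
print $c->each_Foo, "\n";   # warns on STDERR, then behaves like next_Foo
print $c->next_Foo, "\n";
```

Old scripts calling each_Foo keep working and get a pointer to the new name, rather than failing outright on a missing method.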
> So you have several situations regarding method names:
> 1) Adding new methods should be fine since past scripts don't know about
> them and won't have used them
> 2) Removing methods would break past scripts that used them
> 3) Renamed methods would break past scripts that used the old name
>
> A stable API to me, means the same method calls should still be able to
> accept the same arguments (inc the constructor) and return the same
> object/data etc.

Yes.

> What if a module is pretty outdated and would benefit from a rewrite -
> should all the old method names be included, what if this makes coding
> difficult?

It depends on the module. If a complete rewrite is needed then maybe starting with a new module/interface is best, and we could deprecate the older module completely. That has been done already with Bio::Tools::BPLite (in favor of SearchIO) and a few other modules.

>> There are a few methods which are considered deprecated or will be
>> deprecated. For instance, we recently talked about changes to method
>> names which use case to specify whether you're receiving an object
>> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
>> list, or whether to use each_* vs next_* for iterators. Consistency is
>> nice!
>>
>
> You mean the use of case to signify objects vs data being returned are
> to be deprecated or encouraged? What was the outcome of the each_* vs
> next_*?
>
> Nath

Here's the section I added to the wiki (it started in a thread a few weeks or so ago, so it's a summary really):

http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names

Feel free to add to it or make suggestions. BTW, Hilmar mentioned there was a movement to rename methods in old code to follow these recommendations but it was never completed. It should be taken up again at some point, but the recommendations are mainly here for newer code.
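
The can_ok idea raised earlier in this thread could be sketched roughly as below, as one possible way to pin down a public method list in a test. My::Feature and its method names are hypothetical stand-ins so the example is self-contained; a real test would load the actual module and list its documented methods:

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-in for a real module, defined inline for illustration.
{
    package My::Feature;
    sub new      { return bless {}, shift }
    sub next_Foo { return }
    # Deprecated alias kept so older scripts do not break.
    sub each_Foo { my $self = shift; return $self->next_Foo(@_) }
}

# The method names the API promises to keep for this release series.
my @public_api = qw(new next_Foo each_Foo);

# Fails if any promised method has been removed or renamed.
can_ok('My::Feature', @public_api);

my $obj = My::Feature->new;
isa_ok($obj, 'My::Feature');
```

This only checks that the names exist; it says nothing about arguments or return values, which would need their own tests.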
chris

From heikki at sanbi.ac.za Sun Jul 8 03:32:21 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 8 Jul 2007 09:32:21 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E632D.4090801@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> <468E632D.4090801@sheffield.ac.uk>
Message-ID: <200707080932.21818.heikki@sanbi.ac.za>

On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote:
> Heikki Lehvaslaiho wrote:
> > Hi Nat,
> >
> > These modules have not been touched for a while and were developed for a
> > specific task. A review is definitely in order.
> >
> > The way RNAChange->label was written, it should return 'inframe' when
> > given no alleles, but 'no change' would actually be better.
>
> Wouldn't this effectively be changing the API since past scripts "could"
> expect "inframe" to be returned.

Checking the actual usage and what happens when you change a nucleotide to itself, you get the label 'silent'. I guess that would be a valid label value even when the alleles are not initialised, too.

> > The multiple alleles were originally thought to be a good idea, but the
> > vocabulary for labels was developed for a single allele only. The use of
> > the module ended up being limited to a single allele, so the add_Allele()
> > behaviour was conveniently ignored but not removed. :(
>
> So add_Allele() and each_Allele() should be deprecated in favour of
> allele_mut()?

Yes.

> From my post about API's.....how should the capitalisation of
> add_Allele() and each_Allele() be changed?

Definitely keep the current ones as deprecated alternatives.

-Heikki

> Cheers
> Nath
>
> > -Heikki
> >
> > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> >> Nathan S. Haigh wrote:
> >>> I'm taking a look at the tests for Bio::Variation::RNAChange.
> >>> > >>> If you create a new oject without arguments: > >>> my $obj = Bio::Variation::RNAChange->new(); > >>> > >>> What do you expect the following to return: > >>> $obj->label(); > >>> > >>> I thought it would probably be: > >>> 'inframe' > >>> > >>> However you get: > >>> 'inframe, deletion' > >>> > >>> Can anyone in the know explain what behaviour would be expected? > >>> > >>> Cheers > >>> Nath > >> > >> Following on from this, AAChange has the following two methods: > >> add_Allele() and allele_mut() > >> > >> It appears that allele_mut is only capable of remembering 1 allele at a > >> time, whereas add_Allele() is provided to add support for mutliple > >> alleles - is that correct? > >> > >> However, add_Allele() also calls allele_mut(), such that mutliple calls > >> to add_Allele will result in the overwriting of the allele being > >> remembered by allele_mut(). Things are further complicated by the fact > >> that label() uses allele_mut() to decide on the label to return. > >> Shouldn't label know aout multiple alleles set by multiple calls to > >> add_Allele? > >> > >> It may be my lack of understanding alleles and what these classes are > >> intending to do, but trying to rewrite the test scripts to improve code > >> coverage has let me a little confused! 
> >> > >> Thanks > >> Nath > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From xing.y.hu at gmail.com Mon Jul 9 02:26:40 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Mon, 09 Jul 2007 14:26:40 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? Message-ID: <4691D520.60700@gmail.com> Hi friends, I wrote a script for getting genomic sequence file from GenBank. To fulfill that target, I used DB::GenBank module to get the sequence via get_Seq_by_acc, and it works well. But this time, facing enormous amount of ESTs, I have no idea how to download them swiftly and elegantly. PROBLEM DESCRIPTION: goal: download all EST files of a specific species from GenBank, say Arabidopsis Thaliana or Oryza sativa(rice). other: whether all of ESTs are in a single file or separatedly placed does not matter. Can I use a bioperl script to achieve that? And How? I really appreciate. Xing. From akozik at atgc.org Mon Jul 9 08:25:14 2007 From: akozik at atgc.org (Alexander Kozik) Date: Mon, 09 Jul 2007 05:25:14 -0700 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <4691D520.60700@gmail.com> References: <4691D520.60700@gmail.com> Message-ID: <4692292A.1080900@atgc.org> To download genomic sequences or ESTs for any organism (in various formats) you can use NCBI Taxonomy Browser: http://www.ncbi.nlm.nih.gov/Taxonomy/ you can use taxonomy id to access different organisms, Arabidopsis for example (3702): http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 or by direct web link: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 assembled genomes can be accessed via ftp: ftp://ftp.ncbi.nih.gov/genomes/ To download large amount of selected sequences (ESTs for example) you can use batch Entrez: http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide (select EST for EST, it's critical) It seems, to solve the problem you describe, you don't need to use bioperl. NCBI GenBank Entrez provides all necessary tools to work on these simple and frequent tasks. -Alex -- Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 East Health Sciences Drive University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email#1: akozik at atgc.org email#2: akozik at gmail.com web: http://www.atgc.org/ Xing Hu wrote: > Hi friends, > > I wrote a script for getting genomic sequence file from GenBank. To > fulfill that target, I used DB::GenBank module to get the sequence via > get_Seq_by_acc, and it works well. But this time, facing enormous amount > of ESTs, I have no idea how to download them swiftly and elegantly. > > PROBLEM DESCRIPTION: > goal: download all EST files of a specific species from GenBank, say > Arabidopsis Thaliana or Oryza sativa(rice). > other: whether all of ESTs are in a single file or separatedly > placed does not matter. > > Can I use a bioperl script to achieve that? And How? I really > appreciate. > > Xing. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Jul 9 10:17:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Jul 2007 09:17:23 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <4692292A.1080900@atgc.org> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Caveat: if you have millions of ESTs please consider NOT using my eutil script below or NCBI Batch Entrez, which would repeatedly hit the NCBI server thousands of times. At least try looking for other ways to retrieve the data you want (ftp, organism-specific resources like Ensembl, so on), or run any scripts or data retrieval in off hours so you don't overtax the NCBI server. There is a way you can use BioPerl if you don't mind living on the bleeding edge by using bioperl-live (core code from CVS). I have been working on a set of modules for the last year (Bio::DB::EUtilities) which interact with all the various eutils for building data pipelines which uses the NCBI CGI interface. You could possibly retrieve all relevant ESTs using a variation of the example script here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch Note that the code examples do NOT work with rel. 1.5.2 code as the API has changed quite a bit; I'm working to rectify some of that. The script I would use is below. It retrieves batches of 500 sequences (in fasta format) at a time, for a total of 10000 max seq records, saving the raw record data directly to a file (appending as you go along). I added an eval block to check the server status and redo the call up to 4 times before giving up completely. Using eval this way hasn't been extensively tested but should work. 
---------------------------------------
use strict;
use warnings;
use Bio::DB::EUtilities;

# esearch for all nucest records for taxon 3702, keeping the result
# set on the NCBI history server
my $factory = Bio::DB::EUtilities->new(-eutil          => 'esearch',
                                       -db             => 'nucest',
                                       -term           => 'txid3702',
                                       -usehistory     => 'y',
                                       -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
    print "History returned\n";
    # note db carries over from above
    $factory->set_parameters(-eutil   => 'efetch',
                             -rettype => 'fasta',
                             -history => $hist);

    my ($retmax, $retstart) = (500, 0);
    my $retry = 1;
    my $maxcount = $count < 10000 ? $count : 10000; # set max # seq records to return

    RETRIEVE_SEQS:
    while ($retstart < $maxcount) {
        print "Returning from ", $retstart + 1, " to ", $retstart + $retmax, "\n";
        $factory->set_parameters(-retmax   => $retmax,
                                 -retstart => $retstart);

        # check in case of server error; append raw records to the file
        eval {
            $factory->get_Response(-file => ">>ESTs.fas");
        };
        if ($@) {
            # give up once four redo attempts have been used
            die "Server error: $@. Try again later" if $retry == 5;
            print STDERR "Server error, redo #$retry\n";
            $retry++ && redo RETRIEVE_SEQS;
        }
        $retstart += $retmax;
    }
}
---------------------------------------

chris

On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:

> To download genomic sequences or ESTs for any organism (in various
> formats) you can use NCBI Taxonomy Browser:
> http://www.ncbi.nlm.nih.gov/Taxonomy/
>
> you can use taxonomy id to access different organisms, Arabidopsis for
> example (3702):
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702
>
> or by direct web link:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?
> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 > > assembled genomes can be accessed via ftp: > ftp://ftp.ncbi.nih.gov/genomes/ > > To download large amount of selected sequences (ESTs for example) you > can use batch Entrez: > http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html > http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide > (select EST for EST, it's critical) > > It seems, to solve the problem you describe, you don't need to use > bioperl. NCBI GenBank Entrez provides all necessary tools to work on > these simple and frequent tasks. > > -Alex > > -- > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 East Health Sciences Drive > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email#1: akozik at atgc.org > email#2: akozik at gmail.com > web: http://www.atgc.org/ > > > > Xing Hu wrote: >> Hi friends, >> >> I wrote a script for getting genomic sequence file from >> GenBank. To >> fulfill that target, I used DB::GenBank module to get the sequence >> via >> get_Seq_by_acc, and it works well. But this time, facing enormous >> amount >> of ESTs, I have no idea how to download them swiftly and elegantly. >> >> PROBLEM DESCRIPTION: >> goal: download all EST files of a specific species from >> GenBank, say >> Arabidopsis Thaliana or Oryza sativa(rice). >> other: whether all of ESTs are in a single file or separatedly >> placed does not matter. >> >> Can I use a bioperl script to achieve that? And How? I really >> appreciate. >> >> Xing. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Mon Jul 9 14:08:07 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 9 Jul 2007 11:08:07 -0700 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> I don't think there is a function for this yet but it would be a good one to have. I assume you don't really want to take a shot at writing it though? To make this work I think you have to create a new node which contains the trifurcation and this node is what the root is set to. -jason On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From lstein at cshl.edu Mon Jul 9 17:35:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 9 Jul 2007 17:35:49 -0400 Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com> Hi Folks, Sorry for the job spam. We're looking for a manager of the Cold Spring Harbor Laboratory bioinformatics core facility. 
This is a semi-independent staff position supporting CSHL scientific researchers by providing consultation, data mining and software development activities. You will have a software staff of two, a nice salary, good health benefits, and an exciting and dynamic environment to work in. I'm looking for someone with a strong bioinformatics background, at least five years' experience programming Perl, Java or Python in an academic or commercial environment, and management experience. If you are interested, please send your CV and cover letter to me. Thanks, Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From stewarta at nmrc.navy.mil Mon Jul 9 18:16:12 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Mon, 9 Jul 2007 18:16:12 -0400 Subject: [Bioperl-l] rpsblast Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil> When I run... $result = $factory->rpsblast($seq); ... where $seq is a Bio::Seq object, it seems to simply copy the $seq object to $result. When I run something similar... $rpsblast('/path/to/myFile'); ... the value of $result then becomes '/path/to/myFile'. Has anyone else encountered this? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From jason_stajich at berkeley.edu Mon Jul 9 21:36:10 2007 From: jason_stajich at berkeley.edu (Jason Stajich) Date: Mon, 9 Jul 2007 18:36:10 -0700 Subject: [Bioperl-l] BOSC2007 Message-ID: I posted a quick note about meeting up at BOSC/ISMB this year. If you are attending, please sign your name on the page or at least say whether you are interested in a BoF. 
We'll try to discuss some of the current topics in BioPerl development, as well as use the time to coordinate any development that benefits from face-to-face time. http://bioperl.org/wiki/BOSC2007_Meetup http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/ -jason -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From schlesi at ebi.ac.uk Tue Jul 10 08:58:00 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Tue, 10 Jul 2007 13:58:00 +0100 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com> Hi, > I don't think there is a function for this yet but it would be a good one > to have. > I assume you don't really want to take a shot at writing it though? > To make this work I think you have to create a new node which contains the > trifurcation and this node is what the root is set to. Creating a new root is fine, but what would the (3) children of that node be? I took a different approach now, where I iterate over all (indirect) descendents of the root, find the first one which does not have the root as its direct ancestor and move it up the tree, i.e.:

foreach my $d ($root->get_all_Descendents){
    if ($d->ancestor != $root){
        $d->ancestor->remove_Descendent($d);
        if ($root->add_Descendent($d, 1) == 3){
            last;
        }
    }
}

This will make the old root a trifurcation. It does the right thing for what I am trying to do, but I don't believe it is general (for example, it currently ignores branch lengths). Also, instead of taking the first such descendent, moving the most distant possible subtree of a clade up to the root might be better. 
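The branch-length caveat can be illustrated with a minimal sketch of the usual way to unroot a bifurcating root: dissolve one internal child of the root, promote its children to the root, and add its branch length to the other root child. The sketch deliberately uses a plain nested-hash tree rather than Bio::Tree::Node objects; the `name`, `len`, and `kids` keys and the `unroot` and `to_newick` helpers are hypothetical names chosen for illustration, and porting the idea to the BioPerl node API is left open.

```perl
use strict;
use warnings;

# Hypothetical plain-hash tree (NOT Bio::Tree::Node): each node is
# { name => ..., len => branch length to its parent, kids => [ ... ] }.

# Turn a bifurcating root into a trifurcation, preserving path lengths.
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{kids} };
    return $root unless @kids == 2;      # only a bifurcating root needs work

    # Pick a root child that is itself internal; it will be dissolved.
    my ($inner) = grep { @{ $_->{kids} } } @kids;
    return $root unless $inner;          # two leaves: nothing to unroot

    my ($other) = grep { $_ != $inner } @kids;

    # The two root branches merge into one edge, so their lengths are summed.
    $other->{len} += $inner->{len};

    # Promote the dissolved node's children to the root.
    $root->{kids} = [ @{ $inner->{kids} }, $other ];
    return $root;
}

# Serialize the hash tree as a newick string (without the trailing ';').
sub to_newick {
    my ($n) = @_;
    my $s = @{ $n->{kids} }
        ? '(' . join(',', map { to_newick($_) } @{ $n->{kids} }) . ')'
        : $n->{name};
    $s .= ":$n->{len}" if defined $n->{len};
    return $s;
}

# ((A:1,B:2):3,C:4); has a bifurcation at the root.
my $tree = {
    kids => [
        { len => 3, kids => [
            { name => 'A', len => 1, kids => [] },
            { name => 'B', len => 2, kids => [] },
        ] },
        { name => 'C', len => 4, kids => [] },
    ],
};

print to_newick( unroot($tree) ), ";\n";   # prints (A:1,B:2,C:7);
```

The distance between any two leaves is unchanged (A to C is 1 + 3 + 4 = 8 before and 1 + 7 = 8 after), which is the property that simply moving a grandchild up to the root does not guarantee.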
Felix > On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From xing.y.hu at gmail.com Tue Jul 10 09:29:36 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Tue, 10 Jul 2007 21:29:36 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Message-ID: <469389C0.5060303@gmail.com> Thank you, guys. I have to confess how stupid I was. The easiest way seems to be the NCBI Taxonomy Browser route that Alex suggested. As a matter of fact, I knew about it, but I thought all items had to be selected before pressing save to launch the download, so I was desperate to find a button that would do that without hundreds of thousands of clicks from me. "What about selecting none of the items at all?" -- this idea finally came to me after days of struggling, and the problem was solved. Xing Chris Fields wrote: > Caveat: if you have millions of ESTs please consider NOT using my > eutil script below or NCBI Batch Entrez, which would repeatedly hit > the NCBI server thousands of times. 
At least try looking for other > ways to retrieve the data you want (ftp, organism-specific resources > like Ensembl, so on), or run any scripts or data retrieval in off > hours so you don't overtax the NCBI server. > > There is a way you can use BioPerl if you don't mind living on the > bleeding edge by using bioperl-live (core code from CVS). I have been > working on a set of modules for the last year (Bio::DB::EUtilities) > which interact with all the various eutils for building data pipelines > which uses the NCBI CGI interface. You could possibly retrieve all > relevant ESTs using a variation of the example script here: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch > > Note that the code examples do NOT work with rel. 1.5.2 code as the > API has changed quite a bit; I'm working to rectify some of that. > > The script I would use is below. It retrieves batches of 500 > sequences (in fasta format) at a time, for a total of 10000 max seq > records, saving the raw record data directly to a file (appending as > you go along). I added an eval block to check the server status and > redo the call up to 4 times before giving up completely. Using eval > this way hasn't been extensively tested but should work. > > --------------------------------------- > > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'nucest', > -term => 'txid3702', > -usehistory => 'y', > -keep_histories => 1); > > my $count = $factory->get_count; > > print "Count: $count\n"; > > if (my $hist = $factory->next_History) { > print "History returned\n"; > # note db carries over from above > $factory->set_parameters(-eutil => 'efetch', > -rettype => 'fasta', > -history => $hist); > my ($retmax, $retstart) = (500,0); > my $retry = 1; > my $maxcount = $count < 10000 ? 
$count : 10000; # set max # seq > records to return > RETRIEVE_SEQS: > while ($retstart < $maxcount) { > print "Returning from ",$retstart+1," to > ",$retstart+$retmax,"\n"; > $factory->set_parameters(-retmax => $retmax, > -retstart => $retstart); > # check in case of server error > eval{ > $factory->get_Response(-file => ">>ESTs.fas"); > }; > if ($@) { > die "Server error: $@. Try again later" if $retry == 5; > print STDERR "Server error, redo #$retry\n"; > $retry++ && redo RETRIEVE_SEQS; > } > $retstart += $retmax; > } > } > > > --------------------------------------- > > > chris > > On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: > >> To download genomic sequences or ESTs for any organism (in various >> formats) you can use NCBI Taxonomy Browser: >> http://www.ncbi.nlm.nih.gov/Taxonomy/ >> >> you can use taxonomy id to access different organisms, Arabidopsis for >> example (3702): >> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >> >> >> or by direct web link: >> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >> >> >> assembled genomes can be accessed via ftp: >> ftp://ftp.ncbi.nih.gov/genomes/ >> >> To download large amount of selected sequences (ESTs for example) you >> can use batch Entrez: >> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >> (select EST for EST, it's critical) >> >> It seems, to solve the problem you describe, you don't need to use >> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >> these simple and frequent tasks. 
>> >> -Alex >> >> --Alexander Kozik >> Bioinformatics Specialist >> Genome and Biomedical Sciences Facility >> 451 East Health Sciences Drive >> University of California >> Davis, CA 95616-8816 >> Phone: (530) 754-9127 >> email#1: akozik at atgc.org >> email#2: akozik at gmail.com >> web: http://www.atgc.org/ >> >> >> >> Xing Hu wrote: >>> Hi friends, >>> >>> I wrote a script for getting genomic sequence file from GenBank. To >>> fulfill that target, I used DB::GenBank module to get the sequence via >>> get_Seq_by_acc, and it works well. But this time, facing enormous >>> amount >>> of ESTs, I have no idea how to download them swiftly and elegantly. >>> >>> PROBLEM DESCRIPTION: >>> goal: download all EST files of a specific species from GenBank, >>> say >>> Arabidopsis Thaliana or Oryza sativa(rice). >>> other: whether all of ESTs are in a single file or separatedly >>> placed does not matter. >>> >>> Can I use a bioperl script to achieve that? And How? I really >>> appreciate. >>> >>> Xing. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From davila at ioc.fiocruz.br Tue Jul 10 09:58:29 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue, 10 Jul 2007 10:58:29 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <469389C0.5060303@gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> Message-ID: <46939085.40906@ioc.fiocruz.br> Hi Xing, Unfortunately that did not work for me... 
there are 5133 T. brucei ESTs (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) and 13971 from T. cruzi (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) that I cannot download at once in GenBank format... even when I select "GenBank" format in the Display menu I can only see and download 500 ESTs at a time... I also downloaded all ESTs from GenBank (a pity there are no subsets of them!) but merging them all generates a file bigger than 120GB to process... I just asked Diogo (my student) to give the script sent by Chris Fields a try... so fingers crossed ;-) Cheers, Alberto Xing Hu wrote: > Thanks you guys. > > I had to confess that how stupid I was. The easiest way seems to be the > way using NCBI Taxonomy Browser which suggested by alex. As a matter of > fact, I knew that but I thought it was necessary to have all items > selected before pressing save to launch download. So I was desperate to > find a button that could achieve that without hundreds of thousands of > clicking by me. "What about select none of those items at all?" -- This > idea finally came to me after days of struggling and the problem was solved. > > Xing > > > > Chris Fields wrote: >> Caveat: if you have millions of ESTs please consider NOT using my >> eutil script below or NCBI Batch Entrez, which would repeatedly hit >> the NCBI server thousands of times. At least try looking for other >> ways to retrieve the data you want (ftp, organism-specific resources >> like Ensembl, so on), or run any scripts or data retrieval in off >> hours so you don't overtax the NCBI server. >> >> There is a way you can use BioPerl if you don't mind living on the >> bleeding edge by using bioperl-live (core code from CVS). 
I have been >> working on a set of modules for the last year (Bio::DB::EUtilities) >> which interact with all the various eutils for building data pipelines >> which uses the NCBI CGI interface. You could possibly retrieve all >> relevant ESTs using a variation of the example script here: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch >> >> Note that the code examples do NOT work with rel. 1.5.2 code as the >> API has changed quite a bit; I'm working to rectify some of that. >> >> The script I would use is below. It retrieves batches of 500 >> sequences (in fasta format) at a time, for a total of 10000 max seq >> records, saving the raw record data directly to a file (appending as >> you go along). I added an eval block to check the server status and >> redo the call up to 4 times before giving up completely. Using eval >> this way hasn't been extensively tested but should work. >> >> --------------------------------------- >> >> use Bio::DB::EUtilities; >> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'nucest', >> -term => 'txid3702', >> -usehistory => 'y', >> -keep_histories => 1); >> >> my $count = $factory->get_count; >> >> print "Count: $count\n"; >> >> if (my $hist = $factory->next_History) { >> print "History returned\n"; >> # note db carries over from above >> $factory->set_parameters(-eutil => 'efetch', >> -rettype => 'fasta', >> -history => $hist); >> my ($retmax, $retstart) = (500,0); >> my $retry = 1; >> my $maxcount = $count < 10000 ? $count : 10000; # set max # seq >> records to return >> RETRIEVE_SEQS: >> while ($retstart < $maxcount) { >> print "Returning from ",$retstart+1," to >> ",$retstart+$retmax,"\n"; >> $factory->set_parameters(-retmax => $retmax, >> -retstart => $retstart); >> # check in case of server error >> eval{ >> $factory->get_Response(-file => ">>ESTs.fas"); >> }; >> if ($@) { >> die "Server error: $@. 
Try again later" if $retry == 5; >> print STDERR "Server error, redo #$retry\n"; >> $retry++ && redo RETRIEVE_SEQS; >> } >> $retstart += $retmax; >> } >> } >> >> >> --------------------------------------- >> >> >> chris >> >> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: >> >>> To download genomic sequences or ESTs for any organism (in various >>> formats) you can use NCBI Taxonomy Browser: >>> http://www.ncbi.nlm.nih.gov/Taxonomy/ >>> >>> you can use taxonomy id to access different organisms, Arabidopsis for >>> example (3702): >>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >>> >>> >>> or by direct web link: >>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >>> >>> >>> assembled genomes can be accessed via ftp: >>> ftp://ftp.ncbi.nih.gov/genomes/ >>> >>> To download large amount of selected sequences (ESTs for example) you >>> can use batch Entrez: >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >>> (select EST for EST, it's critical) >>> >>> It seems, to solve the problem you describe, you don't need to use >>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >>> these simple and frequent tasks. >>> >>> -Alex >>> >>> --Alexander Kozik >>> Bioinformatics Specialist >>> Genome and Biomedical Sciences Facility >>> 451 East Health Sciences Drive >>> University of California >>> Davis, CA 95616-8816 >>> Phone: (530) 754-9127 >>> email#1: akozik at atgc.org >>> email#2: akozik at gmail.com >>> web: http://www.atgc.org/ >>> >>> >>> >>> Xing Hu wrote: >>>> Hi friends, >>>> >>>> I wrote a script for getting genomic sequence file from GenBank. To >>>> fulfill that target, I used DB::GenBank module to get the sequence via >>>> get_Seq_by_acc, and it works well. 
But this time, facing enormous >>>> amount >>>> of ESTs, I have no idea how to download them swiftly and elegantly. >>>> >>>> PROBLEM DESCRIPTION: >>>> goal: download all EST files of a specific species from GenBank, >>>> say >>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>> other: whether all of ESTs are in a single file or separatedly >>>> placed does not matter. >>>> >>>> Can I use a bioperl script to achieve that? And How? I really >>>> appreciate. >>>> >>>> Xing. >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> From cjfields at uiuc.edu Tue Jul 10 10:05:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 09:05:43 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <46939085.40906@ioc.fiocruz.br> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> Just make sure you're using the latest from CVS. Let me know if it doesn't work and I'll look into it. chris On Jul 10, 2007, at 8:58 AM, Alberto Davila wrote: > Hi Xing, > > Unfortunately that did not work for me... there are 5133 T. brucei > ESTs > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691 > [Organism:exp]&cmd=Search&db=nucest&QueryKey=8) > and 13971 from T. 
cruzi > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693 > [Organism:exp]&cmd=Search&db=nucest&QueryKey=11) > that I cannot download at once in GenBank format... even when I > select > "GenBank" format in the Display menu I can only see and get/ > download 500 > ESTs each time... > > I also downloaded all ESTs from GenBank (a pity there are not > subsets of > them !) but merging all them generate a file bigger than 120GB to be > processed... > > Just asked Diogo (my student) to give a try to the script sent by > Chris > Fields.. so finger crossed ;-) > > Cheers, Alberto > > > Xing Hu wrote: >> Thanks you guys. >> >> I had to confess that how stupid I was. The easiest way seems to >> be the >> way using NCBI Taxonomy Browser which suggested by alex. As a >> matter of >> fact, I knew that but I thought it was necessary to have all items >> selected before pressing save to launch download. So I was >> desperate to >> find a button that could achieve that without hundreds of >> thousands of >> clicking by me. "What about select none of those items at all?" -- >> This >> idea finally came to me after days of struggling and the problem >> was solved. >> >> Xing >> >> >> >> Chris Fields wrote: >>> Caveat: if you have millions of ESTs please consider NOT using my >>> eutil script below or NCBI Batch Entrez, which would repeatedly hit >>> the NCBI server thousands of times. At least try looking for other >>> ways to retrieve the data you want (ftp, organism-specific resources >>> like Ensembl, so on), or run any scripts or data retrieval in off >>> hours so you don't overtax the NCBI server. >>> >>> There is a way you can use BioPerl if you don't mind living on the >>> bleeding edge by using bioperl-live (core code from CVS). I have >>> been >>> working on a set of modules for the last year (Bio::DB::EUtilities) >>> which interact with all the various eutils for building data >>> pipelines >>> which uses the NCBI CGI interface. 
You could possibly retrieve all >>> relevant ESTs using a variation of the example script here: >>> >>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-. >>> 3Eefetch >>> >>> Note that the code examples do NOT work with rel. 1.5.2 code as the >>> API has changed quite a bit; I'm working to rectify some of that. >>> >>> The script I would use is below. It retrieves batches of 500 >>> sequences (in fasta format) at a time, for a total of 10000 max seq >>> records, saving the raw record data directly to a file (appending as >>> you go along). I added an eval block to check the server status and >>> redo the call up to 4 times before giving up completely. Using eval >>> this way hasn't been extensively tested but should work. >>> >>> --------------------------------------- >>> >>> use Bio::DB::EUtilities; >>> >>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', >>> -db => 'nucest', >>> -term => 'txid3702', >>> -usehistory => 'y', >>> -keep_histories => 1); >>> >>> my $count = $factory->get_count; >>> >>> print "Count: $count\n"; >>> >>> if (my $hist = $factory->next_History) { >>> print "History returned\n"; >>> # note db carries over from above >>> $factory->set_parameters(-eutil => 'efetch', >>> -rettype => 'fasta', >>> -history => $hist); >>> my ($retmax, $retstart) = (500,0); >>> my $retry = 1; >>> my $maxcount = $count < 10000 ? $count : 10000; # set max # seq >>> records to return >>> RETRIEVE_SEQS: >>> while ($retstart < $maxcount) { >>> print "Returning from ",$retstart+1," to >>> ",$retstart+$retmax,"\n"; >>> $factory->set_parameters(-retmax => $retmax, >>> -retstart => $retstart); >>> # check in case of server error >>> eval{ >>> $factory->get_Response(-file => ">>ESTs.fas"); >>> }; >>> if ($@) { >>> die "Server error: $@. 
Try again later" if $retry == 5; >>> print STDERR "Server error, redo #$retry\n"; >>> $retry++ && redo RETRIEVE_SEQS; >>> } >>> $retstart += $retmax; >>> } >>> } >>> >>> >>> --------------------------------------- >>> >>> >>> chris >>> >>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: >>> >>>> To download genomic sequences or ESTs for any organism (in various >>>> formats) you can use NCBI Taxonomy Browser: >>>> http://www.ncbi.nlm.nih.gov/Taxonomy/ >>>> >>>> you can use taxonomy id to access different organisms, >>>> Arabidopsis for >>>> example (3702): >>>> http://www.ncbi.nlm.nih.gov/sites/entrez? >>>> db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >>>> >>>> >>>> or by direct web link: >>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi? >>>> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >>>> >>>> >>>> assembled genomes can be accessed via ftp: >>>> ftp://ftp.ncbi.nih.gov/genomes/ >>>> >>>> To download large amount of selected sequences (ESTs for >>>> example) you >>>> can use batch Entrez: >>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >>>> (select EST for EST, it's critical) >>>> >>>> It seems, to solve the problem you describe, you don't need to use >>>> bioperl. NCBI GenBank Entrez provides all necessary tools to >>>> work on >>>> these simple and frequent tasks. >>>> >>>> -Alex >>>> >>>> --Alexander Kozik >>>> Bioinformatics Specialist >>>> Genome and Biomedical Sciences Facility >>>> 451 East Health Sciences Drive >>>> University of California >>>> Davis, CA 95616-8816 >>>> Phone: (530) 754-9127 >>>> email#1: akozik at atgc.org >>>> email#2: akozik at gmail.com >>>> web: http://www.atgc.org/ >>>> >>>> >>>> >>>> Xing Hu wrote: >>>>> Hi friends, >>>>> >>>>> I wrote a script for getting genomic sequence file from >>>>> GenBank. 
To >>>>> fulfill that target, I used DB::GenBank module to get the >>>>> sequence via >>>>> get_Seq_by_acc, and it works well. But this time, facing enormous >>>>> amount >>>>> of ESTs, I have no idea how to download them swiftly and >>>>> elegantly. >>>>> >>>>> PROBLEM DESCRIPTION: >>>>> goal: download all EST files of a specific species from >>>>> GenBank, >>>>> say >>>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>>> other: whether all of ESTs are in a single file or separatedly >>>>> placed does not matter. >>>>> >>>>> Can I use a bioperl script to achieve that? And How? I really >>>>> appreciate. >>>>> >>>>> Xing. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From diogoat at gmail.com Tue Jul 10 10:15:20 2007 From: diogoat at gmail.com (Diogo Tschoeke) Date: Tue, 10 Jul 2007 11:15:20 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> Dear all, I use the script below, and it works very well! I only changed the query, and the script gave me the 5133 ESTs from T. brucei.

#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# all EST-division records for T. brucei
my $query = Bio::DB::Query::GenBank->new
    (-query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
     -db    => 'nucleotide');
my $gb = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# note: records are written in GenBank format despite the .fasta file name
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq){
    $out->write_seq($seq);
}
####################################################################

Diogo Tschoeke/Fiocruz (Alberto's student) From cjfields at uiuc.edu Tue Jul 10 10:35:03 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 09:35:03 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com> Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu> That will work as well; the key difference between my example and this one is that the seq stream retrieved using Bio::DB::GenBank passes through Bio::SeqIO while Bio::DB::EUtilities saves the raw seq record directly to a file (or callback or HTTP::Response) for optionally parsing later. If you have problems with Bio::SeqIO you can always use Bio::DB::EUtilities to get around the issue until we resolve it. chris On Jul 10, 2007, at 9:15 AM, Diogo Tschoeke wrote: > Deal All, > I use this script bellow, and it`s work very fine! > I only changed the query! And the script gave me the 5133 EST from T. > brucei. > > ###################################################################### > ########### > use strict; > use warnings; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > my $query = Bio::DB::Query::GenBank->new > (-query =>'gbdiv est[prop] AND > Trypanosoma > brucei [organism]', > db => 'nucleotide'); > my $gb = new Bio::DB::GenBank; > my $seqio = $gb->get_Stream_by_query($query); > > my $out = Bio::SeqIO->new(-format => 'Genbank', > -file => '>>Tbrucei.EST.fasta'); > while (my $seq = $seqio->next_seq){ > $out->write_seq($seq); > } > #################################################################### > > Diogo Tschoeke/Fiocruz (Alberto`s Student) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hartzell at alerce.com Tue Jul 10 12:50:31 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 12:50:31 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <18067.47319.254632.538811@almost.alerce.com> Jason Stajich writes: > [...] > Do you know how to have svn commit messages generate summary emails > as well? I've made a local installation of the SVN::Notify bits in my home directory and set up its notification script. If folks are happy with it then I'll work on getting The Powers That Be to do a real install and we'll use it for the real repository. It's currently configured to include diffs inline in the message. I prefer them as an attachment, but the current configuration of the bioperl-guts-l list stalls messages w/ attachments and requires admin intervention. I have a support@ request going on it and will change it if/when we get the issue resolved. So, to review: svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ is the top of the repository and svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk will get you the main branch of bioperl-live. Remember that the repository is transient, don't put anything important in there.... Have at it, but remember that the entire world will see your commit messages. g. From xing.y.hu at gmail.com Tue Jul 10 13:08:35 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Wed, 11 Jul 2007 01:08:35 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <46939085.40906@ioc.fiocruz.br> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> Message-ID: <4693BD13.2070509@gmail.com> Hi Alberto, Yes, I know that there is only choice for showing no more than 500 entries on the NCBI website. However, I completely ignored that (doesn't mean that I have not seen that), and pulled down the "send to" and chose "file". Then a small window popped up, after saying yes to that, the downloading started. You might ask me how I know that it was not a batch of only 5 (default selection) or 500 ESTs? To be honest, I don't know at the first time. But the download has accumulated to millions bytes since then(due to my bad network condition, I have no idea when it will reach the end), and that doesn't look like a little batch of ESTs less than one thousand. Actually, I wrote a script to count the sequences within the temporary file and got a number much bigger than ten thousand. So I guess it works. BTW, I never thought Bio::DB::Genbank can do that! Again, thanks you guys! Xing Alberto Davila wrote: > Hi Xing, > > Unfortunately that did not work for me... there are 5133 T. brucei ESTs > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) > and 13971 from T. cruzi > (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) > that I cannot download at once in GenBank format... even when I select > "GenBank" format in the Display menu I can only see and get/download 500 > ESTs each time... > > I also downloaded all ESTs from GenBank (a pity there are not subsets of > them !) but merging all them generate a file bigger than 120GB to be > processed... > > Just asked Diogo (my student) to give a try to the script sent by Chris > Fields.. 
so finger crossed ;-) > > Cheers, Alberto > > > Xing Hu wrote: > >> Thanks you guys. >> >> I had to confess that how stupid I was. The easiest way seems to be the >> way using NCBI Taxonomy Browser which suggested by alex. As a matter of >> fact, I knew that but I thought it was necessary to have all items >> selected before pressing save to launch download. So I was desperate to >> find a button that could achieve that without hundreds of thousands of >> clicking by me. "What about select none of those items at all?" -- This >> idea finally came to me after days of struggling and the problem was solved. >> >> Xing >> >> >> >> Chris Fields wrote: >> >>> Caveat: if you have millions of ESTs please consider NOT using my >>> eutil script below or NCBI Batch Entrez, which would repeatedly hit >>> the NCBI server thousands of times. At least try looking for other >>> ways to retrieve the data you want (ftp, organism-specific resources >>> like Ensembl, so on), or run any scripts or data retrieval in off >>> hours so you don't overtax the NCBI server. >>> >>> There is a way you can use BioPerl if you don't mind living on the >>> bleeding edge by using bioperl-live (core code from CVS). I have been >>> working on a set of modules for the last year (Bio::DB::EUtilities) >>> which interact with all the various eutils for building data pipelines >>> which uses the NCBI CGI interface. You could possibly retrieve all >>> relevant ESTs using a variation of the example script here: >>> >>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch >>> >>> Note that the code examples do NOT work with rel. 1.5.2 code as the >>> API has changed quite a bit; I'm working to rectify some of that. >>> >>> The script I would use is below. It retrieves batches of 500 >>> sequences (in fasta format) at a time, for a total of 10000 max seq >>> records, saving the raw record data directly to a file (appending as >>> you go along). 
I added an eval block to check the server status and
>>> redo the call up to 4 times before giving up completely. Using eval
>>> this way hasn't been extensively tested but should work.
>>>
>>> ---------------------------------------
>>>
>>> use Bio::DB::EUtilities;
>>>
>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                        -db => 'nucest',
>>>                                        -term => 'txid3702',
>>>                                        -usehistory => 'y',
>>>                                        -keep_histories => 1);
>>>
>>> my $count = $factory->get_count;
>>>
>>> print "Count: $count\n";
>>>
>>> if (my $hist = $factory->next_History) {
>>>     print "History returned\n";
>>>     # note db carries over from above
>>>     $factory->set_parameters(-eutil => 'efetch',
>>>                              -rettype => 'fasta',
>>>                              -history => $hist);
>>>     my ($retmax, $retstart) = (500,0);
>>>     my $retry = 1;
>>>     # set max # seq records to return
>>>     my $maxcount = $count < 10000 ? $count : 10000;
>>>     RETRIEVE_SEQS:
>>>     while ($retstart < $maxcount) {
>>>         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
>>>         $factory->set_parameters(-retmax => $retmax,
>>>                                  -retstart => $retstart);
>>>         # check in case of server error
>>>         eval{
>>>             $factory->get_Response(-file => ">>ESTs.fas");
>>>         };
>>>         if ($@) {
>>>             die "Server error: $@. Try again later" if $retry == 5;
>>>             print STDERR "Server error, redo #$retry\n";
>>>             $retry++ && redo RETRIEVE_SEQS;
>>>         }
>>>         $retstart += $retmax;
>>>     }
>>> }
>>>
>>> ---------------------------------------
>>>
>>> chris
>>>
>>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>>>
>>>> To download genomic sequences or ESTs for any organism (in various
>>>> formats) you can use NCBI Taxonomy Browser:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>>>
>>>> you can use taxonomy id to access different organisms, Arabidopsis for
>>>> example (3702):
>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702
>>>>
>>>> or by direct web link:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1
>>>>
>>>> assembled genomes can be accessed via ftp:
>>>> ftp://ftp.ncbi.nih.gov/genomes/
>>>>
>>>> To download large amount of selected sequences (ESTs for example) you
>>>> can use batch Entrez:
>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>>>> (select EST for EST, it's critical)
>>>>
>>>> It seems, to solve the problem you describe, you don't need to use
>>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>>>> these simple and frequent tasks.
>>>>
>>>> -Alex
>>>>
>>>> --Alexander Kozik
>>>> Bioinformatics Specialist
>>>> Genome and Biomedical Sciences Facility
>>>> 451 East Health Sciences Drive
>>>> University of California
>>>> Davis, CA 95616-8816
>>>> Phone: (530) 754-9127
>>>> email#1: akozik at atgc.org
>>>> email#2: akozik at gmail.com
>>>> web: http://www.atgc.org/
>>>>
>>>> Xing Hu wrote:
>>>>
>>>>> Hi friends,
>>>>>
>>>>> I wrote a script for getting genomic sequence file from GenBank. To
>>>>> fulfill that target, I used DB::GenBank module to get the sequence via
>>>>> get_Seq_by_acc, and it works well.
But this time, facing enormous >>>>> amount >>>>> of ESTs, I have no idea how to download them swiftly and elegantly. >>>>> >>>>> PROBLEM DESCRIPTION: >>>>> goal: download all EST files of a specific species from GenBank, >>>>> say >>>>> Arabidopsis Thaliana or Oryza sativa(rice). >>>>> other: whether all of ESTs are in a single file or separatedly >>>>> placed does not matter. >>>>> >>>>> Can I use a bioperl script to achieve that? And How? I really >>>>> appreciate. >>>>> >>>>> Xing. >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. Robert Switzer >>> Dept of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bix at sendu.me.uk Tue Jul 10 13:14:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 10 Jul 2007 18:14:29 +0100 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> Message-ID: <4693BE75.4090005@sendu.me.uk> George Hartzell wrote: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. 
> > It's currently configured to include diffs inline in the message. I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. Can I put a vote in that you don't? I search through email body text in my archive of guts to find certain diffs, so really like the diffs inline. Also, is there any way to get rid of the 'bioperl' in [bioperl revision] in the subject? Seems redundant and makes it harder to see what was changed in a small email client window. From aaron.j.mackey at gsk.com Tue Jul 10 13:20:15 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 10 Jul 2007 13:20:15 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> Message-ID: George, this is all very nice to finally have, thank you for your efforts! Any chance that the diff-as-attachment vs. diffs-inline question can be different for each subscriber? The utility of the "guts" mailing list (to me) is that it's an encyclopedia of browsable, skimmable, and searchable diffs, not just a date-stamped record of diffs (if so, why provide an attachment at all, just provide a URL to the diff in the respository). Thanks again, -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. > > It's currently configured to include diffs inline in the message. 
I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. > > So, to review: > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ > > is the top of the repository and > > svn co svn+ssh://dev.open-bio. > org/home/hartzell/bioperl_take2/bioperl-live/trunk > > will get you the main branch of bioperl-live. > > Remember that the repository is transient, don't put anything > important in there.... > > Have at it, but remember that the entire world will see your commit > messages. > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Jul 10 14:18:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 13:18:07 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote: > George Hartzell wrote: >> Jason Stajich writes: >>> [...] >>> Do you know how to have svn commit messages generate summary emails >>> as well? >> >> I've made a local installation of the SVN::Notify bits in my home >> directory and set up its notification script. If folks are happy >> with >> it then I'll work on getting The Powers That Be to do a real install >> and we'll use it for the real repository. >> >> It's currently configured to include diffs inline in the message. I >> prefer them as an attachment, but the current configuration of the >> bioperl-guts-l list stalls messages w/ attachments and requires admin >> intervention. 
I have a support@ request going on it and will change >> it if/when we get the issue resolved. > > Can I put a vote in that you don't? I search through email body > text in > my archive of guts to find certain diffs, so really like the diffs > inline. > > Also, is there any way to get rid of the 'bioperl' in [bioperl > revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Agree on both counts; the devs have gotten used to seeing the diffs inline. We prob. need to schedule a specific day/time when the switchover would take place so we can announce (so everyone knows and no one can gripe). Did we ever resolve the svn->cvs issue? Jason pointed out some tools a while ago... chris From hartzell at alerce.com Tue Jul 10 16:09:09 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:09:09 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59237.519166.454578@almost.alerce.com> Sendu Bala writes: > George Hartzell wrote: > > Jason Stajich writes: > > > [...] > > > Do you know how to have svn commit messages generate summary emails > > > as well? > > > > I've made a local installation of the SVN::Notify bits in my home > > directory and set up its notification script. If folks are happy with > > it then I'll work on getting The Powers That Be to do a real install > > and we'll use it for the real repository. > > > > It's currently configured to include diffs inline in the message. I > > prefer them as an attachment, but the current configuration of the > > bioperl-guts-l list stalls messages w/ attachments and requires admin > > intervention. I have a support@ request going on it and will change > > it if/when we get the issue resolved. 
> > Can I put a vote in that you don't? I search through email body text in > my archive of guts to find certain diffs, so really like the diffs inline. Ok, three votes against attachments. Anyone want to vote in support, otherwise I'll just leave 'em inline. > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Sure. The default's just [RevisionNumber]. Does that work for folk? g. From hartzell at alerce.com Tue Jul 10 16:11:36 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:11:36 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59384.247108.463648@almost.alerce.com> Chris Fields writes: > [...] > We prob. need to schedule a specific day/time when the switchover > would take place so we can announce (so everyone knows and no one can > gripe). Did we ever resolve the svn->cvs issue? Jason pointed out > some tools a while ago... I haven't done anything about it. I think that we also need to have some input from the admin/support folk about access methods (https, etc...). Are we going to want to mirror the repository anywhere? g. From hartzell at alerce.com Wed Jul 11 09:17:08 2007 From: hartzell at alerce.com (George Hartzell) Date: Wed, 11 Jul 2007 09:17:08 -0400 Subject: [Bioperl-l] extra hook functionality for svn repos? Message-ID: <18068.55380.626778.486775@almost.alerce.com> There are a bunch of "contributed" hook scripts at http://subversion.tigris.org/tools_contrib.html#hook_scripts Given that many bioperl users depend on case-preserving but case-insensitive file systems, I'm wondering if hooking up the case-insensitive.py script might be worthwhile. 
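As a rough illustration of what such a check guards against (a minimal sketch only: this is not the contributed case-insensitive.py hook, and the helper name find_case_collisions and the example paths are made up for the purpose), the idea is to group paths by their lowercased form and flag any group with more than one member:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of svn or of the contributed hook script):
# given a list of repository paths, return a hash that maps each
# lowercased path to the distinct paths that differ only by case.
sub find_case_collisions {
    my @paths = @_;

    # Drop exact duplicates so a path never collides with itself.
    my %seen;
    @paths = grep { !$seen{$_}++ } @paths;

    # Bucket every path under its lowercased form.
    my %by_folded;
    push @{ $by_folded{ lc $_ } }, $_ for @paths;

    # Keep only the buckets that hold more than one spelling.
    my %collisions;
    for my $key (keys %by_folded) {
        my @group = @{ $by_folded{$key} };
        $collisions{$key} = [ sort @group ] if @group > 1;
    }
    return %collisions;
}

# Example paths, invented for illustration.
my %hits = find_case_collisions(
    'Bio/SeqIO/fasta.pm',
    'Bio/SeqIO/FASTA.pm',
    'Bio/AlignIO/clustalw.pm',
);

for my $key (sort keys %hits) {
    print "Case collision: @{ $hits{$key} }\n";
}
```

A real pre-commit hook would presumably take its path list from svnlook rather than a hard-coded list, and would exit non-zero to reject the commit when a collision turns up.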
Likewise, the check-mime-type.pl script might help us keep svn:mime-type and svn:eol-style properties up to date. There are others there, but none that I found interesting. How big-brother do we want the repository to be? g. From cjfields at uiuc.edu Wed Jul 11 09:40:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Jul 2007 08:40:54 -0500 Subject: [Bioperl-l] extra hook functionality for svn repos? In-Reply-To: <18068.55380.626778.486775@almost.alerce.com> References: <18068.55380.626778.486775@almost.alerce.com> Message-ID: On Jul 11, 2007, at 8:17 AM, George Hartzell wrote: > > There are a bunch of "contributed" hook scripts at > > http://subversion.tigris.org/tools_contrib.html#hook_scripts > > Given that many bioperl users depend on case-preserving but > case-insensitive file systems, I'm wondering if hooking up the > case-insensitive.py script might be worthwhile. I'm not sure how often we run into this, though. Anyone know? > Likewise, the check-mime-type.pl script might help us keep > svn:mime-type and svn:eol-style properties up to date. The latter two might be nice. I thought we planned on defaulting to a simple 'plain text' mime type on commits if it isn't specifically predefined, but maybe this way is better? > There are others there, but none that I found interesting. > > How big-brother do we want the repository to be? > > g. 'Friendly' big-brother, not 'dystopian' big-brother. chris From marian.thieme at lycos.de Wed Jul 11 05:05:18 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 09:05:18 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178019848@lycos-europe.com> An HTML attachment was scrubbed... 
URL: From dmessina at wustl.edu Wed Jul 11 16:14:17 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 11 Jul 2007 15:14:17 -0500 Subject: [Bioperl-l] submitting code In-Reply-To: <188661178019848@lycos-europe.com> References: <188661178019848@lycos-europe.com> Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu> Hi Marian, Thanks so much for contributing! The best way would be to create a Bugzilla ticket and then attach the code to that ticket. One of the developers will check it in and give you feedback if there are any little tweaks that would be helpful*. Would you be able to include documentation and test cases with your module? Dave * For more info: http://www.bioperl.org/wiki/FAQ#I. 27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F http://www.bioperl.org/wiki/Developer_Information http://www.bioperl.org/wiki/Becoming_a_developer http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html -- Dave Messina Senior Analyst, Assembly Group Genome Sequencing Center Washington University St. Louis, MO From marian.thieme at lycos.de Wed Jul 11 11:12:20 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 15:12:20 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178030343@lycos-europe.com> An HTML attachment was scrubbed... URL: From e-just at northwestern.edu Thu Jul 12 10:37:03 2007 From: e-just at northwestern.edu (Eric Just) Date: Thu, 12 Jul 2007 09:37:03 -0500 Subject: [Bioperl-l] Job opening in Chicago Message-ID: Hello everyone, We have an opening at dictyBase (Northwestern University in Chicago) for a Bioinformatics Software Engineer. This job involves writing and maintaining software for a genome database using Chado/OO-Perl/Bioperl and many other state of the art technologies. 
For more information please see: http://dictybase.org/dictybase_jobs.htm Thanks, Eric From cjfields at uiuc.edu Thu Jul 12 12:09:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Jul 2007 11:09:02 -0500 Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question Message-ID: I have been running into some GFF formatting issues where the attributes column is left undef (no '.'), which causes GFF3Loader::parse_attributes() to complain with an 'use of undefined string with split' warning. Would it be okay with the powers that be (Scott, Lincoln) to add a warning or exception there? I'm guessing a warning is better in this case, as just returning works fine. chris From jason at bioperl.org Fri Jul 13 13:30:05 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 13:30:05 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.59384.247108.463648@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> I'll try and look into this and other stuff with the migration in next week or so - maybe we'll make some time to talk it through during BOSC. I don't know yet when I'll actually have time to think about it properly. I am still worried about doing https because of the current system we have supporting user logins and that we didn't want to run a web server on the main repository machine and we'll have to install DAV on the main repository machine. if ssh+svn is going to be sufficient hurdle for people, note it was already a hurdle for them with CVS, but we'll have to think a bit more on it. We might be able to do some sort of NFS (or other exported FS) but exported to the webserver machine but that is may be a recipe for disaster. 
-jason On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > Chris Fields writes: >> [...] >> We prob. need to schedule a specific day/time when the switchover >> would take place so we can announce (so everyone knows and no one can >> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >> some tools a while ago... > > I haven't done anything about it. > > I think that we also need to have some input from the admin/support > folk about access methods (https, etc...). > > Are we going to want to mirror the repository anywhere? > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Fri Jul 13 14:29:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 13:29:22 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu> I don't think there's a huge rush on this since BOSC is imminent. If devs really want https then we can try adding it after migration, but if it becomes too much of a headache (particularly for the web admins) I wouldn't worry about it. chris On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. 
> > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > > We might be able to do some sort of NFS (or other exported FS) but > exported to the webserver machine but that is may be a recipe for > disaster. > > -jason > On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > >> Chris Fields writes: >>> [...] >>> We prob. need to schedule a specific day/time when the switchover >>> would take place so we can announce (so everyone knows and no one >>> can >>> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >>> some tools a while ago... >> >> I haven't done anything about it. >> >> I think that we also need to have some input from the admin/support >> folk about access methods (https, etc...). >> >> Are we going to want to mirror the repository anywhere? >> >> g. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From sheris at eps.berkeley.edu Fri Jul 13 14:42:32 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Fri, 13 Jul 2007 11:42:32 -0700
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
Message-ID: <200707131142.32366.sheris@eps.berkeley.edu>

Hi,
I have a collection of sequencing reads aligned with a consensus
sequence that I input into a Bio::PopGen::Population object in order to
calculate allele frequencies. The consensus sequence is included to
force clustalw to give a better alignment. However, I need to remove the
consensus sequence before calculating allele frequencies in the
individual reads. I'm having trouble with this part of it. I get the
following error message:

"Can't locate object method "person_id" via package "Bio::PopGen::Individual"
at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49."

Here is the code snippet producing the error. $pop is a
Bio::PopGen::Population object.

my @consensus = "gene_consensus";
$pop->remove_Individuals(@consensus);

I also tried:

my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus");
$pop->remove_Individuals(@consensus);

which produced the same error. Can anyone send me in the right
direction? I suspect this is a simple problem.

Sheri

--
Sheri Simmons
Department of Earth and Planetary Sciences
University of California, Berkeley
Berkeley, CA 94720-4767

From jason at bioperl.org Fri Jul 13 16:17:31 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 16:17:31 -0400
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu>
References: <200707131142.32366.sheris@eps.berkeley.edu>
Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org>

Hi Sheri -

Shoot - that was my fault - there was a bug in the code where I was only
using "Person", not Individuals, when I was testing.

I've committed a bugfix to CVS - do you need me to send you the updated
file, or are you comfortable grabbing the code from CVS or
http://code.open-bio.org ?

This is the change - you may have a different version of BioPerl than
what is in CVS, so you may have to make the change on line 260 rather
than 282 -- or you can upgrade to the latest code via CVS (although this
is probably harder for you since you've got stuff installed in
/usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
<         unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
>         unshift @tosplice, $i if( $namehash{$ind->unique_id} );

-jason

On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus
> sequence that I input into a Bio::PopGen::Population object in order
> to calculate allele frequencies. The consensus sequence is included
> to force clustalw to give a better alignment. However, I need to
> remove the consensus sequence before calculating allele frequencies
> in the individual reads. I'm having trouble with this part of it. I
> get the following error message:
>
> "Can't locate object method "person_id" via package
> "Bio::PopGen::Individual"
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> my @consensus = "gene_consensus";
> $pop->remove_Individuals(@consensus);
>
> I also tried:
> my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus");
> $pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right
> direction? I suspect this is a simple problem.
> > Sheri > > -- > Sheri Simmons > Department of Earth and Planetary Sciences > University of California, Berkeley > Berkeley, CA 94720-4767 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From hartzell at alerce.com Fri Jul 13 16:34:14 2007 From: hartzell at alerce.com (George Hartzell) Date: Fri, 13 Jul 2007 16:34:14 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <18071.57798.130368.703488@almost.alerce.com> Jason Stajich writes: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. > > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > [...] How are you thinking about providing anonymous readonly non-dev access to the repository? svn+ssh using an anonymous/guest account (can it be screwed down tightly enough?) svn-mirror the repo onto the public machine and do DAV there w/out having to worry about authenticating the devs? g. 
From jason at bioperl.org Fri Jul 13 17:33:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 17:33:29 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18071.57798.130368.703488@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> <18071.57798.130368.703488@almost.alerce.com> Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org> On Jul 13, 2007, at 4:34 PM, George Hartzell wrote: > Jason Stajich writes: >> I'll try and look into this and other stuff with the migration in >> next week or so - maybe we'll make some time to talk it through >> during BOSC. I don't know yet when I'll actually have time to think >> about it properly. >> >> I am still worried about doing https because of the current system we >> have supporting user logins and that we didn't want to run a web >> server on the main repository machine and we'll have to install DAV >> on the main repository machine. if ssh+svn is going to be sufficient >> hurdle for people, note it was already a hurdle for them with CVS, >> but we'll have to think a bit more on it. >> [...] > > How are you thinking about providing anonymous readonly non-dev access > to the repository? svn+ssh using an anonymous/guest account (can it > be screwed down tightly enough?) svn-mirror the repo onto the public > machine and do DAV there w/out having to worry about authenticating > the devs? > We'll do svn on the public anonymous machine like we already do with CVS and with SVN See: http://code.open-bio.org AND http://code.open-bio.org/svnweb/ See blipkit. -jason > g. 
> > -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From scrosson at uchicago.edu Fri Jul 13 18:15:30 2007 From: scrosson at uchicago.edu (Sean Crosson) Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC) Subject: [Bioperl-l] ace to fasta conversion Message-ID: I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta and it works great. We're now trying to convert a big (250 MB) .ace file to fasta. The documentation suggests I can do this, but everytime I run the script below, it outputs an empty .fas file. Does anyone have any suggestions on how to make this script work? Does SeqIO really convert between these file types? Thanks for your help. #!/usr/bin/perl -w use Bio::SeqIO; $in = Bio::SeqIO->new(-file => "454Contigs.ace", -format => 'ace'); $out = Bio::SeqIO->new(-file => ">454Contigs.fas", -format => 'fasta'); while ( $seq = $in->next_seq() ) {$out->write_seq($seq); } From cvillamar at gmail.com Fri Jul 13 19:24:04 2007 From: cvillamar at gmail.com (Carlos Villacorta) Date: Fri, 13 Jul 2007 16:24:04 -0700 Subject: [Bioperl-l] beginner problem with fasta headers Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> hi all, I have a embl sequence file, when formatting to fasta with Seqio it gives a long string header for each sequence that my following phylogenetic software cannot handle... Does anyone knows how to format those embl or genbank files to fasta but retrieving in the headers just two or three fields (e.g. id | gene | sp_name)? Any advice with this problem would be very appreciated, thanks! From j_martin at lbl.gov Fri Jul 13 20:05:45 2007 From: j_martin at lbl.gov (Joel Martin) Date: Fri, 13 Jul 2007 17:05:45 -0700 Subject: [Bioperl-l] ace to fasta conversion In-Reply-To: References: Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org> Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all. 
You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote: > I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta > and it works great. We're now trying to convert a big (250 MB) .ace file to > fasta. The documentation suggests I can do this, but everytime I run the script > below, it outputs an empty .fas file. Does anyone have any suggestions on how > to make this script work? Does SeqIO really convert between these file types? > Thanks for your help. > > #!/usr/bin/perl -w > > use Bio::SeqIO; > > > $in = Bio::SeqIO->new(-file => "454Contigs.ace", > -format => 'ace'); > $out = Bio::SeqIO->new(-file => ">454Contigs.fas", > -format => 'fasta'); > while ( $seq = $in->next_seq() ) {$out->write_seq($seq); } > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Sat Jul 14 00:06:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 23:06:27 -0500 Subject: [Bioperl-l] beginner problem with fasta headers In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu> Some reading material... http://www.bioperl.org/wiki/ FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files http://www.bioperl.org/wiki/ FAQ#I_would_like_to_make_my_own_custom_fasta_header_- _how_do_I_do_this.3F http://www.bioperl.org/wiki/FASTA_sequence_format#Note Quiz on Monday! chris On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote: > hi all, > I have a embl sequence file, when formatting to fasta with Seqio it > gives a long string header for each sequence that my following > phylogenetic software cannot handle... 
> Does anyone knows how to format those embl or genbank files to fasta > but retrieving in the headers just two or three fields (e.g. id | gene > | sp_name)? > Any advice with this problem would be very appreciated, thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scrosson at uchicago.edu Fri Jul 13 23:43:59 2007 From: scrosson at uchicago.edu (scrosson) Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT) Subject: [Bioperl-l] ace to fasta conversion In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org> References: <20070714000544.GB29841@eniac.jgi-psf.org> Message-ID: <11590811.post@talk.nabble.com> This problem now makes sense. I've been playing with Bio::Assembly::IO, which does indeed read phrap .ace files. Does anyone have an idea how to pull the assembled contigs out of a Bio::Assembly object and write them out as multi-fasta (or strings for that matter)? None of our workstations are running phrap/consed and I'd love to see these contigs. Sean Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all. You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel -- View this message in context: http://www.nabble.com/ace-to-fasta-conversion-tf4077370.html#a11590811 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bioperlanand at yahoo.com Sat Jul 14 13:55:53 2007 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT) Subject: [Bioperl-l] a question on obtain PDB records using bioperl Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com> Hi everybody, Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to Bio:Perl methods to retrieve EMBL or GenBank records. Thanks in advance, Anand --------------------------------- Moody friends. 
Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games. From johnsonm at gmail.com Tue Jul 17 14:23:58 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 17 Jul 2007 13:23:58 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? Message-ID: I'm tinkering with parsing iprscan reports with BioPerl. I noticed that this: my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro'); while (my $seq = $seqio->next_seq()) { ... } Does not work unless I first 'use XML::DOM::XPath'. I get this error: Can't locate object method "findnodes" via package "XML::DOM::Document" at bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line 30. I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to suck in XML::DOM::Xpath. I see that t/interpro.t requires XML::DOM::XPath: test_begin(-tests => 17, -requires_module => 'XML::DOM::XPath'); Is suppose the reason the test specs a require XML::DOM::XPath is so that tests can be skipped if XML::DOM::XPath is not available. Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? From sac at bioperl.org Tue Jul 17 15:49:32 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 17 Jul 2007 12:49:32 -0700 Subject: [Bioperl-l] Ohloh account for bioperl Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> I came across a web app that tracks various metrics for open source projects, noticed that bioperl wasn't listed, and added it: http://www.ohloh.net/projects/6685 Seems like an interesting resource that could help add some visibility. It creates metrics by directly processing the source code repository. I hooked it up to the CVS repos for bioperl-live, -db, -run, and -pipeline. It has yet to do its analysis at this point. Feel free to create Ohloh accounts for yourselves. 
When you add yourself as a contributor to Bioperl, you can indicate the username associated with your commits, but this requires that it first process the commit logs to figure out what the usernames are. You can still create an account, just update it later with your username. Steve From cjfields at uiuc.edu Tue Jul 17 17:04:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 17 Jul 2007 16:04:44 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? In-Reply-To: References: Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > I'm tinkering with parsing iprscan reports with BioPerl. I noticed > that this: > > my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > 'interpro'); > > while (my $seq = $seqio->next_seq()) { > ... > } > > Does not work unless I first 'use XML::DOM::XPath'. I get this error: > > Can't locate object method "findnodes" via package > "XML::DOM::Document" at > bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > 30. > > I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > suck in XML::DOM::Xpath. I see that t/interpro.t requires > XML::DOM::XPath: > > test_begin(-tests => 17, > -requires_module => 'XML::DOM::XPath'); > > Is suppose the reason the test specs a require XML::DOM::XPath is so > that tests can be skipped if XML::DOM::XPath is not available. > Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? You're right; I think tests passed b/c XML::DOM::XPath (if present), was eval'd as a required module. When I commented out the spot where it is eval'd in the test suite I can replicate this error. I have added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it passes fine. Thanks for the heads up! 
chris

From xianranli78 at yahoo.com.cn Wed Jul 18 01:55:19 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 18 Jul 2007 13:55:19 +0800
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file
Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>

Hi,

I want to extract some information from a gff3 file like this:

12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative

The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?

Thanks for your help.

Xianran Li

From georg.otto at tuebingen.mpg.de Wed Jul 18 05:32:26 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 18 Jul 2007 11:32:26 +0200
Subject: [Bioperl-l] run megablast
Message-ID:

Hi,

is there a module to run megablast in a script (equivalent to ncbi blast in StandAloneBlast.pm)?

Cheers,

Georg

From jeevitesh at ibab.ac.in Wed Jul 18 06:03:24 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES, as illustrated in the figure.

Suppose we have a tree

     A                C
      \              /
       \2          2/
        \__________/
        /     6    \
       /2          2\
      /              \
     B                D

The shared path between AB and AC is 2, and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.
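One possible reading of the request, as a minimal sketch in plain Perl rather than a BioPerl method: treat the path between two nodes as a set of branches, and take the "shared distance" of two node pairs to be the summed length of the branches that lie on both paths. The %parent table, the internal node names L and R, the choice of L as root, and the function names are all assumptions made for illustration; only the branch lengths come from the figure.

```perl
use strict;
use warnings;

# The example tree, rooted (arbitrarily) at the left internal node "L".
# Each entry maps a node to [parent, length of the branch above it].
# "L" and "R" are made-up names for the two internal nodes.
my %parent = (
    A => ['L', 2],
    B => ['L', 2],
    R => ['L', 6],
    C => ['R', 2],
    D => ['R', 2],
);

# Branches between a node and the root, keyed by the child end of each branch.
sub edges_to_root {
    my ($node) = @_;
    my %edges;
    while (my $up = $parent{$node}) {
        $edges{$node} = $up->[1];
        $node = $up->[0];
    }
    return \%edges;
}

# Branches on the path x..y: those above exactly one of the two nodes.
sub path_edges {
    my ($x, $y) = @_;
    my ($ex, $ey) = (edges_to_root($x), edges_to_root($y));
    my %path;
    for my $e (keys %$ex) { $path{$e} = $ex->{$e} unless exists $ey->{$e} }
    for my $e (keys %$ey) { $path{$e} = $ey->{$e} unless exists $ex->{$e} }
    return \%path;
}

# Summed length of the branches common to both paths.
sub shared_distance {
    my ($pair1, $pair2) = @_;
    my $p1 = path_edges(@$pair1);
    my $p2 = path_edges(@$pair2);
    my $sum = 0;
    $sum += $p1->{$_} for grep { exists $p2->{$_} } keys %$p1;
    return $sum;
}

print shared_distance(['A', 'B'], ['A', 'C']), "\n";   # prints 2
print shared_distance(['A', 'C'], ['B', 'D']), "\n";   # prints 6
```

With a tree read through Bio::TreeIO, the same idea should carry over by filling the parent table from each node's ancestor and branch length, though that part is not shown here.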
With Thanks & regards jeevitesh From cain.cshl at gmail.com Wed Jul 18 09:10:40 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 09:10:40 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> Message-ID: <1184764240.2570.31.camel@localhost.localdomain> Hi Xianran Li, Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing as Bio::DB::GFF3), then you can use the attributes method to get anything in the ninth column: my ($name) = $gene->attributes('Name'); The parenthesis are needed around $name because the attributes method returns a list and the parens capture the first item of the list into $name. Scott On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > Hi, > > I want to extract some infomation from the gff3 file like: > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? > > Thanks for your help. > > > Xianran Li > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From johnsonm at gmail.com Wed Jul 18 16:53:00 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 18 Jul 2007 15:53:00 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? 
In-Reply-To: <469DB6C6.9010702@pasteur.fr> References: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> <469DB6C6.9010702@pasteur.fr> Message-ID: The output from InterProScan, invoked thusly: iprscan -cli -seqtype p -i input_file -o output_file -format xml On 7/18/07, Emmanuel Quevillon wrote: > Hi guys, > > I read your email and I wondered which iprscan file you've > been talking about? Is it the file produced by InterProScan > or the file called match.xml representing the whole uniprot > database against InterPro? Reading the xml parser > implemented into Bio::SeqIO::interpro, I guess it is the > second one? > In such case, I just want to let you know that the xml > schema changed and the file name also. It is now called > match_complete.xml. > I attached the DTD to be able to see the new structure. > Here is an example of the new data representation. > > > crc64="F1DD0C1042811B48"> > name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D" > status="T" evd="HMMPfam"> > type="Domain" /> > > > dbname="PANTHER" status="T" evd="not_rel"> > > > > > As you can see some time there is no interpro info (no ipr > element). > > I think it would be good to change also the interpro parser ? > > Regards > > Emmanuel > > Chris Fields wrote: > > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > > > >> I'm tinkering with parsing iprscan reports with BioPerl. I noticed > >> that this: > >> > >> my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > >> 'interpro'); > >> > >> while (my $seq = $seqio->next_seq()) { > >> ... > >> } > >> > >> Does not work unless I first 'use XML::DOM::XPath'. I get this error: > >> > >> Can't locate object method "findnodes" via package > >> "XML::DOM::Document" at > >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > >> 30. > >> > >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > >> suck in XML::DOM::Xpath. 
I see that t/interpro.t requires > >> XML::DOM::XPath: > >> > >> test_begin(-tests => 17, > >> -requires_module => 'XML::DOM::XPath'); > >> > >> Is suppose the reason the test specs a require XML::DOM::XPath is so > >> that tests can be skipped if XML::DOM::XPath is not available. > >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? > > > > You're right; I think tests passed b/c XML::DOM::XPath (if present), > > was eval'd as a required module. When I commented out the spot where > > it is eval'd in the test suite I can replicate this error. I have > > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it > > passes fine. > > > > Thanks for the heads up! > > > > chris > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cain.cshl at gmail.com Wed Jul 18 22:47:53 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 22:47:53 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> <1184764240.2570.31.camel@localhost.localdomain> <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> Message-ID: <1184813273.2570.96.camel@localhost.localdomain> [Please always reply to the mailing list so that answers can archived] Yes, because commas are not allowed in GFF3 in an unescaped form. Essentially, you are doing this with your GFF3: Name=receptor kinase ORK10;Name= putative and when you do this: my ($name) = $gene->attributes('Name'); you are getting the first item in the list of names, and I suspect which one you get is random. To fix it, you need to replace the comma with %2C (the URL escape code for a comma). If you generated this GFF3, you will need to add a step to URI encode your attribute strings. 
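A minimal sketch of such an encoding step, using only core Perl. The helper names escape_gff3_value and unescape_gff3_value are made up for illustration and are not BioPerl functions. The sketch percent-encodes the characters that are reserved inside a ninth-column value (comma, semicolon, equals sign, ampersand, percent sign, tab and line breaks):

```perl
use strict;
use warnings;

# Percent-encode the characters that have a special meaning inside a
# GFF3 column-9 attribute value. "%" is in the set too, so values that
# already contain a percent sign survive a round trip.
sub escape_gff3_value {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Reverse the encoding when reading a value back.
sub unescape_gff3_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $value;
}

my $name = 'receptor kinase ORK10, putative';
my $attr = 'ID=12001.t00153;Name=' . escape_gff3_value($name);
print "$attr\n";   # prints ID=12001.t00153;Name=receptor kinase ORK10%2C putative
```

Only the values are escaped, never the whole attribute string, since the semicolons and equals signs that separate the tags have to stay literal.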
If you got it from someone else, you should point out to them that their GFF is flawed.

Scott

On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote:
> However, $name returns the string "putative" rather than "receptor kinase ORK10". Is there any particular reason?
>
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
> as Bio::DB::GFF3), then you can use the attributes method to get
> anything in the ninth column:
>
> my ($name) = $gene->attributes('Name');
>
> The parenthesis are needed around $name because the attributes method
> returns a list and the parens capture the first item of the list into
> $name.
>
> Scott
>
>
> On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> > Hi,
> >
> > I want to extract some information from a gff3 file like this:
> >
> > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
> >
> > The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?
> >
> > Thanks for your help.
> >
> >
> > Xianran Li
> ----- Original Message -----
> From: "Scott Cain"
> To: "Xianran Li"
> Cc:
> Sent: Wednesday, July 18, 2007 9:10 PM
> Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file

--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From acutter at eeb.utoronto.ca Thu Jul 19 22:25:08 2007 From: acutter at eeb.utoronto.ca (Asher Cutter) Date: Thu, 19 Jul 2007 22:25:08 -0400 Subject: [Bioperl-l] tree comparisons with bioperl Message-ID: <46A01D04.5040209@eeb.utoronto.ca> I was reading over the functions for working with trees in bioperl. I am looking for something that will compare two topologies and report back if they are equivalent. i.e. something like: does ((a,(b,c)) == ((A,B),C) ? (in this case, no) But of course in reality they would be more complicated topologies. This would be useful for simulating random trees to compare with some given topology of interest. I saw the methods for testing for monophyly and paraphyly, but not much beyond that...perhaps I have missed something? Any suggestions? Thanks, Asher -- ___________________________________ Asher D. Cutter Assistant Professor Department of Ecology & Evolutionary Biology University of Toronto 25 Harbord St. Toronto, ON, M5S 3G5 tel: 416-978-4602 email: acutter at eeb.utoronto.ca http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130 ___________________________________ From jeevitesh at ibab.ac.in Fri Jul 20 00:25:22 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. 
and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. With Thanks & regards jeevitesh From n.haigh at sheffield.ac.uk Sun Jul 22 07:34:58 2007 From: n.haigh at sheffield.ac.uk (Nathan S Haigh) Date: Sun, 22 Jul 2007 12:34:58 +0100 Subject: [Bioperl-l] Ohloh account for bioperl In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> Message-ID: <46A340E2.4040505@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Steve Chervitz wrote: > I came across a web app that tracks various metrics for open source > projects, noticed that bioperl wasn't listed, and added it: > > http://www.ohloh.net/projects/6685 > > Seems like an interesting resource that could help add some > visibility. It creates metrics by directly processing the source code > repository. I hooked it up to the CVS repos for bioperl-live, -db, > -run, and -pipeline. It has yet to do its analysis at this point. > > Feel free to create Ohloh accounts for yourselves. When you add > yourself as a contributor to Bioperl, you can indicate the username > associated with your commits, but this requires that it first process > the commit logs to figure out what the usernames are. You can still > create an account, just update it later with your username. > > Steve > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Nice to see the graphs of number of commits each developer has made over the last 5 years and how new developers have arisen while those more "seasoned" developers can relax a little more -proof of an excellent open source project! 
Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO 4JWvG5Gy+H/UqpeXYAcSCX0= =LrFt -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sun Jul 22 23:53:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 22 Jul 2007 22:53:48 -0500 Subject: [Bioperl-l] run megablast In-Reply-To: References: Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> StandAloneBlast runs the megablast executable directly, though I think you can specify a MegaBlast search using blastall with the '-n' flag. We could probably add this functionality in fairly easily since SearchIO can parse megablast output; no one's had the need to code it yet. chris On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > > Hi, > > is there a module to run megablast in a script (equivalent to ncbi > blast in StandAloneBlast.pm)? > > Cheers, > > Georg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jeevitesh at ibab.ac.in Mon Jul 23 06:34:36 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. and for AC and BD the shared path is 6. 
We need to find the shared distance as said above. Kindly helps us it will help our research a lot. With Thanks & regards jeevitesh From bix at sendu.me.uk Mon Jul 23 07:08:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 23 Jul 2007 12:08:23 +0100 Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Message-ID: <46A48C27.6060905@sendu.me.uk> jeevitesh at ibab.ac.in wrote: > Hi Friends, > > We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF > A TREE. Please stop sending this message. We heard you the first time. If no one answered, either no one knows the answer or no one understood you. > The Distance method of TreeIO in Bioperl module gives the total distance. > > But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as > illustrated > in figure. > > Suppose we have a tree > A C > \ / > \2 2/ > \__________/ > / 6 \ > /2 2\ > / \ > B D > > The shared path between AB and AC is 2. > and for AC and BD the shared path is 6. I don't follow. But if you already know how to work the answer out, describe the algorithm in words and maybe someone can code it up for you. From georg.otto at tuebingen.mpg.de Mon Jul 23 09:56:46 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Mon, 23 Jul 2007 15:56:46 +0200 Subject: [Bioperl-l] run megablast References: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> Message-ID: Thanks a lot! I guess I should have read the blast documentation more carefully.... Best, Georg Chris Fields writes: > StandAloneBlast runs the megablast executable directly, though I > think you can specify a MegaBlast search using blastall with the '-n' > flag. > > We could probably add this functionality in fairly easily since > SearchIO can parse megablast output; no one's had the need to code it > yet. 
> > chris > > On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > >> >> Hi, >> >> is there a module to run megablast in a script (equivalent to ncbi >> blast in StandAloneBlast.pm)? >> >> Cheers, >> >> Georg >> From cjfields at uiuc.edu Mon Jul 23 11:41:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 23 Jul 2007 10:41:35 -0500 Subject: [Bioperl-l] Bio::Assembly bug/feature? Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu> To all: I think I have found a major problem with Bio::Assembly; this was first noticed on Mac OS X in relation to bug 2320 and Bio::Assembly::IO. I am uncertain whether this is meant to be a feature or a bug but it certainly needs to be documented or fixed as it leads to subtle errors. I also can't see the advantage of this approach, but maybe I can be enlightened? Either way, I think it's worth a discussion for those willing to follow. I'll add as a bug later if needed. A bit of background: each instance of a Bio::Assembly::Contig has a Bio::SeqFeature::Collection instance attached to it; each Bio::SeqFeature::Collection itself has a tied DB_File handle attached which remains open during the lifetime of the Bio::SF::Collection object. When using Bio::Assembly one adds the various Contig objects to a Bio::Assembly::Scaffold. So, for instance, if one had ~1000 Contigs in a Scaffold, one would also have ~1000 open tied db handles, one per Contig instance. So far, so good. Unfortunately, when adding a ton of Contig objects to a Bio::Assembly::Scaffold one can run into a host of system-dependent issues based on resource usage limits (as one might expect). 
This script:

------------------------------
use Bio::Assembly::Scaffold;
use Bio::Assembly::Contig;
use Bio::SeqFeature::Generic;

my $scaffold = Bio::Assembly::Scaffold->new();

for my $id (1..15000) {
    print "Contig #$id\n";
    my $contig = Bio::Assembly::Contig->new(-id => $id);
    my $feat = Bio::SeqFeature::Generic->new(-start=>1,
                                             -end=>10,
                                             -strand=>1);
    $contig->add_features([$feat]);
    $scaffold->add_contig($contig);
}
------------------------------

may fail on Mac OS X when one reaches the maximum number of open file descriptors (on UNIX'y systems, this is 'ulimit -n'); the call to tie the DB_File handle in SF::Collection fails silently, so when the handle is used later on you get the following:

...
Contig #251
Contig #252
Contig #253
Contig #254
Can't call method "put" on an undefined value at /Users/cjfields/src/bioperl-live/Bio/SeqFeature/Collection.pm line 225.

I have added an exception to catch this. On Mac OS X you can increase the file descriptor limit using ulimit, at least to a certain point.

However, when testing this out on dev.open-bio.org (Linux) the 'tie' sometimes fails (and the exception pops up), but it isn't dependent on 'ulimit -n'. This is what happens more often:

...
Contig #10567
Contig #10568
Contig #10569
Contig #10570
Out of memory!

Sometimes followed by a seg fault. Ick!

Any ideas? For instance, should we set this up so that one SF::Collection is used for all the Contigs (since each one has a unique ID anyway)? Leave as is and document/track the issue as a bug? Both?

chris

From ba6450 at wayne.edu Mon Jul 23 16:06:14 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu>

Hello everyone:

I am new to bioperl. I am running perl in Eclipse in Windows.
Here is the code:

[code]
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file   => 'NM_000034.CDSalign.paml');

my $aln = $alignio->next_aln;

my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
my $tree = $treeio->next_tree;

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();

$codeml->alignment($aln);
$codeml->tree($tree);

my ($rc,$parser) = $codeml->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();
print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
[/code]

It gives the following error when I try to compile:

[error]
------------ EXCEPTION: Bio::Root::Exception -------------
MSG: unable to find or run executable for 'codeml'
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
-----------------------------------------------------------
Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
[/error]

Any idea, guys?

Munirul Islam
Phd Student
Computer Science
Wayne State University

From arareko at campus.iztacala.unam.mx Mon Jul 23 17:19:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 23 Jul 2007 16:19:24 -0500
Subject: [Bioperl-l] error running codeml
In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx>

Apparently, your script isn't able to locate the codeml executable in your Windows environment. Do you have the PAML package installed? Instructions on how to install it are located here:

http://abacus.gene.ucl.ac.uk/software/paml.html

Regards,
Mauricio.
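A quick way to check whether the executable is visible before calling run(), sketched with core Perl only. The find_executable helper is hypothetical and is not the lookup BioPerl itself performs; it simply searches the directory named by PAMLDIR (the variable the follow-up in this thread reports setting) and then the PATH:

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable called $name (or
# "$name.exe", for Windows) found in @dirs, or undef if there is none.
sub find_executable {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        for my $candidate ($name, "$name.exe") {
            my $path = File::Spec->catfile($dir, $candidate);
            return $path if -f $path && -x _;
        }
    }
    return undef;
}

# Look in PAMLDIR first (if it is set), then in every PATH directory.
my @dirs = grep { defined $_ && length $_ } ($ENV{PAMLDIR}, File::Spec->path);
my $codeml = find_executable('codeml', @dirs);

print defined $codeml
    ? "codeml found at $codeml\n"
    : "codeml not found; set PAMLDIR or extend PATH\n";
```

If this reports that codeml is missing, the wrapper will most likely fail in the same way as above.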

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From ba6450 at wayne.edu Mon Jul 23 19:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. I needed to add an environment variable for the paml directory.

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin';

One question ... I would like to save the temp files. So, what modification do I need to make such that $obj->save_tempfiles returns 1 within codeml.pm?

Regards
Munir

From jason at bioperl.org Tue Jul 24 03:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu> <46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

When you initialize the Codeml object, just pass in

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do

$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well, and you can print out where the tempdir is located with

print $codeml->tempdir;

and I think you can get the temp outfile:

my $name = $codeml->outfile_name;
print "name is $name\n";

-jason

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From ba6450 at wayne.edu Tue Jul 24 17:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having problem loading a sequence file from within a directory.

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");

while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/; # skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;

    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus', -file => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;
    foreach $taxa (@sequencenames) {
        print $taxa . "\n";
    }
}
#############################################################

Your suggestions please.
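
A note on the loop above: readdir returns bare entry names, and File::Spec->rel2abs resolves a bare name against the current working directory, not against rundir, which would explain a "file not found" error. The following is a minimal sketch of building the path from the directory plus the entry name with File::Spec->catfile, using core modules only. The helper name files_in_dir is hypothetical, and the Bio::AlignIO call is left as a comment so the block runs without BioPerl installed.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list the plain files in $dirname as paths that
# include the directory, so they resolve from any working directory.
sub files_in_dir {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "can't open $dirname: $!";
    my @paths;
    while (defined(my $file = readdir($dh))) {
        next if $file =~ /^\.\.?$/;                       # skip . and ..
        my $path = File::Spec->catfile($dirname, $file);  # e.g. rundir/<file>
        push @paths, $path if -f $path;                   # skip subdirectories
    }
    closedir($dh);
    my @sorted = sort @paths;
    return @sorted;
}

# Usage (assumes a directory called 'rundir', as in the post):
# for my $path (files_in_dir('rundir')) {
#     my $alignio = Bio::AlignIO->new(-format => 'nexus', -file => $path);
#     my $aln     = $alignio->next_aln;
# }
```

A lexical directory handle is used here instead of the bareword DIR so the handle is closed and scoped to the helper.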

Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA

From bix at sendu.me.uk Tue Jul 24 18:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
>
> I am having problem loading a sequence file from within a directory.
>
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/; # skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method. You want File::Spec->catfile($dirname, $file).

From ba6450 at wayne.edu Tue Jul 24 20:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks. That worked nicely.

I need your suggestion to load codeml control data from a file.
Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
    -params => {'noisy'     => 9,
                'verbose'   => 2,
                'runmode'   => 0,
                'seqtype'   => 1,
                'CodonFreq' => 2,
                'aaDist'    => 0,
                'model'     => 2,
                'NSsites'   => 2,
                'icode'     => 0
               });
-------------------------------------------------------------

Tried to modify it by passing a hash reference after loading data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
    -params => \%hashlist );
-------------------------------------------------------------

Still that didn't work. Your suggestions pls.

Munir

From ba6450 at wayne.edu Thu Jul 26 15:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt'). It runs fine when I directly run codeml.
But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
                                -file   => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format. Still that didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?

Thanks,

Munir
PhD Student
Computer Science
Wayne State University

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
URL:

From jason at bioperl.org Thu Jul 26 20:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try and pass in -interleaved => 0 as another option when you init your AlignIO object.

On 7/26/07, Munirul Islam wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt'). It runs fine when I directly run codeml. But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file => 'seq.txt');
>
> I guess it's not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format. Still that didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
> > Thanks, > > Munir > PhD Student > Computer Science > Wayne State University > > 11 2202 > > human > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC 
TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > chimp > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA 
GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC 
TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > macaca > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG > CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC > GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC > ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT > ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA 
--- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG > CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC > GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG --- > --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG > CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG > AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > mouse > GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC > ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG > CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA > AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA > GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC > TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG > GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC > TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA 
GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC > GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC > CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG > TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC > CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC > CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC > TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT > TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG > AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA > AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC > ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC > TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG > TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT --- > --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG > CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT > GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG > AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC > TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA 
GCC TAT --- TTC TGC CAT GGC AAA TTC > TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG > GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT > rat > GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC > ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG > CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA > AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA > GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC > TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC > TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC > GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC > CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA > TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT > CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT > CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC > TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT > TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG > CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA > AGG CCT CCA 
GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC > ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG > TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT --- > --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG > CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT > GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG > AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC > TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC > TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG > GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT > rabbit > GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG > AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC > ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG > CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC > CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG > GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC > TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC 
GAC GAA GAG CTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC > CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG > TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC > CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC > GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC > TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA > GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC > TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG > CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT > --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG > ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT > ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG > TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA > GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG --- > --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG > CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG > GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC > AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG > GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC > ACA 
CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG > GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT > dog > GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG > AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC > ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG > CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC > TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT > GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC > TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT > CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT > GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC > CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG > TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC > CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC > CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC > ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC > TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT > TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG > CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT 
CCT CGC CCT GAA CCT GAG CCA > CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC > ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC > ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG > CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC > AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG --- > --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT > GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT > AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG > GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC > ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG > GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT > cow > GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA > CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC > ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG > CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG > AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG > GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC > CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG > ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT > GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC > TTT CCG CCT GGC AAA 
GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT > CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC > TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC > GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG > TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC > TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC > ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG > CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA > CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC > ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC > ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC > CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT > AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG --- > --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT > GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG > TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT > AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG > GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG 
TTC CCC GGG GTG CCC ATT AGC > ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC > TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG > GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT > elephant > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- > --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC > ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA > AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG > GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG > ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG > GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC > TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG > TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC > TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC > GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC > CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG > TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC > CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN 
NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN --- > --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- --- > --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN > NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- --- > --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN --- > --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN > NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG > GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC > ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT > opossum > GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA > --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC > ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA > AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC > GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG > GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC > CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG > ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG > ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT 
GGA CTC TTG GCT CAC GCT > TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT > CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC > TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC > CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC > TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC > CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC > CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC > ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC > TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA > GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC > TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG > CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG > GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC > AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC > ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC > ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG > CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT > CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC --- > --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG > CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA > GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG > CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC > AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA > GAC 
TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC > ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC > TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG > GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- --- > chicken > GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG > --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC > ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG > CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG > GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG > GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC > CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC > ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC > AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC > TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT > CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC > TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT > GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC > CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC > TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT > CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC > CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC > ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC > TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA > GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC > TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC 
AGC TAC GTC CAG GAC TTC CAG CTG > CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC > ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG --- > --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- --- > --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG > GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- --- > CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC > AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC > TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC > CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG > GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG > TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC > AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG > GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC > GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC > TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG > GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From ba6450 at wayne.edu Thu Jul 26 21:20:11 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT) Subject: [Bioperl-l] Finding the Sequence List in an Alignment Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu> Thanks. The error is removed now. I have a question. Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file? 
Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich"
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
>To: "Munirul Islam"
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>

From jason at bioperl.org Fri Jul 27 00:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign object?

for my $seq ( $aln->each_seq ) {
    print $seq->display_id, "\n";
}

I'd appreciate if you added some of your questions with the answers to
the FAQ or to other places on the wiki so that other people can benefit
from your learning here.

On 7/26/07, Munirul Islam wrote:
>
> Thanks. The error is removed now.
>
> I have a question. Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich"
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
> >To: "Munirul Islam"
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From arareko at campus.iztacala.unam.mx Fri Jul 27 11:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From dhoworth at mrc-lmb.cam.ac.uk Fri Jul 27 12:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave

From spiros at lokku.com Fri Jul 27 12:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx> <46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID:

On 7/27/07, Dave Howorth wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.
Not to mention that it requires registration (?). Who is behind the
survey? I am on a number of Perl and Perl-related lists and haven't
seen it being mentioned.

Spiros

From arareko at campus.iztacala.unam.mx Fri Jul 27 13:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To:
References: <46AA0CDF.1030503@campus.iztacala.unam.mx> <46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and
birth year. As for my email, I always use the one I have for all the SPAM
I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which
prevents filling the DB multiple times by spambots/yourself, thus
screwing the survey). As for who's behind it, its purpose, privacy,
etc., please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.
> Spiros
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From Alicia.Amadoz at uv.es Mon Jul 30 11:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a bioperl script in linux with standaloneblast
from a webserver, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables both directly through the shell and by adding some code to my
script:

BEGIN {
    $ENV{PATH} .= ':/usr/local/blast-2.2.16';
    $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
    $ENV{BLASTDATADIR} = '/usr/local/data/';
}

and:

$local->executable('/usr/local/bin');
my $blast_report = $local->blastall($inputfilename);

I have also checked that the webserver has read and execute permission
on all the blast executables and directories. Despite all of this, it
keeps showing the same error as above. Any more ideas for solving this
problem? My script works well when I run it as a simple script, and I've
rebooted the system several times after the changes were made.

Thanks to anyone who will be able to help me!
Regards, Alicia

From gyang at plantbio.uga.edu Mon Jul 30 16:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running remoteblast and using readmethod "xml", I noticed that it is
printing the output repeatedly nonstop. It's like in a loop. Did anybody
notice this before? Can anybody help me getting out of this?

Thanks a lot,

Guojun Yang
University of Georgia

From grafman at graphcomp.com Sun Jul 29 17:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year,
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any
assistance in helping to add such features, I would enjoy the opportunity.
From torsten.seemann at infotech.monash.edu.au Mon Jul 30 19:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID:

Alicia,

> Hi, i'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From cjfields at uiuc.edu Mon Jul 30 20:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID:

On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that
> it is printing the output repeatedly nonstop. It's like in a loop.
> Did anybody notice this before? Can anybody help me getting out of
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update
RemoteBlast.pm as this sounds similar to an issue that popped up
earlier in the spring.
chris

From torsten.seemann at infotech.monash.edu.au Tue Jul 31 02:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To:
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID:

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From Alicia.Amadoz at uv.es Tue Jul 31 06:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To:
References:
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it, it works perfectly.
Thank you very much.

Regards, Alicia

> Alicia,
>
> > Hi, i'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
>
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
>
> --
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
>
>

From jay at jays.net Tue Jul 31 08:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me that
> there might be some need for 3D rendering within your BioPerl project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this year,
> along with benchmarks that demonstrate that it performs comparably to C.
>
> If there's a need for 3D features within BioPerl, and if I can be of any
> assistance in helping to add such features, I would enjoy the opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to
read this:

http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially
note the "Software" section, where you'll find some of the
"competition". :)

There's some cool stuff out there. I don't know what all would or
wouldn't be time well spent in Perl / BioPerl.
HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From cjfields at uiuc.edu Tue Jul 31 12:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mail list.

You might want to run a full install, just in case. If I remember
correctly Sendu made some changes a while back in the BLAST-related
modules which may be related to this. At the very least install/upgrade
all modules in Bio::Tools::Run.

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got
> "can't locate the object method "retrieve_parameter"". Does this
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast with xml
>
>
>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Tue Jul 31 22:15:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 21:15:45 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE> <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
Message-ID: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>

On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:

> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>> If this posting is inappropriate, please let me know - my apologies.
>
> Not at all. AFAIK this is the perfect place to discuss any
> contributions you're motivated to make to the BioPerl project.
>
>> I recently came across an article on BioPerl, and it occurred to me that
>> there might be some need for 3D rendering within your BioPerl project.
>>
>> I released a number of new/updated Perl OpenGL (POGL) modules this year,
>> along with benchmarks that demonstrate that it performs comparably to C.
>>
>> If there's a need for 3D features within BioPerl, and if I can be of any
>> assistance in helping to add such features, I would enjoy the opportunity.
>
> I know nothing about 3D modeling in biology, nor do I hang out with
> any protein structure folks, but 3D always sounds sexy. -grin-
>
> If you're new to bioinformatics (I certainly am) you might want to
> read this:
>
> http://en.wikipedia.org/wiki/Protein_structure
>
> Because that's probably where your 3D work would be used. Especially
> note the "Software" section, where you'll find some of the
> "competition". :)
>
> There's some cool stuff out there. I don't know what all would or
> wouldn't be time well spent in Perl / BioPerl.
>
> HTH,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

I agree that protein structure is the best place for something like this.
It's a wide open area as far as I'm concerned; in fact I would say that
Bio::Structure is getting pretty dated, so if anyone wants to take it
over, refactor the code, and so on I don't have a problem.

chris

From cjfields at uiuc.edu Sun Jul 1 00:40:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Jun 2007 19:40:53 -0500
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <4683A7D1.8070403@sendu.me.uk> <18051.48684.996884.134046@almost.alerce.com> <4683C385.3050904@sendu.me.uk> <18051.63674.685297.426813@almost.alerce.com> <18052.3946.224905.415905@almost.alerce.com> <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
Message-ID:

Checkout worked for me (Mac OS X) using both:

svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-2/t/data
svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-2/

so removing the offending file worked (good catch!). Haven't run a
full co but probably isn't necessary.

chris

On Jun 30, 2007, at 6:36 PM, Hilmar Lapp wrote:

>
> On Jun 28, 2007, at 3:43 PM, George Hartzell wrote:
>
>> I just did the experiment, and filename-insensitivity seems to be
>> breaking something.
>>
>> I'm using an svn I picked up from http://www.codingmonkeys.de/mbo/.
>>
>> I reformatted a memory stick to be case sensitive and co of
>>
>> bioperl/bioperl-live/tags/release-0-9-2/t
>>
>> worked, then I made a directory in my home dir (normal mac thing) and
>> got the same error as above.
>
> You picked up a rename of a file from lower case extension to upper
> case extension.
Unfortunately, there are several months between > adding the upper-case and removing the lower-case version. > > We can reconstruct what happened with this using svn log on the > directory (this does not require a checkout): > > $ svn log --verbose svn+ssh://dev.open-bio.org/home/hartzell/ > bioperl/bioperl-live/trunk/t/data > > Searching for HUMBETGLOA yields the following two commits that > added one and removed the other: > > ---------------------------------------------------------------------- > -- > r2245 | jason | 2001-12-08 11:59:05 -0500 (Sat, 08 Dec 2001) | 2 lines > Changed paths: > M /bioperl-live/trunk/t/SearchIO.t > A /bioperl-live/trunk/t/data/HUMBETGLOA.FASTA > A /bioperl-live/trunk/t/data/cysprot1.FASTA > > added tests for FASTA > > ---------------------------------------------------------------------- > -- > r2877 | jason | 2002-03-11 22:39:40 -0500 (Mon, 11 Mar 2002) | 2 lines > Changed paths: > A /bioperl-live/trunk/t/data/HUMBETGLOA.fa > D /bioperl-live/trunk/t/data/HUMBETGLOA.fasta > > renaming file to avoid clobbering on windows > > Unfortunately, both files are in the tag (again, no checkout > required): > > $ svn list svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- > live/tags/release-0-9-2/t/data | grep HUMBETGLOA | grep -i fasta > HUMBETGLOA.FASTA > HUMBETGLOA.fasta > > We can remove the offending version from the repository (again, > without needing a checkout): > > $ svn rm svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- > live/tags/release-0-9-2/t/data/HUMBETGLOA.fasta > > I did this, and now the tag checks out fine on OSX. Can anyone > confirm? > > (BTW the ability to operate on the repository w/o needing a > checkout is another advantage of svn) > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hartzell at alerce.com Sun Jul 1 00:48:06 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 30 Jun 2007 17:48:06 -0700
Subject: [Bioperl-l] Take 2 of the new subversion repository.
Message-ID: <18054.63942.316904.413911@almost.alerce.com>

There's a second cut at the subversion repository. I've done a better
job of setting svn:keywords and svn:eol-style on various files. The
defaults were more cautious and I used an auto-props file based on
the wiki version.

  svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2

The old repository's still around as

  svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1

I renamed it so that people wouldn't work with it by mistake. If, for
some hard-to-imagine reason, you have a working copy that you want to
run against it, you should be able to do an svn switch --relocate on
your working copy and be back in shape. In fact, it might be a good
time to give it a try....

g.

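A minimal sketch related to the relocation mentioned above, assuming a POSIX shell. As far as I know, svn switch --relocate only succeeds when the working copy and the target repository report the same repository UUID, so comparing the two first can save a failed attempt. The helper name repo_uuid is hypothetical; it only parses the "Repository UUID:" line that svn info prints, and the commented usage is not run here.

```shell
#!/bin/sh
# Hypothetical helper: read `svn info` output on stdin and print the
# repository UUID (the value of the "Repository UUID:" line).
repo_uuid() {
    sed -n 's/^Repository UUID: //p'
}

# Usage sketch (needs svn and access to the server, so it is left commented):
#   wc_uuid=$(svn info . | repo_uuid)
#   new_uuid=$(svn info svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1 | repo_uuid)
#   if [ "$wc_uuid" = "$new_uuid" ]; then
#       svn switch --relocate OLD_URL NEW_URL   # OLD_URL/NEW_URL are placeholders
#   else
#       echo "UUIDs differ; a fresh checkout is needed" >&2
#   fi
```

If the two UUIDs differ, the repositories are unrelated as far as svn is concerned, and a new checkout is the safe route.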
From hartzell at alerce.com Sun Jul 1 01:17:18 2007 From: hartzell at alerce.com (George Hartzell) Date: Sat, 30 Jun 2007 18:17:18 -0700 Subject: [Bioperl-l] First cut svn repository In-Reply-To: References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <4683A7D1.8070403@sendu.me.uk> <18051.48684.996884.134046@almost.alerce.com> <4683C385.3050904@sendu.me.uk> <18051.63674.685297.426813@almost.alerce.com> <18052.3946.224905.415905@almost.alerce.com> <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net> Message-ID: <18055.158.30409.808612@almost.alerce.com> Chris Fields writes: > Checkout worked for me (Mac OS X) using both: > > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/ > tags/release-0-9-2/t/data > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/ > tags/release-0-9-2/ > > so removing the offending file worked (good catch!). Haven't run a > full co but probably isn't necessary. > [...] I'll keep a note of that as something to do when I prepare the final cut of the repository. g. From jason at bioperl.org Sun Jul 1 01:25:30 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 30 Jun 2007 18:25:30 -0700 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18054.63942.316904.413911@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: Thanks George - I also did chgrp -R bioperl /home/hartzell/bioperl_take? to make sure the group permission was set right. We may also want to do a chmod g+s on all the dirs in there as well so that permissions are preserved when this gets deployed for real. 
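A minimal sketch of the setgid step suggested above, assuming a POSIX shell and find. The function name setgid_dirs and the throwaway directory layout are illustrative and not from the thread; the real target would be the repository directory on the server.

```shell
#!/bin/sh
# Set the setgid bit on every directory under a tree, so that files and
# subdirectories created later inherit the directory's group.
setgid_dirs() {
    find "$1" -type d -exec chmod g+s {} +
}

# Demonstrate on a throwaway tree rather than on a real repository.
demo=$(mktemp -d)
mkdir -p "$demo/repo/db/revs"
setgid_dirs "$demo/repo"

# The group permission triplet should now show an "s" (for example drwxr-sr-x).
ls -ld "$demo/repo/db/revs"
```

Restricting the change to directories (-type d) leaves the mode of regular files untouched. Remove the demo tree with rm -rf "$demo" when done.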
If anyone wants to make some changes to files and commit them, as well as make some branches/tags to play around a little bit since we'll likely throw this away and do it again from locked down version from CVS at some appointed time. Do you know how to have svn commit messages generate summary emails as well? -j On Jun 30, 2007, at 5:48 PM, George Hartzell wrote: > > There's a second cut at the subversion repository. I've done a better > job of setting svn:keywords and svn:eol-style on various files. The > defaults were more cautious and I used an auto-props files based on > the wiki version. > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2 > > The old repository's still around as > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1 > > I renamed it so that people would work with it by mistake. If, for > some hard-to-imagine reason, you have a working copy that you want to > run against it, you should be able to do an svn switch --relocate on > your working copy and be back in shape. In fact, it might be a good > time to give it a try.... > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From hlapp at gmx.net Sun Jul 1 02:21:25 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 30 Jun 2007 22:21:25 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18054.63942.316904.413911@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <5F53A433-BAA9-431D-A0C5-5955690D0B73@gmx.net> On Jun 30, 2007, at 8:48 PM, George Hartzell wrote: > I renamed it so that people would work with it by mistake. If, for > some hard-to-imagine reason, you have a working copy that you want to > run against it, It's not so hard to imagine - checking out the entire repository takes a long time. 
> you should be able to do an svn switch --relocate on > your working copy and be back in shape. In fact, it might be a good > time to give it a try.... It doesn't work: svn: The repository at 'svn+ssh://dev.open-bio.org/home/hartzell/ bioperl_take2' has uuid '31277767-6726-dc11-ab4c-0019e3f901d6', but the WC has '27e854f1-f323-dc11-8c1b-0019e3f901d6' You can't relocate to a totally new repository (relocating to bioperl_take1 does work though). -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Jul 1 02:39:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 30 Jun 2007 21:39:27 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <7C6FD6C9-CBED-40D3-BA90-4B34F79E6DE0@uiuc.edu> There are a few CPAN modules available; here's one: http://search.cpan.org/~dwheeler/SVN-Notify-2.66/lib/SVN/Notify.pm chris On Jun 30, 2007, at 8:25 PM, Jason Stajich wrote: > Thanks George - > I also did > chgrp -R bioperl /home/hartzell/bioperl_take? > to make sure the group permission was set right. > > We may also want to do a chmod g+s on all the dirs in there as well > so that permissions are preserved when this gets deployed for real. > > If anyone wants to make some changes to files and commit them, as > well as make some branches/tags to play around a little bit since > we'll likely throw this away and do it again from locked down version > from CVS at some appointed time. > > Do you know how to have svn commit messages generate summary emails > as well? > > -j > On Jun 30, 2007, at 5:48 PM, George Hartzell wrote: > >> >> There's a second cut at the subversion repository. I've done a >> better >> job of setting svn:keywords and svn:eol-style on various files. 
The >> defaults were more cautious and I used an auto-props files based on >> the wiki version. >> >> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2 >> >> The old repository's still around as >> >> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1 >> >> I renamed it so that people would work with it by mistake. If, for >> some hard-to-imagine reason, you have a working copy that you want to >> run against it, you should be able to do an svn switch --relocate on >> your working copy and be back in shape. In fact, it might be a good >> time to give it a try.... >> >> g. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sun Jul 1 02:46:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 30 Jun 2007 21:46:05 -0500 Subject: [Bioperl-l] Splits again In-Reply-To: <4686CC04.6000403@sendu.me.uk> References: <467949EC.9040100@sendu.me.uk> <467FBDD3.8050009@sendu.me.uk> <46823ABE.2080300@sendu.me.uk> <4682B000.2050707@sheffield.ac.uk> <4682B798.1010409@sheffield.ac.uk> <4682C6F5.4020406@sendu.me.uk> <4682D12E.3000803@sendu.me.uk> <2517AA40-9CDF-44F0-9665-107549DFD30C@uiuc.edu> <4682E824.1050507@sendu.me.uk> <4683624F.6020402@sendu.me.uk> <4683DBEA.90005@sendu.me.uk> <904D660A-3A2F-46F5-A198-0C00CBBF14C1@uiuc.edu> <468409C7.7020102@sendu.me.uk> <4686CC04.6000403@sendu.me.uk> Message-ID: On Jun 30, 2007, at 4:32 PM, Sendu Bala wrote: > Hilmar Lapp wrote: >> On Jun 28, 2007, at 3:19 PM, Sendu Bala wrote: >>> [...] >>> Very definitely the latter. 
The key benefit of my approach is >>> that the organisation stays as is and that a snapshot of the >>> repository remains a single directory of modules in Bio so that >>> people don't have to 'install' Bioperl, they can still just >>> uncompress the archive (or check out the package from svn) and >>> point their PERL5LIB to the root dir of the package. > [snip] >> In this sense, I understand a release pumpkin will generate ~900 >> packages to upload to CPAN? How much hassle is that compared to >> what uploading a bioperl release means right now? > > I'd have to investigate. I did my uploads using the PAUSE website, > which for 900 packages would be unfeasible. Will have to see if the > process can be automated. Not that they would care one way or another but maybe we should contact the CPAN maintainers to get their thoughts. They might have some ideas... >> How brittle is all the Build.PL code that will be needed to >> automate all of this, and how difficult will it be to maintain? >> For example, if someone adds in 10 new modules, what Build.PL- >> related work will need to be done? > > Well, my plan will be that once the work is done, you won't need to > touch the Build.PL code again. My intent is that the pumpkin can > just type one command and not think about anything. > > As for the reality, I won't know until I think about it properly > and experiment. A good experiment for a branch. I still think this could be accomplished step-wise; for instance run a quick test using something with a simple dependency tree like Bio::Root::Root (only needs RootI), finish up with Bio::Root*, then work down into PrimarySeq, Seq, etc. Submit them to CPAN piecemeal or in batches (all Bio::Seq*, so on). If the Build.PL, etc are to be generated on the fly then maybe there should be a simple way of registering or matching tests to modules (or vice versa) to ease the pain, particularly for new code. 
chris From hlapp at gmx.net Sun Jul 1 02:56:04 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 30 Jun 2007 22:56:04 -0400 Subject: [Bioperl-l] First cut svn repository In-Reply-To: References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <4683A7D1.8070403@sendu.me.uk> <18051.48684.996884.134046@almost.alerce.com> <4683C385.3050904@sendu.me.uk> <18051.63674.685297.426813@almost.alerce.com> <18052.3946.224905.415905@almost.alerce.com> <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net> Message-ID: It turns out that both files are also present on the release-0-9-3, bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/ HUMBETGLOA.fasta $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/ HUMBETGLOA.fasta $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/ HUMBETGLOA.fasta $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/ HUMBETGLOA.fasta to the post-processing commands. -hilmar On Jun 30, 2007, at 8:40 PM, Chris Fields wrote: > Checkout worked for me (Mac OS X) using both: > > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- > live/tags/release-0-9-2/t/data > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- > live/tags/release-0-9-2/ > > so removing the offending file worked (good catch!). Haven't run a > full co but probably isn't necessary. 
> > chris > > On Jun 30, 2007, at 6:36 PM, Hilmar Lapp wrote: > >> >> On Jun 28, 2007, at 3:43 PM, George Hartzell wrote: >> >>> I just did the experiment, and filename-insensitivity seems to be >>> breaking something. >>> >>> I'm using an svn I picked up from http://www.codingmonkeys.de/mbo/. >>> >>> I reformatted a memory stick to be case sensitive and co of >>> >>> bioperl/bioperl-live/tags/release-0-9-2/t >>> >>> worked, then I made a directory in my home dir (normal mac thing) >>> and >>> got the same error as above. >> >> You picked up a rename of a file from lower case extension to >> upper case extension. Unfortunately, there are several months >> between adding the upper-case and removing the lower-case version. >> >> We can reconstruct what happened with this using svn log on the >> directory (this does not require a checkout): >> >> $ svn log --verbose svn+ssh://dev.open-bio.org/home/hartzell/ >> bioperl/bioperl-live/trunk/t/data >> >> Searching for HUMBETGLOA yields the following two commits that >> added one and removed the other: >> >> --------------------------------------------------------------------- >> --- >> r2245 | jason | 2001-12-08 11:59:05 -0500 (Sat, 08 Dec 2001) | 2 >> lines >> Changed paths: >> M /bioperl-live/trunk/t/SearchIO.t >> A /bioperl-live/trunk/t/data/HUMBETGLOA.FASTA >> A /bioperl-live/trunk/t/data/cysprot1.FASTA >> >> added tests for FASTA >> >> --------------------------------------------------------------------- >> --- >> r2877 | jason | 2002-03-11 22:39:40 -0500 (Mon, 11 Mar 2002) | 2 >> lines >> Changed paths: >> A /bioperl-live/trunk/t/data/HUMBETGLOA.fa >> D /bioperl-live/trunk/t/data/HUMBETGLOA.fasta >> >> renaming file to avoid clobbering on windows >> >> Unfortunately, both files are in the tag (again, no checkout >> required): >> >> $ svn list svn+ssh://dev.open-bio.org/home/hartzell/bioperl/ >> bioperl-live/tags/release-0-9-2/t/data | grep HUMBETGLOA | grep -i >> fasta >> HUMBETGLOA.FASTA >> HUMBETGLOA.fasta >> >> We 
can remove the offending version from the repository (again, >> without needing a checkout): >> >> $ svn rm svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- >> live/tags/release-0-9-2/t/data/HUMBETGLOA.fasta >> >> I did this, and now the tag checks out fine on OSX. Can anyone >> confirm? >> >> (BTW the ability to operate on the repository w/o needing a >> checkout is another advantage of svn) >> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From dmessina at wustl.edu Sun Jul 1 05:38:48 2007 From: dmessina at wustl.edu (David Messina) Date: Sun, 1 Jul 2007 00:38:48 -0500 Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn repository] In-Reply-To: <46869226.70203@sheffield.ac.uk> References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca> <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net> <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu> <4673C7CB.1030709@mail.nih.gov> <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu> <18049.30026.61328.134490@almost.alerce.com> <5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu> <18051.44281.831316.749586@almost.alerce.com> <18051.61992.627473.323346@almost.alerce.com> <4684AF3D.5090907@sheffield.ac.uk> <843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu> <468628AC.9060200@sheffield.ac.uk> <461F64B9-87FD-458A-8945-8238E7076109@wustl.edu> <46869226.70203@sheffield.ac.uk> Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu> > [Nath] > I think the list of seq formats recognised by Bioperl in Bio::SeqIO > and > Bio::AlignIO 
would be a good start. As these are likely to be the ones > that are sensitive to file format recognition and thus could break > tests > if renamed. Sounds good to me. I will do a quick tour of the rest of the repo looking for other common or important file extensions, but I don't expect there to be many if any. > [still Nath] > I think a lot of people have used "." in file names as an > alternative to > a space. I think it would be beneficial to use an underscore "_" in > these cases and leave the "." to represent the beginning of the file > extension. That's a great idea. > [Chris] > Do we need to define every filetype extension, or can there be a > fallback (eg if it isn't on the list or has no extension it's plain > text)? For every file that's added, svn takes a peek to see if it's human- readable. If not, it's tagged with the generic MIME type application/ octet-stream. (It does this so it knows not to try to do diffs and merges on a binary file.) So the default for a human-readable file is no MIME type, which I believe is essentially the same thing as text/plain. And then regardless of the outcome of svn's peek, any matching auto- props are then applied, overriding svn's choice. So if we don't define every extension, I think we'll be fine. It'd be nice to have everything tagged with a MIME type, though. For one thing, Apache will use it to do the right thing when people browse the repo over the web. And two, because metadata is cool. :) One more thing: in the course of reading up on this, I learned that my earlier expectation about multiple auto-prop matches was incorrect. It's true that multiple unrelated matches means that multiple properties are set on the file. But when a file matches multiple *conflicting* auto-property patterns, there's no telling which value it'll get. 
Dave

From hartzell at alerce.com Sun Jul 1 16:29:29 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 1 Jul 2007 09:29:29 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To:
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
 <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
 <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
 <4673C7CB.1030709@mail.nih.gov>
 <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
 <18049.30026.61328.134490@almost.alerce.com>
 <4683A7D1.8070403@sendu.me.uk>
 <18051.48684.996884.134046@almost.alerce.com>
 <4683C385.3050904@sendu.me.uk>
 <18051.63674.685297.426813@almost.alerce.com>
 <18052.3946.224905.415905@almost.alerce.com>
 <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
Message-ID: <18055.54889.677775.868974@almost.alerce.com>

Hilmar Lapp writes:
 > It turns out that both files are also present on the release-0-9-3,
 > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add
 >
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/
 > home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/
 > HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/
 > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/
 > HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/
 > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/
 > HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/
 > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/
 > HUMBETGLOA.fasta
 >
 > to the post-processing commands.
 > [...]

Will do. Thanks for working out the incantations!

g.

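The four removals quoted above differ only in the tag name, so they can be generated in a loop. This is a sketch, assuming a POSIX shell; the base URL, file name, and tag list are copied from the quoted commands, while the function name print_cleanup_commands is made up for illustration. It only prints the commands (a dry run) and does not contact the repository.

```shell
#!/bin/sh
# Base URL and file name as they appear in the quoted svn rm commands.
BASE="svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags"
FILE="t/data/HUMBETGLOA.fasta"

# Print one `svn rm` command per affected tag; nothing is executed.
print_cleanup_commands() {
    for tag in release-0-9-3 bioperl-1-0-0 bioperl-1-0-alpha bioperl-1-0-alpha2-rc; do
        printf 'svn rm -m "Removing offending duplicate" %s/%s/%s\n' \
            "$BASE" "$tag" "$FILE"
    done
}

print_cleanup_commands
```

After reviewing the printed lines, they could be run by piping the output to sh. The release-0-9-2 tag is not in the list because its removal was handled separately earlier in the thread.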
From cjfields at uiuc.edu Mon Jul 2 13:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run across
some stuff in bugzilla that needs to be added as well.

Should we, as a convention, start adding data sequestered to a folder with
the test name, within t/data? This might make life easier in the long
run (keep track of files, get rid of old files, etc), and may make it
easier for wrapping up the correct data with tests if we start
submitting single module CPAN updates.

chris

From cjfields at uiuc.edu Mon Jul 2 13:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
 <468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as a convention, start adding data sequestered to a folder
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes
> reused amongst multiple different test scripts, and when looking
> for data to reuse it's easier to spot it in a single directory
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,
>> get rid of old files, etc), and may make it easier for wrapping up
>> the correct data with tests if we start submitting single module
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would
> read the test script and see what input files it uses, copying
> those into the archive.
So, just be sure to standardise on using > test_input_file() to make that possible. > > > That said, I wouldn't mind especially either way. Just don't do it > now, since test script names (and therefore the name of the > directory you'd want to store the input files in) might all change. > > > In fact we can imagine that we have a test script t/ > BioZombieKitten.t which stores its test data in t/data/ > BioZombieKitten/input.file but the script gets the path to this > file by: > my $input_file = test_input_file('input.file'); > > test_input_file() is then implemented to look for the file in the > subdir of data corresponding to the script name if we're dealing > with the 900-modules-in-a-package checkout-type situation, but just > in t/data if we're in the one-module-in-a-package situation. > > In any case, things will be most flexible if you drop files > directly into t/data for now and reference them without any subdirs > in the call to test_input_file(). Fine by me, I just find it very cluttered. BioZombieKitten?!? chris From bix at sendu.me.uk Mon Jul 2 14:00:37 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 02 Jul 2007 15:00:37 +0100 Subject: [Bioperl-l] test data In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu> References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu> <468901C1.8020505@sendu.me.uk> <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu> Message-ID: <46890505.1070707@sendu.me.uk> Chris Fields wrote: > On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote: > Fine by me, I just find it very cluttered. Yes, I agree. I also wish we had a decent naming convention for files. (Ie. it would be nice to have a good idea what a file was for without having to study the test script that uses it.) > BioZombieKitten?!? 
I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84

From bix at sendu.me.uk Mon Jul 2 13:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across
> some stuff in bugzilla that needs to be added as well.
>
> Should we, as a convention, start adding data sequestered to a folder with
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused
amongst multiple different test scripts, and when looking for data to
reuse it's easier to spot it in a single directory compared to searching
through multiple directories.

> This might make life easier in the long
> run (keep track of files, get rid of old files, etc), and may make it
> easier for wrapping up the correct data with tests if we start
> submitting single module CPAN updates.

I don't think that will be an issue. The automated process would read
the test script and see what input files it uses, copying those into the
archive. So, just be sure to standardise on using test_input_file() to
make that possible.

That said, I wouldn't mind especially either way. Just don't do it now,
since test script names (and therefore the name of the directory you'd
want to store the input files in) might all change.

In fact we can imagine that we have a test script t/BioZombieKitten.t which stores its test data in t/data/BioZombieKitten/input.file but the script gets the path to this file by: my $input_file = test_input_file('input.file'); test_input_file() is then implemented to look for the file in the subdir of data corresponding to the script name if we're dealing with the 900-modules-in-a-package checkout-type situation, but just in t/data if we're in the one-module-in-a-package situation. In any case, things will be most flexible if you drop files directly into t/data for now and reference them without any subdirs in the call to test_input_file(). From hlapp at gmx.net Mon Jul 2 20:02:37 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 2 Jul 2007 16:02:37 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18054.63942.316904.413911@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: Just FYI, after applying the changes I've been sending, I was able to check out the repository in its entirety. -hilmar On Jun 30, 2007, at 8:48 PM, George Hartzell wrote: > > There's a second cut at the subversion repository. I've done a better > job of setting svn:keywords and svn:eol-style on various files. The > defaults were more cautious and I used an auto-props files based on > the wiki version. > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2 > > The old repository's still around as > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1 > > I renamed it so that people would work with it by mistake. If, for > some hard-to-imagine reason, you have a working copy that you want to > run against it, you should be able to do an svn switch --relocate on > your working copy and be back in shape. In fact, it might be a good > time to give it a try.... > > g. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From wrp at virginia.edu Mon Jul 2 20:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>

Course announcement - Application deadline, July 15, 2007
================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 200
Application Deadline: July 15, 2007

INSTRUCTORS:
Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes -

This course presents a comprehensive overview of the theory and practice
of computational methods for extracting the maximum amount of
information from protein and DNA sequence similarity through sequence
database searches, statistical analysis, multiple sequence alignment,
and genome scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases. The course combines
lectures with hands-on exercises; students are encouraged to pose
challenging sequence analysis problems using their own data. The course
makes extensive use of local WWW pages to present problem sets and the
computing tools to solve them. Students use Windows and Mac workstations
attached to a UNIX server.
The course is designed for biologists seeking advanced training in biological sequence analysis, computational biology core resource directors and staff, and for scientists in other disciplines, such as computer science, who wish to survey current research problems in biological sequence analysis and comparative genomics. The primary focus of the Computational and Comparative Genomics Course is the theory and practice of algorithms used in computational biology, with the goal of using current methods more effectively and developing new algorithms. Cold Spring Harbor also offers a "Programming for Biology" course, which focuses more on software development. For additional information and the lecture schedule and problem sets for the 2006 course, see: http://fasta.bioch.virginia.edu/cshl06 ================================================================ To apply to the course, fill out and send in the form at: http://meetings.cshl.edu/courses/courseapplication.asp ================================================================ Bill Pearson From niels at genomics.dk Mon Jul 2 20:45:07 2007 From: niels at genomics.dk (Niels Larsen) Date: Mon, 02 Jul 2007 22:45:07 +0200 Subject: [Bioperl-l] simple PrimarySeq question In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> Message-ID: <468963D3.3000007@genomics.dk> I write hoping someone could show me how to create a PrimarySeq object without parsing features and all first. The lines below return "Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16." whereas calling Bio::SeqIO-> gives no error, but a too big object. The GenBank record after the __END__ is the "1.gb" file. I could not find out how from the tutorial or the Bio::PrimarySeq description. 

Niels L


#!/usr/bin/env perl

use strict;
use warnings FATAL => qw ( all );

use Data::Dumper;

use Bio::Seq;
use Bio::SeqIO;

my ( $seq_h, $seq );

$seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
# $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );

$seq = $seq_h->next_seq();

# print Dumper( $seq );

__END__

LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
ACCESSION   X60065 REGION: 1..9
VERSION     X60065.1  GI:5
KEYWORDS    beta-2 glycoprotein I.
SOURCE      Bos taurus (cattle)
  ORGANISM  Bos taurus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Bovidae; Bovinae; Bos.
REFERENCE   1
  AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
            Kristensen,T.
  TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
            localization of the disulfide bridges
  JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
   PUBMED   1567819
REFERENCE   2  (bases 1 to 9)
  AUTHORS   Kristensen,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
            University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
            DENMARK
FEATURES             Location/Qualifiers
     source          1..9
                     /organism="Bos taurus"
                     /mol_type="mRNA"
                     /db_xref="taxon:9913"
                     /clone="pBB2I"
                     /tissue_type="liver"
     gene            <1..>9
                     /gene="beta-2-gpI"
     CDS             <1..>9
                     /gene="beta-2-gpI"
                     /codon_start=1
                     /product="beta-2-glycoprotein I"
                     /protein_id="CAA42669.1"
                     /db_xref="GI:6"
                     /db_xref="GOA:P17690"
                     /db_xref="UniProtKB/Swiss-Prot:P17690"
                     /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
                     VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
                     ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
                     SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
                     PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
                     VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
                     DASDVKPC"
     sig_peptide     <1..>9
                     /gene="beta-2-gpI"
ORIGIN
        1 ccagcgctc
//

From Kevin.M.Brown at asu.edu Mon Jul 2 21:35:12 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 2 Jul 2007 14:35:12 -0700
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <468963D3.3000007@genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk>
Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>

Start by having a look at the following link:
http://bioperl.org/cgi-bin/deob_interface.cgi

SeqIO is how one reads or writes sequences to/from files.
Bio::PrimarySeq is just an object that holds information about a
sequence obtained from a file.
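
A minimal sketch of that distinction, assuming BioPerl is installed. The
values are illustrative and copied from the 9 bp example record in the
question. As far as I can tell, Bio::PrimarySeq is built from data that
is already in memory and offers no next_seq() method, which would
explain the error reported above:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Bio::PrimarySeq;

# Bio::PrimarySeq is a plain container: it takes the sequence string and
# a few identifiers directly, it does not read or parse a file.
my $pseq = Bio::PrimarySeq->new(
    -seq        => 'ccagcgctc',    # ORIGIN line of the example record
    -display_id => 'X60065',
    -desc       => 'B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.',
    -alphabet   => 'dna',
);

# Accessors return what was stored; there is no stream to iterate over.
printf "%s\t%d bp\t%s\n", $pseq->display_id, $pseq->length, $pseq->seq;
```

Reading records from a file remains the job of a Bio::SeqIO stream,
whose next_seq() hands back one sequence object per record.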

As for how to parse a Genbank file into a list of features:

$file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
while (my $seq = $file->next_seq())
{
    @features = $seq->all_SeqFeatures;
    # sort features by their primary tags
    for my $f (@features)
    {
        my $tag = $f->primary_tag;
        if ($tag eq 'CDS')
        {
            # @sorted_features holds all the Bio::PrimarySeq
            # features obtained from the genbank file
            push @sorted_features, $f;
        }
    }
}

From niels at genomics.dk Tue Jul 3 00:41:24 2007
From: niels at genomics.dk (niels at genomics.dk)
Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST)
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
Message-ID:
<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
gets entries from file, and from those large parsed entries I can get
a simplified primary_seq object. But the SeqIO object includes feature
and annotation objects etc. that take time to make, and I wish to know
if there is a way to get a primary_seq object without this overhead. I
apologize if I overlooked it in the docs.

Niels

From hlapp at gmx.net Tue Jul 3 02:36:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 22:36:19 -0400
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com> <468963D3.3000007@genomics.dk> <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu> <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net>

Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have
examples for what you want to do:

    use Bio::SeqIO;

    # usually you won't instantiate this yourself - a SeqIO object -
    # you will have one already
    my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
    my $builder = $seqin->sequence_builder();

    # if you need only sequence, id, and description (e.g. for
    # conversion to FASTA format):
    $builder->want_none();
    $builder->add_wanted_slot('display_id','desc','seq');

    # if you want everything except the sequence and features
    $builder->want_all(1); # this is the default if it's untouched
    $builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question. Note that this is
currently only implemented for Genbank format.

    -hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From ewijaya at gmail.com Tue Jul 3 06:56:30 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 14:56:30 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>

Dear all,

I was trying to perform a check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly my script that uses GD.pm doesn't execute.

I have installed the latest version of libgd version 2.0.35 downloaded
from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how I can resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward

From ewijaya at gmail.com Tue Jul 3 07:00:16 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 15:00:16 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707030000t5ab77608x264d49125255a6d1@mail.gmail.com>

Dear all,

I was trying to perform a check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly my script that uses GD.pm doesn't execute.

I have installed the latest version of libgd version 2.0.35 downloaded
from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how I can resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward

From ewijaya at i2r.a-star.edu.sg Tue Jul 3 06:35:12 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 3 Jul 2007 14:35:12 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A26EB85@mailbe01.teak.local.net>

Dear all,

I was trying to perform a check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

I have installed the latest version of libgd version 2.0.35 downloaded
from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how I can resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward

From lstein at cshl.edu Tue Jul 3 14:41:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 3 Jul 2007 10:40:26 -0401
Subject: [Bioperl-l] Problem with GD.pm version 2.35
In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com>

This happens when there is a mismatch between the compiled (.so)
portion of GD and the perl (.pm) version. Typically it occurs when you
have installed GD incorrectly by, e.g., copying the .pm file into
position rather than using the make file.

Solution: Uninstall old versions of GD by manually finding all
occurrences of GD.so and GD.pm and removing them. Then reinstall the
correct way.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Wed Jul 4 05:45:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 00:45:16 -0500
Subject: [Bioperl-l] genbank2gff3 - Name attribute?
Message-ID:

I noticed that genbank2gff3.pl doesn't have an explicitly defined way
of converting the gene/locus/etc name to a Name tag (for, say,
GBrowse). Any particular reason? Should I stick with GFF2 for now?

chris

From bix at sendu.me.uk Wed Jul 4 10:00:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 04 Jul 2007 11:00:31 +0100
Subject: [Bioperl-l] Splitting Bioperl
Message-ID: <468B6FBF.1070708@sendu.me.uk>

To summarise some previous threads:
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409

# Bioperl is currently one monolithic distribution of ~900 modules
# There is some desire to split it up into smaller functional groups
# There are some problems with that proposal
# An extreme variant of that proposal is to make the groups individual
  modules

Following this discussion:
http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
(especially Adam Kennedy's postings of 4/07, soon to appear in that
archive), the extreme variant doesn't seem like a good idea.

I'm now suggesting that Steve's original split idea, as
modified/expanded by Adam's driver and other ideas, is the best choice.
The problems I previously identified can be solved in the same way they
were solved in my extreme variant: the splits are done by Build.PL
automation working on a single repository/code-base, not by splitting
things up at the repository level.

As I see it, the way forward now is for someone interested enough to
decide on the specifics of how things will be split and offer them up
to the group for discussion. I don't mean vague possibilities of what
might work as a split, but rather some real thought should go into it
to make sure the split makes sense and will actually work in practice.

Following that, the splits can be implemented by some automated dist
action of Build.PL.

If there isn't sufficient interest to make this happen, I don't see
that as a terrible thing. There are benefits to keeping Bioperl
monolithic, and some of the problems (eg. lack of updates) can be
solved without changing its nature.

From cjfields at uiuc.edu Wed Jul 4 14:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 09:53:45 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <468B6FBF.1070708@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>

On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote:

> To summarise some previous threads:
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409
>
> # Bioperl is currently one monolithic distribution of ~900 modules
> # There is some desire to split it up into smaller functional groups
> # There are some problems with that proposal
> # An extreme variant of that proposal is to make the groups individual
>   modules
>
> Following this discussion:
> http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
> (especially Adam Kennedy's postings of 4/07, soon to appear in that
> archive), the extreme variant doesn't seem like a good idea.

brian d foy made some sound arguments against it as well.

> I'm now suggesting that Steve's original split idea, as
> modified/expanded by Adam's driver and other ideas, is the best
> choice.
> The problems I previously identified can be solved in the same way
> they were solved in my extreme variant: the splits are done by
> Build.PL automation working on a single repository/code-base, not by
> splitting things up at the repository level.
>
> As I see it, the way forward now is for someone interested enough to
> decide on the specifics of how things will be split and offer them
> up to the group for discussion. I don't mean vague possibilities of
> what might work as a split, but rather some real thought should go
> into it to make sure the split makes sense and will actually work in
> practice.

We've already identified a few (SearchIO, Tools, GBrowse-related, etc).

...

> If there isn't sufficient interest to make this happen, I don't see
> that as a terrible thing. There are benefits to keeping Bioperl
> monolithic, and some of the problems (eg. lack of updates) can be
> solved without changing its nature.

If so, proposals that solve this problem need to be made as well. If
we stay monolithic, then here's mine: we start having fixed,
regularly timed dev releases like Parrot, monthly or bimonthly (quite
common on CPAN), with brief release reports on which bugs have been
fixed, code has been added, so on. Not every bug has to be fixed per
dev release; if that were true there would never be releases for some
of the XML parser packages. No RCs for dev releases (it's a dev
release!). These would be 1.x.y. We can then, every once in a
while, have a bug-squashing session, hackathon, etc, and have regular
non-dev release (1.x) that all core devs accept and that passes a
particular milestone.

As for the advantage of a split approach, as mentioned previously it
is to focus modules/tests/scripts into groups with related functions.
Even just splitting off ones with external reqs (XML parsers, GD, etc)
into an 'aux' release would be an advantage, as it doesn't confront a
new user with the burden of installing a large list of dependencies,
some of which may be complicated for a perl newbie to either install
from scratch (DBD::mysql, GD) or to get the latest bug-fixed prereq
release for their OS (the recent debacle with XML::SAX::Expat issues
comes to mind, which wasn't immediately available for win32 as a PPM).

I'm fairly open to any approach as long as it's reasonably thought out,
though I am admittedly a bit biased towards the split approach. I do
think some change is in order; I worry about there ever being a 1.6
release at this point.

chris

From davila at ioc.fiocruz.br Wed Jul 4 17:11:20 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Wed, 04 Jul 2007 14:11:20 -0300
Subject: [Bioperl-l] ESTs in EST format
Message-ID: <468BD4B8.5050105@ioc.fiocruz.br>

Dear All,

I am trying to get all ESTs from a given species (eg: Trypanosoma
brucei) from Genbank in EST format (eg:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)...
while using Entrez I can "display" individual EST entries in EST
format, this "EST format" is not an option in the main "display" menu
for batch download ...

I don't see the EST format listed
(http://www.bioperl.org/wiki/Sequence_formats) among the ones that
SeqIO deals with, so I wonder whether there would be another BioPerl
module to do this? Any tips would be greatly appreciated ;-)

Kindest regards, Alberto

From jason at bioperl.org Wed Jul 4 17:52:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 10:52:59 -0700
Subject: [Bioperl-l] ESTs in EST format
In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br>
References: <468BD4B8.5050105@ioc.fiocruz.br>
Message-ID:

Currently we don't support this format; as far as I know it isn't a
published standard, nor is it a format that NCBI distributes this data
in as flat files (i.e. GenBank dumps).

Is there any reason why you can't get what you need from the GenBank
format?
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb

-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From dmessina at wustl.edu Wed Jul 4 18:37:22 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Jul 2007 13:37:22 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>

On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:

> we start having fixed,
> regularly timed dev releases like Parrot, monthly or bimonthly (quite
> common on CPAN), with brief release reports on which bugs have been
> fixed, code has been added, so on. Not every bug has to be fixed per
> dev release; if that were true there would never be releases for some
> of the XML parser packages. No RCs for dev releases (it's a dev
> release!). These would be 1.x.y. We can then, every once in a
> while, have a bug-squashing session, hackathon, etc, and have regular
> non-dev release (1.x) that all core devs accept and that passes a
> particular milestone.

Regardless of whether we split or don't, I think these ideas of
adding a little more structure to BioPerl's development cycles --
especially having bug-squashing and hacking sessions, where we all
band together and commit some time to cranking through a bunch of
to-dos -- would be beneficial, particularly as a means to keeping a
certain basal level of momentum in BioPerl.

Dave

From jason at bioperl.org Wed Jul 4 19:45:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 12:45:29 -0700
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID:

I definitely agree - we can live up to the unstable "living on the
edge" nature of dev releases a bit more perhaps?

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From cjfields at uiuc.edu Wed Jul 4 20:54:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 15:54:14 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To:
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID:

I think what's partially responsible for slowing down releases is the
expectation that each dev release is supposed to have all bugs fixed,
work for every OS, etc. In other words, act like a stable release.

A developer release by nature is living on the edge, so why not have
regular dev releases? We keep telling users to update to using
bioperl-live whenever something breaks, anyway. We could decide to
split stuff off along the way into more 'stable' sections if there
were more demand for it, and have the more API-volatile code
(DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the
'dev' tag until we feel it's ready for prime time.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From n.haigh at sheffield.ac.uk Thu Jul 5 08:09:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 05 Jul 2007 09:09:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: <468CA721.4020804@sheffield.ac.uk> Chris Fields wrote: > I think what's partially responsible for slowing down releases is the > expectation that each dev release is supposed to have all bugs fixed, > work for every OS, etc. In other words, act like a stable release. > > A developer release by nature is living on the edge, so why not have > regular dev releases? We keep telling users to update to using > bioperl-live whenever something breaks, anyway. We could decide to > split stuff off along the way into more 'stable' sections if there > were more demand for it, and have the more API-volatile code > (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the > 'dev' tag until we feel it's ready for prime time. > > chris > > On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > > -- snip -- I agree, although would the dev releases still need to pass all the tests? I'm thinking of people installing via CPAN. I also agree with what was said in a previous post about bringing back bioperl-run (and some others) back into the same repository as bioperl-core (after a successful move over to svn) and have Build.PL deal with creating the packages etc for CPAN. This would hopefully help keep the run package (and others) up to speed with the core package. I also agree with previous posts about organising and/or having some naming convention for test data files. I think an approach whereby data files were organised into directory trees (1 - 3 deep) with names that elude to the type of data in that subtree/file rather than the tests that use it etc. 
For example:

t/data
 |__ formats
 |    |__ seq
 |    |    |__ legal_fasta
 |    |    |    |__ extension.fas
 |    |    |    |__ extension.fasta
 |    |    |    |__ extension.foo
 |    |    |    |__ extension.bar
 |    |    |    |__ no_extension
 |    |    |    |__ interleaved.fas
 |    |    |    |__ non_interleaved.fas
 |    |    |    |__ single_seq.fas
 |    |    |    |__ multiple_seq.fas
 |    |    |    |__ desc_line1.fas
 |    |    |    |__ desc_line2.fas
 |    |    |
 |    |    |__ illegal_fasta
 |    |    |    |__ illegal_chars.fas
 |    |    |    |__ some_other_illegal_alternative.fas
 |    |    |
 |    |    |__ legal_genbank
 |    |    |    |__ etc etc
 |    |    |
 |    |    |__ illegal_genbank
 |    |         |__ etc etc
 |    |
 |    |__ aln
 |    |__ blast
 |    |    |__ legal_blastx
 |    |    |
 |    |    |__ legal_blastp
 |    |    |
 |    |    |__ legal_tblastx
 |    |    |
 |    |    |__ legal_plastpsi
 |    |    |
 |    |    |__ legal_wublast
 |    |__ foo
 |    |__ bar
 |    |__ misc
 |
 |__ etc

This type of setup might lend itself to having a test script simply try
to parse all the files in a directory, to ensure that parsing succeeds
for legal file formats and fails for illegal ones. Naming of the file
paths would help test authors to identify a suitable data file for their
own tests before adding their own to the t/data dir. It might also help
to identify areas where example test data is currently lacking.

Thinking about this a little more, I think it would be a good idea to
include Test::Exception in t/lib. We should also be testing that
warnings and exceptions are generated when expected - e.g. illegal
characters in seq files etc etc. Without these sorts of tests we are
only getting half the story. The lack of such tests might account for a
large chunk of the poor test coverage, particularly when it comes to
branches in the code.

Anyway, this type of reorganisation couldn't take place until the svn
repo is up and working.

I'd appreciate any comments on the above!
Nath From bix at sendu.me.uk Thu Jul 5 08:55:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 09:55:25 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <468CB1FD.7060301@sendu.me.uk> Nathan S. Haigh wrote: > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Yes, they'd all have to pass. 'Developer release' should never have the connotation of 'broken release'. However, getting all tests to pass is a lot easier than fixing all bugs in bugzilla. (... which actually goes to show how poor our tests are) Worst case, if we were forced to stick to a schedule but couldn't fix a failing test, we could always make it a 'todo' test. > I also agree with what was said in a previous post about bringing back > bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) Agree (with myself essentially). > I also agree with previous posts about organising and/or having some > naming convention for test data files. I think an approach whereby data > files were organised into directory trees (1 - 3 deep) with names that > elude to the type of data in that subtree/file rather than the tests > that use it etc. For example: > > t/data > |__ formats > | |__ seq > | | |__ legal_fasta > | | | |__ extension.fas [snip] At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them. > This type of setup, might lend itself to having a test script simply try > to parse all the files in a directory to ensure nothing fails (for legal > file formats) and fails for illegal formats. Great idea. 
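A minimal sketch of the directory-driven test discussed above, in plain Perl with core modules only. The toy parse_fasta routine and the temporary legal_fasta/illegal_fasta directories are stand-ins invented for illustration; a real test would presumably call the Bioperl parsers on files under t/data, and the file names are only borrowed from the proposed layout.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical toy parser (NOT Bio::SeqIO): dies on malformed input.
sub parse_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my (@records, $current);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>(\S+)/) {
            $current = { id => $1, seq => '' };
            push @records, $current;
        }
        else {
            die "Sequence data before first header in $path\n" unless $current;
            die "Illegal characters in $path\n" unless $line =~ /^[A-Za-z*-]+$/;
            $current->{seq} .= $line;
        }
    }
    close $fh;
    die "No records in $path\n" unless @records;
    return \@records;
}

# Build a throwaway tree that mimics the proposed layout.
my $root  = tempdir(CLEANUP => 1);
my %files = (
    'legal_fasta/single_seq.fas'      => ">seq1\nACGT\n",
    'legal_fasta/multiple_seq.fas'    => ">seq1\nACGT\n>seq2\nGGCC\n",
    'illegal_fasta/illegal_chars.fas' => ">seq1\nAC!GT\n",
);
for my $rel (sort keys %files) {
    my ($dir, $name) = split m{/}, $rel;
    my $dir_path = File::Spec->catdir($root, $dir);
    mkdir $dir_path unless -d $dir_path;
    my $file_path = File::Spec->catfile($dir_path, $name);
    open my $out, '>', $file_path or die "Cannot write $file_path: $!\n";
    print {$out} $files{$rel};
    close $out;
}

# Parse every file; the directory name alone says whether parsing should succeed.
my %result;
for my $dir (qw(legal_fasta illegal_fasta)) {
    my $expect_ok = $dir =~ /^legal_/ ? 1 : 0;
    my $dir_path  = File::Spec->catdir($root, $dir);
    opendir my $dh, $dir_path or die "Cannot read $dir_path: $!\n";
    for my $name (sort grep { !/^\./ } readdir $dh) {
        my $path      = File::Spec->catfile($dir_path, $name);
        my $parsed_ok = eval { parse_fasta($path); 1 } ? 1 : 0;
        my $key       = "$dir/$name";
        $result{$key} = ($parsed_ok == $expect_ok) ? 'as expected' : 'UNEXPECTED';
        print "$key: $result{$key}\n";
    }
    closedir $dh;
}
```

The point of the design is that adding a new data file to a legal_* or illegal_* directory extends the test without touching the script.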
> Thinking about this a little more, I think it would be a good idea to > include Test::Exception in t/lib. Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > Anyway, this type of reorganisation couldn't take place until the svn > repo is up and working. Agree. From bix at sendu.me.uk Thu Jul 5 09:39:10 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 10:39:10 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <468CBC3E.1020408@sendu.me.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Thinking about this a little more, I think it would be a good idea to >> include Test::Exception in t/lib. > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. I've now done that: BioperlTest loads Test::Exception, from the copy in t/lib if necessary. So, in BioperlTest-using scripts you now have access to the methods dies_ok, lives_ok, throws_ok and lives_and. From N.Haigh at sheffield.ac.uk Thu Jul 5 10:01:04 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 5 Jul 2007 11:01:04 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk> Quoting Sendu Bala : -- snip -- > > > > I also agree with previous posts about organising and/or having some > > naming convention for test data files. 
I think an approach whereby data > > files were organised into directory trees (1 - 3 deep) with names that > > elude to the type of data in that subtree/file rather than the tests > > that use it etc. For example: > > > > t/data > > |__ formats > > | |__ seq > > | | |__ legal_fasta > > | | | |__ extension.fas > [snip] > > At that level, files don't need extensions and can have fully > informative names that explain what's interesting or special about them. > You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format. -- snip -- From bix at sendu.me.uk Thu Jul 5 10:04:16 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:04:16 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> Message-ID: <468CC220.804@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Sendu Bala : > > -- snip -- >> >>> I also agree with previous posts about organising and/or having >>> some naming convention for test data files. I think an approach >>> whereby data files were organised into directory trees (1 - 3 >>> deep) with names that elude to the type of data in that >>> subtree/file rather than the tests that use it etc. 
For example: >>> >>> t/data |__ formats | |__ seq | | |__ >>> legal_fasta | | | |__ extension.fas >>> >> [snip] >> >> At that level, files don't need extensions and can have fully >> informative names that explain what's interesting or special about >> them. >> > > You may be correct in most cases, however, isn't there a method for > detecting the file format from the file extension and failing that it > peeks inside the file? Therefore there should be a file extension for > each of these to get good code coverage as well as each format not > having an extension to check that the peek inside the file correctly > determines the format. Yes, you're quite correct. From bix at sendu.me.uk Thu Jul 5 10:47:12 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:47:12 +0100 Subject: [Bioperl-l] Warnings Message-ID: <468CCC30.90406@sendu.me.uk> I'm trying to get Test::Warn to work with Bioperl warnings as produced by Bio::Root::RootI::warn(). However, afaict the warnings must be generated with CORE::warn(), not print STDERR. Is there any particular reason RootI::warn is done with print and not CORE::warn ? Can I change it to a warn? From bix at sendu.me.uk Thu Jul 5 13:04:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:04:50 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> Message-ID: <468CEC72.4090909@sendu.me.uk> Heikki Lehvaslaiho wrote: > My guess is that using 'print STDERR' avoids showing sometimes annoying > errordescription at programname line NN > syntax being used. Afaik, CORE::warn "anything\n"; never includes the line number: messages with a new line always disable that feature. Bio::Root::RootI::warn /always/ puts new lines into the message, so they /never/ have the line number. 
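A small sketch of the behaviour described above, using only core Perl. It installs a $SIG{__WARN__} handler (the standard Perl hook for intercepting warnings) and compares warn() with and without a trailing newline against a direct print to STDERR. The variable names are made up for the illustration and nothing here is taken from Bio::Root::RootI.

```perl
use strict;
use warnings;

my @caught;
# A __WARN__ handler only sees messages that go through warn().
local $SIG{__WARN__} = sub { push @caught, $_[0] };

warn "with newline\n";               # passed through unchanged
warn "without newline";              # Perl appends " at FILE line N.\n"
print STDERR "printed directly\n";   # bypasses the handler entirely

my $count            = scalar @caught;
my $newline_has_line = $caught[0] =~ / line \d+/           ? 1 : 0;
my $bare_has_line    = $caught[1] =~ / at .+ line \d+\.$/  ? 1 : 0;
print "caught=$count newline_has_line=$newline_has_line bare_has_line=$bare_has_line\n";
```

Only the two warn() calls reach the handler, and only the message without a trailing newline gets the file and line appended, which is why switching from print STDERR to CORE::warn should not change the visible text of messages that already end in a newline.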
> On the other hand, the main reason we need to set verbosity to 1 in BioPerl > objects is to find where warnings are coming from. Maybe extra text in > warnings leads to easier debugging. > > I favour changing it. So its my understanding there will be absolutely no difference in behaviour following this change (except that warning can be caught by Test::Warn). I just wanted to confirm my understanding. From hlapp at gmx.net Thu Jul 5 13:07:27 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 5 Jul 2007 09:07:27 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I think what's partially responsible for slowing down releases is the >> expectation that each dev release is supposed to have all bugs fixed, >> work for every OS, etc. In other words, act like a stable release. >> It doesn't. A stable release has a stable API that will be supported until the next stable release through point releases. >> A developer release by nature is living on the edge, so why not have >> regular dev releases? There's no problem with regular dev releases, but tests will need to pass. There was never a stipulation that all bugs need to have been fixed. But all tests need to pass, so in an ideal world (in which everything is being tested) all tests passing would imply all (known) bugs fixed. Obviously, we don't live in an ideal world ... If not everything passes then what is the big difference to a code snapshot? If using cvs (or svn) is too difficult for most people, we can consider creating a mechanism that puts up nightly snapshots for download. 
> -- snip --
>
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

For example, that's another point.

	-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From heikki at sanbi.ac.za Thu Jul 5 13:12:37 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 15:12:37 +0200
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk>
Message-ID: <200707051512.38185.heikki@sanbi.ac.za>

One more suggestion:

It would be extremely useful if we had a standard way of testing that
when a file is read into a bioperl object and then written out again in
the same format, the input and output files are identical. If not, the
test should show where the differences start (showing all the
differences would just clutter the screen).

This standard method/subroutine should be used to test all sequence and
other text file IO.

Any takers?

	-Heikki

On Thursday 05 July 2007 11:39:10 Sendu Bala wrote:
> Sendu Bala wrote:
> > Nathan S. Haigh wrote:
> >> Thinking about this a little more, I think it would be a good idea to
> >> include Test::Exception in t/lib.
> >
> > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
>
> I've now done that: BioperlTest loads Test::Exception, from the copy in
> t/lib if necessary.
>
> So, in BioperlTest-using scripts you now have access to the methods
> dies_ok, lives_ok, throws_ok and lives_and.
--
Heikki Lehvaslaiho, heikki at sanbi.ac.za
Associate Professor, skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096 FAX: +27 21 959 2512

From heikki at sanbi.ac.za Thu Jul 5 12:58:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 14:58:59 +0200
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CCC30.90406@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
Message-ID: <200707051458.59921.heikki@sanbi.ac.za>

My guess is that using 'print STDERR' avoids the sometimes annoying
"errordescription at programname line NN" syntax.

On the other hand, the main reason we need to set verbosity to 1 in
BioPerl objects is to find where warnings are coming from. Maybe extra
text in warnings leads to easier debugging.

I favour changing it.

	-Heikki

On Thursday 05 July 2007 12:47:12 Sendu Bala wrote:
> I'm trying to get Test::Warn to work with Bioperl warnings as produced
> by Bio::Root::RootI::warn(). However, afaict the warnings must be
> generated with CORE::warn(), not print STDERR.
>
> Is there any particular reason RootI::warn is done with print and not
> CORE::warn ? Can I change it to a warn?
From bix at sendu.me.uk Thu Jul 5 13:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out
> again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files
to be identical, only for none of the information to be lost and for it
to be output in the correct format. So a round-trip test should read in
the original, store all the parsed data, write it out, then read in the
written version and see if the new parsed data matches the original.

For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing
> all the differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
>
> Any takers?

There's already something along these lines in t/SeqIO.t (the section
that uses Algorithm::Diff).
I copied that over from the old testformats.pl script but haven't really
taken the time to see if it's a good way of doing the test. Is it? Can
someone come up with something better? Can someone generalise it if
necessary?

I imagine you could just read the files into arrays and use
Test::More::is_deeply(). If that would be satisfactory I could easily
add a little method to BioperlTest that did that.

From n.haigh at sheffield.ac.uk Thu Jul 5 13:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that when a
> file is read into a bioperl object and then written out again in the same
> format, the input and output files are identical. If not, the test should
> show where the differences start (showing all the differences would just
> clutter the screen).
>
> This standard method/subroutine should be used to test all sequence and other
> text file IO.
>
> Any takers?
>
> -Heikki
>

Wouldn't this require info about the formatting of the file to be stored
in the object as well, such that the same formatting could be used when
writing the file?

Wouldn't a better approach be to read the contents of file1 into obj1,
write obj1 to file2 in the same format, and then read file2 into obj2
and compare obj1 to obj2 to ensure we have all the same data?
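The file1 -> obj1 -> file2 -> obj2 comparison could be sketched roughly as follows. This uses a hypothetical toy FASTA reader and writer (read_fasta and write_fasta are invented for the example and are not Bio::SeqIO) together with is_deeply() from Test::More, and works on in-memory strings rather than real files to stay self-contained.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical toy reader (NOT Bio::SeqIO): returns a list of record hashes.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Hypothetical toy writer: wraps sequence lines at $width columns.
sub write_fasta {
    my ($records, $width) = @_;
    my $text = '';
    for my $rec (@$records) {
        $text .= ">$rec->{id}";
        $text .= " $rec->{desc}" if length $rec->{desc};
        $text .= "\n";
        # The output layout is allowed to differ from the input layout.
        $text .= "$1\n" while $rec->{seq} =~ /\G(.{1,$width})/g;
    }
    return $text;
}

my $file1 = ">seq1 first test sequence\nACGTACGTAC\nGTACGT\n>seq2\nGGCC\n";

my $obj1  = read_fasta($file1);       # file1 -> obj1
my $file2 = write_fasta($obj1, 8);    # obj1  -> file2, different line width
my $obj2  = read_fasta($file2);       # file2 -> obj2

my $texts_differ = $file1 ne $file2 ? 1 : 0;
my $data_matches = is_deeply($obj2, $obj1, 'round trip preserves parsed data') ? 1 : 0;
done_testing();
```

Here the two texts differ in line wrapping while the parsed records compare equal, which is the distinction between "identical files" and "no information lost" raised in this thread.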
Nath

From cjfields at uiuc.edu Thu Jul 5 13:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk>
Message-ID: 

On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:
> ...
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

Remains to be decided. All current tests (net and non-net) should pass.
Any bug fixes should try to have added tests if possible, with
in-process stuff as TODOs. Network tests are left up to user discretion,
so if they fail for any particular reason there is a way around them.

> I also agree with what was said in a previous post about bringing
> back bioperl-run (and some others) back into the same repository as
> bioperl-core (after a successful move over to svn) and have
> Build.PL deal with creating the packages etc for CPAN. This would
> hopefully help keep the run package (and others) up to speed with
> the core package.

It's up to how we want to have everything split. I don't think it's
immediately pressing (there are more important priorities, i.e. bugs,
svn) but I would say folding everything back into live and 'splitting'
them out using an automated Build process is a viable option.

> I also agree with previous posts about organising and/or having
> some naming convention for test data files. I think an approach
> whereby data files were organised into directory trees (1 - 3 deep)
> with names that allude to the type of data in that subtree/file
> rather than the tests that use it etc.
For example: > > t/data > |__ formats > | |__ seq > | | |__ legal_fasta > | | | |__ extension.fas > | | | |__ extension.fasta > | | | |__ extension.foo > | | | |__ extension.bar > | | | |__ no_extension > | | | |__ interleaved.fas > | | | |__ non_interleaved.fas > | | | |__ single_seq.fas > | | | |__ multiple_seq.fas > | | | |__ desc_line1.fas > | | | |__ desc_line2.fas > | | | > | | |__ illegal_fasta > | | | |__ illegal_chars.fas > | | | |__ > some_other_illegal_alternative.fas > | | | > | | |__ legal_genbank > | | | |__ etc etc > | | | > | | |__ illegal_genank > | | |__ etc etc > | | > | |__ aln > | |__ blast > | | |__ legal_blastx > | | | > | | |__ legal_blastp > | | | > | | |__ legal_tblastx > | | | > | | |__ legal_plastpsi > | | | > | | |__ legal_wublast > | |__ foo > | |__ bar > | |__ misc > | > |__ etc > > This type of setup, might lend itself to having a test script > simply try to parse all the files in a directory to ensure nothing > fails (for legal file formats) and fails for illegal formats. > Naming of the file paths would help test authors to identify a > suitable data file for their own tests before adding their own to > the t/data dir. It might also help to identify areas where example > test data is currently lacking. ... This seems like more of a 'guess sequence' and format validation issue, something we've talked about before: http://bugzilla.open-bio.org/show_bug.cgi?id=1508 The way I feel about it is sequence format validation and sequence parsing should be separate issues and therefore in separate classes (with parsing optionally preceded by validation), but that's something for another discussion. > Thinking about this a little more, I think it would be a good idea > to include Test::Exception in t/lib. We should also be testing that > warnings and exceptions are generated when expected - e.g. illegal > characters in seq files etc etc. Without these sorts of tests we > are only getting half the story. 
This testing might account for a > large chunk of the poor test coverage, particularly when it comes > to branches in the code. > > Anyway, this type of reorganisation couldn't take place until the > svn repo is up and working. > > I'd appreciate any comments on the above! > Nath chris From n.haigh at sheffield.ac.uk Thu Jul 5 14:08:29 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 05 Jul 2007 15:08:29 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CF5A8.7040402@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> Message-ID: <468CFB5D.6080406@sheffield.ac.uk> Is there a way to install all the modules that are used in the tests? I mean there are cases where tests are skipped and pass if the required module for testing is not installed. Therefore, missing out a chunk of the tests. It would be desirable to be able to install all these modules in order to complete they whole test suite - any ideas if/how this can be done? Cheers Nath From bix at sendu.me.uk Thu Jul 5 14:15:34 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 15:15:34 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> Message-ID: <468CFD06.3080604@sendu.me.uk> Nathan S. Haigh wrote: > Is there a way to install all the modules that are used in the tests? I > mean there are cases where tests are skipped and pass if the required > module for testing is not installed. Therefore, missing out a chunk of > the tests. 
It would be desirable to be able to install all these modules > in order to complete they whole test suite - any ideas if/how this can > be done? Yes, add them as recommended (or perhaps 'build_requires') modules in Build.PL, then run Build.PL and install the modules when it asks you. Everything should be in Build.PL already. If I missed something, please add it. From cjfields at uiuc.edu Thu Jul 5 14:18:08 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:08 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> Message-ID: On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote: > Is there a way to install all the modules that are used in the > tests? I > mean there are cases where tests are skipped and pass if the required > module for testing is not installed. Therefore, missing out a chunk of > the tests. It would be desirable to be able to install all these > modules > in order to complete they whole test suite - any ideas if/how this can > be done? > > Cheers > Nath That's optionally done upon 'perl Build.PL', correct? So if you choose not to install a particular prereq (i.e. XML::SAX), you shouldn't be forced to install it later just for tests. Or am I misunderstanding you? 
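As a rough illustration of the skipped-test problem, the sketch below checks whether optional modules can be loaded, which is the same kind of check a test script makes before deciding to skip. The module_available helper and the contents of the list are assumptions made for the example: File::Spec is a core module that should always load, and Bioperl::No::Such::Module is a deliberately nonexistent name. A real list would come from Build.PL and from the skip blocks in the test scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Illustrative list only, not the actual Bioperl prerequisites.
my @optional = qw(File::Spec Bioperl::No::Such::Module);

my %status  = map { $_ => module_available($_) } @optional;
my @missing = grep { !$status{$_} } @optional;

for my $module (@optional) {
    printf "%-28s %s\n", $module,
        $status{$module} ? 'installed' : 'missing (dependent tests would be skipped)';
}
print "Modules to install for a complete test run: @missing\n" if @missing;
```

Running a report like this before the test suite would show which chunks of tests are going to be skipped on a given system.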
chris From cjfields at uiuc.edu Thu Jul 5 14:18:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:23 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CC220.804@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > Nathan S. Haigh wrote: >> Quoting Sendu Bala : >>> ... >>> At that level, files don't need extensions and can have fully >>> informative names that explain what's interesting or special about >>> them. >>> >> >> You may be correct in most cases, however, isn't there a method for >> detecting the file format from the file extension and failing that it >> peeks inside the file? Therefore there should be a file extension for >> each of these to get good code coverage as well as each format not >> having an extension to check that the peek inside the file correctly >> determines the format. > > Yes, you're quite correct. I actually like Sendu's idea more, or the idea of each test suite having it's own directory. Tests which need to guess/validate the format are probably best left sequestered to a specific suite focused on format guessing/ validation, at least in my opinion. chris From n.haigh at sheffield.ac.uk Thu Jul 5 14:22:40 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:22:40 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFD06.3080604@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> Message-ID: <468CFEB0.80201@sheffield.ac.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Is there a way to install all the modules that are used in the tests? >> I mean there are cases where tests are skipped and pass if the >> required module for testing is not installed. Therefore, missing out a >> chunk of the tests. It would be desirable to be able to install all >> these modules in order to complete they whole test suite - any ideas >> if/how this can be done? > > Yes, add them as recommended (or perhaps 'build_requires') modules in > Build.PL, then run Build.PL and install the modules when it asks you. > > Everything should be in Build.PL already. If I missed something, please > add it. > OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there may be few systems with all these modules installed in order to conduct a complete test. These are the modules I'm referring to. Nath From n.haigh at sheffield.ac.uk Thu Jul 5 14:30:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 05 Jul 2007 15:30:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: <468D006D.6050806@sheffield.ac.uk> Chris Fields wrote: > > On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > >> Nathan S. Haigh wrote: >>> Quoting Sendu Bala : >>>> ... >>>> At that level, files don't need extensions and can have fully >>>> informative names that explain what's interesting or special about >>>> them. >>>> >>> >>> You may be correct in most cases, however, isn't there a method for >>> detecting the file format from the file extension and failing that it >>> peeks inside the file? Therefore there should be a file extension for >>> each of these to get good code coverage as well as each format not >>> having an extension to check that the peek inside the file correctly >>> determines the format. >> >> Yes, you're quite correct. > > I actually like Sendu's idea more, or the idea of each test suite having > it's own directory. > > Tests which need to guess/validate the format are probably best left > sequestered to a specific suite focused on format guessing/validation, > at least in my opinion. > > chris How easily would this lend itself to using the same data for multiple tests, or is it likely to lead to/exacerbate a culture of adding duplicate data files in each "test suite" rather than reusing? 
Nath From cjfields at uiuc.edu Thu Jul 5 14:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:33:46 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote: > On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > >> Chris Fields wrote: >>> I think what's partially responsible for slowing down releases is >>> the >>> expectation that each dev release is supposed to have all bugs >>> fixed, >>> work for every OS, etc. In other words, act like a stable release. > > It doesn't. A stable release has a stable API that will be > supported until the next stable release through point releases. I agree, but I think there is still an expectation that 1.5.2 and beyond are more like true 'stable' releases even though we still designate them as 'developer.' We unfortunately reinforce that when we tell users they need to update to v. 1.5.2 or bioperl-live to fix a particular bug in the 1.4 release. There's nothing we can do about that now (hindsight is always 20/20, and 1.4 is just too old). We (pumpkin, core devs) can try correcting that by ensuring any bug fixes be committed to any new stable branch as well as to live, at least until it becomes too problematic to maintain that particular stable branch (at which point we would go about getting ready for the next 'stable' and repeat the cycle over again). >>> A developer release by nature is living on the edge, so why not have >>> regular dev releases? > > There's no problem with regular dev releases, but tests will need > to pass. There was never a stipulation that all bugs need to have > been fixed. 
But all tests need to pass, so in an ideal world (in > which everything is being tested) all tests passing would imply all > (known) bugs fixed. Obviously, we don't live in an ideal world ... ...particularly when it comes to network-related tests and remote server problems (but those are by default not run, so there is a way around test fails there). I agree here as well (all tests must pass). As for the bug fixes, we can just stipulate which ones were fixed with the release (in a RELEASE_NOTES or similar), and maybe have TODO's in the test suite designating they are being worked on. Basically, at regular intervals, maybe with a few weeks of lead time, the pumpkin would announce an impending dev. release. Go through rounds of tests, bug fixes, etc. When all tests pass post it on CPAN as a dev. release. If we have a stable release branch with relevant bug fixes we can post that as well, again to the point where it becomes too problematic. Would we just take a snapshot of MAIN and any relevant stable branch at that particular point for the CPAN release, just increasing the version number (1.x.y)? Would it make sense to have a 1.x.y branch for each release (I don't think so, but maybe others disagree)? > If not everything passes then what is the big difference to a code > snapshot? If using cvs (or svn) is too difficult for most people, > we can consider creating a mechanism that puts up nightly snapshots > for download. If we feel a nightly snapshot is warranted we could do that though. I personally don't think there is a need, particularly since we have several means to obtain the latest code at any point in time (including the browsable CVS 'Download tarball'). We could state the next dev/stable CPAN release (pending on date dd/mm/yy) will have the bug fix, and if they want it immediately then pick it up from CVS. >> -- snip -- >> >> I agree, although would the dev releases still need to pass all the >> tests? I'm thinking of people installing via CPAN. 
> > For example, that's another point. > > -hilmar Yes, I agree. As an aside, I don't think dev. releases pop up when you run a simple 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know the answer to that. chris From cjfields at uiuc.edu Thu Jul 5 14:34:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:34:22 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > One more suggestion: > > It would be extemaly useful if we had a standard way of testing > that a when a > file is read into a bioperl object and then written out again into > a same > format, the input and output files are identical. If not, the test > should > show where the the differences start (showing all the differences > would just > clutter the screen). > > This standard method/subroutine should be used to test all sequence > and other > text file IO. > > Any takers? > > -Heikki ... I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t that do some checking, I think, but something like this would be of use. However, what if the test file is old (as many in t/data are) and the format has changed? GenBank and EMBL, for instance, have gone through several changes to format. chris From n.haigh at sheffield.ac.uk Thu Jul 5 14:43:51 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:43:51 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <468D03A7.3090408@sheffield.ac.uk> Chris Fields wrote: -- snip -- >>> >>> I agree, although would the dev releases still need to pass all the >>> tests? I'm thinking of people installing via CPAN. >> >> For example, that's another point. >> >> -hilmar > > Yes, I agree. > > As an aside, I don't think dev. releases pop up when you run a simple > 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know > the answer to that. > > chris That's right, it'll only install the non-developer releases (1.4 currently). If you want to install the developer release from CPAN you need to know the path to the archive and then do: cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz as detailed on the wiki: http://www.bioperl.org/wiki/Release_1.5.2 Nath From cjfields at uiuc.edu Thu Jul 5 14:49:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:49:33 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFEB0.80201@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > Sendu Bala wrote: >> ... >> Yes, add them as recommended (or perhaps 'build_requires') modules in >> Build.PL, then run Build.PL and install the modules when it asks you. >> >> Everything should be in Build.PL already.
If I missed something, >> please >> add it. >> > > OK, to clarify using the test file Sendu mentioned in a previous post: > t/SeqIO.t > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > IO::String > are not installed (the first two are not mentioned in Build.PL). > However, if there are a lot of such skips in the whole test suite then > there maybe few system with all these modules installed in order to > conduct a complete test. These are the modules I'm referring to. > > Nath If they are only necessary for tests, work for all OSs, and are pure Perl they should be added to t/lib, like Test::More and the rest. If they only work for some OSs they could be added to t/lib and skip based on OS, but they still must be pure Perl. I would avoid anything that requires any compiling for XS or Inline altogether (I don't want to go down the nightmare road of OS-dependent compiler issues for a few tests). Finally, if they are needed for core modules (not just tests) then they should be added to the core prereqs in Build. chris From cjfields at uiuc.edu Thu Jul 5 14:52:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:52:58 -0500 Subject: [Bioperl-l] Warnings In-Reply-To: <468CEC72.4090909@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > ... > > So its my understanding there will be absolutely no difference in > behaviour following this change (except that warning can be caught by > Test::Warn). I just wanted to confirm my understanding. You can always just try it out and run tests. Might be interesting to see if anything breaks. chris From N.Haigh at sheffield.ac.uk Thu Jul 5 14:58:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 5 Jul 2007 15:58:30 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > > > > One more suggestion: > > > > It would be extemaly useful if we had a standard way of testing > > that a when a > > file is read into a bioperl object and then written out again into > > a same > > format, the input and output files are identical. If not, the test > > should > > show where the the differences start (showing all the differences > > would just > > clutter the screen). > > > > This standard method/subroutine should be used to test all sequence > > and other > > text file IO. > > > > Any takers? > > > > -Heikki > ... > > I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t > that do some checking, I think, but something like this would be of > use. However, what if the test file is old (as many in t/data are) > and the format has changed? GenBank and EMBL, for instance, have > gone through several changes to format. > > chris > > Is there any way to tell the variants apart other than by layout, e.g. a version number or the like? Nath From N.Haigh at sheffield.ac.uk Thu Jul 5 15:04:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S.
Haigh) Date: Thu, 5 Jul 2007 16:04:30 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > > > Sendu Bala wrote: > >> ... > >> Yes, add them as recommended (or perhaps 'build_requires') modules in > >> Build.PL, then run Build.PL and install the modules when it asks you. > >> > >> Everything should be in Build.PL already. If I missed something, > >> please > >> add it. > >> > > > > OK, to clarify using the test file Sendu mentioned in a previous post: > > t/SeqIO.t > > > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > > IO::String > > are not installed (the first two are not mentioned in Build.PL). > > However, if there are a lot of such skips in the whole test suite then > > there maybe few system with all these modules installed in order to > > conduct a complete test. These are the modules I'm referring to. > > > > Nath > > If they are only necessary for tests, work for all OSs, and are pure > Perl they should be added to t/lib, like Test::More and the rest. If > they only work for some OSs they could be added to t/lib and skip > based on OS, but they still must be pure Perl. I would avoid > anything that requires any compiling for XS or Inline altogether (I > don't want to go down the nightmare road of OS-dependent compiler > issues for a few tests). If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!? 
> > Finally, if they are needed for core modules (not just tests) then > they should be added to the core prereqs in Build. > > chris > From bix at sendu.me.uk Thu Jul 5 15:13:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:13:35 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: <468D0A9F.4010709@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Chris Fields : >>> OK, to clarify using the test file Sendu mentioned in a previous >>> post: t/SeqIO.t >>> >>> This test skips tests if Algorithm::Diff, IO::ScalarArray or >>> IO::String are not installed >> >> If they are only necessary for tests, work for all OSs, and are >> pure Perl they should be added to t/lib, like Test::More and the >> rest. If they only work for some OSs they could be added to t/lib >> and skip based on OS, but they still must be pure Perl. I would >> avoid anything that requires any compiling for XS or Inline >> altogether (I don't want to go down the nightmare road of >> OS-dependent compiler issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? That skip in SeqIO.t is new and I simply didn't think of them as important enough to make anyone install them or include them in t/lib. I'd go ahead and add those modules, but like I say, it may make more sense just to use is_deeply(), removing the dependency on Algorithm::Diff and IO::ScalarArray completely. 
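A minimal sketch of the is_deeply() idea mentioned above, assuming the round-trip check only needs to compare the input text with the re-written output line by line. The helper name lines_of and the in-memory FASTA strings are hypothetical stand-ins for a file from t/data and the result of writing the parsed record back out; nothing here is taken from t/SeqIO.t itself. Only Test::More is needed, so Algorithm::Diff and IO::ScalarArray drop out.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper (not part of BioPerl): split text into an array ref of
# lines so that is_deeply() can compare two records element by element.
sub lines_of {
    my ($text) = @_;
    return [ split /\n/, $text ];
}

# Stand-ins for a record read from t/data and the same record written back out.
my $input  = ">seq1 test\nACGTACGT\nTTGA\n";
my $output = ">seq1 test\nACGTACGT\nTTGA\n";

# On failure, is_deeply() reports where the two structures begin to differ,
# rather than printing every difference.
is_deeply( lines_of($output), lines_of($input), 'round trip preserves every line' );
is( scalar @{ lines_of($input) }, 3, 'three lines were compared' );
```

Because is_deeply() stops at the first point of divergence, a failing round trip shows where the differences start without cluttering the screen.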
From cjfields at uiuc.edu Thu Jul 5 15:35:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:35:41 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote: > ... >> If they are only necessary for tests, work for all OSs, and are pure >> Perl they should be added to t/lib, like Test::More and the rest. If >> they only work for some OSs they could be added to t/lib and skip >> based on OS, but they still must be pure Perl. I would avoid >> anything that requires any compiling for XS or Inline altogether (I >> don't want to go down the nightmare road of OS-dependent compiler >> issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? No, you are correct, but these are currently not in t/lib (unless someone snuck them in....) Of the modules you listed above, only one (IO::String) is required by the core modules. The others are not. Users shouldn't be forced to install Algorithm::Diff or IO::ScalarArray just to run tests, so anything not required should go into t/lib if at all possible. If there are any reasons (OS issues, list of prereqs) which preclude adding these to t/lib we need to ask ourselves (1) why we are using that module in the first place? And, if there is a good reason, (2) can we skip the tests if the module isn't present? Both of those options are already available.
chris From cjfields at uiuc.edu Thu Jul 5 15:50:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:50:55 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468D006D.6050806@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> <468D006D.6050806@sheffield.ac.uk> Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu> On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote: > ... >> I actually like Sendu's idea more, or the idea of each test suite >> having it's own directory. >> Tests which need to guess/validate the format are probably best >> left sequestered to a specific suite focused on format guessing/ >> validation, at least in my opinion. >> chris > > > How easily would this lend itself to using the same data for > multiple tests, or is it likely to lead to/exacerbate a culture of > adding duplicate data files in each "test suite" rather than reusing? > > Nath If there is a group of test data used for more than one test suite we can group those together into a common use folder, or we can go by format. I'm pretty open to anything, really, as long as it is more organized. My point is really concerned more with validation/guessing. I think we should limit those tests to their respective specific test suites, or even to sections within a particular test suite (for instance, genbank.t), but not to force sequence guessing or validation in other cases. To me validation, guessing, and parsing are three distinct issues (much like XML parsers handle things), so they require three distinct tests. As for true sequence validation, there is no official format validation scheme yet in BioPerl. 
It's sort of unofficially integrated into the sequence parsers themselves (something which I find to be problematic for several reasons too long to outline here). chris From cjfields at uiuc.edu Thu Jul 5 15:54:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:54:42 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> <1183647510.468d07168963c@webmail.shef.ac.uk> Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu> On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote: > Quoting Chris Fields : > >> >> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: >> >>> >>> One more suggestion: >>> >>> It would be extemaly useful if we had a standard way of testing >>> that a when a >>> file is read into a bioperl object and then written out again into >>> a same >>> format, the input and output files are identical. If not, the test >>> should >>> show where the the differences start (showing all the differences >>> would just >>> clutter the screen). >>> >>> This standard method/subroutine should be used to test all sequence >>> and other >>> text file IO. >>> >>> Any takers? >>> >>> -Heikki >> ... >> >> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t >> that do some checking, I think, but something like this would be of >> use. However, what if the test file is old (as many in t/data are) >> and the format has changed? GenBank and EMBL, for instance, have >> gone through several changes to format. >> >> chris >> >> > > Is there any way to distinguish variants apart other than just > layout? e.g. a version number of the likes? > > Nath I don't think so; this veers back into the whole validation issue (i.e. does the record fit certain specifications).
There are examples of seq records from different sources which bioperl is expected to parse, for example Ensembl GenBank records. Some of those have feature tags or annotation fields which may not appear in output when using write_seq(). I don't think it's as important to replicate the output data exactly like the input as much as it's important to have the data represented in a Bio::Seq object (or any other Bio* instance) in a consistent manner and have the ability to incorporate new fields (such as the recent addition of genome projects) transparently. The latter is hard to do with the current genbank parser (you have to specifically code for it), but it is a bit easier to do with the driver-handler model I'm working on. chris From bix at sendu.me.uk Thu Jul 5 15:56:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:56:29 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <468D14AD.8050007@sendu.me.uk> Sendu Bala wrote: > Sendu Bala wrote: >> Nathan S. Haigh wrote: >>> Thinking about this a little more, I think it would be a good idea to >>> include Test::Exception in t/lib. >> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. And I've also now added in support for Test::Warn, giving you warning_is, warnings_are, warning_like and warnings_like. 
I've updated the HOWTO as well: http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests You can see these things in action in t/seq_quality.t From bix at sendu.me.uk Thu Jul 5 15:57:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:57:23 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> Message-ID: <468D14E3.6030104@sendu.me.uk> Chris Fields wrote: > > On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > >> ... >> >> So its my understanding there will be absolutely no difference in >> behaviour following this change (except that warning can be caught by >> Test::Warn). I just wanted to confirm my understanding. > > You can always just try it out and run tests. Might be interesting to > see if anything breaks. I've made the change. Everything seems ok as far as I can tell. From dmessina at wustl.edu Thu Jul 5 16:02:26 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:02:26 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 9:33 AM, Chris Fields wrote: > I agree, but I think there is still an expectation that 1.5.2 and > beyond are more like true 'stable' releases even though we still > designate them as 'developer.' We unfortunately reinforce that when > we tell users they need to update to v. 1.5.2 or bioperl-live to fix > a particular bug in the 1.4 release. 
I know this has been discussed before, but while we're talking about future release plans, it might be worth revisiting the BioPerl policy of designating only even-numbered releases as 'stable'. It's taking so long to get from 1.4 to 1.6. While the principle of keeping a stable API between 'stable' releases is valid in the ideal case, I think that continuing to label 1.5.2 (or whatever the latest 'dev' release is) as a developer release (which implies potentially unstable or bleeding-edge code) is highly misleading since we would never ever tell anyone to get 1.4 instead. Alternatively, if we adopt a more aggressive release schedule as Chris proposed a couple days ago, then perhaps we could agree to push out an even-numbered release once a year or so, so that there is a 'stable' release we could recommend. > If we feel a nightly snapshot is warranted we could do that though. > I personally don't think there is a need, particularly since we have > several means to obtain the latest code at any point in time > (including the browsable CVS 'Download tarball'). We could state the > next dev/stable CPAN release (pending on date dd/mm/yy) will have the > bug fix, and if they want it immediately then pick it up from CVS. To make it easier for people to obtain the latest tarball, we could put the 'download tarball' link directly on the 'Getting_BioPerl' wiki page instead of only a link to the viewcvs interface. That way they wouldn't have to navigate the source tree to figure out which tarball they want (which is almost always going to be the bioperl-live tarball).
I think the actual URL underlying the 'Download tarball' link on viewcvs is stable: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1 Dave From cjfields at uiuc.edu Thu Jul 5 16:13:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:13:30 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 11:02 AM, David Messina wrote: > ... > I know this has been discussed before, but while we're talking > about future release plans, it might be worth revisiting the > BioPerl policy of designating only even-numbered releases as > 'stable'. It's taking so long to get from 1.4 to 1.6. While the > principle of keeping a stable API between 'stable' releases is > valid in the ideal case, I think that continuing to label 1.5.2 (or > whatever the latest 'dev' release is) as a developer release (which > implies potentially unstable or bleeding-edge code) is highly > misleading since we would never ever tell anyone to get 1.4 instead. > > Alternatively, if we adopt a more aggressive release schedule as > Chris proposed a couple days ago, then perhaps we could agree to > push out an even-numbered release once a year or so, so that there > is a 'stable' release we could recommend. I think the idea of 'stable' is best summarized back in Hilmar's post (i.e. we support a particular API for that release). The 1.5 releases I believe break some aspects of 1.4 API (some of the Feature/Annotation stuff introduced before the official 1.5 release). We still need to address some of those issues before a 1.6 which seems to be the only real stumbling block, but they are unfortunately not well-documented and are somewhat interwoven with GMOD code. > ...
> To make it easier for people to obtain the latest tarball, we could > put the 'download tarball' link directly on the 'Getting_BioPerl' > wiki page instead of only a link to the viewcvs interface. That way > they wouldn't have to navigate the source tree to figure out which > tarball they want (which is almost always going to be the bioperl- > live tarball). > > I think the actual URL underlying the 'Download tarball' link on > viewcvs is stable: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- > live.tar.gz?tarball=1 > > Dave Sounds reasonable enough. Do you want to do the honors? chris From dmessina at wustl.edu Thu Jul 5 16:44:28 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:44:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> > [Chris] > The 1.5 releases I believe break some aspects of 1.4 API Yes, this is true. I question, though, whether it's relevant given that virtually no one uses 1.4 anymore. In any case, I would venture that the number of people who would be bitten by the 1.4->1.5 API change is much smaller than the number of people who download 1.4 and then ask us why it doesn't work. I think that, rather than continuing to call 1.5.x the developer release in order to adhere to the API guarantee, it would be much clearer to users if we state clearly that everyone should download 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API changes. >> [me] >> we could put the 'download tarball' link directly on the >> 'Getting_BioPerl' wiki page > > [Chris] > Sounds reasonable enough. Do you want to do the honors? Done. 
Dave From cjfields at uiuc.edu Thu Jul 5 16:57:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:57:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: On Jul 5, 2007, at 11:44 AM, David Messina wrote: > >> [Chris] >> The 1.5 releases I believe break some aspects of 1.4 API > > Yes, this is true. > > I question, though, whether it's relevant given that virtually no > one uses 1.4 anymore. In any case, I would venture that the number > of people who would be bitten by the 1.4->1.5 API change is much > smaller than the number of people who download 1.4 and then ask us > why it doesn't work. > > I think that, rather than continuing to call 1.5.x the developer > release in order to adhere to the API guarantee, it would be much > clearer to users if we state clearly that everyone should download > 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API > changes. You'd be surprised how many are still using bioperl 1.2.3 (Ensembl) and 1.4 (any admin too scared to go with a 'dev' release). The real answer is to get out a stable 1.6 ASAP. The problem we currently have is (horrible Texas pun) 'too many pokers in the fire.' We have svn migration, major changes in the test suite, talk about splitting bioperl, a lot of bugs to sort through, new code to add or work on, etc. Not to mention our $jobs! I think we should just bite the bullet and proceed with pulling out the controversial operator overloading in Bio::Annotation*, deprecate the tag methods in AnnotatableI, and go about fixing everything up. 
If that occurs (which seems to be the major impediment) and we get GMOD/GBrowse playing well with BioPerl then we can aim for a new stable release, and then institute a regular release cycle. chris From bpederse at gmail.com Thu Jul 5 17:58:24 2007 From: bpederse at gmail.com (Brent Pedersen) Date: Thu, 5 Jul 2007 10:58:24 -0700 Subject: [Bioperl-l] slippy map for genomic features. Message-ID: hi, here's a side project i've been tinkering on in googlecode svn that may be useful to some. http://code.google.com/p/genome-browser/ it's a simple hack on top of OpenLayers (openlayers.org) to provide a javascript slippy map interface and API to view and browse genomic features. It can be used with any image generation program that can accept &xmin= and &xmax= parameters through the url, though i haven't had it working with bioperl, as bioperl generates images of different height depending on the number of tracks. there's a live example of the code in SVN here: http://toxic.berkeley.edu/bpederse/genome-browser/ with images generated by a colleague's modules on first request. those images are then cached by a simple perl script included in the SVN repo. all subsequent requests are returned from the cache. an image request (automatically generated by the javascript) looks like: http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 but any implementation need only implement xmin and xmax. all other parameters will be used for caching but are not required. if anyone is interested in getting this going with bioperl image generation--or improving the project in any way, let me know and i'll add you as a committer and provide any javascript support that i can.
-brent tar ball download: http://genome-browser.googlecode.com/files/genome-browser-0.02.tar From dmessina at wustl.edu Thu Jul 5 18:39:16 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 13:39:16 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: > The real answer is to get out a stable 1.6 ASAP. The problem we > currently have is (horrible Texas pun) 'too many pokers in the > fire.' We have svn migration, major changes in the test suite, > talk about splitting bioperl, a lot of bugs to sort through, new > code to add or work on, etc. Not to mention our $jobs! Yep, I hear ya. > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, > deprecate the tag methods in AnnotatableI, and go about fixing > everything up. If that occurs (which seems to be the major > impediment) and we get GMOD/GBrowse playing well with BioPerl then > we can aim for a new stable release, and then institute a regular > release cycle. That's a great plan. You're right -- better to devote energy to 1.6 than to interim solutions. Alright, I give, I give! :) Dave From glauberwagner at yahoo.com.br Thu Jul 5 19:56:43 2007 From: glauberwagner at yahoo.com.br (Glauber Wagner) Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART) Subject: [Bioperl-l] Bio::DB::Query::GenBank failures In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com> Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com> Dear All, I have a problem with the Bio::DB::Query::GenBank module. I am trying to count the number of protein sequences, but the count method does not return the number I expect.
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Entrez query: all protein records for this organism
my $query_string = "Trypanosoma cruzi[Organism]";
my $query = Bio::DB::Query::GenBank->new(-db => 'protein', -query => $query_string);
my $count = $query->count;   # number of records matching the query
my @ids = $query->ids;       # IDs of the matching records
print "$count\n";

Thanks. Glauber From cjfields at uiuc.edu Thu Jul 5 20:21:49 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 15:21:49 -0500 Subject: [Bioperl-l] Bio::DB::Query::GenBank failures In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com> References: <839755.95349.qm@web36514.mail.mud.yahoo.com> Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> NCBI esearch doesn't seem to be working at the moment. I'm getting 'Internal Server Error' at this time. Try back again at a later point. chris On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote: > Dear All, > > I have a problem if Bio::DB::Query::GenBank module. I > am trying to count the number of protein sequences and > the module did not return the expected number by count > object. > > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > $query_string = "Trypanosoma cruzi[Organism]"; > > my $query = > Bio::DB::Query::GenBank->new(-db=>'protein', > > -query=>$query_string); > my $count = $query->count; > my @ids = $query->ids; > > print "$count\n"; > > Thanks. > Glauber > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mitch_skinner at berkeley.edu Thu Jul 5 21:22:38 2007 From: mitch_skinner at berkeley.edu (Mitch Skinner) Date: Thu, 05 Jul 2007 14:22:38 -0700 Subject: [Bioperl-l] slippy map for genomic features. In-Reply-To: References: Message-ID: <468D611E.7020904@berkeley.edu> Hi, FWIW, we've been working on something similar: http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html based on GBrowse/Bio::Graphics and javascript that Andrew wrote from scratch (with the prototype library). When our project was starting up (fall 05) Andrew looked but didn't find openlayers; I'm not sure if it was public back then but their current svn only goes back to 2006. I think that things like layout (bumping) ought to be done in advance on a chromosome-wide basis; otherwise it's difficult to keep features from ending up at different heights on neighboring tiles. And it would be difficult for the server to know what was being clicked on. So we've been doing some up-front work to either do layout or to just render all the tiles in advance: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup which is driven by this script: http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup Or you could just not bump at all, I guess. I think of that as important functionality but I'd be interested in hearing about use cases where it's not necessary. It's not just bumping, though; things like text labels also make it difficult to predict exactly what pixels a feature will span if you only have its genomic coordinates. To make features clickable we've been using imagemaps; it simplifies the server code but it bogs down the client quite a bit. I'd certainly be interested in seeing if there are ways we could work together; if you're at Berkeley maybe we could meet. 
Regards, Mitch Brent Pedersen wrote: > hi, > here's a side project i've been tinkering on in googlecode svn that > may be useful to some. > http://code.google.com/p/genome-browser/ > it's a simple hack on top of OpenLayers (openlayers.org) to provide a > javascript slippy map interface and API to view and browse genomic > features. It can be used with any image generation program that can > accept &xmin= and &xmax= parameters through the url. -- though i > havent had it working it bioperl as bioperl generates images of > different height depending on the number of tracks. > > there's a live example of the code in SVN here: > http://toxic.berkeley.edu/bpederse/genome-browser/ > with images generated by a colleague's modules on first request. those > images are then cached by a simple perl script included in the SVN > repo. all subsequent requests are returned from the cache. > an image request (automatically generated by the javascript) looks like: > http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 > but any implementation need only implement xmin and xmax. all other > parameters will be used for caching but are not required. > > if anyone is interested in getting this going with bioperl image > generation--or improving the project in any way, let me know and i'll > add you as a committer and provide any javascript support that i can. 
> > -brent > > tar ball download: > http://genome-browser.googlecode.com/files/genome-browser-0.02.tar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jul 5 21:42:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 16:42:40 -0500 Subject: [Bioperl-l] Bio::DB::Query::GenBank failures In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> References: <839755.95349.qm@web36514.mail.mud.yahoo.com> <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu> Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu> Update: seems to be back up. Give it a try now. chris On Jul 5, 2007, at 3:21 PM, Chris Fields wrote: > NCBI esearch doesn't seem to be working at the moment. I'm getting > 'Internal Server Error' at this time. Try back again at a later > point. > > chris > > On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote: > >> Dear All, >> >> I have a problem if Bio::DB::Query::GenBank module. I >> am trying to count the number of protein sequences and >> the module did not return the expected number by count >> object. >> >> use Bio::DB::GenBank; >> use Bio::DB::Query::GenBank; >> >> $query_string = "Trypanosoma cruzi[Organism]"; >> >> my $query = >> Bio::DB::Query::GenBank->new(-db=>'protein', >> >> -query=>$query_string); >> my $count = $query->count; >> my @ids = $query->ids; >> >> print "$count\n"; >> >> Thanks. >> Glauber >> >> >> >> >> _____________________________________________________________________ >> _ >> ______________ >> Novo Yahoo! Cad?? - Experimente uma nova busca. >> http://yahoo.com.br/oqueeuganhocomisso >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Fri Jul 6 07:09:17 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 08:09:17 +0100 Subject: [Bioperl-l] API Changes In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: <468DEA9D.6010809@sheffield.ac.uk> David Messina wrote: >> [Chris] >> The 1.5 releases I believe break some aspects of 1.4 API >> > > Yes, this is true. > > I question, though, whether it's relevant given that virtually no one > uses 1.4 anymore. In any case, I would venture that the number of > people who would be bitten by the 1.4->1.5 API change is much smaller > than the number of people who download 1.4 and then ask us why it > doesn't work. > I'm not really up-to-speed with how the API should remain stable etc. Is the idea that the API should be stable from 1.4 through the 1.5 dev and then the next stable release can change that API? So any stable to stable upgrade could involve an API change while a stable to dev upgrade should have the same API? Does a stable API mean that the same method calls are available in a newer release... what about adding new methods to a newer release? How are these API changes currently tracked?
It seems to me that Test::More might be able to help in testing the API: can_ok($module, @methods); Nath From n.haigh at sheffield.ac.uk Fri Jul 6 11:10:14 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 12:10:14 +0100 Subject: [Bioperl-l] Bio::Variation::RNAChange Message-ID: <468E2316.1030804@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'm taking a look at the tests for Bio::Variation::RNAChange. If you create a new object without arguments: my $obj = Bio::Variation::RNAChange->new(); What do you expect the following to return: $obj->label(); I thought it would probably be: 'inframe' However, you get: 'inframe, deletion' Can anyone in the know explain what behaviour would be expected? Cheers Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit B8DxDViDOcx2gTFjSwQ2kNg= =SroY -----END PGP SIGNATURE----- From n.haigh at sheffield.ac.uk Fri Jul 6 12:54:33 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 13:54:33 +0100 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E2316.1030804@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> Message-ID: <468E3B89.3090202@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nathan S. Haigh wrote: > I'm taking a look at the tests for Bio::Variation::RNAChange. > > If you create a new object without arguments: > my $obj = Bio::Variation::RNAChange->new(); > > What do you expect the following to return: > $obj->label(); > > I thought it would probably be: > 'inframe' > > However you get: > 'inframe, deletion' > > Can anyone in the know explain what behaviour would be expected?
> > Cheers > Nath Following on from this, AAChange has the following two methods: add_Allele() and allele_mut() It appears that allele_mut() is only capable of remembering 1 allele at a time, whereas add_Allele() is provided to add support for multiple alleles - is that correct? However, add_Allele() also calls allele_mut(), such that multiple calls to add_Allele() will result in the overwriting of the allele being remembered by allele_mut(). Things are further complicated by the fact that label() uses allele_mut() to decide on the label to return. Shouldn't label() know about multiple alleles set by multiple calls to add_Allele()? It may be my lack of understanding of alleles and what these classes are intended to do, but trying to rewrite the test scripts to improve code coverage has left me a little confused! Thanks Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I b8ZOENvDDDIxphAoxeKg8/E= =f/sa -----END PGP SIGNATURE----- From tanzeem.mb at gmail.com Thu Jul 5 06:39:34 2007 From: tanzeem.mb at gmail.com (tanzeem) Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT) Subject: [Bioperl-l] Problem working with remoteblast submit method in webbrowser. In-Reply-To: <11114623.post@talk.nabble.com> References: <11114623.post@talk.nabble.com> Message-ID: <11441586.post@talk.nabble.com> I found it myself. Run apache as root and disable selinux, and the problem will not recur. tanzeem wrote: > > I have a program which uses the Bio perl remoteblast module which > compares a aminoacid fasta file with swissprot database. The > submit_blast() method works successfully when run from commandline.But > when the program is run from web browser it returns -1. I was trying to > adapt the code from Remoteblast synopsis for my need.
> -- View this message in context: http://www.nabble.com/Problem-working-with-remoteblast-submit-method-in-webbrowser.-tf3919886.html#a11441586 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Fri Jul 6 13:00:32 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Fri, 06 Jul 2007 09:00:32 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: <1183726832.2566.34.camel@localhost.localdomain> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: > > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, deprecate > the tag methods in AnnotatableI, and go about fixing everything up. > If that occurs (which seems to be the major impediment) and we get > GMOD/GBrowse playing well with BioPerl then we can aim for a new > stable release, and then institute a regular release cycle. > I think this sounds like a good idea to me too. I'm planning on having a GMOD hackathon at the end of the summer; if I had a new API by then, we could focus on fixing anything that gets broken by the changes. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From cjfields at uiuc.edu Fri Jul 6 13:10:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 6 Jul 2007 08:10:41 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote: > David Messina wrote: >>> [Chris] >>> The 1.5 releases I believe break some aspects of 1.4 API >>> >> >> Yes, this is true. >> >> I question, though, whether it's relevant given that virtually no one >> uses 1.4 anymore. In any case, I would venture that the number of >> people who would be bitten by the 1.4->1.5 API change is much smaller >> than the number of people who download 1.4 and then ask us why it >> doesn't work. >> > > I'm not really up-to-speed with how the API should remain stable > etc. Is > the idea that the API should be stable from 1.4 though the 1.5 dev and > then the next stale release can change that API? So any stable to > stable > upgrade could involve an API change while a stable to dev upgrade > should > have the same API? Does a stable API mean that the same method > calls are > available in a newer release....what about adding new methods to a > newer > release? > > How are these API changes currently tracked? 
It seems to me that > Test::More might be able to help in testing the API: > > can_ok($module, @methods); > > > Nath It's basically a 'contract' of sorts between the devs (us) and users (us/them) that the API won't change for the extent of that release series, thus ensuring any scripts out there generating tons of data won't break down if they attempt to call a renamed method. We try to maintain the API state anyway for those reasons, but in a dev release series we might decide to change some method names for consistency and deprecate older ambiguously-named methods (see below). For a stable release it's critical the API remain intact. There are a few methods which are considered deprecated or will be deprecated. For instance, we recently talked about changes to method names which use case to specify whether you're receiving an object (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested list, or whether to use each_* vs next_* for iterators. Consistency is nice! chris From heikki at sanbi.ac.za Fri Jul 6 13:20:26 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 6 Jul 2007 15:20:26 +0200 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E3B89.3090202@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> Message-ID: <200707061520.27000.heikki@sanbi.ac.za> Hi Nat, These modules have not been touched for a while and were developed for a specific task. A review is definitely in order. The way RNAChange->label was written, it should return 'inframe' when given no alleles, but 'no change' would actually be better. The multiple alleles were originally thought to be a good idea, but the vocabulary for labels was developed for a single allele only. The use of the module ended up being limited to a single allele, so add_allele() behaviour was conveniently ignored but not removed. :( -Heikki On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: > Nathan S.
Haigh wrote: > > I'm taking a look at the tests for Bio::Variation::RNAChange. > > > > If you create a new oject without arguments: > > my $obj = Bio::Variation::RNAChange->new(); > > > > What do you expect the following to return: > > $obj->label(); > > > > I thought it would probably be: > > 'inframe' > > > > However you get: > > 'inframe, deletion' > > > > Can anyone in the know explain what behaviour would be expected? > > > > Cheers > > Nath > > Following on from this, AAChange has the following two methods: > add_Allele() and allele_mut() > > It appears that allele_mut is only capable of remembering 1 allele at a > time, whereas add_Allele() is provided to add support for mutliple > alleles - is that correct? > > However, add_Allele() also calls allele_mut(), such that mutliple calls > to add_Allele will result in the overwriting of the allele being > remembered by allele_mut(). Things are further complicated by the fact > that label() uses allele_mut() to decide on the label to return. > Shouldn't label know aout multiple alleles set by multiple calls to > add_Allele? > > It may be my lack of understanding alleles and what these classes are > intending to do, but trying to rewrite the test scripts to improve code > coverage has let me a little confused! 
> > Thanks > Nath > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From schlesi at ebi.ac.uk Fri Jul 6 14:24:05 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Fri, 6 Jul 2007 15:24:05 +0100 Subject: [Bioperl-l] Unrooting a tree Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Hi, I am reading a rooted tree in newick format from a string (i.e. a bifurcation at the root) and would like to unroot it (i.e. a trifurcation at the root). I tried getting a grandchild of the root and adding it as a direct child, but that does not seem to work (the root still only has two descendents and the tree structure gets messed up). Is there a nice way to do this directly in bioperl? Doing it on the newick string is possible of course, but not nice. Thanks Felix From n.haigh at sheffield.ac.uk Fri Jul 6 15:37:19 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:37:19 +0100 Subject: [Bioperl-l] API Changes In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> Message-ID: <468E61AF.9040106@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Fields wrote: > > On Jul 6, 2007, at 2:09 AM, Nathan S. 
Haigh wrote: > >> David Messina wrote: >>>> [Chris] >>>> The 1.5 releases I believe break some aspects of 1.4 API >>>> >>> >>> Yes, this is true. >>> >>> I question, though, whether it's relevant given that virtually no one >>> uses 1.4 anymore. In any case, I would venture that the number of >>> people who would be bitten by the 1.4->1.5 API change is much smaller >>> than the number of people who download 1.4 and then ask us why it >>> doesn't work. >>> >> >> I'm not really up-to-speed with how the API should remain stable etc. Is >> the idea that the API should be stable from 1.4 though the 1.5 dev and >> then the next stale release can change that API? So any stable to stable >> upgrade could involve an API change while a stable to dev upgrade should >> have the same API? Does a stable API mean that the same method calls are >> available in a newer release....what about adding new methods to a newer >> release? >> >> How are these API changes currently tracked? It seems to me that >> Test::More might be able to help in testing the API: >> >> can_ok($module, @methods); >> >> >> Nath > > It's basically a 'contract' of sorts between the devs (us) and users > (us/them) that the API won't change for the extent of that release > series, thus ensuring any scripts out there generating tons of data > won't break down if they attempt to call a renamed method. We try to > maintain the API state anyway for those reasons, but in a dev release > series we might decide to change some method names for consistency and > deprecate older ambiguously-named methods (see below). For a stable > release it's critical the API remain intact. Hmm, still not 100% clear - it is Friday! So, someone running a script that was designed when 1.4 was released should still be able to run their script for all future releases. So all changes need to be backward compatible? 
So you have several situations regarding method names:
1) Adding new methods should be fine since past scripts don't know about them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name
A stable API, to me, means the same method calls should still be able to accept the same arguments (inc the constructor) and return the same object/data etc. What if a module is pretty outdated and would benefit from a rewrite - should all the old method names be included, and what if this makes coding difficult? > > There are a few methods which are considered deprecated or will be > deprecated. For instance, we recently talked about changes to method > names which use case to specify whether you're receiving an object > (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested > list, or whether to use each_* vs next_* for iterators. Consistency is > nice! > You mean the use of case to signify objects vs data being returned is to be deprecated or encouraged? What was the outcome of the each_* vs next_*? Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk kAWH1zVa1ycopijl761cvkQ= =fppH -----END PGP SIGNATURE----- From n.haigh at sheffield.ac.uk Fri Jul 6 15:43:41 2007 From: n.haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Fri, 06 Jul 2007 16:43:41 +0100 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za> References: <468E2316.1030804@sheffield.ac.uk> <468E3B89.3090202@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> Message-ID: <468E632D.4090801@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Heikki Lehvaslaiho wrote: > Hi Nat, > > These modules have not been touched for a while and were developed for a > specific task. A revire is defiitely in order.
> > The way RNAChange->label was written, it should return 'inframe' when given no > alleles, but 'no change' would actually be better. Wouldn't this effectively be changing the API since past scripts "could" expect "inframe" to be returned. > > The multiple alleles were originally though to be a good idea, but the > vocabulary for labels was developed for single allele, only, The use of the > module ended up being limited to single allele, so add_allele() behaviour was > conveniently ignored but not removed. :( So add_Allele() and each_Allele() should be deprecated in favour of allele_mut()? - From my post about API's.....how should the capitalisation of add_Allele() and each_Allele() be changed? Cheers Nath > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: >> Nathan S. Haigh wrote: >>> I'm taking a look at the tests for Bio::Variation::RNAChange. >>> >>> If you create a new oject without arguments: >>> my $obj = Bio::Variation::RNAChange->new(); >>> >>> What do you expect the following to return: >>> $obj->label(); >>> >>> I thought it would probably be: >>> 'inframe' >>> >>> However you get: >>> 'inframe, deletion' >>> >>> Can anyone in the know explain what behaviour would be expected? >>> >>> Cheers >>> Nath >> Following on from this, AAChange has the following two methods: >> add_Allele() and allele_mut() >> >> It appears that allele_mut is only capable of remembering 1 allele at a >> time, whereas add_Allele() is provided to add support for mutliple >> alleles - is that correct? >> >> However, add_Allele() also calls allele_mut(), such that mutliple calls >> to add_Allele will result in the overwriting of the allele being >> remembered by allele_mut(). Things are further complicated by the fact >> that label() uses allele_mut() to decide on the label to return. >> Shouldn't label know aout multiple alleles set by multiple calls to >> add_Allele? 
>> >> It may be my lack of understanding alleles and what these classes are >> intending to do, but trying to rewrite the test scripts to improve code >> coverage has let me a little confused! >> >> Thanks >> Nath >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue GBHuSHfsesX1ko55s+ME2Zc= =tkG8 -----END PGP SIGNATURE----- From cjfields at uiuc.edu Sat Jul 7 20:57:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 15:57:37 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <1183726832.2566.34.camel@localhost.localdomain> Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu> We'll prob. get a start soon, then. I'll let you know when we start. chris On Jul 6, 2007, at 8:00 AM, Scott Cain wrote: > On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote: >> >> I think we should just bite the bullet and proceed with pulling out >> the controversial operator overloading in Bio::Annotation*, deprecate >> the tag methods in AnnotatableI, and go about fixing everything up. >> If that occurs (which seems to be the major impediment) and we get >> GMOD/GBrowse playing well with BioPerl then we can aim for a new >> stable release, and then institute a regular release cycle. >> > I think this sounds like a good idea to me too. 
I'm planning on > having > a GMOD hackathon at the end of the summer; if I had a new API by then, > we could focus on fixing anything that gets broken by the changes. > > Scott > > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Sat Jul 7 21:17:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 7 Jul 2007 16:17:14 -0500 Subject: [Bioperl-l] API Changes In-Reply-To: <468E61AF.9040106@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> <468DEA9D.6010809@sheffield.ac.uk> <468E61AF.9040106@sheffield.ac.uk> Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu> On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote: > ... > Hmm, still not 100% clear - it is Friday! > > So, someone running a script that was designed when 1.4 was released > should still be able to run their script for all future releases. > So all > changes need to be backward compatible? It helps. For instance, if we change method names (rename each_Foo as next_Foo), we should have each_Foo delegate to next_Foo for the time being. If we plan on deprecating the old method altogether we would add a warning message when it's called, then delegate. It's a better solution than just changing the method outright, which means the user has to search through docs to find the renamed method. 
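A minimal sketch of that warn-then-delegate pattern, in plain Perl. The class name My::Holder and its internals are made up for illustration and are not BioPerl code; a real module would probably issue the warning through the Bio::Root::Root machinery rather than Carp.

```perl
package My::Holder;    # hypothetical class, not part of BioPerl
use strict;
use warnings;
use Carp qw(carp);

sub new {
    my ($class, @foos) = @_;
    return bless { foos => [@foos], pos => 0 }, $class;
}

# New, preferred name: returns the next item, or nothing when exhausted.
sub next_Foo {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{foos} };
    return $self->{foos}[ $self->{pos}++ ];
}

# Old name kept for backward compatibility: warn, then delegate.
sub each_Foo {
    my ($self, @args) = @_;
    carp "each_Foo() is deprecated, use next_Foo() instead";
    return $self->next_Foo(@args);
}

package main;

my $holder = My::Holder->new(qw(a b));
my $first  = $holder->each_Foo;    # warns on STDERR, but still works
my $second = $holder->next_Foo;
print "$first $second\n";          # prints "a b"
```

Old scripts keep running and get a pointer to the new name on STDERR; once the deprecation period is over, each_Foo can be removed without touching next_Foo.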
> So you have several situations regarding method names: > 1) Adding new methods should be fine since past scripts don't know > about > them and won't have used them > 2) Removing methods would break past scripts that used them > 3) Renamed methods would break past scripts that used the old name > > A stable API to me, means the same method calls should still be > able to > accept the same arguments (inc the constructor) and return the same > object/data etc. Yes. > What if a module is pretty outdated and would benefit from a rewrite - > should all the old method names be included, what if this makes coding > difficult? It depends on the module. If a complete rewrite is needed then maybe starting with a new module/interface is best, and we could deprecate the older module completely. That has been done already with Bio::Tools::BPLite (in favor of SearchIO) and a few other modules. >> There are a few methods which are considered deprecated or will be >> deprecated. For instance, we recently talked about changes to method >> names which use case to specify whether you're receiving an object >> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. >> nested >> list, or whether to use each_* vs next_* for iterators. >> Consistency is >> nice! >> > > You mean the use of case to signify objects vs data being returned are > to be deprecated or encouraged? What was the outcome of the each_* vs > next_*? > > Nath Here's the section I added to the wiki (it started in a thread a few weeks or so ago, so it's a summary really): http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names Feel free to add to it or make suggestions. BTW, Hilmar mentioned there was a movement to rename methods in old code to follow these recs but it was never completed. It should be taken up again at some point, but the recommendations are mainly here for newer code. 
chris From heikki at sanbi.ac.za Sun Jul 8 07:32:21 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sun, 8 Jul 2007 09:32:21 +0200 Subject: [Bioperl-l] Bio::Variation::RNAChange In-Reply-To: <468E632D.4090801@sheffield.ac.uk> References: <468E2316.1030804@sheffield.ac.uk> <200707061520.27000.heikki@sanbi.ac.za> <468E632D.4090801@sheffield.ac.uk> Message-ID: <200707080932.21818.heikki@sanbi.ac.za> On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote: > Heikki Lehvaslaiho wrote: > > Hi Nat, > > > > These modules have not been touched for a while and were developed for a > > specific task. A review is definitely in order. > > > > The way RNAChange->label was written, it should return 'inframe' when > > given no alleles, but 'no change' would actually be better. > > Wouldn't this effectively be changing the API since past scripts "could" > expect "inframe" to be returned. Checking the actual usage and what happens when you change a nucleotide to itself, you get the label 'silent'. I guess that would be a valid label value even when the alleles are not initialised, too. > > The multiple alleles were originally thought to be a good idea, but the > > vocabulary for labels was developed for single allele only. The use of > > the module ended up being limited to single allele, so add_allele() > > behaviour was conveniently ignored but not removed. :( > > So add_Allele() and each_Allele() should be deprecated in favour of > allele_mut()? Yes. > From my post about API's.....how should the capitalisation of > add_Allele() and each_Allele() be changed? Definitely keep the current ones as deprecated alternatives. -Heikki > Cheers > Nath > > > -Heikki > > > > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote: > >> Nathan S. Haigh wrote: > >>> I'm taking a look at the tests for Bio::Variation::RNAChange. 
> >>> > >>> If you create a new oject without arguments: > >>> my $obj = Bio::Variation::RNAChange->new(); > >>> > >>> What do you expect the following to return: > >>> $obj->label(); > >>> > >>> I thought it would probably be: > >>> 'inframe' > >>> > >>> However you get: > >>> 'inframe, deletion' > >>> > >>> Can anyone in the know explain what behaviour would be expected? > >>> > >>> Cheers > >>> Nath > >> > >> Following on from this, AAChange has the following two methods: > >> add_Allele() and allele_mut() > >> > >> It appears that allele_mut is only capable of remembering 1 allele at a > >> time, whereas add_Allele() is provided to add support for mutliple > >> alleles - is that correct? > >> > >> However, add_Allele() also calls allele_mut(), such that mutliple calls > >> to add_Allele will result in the overwriting of the allele being > >> remembered by allele_mut(). Things are further complicated by the fact > >> that label() uses allele_mut() to decide on the label to return. > >> Shouldn't label know aout multiple alleles set by multiple calls to > >> add_Allele? > >> > >> It may be my lack of understanding alleles and what these classes are > >> intending to do, but trying to rewrite the test scripts to improve code > >> coverage has let me a little confused! 
> >> > >> Thanks > >> Nath > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From xing.y.hu at gmail.com Mon Jul 9 06:26:40 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Mon, 09 Jul 2007 14:26:40 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? Message-ID: <4691D520.60700@gmail.com> Hi friends, I wrote a script for getting genomic sequence file from GenBank. To fulfill that target, I used DB::GenBank module to get the sequence via get_Seq_by_acc, and it works well. But this time, facing enormous amount of ESTs, I have no idea how to download them swiftly and elegantly. PROBLEM DESCRIPTION: goal: download all EST files of a specific species from GenBank, say Arabidopsis Thaliana or Oryza sativa(rice). other: whether all of ESTs are in a single file or separatedly placed does not matter. Can I use a bioperl script to achieve that? And How? I really appreciate. Xing. From akozik at atgc.org Mon Jul 9 12:25:14 2007 From: akozik at atgc.org (Alexander Kozik) Date: Mon, 09 Jul 2007 05:25:14 -0700 Subject: [Bioperl-l] How to download EST files via bioperl script? 
In-Reply-To: <4691D520.60700@gmail.com> References: <4691D520.60700@gmail.com> Message-ID: <4692292A.1080900@atgc.org> To download genomic sequences or ESTs for any organism (in various formats) you can use NCBI Taxonomy Browser: http://www.ncbi.nlm.nih.gov/Taxonomy/ you can use taxonomy id to access different organisms, Arabidopsis for example (3702): http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 or by direct web link: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 assembled genomes can be accessed via ftp: ftp://ftp.ncbi.nih.gov/genomes/ To download large amount of selected sequences (ESTs for example) you can use batch Entrez: http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide (select EST for EST, it's critical) It seems, to solve the problem you describe, you don't need to use bioperl. NCBI GenBank Entrez provides all necessary tools to work on these simple and frequent tasks. -Alex -- Alexander Kozik Bioinformatics Specialist Genome and Biomedical Sciences Facility 451 East Health Sciences Drive University of California Davis, CA 95616-8816 Phone: (530) 754-9127 email#1: akozik at atgc.org email#2: akozik at gmail.com web: http://www.atgc.org/ Xing Hu wrote: > Hi friends, > > I wrote a script for getting genomic sequence file from GenBank. To > fulfill that target, I used DB::GenBank module to get the sequence via > get_Seq_by_acc, and it works well. But this time, facing enormous amount > of ESTs, I have no idea how to download them swiftly and elegantly. > > PROBLEM DESCRIPTION: > goal: download all EST files of a specific species from GenBank, say > Arabidopsis Thaliana or Oryza sativa(rice). > other: whether all of ESTs are in a single file or separatedly > placed does not matter. > > Can I use a bioperl script to achieve that? And How? I really > appreciate. > > Xing. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Jul 9 14:17:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Jul 2007 09:17:23 -0500 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <4692292A.1080900@atgc.org> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Caveat: if you have millions of ESTs please consider NOT using my eutil script below or NCBI Batch Entrez, which would repeatedly hit the NCBI server thousands of times. At least try looking for other ways to retrieve the data you want (ftp, organism-specific resources like Ensembl, so on), or run any scripts or data retrieval in off hours so you don't overtax the NCBI server. There is a way you can use BioPerl if you don't mind living on the bleeding edge by using bioperl-live (core code from CVS). I have been working on a set of modules for the last year (Bio::DB::EUtilities) which interact with all the various eutils for building data pipelines which uses the NCBI CGI interface. You could possibly retrieve all relevant ESTs using a variation of the example script here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch Note that the code examples do NOT work with rel. 1.5.2 code as the API has changed quite a bit; I'm working to rectify some of that. The script I would use is below. It retrieves batches of 500 sequences (in fasta format) at a time, for a total of 10000 max seq records, saving the raw record data directly to a file (appending as you go along). I added an eval block to check the server status and redo the call up to 4 times before giving up completely. Using eval this way hasn't been extensively tested but should work. 
---------------------------------------

use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil          => 'esearch',
                                       -db             => 'nucest',
                                       -term           => 'txid3702',
                                       -usehistory     => 'y',
                                       -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
    print "History returned\n";
    # note db carries over from above
    $factory->set_parameters(-eutil   => 'efetch',
                             -rettype => 'fasta',
                             -history => $hist);
    my ($retmax, $retstart) = (500,0);
    my $retry = 1;
    my $maxcount = $count < 10000 ? $count : 10000; # set max # seq records to return
    RETRIEVE_SEQS:
    while ($retstart < $maxcount) {
        print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
        $factory->set_parameters(-retmax   => $retmax,
                                 -retstart => $retstart);
        # check in case of server error
        eval{
            $factory->get_Response(-file => ">>ESTs.fas");
        };
        if ($@) {
            die "Server error: $@. Try again later" if $retry == 5;
            print STDERR "Server error, redo #$retry\n";
            $retry++ && redo RETRIEVE_SEQS;
        }
        $retstart += $retmax;
    }
}

---------------------------------------

chris

On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: > To download genomic sequences or ESTs for any organism (in various > formats) you can use NCBI Taxonomy Browser: > http://www.ncbi.nlm.nih.gov/Taxonomy/ > > you can use taxonomy id to access different organisms, Arabidopsis for > example (3702): > http://www.ncbi.nlm.nih.gov/sites/entrez? > db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 > > or by direct web link: > http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi? 
> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 > > assembled genomes can be accessed via ftp: > ftp://ftp.ncbi.nih.gov/genomes/ > > To download large amount of selected sequences (ESTs for example) you > can use batch Entrez: > http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html > http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide > (select EST for EST, it's critical) > > It seems, to solve the problem you describe, you don't need to use > bioperl. NCBI GenBank Entrez provides all necessary tools to work on > these simple and frequent tasks. > > -Alex > > -- > Alexander Kozik > Bioinformatics Specialist > Genome and Biomedical Sciences Facility > 451 East Health Sciences Drive > University of California > Davis, CA 95616-8816 > Phone: (530) 754-9127 > email#1: akozik at atgc.org > email#2: akozik at gmail.com > web: http://www.atgc.org/ > > > > Xing Hu wrote: >> Hi friends, >> >> I wrote a script for getting genomic sequence file from >> GenBank. To >> fulfill that target, I used DB::GenBank module to get the sequence >> via >> get_Seq_by_acc, and it works well. But this time, facing enormous >> amount >> of ESTs, I have no idea how to download them swiftly and elegantly. >> >> PROBLEM DESCRIPTION: >> goal: download all EST files of a specific species from >> GenBank, say >> Arabidopsis Thaliana or Oryza sativa(rice). >> other: whether all of ESTs are in a single file or separatedly >> placed does not matter. >> >> Can I use a bioperl script to achieve that? And How? I really >> appreciate. >> >> Xing. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Mon Jul 9 18:08:07 2007 From: jason at bioperl.org (Jason Stajich) Date: Mon, 9 Jul 2007 11:08:07 -0700 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> I don't think there is a function for this yet but it would be a good one to have. I assume you don't really want to take a shot at writing it though? To make this work I think you have to create a new node which contains the trifurcation and this node is what the root is set to. -jason On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From lstein at cshl.edu Mon Jul 9 21:35:49 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 9 Jul 2007 17:35:49 -0400 Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com> Hi Folks, Sorry for the job spam. We're looking for a manager of the Cold Spring Harbor Laboratory bioinformatics core facility. 
This is a semi-independent staff position supporting CSHL scientific researchers by providing consultation, data mining and software development activities. You will have a software staff of two, a nice salary, good health benefits, and an exciting and dynamic environment to work in. I'm looking for someone with a strong bioinformatics background, at least five years' experience programming Perl, Java or Python in an academic or commercial environment, and management experience. If you are interested, please send your CV and cover letter to me. Thanks, Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From stewarta at nmrc.navy.mil Mon Jul 9 22:16:12 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Mon, 9 Jul 2007 18:16:12 -0400 Subject: [Bioperl-l] rpsblast Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil> When I run... $result = $factory->rpsblast($seq); ... where $seq is a Bio::Seq object, it seems to simply copy the $seq object to $result. When I run something similar... $rpsblast('/path/to/myFile'); ... the value of $result then becomes '/path/to/myFile'. Has anyone else encountered this? -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From jason_stajich at berkeley.edu Tue Jul 10 01:36:10 2007 From: jason_stajich at berkeley.edu (Jason Stajich) Date: Mon, 9 Jul 2007 18:36:10 -0700 Subject: [Bioperl-l] BOSC2007 Message-ID: I posted a quick note about meeting up at BOSC/ISMB this year. If you are attending, please sign your name on the page or at least say whether you are interested in a BoF. 
We'll try and discuss some of the current topics in BioPerl development as well as try and use the time to coordinate any development that benefits from the face-to-face time. http://bioperl.org/wiki/BOSC2007_Meetup http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/ -jason -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ From schlesi at ebi.ac.uk Tue Jul 10 12:58:00 2007 From: schlesi at ebi.ac.uk (Felix Schlesinger) Date: Tue, 10 Jul 2007 13:58:00 +0100 Subject: [Bioperl-l] Unrooting a tree In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com> <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org> Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com> Hi, > I don't think there is a function for this yet but it would be a good one > to have. > I assume you don't really want to take a shot at writing it though? > To make this work I think you have to create a new node which contains the > trifurcation and this node is what the root is set to. Creating a new root is fine, but what would the (3) children of that node be? I took a different approach now, where I iterate over all (indirect) descendents of the root, find the first one which does not have the root as its direct ancestor and move it up the tree, i.e.

foreach my $d ($root->get_all_Descendents){
    if ($d->ancestor != $root){
        $d->ancestor->remove_Descendent($d);
        if ($root->add_Descendent($d, 1) == 3){
            last;
        }
    }
}

This will make the old root a trifurcation. It does the right thing for what I am trying to do, but I believe it is not general (for example, at the moment it does not worry about branch lengths). Also, instead of taking the first, taking the most distant possible subtree of a clade up to the root might be better. 
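For the branch-length issue, here is a rough sketch of one alternative, written against toy hash-based nodes rather than Bio::Tree::Node objects (node, unroot and to_newick are names invented for this example, not BioPerl methods). Instead of moving a single grandchild up, it collapses one internal child of the root and adds that child's branch length to the remaining root child, so no single-child node is left behind. It is an illustration of the idea under those assumptions, not a tested general solution.

```perl
use strict;
use warnings;

# Toy node: name, branch length to the parent, and a list of child nodes.
sub node {
    my ($name, $len, @kids) = @_;
    return { name => $name, len => $len, kids => [@kids] };
}

# Turn a bifurcating root into a trifurcation by collapsing one internal
# child of the root; its branch length is added to the other root child.
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{kids} };
    return $root unless @kids == 2;            # not a bifurcating root
    # index of the first root child that has children of its own
    my ($idx) = grep { scalar @{ $kids[$_]{kids} } } 0 .. 1;
    return $root unless defined $idx;          # two leaves: nothing to do
    my $collapse = $kids[$idx];
    my $keep     = $kids[ 1 - $idx ];
    $keep->{len} += $collapse->{len};
    $root->{kids} = [ $keep, @{ $collapse->{kids} } ];
    return $root;
}

# Minimal Newick writer, only used to inspect the result.
sub to_newick {
    my ($n) = @_;
    my $s = @{ $n->{kids} }
        ? '(' . join(',', map { to_newick($_) } @{ $n->{kids} }) . ')'
        : $n->{name};
    $s .= ":$n->{len}" if defined $n->{len};
    return $s;
}

# Rooted input, equivalent to ((A:1,B:2):3,C:4);
my $tree = node('root', undef,
    node('', 3, node('A', 1), node('B', 2)),
    node('C', 4),
);
unroot($tree);
print to_newick($tree), ";\n";    # prints (C:7,A:1,B:2);
```

The path length between C and the A/B clade stays 7 (4 + 3), which is the part the loop above does not take care of.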
Felix > On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote: > > Hi, > > I am reading a rooted tree in newick format from a string (i.e. a > bifurcation at the root) and would like to unroot it (i.e. a > trifurcation at the root). I tried getting a grandchild of the root > and adding it as a direct child, but that does not seem to work (the > root still only has two descendents and the tree structure gets messed > up). Is there a nice way to do this directly in bioperl? Doing it on > the newick string is possible of course, but not nice. > > Thanks > Felix > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > From xing.y.hu at gmail.com Tue Jul 10 13:29:36 2007 From: xing.y.hu at gmail.com (Xing Hu) Date: Tue, 10 Jul 2007 21:29:36 +0800 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> Message-ID: <469389C0.5060303@gmail.com> Thanks, you guys. I have to confess how stupid I was. The easiest way seems to be the NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I knew about it, but I thought it was necessary to have all items selected before pressing save to launch the download. So I was desperate to find a button that could do that without hundreds of thousands of clicks by me. "What about selecting none of those items at all?" -- This idea finally came to me after days of struggling, and the problem was solved. Xing Chris Fields wrote: > Caveat: if you have millions of ESTs please consider NOT using my > eutil script below or NCBI Batch Entrez, which would repeatedly hit > the NCBI server thousands of times. 
At least try looking for other > ways to retrieve the data you want (ftp, organism-specific resources > like Ensembl, so on), or run any scripts or data retrieval in off > hours so you don't overtax the NCBI server. > > There is a way you can use BioPerl if you don't mind living on the > bleeding edge by using bioperl-live (core code from CVS). I have been > working on a set of modules for the last year (Bio::DB::EUtilities) > which interact with all the various eutils for building data pipelines > which uses the NCBI CGI interface. You could possibly retrieve all > relevant ESTs using a variation of the example script here: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch > > Note that the code examples do NOT work with rel. 1.5.2 code as the > API has changed quite a bit; I'm working to rectify some of that. > > The script I would use is below. It retrieves batches of 500 > sequences (in fasta format) at a time, for a total of 10000 max seq > records, saving the raw record data directly to a file (appending as > you go along). I added an eval block to check the server status and > redo the call up to 4 times before giving up completely. Using eval > this way hasn't been extensively tested but should work. > > --------------------------------------- > > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', > -db => 'nucest', > -term => 'txid3702', > -usehistory => 'y', > -keep_histories => 1); > > my $count = $factory->get_count; > > print "Count: $count\n"; > > if (my $hist = $factory->next_History) { > print "History returned\n"; > # note db carries over from above > $factory->set_parameters(-eutil => 'efetch', > -rettype => 'fasta', > -history => $hist); > my ($retmax, $retstart) = (500,0); > my $retry = 1; > my $maxcount = $count < 10000 ? 
$count : 10000; # set max # seq > records to return > RETRIEVE_SEQS: > while ($retstart < $maxcount) { > print "Returning from ",$retstart+1," to > ",$retstart+$retmax,"\n"; > $factory->set_parameters(-retmax => $retmax, > -retstart => $retstart); > # check in case of server error > eval{ > $factory->get_Response(-file => ">>ESTs.fas"); > }; > if ($@) { > die "Server error: $@. Try again later" if $retry == 5; > print STDERR "Server error, redo #$retry\n"; > $retry++ && redo RETRIEVE_SEQS; > } > $retstart += $retmax; > } > } > > > --------------------------------------- > > > chris > > On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: > >> To download genomic sequences or ESTs for any organism (in various >> formats) you can use NCBI Taxonomy Browser: >> http://www.ncbi.nlm.nih.gov/Taxonomy/ >> >> you can use taxonomy id to access different organisms, Arabidopsis for >> example (3702): >> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >> >> >> or by direct web link: >> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >> >> >> assembled genomes can be accessed via ftp: >> ftp://ftp.ncbi.nih.gov/genomes/ >> >> To download large amount of selected sequences (ESTs for example) you >> can use batch Entrez: >> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >> (select EST for EST, it's critical) >> >> It seems, to solve the problem you describe, you don't need to use >> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >> these simple and frequent tasks. 
>> >> -Alex >> >> --Alexander Kozik >> Bioinformatics Specialist >> Genome and Biomedical Sciences Facility >> 451 East Health Sciences Drive >> University of California >> Davis, CA 95616-8816 >> Phone: (530) 754-9127 >> email#1: akozik at atgc.org >> email#2: akozik at gmail.com >> web: http://www.atgc.org/ >> >> >> >> Xing Hu wrote: >>> Hi friends, >>> >>> I wrote a script for getting genomic sequence file from GenBank. To >>> fulfill that target, I used DB::GenBank module to get the sequence via >>> get_Seq_by_acc, and it works well. But this time, facing enormous >>> amount >>> of ESTs, I have no idea how to download them swiftly and elegantly. >>> >>> PROBLEM DESCRIPTION: >>> goal: download all EST files of a specific species from GenBank, >>> say >>> Arabidopsis Thaliana or Oryza sativa(rice). >>> other: whether all of ESTs are in a single file or separatedly >>> placed does not matter. >>> >>> Can I use a bioperl script to achieve that? And How? I really >>> appreciate. >>> >>> Xing. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From davila at ioc.fiocruz.br Tue Jul 10 13:58:29 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Tue, 10 Jul 2007 10:58:29 -0300 Subject: [Bioperl-l] How to download EST files via bioperl script? In-Reply-To: <469389C0.5060303@gmail.com> References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> Message-ID: <46939085.40906@ioc.fiocruz.br> Hi Xing, Unfortunately that did not work for me... 
there are 5133 T. brucei ESTs (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) and 13971 from T. cruzi (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) that I cannot download at once in GenBank format... even when I select "GenBank" format in the Display menu I can only see and get/download 500 ESTs each time... I also downloaded all ESTs from GenBank (a pity there are no subsets of them!), but merging them all generates a file bigger than 120GB to be processed... I just asked Diogo (my student) to give the script sent by Chris Fields a try, so fingers crossed ;-) Cheers, Alberto Xing Hu wrote: > Thanks you guys. > > I had to confess that how stupid I was. The easiest way seems to be the > way using NCBI Taxonomy Browser which suggested by alex. As a matter of > fact, I knew that but I thought it was necessary to have all items > selected before pressing save to launch download. So I was desperate to > find a button that could achieve that without hundreds of thousands of > clicking by me. "What about select none of those items at all?" -- This > idea finally came to me after days of struggling and the problem was solved. > > Xing > > > > Chris Fields wrote: >> Caveat: if you have millions of ESTs please consider NOT using my >> eutil script below or NCBI Batch Entrez, which would repeatedly hit >> the NCBI server thousands of times. At least try looking for other >> ways to retrieve the data you want (ftp, organism-specific resources >> like Ensembl, so on), or run any scripts or data retrieval in off >> hours so you don't overtax the NCBI server. >> >> There is a way you can use BioPerl if you don't mind living on the >> bleeding edge by using bioperl-live (core code from CVS). 
I have been >> working on a set of modules for the last year (Bio::DB::EUtilities) >> which interact with all the various eutils for building data pipelines >> which uses the NCBI CGI interface. You could possibly retrieve all >> relevant ESTs using a variation of the example script here: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch >> >> Note that the code examples do NOT work with rel. 1.5.2 code as the >> API has changed quite a bit; I'm working to rectify some of that. >> >> The script I would use is below. It retrieves batches of 500 >> sequences (in fasta format) at a time, for a total of 10000 max seq >> records, saving the raw record data directly to a file (appending as >> you go along). I added an eval block to check the server status and >> redo the call up to 4 times before giving up completely. Using eval >> this way hasn't been extensively tested but should work. >> >> --------------------------------------- >> >> use Bio::DB::EUtilities; >> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch', >> -db => 'nucest', >> -term => 'txid3702', >> -usehistory => 'y', >> -keep_histories => 1); >> >> my $count = $factory->get_count; >> >> print "Count: $count\n"; >> >> if (my $hist = $factory->next_History) { >> print "History returned\n"; >> # note db carries over from above >> $factory->set_parameters(-eutil => 'efetch', >> -rettype => 'fasta', >> -history => $hist); >> my ($retmax, $retstart) = (500,0); >> my $retry = 1; >> my $maxcount = $count < 10000 ? $count : 10000; # set max # seq >> records to return >> RETRIEVE_SEQS: >> while ($retstart < $maxcount) { >> print "Returning from ",$retstart+1," to >> ",$retstart+$retmax,"\n"; >> $factory->set_parameters(-retmax => $retmax, >> -retstart => $retstart); >> # check in case of server error >> eval{ >> $factory->get_Response(-file => ">>ESTs.fas"); >> }; >> if ($@) { >> die "Server error: $@. 
Try again later" if $retry == 5; >> print STDERR "Server error, redo #$retry\n"; >> $retry++ && redo RETRIEVE_SEQS; >> } >> $retstart += $retmax; >> } >> } >> >> >> --------------------------------------- >> >> >> chris >> >> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote: >> >>> To download genomic sequences or ESTs for any organism (in various >>> formats) you can use NCBI Taxonomy Browser: >>> http://www.ncbi.nlm.nih.gov/Taxonomy/ >>> >>> you can use taxonomy id to access different organisms, Arabidopsis for >>> example (3702): >>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 >>> >>> >>> or by direct web link: >>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 >>> >>> >>> assembled genomes can be accessed via ftp: >>> ftp://ftp.ncbi.nih.gov/genomes/ >>> >>> To download large amount of selected sequences (ESTs for example) you >>> can use batch Entrez: >>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html >>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide >>> (select EST for EST, it's critical) >>> >>> It seems, to solve the problem you describe, you don't need to use >>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on >>> these simple and frequent tasks. >>> >>> -Alex >>> >>> --Alexander Kozik >>> Bioinformatics Specialist >>> Genome and Biomedical Sciences Facility >>> 451 East Health Sciences Drive >>> University of California >>> Davis, CA 95616-8816 >>> Phone: (530) 754-9127 >>> email#1: akozik at atgc.org >>> email#2: akozik at gmail.com >>> web: http://www.atgc.org/ >>> >>> >>> >>> Xing Hu wrote: >>>> Hi friends, >>>> >>>> I wrote a script for getting genomic sequence file from GenBank. To >>>> fulfill that target, I used DB::GenBank module to get the sequence via >>>> get_Seq_by_acc, and it works well. 
But this time, facing enormous
>>>> amount
>>>> of ESTs, I have no idea how to download them swiftly and elegantly.
>>>>
>>>> PROBLEM DESCRIPTION:
>>>> goal: download all EST files of a specific species from GenBank,
>>>> say
>>>> Arabidopsis Thaliana or Oryza sativa(rice).
>>>> other: whether all of ESTs are in a single file or separatedly
>>>> placed does not matter.
>>>>
>>>> Can I use a bioperl script to achieve that? And How? I really
>>>> appreciate.
>>>>
>>>> Xing.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>

From cjfields at uiuc.edu Tue Jul 10 14:05:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:05:43 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>

Just make sure you're using the latest from CVS. Let me know if it
doesn't work and I'll look into it.

chris

On Jul 10, 2007, at 8:58 AM, Alberto Davila wrote:
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From diogoat at gmail.com Tue Jul 10 14:15:20 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 10 Jul 2007 11:15:20 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear All,

I used the script below and it works very well!
I only changed the query, and the script gave me the 5133 ESTs from T. brucei.

#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Search the nucleotide database for T. brucei entries in the EST division
my $query = Bio::DB::Query::GenBank->new
    (-query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
     -db    => 'nucleotide');
my $gb = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# Records are appended in GenBank format, despite the .fasta file name
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
####################################################################

Diogo Tschoeke/Fiocruz (Alberto's Student)

From cjfields at uiuc.edu Tue Jul 10 14:35:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:35:03 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br> <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu> <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu>

That will work as well; the key difference between my example and
this one is that the seq stream retrieved using Bio::DB::GenBank
passes through Bio::SeqIO while Bio::DB::EUtilities saves the raw seq
record directly to a file (or callback or HTTP::Response) for
optionally parsing later. If you have problems with Bio::SeqIO you can
always use Bio::DB::EUtilities to get around the issue until we
resolve it.

chris

On Jul 10, 2007, at 9:15 AM, Diogo Tschoeke wrote:
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hartzell at alerce.com Tue Jul 10 16:50:31 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 12:50:31 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To:
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <18067.47319.254632.538811@almost.alerce.com>

Jason Stajich writes:
 > [...]
 > Do you know how to have svn commit messages generate summary emails
 > as well?

I've made a local installation of the SVN::Notify bits in my home
directory and set up its notification script. If folks are happy with
it then I'll work on getting The Powers That Be to do a real install
and we'll use it for the real repository.

It's currently configured to include diffs inline in the message. I
prefer them as an attachment, but the current configuration of the
bioperl-guts-l list stalls messages w/ attachments and requires admin
intervention. I have a support@ request going on it and will change
it if/when we get the issue resolved.

So, to review:

  svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/

is the top of the repository and

  svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk

will get you the main branch of bioperl-live.

Remember that the repository is transient, don't put anything
important in there....

Have at it, but remember that the entire world will see your commit
messages.

g.

From xing.y.hu at gmail.com Tue Jul 10 17:08:35 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Wed, 11 Jul 2007 01:08:35 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org> <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu> <469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
Message-ID: <4693BD13.2070509@gmail.com>

Hi Alberto,

Yes, I know that the NCBI website only offers to show up to 500
entries at a time. However, I simply ignored that setting (which
doesn't mean I hadn't seen it), pulled down the "send to" menu and
chose "file". A small window popped up, and after I said yes to it,
the download started.

You might ask how I know that it was not a batch of only 5 (the
default selection) or 500 ESTs. To be honest, I didn't know at first.
But the download has grown to millions of bytes since then (due to my
bad network connection, I have no idea when it will finish), and that
doesn't look like a small batch of fewer than a thousand ESTs.
Actually, I wrote a script to count the sequences in the temporary
file and got a number much bigger than ten thousand. So I guess it
works.

BTW, I never thought Bio::DB::GenBank could do that! Again, thank you
guys!

Xing

Alberto Davila wrote:
> [...]

From bix at sendu.me.uk Tue Jul 10 17:14:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Jul 2007 18:14:29 +0100
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com>
Message-ID: <4693BE75.4090005@sendu.me.uk>

George Hartzell wrote:
> Jason Stajich writes:
> > [...]
> > Do you know how to have svn commit messages generate summary emails
> > as well?
>
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script. If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> > It's currently configured to include diffs inline in the message. I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. Can I put a vote in that you don't? I search through email body text in my archive of guts to find certain diffs, so really like the diffs inline. Also, is there any way to get rid of the 'bioperl' in [bioperl revision] in the subject? Seems redundant and makes it harder to see what was changed in a small email client window. From aaron.j.mackey at gsk.com Tue Jul 10 17:20:15 2007 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 10 Jul 2007 13:20:15 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.47319.254632.538811@almost.alerce.com> Message-ID: George, this is all very nice to finally have, thank you for your efforts! Any chance that the diff-as-attachment vs. diffs-inline question can be different for each subscriber? The utility of the "guts" mailing list (to me) is that it's an encyclopedia of browsable, skimmable, and searchable diffs, not just a date-stamped record of diffs (if so, why provide an attachment at all, just provide a URL to the diff in the respository). Thanks again, -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM: > Jason Stajich writes: > > [...] > > Do you know how to have svn commit messages generate summary emails > > as well? > > I've made a local installation of the SVN::Notify bits in my home > directory and set up its notification script. If folks are happy with > it then I'll work on getting The Powers That Be to do a real install > and we'll use it for the real repository. > > It's currently configured to include diffs inline in the message. 
I > prefer them as an attachment, but the current configuration of the > bioperl-guts-l list stalls messages w/ attachments and requires admin > intervention. I have a support@ request going on it and will change > it if/when we get the issue resolved. > > So, to review: > > svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/ > > is the top of the repository and > > svn co svn+ssh://dev.open-bio. > org/home/hartzell/bioperl_take2/bioperl-live/trunk > > will get you the main branch of bioperl-live. > > Remember that the repository is transient, don't put anything > important in there.... > > Have at it, but remember that the entire world will see your commit > messages. > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Tue Jul 10 18:18:07 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Jul 2007 13:18:07 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote: > George Hartzell wrote: >> Jason Stajich writes: >>> [...] >>> Do you know how to have svn commit messages generate summary emails >>> as well? >> >> I've made a local installation of the SVN::Notify bits in my home >> directory and set up its notification script. If folks are happy >> with >> it then I'll work on getting The Powers That Be to do a real install >> and we'll use it for the real repository. >> >> It's currently configured to include diffs inline in the message. I >> prefer them as an attachment, but the current configuration of the >> bioperl-guts-l list stalls messages w/ attachments and requires admin >> intervention. 
I have a support@ request going on it and will change >> it if/when we get the issue resolved. > > Can I put a vote in that you don't? I search through email body > text in > my archive of guts to find certain diffs, so really like the diffs > inline. > > Also, is there any way to get rid of the 'bioperl' in [bioperl > revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Agree on both counts; the devs have gotten used to seeing the diffs inline. We prob. need to schedule a specific day/time when the switchover would take place so we can announce (so everyone knows and no one can gripe). Did we ever resolve the svn->cvs issue? Jason pointed out some tools a while ago... chris From hartzell at alerce.com Tue Jul 10 20:09:09 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:09:09 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <4693BE75.4090005@sendu.me.uk> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59237.519166.454578@almost.alerce.com> Sendu Bala writes: > George Hartzell wrote: > > Jason Stajich writes: > > > [...] > > > Do you know how to have svn commit messages generate summary emails > > > as well? > > > > I've made a local installation of the SVN::Notify bits in my home > > directory and set up its notification script. If folks are happy with > > it then I'll work on getting The Powers That Be to do a real install > > and we'll use it for the real repository. > > > > It's currently configured to include diffs inline in the message. I > > prefer them as an attachment, but the current configuration of the > > bioperl-guts-l list stalls messages w/ attachments and requires admin > > intervention. I have a support@ request going on it and will change > > it if/when we get the issue resolved. 
> > Can I put a vote in that you don't? I search through email body text in > my archive of guts to find certain diffs, so really like the diffs inline. Ok, three votes against attachments. Anyone want to vote in support, otherwise I'll just leave 'em inline. > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] > in the subject? Seems redundant and makes it harder to see what was > changed in a small email client window. Sure. The default's just [RevisionNumber]. Does that work for folk? g. From hartzell at alerce.com Tue Jul 10 20:11:36 2007 From: hartzell at alerce.com (George Hartzell) Date: Tue, 10 Jul 2007 16:11:36 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> Message-ID: <18067.59384.247108.463648@almost.alerce.com> Chris Fields writes: > [...] > We prob. need to schedule a specific day/time when the switchover > would take place so we can announce (so everyone knows and no one can > gripe). Did we ever resolve the svn->cvs issue? Jason pointed out > some tools a while ago... I haven't done anything about it. I think that we also need to have some input from the admin/support folk about access methods (https, etc...). Are we going to want to mirror the repository anywhere? g. From hartzell at alerce.com Wed Jul 11 13:17:08 2007 From: hartzell at alerce.com (George Hartzell) Date: Wed, 11 Jul 2007 09:17:08 -0400 Subject: [Bioperl-l] extra hook functionality for svn repos? Message-ID: <18068.55380.626778.486775@almost.alerce.com> There are a bunch of "contributed" hook scripts at http://subversion.tigris.org/tools_contrib.html#hook_scripts Given that many bioperl users depend on case-preserving but case-insensitive file systems, I'm wondering if hooking up the case-insensitive.py script might be worthwhile. 
Likewise, the check-mime-type.pl script might help us keep svn:mime-type and svn:eol-style properties up to date. There are others there, but none that I found interesting. How big-brother do we want the repository to be? g. From cjfields at uiuc.edu Wed Jul 11 13:40:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Jul 2007 08:40:54 -0500 Subject: [Bioperl-l] extra hook functionality for svn repos? In-Reply-To: <18068.55380.626778.486775@almost.alerce.com> References: <18068.55380.626778.486775@almost.alerce.com> Message-ID: On Jul 11, 2007, at 8:17 AM, George Hartzell wrote: > > There are a bunch of "contributed" hook scripts at > > http://subversion.tigris.org/tools_contrib.html#hook_scripts > > Given that many bioperl users depend on case-preserving but > case-insensitive file systems, I'm wondering if hooking up the > case-insensitive.py script might be worthwhile. I'm not sure how often we run into this, though. Anyone know? > Likewise, the check-mime-type.pl script might help us keep > svn:mime-type and svn:eol-style properties up to date. The latter two might be nice. I thought we planned on defaulting to a simple 'plain text' mime type on commits if it isn't specifically predefined, but maybe this way is better? > There are others there, but none that I found interesting. > > How big-brother do we want the repository to be? > > g. 'Friendly' big-brother, not 'dystopian' big-brother. chris From marian.thieme at lycos.de Wed Jul 11 09:05:18 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 09:05:18 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178019848@lycos-europe.com> An HTML attachment was scrubbed... 
URL: From dmessina at wustl.edu Wed Jul 11 20:14:17 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 11 Jul 2007 15:14:17 -0500 Subject: [Bioperl-l] submitting code In-Reply-To: <188661178019848@lycos-europe.com> References: <188661178019848@lycos-europe.com> Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu> Hi Marian, Thanks so much for contributing! The best way would be to create a Bugzilla ticket and then attach the code to that ticket. One of the developers will check it in and give you feedback if there are any little tweaks that would be helpful*. Would you be able to include documentation and test cases with your module? Dave * For more info: http://www.bioperl.org/wiki/FAQ#I. 27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F http://www.bioperl.org/wiki/Developer_Information http://www.bioperl.org/wiki/Becoming_a_developer http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html -- Dave Messina Senior Analyst, Assembly Group Genome Sequencing Center Washington University St. Louis, MO From marian.thieme at lycos.de Wed Jul 11 15:12:20 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Jul 2007 15:12:20 +0000 Subject: [Bioperl-l] submitting code Message-ID: <188661178030343@lycos-europe.com> An HTML attachment was scrubbed... URL: From e-just at northwestern.edu Thu Jul 12 14:37:03 2007 From: e-just at northwestern.edu (Eric Just) Date: Thu, 12 Jul 2007 09:37:03 -0500 Subject: [Bioperl-l] Job opening in Chicago Message-ID: Hello everyone, We have an opening at dictyBase (Northwestern University in Chicago) for a Bioinformatics Software Engineer. This job involves writing and maintaining software for a genome database using Chado/OO-Perl/Bioperl and many other state of the art technologies. 
For more information please see: http://dictybase.org/dictybase_jobs.htm Thanks, Eric From cjfields at uiuc.edu Thu Jul 12 16:09:02 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Jul 2007 11:09:02 -0500 Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question Message-ID: I have been running into some GFF formatting issues where the attributes column is left undef (no '.'), which causes GFF3Loader::parse_attributes() to complain with an 'use of undefined string with split' warning. Would it be okay with the powers that be (Scott, Lincoln) to add a warning or exception there? I'm guessing a warning is better in this case, as just returning works fine. chris From jason at bioperl.org Fri Jul 13 17:30:05 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 13:30:05 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18067.59384.247108.463648@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> I'll try and look into this and other stuff with the migration in next week or so - maybe we'll make some time to talk it through during BOSC. I don't know yet when I'll actually have time to think about it properly. I am still worried about doing https because of the current system we have supporting user logins and that we didn't want to run a web server on the main repository machine and we'll have to install DAV on the main repository machine. if ssh+svn is going to be sufficient hurdle for people, note it was already a hurdle for them with CVS, but we'll have to think a bit more on it. We might be able to do some sort of NFS (or other exported FS) but exported to the webserver machine but that is may be a recipe for disaster. 
-jason On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > Chris Fields writes: >> [...] >> We prob. need to schedule a specific day/time when the switchover >> would take place so we can announce (so everyone knows and no one can >> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >> some tools a while ago... > > I haven't done anything about it. > > I think that we also need to have some input from the admin/support > folk about access methods (https, etc...). > > Are we going to want to mirror the repository anywhere? > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Fri Jul 13 18:29:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 13:29:22 -0500 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu> I don't think there's a huge rush on this since BOSC is imminent. If devs really want https then we can try adding it after migration, but if it becomes too much of a headache (particularly for the web admins) I wouldn't worry about it. chris On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. 
> > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > > We might be able to do some sort of NFS (or other exported FS) but > exported to the webserver machine but that is may be a recipe for > disaster. > > -jason > On Jul 10, 2007, at 4:11 PM, George Hartzell wrote: > >> Chris Fields writes: >>> [...] >>> We prob. need to schedule a specific day/time when the switchover >>> would take place so we can announce (so everyone knows and no one >>> can >>> gripe). Did we ever resolve the svn->cvs issue? Jason pointed out >>> some tools a while ago... >> >> I haven't done anything about it. >> >> I think that we also need to have some input from the admin/support >> folk about access methods (https, etc...). >> >> Are we going to want to mirror the repository anywhere? >> >> g. >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sheris at eps.berkeley.edu Fri Jul 13 18:42:32 2007 From: sheris at eps.berkeley.edu (Sheri Simmons) Date: Fri, 13 Jul 2007 11:42:32 -0700 Subject: [Bioperl-l] Problem with Bio::PopGen::Individual Message-ID: <200707131142.32366.sheris@eps.berkeley.edu> Hi, I have a collection of sequencing reads aligned with a consensus sequence that I input into a Bio::PopGen::Population object in order to calculate allele frequencies. The consensus sequence is included to force clustalw to give a better alignment. However, I need to remove the consensus sequence before calculating allele frequencies in the individual reads. I'm having trouble with this part of it. I get the following error message: "Can't locate object method "person_id" via package "Bio::PopGen::Individual" at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49." Here is the code snippet producing the error. $pop is a Bio::PopGen::Population object. my @consensus = "gene_consensus"; $pop->remove_Individuals(@consensus); I also tried: my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); $pop->remove_Individuals(@consensus); which produced the same error. Can anyone send me in the right direction? I suspect this is a simple problem. Sheri -- Sheri Simmons Department of Earth and Planetary Sciences University of California, Berkeley Berkeley, CA 94720-4767 From jason at bioperl.org Fri Jul 13 20:17:31 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 16:17:31 -0400 Subject: [Bioperl-l] Problem with Bio::PopGen::Individual In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu> References: <200707131142.32366.sheris@eps.berkeley.edu> Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org> Hi Sheri - Shoot - that was my fault - bug in the code where I was only using "Person" not Individuals for the code when I was testing. 
I've committed a bugfix to CVS - do you need me to send you the updated file, or are you comfortable grabbing the code from CVS or http://code.open-bio.org ?

This is the change - you may have a different version of BioPerl than what is in CVS, so you may have to make the change on line 260 rather than 282 -- or you can upgrade to the latest code via CVS (although this is probably harder for you since you've got stuff installed in /usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
<       unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
>       unshift @tosplice, $i if( $namehash{$ind->unique_id} );

-jason

On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus sequence that
> I input into a Bio::PopGen::Population object in order to calculate allele
> frequencies. The consensus sequence is included to force clustalw to give a
> better alignment. However, I need to remove the consensus sequence before
> calculating allele frequencies in the individual reads. I'm having trouble
> with this part of it. I get the following error message:
>
> "Can't locate object method "person_id" via package "Bio::PopGen::Individual"
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, line 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> my @consensus = "gene_consensus";
> $pop->remove_Individuals(@consensus);
>
> I also tried:
> my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus");
> $pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right direction? I
> suspect this is a simple problem.
> > Sheri > > -- > Sheri Simmons > Department of Earth and Planetary Sciences > University of California, Berkeley > Berkeley, CA 94720-4767 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From hartzell at alerce.com Fri Jul 13 20:34:14 2007 From: hartzell at alerce.com (George Hartzell) Date: Fri, 13 Jul 2007 16:34:14 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> Message-ID: <18071.57798.130368.703488@almost.alerce.com> Jason Stajich writes: > I'll try and look into this and other stuff with the migration in > next week or so - maybe we'll make some time to talk it through > during BOSC. I don't know yet when I'll actually have time to think > about it properly. > > I am still worried about doing https because of the current system we > have supporting user logins and that we didn't want to run a web > server on the main repository machine and we'll have to install DAV > on the main repository machine. if ssh+svn is going to be sufficient > hurdle for people, note it was already a hurdle for them with CVS, > but we'll have to think a bit more on it. > [...] How are you thinking about providing anonymous readonly non-dev access to the repository? svn+ssh using an anonymous/guest account (can it be screwed down tightly enough?) svn-mirror the repo onto the public machine and do DAV there w/out having to worry about authenticating the devs? g. 
From jason at bioperl.org Fri Jul 13 21:33:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Jul 2007 17:33:29 -0400 Subject: [Bioperl-l] Take 2 of the new subversion repository. In-Reply-To: <18071.57798.130368.703488@almost.alerce.com> References: <18054.63942.316904.413911@almost.alerce.com> <18067.47319.254632.538811@almost.alerce.com> <4693BE75.4090005@sendu.me.uk> <18067.59384.247108.463648@almost.alerce.com> <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org> <18071.57798.130368.703488@almost.alerce.com> Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org> On Jul 13, 2007, at 4:34 PM, George Hartzell wrote: > Jason Stajich writes: >> I'll try and look into this and other stuff with the migration in >> next week or so - maybe we'll make some time to talk it through >> during BOSC. I don't know yet when I'll actually have time to think >> about it properly. >> >> I am still worried about doing https because of the current system we >> have supporting user logins and that we didn't want to run a web >> server on the main repository machine and we'll have to install DAV >> on the main repository machine. if ssh+svn is going to be sufficient >> hurdle for people, note it was already a hurdle for them with CVS, >> but we'll have to think a bit more on it. >> [...] > > How are you thinking about providing anonymous readonly non-dev access > to the repository? svn+ssh using an anonymous/guest account (can it > be screwed down tightly enough?) svn-mirror the repo onto the public > machine and do DAV there w/out having to worry about authenticating > the devs? > We'll do svn on the public anonymous machine like we already do with CVS and with SVN See: http://code.open-bio.org AND http://code.open-bio.org/svnweb/ See blipkit. -jason > g. 
> >
--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From scrosson at uchicago.edu Fri Jul 13 22:15:30 2007
From: scrosson at uchicago.edu (Sean Crosson)
Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC)
Subject: [Bioperl-l] ace to fasta conversion
Message-ID:

I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta and it works great. We're now trying to convert a big (250 MB) .ace file to fasta. The documentation suggests I can do this, but every time I run the script below, it outputs an empty .fas file. Does anyone have any suggestions on how to make this script work? Does SeqIO really convert between these file types? Thanks for your help.

#!/usr/bin/perl -w
use Bio::SeqIO;

$in  = Bio::SeqIO->new(-file => "454Contigs.ace", -format => 'ace');
$out = Bio::SeqIO->new(-file => ">454Contigs.fas", -format => 'fasta');
while ( $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}

From cvillamar at gmail.com Fri Jul 13 23:24:04 2007
From: cvillamar at gmail.com (Carlos Villacorta)
Date: Fri, 13 Jul 2007 16:24:04 -0700
Subject: [Bioperl-l] beginner problem with fasta headers
Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>

hi all,
I have an EMBL sequence file. When I convert it to fasta with SeqIO, it gives a long header string for each sequence that my downstream phylogenetic software cannot handle...
Does anyone know how to format those EMBL or GenBank files to fasta while keeping just two or three fields in the headers (e.g. id | gene | sp_name)?
Any advice on this problem would be much appreciated, thanks!

From j_martin at lbl.gov Sat Jul 14 00:05:45 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 13 Jul 2007 17:05:45 -0700
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To:
References:
Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org>

Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all.
You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote: > I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta > and it works great. We're now trying to convert a big (250 MB) .ace file to > fasta. The documentation suggests I can do this, but everytime I run the script > below, it outputs an empty .fas file. Does anyone have any suggestions on how > to make this script work? Does SeqIO really convert between these file types? > Thanks for your help. > > #!/usr/bin/perl -w > > use Bio::SeqIO; > > > $in = Bio::SeqIO->new(-file => "454Contigs.ace", > -format => 'ace'); > $out = Bio::SeqIO->new(-file => ">454Contigs.fas", > -format => 'fasta'); > while ( $seq = $in->next_seq() ) {$out->write_seq($seq); } > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Sat Jul 14 04:06:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 13 Jul 2007 23:06:27 -0500 Subject: [Bioperl-l] beginner problem with fasta headers In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com> Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu> Some reading material... http://www.bioperl.org/wiki/ FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files http://www.bioperl.org/wiki/ FAQ#I_would_like_to_make_my_own_custom_fasta_header_- _how_do_I_do_this.3F http://www.bioperl.org/wiki/FASTA_sequence_format#Note Quiz on Monday! chris On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote: > hi all, > I have a embl sequence file, when formatting to fasta with Seqio it > gives a long string header for each sequence that my following > phylogenetic software cannot handle... 
> Does anyone knows how to format those embl or genbank files to fasta > but retrieving in the headers just two or three fields (e.g. id | gene > | sp_name)? > Any advice with this problem would be very appreciated, thanks! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scrosson at uchicago.edu Sat Jul 14 03:43:59 2007 From: scrosson at uchicago.edu (scrosson) Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT) Subject: [Bioperl-l] ace to fasta conversion In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org> References: <20070714000544.GB29841@eniac.jgi-psf.org> Message-ID: <11590811.post@talk.nabble.com> This problem now makes sense. I've been playing with Bio::Assembly::IO, which does indeed read phrap .ace files. Does anyone have an idea how to pull the assembled contigs out of a Bio::Assembly object and write them out as multi-fasta (or strings for that matter)? None of our workstations are running phrap/consed and I'd love to see these contigs. Sean Hello, the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use is a phrap/consed ace file. They aren't related at all. You might try poking around in Bio::AssemblyIO which should read assembly ace files. Joel -- View this message in context: http://www.nabble.com/ace-to-fasta-conversion-tf4077370.html#a11590811 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bioperlanand at yahoo.com Sat Jul 14 17:55:53 2007 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT) Subject: [Bioperl-l] a question on obtain PDB records using bioperl Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com> Hi everybody, Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to Bio:Perl methods to retrieve EMBL or GenBank records. Thanks in advance, Anand --------------------------------- Moody friends. 
Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games. From johnsonm at gmail.com Tue Jul 17 18:23:58 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Tue, 17 Jul 2007 13:23:58 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? Message-ID: I'm tinkering with parsing iprscan reports with BioPerl. I noticed that this: my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro'); while (my $seq = $seqio->next_seq()) { ... } Does not work unless I first 'use XML::DOM::XPath'. I get this error: Can't locate object method "findnodes" via package "XML::DOM::Document" at bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line 30. I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to suck in XML::DOM::Xpath. I see that t/interpro.t requires XML::DOM::XPath: test_begin(-tests => 17, -requires_module => 'XML::DOM::XPath'); Is suppose the reason the test specs a require XML::DOM::XPath is so that tests can be skipped if XML::DOM::XPath is not available. Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? From sac at bioperl.org Tue Jul 17 19:49:32 2007 From: sac at bioperl.org (Steve Chervitz) Date: Tue, 17 Jul 2007 12:49:32 -0700 Subject: [Bioperl-l] Ohloh account for bioperl Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> I came across a web app that tracks various metrics for open source projects, noticed that bioperl wasn't listed, and added it: http://www.ohloh.net/projects/6685 Seems like an interesting resource that could help add some visibility. It creates metrics by directly processing the source code repository. I hooked it up to the CVS repos for bioperl-live, -db, -run, and -pipeline. It has yet to do its analysis at this point. Feel free to create Ohloh accounts for yourselves. 
When you add yourself as a contributor to Bioperl, you can indicate the username associated with your commits, but this requires that it first process the commit logs to figure out what the usernames are. You can still create an account, just update it later with your username. Steve From cjfields at uiuc.edu Tue Jul 17 21:04:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 17 Jul 2007 16:04:44 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? In-Reply-To: References: Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > I'm tinkering with parsing iprscan reports with BioPerl. I noticed > that this: > > my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > 'interpro'); > > while (my $seq = $seqio->next_seq()) { > ... > } > > Does not work unless I first 'use XML::DOM::XPath'. I get this error: > > Can't locate object method "findnodes" via package > "XML::DOM::Document" at > bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > 30. > > I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > suck in XML::DOM::Xpath. I see that t/interpro.t requires > XML::DOM::XPath: > > test_begin(-tests => 17, > -requires_module => 'XML::DOM::XPath'); > > Is suppose the reason the test specs a require XML::DOM::XPath is so > that tests can be skipped if XML::DOM::XPath is not available. > Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? You're right; I think tests passed b/c XML::DOM::XPath (if present), was eval'd as a required module. When I commented out the spot where it is eval'd in the test suite I can replicate this error. I have added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it passes fine. Thanks for the heads up! 
chris

From xianranli78 at yahoo.com.cn Wed Jul 18 05:55:19 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 18 Jul 2007 13:55:19 +0800
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file
Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>

Hi,

I want to extract some information from a gff3 file like:

12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative

The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?

Thanks for your help.

Xianran Li

From georg.otto at tuebingen.mpg.de Wed Jul 18 09:32:26 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 18 Jul 2007 11:32:26 +0200
Subject: [Bioperl-l] run megablast
Message-ID:

Hi,

is there a module to run megablast in a script (equivalent to ncbi blast in StandAloneBlast.pm)?

Cheers,
Georg

From jeevitesh at ibab.ac.in Wed Jul 18 10:03:24 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO NODE PAIRS OF A TREE. The Distance method of TreeIO in the Bioperl module gives the total distance. But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES as illustrated in the figure.

Suppose we have a tree

    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

For example, the shared path between AB and AC is 2, and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.
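The question above asks for the summed length of the branches that two leaf-to-leaf paths have in common. A minimal sketch in plain Perl, assuming the toy tree from the figure is stored as a child-to-parent hash and rooted arbitrarily at one of the two internal nodes. The names %parent, n1, n2, edges_to_root, path_edges and shared_distance are hypothetical and not BioPerl API; with Bio::Tree::Node objects the same upward walk could presumably be done with ancestor() and branch_length().

```perl
use strict;
use warnings;

# Hypothetical toy tree from the figure: child => [ parent, branch length ].
# n1 joins A and B, n2 joins C and D; the tree is rooted arbitrarily at n2.
my %parent = (
    A  => [ 'n1', 2 ],
    B  => [ 'n1', 2 ],
    n1 => [ 'n2', 6 ],
    C  => [ 'n2', 2 ],
    D  => [ 'n2', 2 ],
);

# Collect the edges from a node up to the root, keyed by the child end.
sub edges_to_root {
    my ($node) = @_;
    my %edges;
    while ( my $up = $parent{$node} ) {
        $edges{$node} = $up->[1];
        $node = $up->[0];
    }
    return \%edges;
}

# The path between two nodes is the set of edges that lie on exactly
# one of the two root paths (edges above the common ancestor cancel out).
sub path_edges {
    my ( $x, $y ) = @_;
    my ( $ex, $ey ) = ( edges_to_root($x), edges_to_root($y) );
    my %path;
    for my $e ( keys %$ex ) { $path{$e} = $ex->{$e} unless exists $ey->{$e} }
    for my $e ( keys %$ey ) { $path{$e} = $ey->{$e} unless exists $ex->{$e} }
    return \%path;
}

# Shared distance: summed length of the edges common to both paths.
sub shared_distance {
    my ( $pair1, $pair2 ) = @_;
    my $p1  = path_edges(@$pair1);
    my $p2  = path_edges(@$pair2);
    my $sum = 0;
    for my $e ( keys %$p1 ) { $sum += $p1->{$e} if exists $p2->{$e} }
    return $sum;
}

print shared_distance( [ 'A', 'B' ], [ 'A', 'C' ] ), "\n";    # prints 2
print shared_distance( [ 'A', 'C' ], [ 'B', 'D' ] ), "\n";    # prints 6
```

The choice of root does not change the result, because only edges below the common ancestor of each pair survive in path_edges.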
With Thanks & regards jeevitesh From cain.cshl at gmail.com Wed Jul 18 13:10:40 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 09:10:40 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> Message-ID: <1184764240.2570.31.camel@localhost.localdomain> Hi Xianran Li, Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing as Bio::DB::GFF3), then you can use the attributes method to get anything in the ninth column: my ($name) = $gene->attributes('Name'); The parenthesis are needed around $name because the attributes method returns a list and the parens capture the first item of the list into $name. Scott On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > Hi, > > I want to extract some infomation from the gff3 file like: > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ? > > Thanks for your help. > > > Xianran Li > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From johnsonm at gmail.com Wed Jul 18 20:53:00 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Wed, 18 Jul 2007 15:53:00 -0500 Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'? 
In-Reply-To: <469DB6C6.9010702@pasteur.fr> References: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu> <469DB6C6.9010702@pasteur.fr> Message-ID: The output from InterProScan, invoked thusly: iprscan -cli -seqtype p -i input_file -o output_file -format xml On 7/18/07, Emmanuel Quevillon wrote: > Hi guys, > > I read your email and I wondered which iprscan file you've > been talking about? Is it the file produced by InterProScan > or the file called match.xml representing the whole uniprot > database against InterPro? Reading the xml parser > implemented into Bio::SeqIO::interpro, I guess it is the > second one? > In such case, I just want to let you know that the xml > schema changed and the file name also. It is now called > match_complete.xml. > I attached the DTD to be able to see the new structure. > Here is an example of the new data representation. > > > crc64="F1DD0C1042811B48"> > name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D" > status="T" evd="HMMPfam"> > type="Domain" /> > > > dbname="PANTHER" status="T" evd="not_rel"> > > > > > As you can see some time there is no interpro info (no ipr > element). > > I think it would be good to change also the interpro parser ? > > Regards > > Emmanuel > > Chris Fields wrote: > > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote: > > > >> I'm tinkering with parsing iprscan reports with BioPerl. I noticed > >> that this: > >> > >> my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => > >> 'interpro'); > >> > >> while (my $seq = $seqio->next_seq()) { > >> ... > >> } > >> > >> Does not work unless I first 'use XML::DOM::XPath'. I get this error: > >> > >> Can't locate object method "findnodes" via package > >> "XML::DOM::Document" at > >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, line > >> 30. > >> > >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to > >> suck in XML::DOM::Xpath. 
I see that t/interpro.t requires > >> XML::DOM::XPath: > >> > >> test_begin(-tests => 17, > >> -requires_module => 'XML::DOM::XPath'); > >> > >> Is suppose the reason the test specs a require XML::DOM::XPath is so > >> that tests can be skipped if XML::DOM::XPath is not available. > >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though? > > > > You're right; I think tests passed b/c XML::DOM::XPath (if present), > > was eval'd as a required module. When I commented out the spot where > > it is eval'd in the test suite I can replicate this error. I have > > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it > > passes fine. > > > > Thanks for the heads up! > > > > chris > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cain.cshl at gmail.com Thu Jul 19 02:47:53 2007 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 18 Jul 2007 22:47:53 -0400 Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL> <1184764240.2570.31.camel@localhost.localdomain> <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL> Message-ID: <1184813273.2570.96.camel@localhost.localdomain> [Please always reply to the mailing list so that answers can archived] Yes, because commas are not allowed in GFF3 in an unescaped form. Essentially, you are doing this with your GFF3: Name=receptor kinase ORK10;Name= putative and when you do this: my ($name) = $gene->attributes('Name'); you are getting the first item in the list of names, and I suspect which one you get is random. To fix it, you need to replace the comma with %2C (the URL escape code for a comma). If you generated this GFF3, you will need to add a step to URI encode your attribute strings. 
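A minimal sketch of that encoding step, assuming the attribute values are available as plain strings before the GFF3 line is written. The helper name escape_gff3_value is hypothetical and not part of BioPerl; it percent-encodes the characters that have a reserved meaning in the ninth column (comma, semicolon, equals sign, ampersand), plus the percent sign, tab, and line-ending characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): percent-encode characters
# that are reserved inside a GFF3 column-9 attribute value.
sub escape_gff3_value {
    my ($value) = @_;
    # '%' is part of the character class, so literal percent signs are
    # encoded in the same single pass and nothing is encoded twice.
    $value =~ s/([%;=&,\t\r\n])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

my $name = 'receptor kinase ORK10, putative';
print 'Name=', escape_gff3_value($name), "\n";
# prints: Name=receptor kinase ORK10%2C putative
```

With the comma written as %2C, the Name attribute holds a single value instead of being split into two.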
If you got it from someone else, you should point out to them that their GFF is flawed. Scott On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote: > However, $name returns the string "putative" rather than "receptor kinase ORK10". Is there any particular reason? > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing > as Bio::DB::GFF3), then you can use the attributes method to get > anything in the ninth column: > > my ($name) = $gene->attributes('Name'); > > The parentheses are needed around $name because the attributes method > returns a list and the parens capture the first item of the list into > $name. > > Scott > > > On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote: > > Hi, > > > > I want to extract some information from the gff3 file like: > > > > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative > > > > The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10) ? > > > > Thanks for your help. > > > > > > Xianran Li > ----- Original Message ----- > From: "Scott Cain" > To: "Xianran Li" > Cc: > Sent: Wednesday, July 18, 2007 9:10 PM > Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From acutter at eeb.utoronto.ca Fri Jul 20 02:25:08 2007 From: acutter at eeb.utoronto.ca (Asher Cutter) Date: Thu, 19 Jul 2007 22:25:08 -0400 Subject: [Bioperl-l] tree comparisons with bioperl Message-ID: <46A01D04.5040209@eeb.utoronto.ca> I was reading over the functions for working with trees in bioperl. I am looking for something that will compare two topologies and report back if they are equivalent. i.e. something like: does ((a,(b,c)) == ((A,B),C) ? (in this case, no) But of course in reality they would be more complicated topologies. This would be useful for simulating random trees to compare with some given topology of interest. I saw the methods for testing for monophyly and paraphyly, but not much beyond that...perhaps I have missed something? Any suggestions? Thanks, Asher -- ___________________________________ Asher D. Cutter Assistant Professor Department of Ecology & Evolutionary Biology University of Toronto 25 Harbord St. Toronto, ON, M5S 3G5 tel: 416-978-4602 email: acutter at eeb.utoronto.ca http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130 ___________________________________ From jeevitesh at ibab.ac.in Fri Jul 20 04:25:22 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. 
and for AC and BD the shared path is 6 Any comment on this will be greatly appreciated. With Thanks & regards jeevitesh From n.haigh at sheffield.ac.uk Sun Jul 22 11:34:58 2007 From: n.haigh at sheffield.ac.uk (Nathan S Haigh) Date: Sun, 22 Jul 2007 12:34:58 +0100 Subject: [Bioperl-l] Ohloh account for bioperl In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com> Message-ID: <46A340E2.4040505@sheffield.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Steve Chervitz wrote: > I came across a web app that tracks various metrics for open source > projects, noticed that bioperl wasn't listed, and added it: > > http://www.ohloh.net/projects/6685 > > Seems like an interesting resource that could help add some > visibility. It creates metrics by directly processing the source code > repository. I hooked it up to the CVS repos for bioperl-live, -db, > -run, and -pipeline. It has yet to do its analysis at this point. > > Feel free to create Ohloh accounts for yourselves. When you add > yourself as a contributor to Bioperl, you can indicate the username > associated with your commits, but this requires that it first process > the commit logs to figure out what the usernames are. You can still > create an account, just update it later with your username. > > Steve > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Nice to see the graphs of number of commits each developer has made over the last 5 years and how new developers have arisen while those more "seasoned" developers can relax a little more -proof of an excellent open source project! 
Nath -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO 4JWvG5Gy+H/UqpeXYAcSCX0= =LrFt -----END PGP SIGNATURE----- From cjfields at uiuc.edu Mon Jul 23 03:53:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 22 Jul 2007 22:53:48 -0500 Subject: [Bioperl-l] run megablast In-Reply-To: References: Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> StandAloneBlast runs the megablast executable directly, though I think you can specify a MegaBlast search using blastall with the '-n' flag. We could probably add this functionality in fairly easily since SearchIO can parse megablast output; no one's had the need to code it yet. chris On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > > Hi, > > is there a module to run megablast in a script (equivalent to ncbi > blast in StandAloneBlast.pm)? > > Cheers, > > Georg > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jeevitesh at ibab.ac.in Mon Jul 23 10:34:36 2007 From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in) Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST) Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Hi Friends, We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF A TREE. The Distance method of TreeIO in Bioperl module gives the total distance. But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as illustrated in figure. Suppose we have a tree A C \ / \2 2/ \__________/ / 6 \ /2 2\ / \ B D The shared path between AB and AC is 2. and for AC and BD the shared path is 6. 
We need to find the shared distance as said above. Kindly helps us it will help our research a lot. With Thanks & regards jeevitesh From bix at sendu.me.uk Mon Jul 23 11:08:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 23 Jul 2007 12:08:23 +0100 Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in> Message-ID: <46A48C27.6060905@sendu.me.uk> jeevitesh at ibab.ac.in wrote: > Hi Friends, > > We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF > A TREE. Please stop sending this message. We heard you the first time. If no one answered, either no one knows the answer or no one understood you. > The Distance method of TreeIO in Bioperl module gives the total distance. > > But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as > illustrated > in figure. > > Suppose we have a tree > A C > \ / > \2 2/ > \__________/ > / 6 \ > /2 2\ > / \ > B D > > The shared path between AB and AC is 2. > and for AC and BD the shared path is 6. I don't follow. But if you already know how to work the answer out, describe the algorithm in words and maybe someone can code it up for you. From georg.otto at tuebingen.mpg.de Mon Jul 23 13:56:46 2007 From: georg.otto at tuebingen.mpg.de (Georg Otto) Date: Mon, 23 Jul 2007 15:56:46 +0200 Subject: [Bioperl-l] run megablast References: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu> Message-ID: Thanks a lot! I guess I should have read the blast documentation more carefully.... Best, Georg Chris Fields writes: > StandAloneBlast runs the megablast executable directly, though I > think you can specify a MegaBlast search using blastall with the '-n' > flag. > > We could probably add this functionality in fairly easily since > SearchIO can parse megablast output; no one's had the need to code it > yet. 
> > chris > > On Jul 18, 2007, at 4:32 AM, Georg Otto wrote: > >> >> Hi, >> >> is there a module to run megablast in a script (equivalent to ncbi >> blast in StandAloneBlast.pm)? >> >> Cheers, >> >> Georg >> From cjfields at uiuc.edu Mon Jul 23 15:41:35 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 23 Jul 2007 10:41:35 -0500 Subject: [Bioperl-l] Bio::Assembly bug/feature? Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu> To all: I think I have found a major problem with Bio::Assembly; this was first noticed on Mac OS X in relation to bug 2320 and Bio::Assembly::IO. I am uncertain whether this is meant to be a feature or a bug but it certainly needs to be documented or fixed as it leads to subtle errors. I also can't see the advantage of this approach, but maybe I can be enlightened? Either way, I think it's worth a discussion for those willing to follow. I'll add as a bug later if needed. A bit of background: each instance of a Bio::Assembly::Contig has a Bio::SeqFeature::Collection instance attached to it; each Bio::SeqFeature::Collection itself has a tied DB_File handle attached which remains open during the lifetime of the Bio::SF::Collection object. When using Bio::Assembly one adds the various Contig objects to a Bio::Assembly::Scaffold. So, for instance, if one had ~1000 Contigs in a Scaffold, one would also have ~1000 open tied db handles, one per Contig instance. So far, so good. Unfortunately, when adding a ton of Contig objects to a Bio::Assembly::Scaffold one can run into a host of system-dependent issues based on resource usage limits (as one might expect). 
This script: ------------------------------ use Bio::Assembly::Scaffold; use Bio::Assembly::Contig; use Bio::SeqFeature::Generic; my $scaffold = Bio::Assembly::Scaffold->new(); for my $id (1..15000) { print "Contig #$id\n"; my $contig = Bio::Assembly::Contig->new(-id => $id); my $feat = Bio::SeqFeature::Generic->new(-start=>1, -end=>10, -strand=>1); $contig->add_features([$feat]); $scaffold->add_contig($contig); } ------------------------------ may fail on Mac OS X when one reaches the maximum number of open file descriptors possible on Mac OS X (on UNIX'y systems, this is 'ulimit - n'); the call to tie the DB_File handle in SF::Collection fails silently, so later on when called on you get the following: ... Contig #251 Contig #252 Contig #253 Contig #254 Can't call method "put" on an undefined value at /Users/cjfields/src/ bioperl-live/Bio/SeqFeature/Collection.pm line 225. I have added an exception to catch this. On Mac OS X you can increase the file descriptor limit using ulimit, at least to a certain point. However, when testing this out on dev.open-bio.org (Linux) the 'tie' sometimes fails (and the exception pops up), but it isn't dependent on 'ulimit -n'. This is what happens more often: ... Contig #10567 Contig #10568 Contig #10569 Contig #10570 Out of memory! Sometimes followed by a seg fault. Ick! Any ideas? For instance, should we set this up so that one SF::Collection is used for all the Contigs (since each one has a unique ID anyway)? Leave as is and document/track the issue as a bug? Both? chris From ba6450 at wayne.edu Mon Jul 23 20:06:14 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT) Subject: [Bioperl-l] error running codeml Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu> Hello everyone: I am new to bioperl. I am running perl in Eclipse in Windows. 
Here is the code: [code] use Bio::Tools::Run::Phylo::PAML::Codeml; use Bio::AlignIO; use Bio::TreeIO; my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'NM_000034.CDSalign.paml'); my $aln = $alignio->next_aln; my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt'); my $tree = $treeio->next_tree; my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(); $codeml->alignment($aln); $codeml->tree($tree); my ($rc,$parser) = $codeml->run(); my $result = $parser->next_result; my $MLmatrix = $result->get_MLmatrix(); print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n"; print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n"; print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n"; [/code] It gives the following error when I try to compile: [error] ------------ EXCEPTION: Bio::Root::Exception ------------- MSG: unable to find or run executable for 'codeml' STACK: Error::throw STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572 ----------------------------------------------------------- Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898 [/error] Any idea, guys? Munirul Islam Phd Student Computer Science Wayne State University From arareko at campus.iztacala.unam.mx Mon Jul 23 21:19:24 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Mon, 23 Jul 2007 16:19:24 -0500 Subject: [Bioperl-l] error running codeml In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu> References: <20070723160614.EEU90041@mirapointms6.wayne.edu> Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx> Apparently, your script isn't able to locate the codeml executable in your Windows environment. Do you have the PAML package installed? Instructions on how to install it are located here: http://abacus.gene.ucl.ac.uk/software/paml.html Regards, Mauricio. 
Munirul Islam wrote: > Hello everyone: > > I am new to bioperl. I am running perl in Eclipse in Windows. Here is the code: > > [code] > use Bio::Tools::Run::Phylo::PAML::Codeml; > use Bio::AlignIO; > use Bio::TreeIO; > > my $alignio = Bio::AlignIO->new(-format => 'phylip', > -file => 'NM_000034.CDSalign.paml'); > > my $aln = $alignio->next_aln; > > my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt'); > my $tree = $treeio->next_tree; > > my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(); > > $codeml->alignment($aln); > $codeml->tree($tree); > > my ($rc,$parser) = $codeml->run(); > my $result = $parser->next_result; > my $MLmatrix = $result->get_MLmatrix(); > print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n"; > print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n"; > print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n"; > [/code] > > It gives the following error when I try to compile: > > [error] > ------------ EXCEPTION: Bio::Root::Exception ------------- > MSG: unable to find or run executable for 'codeml' > STACK: Error::throw > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572 > ----------------------------------------------------------- > Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898 > [/error] > > Any idea, guys? 
> > Munirul Islam > Phd Student > Computer Science > Wayne State University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From ba6450 at wayne.edu Mon Jul 23 23:53:22 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT) Subject: [Bioperl-l] error running codeml Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu> Thanks Mauricio. I needed to add an environment variable for the paml directory. $ENV{'PAMLDIR'} = 'c:\paml3.15\bin'; One question ... I would like to save the temp files. So, what modification do I need to make such that $obj->save_tempfiles returns 1 within codeml.pm? Regards Munir ---- Original message ---- >Date: Mon, 23 Jul 2007 16:19:24 -0500 >From: Mauricio Herrera Cuadra >Subject: Re: [Bioperl-l] error running codeml >To: Munirul Islam >Cc: bioperl-l at lists.open-bio.org > >Apparently, your script isn't able to locate the codeml executable in >your Windows environment. Do you have the PAML package installed? >Instructions on how to install it are located here: > >http://abacus.gene.ucl.ac.uk/software/paml.html > >Regards, >Mauricio. > >Munirul Islam wrote: >> Hello everyone: >> >> I am new to bioperl. I am running perl in Eclipse in Windows.
Here is the code: >> >> [code] >> use Bio::Tools::Run::Phylo::PAML::Codeml; >> use Bio::AlignIO; >> use Bio::TreeIO; >> >> my $alignio = Bio::AlignIO->new(-format => 'phylip', >> -file => 'NM_000034.CDSalign.paml'); >> >> my $aln = $alignio->next_aln; >> >> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt'); >> my $tree = $treeio->next_tree; >> >> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(); >> >> $codeml->alignment($aln); >> $codeml->tree($tree); >> >> my ($rc,$parser) = $codeml->run(); >> my $result = $parser->next_result; >> my $MLmatrix = $result->get_MLmatrix(); >> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n"; >> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n"; >> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n"; >> [/code] >> >> It gives the following error when I try to compile: >> >> [error] >> ------------ EXCEPTION: Bio::Root::Exception ------------- >> MSG: unable to find or run executable for 'codeml' >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572 >> ----------------------------------------------------------- >> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898 >> [/error] >> >> Any idea, guys? 
>> >> Munirul Islam >> Phd Student >> Computer Science >> Wayne State University >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >-- >MAURICIO HERRERA CUADRA >arareko at campus.iztacala.unam.mx >Laboratorio de Genética >Unidad de Morfofisiología y Función >Facultad de Estudios Superiores Iztacala, UNAM > From jason at bioperl.org Tue Jul 24 07:19:18 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 24 Jul 2007 09:19:18 +0200 Subject: [Bioperl-l] error running codeml In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx> References: <20070723160614.EEU90041@mirapointms6.wayne.edu> <46A51B5C.9080808@campus.iztacala.unam.mx> Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com> When you initialize the Codeml object, just pass in: my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1); OR do: $codeml->save_tempfiles(1); You may want to set your TEMPDIR as well, and you can print out where the tempdir is located with: print $codeml->tempdir; I think you can also get the temp outfile: my $name = $codeml->outfile_name; print "name is $name\n"; -jason On 7/23/07, Mauricio Herrera Cuadra wrote: > > Apparently, your script isn't able to locate the codeml executable in > your Windows environment. Do you have the PAML package installed? > Instructions on how to install it are located here: > > http://abacus.gene.ucl.ac.uk/software/paml.html > > Regards, > Mauricio. > > > Munirul Islam wrote: > > Hello everyone: > > > > I am new to bioperl. I am running perl in Eclipse in Windows.
Here is > the code: > > > > [code] > > use Bio::Tools::Run::Phylo::PAML::Codeml; > > use Bio::AlignIO; > > use Bio::TreeIO; > > > > my $alignio = Bio::AlignIO->new(-format => 'phylip', > > -file => 'NM_000034.CDSalign.paml'); > > > > my $aln = $alignio->next_aln; > > > > my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt'); > > my $tree = $treeio->next_tree; > > > > my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(); > > > > $codeml->alignment($aln); > > $codeml->tree($tree); > > > > my ($rc,$parser) = $codeml->run(); > > my $result = $parser->next_result; > > my $MLmatrix = $result->get_MLmatrix(); > > print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n"; > > print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n"; > > print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n"; > > [/code] > > > > It gives the following error when I try to compile: > > > > [error] > > ------------ EXCEPTION: Bio::Root::Exception ------------- > > MSG: unable to find or run executable for 'codeml' > > STACK: Error::throw > > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > > STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572 > > ----------------------------------------------------------- > > Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI > (Permission denied) at C:/Perl/lib/File/Temp.pm line 898 > > [/error] > > > > Any idea, guys? 
> > > > Munirul Islam > > Phd Student > > Computer Science > > Wayne State University > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From ba6450 at wayne.edu Tue Jul 24 21:16:54 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT) Subject: [Bioperl-l] error loading sequence Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu> Hello everyone: I am having a problem loading a sequence file from within a directory. ############################################################# $dirname = "rundir"; opendir (DIR, $dirname) || die("can't open $dirname"); while (defined($file = readdir(DIR))) { next if $file =~ /^\.\.?$/; # skip . and .. $abs_path = File::Spec->rel2abs( $file ) ; # gives a file not found exception for the following code my $alignio = Bio::AlignIO->new(-format => 'nexus', -file => $abs_path); my $aln = $alignio->next_aln; @sequencenames -> $aln->_read_taxlabels; foreach $taxa (@sequencenames) { print $taxa . "\n"; } } ############################################################# Your suggestions please.
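A likely cause of the file-not-found exception in the loop above: readdir returns bare file names, and File::Spec->rel2abs resolves a relative name against the current working directory, not against the directory being read. The sketch below uses only core modules and builds the path with File::Spec->catfile instead. The temporary directory and the file name example.nex are stand-ins invented for this illustration so that the snippet runs on its own; the Bio::AlignIO call is left out.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Stand-in for "rundir": a scratch directory holding one placeholder file.
my $dirname = tempdir(CLEANUP => 1);
my $sample  = File::Spec->catfile($dirname, 'example.nex');
open(my $out, '>', $sample) or die "can't write $sample: $!";
print {$out} "#NEXUS\n";
close($out);

opendir(my $dh, $dirname) or die "can't open $dirname: $!";
my @paths;
while (defined(my $file = readdir($dh))) {
    next if $file =~ /^\.\.?$/;    # skip . and ..
    # Join the directory and the bare name returned by readdir.
    my $path = File::Spec->catfile($dirname, $file);
    push @paths, $path;
    print "$path found\n" if -e $path;
}
closedir($dh);
```

Each $path built this way can then be passed as the -file argument in place of $abs_path.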
Regards, Munirul Islam PhD Student Computer Science Wayne State University Detroit, Michigan, USA From bix at sendu.me.uk Tue Jul 24 22:39:33 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 24 Jul 2007 23:39:33 +0100 Subject: [Bioperl-l] error loading sequence In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu> References: <20070724171654.EEX04380@mirapointms6.wayne.edu> Message-ID: <46A67FA5.3070505@sendu.me.uk> Munirul Islam wrote: > Hello everyone: > > I am having problem loading a sequence file from within a directory. > > ############################################################# > $dirname = "rundir"; > opendir (DIR, $dirname) || die("can't open $dirname"); > > while (defined($file = readdir(DIR))) { > next if $file =~ /^\.\.?$/; # skip . and .. > $abs_path = File::Spec->rel2abs( $file ) ; > > # gives a file not found exception for the following code This isn't a Bioperl problem. You're using the wrong File::Spec method. You want File::Spec->catfile($dirname, $file). From ba6450 at wayne.edu Wed Jul 25 00:10:04 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT) Subject: [Bioperl-l] error loading sequence Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu> Thanks. That worked nicely. I need your suggestion to load codeml control data from a file. 
Consider the following code: ------------------------------------------------------------- my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1, -params => {'noisy' => 9, 'verbose' => 2, 'runmode' => 0, 'seqtype' => 1, 'CodonFreq' => 2, 'aaDist' => 0, 'model' => 2, 'NSsites' => 2, 'icode' => 0 }); ------------------------------------------------------------- Tried to modify it by passing a hash reference after loading data from a file.: ------------------------------------------------------------- my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1, -params => \%hashlist ); ------------------------------------------------------------- Still that didn't work. Your suggestions pls. Munir ---- Original message ---- >Date: Tue, 24 Jul 2007 23:39:33 +0100 >From: Sendu Bala >Subject: Re: [Bioperl-l] error loading sequence >To: Munirul Islam >Cc: bioperl-l at lists.open-bio.org > >Munirul Islam wrote: >> Hello everyone: >> >> I am having problem loading a sequence file from within a directory. >> >> ############################################################# >> $dirname = "rundir"; >> opendir (DIR, $dirname) || die("can't open $dirname"); >> >> while (defined($file = readdir(DIR))) { >> next if $file =~ /^\.\.?$/; # skip . and .. >> $abs_path = File::Spec->rel2abs( $file ) ; >> >> # gives a file not found exception for the following code > >This isn't a Bioperl problem. You're using the wrong File::Spec method. >You want File::Spec->catfile($dirname, $file). From ba6450 at wayne.edu Thu Jul 26 19:21:20 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT) Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl) Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu> Hello Everyone: I have an alignment ('seq.txt'). It runs fine when I directly run codeml. 
But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved. my $alignio = Bio::AlignIO->new(-format => 'phylip', -file => 'seq.txt'); I guess its not in valid phylip format. I tried to change 'seq.txt' to sequential format. Still that didn't work. Any suggestions on how to load 'seq.txt' in bioperl? Thanks, Munir PhD Student Computer Science Wayne State University -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: seq.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: seq.out Type: application/octet-stream Size: 24318 bytes Desc: not available URL: From jason at bioperl.org Fri Jul 27 00:12:03 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Jul 2007 17:12:03 -0700 Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl) In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu> References: <20070726152120.EFA94600@mirapointms6.wayne.edu> Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com> You can try and pass in -interleaved => 0 as another option when you init your AlignIO object. On 7/26/07, Munirul Islam wrote: > Hello Everyone: > > I have an alignment ('seq.txt'). It runs fine when I directly run codeml. But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved. > > my $alignio = Bio::AlignIO->new(-format => 'phylip', > -file => 'seq.txt'); > > I guess its not in valid phylip format. > > I tried to change 'seq.txt' to sequential format. Still that didn't work. > > Any suggestions on how to load 'seq.txt' in bioperl? 
> > Thanks, > > Munir > PhD Student > Computer Science > Wayne State University > > 11 2202 > > human > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC 
TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > chimp > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA 
GAT GGC GCG GCC TGC CAC > TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC > GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC > CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC > CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT > TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC > ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG > CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT > GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG --- > --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG > CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG > AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC 
TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > macaca > GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG > AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC > ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG > CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG > AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC > TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG > CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC > GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA > TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC > ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC > TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT > ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG > CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA > CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA 
--- CCG CAG CCC ACG GCT CCC CCG > ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC > ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG > CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC > GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG --- > --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG > CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT > GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG > CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG > AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG > GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC > ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT > mouse > GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC > ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG > CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA > AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA > GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC > TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG > GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC > TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA 
GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC > GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC > CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG > TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC > CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC > CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC > TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT > TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG > AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA > AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC > ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC > TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG > TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT --- > --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG > CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT > GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG > AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC > TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA 
GCC TAT --- TTC TGC CAT GGC AAA TTC > TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG > GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT > rat > GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG > AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC > ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG > CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA > AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA > GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC > TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT > GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC > TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG > CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC > TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC > GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC > CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA > TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT > CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT > CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC > GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC > TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA > GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT > TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG > CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA > AGG CCT CCA 
GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC > ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT > ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG > TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC > CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT --- > --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG > CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT > GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG > TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG > AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG > GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC > TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC > TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG > GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT > rabbit > GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG > AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC > ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG > CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC > CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG > ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG > GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC > TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC 
GAC GAA GAG CTG TGG TCC > CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC > CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG > TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC > CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC > GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC > TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA > GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC > TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG > CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT > --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG > ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT > ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG > TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA > GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG --- > --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG > CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG > GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC > AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG > GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC > ACA 
CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG > GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT > dog > GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG > AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC > ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG > CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG > GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG > GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC > TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG > ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT > GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC > TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT > CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC > TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT > GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC > CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG > TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC > CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC > CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC > ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC > TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT > TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG > CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT 
CCT CGC CCT GAA CCT GAG CCA > CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC > ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC > ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG > CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC > AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG --- > --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT > GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG > CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT > AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG > GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC > ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG > GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT > cow > GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA > CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC > ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG > CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG > AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG > GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC > CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG > ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT > GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC > TTT CCG CCT GGC AAA 
GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT > CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC > TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC > GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC > CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG > TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC > CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC > ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC > TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA > GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC > ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG > CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA > CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC > ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC > ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC > CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT > AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG --- > --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG > CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT > GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG > TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT > AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG > GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG 
TTC CCC GGG GTG CCC ATT AGC > ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC > TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG > GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT > elephant > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- > --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC > ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA > AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG > GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG > GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC > TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG > ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG > GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC > TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG > TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC > TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC > GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC > CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG > TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC > CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC > CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN 
NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN --- > --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- --- > --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN > NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- --- > --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN --- > --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN > NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN > NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG > GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC > ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC > TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG > GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT > opossum > GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA > --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC > ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA > AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC > GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG > GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC > CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG > ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG > ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT 
GGA CTC TTG GCT CAC GCT > TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT > CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC > TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC > GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC > CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC > TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC > CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC > CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC > ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC > TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA > GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC > TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG > CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG > GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC > AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC > ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC > ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG > CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT > CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC --- > --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG > CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA > GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG > CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC > AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA > GAC 
TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC > ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC > TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG > GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- --- > chicken > GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG > --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC > ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG > CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG > GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG > GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC > CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC > ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC > AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC > TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT > CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC > TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT > GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC > CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC > TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT > CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC > CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC > ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC > TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA > GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC > TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC 
AGC TAC GTC CAG GAC TTC CAG CTG > CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC > ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG --- > --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- --- > --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG > GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- --- > CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC > AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC > TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC > CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG > GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG > TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC > AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG > GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC > GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC > TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG > GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From ba6450 at wayne.edu Fri Jul 27 01:20:11 2007 From: ba6450 at wayne.edu (Munirul Islam) Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT) Subject: [Bioperl-l] Finding the Sequence List in an Alignment Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu> Thanks. The error is removed now. I have a question. Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file? 
Munir ---- Original message ---- >Date: Thu, 26 Jul 2007 17:12:03 -0700 >From: "Jason Stajich" >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl) >To: "Munirul Islam" >Cc: bioperl-l at lists.open-bio.org > >You can try and pass in -interleaved => 0 as another option when you >init your AlignIO object. > From jason at bioperl.org Fri Jul 27 04:28:36 2007 From: jason at bioperl.org (Jason Stajich) Date: Thu, 26 Jul 2007 21:28:36 -0700 Subject: [Bioperl-l] Finding the Sequence List in an Alignment In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu> References: <20070726212011.EFB49252@mirapointms6.wayne.edu> Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com> Have you tried reading the documentation for the Bio::SimpleAlign object? for my $seq ( $aln->each_seq ) { print $seq->display_id, "\n"; } I'd appreciate if you added some of your questions with the answers to the FAQ or to other places on the wiki so that other people can benefit from your learning here. On 7/26/07, Munirul Islam wrote: > > Thanks. The error is removed now. > > I have a question. Is there any function that I can use to get the > sequence list (human, chimp, etc.) after loading an alignment from file? > > Munir > > ---- Original message ---- > >Date: Thu, 26 Jul 2007 17:12:03 -0700 > >From: "Jason Stajich" > >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in > bioperl) > >To: "Munirul Islam" > >Cc: bioperl-l at lists.open-bio.org > > > >You can try and pass in -interleaved => 0 as another option when you > >init your AlignIO object. 
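A minimal sketch of the each_seq loop suggested above, wrapped in a small helper. The helper name sequence_names and the three short sequences are illustrative assumptions, not data from this thread; the alignment is built in memory only so that the snippet runs without an input file. It assumes BioPerl (Bio::SimpleAlign and Bio::LocatableSeq) is installed.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;

# Hypothetical helper: return the display IDs of all sequences in an
# alignment, in the order each_seq reports them.
sub sequence_names {
    my ($aln) = @_;
    return map { $_->display_id } $aln->each_seq;
}

# Made-up toy alignment (assumption: these are not the sequences from the post).
my @records = (
    [ human   => 'ATGGCC---' ],
    [ chimp   => 'ATGGCCAAA' ],
    [ chicken => 'ATG---AAA' ],
);

my $aln = Bio::SimpleAlign->new;
for my $rec (@records) {
    my ($id, $gapped) = @$rec;

    # The end coordinate counts residues only, so strip the gap characters.
    (my $ungapped = $gapped) =~ tr/-//d;

    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id       => $id,
            -seq      => $gapped,
            -start    => 1,
            -end      => length $ungapped,
            -alphabet => 'dna',
        )
    );
}

print "$_\n" for sequence_names($aln);
```

The same helper should work unchanged on an alignment object returned by next_aln from a Bio::AlignIO reader, which is the case asked about in the thread.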
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jason Stajich jason at bioperl.org http://bioperl.org/wiki/User:Jason From arareko at campus.iztacala.unam.mx Fri Jul 27 15:18:55 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 27 Jul 2007 10:18:55 -0500 Subject: [Bioperl-l] Perl Survey 2007 Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx> It really takes about 5 minutes: http://perlsurvey.org/ Cheers, Mauricio. -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From dhoworth at mrc-lmb.cam.ac.uk Fri Jul 27 16:07:17 2007 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Fri, 27 Jul 2007 17:07:17 +0100 Subject: [Bioperl-l] Perl Survey 2007 In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx> References: <46AA0CDF.1030503@campus.iztacala.unam.mx> Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk> Mauricio Herrera Cuadra wrote: > It really takes about 5 minutes: > http://perlsurvey.org/ and gives all your personal information including email address to anybody who cares to snoop the HTTP POST message! So there's definitely no anonymity. Cheers, Dave From spiros at lokku.com Fri Jul 27 16:38:57 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Fri, 27 Jul 2007 17:38:57 +0100 Subject: [Bioperl-l] Perl Survey 2007 In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk> References: <46AA0CDF.1030503@campus.iztacala.unam.mx> <46AA1835.2020004@mrc-lmb.cam.ac.uk> Message-ID: On 7/27/07, Dave Howorth wrote: > Mauricio Herrera Cuadra wrote: > > It really takes about 5 minutes: > > http://perlsurvey.org/ > > and gives all your personal information including email address to > anybody who cares to snoop the HTTP POST message! So there's definitely > no anonymity.
Not to mention that it requires registration (?). Who is behind the survey? I am on a number of Perl and Perl-related lists and haven't seen it mentioned. Spiros From arareko at campus.iztacala.unam.mx Fri Jul 27 17:37:31 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Fri, 27 Jul 2007 12:37:31 -0500 Subject: [Bioperl-l] Perl Survey 2007 In-Reply-To: References: <46AA0CDF.1030503@campus.iztacala.unam.mx> <46AA1835.2020004@mrc-lmb.cam.ac.uk> Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx> Spiros Denaxas wrote: > On 7/27/07, Dave Howorth wrote: >> Mauricio Herrera Cuadra wrote: >>> It really takes about 5 minutes: >>> http://perlsurvey.org/ >> and gives all your personal information including email address to >> anybody who cares to snoop the HTTP POST message! So there's definitely >> no anonymity. I didn't provide any personal information other than my country and birth year. As for my email, I always use the one I have for all the SPAM I'd like to subscribe to :) > Not to mention that it requires registration (?). Who is behind the > survey? I am on a number of Perl and Perl-related lists and haven't > seen it mentioned. Registration is rather different from confirming your email (which prevents filling the DB multiple times by spambots/yourself, thus screwing the survey). As for who's behind it, its purpose, privacy, etc., please read the FAQ: http://perlsurvey.org/faq/ Cheers, Mauricio.
> Spiros > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From Alicia.Amadoz at uv.es Mon Jul 30 15:46:57 2007 From: Alicia.Amadoz at uv.es (Alicia Amadoz) Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST) Subject: [Bioperl-l] error using standaloneblast through webserver Message-ID: <1245168492amadoz@uv.es> Hi, i'm trying to run a bioperl script in linux with standaloneblast from a webserver but I have the following error: -------------------- WARNING --------------------- MSG: cannot find path to blastall --------------------------------------------------- I have tried several things to fix it, such as setting some environment variables both directly through the shell and by adding some code to my script: BEGIN { $ENV{PATH} .= ':/usr/local/blast-2.2.16'; $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; $ENV{BLASTDATADIR} = '/usr/local/data/'; } and with: $local->executable('/usr/local/bin'); my $blast_report = $local->blastall($inputfilename); I have also checked that the webserver has read and execute permission on all blast executables and directories. But after trying all of these things, it keeps showing the same error as above. Any other ideas for solving this problem? My script works well when I use it as a simple script, and I've rebooted the system several times after making changes. Thanks to anyone who will be able to help me!
Regards, Alicia From gyang at plantbio.uga.edu Mon Jul 30 20:58:51 2007 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Mon, 30 Jul 2007 16:58:51 -0400 Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu> I am running remoteblast and using readmethod "xml", I noticed that it is printing the output repeatedly nonstop. It's like in a loop. Did anybody notice this before? Can anybody help me getting out of this? Thanks a lot, Guojun Yang University of Georgia From grafman at graphcomp.com Sun Jul 29 21:08:04 2007 From: grafman at graphcomp.com (Grafman Productions) Date: Sun, 29 Jul 2007 14:08:04 -0700 Subject: [Bioperl-l] Perl 3D OpenGL Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE> If this posting is inappropriate, please let me know - my apologies. I recently came across an article on BioPerl, and it occurred to me that there might be some need for 3D rendering within your BioPerl project. I released a number of new/updated Perl OpenGL (POGL) modules this year, along with benchmarks that demonstrate that it performs comparably to C. If there's a need for 3D features within BioPerl, and if I can be of any assistance in helping to add such features, I would enjoy the opportunity. 
From torsten.seemann at infotech.monash.edu.au Mon Jul 30 23:27:46 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 31 Jul 2007 09:27:46 +1000 Subject: [Bioperl-l] error using standaloneblast through webserver In-Reply-To: <1245168492amadoz@uv.es> References: <1245168492amadoz@uv.es> Message-ID: Alicia, > Hi, i'm trying to run a bioperl script in linux with standaloneblast > from a webserver but I have the following error: > -------------------- WARNING --------------------- > MSG: cannot find path to blastall > --------------------------------------------------- > $ENV{BLASTDATADIR} = '/usr/local/data/'; > $ENV{PATH} .= ':/usr/local/blast-2.2.16'; > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; I think the last one (or two) paths should be '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard BLAST installation is where the 'blastall' binary actually lives. -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University From cjfields at uiuc.edu Tue Jul 31 00:53:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 30 Jul 2007 19:53:45 -0500 Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu> References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu> Message-ID: On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote: > I am running remoteblast and using readmethod "xml", I noticed that > it is printing the output repeatedly nonstop. It's like in a loop. > Did anybody notice this before? Can anybody help me getting out of > this? > Thanks a lot, > > > Guojun Yang > University of Georgia Not seeing that using bioperl-live; you may need to update RemoteBlast.pm as this sounds similar to an issue that popped up earlier in the spring. 
chris From torsten.seemann at infotech.monash.edu.au Tue Jul 31 06:24:34 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 31 Jul 2007 16:24:34 +1000 Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml In-Reply-To: References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu> Message-ID: > as this sounds similar to an issue that popped up > earlier in the spring. I could have sworn it was autumn! ;-) -- --Torsten Seemann --Victorian Bioinformatics Consortium, Monash University From Alicia.Amadoz at uv.es Tue Jul 31 10:11:54 2007 From: Alicia.Amadoz at uv.es (Alicia Amadoz) Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST) Subject: [Bioperl-l] error using standaloneblast through webserver In-Reply-To: References: Message-ID: <2361686267amadoz@uv.es> Hi, I tried what you suggested and that was it, it works perfectly. Thank you very much. Regards, Alicia > Alicia, > > > Hi, i'm trying to run a bioperl script in linux with standaloneblast > > from a webserver but I have the following error: > > -------------------- WARNING --------------------- > > MSG: cannot find path to blastall > > --------------------------------------------------- > > $ENV{BLASTDATADIR} = '/usr/local/data/'; > > $ENV{PATH} .= ':/usr/local/blast-2.2.16'; > > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; > > I think the last one (or two) paths should be > '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard > BLAST installation is where the 'blastall' binary actually lives. 
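The advice quoted above (point the paths at the bin/ subdirectory, where the blastall binary lives) can be checked before BLAST is ever invoked. What follows is only a sketch using core Perl modules: the helper name find_program_dir is hypothetical, the directory layout is the one mentioned in this thread, and it does not reproduce how the BioPerl BLAST wrapper itself searches for the program.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first directory in @dirs that contains an
# executable regular file named $program, or undef if none does.
sub find_program_dir {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $dir if -f $candidate && -x _;
    }
    return;
}

# Install root taken from the thread; adjust to the local installation.
my $blast_root = '/usr/local/blast-2.2.16';

# Try the bin/ subdirectory first, then the root itself.
my @candidates = (File::Spec->catdir($blast_root, 'bin'), $blast_root);

my $dir = find_program_dir('blastall', @candidates);
if (defined $dir) {
    # Same variables as in the original post, now pointing at the directory
    # that really holds the binary.
    $ENV{BLASTDIR} = $dir;
    $ENV{PATH}     = defined $ENV{PATH} ? "$dir:$ENV{PATH}" : $dir;
    print "blastall found in $dir\n";
}
else {
    warn "blastall not found in any of: @candidates\n";
}
```

In a real script this check would presumably need to sit inside the BEGIN block, as in the original post, so that the environment is set before the BLAST wrapper module is loaded. Because a webserver usually runs with a much smaller environment than a login shell, an explicit check like this also makes the failure easier to diagnose than the bare "cannot find path to blastall" warning.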
> > -- > --Torsten Seemann > --Victorian Bioinformatics Consortium, Monash University > > From jay at jays.net Tue Jul 31 12:00:56 2007 From: jay at jays.net (Jay Hannah) Date: Tue, 31 Jul 2007 07:00:56 -0500 Subject: [Bioperl-l] Perl 3D OpenGL In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE> References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE> Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote: > If this posting is inappropriate, please let me know - my apologies. Not at all. AFAIK this is the perfect place to discuss any contributions you're motivated to make to the BioPerl project. > I recently came across an article on BioPerl, and it occurred to me > that > there might be some need for 3D rendering within your BioPerl project. > > I released a number of new/updated Perl OpenGL (POGL) modules this > year, > along with benchmarks that demonstrate that it performs comparably > to C. > > If there's a need for 3D features within BioPerl, and if I can be > of any > assistance in helping to add such features, I would enjoy the > opportunity. I know nothing about 3D modeling in biology, nor do I hang out with any protein structure folks, but 3D always sounds sexy. -grin- If you're new to bioinformatics (I certainly am) you might want to read this: http://en.wikipedia.org/wiki/Protein_structure Because that's probably where your 3D work would be used. Especially note the "Software" section, where you'll find some of the "competition". :) There's some cool stuff out there. I don't know what all would or wouldn't be time well spent in Perl / BioPerl. 
HTH, Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From cjfields at uiuc.edu Tue Jul 31 16:51:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 31 Jul 2007 11:51:42 -0500 Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu> References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu> Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu> Make sure to keep responses on the mailing list. You might want to run a full install, just in case. If I remember correctly, Sendu made some changes a while back in the BLAST-related modules which may be related to this. At the very least install/upgrade all modules in Bio::Tools::Run. chris On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote: > Thanks, Chris, > But when I replaced the old RemoteBlast.pm with the new one, I got > "can't locate the object method "retrieve_parameter"". Does this > mean I need to install something else? > Guojun > > ----- Original Message ----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: gyang at plantbio.uga.edu > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast > with xml > > >>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote: >>>> I am running remoteblast and using readmethod "xml", I noticed that >>> it is printing the output repeatedly nonstop. It's like in a loop. >>> Did anybody notice this before? Can anybody help me getting out of >>> this? >>> Thanks a lot, >>> >>> >>> Guojun Yang >>> University of Georgia >>> Not seeing that using bioperl-live; you may need to update >> RemoteBlast.pm as this sounds similar to an issue that popped up >> earlier in the spring. >>> chris >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign