From bix at sendu.me.uk Fri Jun 1 04:06:04 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Jun 2007 09:06:04 +0100
Subject: [Bioperl-l] ClustalW Score?
In-Reply-To: <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu>
References: <00e201c7a2de$91f60f50$2d01a8c0@PICO> <465E9B58.1020403@sendu.me.uk> <49B6333A-18B9-4B63-80EF-81C57A295494@bioperl.org> <1A4207F8295607498283FE9E93B775B40334A01A@EX02.asurite.ad.asu.edu>
Message-ID: <465FD36C.5060603@sendu.me.uk>

Kevin Brown wrote:
>> you're right --- it is not really my code, I was just
>> elaborating Kevin's example --- it would probably need to be
>> more specific, or perhaps the last Score seen is sufficient
>> for what one is trying to capture?
>
> I took that code from a pairwise Clustal alignment script that I wrote
> to align a bunch of short sequences against a long one and see where
> they line up. When all of them were fed to Clustal together, the short
> sequences all ended up aligned to each other and not well aligned to
> the longer sequence. I only saw one score in the output from the
> pairwise run, so that is what I used to find a reasonable value.

OK, I've hedged my bets and used both. Now committed to CVS.

From jy at genseq.co.uk Fri Jun 1 22:39:48 2007
From: jy at genseq.co.uk (Jean-Yves Sireau)
Date: Sat, 2 Jun 2007 10:39:48 +0800
Subject: [Bioperl-l] Genseq
Message-ID: <20070602103948.093d713c@jys.my.regentmarkets.com>

Dear List members,

I would like to let you know of the formation of Genseq Ltd., a bioinformatics company that will (in time!) offer genome sequencing to high-net-worth individuals, together with bioinformatic analysis of the sequence data to detect predisposition to illness. The company's website is www.genseq.co.uk

Genseq would be willing to sponsor bioperl, whether financially or by providing resources, notably for any bioperl-related activities in the Asia Pacific region.
Genseq's bioinformatics team will be based in Cyberjaya (Malaysia), and we are particularly interested in promoting bioperl in Malaysia. We are also actively recruiting in Malaysia and India at the moment. If there were sufficient demand, we would be willing to organise a bioperl conference in Cyberjaya at the Cyberview Lodge (www.cyberview-lodge.com), which would be the ideal venue for such a conference in Malaysia.

Looking forward to your comments, suggestions and proposals.

Best regards
Jean-Yves Sireau

-- 
Jean-Yves Sireau
CEO, Genseq Ltd.
www.genseq.co.uk

From cjfields at uiuc.edu Sat Jun 2 01:16:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 2 Jun 2007 00:16:05 -0500
Subject: [Bioperl-l] EUtilities overhaul started

To anyone using Bio::DB::EUtilities,

I am in the midst of a major overhaul of the various EUtilities tools and of Bio::DB::GenericWebDBI (the latter of which I am turning into more or less a test bed for other database interfaces). I'm about 80% done at this point, and will likely start committing changes this coming week.

The overall interface will change (something I had warned about in the Bio::DB::EUtilities POD), but I am hoping it will be more intuitive and easier to use in the long run. I'll describe the overall redesign and its use in an upcoming HOWTO (as recommended by Brian a while back).

If anyone has any suggestions/ideas/flames, please let me know!

Cheers!

chris

From cjfields at uiuc.edu Sat Jun 2 10:39:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 2 Jun 2007 09:39:25 -0500
Subject: [Bioperl-l] EUtilities overhaul started

Yes, there are a few odd issues, though that's one I've not heard of yet. You might try one of the sub-nucleotide databases (nuccore, nucest, nucgss).

I'll try looking into it and (if necessary) pester NCBI about it. I'll pass this on to the mail list to see if anyone else knows about the problem.
chris

On Jun 2, 2007, at 8:28 AM, Bernd Brandt wrote:

> Hi Chris,
>
> Thanks for your work on EUtilities.
> For a production task, I used EUtilities directly (given your announced overhaul). I noticed a recent problem at NCBI (reported to NCBI two weeks ago, no reply yet). You may run into this while testing: if you EPost GI ids to the E-utilities server and then use this set in ESearch (via the query key), no results are returned for the nucleotide database.
> ESearches like "db=$db%23$QueryKey" typically fail if $db is nucleotide (but work if $db='protein'). The XML output has Count 0 and an empty QueryTranslationSet for db=nucleotide only.
> For completeness, I attach a simple test script I used.
>
> Best regards,
> Bernd

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Sun Jun 3 00:51:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 2 Jun 2007 23:51:57 -0500
Subject: [Bioperl-l] EUtilities overhaul started
Message-ID: <1A2AF5C4-6A58-4FDD-A4CA-6ABCE30F0D1B@uiuc.edu>

I can confirm this; however, it only relates to the use of the history with esearch and nucleotide (use of the history with other eutils seems to work fine). Retrieving sequences via efetch is not affected. If I find out anything more, I'll post it on the mail list.

chris

On Jun 2, 2007, at 11:48 AM, Bernd Brandt wrote:

> I can confirm that using the correct sub-nucleotide database works (nuccore in my case).
> This seems to be a quite recent change/bug at NCBI. Until recently, db=nucleotide worked. Moreover, EInfo still lists nucleotide as a valid db.
> It is not optimal to have to choose the sub-database, and the same searches do work via the Entrez web interface. Note that this problem is specific to ESearch with db=nucleotide.
>
> bernd
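A minimal sketch of the ESearch-with-history request discussed above, assuming the standard E-utilities parameters (db, term, WebEnv, usehistory) and that a '#' followed by a query key in term refers to a result set already stored on the history server (for example, one created by EPost). The helper names esearch_history_url and uri_escape_simple and the WebEnv placeholder are hypothetical, and the base URL is the one in use when this thread was written. The code only builds the URL; it does not contact NCBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything outside the unreserved URL characters;
# in particular '#' becomes %23, which ESearch reads as a history reference.
sub uri_escape_simple {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $str;
}

# Hypothetical helper: build an ESearch URL that re-queries a set stored
# on the NCBI history server (e.g. IDs uploaded earlier with EPost).
sub esearch_history_url {
    my (%args) = @_;
    my $base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    my %query = (
        db         => $args{db},
        term       => '#' . $args{query_key},   # "#1" = history result set 1
        WebEnv     => $args{webenv},
        usehistory => 'y',
    );
    # sorted keys give a stable parameter order
    my $qs = join '&',
        map { $_ . '=' . uri_escape_simple($query{$_}) } sort keys %query;
    return "$base?$qs";
}

# Workaround reported in this thread: 'nuccore' instead of 'nucleotide'
my $url = esearch_history_url(
    db        => 'nuccore',
    query_key => 1,
    webenv    => 'EXAMPLE_WEBENV',   # placeholder for the WebEnv EPost returned
);
print "$url\n";
```

Run as is, it prints http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?WebEnv=EXAMPLE_WEBENV&db=nuccore&term=%231&usehistory=y. Swapping db => 'nucleotide' back in reproduces the request shape Bernd reports as returning Count 0.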
From basu at pharm.stonybrook.edu Sun Jun 3 10:44:18 2007
From: basu at pharm.stonybrook.edu (Siddhartha Basu)
Date: Sun, 03 Jun 2007 10:44:18 -0400
Subject: [Bioperl-l] EUtilities overhaul started

Hi chris,

Being a frequent user of EUtilities, I hope this API facelift and the upcoming HOWTO will make it even more helpful. One thing I noticed is that for each eutil call, such as efetch, epost, esearch or esummary, a new Bio::DB::EUtilities object has to be instantiated, and its parameters cannot be changed afterwards at runtime, e.g. via $eutils->id('ids'). For example:

my $eutils = Bio::DB::EUtilities->new(-id    => $id,
                                      -eutil => 'esummary',
                                      -db    => 'protein');
my $ct = $eutils->get_response->content();

## -- now I cannot do this...
$eutils->id($newid);
$ct = $eutils->get_response->content();

Is the new API going to address something along these lines, or is there currently any way to reuse the object?

Thanks again for this nice toolkit.

-siddhartha
From cjfields at uiuc.edu Sun Jun 3 19:52:39 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Jun 2007 18:52:39 -0500
Subject: [Bioperl-l] EUtilities overhaul started
Message-ID: <5120BD7B-CA89-46E4-8D6B-6B24C1F93A5E@uiuc.edu>

On Jun 3, 2007, at 9:44 AM, Siddhartha Basu wrote:

> ...
> my $eutils = Bio::DB::EUtilities->new(-id    => $id,
>                                       -eutil => 'esummary',
>                                       -db    => 'protein');
> my $ct = $eutils->get_response->content();
>
> ## -- now I cannot do this...
> $eutils->id($newid);
> $ct = $eutils->get_response->content();

I'll have to check up on that, though changing id() should work with the old API. It won't matter with the new API (it works fine there), but it is still troubling...

> Is the new API going to address something along these lines, or is there currently any way to reuse the object?
> Thanks again for this nice toolkit.
>
> -siddhartha

The old API was based on the idea of creating a discrete user agent for each eutil to retrieve data. The problem with the old interface is that it attempts to do too much (handle parameters, set up requests, retrieve responses, parse data, etc.), and many tasks required instantiating a new EUtilities object. I was never really satisfied with it.
The new interface is a composition of three classes: the web user agent (LWP::UserAgent), a class encapsulating parameter handling, and a parser class (all of which can be used independently if needed). When parameters change, a new request is made lazily (i.e. only when needed). Similarly, when data is requested after any parameter change, a new parser instance is created and the new response is parsed. With that in mind, you can now do the following:

----------------------------------------
use strict;
use warnings;
use Bio::DB::EUtilities;

my @params = (-eutil  => 'esearch',
              -db     => 'protein',
              -term   => 'BRCA1',
              -retmax => 100);

my $eutil = Bio::DB::EUtilities->new(@params);

# no need to get the response first; get_ids() calls that if needed
my @ids = $eutil->get_ids;

# below changes only those parameters, leaving all others set as before
$eutil->set_parameters(-eutil   => 'efetch',
                       -id      => \@ids,
                       -retmode => 'text',
                       -rettype => 'fasta');

# sends streamed content directly to a file
$eutil->get_response(-content_file => 'seqs.fas');

# or to an LWP::UserAgent-supported request callback
$eutil->get_response(-content_cb => \&my_cb);

my @newparams = (-eutil  => 'esearch',
                 -db     => 'protein',
                 -term   => 'BRCA2',
                 -retmax => 100);

# resets the eutility to the passed parameters (or undef)
$eutil->reset_parameters(@newparams);

# retrieve new IDs
my @new_ids = $eutil->get_ids;

# example callback: LWP::UserAgent passes each chunk of data,
# the response object and the protocol object
sub my_cb {
    my ($data, $response, $protocol) = @_;
    print $data;
}
----------------------------------------

Note that the same eutil object is used for all of the above, so to answer your last question: yes, you should be able to create data pipelines using the same object if necessary.
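The lazy re-request behaviour described above (changing parameters only marks the object as stale, and the next data request triggers a fresh fetch) can be illustrated with a small, self-contained toy class. LazyQuery, its fetcher callback and the fake response strings are hypothetical stand-ins for the real user agent. The method names simply echo the ones in the example above; this is not Bio::DB::EUtilities' implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy class (hypothetical, not BioPerl code): parameter changes only mark
# the object stale; the fetch happens when a response is actually asked for.
package LazyQuery;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        fetcher  => $args{fetcher},   # code ref: takes params, returns response
        params   => {},
        stale    => 1,
        response => undef,
    }, $class;
    $self->set_parameters(%{ $args{params} || {} });
    return $self;
}

# Change only the given parameters; keep the rest as they were.
sub set_parameters {
    my ($self, %p) = @_;
    @{ $self->{params} }{ keys %p } = values %p;
    $self->{stale} = 1;
    return $self;
}

# Replace all parameters with the given ones.
sub reset_parameters {
    my ($self, %p) = @_;
    $self->{params} = {%p};
    $self->{stale}  = 1;
    return $self;
}

# Fetch only if something changed since the last fetch.
sub get_response {
    my ($self) = @_;
    if ($self->{stale}) {
        $self->{response} = $self->{fetcher}->({ %{ $self->{params} } });
        $self->{stale}    = 0;
    }
    return $self->{response};
}

package main;

my $calls = 0;
my $q = LazyQuery->new(
    params  => { eutil => 'esearch', db => 'protein', term => 'BRCA1' },
    fetcher => sub { $calls++; my $p = shift; return "$p->{eutil}:$p->{term}" },
);

print "first:  ", $q->get_response, "\n";   # triggers a fetch
print "second: ", $q->get_response, "\n";   # unchanged, cached response reused
$q->set_parameters(term => 'BRCA2');
print "third:  ", $q->get_response, "\n";   # parameter changed, fetches again
print "fetches: $calls\n";
```

The design choice this illustrates is that a caller can chain several parameter changes without paying for a network round trip after each one; only the read that follows them does any work.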
chris

From sac at bioperl.org Mon Jun 4 13:56:57 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Mon, 4 Jun 2007 10:56:57 -0700
Subject: [Bioperl-l] question about Bio::Restriction::Analysis
In-Reply-To: <3E4CBE0B-6EE4-4973-80DF-90C7E778DA83@cshl.edu>
References: <3E4CBE0B-6EE4-4973-80DF-90C7E778DA83@cshl.edu>
Message-ID: <8f200b4c0706041056o4dbaadfexddf9f82fc33c6da@mail.gmail.com>

Hi Apurva,

I'm cc:ing the list to let others know you have found performance issues with Bio::Restriction::Analysis. Ideally, we should focus on addressing those issues rather than fixing a module that is now deprecated.

That said, taking a quick look at my Bio::Tools::RestrictionEnzyme module, I'm not sure why HpaII would perform more slowly than other non-ambiguous cutters. This enzyme has the 4-base recognition sequence CCGG, so if you're feeding it a large, CG-rich input sequence, that could be a factor. To test this, you might try some other 4-base cutters that aren't CG-rich (TaqI, TasI), or try some other input sequences.

There is no special flag to indicate that the enzyme is non-ambiguous; the module handles that automatically.

Good luck,
Steve

On 6/4/07, Apurva Narechania wrote:
> Hi Rob and Steve,
>
> I was hoping you could answer a quick performance question regarding the Bio::Restriction::Analysis module. I have found that though this module works well, it is considerably slower than the deprecated Bio::Tools::RestrictionEnzyme. I see that there are two algorithms available to your module, and since I am using HpaII, a non-ambiguous enzyme, I thought I might see performance similar to the older, deprecated module, but I do not. Is it possible that I am not setting the non-ambiguous flag correctly? Does it need to be set in the first place?
>
> As for Bio::Tools::RestrictionEnzyme, though it is faster, I have found instances where it is inaccurate, especially in calculating extremely small fragments (1-5 base pairs), so I would like to use your module if possible. It just seems slow to me.
>
> Can you clarify?
>
> I have copied my code below since it is a short, simple script.
>
> Thanks!
> Apurva Narechania
> Ware Lab
> Cold Spring Harbor Labs
>
> ----------
>
> #!/usr/bin/perl
>
> # This program generates a fasta of restriction frags given an
> # input fasta and a restriction enzyme (here HpaII)
>
> use strict;
> use warnings;
> use Getopt::Std;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::Tools::RestrictionEnzyme;
>
> my %opts = ();
> getopts('f:', \%opts);
> my $fasta = $opts{'f'} or die "usage: $0 -f input.fasta\n";
>
> # read fasta file
> my $seqin = Bio::SeqIO->new(-format => 'Fasta', -file => $fasta);
>
> # generate the rx object once, outside the loop
> my $ra = Bio::Tools::RestrictionEnzyme->new(-NAME => 'HpaII');
>
> my $x = 0;
> while (my $sequence_obj = $seqin->next_seq()) {
>     $x++;
>     my $id = $sequence_obj->id();
>
>     print STDERR "$x Working on $id\n";
>
>     my @frags = $ra->cut_seq($sequence_obj);
>
>     my $counter = 0;
>     foreach my $frag (@frags) {
>         $counter++;
>         my $length = length($frag);
>         print ">$id.$counter length=$length\n$frag\n";
>     }
> }

From anhthu.tieu at gsf.de Tue Jun 5 04:14:09 2007
From: anhthu.tieu at gsf.de (Tieu, Anh-Thu)
Date: Tue, 5 Jun 2007 10:14:09 +0200
Subject: [Bioperl-l] problems with image maps and IE 6 or higher
Message-ID: <93739F94E0F3BA43AD72423E2482341A1435F6@sw-rz010.gsf.de>

Hi,

I have a problem using the bioperl image map function with Internet Explorer 6 or higher. It might be a more general problem with IE6 rather than with bioperl, but as I used bioperl to create my image maps, I thought I could still post the problem here and ask for people's opinions. Has anyone else faced the same problem, and if so, could you share your experiences and solutions?

[HTML snippet partly lost in the archive; surviving fragment: ... usemap="mapnameD064C01" style="border:2px solid #CCCCCC;"/>, with image-map labels scale, alignment5, integration_pt, gene, intron1]

>
> onclick="javascript:void(zmenu( 'scale' ));;return false;" title="scale " alt="scale " target="_blank"/>
> onclick="javascript:void(zmenu( 'alignment 5splk', '', 'seq_id: ', '', 'start: ', '', 'stop: 0', '', 'length: bp', '', 'identity: ', '', 'e-value: ' ));;return false;" title="alignment5 " alt="alignment5 " target="_blank"/>
> onclick="javascript:void(zmenu( 'alignment 5splk', '', 'seq_id: ', '', 'start: ', '', 'stop: 0', '', 'length: bp', '', 'identity: ', '', 'e-value: ' ));;return false;" title="integration_pt " alt="integration_pt " target="_blank"/>
> onclick="javascript:void(zmenu( 'Nphs1 ', '', 'ensembl_id: ENSMUSG00000006649', '', 'start: 30168485', '', 'stop: 30195968', '', 'length: 27483 bp' ));;return false;" title="gene " alt="gene " target="_blank"/>
> onclick="javascript:void(zmenu( 'exon1', '', 'start: 30168485', '', 'stop: 30169003', '', 'length: 518 bp' ));;return false;" title="exon1 " alt="exon1 " target="_blank"/>
> onclick="javascript:void(zmenu( 'intron1', '', 'start: 30169004', '', 'stop: 30169083', '', 'length: 79 bp ' ));;return false;" title="intron1 " alt="intron1 " target="_blank"/>
> onclick="javascript:void(zmenu( 'exon2', '', 'start: 30169084', '', 'stop: 30169299', '', 'length: 215 bp' ));;return false;" title="exon2 " alt="exon2 " target="_blank"/>
> onclick="javascript:void(zmenu( 'intron2', '', 'start: 30169300', '', 'stop: 30169373', '', 'length: 73 bp ' ));;return false;" title="intron2
> ..
>
> This is part of the code I used in my HTML file to display the image map, and it runs beautifully with Mozilla 1.7 or the latest Firefox version. However, in IE6 the clickable pop-ups do not appear or work.
>
> I appreciate any help and would like to thank everyone for their help.
>
> Best regards,
>
> Anh-Thu
> ________________________________________________________________________
> GSF-Forschungszentrum
> Ingolstädter Landstr. 1
> 85764 München-Neuherberg, Germany
> Chairman of Supervisory Board: MinDir Dr. Peter Lange
> Board of Directors: Prof. Dr. Günther Wess and Dr. Nikolaus Blum
> Register of Societies: Amtsgericht München HRB 6466
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu Tue Jun 5 11:28:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Jun 2007 10:28:24 -0500
Subject: [Bioperl-l] [BioPython] Cannot parse GenBank file
In-Reply-To: <46656D64.7010508@ribosome.natur.cuni.cz>
References: <46655550.70400@ribosome.natur.cuni.cz> <46656D64.7010508@ribosome.natur.cuni.cz>
Message-ID: <24065CBD-BBF6-4CA3-9523-AD50C524DAE5@uiuc.edu>

Martin,

The example file you give in the bioperl bugzilla report has several blank annotation lines, which may lead to additional problems. When the BioPerl SeqIO parser finds annotation fields (SOURCE, ORGANISM, DEFINITION, etc.), it expects that relevant data (text descriptions) will accompany them; I assume the BioPython parser expects likewise, though I may be wrong.

AFAIK the inclusion of field names w/o text isn't GenBank/EMBL-compliant.
GenBank records lacking text either have a '.' instead or leave the field out entirely:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

We could add a fix, but you should probably contact the ApE developers and request that field names w/o text be left out or have a '.' added.

chris

On Jun 5, 2007, at 9:04 AM, Martin MOKREJŠ wrote:

> Ezequiel Panepucci wrote:
>>> genbank entry = parser.parse(fhandle)
>>
>> there is a space character between "genbank" and "entry".
>> It is a syntax error.
>> I suppose you meant "genbank_entry"?
>
> Yes, the next command was right and has shown the error. Sorry, I forgot to delete the first attempt. ;-)
>
>>>> genbank_entry = parser.parse(fhandle)
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>     self._feed_first_line(consumer, self.line)
>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 835, in _feed_first_line
>     assert False, \
> AssertionError: Did not recognise the LOCUS line layout:
> LOCUS 6499 bp ds-DNA linear 02-AUG-2006
>
>>>>
>
> Martin
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From stewarta at nmrc.navy.mil Tue Jun 5 11:34:14 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Tue, 5 Jun 2007 11:34:14 -0400
Subject: [Bioperl-l] Setting attributes on a Bio::DB::GFF::Feature object
Message-ID: <95C9F539-A4C4-4B6A-8DA8-079B957BF909@nmrc.navy.mil>

I see bidirectional mutator methods for source, type, strand, etc. in the Bio::DB::GFF::Feature documentation, but ->attributes appears only able to get, not set, the feature attributes. Is there no way to modify the attributes of a Bio::DB::GFF::Feature live?

-- 
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852
email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270

From cjfields at uiuc.edu Tue Jun 5 12:07:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Jun 2007 11:07:41 -0500
Subject: [Bioperl-l] [BioPython] Cannot parse GenBank file
In-Reply-To: <24065CBD-BBF6-4CA3-9523-AD50C524DAE5@uiuc.edu>
References: <46655550.70400@ribosome.natur.cuni.cz> <46656D64.7010508@ribosome.natur.cuni.cz> <24065CBD-BBF6-4CA3-9523-AD50C524DAE5@uiuc.edu>

One thing I missed, which explains the biopython error: the LOCUS line is missing the locus identifier (see the NCBI example record link). This doesn't choke the bioperl parser, but it appears to stop the biopython parser in its tracks (maybe a feature instead of a bug!). You should try adding a unique identifier (maybe the name of the file or record) to the LOCUS line to see if it works:

LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006

The bioperl parser in CVS writes out the correct alphabet when this is added:

LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006

I'll try adding a warning to the bioperl parser for this.

chris
From staffa at niehs.nih.gov Tue Jun 5 22:00:34 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Tue, 05 Jun 2007 22:00:34 -0400
Subject: [Bioperl-l] Bio::DB::Fasta

I suspect that if I knew exactly what this error message meant, I could discern my error. I don't see much difference between this program and programs that worked. Can I assume that new() worked because an index file exists? I don't know how the filehandle UTR_TT_GENES gets involved. Maybe I should use some other module, but I would really like to have get_Seq_by_id functionality.

The error message:

Dpse ortholog = Dpse_GA17307
fetching GA17307
Can't call method "seq" on an undefined value at Match-emNEWTEST.pl line 84, line 4.

Relevant code:

#!/usr/bin/perl
#
#
#
use strict;
use Bio::DB::Fasta;
use Bio::Tools::SeqWords;
use Bio::Seq;
use Bio::SeqIO;
#
my $db = Bio::DB::Fasta->new('/home/staffa/clients/Kari/D_pse_genome/testit/TT_orthologs_Dpse_genes.fa',
                             -makeid => \&make_my_id);
...
...
...
my $pse_obj = $db->get_Seq_by_id('GA17307');
my $pse_sequence = $pse_obj->seq;

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From jason at bioperl.org Tue Jun 5 23:12:40 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 5 Jun 2007 20:12:40 -0700
Subject: [Bioperl-l] Bio::DB::Fasta

The filehandle is probably not important; Perl just reports it if there is a filehandle open. More importantly, what is on line 84? My guess is that you are trying to get a sequence out and it doesn't exist. Some error-checking code around the lines that get the sequence out would be helpful.
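A sketch of the kind of -makeid callback this thread turns on. It assumes, hypothetically, that the FASTA headers look like '>Dpse_GA17307 description' and that the species prefix should be dropped. The Bio::DB::Fasta documentation describes the callback as receiving the header line and returning the ID to index under. Whatever this sub returns is what get_Seq_by_id() must later be given; a mismatch makes get_Seq_by_id() return undef, which then produces exactly the "Can't call method "seq" on an undefined value" error above. If an index was already built, it may need rebuilding (for example with the module's -reindex option) before a new callback takes effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical -makeid callback for Bio::DB::Fasta.
# Assumed header layout (illustrative only): ">Dpse_GA17307 some description"
sub make_my_id {
    my ($header) = @_;
    $header =~ s/^>//;                   # drop the leading '>' if present
    my ($token) = $header =~ /^(\S+)/;   # first whitespace-delimited word
    return unless defined $token;
    $token =~ s/^Dpse_//;                # strip the assumed species prefix
    return $token;
}

# With a real index this would be used roughly as:
#   my $db  = Bio::DB::Fasta->new($fasta_file, -makeid => \&make_my_id);
#   my $obj = $db->get_Seq_by_id('GA17307')
#       or die "GA17307 not found in index\n";
for my $h ('>Dpse_GA17307 hypothetical description', '>GA00001') {
    print make_my_id($h), "\n";
}
```

Printing a few make_my_id() results for real headers, as the loop does, is a quick way to confirm which IDs the index will actually contain before calling get_Seq_by_id().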
-- 
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/

From torsten.seemann at infotech.monash.edu.au Wed Jun 6 02:06:37 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 6 Jun 2007 16:06:37 +1000
Subject: [Bioperl-l] Bio::DB::Fasta

Nick,

> Can't call method "seq" on an undefined value at Match-emNEWTEST.pl line 84,

The error makes it pretty clear. You are calling the ->seq method on an undefined value, i.e. $pse_obj.

> my $pse_obj = $db->get_Seq_by_id('GA17307');

# check we got something!
die "sequence not in database" unless $pse_obj;

> my $pse_sequence = $pse_obj->seq;

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010

From shameer at ncbs.res.in Wed Jun 6 02:27:42 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 6 Jun 2007 11:57:42 +0530 (IST)
Subject: [Bioperl-l] Validation of files using BioPerl
Message-ID: <34441.192.168.1.1.1181111262.squirrel@mail.ncbs.res.in>

Dear All,

How can I validate an input file in FASTA/PIR/GenPept/PDB format using BioPerl? (This is to keep new users from submitting unnecessary files to our servers.) Is there any module available?

Many thanks in advance,
-- 
Shameer Khadar

From cjfields at uiuc.edu Wed Jun 6 08:37:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Jun 2007 07:37:28 -0500
Subject: [Bioperl-l] Validation of files using BioPerl
In-Reply-To: <34441.192.168.1.1.1181111262.squirrel@mail.ncbs.res.in>
References: <34441.192.168.1.1.1181111262.squirrel@mail.ncbs.res.in>
Message-ID: <39F5F622-0C93-4DC5-B969-491F789FC932@uiuc.edu>

It has been discussed but never coded.
I believe that if a file passes through the Bio::SeqIO parser, it's generally
considered validly formatted (spacing, balanced quotes). The parser does not
specifically check FT keys and qualifiers for invalid ones, look for missing
annotation, check taxonomy, etc.

As long as the end sequence mark (//) is present for every file, you could
try parsing the file into chunks (read with 'local $/ = '//';') and passing
each seq chunk as a filehandle (via IO::String) to a Bio::SeqIO object wrapped
in an eval block (the parser resets $/, so it should work). Follow the eval
with a check of $@ for caught errors. It might get tedious for big
sequences...

chris

On Jun 6, 2007, at 1:27 AM, Shameer Khadar wrote:

> Dear All,
>
> How to validate an input file in fasta/PIR/GenPept/PDB format using
> Bioperl ? (This is to avoid unnecessary files to be submitted to servers
> by new users). Any module available ?
>
> Many thanks in advance,
> --
> Shameer Khadar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From staffa at niehs.nih.gov Wed Jun 6 10:40:49 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 06 Jun 2007 10:40:49 -0400
Subject: [Bioperl-l] Bio::DB::Fasta

Indeed.
One must know what is actually in his header,
AND one must write the appropriate make_id subroutine
AND one must specify the exact ID.
THEN things might work.
And they did!
THANK YOU

On 6/6/07 2:06 AM, "Torsten Seemann" wrote:

> Nick,
>
>> Can't call method "seq" on an undefined value at Match-emNEWTEST.pl line 84,
>
> The error makes it pretty clear. You are calling the ->seq method on
> an undefined value, ie. $pse_obj.
>
>> my $pse_obj = $db->get_Seq_by_id('GA17307');
>
> # check we got something!
> die "sequence not in database" unless $pse_obj;
>
>> my $pse_sequence = $pse_obj->seq;
>

From jaudall at gmail.com Wed Jun 6 17:51:33 2007
From: jaudall at gmail.com (Joshua Udall)
Date: Wed, 6 Jun 2007 15:51:33 -0600
Subject: [Bioperl-l] blastxml interation
Message-ID: <52cea20c0706061451i39e44aeev8dc58d1e635665e7@mail.gmail.com>

I was searching in the deobfuscator under *Bio::Search::Result::BlastResult*
but there doesn't seem to be a method to extract the iteration number from a
blastxml report. I can see this number being useful for counting the
queries that didn't hit anything, since there are no empty reports in the
blastxml output.

If I'm missing something, I would welcome an example of how to retrieve the
result iteration number.

Thanks in advance for any suggestions.

Josh

From dmessina at wustl.edu Wed Jun 6 18:18:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 6 Jun 2007 17:18:26 -0500
Subject: [Bioperl-l] blastxml interation
In-Reply-To: <52cea20c0706061451i39e44aeev8dc58d1e635665e7@mail.gmail.com>
References: <52cea20c0706061451i39e44aeev8dc58d1e635665e7@mail.gmail.com>

I think you want to look at the hits(), num_hits() and no_hits_found()
methods. There is a private method _next_iteration_index() which should do
what you asked for, but num_hits() looks like the better way.

By the way, hits() and num_hits() are listed on the Deobfuscator as having no
documentation. This is incorrect, as the documentation below shows. It is
due to some nonstandard formatting issues, which I will correct.
_next_iteration_index() isn't listed on the Deobfuscator because it's a
private method.

Hope this helps!
Dave

hits()

This method overrides Bio::Search::Result::GenericResult::hits to take
into account the possibility of multiple iterations, as occurs in PSI-BLAST
reports.
If there are multiple iterations, all 'new' hits for all iterations are
returned.
These are the hits that did not occur in a previous iteration.
See Also: Bio::Search::Result::GenericResult::hits

num_hits()

This method overrides Bio::Search::Result::GenericResult::num_hits to take
into account the possibility of multiple iterations, as occurs in PSI-BLAST
reports.
If there are multiple iterations, calling num_hits() returns the number of
'new' hits for each iteration. These are the hits that did not occur in a
previous iteration.
See Also: Bio::Search::Result::GenericResult::num_hits

no_hits_found()

 Usage    : $nohits = $blast->no_hits_found( $iteration_number );
 Purpose  : Get boolean indicator indicating whether or not any hits
            were present in the report.

            This is NOT the same as determining the number of hits via
            the hits() method, which will return zero hits if there were
            no hits in the report or if all hits were filtered out during
            the parse.

            Thus, this method can be used to distinguish these
            possibilities for hitless reports generated when filtering.

 Returns  : Boolean
 Argument : (optional) integer indicating the iteration number (PSI-BLAST)
            If iteration number is not specified and this is a PSI-BLAST
            result, then this method will return true only if all
            iterations had no hits found.

From apurva at cshl.edu Wed Jun 6 19:51:45 2007
From: apurva at cshl.edu (Apurva Narechania)
Date: Wed, 6 Jun 2007 19:51:45 -0400
Subject: [Bioperl-l] non-palindromic issue in Bio::Restriction::Analysis
Message-ID: <3F7C7E33-416A-4141-969A-DDC4716E8A44@cshl.edu>

Hi,

I was hoping you could confirm and give me some feedback on an issue I think
I've found with the Bio::Restriction::Analysis module. I am using the enzyme
AciI, a non-palindromic restriction enzyme with a 5' C | CGC 3' recognition
site. The module should search both the forward and the reverse complement
strings in the case of a non-palindromic enzyme. I have found that this
works only intermittently.
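To cross-check an expected fragment count independently of Bio::Restriction::Analysis, scan the top strand for both the site and its reverse complement, then collect the top-strand cut positions. The sketch below is hypothetical pure Perl. It is not the module's _non_pal_enz code, and the sub names are made up for illustration. It assumes AciI cuts C^CGC on the top strand and GGC^G on the bottom strand, leaving a two-base 5' overhang. Under that assumption, an occurrence of the reverse complement GCGG is cut G^CGG on the top strand. The sequence is treated as linear.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string (IUPAC ambiguity codes are not handled).
sub revcomp {
    my ($s) = @_;
    (my $rc = reverse $s) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Top-strand cut positions, 0-based: a cut at $p falls between residues $p-1 and $p.
# $top_cut and $bottom_cut are cut offsets within the site written 5'->3' on the
# top strand; for AciI (C^CGC / GGC^G) they are assumed to be 1 and 3.
sub top_strand_cuts {
    my ($seq, $site, $top_cut, $bottom_cut) = @_;
    $seq  = uc $seq;
    $site = uc $site;
    my $rc = revcomp($site);
    my %cut;    # a hash, so a palindromic site is not counted twice

    # A zero-width lookahead also finds overlapping occurrences.
    while ($seq =~ /(?=\Q$site\E)/g) {
        $cut{ $-[0] + $top_cut } = 1;
    }
    # A bottom-strand site appears as the reverse complement on the top strand;
    # its top-strand cut mirrors the bottom-strand cut of the forward site.
    while ($seq =~ /(?=\Q$rc\E)/g) {
        $cut{ $-[0] + length($site) - $bottom_cut } = 1;
    }
    return sort { $a <=> $b } keys %cut;
}

# Split a linear sequence at sorted, unique cut positions.
sub fragments {
    my ($seq, @cuts) = @_;
    my $start = 0;
    my @frags;
    for my $c (@cuts) {
        next if $c <= 0 || $c >= length $seq;    # a cut at either end adds no fragment
        push @frags, substr($seq, $start, $c - $start);
        $start = $c;
    }
    push @frags, substr($seq, $start);
    return @frags;
}

my $demo  = 'AAACCGCAAAGCGGAAA';    # one CCGC site plus one GCGG (bottom-strand) site
my @cuts  = top_strand_cuts($demo, 'CCGC', 1, 3);
my @frags = fragments($demo, @cuts);
print scalar(@frags), " fragments: @frags\n";    # 3 fragments: AAAC CGCAAAG CGGAAA
```

Both orientations feed one hash of cut positions, so a palindromic enzyme such as EcoRI (G^AATTC, offsets 1 and 5) gives each cut exactly once. Comparing this count with the module's output for the same sequence may help narrow down where the discrepancy arises.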
For example, the following sequence:

GAAAAAAACAAAGGAAGAAGCTAGCTAGCAGGGCACGCGGTTTGAGGATGGCTGGTGGCCGACCGCAGGGCGCGCGGTTG
GAGGATTGCTGGTGGCCGACCAGATGAAACTCACGCGCGGCTGGGGACAGCTGGAATATTTGGGCGGCGGCGGCTGGTAT
TACGGGAAAGGAGAGATAGGGTTTTGGACGGCAGCAGCTGGTATTTGGGCCACCAATTTTGCGCGCCAGTACAGGACACC
GATGCCGCAAATTGCACAATGCCTTTTATGGCGACTGACAGTGCGATGCTATAGGTATGAATTGTCGACTGACAAAGTGA
CACTATTCACATATAAATATAACGAATAACACTCAGTTGGAATATAGACATATGCCGACTCACCATCTGTGGCAATGTAT
ACCGACTAACAATTCGATGCTAATTCTCTATTTATAGCGACAGTCGTCAGACACTAATTTGGTGTTGTGGTATAATGCTA
GTGCCTCACCGCTGTAGGTGTTGGTCTACTGGTGC

should digest into 10 fragments using this enzyme, but the module produces
only 7. Could you please confirm this behavior and, if you observe it,
suggest some possible fixes? This may be a bug in the _non_pal_enz method,
or I may be overlooking something pretty obvious.

Thanks,
Apurva Narechania.

From cjfields at uiuc.edu Wed Jun 6 20:51:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Jun 2007 19:51:00 -0500
Subject: [Bioperl-l] blastxml interation
References: <52cea20c0706061451i39e44aeev8dc58d1e635665e7@mail.gmail.com>

Joshua,

Just to make sure there is no confusion, do you mean a
Bio::Search::Iteration::IterationI-based object? The iteration tags
apparently have multiple meanings in BLAST XML output (multiple queries,
multiple PSI-BLAST iterations). The current SearchIO::blastxml parser returns
multiple Bio::Search::Result::BlastResult objects based on the iterations, so
PSI-BLAST output is treated as multiple BLAST reports regardless (i.e. no
Iteration objects). This is something I want to rectify, but it may not be an
easy fix.

chris

On Jun 6, 2007, at 5:18 PM, David Messina wrote:

> I think you want to look at the hits(), num_hits() and no_hits_found()
> methods. There is a private method _next_iteration_index() which
> should do what you asked for, but num_hits() looks like the better
> way.
>
> By the way, hits() and num_hits() are listed on the Deobfuscator as
> having no documentation. This (as the below shows) is incorrect and
> is due to some nonstandard formatting issues which I will correct.
> _next_iteration_index() isn't listed on the Deobfuscator because it's
> a private method.
>
>
> Hope this helps!
> Dave
>
>
> hits()
>
> This method overrides Bio::Search::Result::GenericResult::hits to take
> into account the possibility of multiple iterations, as occurs in
> PSI-BLAST reports.
> If there are multiple iterations, all 'new' hits for all iterations
> are returned.
> These are the hits that did not occur in a previous iteration.
> See Also: Bio::Search::Result::GenericResult::hits
>
> num_hits()
>
> This method overrides Bio::Search::Result::GenericResult::num_hits to
> take into account the possibility of multiple iterations, as occurs in
> PSI-BLAST reports.
> If there are multiple iterations, calling num_hits() returns the
> number of 'new' hits for each iteration. These are the hits that did
> not occur in a previous iteration.
> See Also: Bio::Search::Result::GenericResult::num_hits
>
> no_hits_found()
>
> Usage    : $nohits = $blast->no_hits_found( $iteration_number );
> Purpose  : Get boolean indicator indicating whether or not any hits
>            were present in the report.
>            This is NOT the same as determining the number of hits via
>            the hits() method, which will return zero hits if there were
>            no hits in the report or if all hits were filtered out
>            during the parse.
>
>            Thus, this method can be used to distinguish these
>            possibilities for hitless reports generated when filtering.
>
> Returns  : Boolean
> Argument : (optional) integer indicating the iteration number (PSI-BLAST)
>            If iteration number is not specified and this is a
>            PSI-BLAST result, then this method will return true only if
>            all iterations had no hits found.

From hlapp at gmx.net Wed Jun 6 20:45:14 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Jun 2007 20:45:14 -0400
Subject: [Bioperl-l] PostgreSQL schema support in BioSQL and bioperl-db

I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
A schema in PostgreSQL is more or less a namespace for database objects
(tables, indexes, views, etc) within a database.

(A database in PostgreSQL is similar to the concept of a user in Oracle
or MySQL, and therefore for the latter two schemas are synonymous with a
user. [Not sure I'm still up-to-date on this for MySQL, but at least
that's what I recall.])

When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
specify the schema in which BioSQL resides using the --schema option.

If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
also accepts a -schema named parameter, and Bio::DB::DBContextI objects
have a $dbc->schema() property for getting/setting the schema,
Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
also add the property to the .bioperldb connection parameter file
(-schema => 'yourschemahere').

Thanks to Brian Osborne for being the instigator (and tester, and for
adding the code to load_ncbi_taxonomy.pl - I came too late).
-hilmar
--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From jaudall at gmail.com Wed Jun 6 17:41:08 2007
From: jaudall at gmail.com (Joshua Udall)
Date: Wed, 6 Jun 2007 15:41:08 -0600
Subject: [Bioperl-l] blastxml interation number
Message-ID: <52cea20c0706061441n96ce803v9422e8d14461c2bd@mail.gmail.com>

I was searching in the deobfuscator under *Bio::Search::Result::BlastResult*
but there doesn't seem to be a method to extract the iteration number from a
blastxml report. I can see this number being very useful for counting the
queries that didn't hit anything, since there are no empty reports in the
blastxml output.

If I'm missing something, I would welcome an example of how to retrieve the
result iteration number. Otherwise, I suggest that an iteration_count
feature be added to the Result object.

Thanks in advance for any suggestions.

Josh

From holland at ebi.ac.uk Thu Jun 7 03:33:25 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 07 Jun 2007 08:33:25 +0100
Subject: [Bioperl-l] PostgreSQL schema support in BioSQL and bioperl-db
Message-ID: <4667B4C5.6070107@ebi.ac.uk>

Sounds great. BioJava users shouldn't need to change anything to get this
to work, as PostgreSQL JDBC connection objects already require you to
specify a schema.

cheers,
Richard

Hilmar Lapp wrote:
> I have added support to BioSQL and bioperl-db for schemas in PostgreSQL.
> A schema in PostgreSQL is more or less a namespace for database objects
> (tables, indexes, views, etc) within a database.
>
> (A database in PostgreSQL is similar to the concept of a user in Oracle
> or MySQL, and therefore for the latter two schemas are synonymous with a
> user. [Not sure I'm still up-to-date on this for MySQL, but at least
> that's what I recall.])
>
> When using the load_{seqdatabase,ontology,ncbi_taxonomy}.pl scripts, you
> specify the schema in which BioSQL resides using the --schema option.
>
> If you are using bioperl-db as a library, the Bio::DB::BioDB->new() call
> also accepts a -schema named parameter, and Bio::DB::DBContextI objects
> have a $dbc->schema() property for getting/setting the schema,
> Bio::DB::SimpleDBContext->new() accepts a -schema parameter, and you may
> also add the property to the .bioperldb connection parameter file
> (-schema => 'yourschemahere').
>
> Thanks for Brian Osborne for being the instigator (and tester, and for
> adding the code to load_ncbi_taxonomy.pl - I came too late).
>
> -hilmar

From mmokrejs at ribosome.natur.cuni.cz Thu Jun 7 10:26:44 2007
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Thu, 07 Jun 2007 16:26:44 +0200
Subject: [Bioperl-l] [BioPython] Cannot parse GenBank file
References: <46655550.70400@ribosome.natur.cuni.cz> <46656D64.7010508@ribosome.natur.cuni.cz> <24065CBD-BBF6-4CA3-9523-AD50C524DAE5@uiuc.edu>
Message-ID: <466815A4.9060505@ribosome.natur.cuni.cz>

Hi,

Chris Fields wrote:
> One thing I missed which explains the biopython error: the LOCUS line is
> missing the locus identifier (see the NCBI example record link). This
> doesn't choke the bioperl parser but it appears to stop the biopython
> parser in it's tracks (maybe a feature instead of a bug!).
>
> You should try adding a unique identifier (maybe the name of the file or
> record) to the LOCUS line to see if it works:
>
> LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006
>
> The bioperl parser in CVS writes out the correct alphabet when this is
> added:
>
> LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006
>
> I'll try adding a warning to the bioperl parser for this.

I have updated http://bugzilla.open-bio.org/show_bug.cgi?id=2305 but let me
emphasize the LOCUS line now contains

LOCUS pRL 5428 bp ds-DNA linear 07-JUN-2007

which still does not comply with the line you have proposed. But it can be
parsed by bioperl-live from cvs. Is it still wrong? Testcase as pRL.gb-new
in the bugzilla record #2305.

Martin

> chris
>
> On Jun 5, 2007, at 10:28 AM, Chris Fields wrote:
>
>> Martin,
>>
>> The example file you give in the bioperl bugzilla report has several
>> blank annotation lines which may lead to additional problems. When
>> the BioPerl SeqIO parser finds annotation fields (SOURCE, ORGANISM,
>> DEFINITION, etc) then it expects there will also be relevant data
>> (text descriptions) accompanying it; I assume the BioPython parser
>> expects likewise though I may be wrong.
>>
>> AFAIK the inclusion of field names w/o text isn't GenBank/EMBL-
>> compliant. GenBank records lacking text either have a '.' instead or
>> are left out entirely:
>>
>> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
>>
>> We could add a fix but you should probably contact the ApE developers
>> and request that field names w/o text be left out or have '.' added.
>>
>> chris
>>
>> On Jun 5, 2007, at 9:04 AM, Martin MOKREJŠ wrote:
>>
>>> Ezequiel Panepucci wrote:
>>>>> genbank entry = parser.parse(fhandle)
>>>>
>>>> there is a space character between "genbank" and "entry".
>>>> It is a syntax error.
>>>> I suppose you meant "genbank_entry" ?
>>>
>>> Yes, the next command was right and has shown the error. Sorry, I forgot
>>> to delete the first attempt. ;-)
>>>
>>>>>> genbank_entry = parser.parse(fhandle)
>>> Traceback (most recent call last):
>>>   File "", line 1, in ?
>>>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py", line 187, in parse
>>>     self._scanner.feed(handle, self._consumer)
>>>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
>>>     self._feed_first_line(consumer, self.line)
>>>   File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py", line 835, in _feed_first_line
>>>     assert False, \
>>> AssertionError: Did not recognise the LOCUS line layout:
>>> LOCUS 6499 bp ds-DNA linear 02-AUG-2006
>>>
>>>>>>
>>>
>>> Martin
>>> _______________________________________________
>>> BioPython mailing list - BioPython at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biopython

--
Dr. Martin Mokrejs
Dept. of Genetics and Microbiology
Faculty of Science, Charles University
Vinicna 5, 128 43 Prague, Czech Republic
http://www.iresite.org
http://www.iresite.org/~mmokrejs

From cjfields at uiuc.edu Thu Jun 7 11:31:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Jun 2007 10:31:45 -0500
Subject: [Bioperl-l] [BioPython] Cannot parse GenBank file
In-Reply-To: <466815A4.9060505@ribosome.natur.cuni.cz>
References: <46655550.70400@ribosome.natur.cuni.cz> <46656D64.7010508@ribosome.natur.cuni.cz> <24065CBD-BBF6-4CA3-9523-AD50C524DAE5@uiuc.edu> <466815A4.9060505@ribosome.natur.cuni.cz>
Message-ID: <2A403865-F1E8-4D19-8D19-455C22E7C6D9@uiuc.edu>

On Jun 7, 2007, at 9:26 AM, Martin MOKREJŠ wrote:

> Hi,
>
> Chris Fields wrote:
>> One thing I missed which explains the biopython error: the LOCUS
>> line is missing the locus identifier (see the NCBI example record
>> link). This doesn't choke the bioperl parser but it appears to
>> stop the biopython parser in it's tracks (maybe a feature instead
>> of a bug!).
>> You should try adding a unique identifier (maybe the name of the
>> file or record) to the LOCUS line to see if it works:
>> LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006
>> The bioperl parser in CVS writes out the correct alphabet when
>> this is added:
>> LOCUS testfile 6499 bp ds-DNA linear 02-AUG-2006
>> I'll try adding a warning to the bioperl parser for this.
>
> I have updated http://bugzilla.open-bio.org/show_bug.cgi?id=2305
> but let me emphasize the LOCUS line now contains
> LOCUS pRL 5428 bp ds-DNA linear 07-JUN-2007
>
> which still does not comply with the line you have proposed. But it
> can be parsed by bioperl-live from cvs. Is it still wrong? Testcase as
> pRL.gb-new in the bugzilla record #2305.
>
> Martin

That should work.
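A name-less LOCUS line like the ApE one in this thread can be detected and patched with a placeholder identifier before the file reaches either parser. The helper below is a hypothetical sketch and is not part of BioPerl or Biopython. It assumes the missing name shows up as the sequence length (digits followed by "bp" or "aa") directly after the LOCUS keyword, and it rewrites only that one line. It does not re-align the fixed-width LOCUS columns described in the NCBI release notes, so whether a given parser then accepts the line depends on how strictly that parser checks column positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Insert $name into a LOCUS line that lacks a locus name. Any other line, and
# any LOCUS line that already has a name, is returned unchanged. Assumes the
# name-less form looks like "LOCUS   6499 bp ...", with the length first.
sub fix_locus_line {
    my ($line, $name) = @_;
    return $line unless $line =~ /^LOCUS\s/;
    if ($line =~ /^LOCUS\s+\d+\s+(?:bp|aa)\b/) {
        $line =~ s/^LOCUS\s+/LOCUS       $name /;
    }
    return $line;
}

# Usage with hypothetical file names, patching a file line by line:
#   open my $in,  '<', 'pRL.gb'       or die $!;
#   open my $out, '>', 'pRL.fixed.gb' or die $!;
#   print {$out} fix_locus_line($_, 'pRL') while <$in>;
print fix_locus_line("LOCUS 6499 bp ds-DNA linear 02-AUG-2006\n", 'testfile');
```

Taking the file name as the placeholder follows the thread's suggestion. Any identifier unique across the files works if they will be indexed later.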
There isn't a strict uniqueness test (that would require caching and isn't
worth the trouble IMHO), though you are required to add something unique for
the accession/locus if you plan on indexing the records in the future.

Parsing GenBank data produced by third-party software is problematic at best.
Some programs seem to follow no steadfast rule for GenBank output, even
though the specification is plainly stated in the NCBI release notes. My
take on that is to have a stricter (read: follows the release notes) GenBank
parser which passes off the data in the record to default handler methods.
A user could then override the defined handlers with their own, either by
subclassing the default handler class and overloading its methods, or by
adding their own code references directly.

chris

...

From rich at thevillas.eclipse.co.uk Fri Jun 8 07:00:45 2007
From: rich at thevillas.eclipse.co.uk (richard)
Date: Fri, 08 Jun 2007 12:00:45 +0100
Subject: [Bioperl-l] protparam
Message-ID: <466936DD.8080604@thevillas.eclipse.co.uk>

Hi,

I noticed that in April someone asked whether there was a bioperl mod
for obtaining protein sequence related properties using protparam.
I have a module that could potentially be submitted to bioperl for this
purpose. Does anybody have any thoughts on whether it should go in?

Example script and the module are at:

http://81.5.159.173/webshare/

Cheers
Rich

From cjfields at uiuc.edu Fri Jun 8 08:37:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Jun 2007 07:37:27 -0500
Subject: [Bioperl-l] protparam
In-Reply-To: <466936DD.8080604@thevillas.eclipse.co.uk>
References: <466936DD.8080604@thevillas.eclipse.co.uk>
Message-ID: <4F4085B4-E500-4FF1-88A2-9AA27F28F661@uiuc.edu>

Richard,

We'll gladly add this in, though it'll need to be bioperlized
(inherit Bio::Root::Root). We also generally ask for tests, but it
should be easy to write up a quick test suite using any protein seq.

If you can, could you add some bioperl-like POD to the module (i.e.
SYNOPSIS, AUTHOR, DESCRIPTION, etc)?

thanks!

chris

On Jun 8, 2007, at 6:00 AM, richard wrote:

>
> Hi,
>
> I noticed that in April someone asked whether there was a bioperl mod
> for obtaining protein sequence related properties using protparam.
> I have a module that could potentially be submitted to bioperl for
> this
> purpose. Does anybody have any thoughts on whether it should go in?
>
> Example script and the module are at:
>
> http://81.5.159.173/webshare/
>
>
> Cheers
> Rich

From mmokrejs at ribosome.natur.cuni.cz Fri Jun 8 07:09:42 2007
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Fri, 08 Jun 2007 13:09:42 +0200
Subject: [Bioperl-l] How to draw a plasmid map from a genbank-formatted file?
Message-ID: <466938F6.7050903@ribosome.natur.cuni.cz>

Hi,

how can I convert a GenBank/EMBL formatted file to a GFF file? The manpage
for Bio::Graphics::FeatureFile does not help me in this way. The information
is in the file, so I just want to extract the features to a GFF format;
probably the sequence has to be stored somewhere as well ...

Is there a tool so I can convert it automatically? ;) This would be great.
I can't make the GFF manually for every file. Other programs draw plasmid
maps automatically from the GenBank formatted input, so how can I do it in
bioperl?
Thanks for help,
Martin

From shameer at ncbs.res.in Fri Jun 8 10:11:00 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Fri, 8 Jun 2007 19:41:00 +0530 (IST)
Subject: [Bioperl-l] protparam
In-Reply-To: <4F4085B4-E500-4FF1-88A2-9AA27F28F661@uiuc.edu>
References: <466936DD.8080604@thevillas.eclipse.co.uk> <4F4085B4-E500-4FF1-88A2-9AA27F28F661@uiuc.edu>
Message-ID: <54411.192.168.1.1.1181311860.squirrel@mail.ncbs.res.in>

Richard,

I asked for a protparam module in bioperl! That's a good job.

Cheers,
SK

> Richard,
>
> We'll gladly add this in, though it'll need to be bioperlized
> (inherit Bio::Root::Root). We also generally ask for tests but it
> should be easy to write up a quick test suite using any protein seq.
>
> If you can could you add some bioperl-like POD to the module (i.e.
> SYNOPSIS, AUTHOR, DESCRIPTION, etc)?
>
> thanks!
>
> chris
>
> On Jun 8, 2007, at 6:00 AM, richard wrote:
>
>>
>> Hi,
>>
>> I noticed that in April someone asked whether there was a bioperl mod
>> for obtaining protein sequence related properties using protparam.
>> I have a module that could potentially be submitted to bioperl for
>> this
>> purpose. Does anybody have any thoughts on whether it should go in?
>>
>> Example script and the module are at:
>>
>> http://81.5.159.173/webshare/
>>
>>
>> Cheers
>> Rich

--
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25)
The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in

From dmessina at wustl.edu Fri Jun 8 10:58:20 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 8 Jun 2007 09:58:20 -0500
Subject: [Bioperl-l] How to draw a plasmid map from a genbank-formatted file?
In-Reply-To: <466938F6.7050903@ribosome.natur.cuni.cz>
References: <466938F6.7050903@ribosome.natur.cuni.cz>
Message-ID: <56BAE06F-2FDF-4FA4-B6A0-96D89470AF4C@wustl.edu>

Hi Martin,

You're in luck -- the BioPerl core distribution includes two scripts for
doing just that:

genbank2gff
genbank2gff3

Look in the scripts directory of the distro.

Also, there is a *huge* amount of documentation and examples on the
BioPerl website.

http://www.bioperl.org/wiki/HOWTOs

Reading those, reading the FAQ, and searching the mailing list archives are
where I look first when I don't know how to do something in BioPerl.

Dave

--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO

From rich at thevillas.eclipse.co.uk Fri Jun 8 11:51:21 2007
From: rich at thevillas.eclipse.co.uk (richard)
Date: Fri, 08 Jun 2007 16:51:21 +0100
Subject: [Bioperl-l] protparam
In-Reply-To: <4F4085B4-E500-4FF1-88A2-9AA27F28F661@uiuc.edu>
References: <466936DD.8080604@thevillas.eclipse.co.uk> <4F4085B4-E500-4FF1-88A2-9AA27F28F661@uiuc.edu>
Message-ID: <46697AF9.2090502@thevillas.eclipse.co.uk>

Hi,

ok, great, that's no problem. I'll add the POD and bioperlize it,

thanks
Rich

Chris Fields wrote:
> Richard,
>
> We'll gladly add this in, though it'll need to be bioperlized
> (inherit Bio::Root::Root). We also generally ask for tests but it
> should be easy to write up a quick test suite using any protein seq.
>
> If you can could you add some bioperl-like POD to the module (i.e.
> SYNOPSIS, AUTHOR, DESCRIPTION, etc)?
>
> thanks!
>
> chris
>
> On Jun 8, 2007, at 6:00 AM, richard wrote:
>
>> Hi,
>>
>> I noticed that in April someone asked whether there was a bioperl mod
>> for obtaining protein sequence related properties using protparam.
>> I have a module that could potentially be submitted to bioperl for
>> this
>> purpose. Does anybody have any thoughts on whether it should go in?
>>
>> Example script and the module are at:
>>
>> http://81.5.159.173/webshare/
>>
>>
>> Cheers
>> Rich

From cjfields at uiuc.edu Fri Jun 8 13:45:17 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Jun 2007 12:45:17 -0500
Subject: [Bioperl-l] protparam
In-Reply-To: <46697AF9.2090502@thevillas.eclipse.co.uk>
References: <466936DD.8080604@thevillas.eclipse.co.uk> <4F4085B4-E500-4FF1-88A2-9AA27F28F661@uiuc.edu> <46697AF9.2090502@thevillas.eclipse.co.uk>

Another issue is namespace. I suggest Bio::Tools::ProtParam, though there
may be some others out there. We can add support for direct
Bio::Seq/PrimarySeq input and other odds and ends once it's committed.

Good work!

chris

On Jun 8, 2007, at 10:51 AM, richard wrote:

>
> Hi,
>
> ok, great, that's no problem. I'll add the POD and bioperlize it,
>
> thanks
> Rich
>
> Chris Fields wrote:
>> Richard,
>>
>> We'll gladly add this in, though it'll need to be bioperlized
>> (inherit Bio::Root::Root). We also generally ask for tests but it
>> should be easy to write up a quick test suite using any protein seq.
>>
>> If you can could you add some bioperl-like POD to the module (i.e.
>> SYNOPSIS, AUTHOR, DESCRIPTION, etc)?
>>
>> thanks!
>>
>> chris
>>
>> On Jun 8, 2007, at 6:00 AM, richard wrote:
>>
>>
>>> Hi,
>>>
>>> I noticed that in April someone asked whether there was a bioperl
>>> mod
>>> for obtaining protein sequence related properties using protparam.
>>> I have a module that could potentially be submitted to bioperl for
>>> this
>>> purpose. Does anybody have any thoughts on whether it should go in?
>>>
>>> Example script and the module are at:
>>>
>>> http://81.5.159.173/webshare/
>>>
>>>
>>> Cheers
>>> Rich

From hlapp at gmx.net Mon Jun 11 07:30:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Jun 2007 07:30:24 -0400
Subject: [Bioperl-l] script to load ITIS taxonomy
Message-ID: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>

Hi all -

I added a script to load the ITIS taxonomy (www.itis.gov) into the phylodb
module. It is called load_itis_taxonomy.pl and is in the scripts/ directory.
It is independent of BioPerl right now (the ITIS download is either an MS SQL Server or an Informix dump - no kidding), but I'm hoping that at some point support for this can be integrated into Bio::TreeIO.

-hilmar
--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu  Mon Jun 11 08:24:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Jun 2007 07:24:50 -0500
Subject: [Bioperl-l] script to load ITIS taxonomy
In-Reply-To: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
References: <897DB32F-4AEE-4388-A499-C71BFD2281DE@gmx.net>
Message-ID: <99AC6C0F-10DD-4587-AFB3-32BC495CD2BD@uiuc.edu>

On Jun 11, 2007, at 6:30 AM, Hilmar Lapp wrote:

> Hi all -
>
> I added a script to load the ITIS taxonomy (www.itis.gov) into the
> phylodb module. It is called load_itis_taxonomy.pl and is in the
> scripts/ directory.
>
> It is independent of BioPerl right now (the ITIS download is either an
> MS SQL Server or an Informix dump - no kidding), but I'm hoping that
> at some point support for this can be integrated into Bio::TreeIO.
>
> -hilmar

I second the TreeIO support. Anyone up for it?

chris

From ryanx07 at hotmail.com  Mon Jun 11 11:24:31 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Mon, 11 Jun 2007 10:24:31 -0500
Subject: [Bioperl-l] basic questions
Message-ID:

I just started to learn BioPerl by reading the BioPerl Tutorial on the BioPerl website. When I tried the first example on my Windows machine,

use Bio::Perl;
$seq_object = get_sequence('swiss',"ID ROA1_HUMAN");
write_sequence(">roa1.fasta",'fasta',$seq_object);

I got the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:350
STACK: Bio::SeqIO::swiss::next_seq C:/Perl/site/lib/Bio\SeqIO\swiss.pm:178
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:153
STACK: Bio::Perl::get_sequence C:/Perl/site/lib/Bio/Perl.pm:510
STACK: t8.pl:7

I cannot figure out what is wrong and cannot find the solution on the web. Could someone help me, please? Also, this leads to my second question: is there a way to search the archive of the current list?

Thanks so much
R

From dmessina at wustl.edu  Mon Jun 11 12:34:29 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 11 Jun 2007 11:34:29 -0500
Subject: [Bioperl-l] basic questions
In-Reply-To:
References:
Message-ID: <25517EA3-7BDA-44AC-BDF3-93A6810D9D63@wustl.edu>

The example code works here, but I'm on OS X. Could you tell us which version of Perl and BioPerl you are using, and which operating system?

Are you getting anything in the roa1.fasta file?

> is there a way to search the archive of the current list?

http://www.bioperl.org/wiki/Mailing_lists

Dave

From dmessina at wustl.edu  Mon Jun 11 14:48:23 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 11 Jun 2007 13:48:23 -0500
Subject: [Bioperl-l] basic questions
In-Reply-To:
References:
Message-ID: <127743A7-1923-4DBF-A96E-276B5E0A7692@wustl.edu>

Hi,

Please use 'Reply All' so everyone on the list can follow the discussion.

Try adding the following line after the line that starts with $seq_object:

  print STDERR ref($seq_object), "\t", $seq_object->display_id, "\n";

And then run the program again.
What do you get? Could you post a complete printout of what you're doing?

Dave

On Jun 11, 2007, at 11:45 AM, L Xu wrote:
> I used WinXP with BioPerl Inst_version 2.1.8 (Bioperl 1.5.2) and
> activeperl 5.8.8.819 Thank you very much.

From johnsonm at gmail.com  Mon Jun 11 20:45:13 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 11 Jun 2007 19:45:13 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
Message-ID:

This bit in Bio::SeqFeature::Gene::Exon is causing me some problems trying to extend Bio::Tools::Glimmer to handle 'wraparound' genes (circular genomes):

sub location {
    my ($self,$value) = @_;

    if(defined($value) && $value->isa('Bio::Location::SplitLocationI')) {
        $self->throw("split or compound location is not allowed ".
                     "for an object of type " . ref($self));
    }
    return $self->SUPER::location($value);
}

That seems to have been there all the way back to the initial revision (checked in by Hilmar). I presume it's there because of code like this (from the seq() method in Bio::SeqFeature::Generic):

    # assuming our seq object is sensible, it should not have to yank
    # the entire sequence out here.
    my $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self->end());

That's not going to work too well with a feature that has a Bio::Location::Split location. Fixing it up seems straightforward, if a bit hackish. Something like:

my $seq;
if (ref($self->location()) eq 'Bio::Location::Split') {
    # Concatenate the sequence of each sub-location in turn
    my $seqstring;
    my @sublocs = $self->location()->sub_Location();

    foreach my $subloc (@sublocs) {
        $seqstring .= $self->{'_gsf_seq'}->trunc($subloc->start(),
                                                 $subloc->end())->seq();
    }
    # Assign to the $seq declared above so it is visible after the if/else
    $seq = Bio::Seq->new( -id  => $self->{'_gsf_seq'}->display_id(),
                          -seq => $seqstring );
}
else {
    $seq = $self->{'_gsf_seq'}->trunc($self->start(), $self->end());
}

I don't see any companion to trunc() in Bio::PrimarySeqI for joining sequences. A join() would be handy, and make the above cleaner.
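The concatenation described above can be sketched in plain Perl, without BioPerl objects, to show the coordinate arithmetic involved. This is a minimal sketch under stated assumptions: `join_sublocations` and `wraparound_sublocations` are hypothetical helpers, not BioPerl methods; coordinates are 1-based and inclusive, as BioPerl reports them; and only the forward strand is handled. A gene that wraps past the origin of a circular genome is treated as two pieces, the tail of the sequence followed by its head.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): concatenate the pieces of a
# split location. Each piece is [start, end], 1-based and inclusive;
# forward strand only.
sub join_sublocations {
    my ($seqstring, @sublocs) = @_;
    my $joined = '';
    for my $loc (@sublocs) {
        my ($start, $end) = @$loc;
        # substr() counts from 0, so shift the 1-based start down by one
        $joined .= substr($seqstring, $start - 1, $end - $start + 1);
    }
    return $joined;
}

# Hypothetical helper: on a circular genome of length $len, a prediction
# with $start > $end wraps past the origin, so split it into the tail of
# the sequence plus its head.
sub wraparound_sublocations {
    my ($start, $end, $len) = @_;
    return ([$start, $end]) if $start <= $end;
    return ([$start, $len], [1, $end]);
}

my $genome = 'ATGAAACCCGGGTTT';    # toy 15 bp "circular" sequence
my @parts  = wraparound_sublocations(13, 3, length $genome);
print join_sublocations($genome, @parts), "\n";    # prints TTTATG
```

Reverse-strand features would also need to be reverse-complemented, which this sketch leaves out.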
Comments, suggestions, rotten fruit?

From torsten.seemann at infotech.monash.edu.au  Tue Jun 12 02:18:27 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 12 Jun 2007 16:18:27 +1000
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To:
References:
Message-ID:

Mark,

> if (ref($self->location()) eq 'Bio::Location::Split') {
>     my $seqstring;
>     my @sublocs = $self->location()->sub_Location();
>
>     foreach my $subloc (@sublocs) {
>         $seqstring .= $self->{'_gsf_seq'}->trunc($subloc->start(),
>                                                  $subloc->end())->seq();
>     }

Can you use the ->spliced_seq() method to do this?

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010

From pengchy at yahoo.com.cn  Tue Jun 12 03:00:46 2007
From: pengchy at yahoo.com.cn (Pengcheng Yang)
Date: Tue, 12 Jun 2007 15:00:46 +0800 (CST)
Subject: [Bioperl-l] Can't locate loadable object for module TFBS::Ext::pwmsearch
Message-ID: <66745.92089.qm@web15205.mail.cnb.yahoo.com>

hi all,

Today I downloaded the TFBS package from http://forkhead.cgb.ki.se/TFBS/, uncompressed it, copied all the files contained in the TFBS and Ext directories to the directory "C:\perl\site\lib", and then put Ext under the TFBS directory. When I ran the example script1.pl, I got the following error:

Can't locate loadable object for module TFBS::Ext::pwmsearch in @INC (@INC contains: C:/perl/site/lib C:/perl/lib .) at C:/perl/site/lib/TFBS/Matrix/PWM.pm line 141
Compilation failed in require at C:/perl/site/lib/TFBS/Matrix/PWM.pm line 141, <DATA> line 206.
BEGIN failed--compilation aborted at C:/perl/site/lib/TFBS/Matrix/PWM.pm line 141, <DATA> line 206.
Compilation failed in require at C:/perl/site/lib/TFBS/Matrix/PFM.pm line 137, <DATA> line 206.
BEGIN failed--compilation aborted at C:/perl/site/lib/TFBS/Matrix/PFM.pm line 137, <DATA> line 206.
Compilation failed in require at C:/perl/site/lib/TFBS/DB/TRANSFAC.pm line 52, <DATA> line 206.
BEGIN failed--compilation aborted at C:/perl/site/lib/TFBS/DB/TRANSFAC.pm line 52, <DATA> line 206.
Compilation failed in require at script1.pl line 3, <DATA> line 206.
BEGIN failed--compilation aborted at script1.pl line 3, <DATA> line 206.

shell returned 2

When I run the list_matrices.pl script, I get the same message. But when I empty the pwmsearch.pm file, I get the following message:

TFBS/Ext/pwmsearch.pm did not return a true value at C:/perl/site/lib/TFBS/Matrix/PWM.pm line 141, <DATA> line 206.
BEGIN failed--compilation aborted at C:/perl/site/lib/TFBS/Matrix/PWM.pm line 141, <DATA> line 206.
Compilation failed in require at C:/perl/site/lib/TFBS/Matrix/PFM.pm line 137, <DATA> line 206.
BEGIN failed--compilation aborted at C:/perl/site/lib/TFBS/Matrix/PFM.pm line 137, <DATA> line 206.
Compilation failed in require at C:/perl/site/lib/TFBS/DB/TRANSFAC.pm line 52, <DATA> line 206.
BEGIN failed--compilation aborted at C:/perl/site/lib/TFBS/DB/TRANSFAC.pm line 52, <DATA> line 206.
Compilation failed in require at script1.pl line 3, <DATA> line 206.
BEGIN failed--compilation aborted at script1.pl line 3, <DATA> line 206.

Has anyone else met the same problem? Is it a bug in the TFBS package?

Best wishes!
Sincerely,
Pengcheng

From bix at sendu.me.uk  Tue Jun 12 03:32:02 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 12 Jun 2007 08:32:02 +0100
Subject: [Bioperl-l] Can't locate loadable object for module TFBS::Ext::pwmsearch
In-Reply-To: <66745.92089.qm@web15205.mail.cnb.yahoo.com>
References: <66745.92089.qm@web15205.mail.cnb.yahoo.com>
Message-ID: <466E4BF2.7020504@sendu.me.uk>

Pengcheng Yang wrote:
> hi all,
>
> Today I downloaded the TFBS package from
> http://forkhead.cgb.ki.se/TFBS/, uncompressed it, copied all the
> files contained in the TFBS and Ext directories to the directory
> "C:\perl\site\lib", and then put Ext under the TFBS directory.
> When I ran the example script1.pl, I got the following error:
>
> Can't locate loadable object for module TFBS::Ext::pwmsearch in @INC

You have to follow the installation instructions in the README file. Copying the files out is insufficient - you have to 'make'.

From ryanx07 at hotmail.com  Tue Jun 12 07:30:09 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Tue, 12 Jun 2007 06:30:09 -0500
Subject: [Bioperl-l] basic questions
In-Reply-To: <127743A7-1923-4DBF-A96E-276B5E0A7692@wustl.edu>
Message-ID:

Here is the code:

use Bio::Perl;
$seq_object = get_sequence('swiss',"ROA1_HUMAN");
print STDERR ref($seq_object), "\t", $seq_object->display_id, "\n";
write_sequence(">roa1.fasta",'fasta',$seq_object);

The output looks the same as with the previous version:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\~Scripts>perl test.pl

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:350
STACK: Bio::SeqIO::swiss::next_seq C:/Perl/site/lib/Bio\SeqIO\swiss.pm:178
STACK: Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:153
STACK: Bio::Perl::get_sequence C:/Perl/site/lib/Bio/Perl.pm:510
STACK: test.pl:7
-----------------------------------------------------------

Thanks.

>From: David Messina
>To: L Xu
>CC: BioPerl list
>Subject: Re: [Bioperl-l] basic questions
>Date: Mon, 11 Jun 2007 13:48:23 -0500
>
>Hi,
>
>Please use 'Reply All' so everyone on the list can follow the discussion.
>
>Try adding the following line after the line that starts with $seq_object:
>
>  print STDERR ref($seq_object), "\t", $seq_object->display_id, "\n";
>
>And then run the program again. What do you get? Could you post a complete
>printout of what you're doing?
>
>Dave
>
>On Jun 11, 2007, at 11:45 AM, L Xu wrote:
>>I used WinXP with BioPerl Inst_version 2.1.8 (Bioperl 1.5.2) and
>>activeperl 5.8.8.819 Thank you very much.
>

From pengchy at yahoo.com.cn  Tue Jun 12 10:33:15 2007
From: pengchy at yahoo.com.cn (Pengcheng Yang)
Date: Tue, 12 Jun 2007 22:33:15 +0800 (CST)
Subject: [Bioperl-l] Reply: Re: basic questions
In-Reply-To:
Message-ID: <936780.8655.qm@web15215.mail.cnb.yahoo.com>

I got the same problem.
I guess that the SwissProt database has some problems!
code:
use Bio::DB::SwissProt;
$sp = new Bio::DB::SwissProt;
$seq = $sp->get_Seq_by_id('KPY1_ECOLI');
print ref($seq),"\t",$seq->display_id,"\n"
the message:
------------- EXCEPTION -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq C:/perl/site/lib/Bio\SeqIO\swiss.pm:180
STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/perl/site/lib/Bio/DB/WebDBSeqI.pm:154
STACK toplevel t.pl:7
--------------------------------------

--- L Xu wrote:

> Here is the code:
>
> use Bio::Perl;
> $seq_object = get_sequence('swiss',"ROA1_HUMAN");
> print STDERR ref($seq_object), "\t", $seq_object->display_id, "\n";
> write_sequence(">roa1.fasta",'fasta',$seq_object);
>
> The output looks the same as with the previous version:
>
> Microsoft Windows XP [Version 5.1.2600]
> (C) Copyright 1985-2001 Microsoft Corp.
>
> C:\~Scripts>perl test.pl
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:350
> STACK: Bio::SeqIO::swiss::next_seq C:/Perl/site/lib/Bio\SeqIO\swiss.pm:178
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:153
> STACK: Bio::Perl::get_sequence C:/Perl/site/lib/Bio/Perl.pm:510
> STACK: test.pl:7
> -----------------------------------------------------------
>
> Thanks.
>
> >From: David Messina
> >To: L Xu
> >CC: BioPerl list
> >Subject: Re: [Bioperl-l] basic questions
> >Date: Mon, 11 Jun 2007 13:48:23 -0500
> >
> >Hi,
> >
> >Please use 'Reply All' so everyone on the list can follow the discussion.
> >
> >Try adding the following line after the line that starts with $seq_object:
> >
> >  print STDERR ref($seq_object), "\t", $seq_object->display_id, "\n";
> >
> >And then run the program again. What do you get? Could you post a complete
> >printout of what you're doing?
> >
> >Dave
> >
> >On Jun 11, 2007, at 11:45 AM, L Xu wrote:
> >>I used WinXP with BioPerl Inst_version 2.1.8 (Bioperl 1.5.2) and
> >>activeperl 5.8.8.819 Thank you very much.

Best wishes!
Sincerely,
Pengcheng
From drummike at gmail.com  Tue Jun 12 11:49:36 2007
From: drummike at gmail.com (Mike Williams)
Date: Tue, 12 Jun 2007 11:49:36 -0400
Subject: Re: [Bioperl-l] Reply: Re: basic questions
In-Reply-To: <936780.8655.qm@web15215.mail.cnb.yahoo.com>
References: <936780.8655.qm@web15215.mail.cnb.yahoo.com>
Message-ID:

On 6/12/07, Pengcheng Yang wrote:
> I got the same problem.
> I guess that the SwissProt database has some problems!
> code:
> use Bio::DB::SwissProt;
> $sp = new Bio::DB::SwissProt;
> $seq = $sp->get_Seq_by_id('KPY1_ECOLI');
> print ref($seq),"\t",$seq->display_id,"\n"
> ------------- EXCEPTION -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK toplevel t.pl:7

This is a different problem. The ID was not valid. If you change KPY1 to KPYK1 it works fine.

$seq = $sp->get_Seq_by_id('KPYK1_ECOLI');
print ref($seq),"\t",$seq->display_id,"\n"

[mike at Wheatley]$ ./bio_quest2.pl
Bio::Seq::RichSeq KPYK1_ECOLI

If you got this example from the BioPerl site, would you please post the URL? It seems to me this same problem has come up before, but I could not find it in the archives or on the web site.

Mike

From ryanx07 at hotmail.com  Tue Jun 12 11:42:28 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Tue, 12 Jun 2007 10:42:28 -0500
Subject: [Bioperl-l] basic questions
Message-ID:

I tested another piece of code (the second test on the same machine) from the tutorial and got an error again. I don't know what happened; please help. Thanks so much.
===========================================================
Code:

use Bio::Restriction::EnzymeCollection;
my $all_collection = Bio::Restriction::EnzymeCollection;
my $six_cutter_collection = $all_collection->cutters(6);
for my $enz ($six_cutter_collection){
    print $enz->name,"\t",$enz->site,"\t",$enz->overhang_seq,"\n";
    # prints name, recognition site, overhang
}

===========================================
Results:

C:\~Scripts>perl t9.pl
Can't use string ("Bio::Restriction::EnzymeCollecti") as a HASH ref while "strict refs" in use at C:/Perl/site/lib/Bio/Restriction/EnzymeCollection.pm line 236.

= = = Original message = = =

On Jun 11, 2007, at 11:45 AM, L Xu wrote:
I used WinXP with BioPerl Inst_version 2.1.8 (Bioperl 1.5.2) and activeperl 5.8.8.819 Thank you very much.

From limericksean at gmail.com  Tue Jun 12 12:04:40 2007
From: limericksean at gmail.com (Sean O'Keeffe)
Date: Tue, 12 Jun 2007 18:04:40 +0200
Subject: [Bioperl-l] gff2xml
Message-ID: <462784640706120904g25a6550dsc56a22af64ca98cd@mail.gmail.com>

Hi all,

I posted this on the gbrowse list earlier. I'm looking to convert gff data files into xml. Does anyone know of a module written to do this already?

respect,
sean.
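The right XML layout depends entirely on the target schema, which the question leaves open. Purely as a minimal sketch, the plain-Perl code below turns one GFF3 feature line into an XML element. The `<feature>` and `<attribute>` element names and the `gff_line_to_xml` and `xml_escape` helpers are hypothetical and follow no published DTD, and the percent-encoding that GFF3 permits in column 9 is not decoded.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Convert one GFF3 feature line into a <feature> element. Element and
# attribute names are hypothetical, not taken from any published DTD.
sub gff_line_to_xml {
    my ($line) = @_;
    chomp $line;
    return '' if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blanks
    my @cols = split /\t/, $line, 9;
    die "expected 9 tab-separated columns\n" unless @cols == 9;

    # Columns 1-8 become XML attributes on the <feature> element.
    my @names = qw(seqid source type start end score strand phase);
    my $xml   = '<feature';
    for my $i (0 .. $#names) {
        $xml .= sprintf(' %s="%s"', $names[$i], xml_escape($cols[$i]));
    }
    $xml .= '>';

    # Column 9 holds tag=value pairs separated by semicolons.
    for my $pair (split /;/, $cols[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value;
        $xml .= sprintf('<attribute name="%s">%s</attribute>',
                        xml_escape($key), xml_escape($value));
    }
    return $xml . '</feature>';
}

print gff_line_to_xml("ctg123\tglimmer\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"), "\n";
```

Calling it on each line of a GFF3 file and wrapping the results in a root element would give a complete document; targeting an existing XML format would instead need a writer that follows that format's schema.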
From johnsonm at gmail.com  Tue Jun 12 12:10:45 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 12 Jun 2007 11:10:45 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To:
References:
Message-ID:

On 6/12/07, Torsten Seemann wrote:
> Can you use the ->spliced_seq() method to do this?
>
> http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11

Actually, I'd forgotten about spliced_seq(). That seems like it will Do The Right Thing. It's just up to the invoker to call spliced_seq() instead of seq() as appropriate.

So, is there any other code that will break if I modify Bio::SeqFeature::Gene::Exon::location to not throw an exception when encountering Bio::Location::SplitLocationI? I'm wondering if it's just a paranoid check or if it's there to guard against something. If the latter, I need to know what code to fix. I'll dig and look, but if anybody knows or has an idea, save me some time. I suppose I can just change it and see what tests start failing. 8)

From dmessina at wustl.edu  Tue Jun 12 12:11:36 2007
From: dmessina at wustl.edu (David Messina)
Date: Tue, 12 Jun 2007 11:11:36 -0500
Subject: [Bioperl-l] basic questions
In-Reply-To:
References:
Message-ID: <30B8F841-E694-4577-8C15-8703E846CDFE@wustl.edu>

Hmm, it almost looks like you're having an issue with line breaks.

The 'swissprot stream with no ID' error made me think that perhaps Perl wasn't seeing the second argument to get_sequence. And then your new program has the error 'Can't use string ("Bio::Restriction::EnzymeCollecti")' where the end of the word is cut off.

I don't know how ActivePerl handles Windows vs UNIX line breaks. Are there any example scripts that come with ActivePerl?
If there are, and they run correctly, perhaps you could look to see how the line breaks are done and make sure your program does it the same way.

Other than that, I'm not seeing an obvious answer to your problem -- anyone else have a suggestion?

Perhaps the easiest thing for you to do would be to reinstall BioPerl and make sure that you run the full test suite and that all of the tests pass. My guess is that something in your current setup is not quite right.

Dave

From cjfields at uiuc.edu  Tue Jun 12 12:42:29 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Jun 2007 11:42:29 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To:
References:
Message-ID:

On Jun 12, 2007, at 11:10 AM, Mark Johnson wrote:

> On 6/12/07, Torsten Seemann wrote:
>> Can you use the ->spliced_seq() method to do this?
>>
>> http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11
>
> Actually, I'd forgotten about spliced_seq(). That seems like it
> will Do The Right Thing. It's just up to the invoker to call
> spliced_seq() instead of seq() as appropriate.
> So, is there any other code that will break if I modify
> Bio::SeqFeature::Gene::Exon::location to not throw an exception when
> encountering Bio::Location::SplitLocationI? I'm wondering if it's
> just a paranoid check or if it's there to guard against something. If
> the latter, I need to know what code to fix. I'll dig and look, but
> if anybody knows or has an idea, save me some time. I suppose I can
> just change it and see what tests start failing. 8)

I'm wondering why you want to use Bio::SeqFeature::Gene::Exon to describe the 'wrap-around' genes. The SeqFeature::Gene::Exon docs state that the Exon class is used to specifically describe exons, as the name implies.
Exons are primarily eukaryotic in origin, so you shouldn't encounter wraparounds, and should not have split locations by definition (which likely explains the exception).

Wouldn't a SeqFeature::Generic work just as well using a split location?

chris

From johnsonm at gmail.com  Tue Jun 12 12:59:54 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 12 Jun 2007 11:59:54 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To:
References:
Message-ID:

That's a good point. Both Bio::Tools::Glimmer and Bio::Tools::Genemark produce Bio::SeqFeature::Gene objects, each with a single Bio::SeqFeature::Gene::Exon, when parsing predictions for prokaryotic sequence (multiple exons for eukaryotic). There are eukaryotic and prokaryotic versions of both predictor families. Maybe the most elegant solution would be to simply modify both modules to only emit Bio::SeqFeature::Generic features when operating on prokaryotic mode output? Fix the data model and the problem goes away. 8)

On 6/12/07, Chris Fields wrote:
>
> On Jun 12, 2007, at 11:10 AM, Mark Johnson wrote:
>
> > On 6/12/07, Torsten Seemann wrote:
> >> Can you use the ->spliced_seq() method to do this?
> >>
> >> http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11
> >
> > Actually, I'd forgotten about spliced_seq(). That seems like it
> > will Do The Right Thing. It's just up to the invoker to call
> > spliced_seq() instead of seq() as appropriate.
> > So, is there any other code that will break if I modify
> > Bio::SeqFeature::Gene::Exon::location to not throw an exception when
> > encountering Bio::Location::SplitLocationI? I'm wondering if it's
> > just a paranoid check or if it's there to guard against something. If
> > the latter, I need to know what code to fix.
> > I'll dig and look, but
> > if anybody knows or has an idea, save me some time. I suppose I can
> > just change it and see what tests start failing. 8)
>
> I'm wondering why you want to use Bio::SeqFeature::Gene::Exon to
> describe the 'wrap-around' genes. The SeqFeature::Gene::Exon docs
> state that the Exon class is used to specifically describe exons, as
> the name implies. Exons are primarily eukaryotic in origin, so you
> shouldn't encounter wraparounds, and should not have split locations
> by definition (which likely explains the exception).
>
> Wouldn't a SeqFeature::Generic work just as well using a split location?
>
> chris

From ryanx07 at hotmail.com  Tue Jun 12 13:17:18 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Tue, 12 Jun 2007 12:17:18 -0500
Subject: [Bioperl-l] basic questions
Message-ID:

I reinstalled activePerl and BioPerl; now activePerl is 5.8.8 build 820. However, both scripts generated the same error on my computer. I tested the code on another WinXP computer with the same versions of activePerl and BioPerl; the SwissProt one did work, but the restriction enzyme one generated the same error.

= = = Original message = = =

Hmm, it almost looks like you're having an issue with line breaks.

The 'swissprot stream with no ID' error made me think that perhaps Perl wasn't seeing the second argument to get_sequence. And then your new program has the error 'Can't use string ("Bio::Restriction::EnzymeCollecti")' where the end of the word is cut off.

I don't know how ActivePerl handles Windows vs UNIX line breaks. Are there any example scripts that come with ActivePerl? If there are, and they run correctly, perhaps you could look to see how the line breaks are done and make sure your program does it the same way.

Other than that, I'm not seeing an obvious answer to your problem -- anyone else have a suggestion?

Perhaps the easiest thing for you to do would be to reinstall BioPerl
and make sure that you run the full test suite and that all of the tests pass. My guess is that something in your current setup is not quite right.

Dave

From cjfields at uiuc.edu  Tue Jun 12 13:51:47 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Jun 2007 12:51:47 -0500
Subject: [Bioperl-l] basic questions
In-Reply-To:
References:
Message-ID:

This is an instance where 'use strict' would have shown the problem right away. You left off your constructor call:

my $all_collection = Bio::Restriction::EnzymeCollection;

should be

my $all_collection = Bio::Restriction::EnzymeCollection->new;

chris

On Jun 12, 2007, at 12:17 PM, L Xu wrote:

> I reinstalled activePerl and BioPerl; now activePerl is 5.8.8
> build 820.
> However, both scripts generated the same error on my computer. I
> tested the code on another WinXP computer with the same versions of
> activePerl and BioPerl; the SwissProt one did work, but the
> restriction enzyme one generated the same error.
>
> = = = Original message = = =
>
> Hmm, it almost looks like you're having an issue with line breaks.
>
> The 'swissprot stream with no ID' error made me think that perhaps
> Perl wasn't seeing the second argument to get_sequence. And then your
> new program has the error 'Can't use string
> ("Bio::Restriction::EnzymeCollecti")' where the end of the word is
> cut off.
>
> I don't know how ActivePerl handles Windows vs UNIX line breaks. Are
> there any example scripts that come with ActivePerl? If there are, and
> they run correctly, perhaps you could look to see how the line
> breaks are done and make sure your program does it the same way.
>
> Other than that, I'm not seeing an obvious answer to your problem
> -- anyone else have a suggestion?
>
> Perhaps the easiest thing for you to do would be to reinstall
> BioPerl and make sure that you run the full test suite and that all
> of the tests pass. My guess is that something in your current setup
> is not quite right.
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ryanx07 at hotmail.com  Tue Jun 12 14:11:15 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Tue, 12 Jun 2007 13:11:15 -0500
Subject: [Bioperl-l] basic questions
Message-ID:

Thank you very much; it did get the script a bit further, but I got the following error:

C:\~Scripts>perl t9.pl
Can't locate object method "name" via package "Bio::Restriction::EnzymeCollection" at t9.pl line 5, line 532.

I checked the documentation; there is no "name" method for the package. Thanks.

= = = Original message = = =

This is an instance where 'use strict' would have shown the problem right away. You left off your constructor call:

my $all_collection = Bio::Restriction::EnzymeCollection;

should be

my $all_collection = Bio::Restriction::EnzymeCollection->new;

chris

On Jun 12, 2007, at 12:17 PM, L Xu wrote:

I reinstalled activePerl and BioPerl; now activePerl is 5.8.8 build 820.
However, both scripts generated the same error on my computer. I tested the code on another WinXP computer with the same versions of activePerl and BioPerl; the SwissProt one did work, but the restriction enzyme one generated the same error.

= = = Original message = = =

Hmm, it almost looks like you're having an issue with line breaks.

The 'swissprot stream with no ID' error made me think that perhaps Perl wasn't seeing the second argument to get_sequence. And then your new program has the error 'Can't use string ("Bio::Restriction::EnzymeCollecti")' where the end of the word is cut off.

I don't know how ActivePerl handles Windows vs UNIX line breaks. Are there any example scripts that come with ActivePerl? If there are, and they run correctly, perhaps you could look to see how the line breaks are done and make sure your program does it the same way.

Other than that, I'm not seeing an obvious answer to your problem -- anyone else have a suggestion?

Perhaps the easiest thing for you to do would be to reinstall BioPerl and make sure that you run the full test suite and that all of the tests pass. My guess is that something in your current setup is not quite right.

Dave
From cjfields at uiuc.edu  Tue Jun 12 14:35:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Jun 2007 13:35:15 -0500
Subject: [Bioperl-l] basic questions
In-Reply-To:
References:
Message-ID: <287E93E2-1902-4796-971E-B1DCA805D032@uiuc.edu>

Bio::Restriction::EnzymeCollection holds Bio::Restriction::Enzyme objects, each with its own name(). Using grouped methods like '$collection->cutters(6)' will retrieve a new EnzymeCollection containing all six-cutters from the original collection. You should first use one of the EnzymeCollection accessor methods to retrieve the enzyme you want, or iterate through them all. This works for me:

use Bio::Restriction::EnzymeCollection;

my $all_collection = Bio::Restriction::EnzymeCollection->new();
my $six_cutter_collection = $all_collection->cutters(6);

for my $enz ($six_cutter_collection->each_enzyme){
    print $enz->name,"\t",$enz->site,"\t",$enz->overhang_seq,"\n";
}

chris

On Jun 12, 2007, at 1:11 PM, L Xu wrote:

> Thank you very much; it did get the script a bit further, but I
> got the following error:
>
> C:\~Scripts>perl t9.pl
> Can't locate object method "name" via package
> "Bio::Restriction::EnzymeCollection" at t9.pl line 5, line 532.
>
> I checked the documentation; there is no "name" method for the
> package. Thanks.

From johnsonm at gmail.com  Tue Jun 12 15:07:57 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 12 Jun 2007 14:07:57 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To:
References:
Message-ID:

I'll wait a day, and if there is no opinion to the contrary, implement it this way.

On 6/12/07, Mark Johnson wrote:
> That's a good point.
> Both Bio::Tools::Glimmer and
> Bio::Tools::Genemark produce Bio::SeqFeature::Gene objects, each with
> a single Bio::SeqFeature::Gene::Exon, when parsing predictions for
> prokaryotic sequence (multiple exons for eukaryotic). There are
> eukaryotic and prokaryotic versions of both predictor families. Maybe
> the most elegant solution would be to simply modify both modules to
> only emit Bio::SeqFeature::Generic features when operating on
> prokaryotic mode output? Fix the data model and the problem goes
> away. 8)

From torsten.seemann at infotech.monash.edu.au  Tue Jun 12 20:18:27 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 13 Jun 2007 10:18:27 +1000
Subject: [Bioperl-l] gff2xml
In-Reply-To: <462784640706120904g25a6550dsc56a22af64ca98cd@mail.gmail.com>
References: <462784640706120904g25a6550dsc56a22af64ca98cd@mail.gmail.com>
Message-ID:

Sean

> I posted this on the gbrowse list earlier. I'm looking to convert gff
> data files into xml. Does anyone know of a module written to do this
> already?

What DTD do you want the XML to conform to? e.g. ChadoXML, TinySeq XML, TIGR XML ... ?

What program are you trying to get to load the XML?

BioPerl has some Bio::SeqIO::xxxxx modules for some XML formats that you could use. There is a script, bp_seqconvert.pl (see "bp_seqconvert.pl -h"), which comes with BioPerl and may be useful.

--
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010

From hlapp at gmx.net  Tue Jun 12 20:55:57 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 12 Jun 2007 20:55:57 -0400
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To:
References:
Message-ID: <0915FAB4-E554-4E65-BA3F-1B916F0F95FC@gmx.net>

I think it was just trying to guard against people trying to do stupid things.

I'm actually not sure that representing locations on a circular genome using split locations really is the best thing.
I'm wondering whether one shouldn't rather introduce a CircularLocation
object (though obviously it isn't the location that's circular...).
Just a thought.

In the end, if you have a way to make this work that you feel
comfortable with, then go for it.

	-hilmar

On Jun 12, 2007, at 12:10 PM, Mark Johnson wrote:

> On 6/12/07, Torsten Seemann wrote:
>> Can you use the ->spliced_seq() method to do this?
>>
>> http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11
>>
>> --
>> --Torsten Seemann
>> --Victorian Bioinformatics Consortium, Monash University
>> --Tel +61 3 9905 9010
>
>     Actually, I'd forgotten about spliced_seq(). That seems like it
> will Do The Right Thing. It's just up to the invoker to call
> spliced_seq() instead of seq() as appropriate.
>     So, is there any other code that will break if I modify
> Bio::SeqFeature::Gene::Exon::location to not throw an exception when
> encountering Bio::Location::SplitLocationI? I'm wondering if it's
> just a paranoid check or if it's there to guard against something. If
> the latter, I need to know what code to fix. I'll dig and look, but
> if anybody knows or has an idea, save me some time. I suppose I can
> just change it and see what tests start failing. 8)
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Tue Jun 12 20:57:06 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 12 Jun 2007 20:57:06 -0400
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
Message-ID: <80EAA2F1-B2DA-45F0-B591-8534C356E679@gmx.net>

I like that.
Don't force a model to do what you want if it doesn't
really apply anyway.

	-hilmar

On Jun 12, 2007, at 12:59 PM, Mark Johnson wrote:

> That's a good point. Both Bio::Tools::Glimmer and
> Bio::Tools::Genemark produce Bio::SeqFeature::Gene objects, each with
> a single Bio::SeqFeature::Gene::Exon, when parsing predictions for
> prokaryotic sequence (multiple exons for eukaryotic). There are
> eukaryotic and prokaryotic versions of both predictor families. Maybe
> the most elegant solution would be to simply modify both modules to
> only emit Bio::SeqFeature::Generic features when operating on
> prokaryotic mode output? Fix the data model and the problem goes
> away. 8)
>
> On 6/12/07, Chris Fields wrote:
>>
>> On Jun 12, 2007, at 11:10 AM, Mark Johnson wrote:
>>
>>> On 6/12/07, Torsten Seemann wrote:
>>>> Can you use the ->spliced_seq() method to do this?
>>>>
>>>> http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11
>>>>
>>>> --
>>>> --Torsten Seemann
>>>> --Victorian Bioinformatics Consortium, Monash University
>>>> --Tel +61 3 9905 9010
>>>
>>>     Actually, I'd forgotten about spliced_seq(). That seems like it
>>> will Do The Right Thing. It's just up to the invoker to call
>>> spliced_seq() instead of seq() as appropriate.
>>>     So, is there any other code that will break if I modify
>>> Bio::SeqFeature::Gene::Exon::location to not throw an exception when
>>> encountering Bio::Location::SplitLocationI? I'm wondering if it's
>>> just a paranoid check or if it's there to guard against something. If
>>> the latter, I need to know what code to fix. I'll dig and look, but
>>> if anybody knows or has an idea, save me some time. I suppose I can
>>> just change it and see what tests start failing. 8)
>>
>> I'm wondering why you want to use Bio::SeqFeature::Gene::Exon to
>> describe the 'wrap-around' genes. The SeqFeature::Gene::Exon docs
>> state that the Exon class is used to specifically describe exons, as
>> the name implies.
>> Exons are primarily eukaryotic in origin, so you
>> shouldn't encounter wraparounds, and should not have split locations
>> by definition (which likely explains the exception).
>>
>> Wouldn't a SeqFeature::Generic work just as well using a split
>> location?
>>
>> chris
>>

--
===========================================================
: Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Tue Jun 12 21:20:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Jun 2007 20:20:41 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Gene::Exon throws exception when encountering split location (Bio::Location::Split)
In-Reply-To: <80EAA2F1-B2DA-45F0-B591-8534C356E679@gmx.net>
References: <80EAA2F1-B2DA-45F0-B591-8534C356E679@gmx.net>
Message-ID: <951EB9CA-2066-4CD1-BCD5-4E00232CA507@uiuc.edu>

It will be interesting to see if bioperl handles wrap-around split
locations via spliced_seq() and other methods. I can't see why it
wouldn't but one never knows. Might be something to add to location
tests at some point...

chris

On Jun 12, 2007, at 7:57 PM, Hilmar Lapp wrote:

> I like that. Don't force a model to do what you want if it doesn't
> really apply anyway.
>
> -hilmar
>
> On Jun 12, 2007, at 12:59 PM, Mark Johnson wrote:
>
>> That's a good point. Both Bio::Tools::Glimmer and
>> Bio::Tools::Genemark produce Bio::SeqFeature::Gene objects, each with
>> a single Bio::SeqFeature::Gene::Exon, when parsing predictions for
>> prokaryotic sequence (multiple exons for eukaryotic). There are
>> eukaryotic and prokaryotic versions of both predictor families. Maybe
>> the most elegant solution would be to simply modify both modules to
>> only emit Bio::SeqFeature::Generic features when operating on
>> prokaryotic mode output?
>> Fix the data model and the problem goes
>> away. 8)
>>
>> On 6/12/07, Chris Fields wrote:
>>>
>>> On Jun 12, 2007, at 11:10 AM, Mark Johnson wrote:
>>>
>>>> On 6/12/07, Torsten Seemann wrote:
>>>>> Can you use the ->spliced_seq() method to do this?
>>>>>
>>>>> http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/SeqFeatureI.html#POD11
>>>>>
>>>>> --
>>>>> --Torsten Seemann
>>>>> --Victorian Bioinformatics Consortium, Monash University
>>>>> --Tel +61 3 9905 9010
>>>>
>>>>     Actually, I'd forgotten about spliced_seq(). That seems like it
>>>> will Do The Right Thing. It's just up to the invoker to call
>>>> spliced_seq() instead of seq() as appropriate.
>>>>     So, is there any other code that will break if I modify
>>>> Bio::SeqFeature::Gene::Exon::location to not throw an exception when
>>>> encountering Bio::Location::SplitLocationI? I'm wondering if it's
>>>> just a paranoid check or if it's there to guard against something. If
>>>> the latter, I need to know what code to fix. I'll dig and look, but
>>>> if anybody knows or has an idea, save me some time. I suppose I can
>>>> just change it and see what tests start failing. 8)
>>>
>>> I'm wondering why you want to use Bio::SeqFeature::Gene::Exon to
>>> describe the 'wrap-around' genes. The SeqFeature::Gene::Exon docs
>>> state that the Exon class is used to specifically describe exons, as
>>> the name implies. Exons are primarily eukaryotic in origin, so you
>>> shouldn't encounter wraparounds, and should not have split locations
>>> by definition (which likely explains the exception).
>>>
>>> Wouldn't a SeqFeature::Generic work just as well using a split
>>> location?
>>>
>>> chris
>>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ryanx07 at hotmail.com Wed Jun 13 08:16:15 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Wed, 13 Jun 2007 07:16:15 -0500
Subject: [Bioperl-l] Example code in Bioperl Tutorial

Thanks so much, Chris, it works now.

All the code I tested was copied from the Bioperl Tutorial. Why did
these scripts have such problems: a platform issue, or different
versions of BioPerl? So far I have tested 6 scripts: three work and
three don't. Here is the problem with the 3rd failed script:

=================================
use strict;
use Bio::Tools::Run::RemoteBlast;

my $remote_blast = Bio::Tools::Run::RemoteBlast->new(
    -prog   => 'blastn',
    -data   => 'ecoli',
    -expect => '1e-10'
);
my $r = $remote_blast->submit_blast("d1.fa");
my $rc;
while ( my @rids = $remote_blast->each_rid ) {
    for my $rid ( @rids ) {
        $rc = $remote_blast->retrieve_blast($rid);
    }
}
print "$rc\n"; #I just want to print sth here before parsing the result
========================================================= d1.fa
>example
CCCTTCAGGTACCCCGAGGTAACACGAGACACTCGGGATCTGGGAAGGGGACTGGGGCTTCTTTAAAAGCGCTCAGTTTAAAAAGCTTCTATGCCTGAATAGGTGACCGGAGGCCGGCACC
========================================================= result
C:\>perl t13.pl

-------------------- WARNING ---------------------
MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: Unknown error)
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: Unknown error)
---------------------------------------------------
Terminating on signal SIGINT(2)

C:\>

Please help me to correct the problem, thanks.

= = = Original message = = =

Bio::Restriction::EnzymeCollection holds Bio::Restriction::Enzyme
objects, each with its own name(). Using grouped methods like
'$collection->cutters(6)' will retrieve a new EnzymeCollection
containing all six-cutters from the original collection. You should
use one of the EnzymeCollection accessor methods to retrieve the
enzyme that you wanted first or iterate through them all. This works
for me:

use Bio::Restriction::EnzymeCollection;

my $all_collection = Bio::Restriction::EnzymeCollection->new();
my $six_cutter_collection = $all_collection->cutters(6);

for my $enz ($six_cutter_collection->each_enzyme) {
    print $enz->name, "\t", $enz->site, "\t", $enz->overhang_seq, "\n";
}

chris

On Jun 12, 2007, at 1:11 PM, L Xu wrote:

Thank you very much, it did make the script advance a bit but I
got the following error:

C:\~Scripts>perl t9.pl
Can't locate object method "name" via package
"Bio::Restriction::EnzymeCollection" at t9.pl line 5, line 532.

I checked the documentation, there is no "name" method for the
package. Thanks.
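Separately from the connection error, the polling loop in the RemoteBlast script above never calls remove_rid, so once a request ID is registered each_rid keeps returning it and the while loop cannot finish on its own (which may explain why the run ended with SIGINT). Below is a minimal sketch that roughly follows the polling pattern in the Bio::Tools::Run::RemoteBlast documentation. It assumes retrieve_blast returns 0 while a job is still running, a negative value on error, and a Bio::SearchIO object when finished; the helper name collect_blast_results and its $pause argument are hypothetical, not part of BioPerl.

```perl
use strict;
use warnings;

# Poll a RemoteBlast-style factory until every submitted request ID (RID)
# has been resolved, and return the finished report objects.
# Assumes the Bio::Tools::Run::RemoteBlast conventions:
#   each_rid()            -> list of outstanding RIDs
#   retrieve_blast($rid)  -> 0 while running, negative on error,
#                            a Bio::SearchIO object when finished
#   remove_rid($rid)      -> stop tracking a RID
# collect_blast_results and $pause are hypothetical names, not BioPerl API.
sub collect_blast_results {
    my ( $factory, $pause ) = @_;
    $pause = 5 unless defined $pause;    # seconds between polling rounds
    my @reports;
    while ( my @rids = $factory->each_rid ) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( ref $rc ) {
                # Finished: keep the report and stop polling this RID.
                push @reports, $rc;
                $factory->remove_rid($rid);
            }
            elsif ( $rc < 0 ) {
                # Error: drop the RID so the while loop can terminate.
                warn "BLAST request $rid failed\n";
                $factory->remove_rid($rid);
            }
            # $rc == 0: still running, try again next round.
        }
        my @pending = $factory->each_rid;
        sleep $pause if @pending;    # avoid hammering the server
    }
    return @reports;
}
```

In the tutorial script, `my @reports = collect_blast_results($remote_blast);` would take the place of the `while` loop, and each report could then be read with `$report->next_result`. The "Can't connect" error itself still has to be fixed first; this only ensures the loop terminates.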
From cjfields at uiuc.edu Wed Jun 13 10:41:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 13 Jun 2007 09:41:55 -0500
Subject: [Bioperl-l] Example code in Bioperl Tutorial
Message-ID: <4F7BE556-BD8C-4378-BDE7-1F31364F49DA@uiuc.edu>

Judging by the output it looks like you have no network access or
can't connect to the server (what remoteblast needs). Make sure you
don't need proxy settings.

To preempt the next question, no, I'm not going to explain what a
proxy is. The RemoteBlast docs show how to set them, and Google is a
wonderful tool...

chris

On Jun 13, 2007, at 7:16 AM, L Xu wrote:

> ...
> -------------------- WARNING ---------------------
> MSG:
> An Error Occurred
>
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: Unknown error)
>
> ---------------------------------------------------
> ...

From ryanx07 at hotmail.com Wed Jun 13 11:01:07 2007
From: ryanx07 at hotmail.com (L Xu)
Date: Wed, 13 Jun 2007 10:01:07 -0500
Subject: [Bioperl-l] Example code in Bioperl Tutorial

I do have an internet connection but do not use a proxy server. I
tested the network connection with the ping command (below). The NCBI
website does not respond. Is there any special network setting needed
for connecting to the NCBI website? Thank you so much.

C:\>ping www.yahoo.com

Pinging www.yahoo-ht3.akadns.net [69.147.114.210] with 32 bytes of data:

Reply from 69.147.114.210: bytes=32 time=363ms TTL=45
Reply from 69.147.114.210: bytes=32 time=319ms TTL=45
Reply from 69.147.114.210: bytes=32 time=312ms TTL=45
Reply from 69.147.114.210: bytes=32 time=360ms TTL=45

Ping statistics for 69.147.114.210:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 312ms, Maximum = 363ms, Average = 338ms

C:\>ping www.ncbi.nlm.nih.gov

Pinging www.ncbi.nlm.nih.gov [130.14.29.110] with 32 bytes of data:

Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 130.14.29.110:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

= = = Original message = = =

Judging by the output it looks like you have no network access or
can't connect to the server (what remoteblast needs). Make sure you
don't need proxy settings.

To preempt the next question, no, I'm not going to explain what a
proxy is. The RemoteBlast docs show how to set them, and Google is a
wonderful tool...

chris

On Jun 13, 2007, at 7:16 AM, L Xu wrote:

...
-------------------- WARNING ---------------------
MSG: An Error Occurred
500 Can't connect to www.ncbi.nlm.nih.gov:80 (connect: Unknown error)
---------------------------------------------------
...

From cjfields at uiuc.edu Wed Jun 13 12:14:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 13 Jun 2007 11:14:22 -0500
Subject: [Bioperl-l] method naming
Message-ID: <724E5A3F-22CF-41B6-AC33-CD5EAD7D1251@uiuc.edu>

Some quick questions on method naming. I couldn't find this on the
mail list previously and just want some opinions.

1) Is there any preference on how to name a method that returns a list
of class instances vs. data? I have seen 'each' (each_Location,
each_tag_value) vs. 'get_all' (get_all_tags, get_all_SeqFeatures) vs.
simple (hits, hsps).

2) Do we want methods which return objects to have the object name in
Title Case (each_Location, get_Seq_by_id, etc.), or does it really
matter?

chris

From dmessina at wustl.edu Wed Jun 13 12:41:53 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 13 Jun 2007 11:41:53 -0500
Subject: [Bioperl-l] method naming
In-Reply-To: <724E5A3F-22CF-41B6-AC33-CD5EAD7D1251@uiuc.edu>
References: <724E5A3F-22CF-41B6-AC33-CD5EAD7D1251@uiuc.edu>
Message-ID: <9A51046D-4827-4CF7-A2B7-7880E03129E9@wus