From forrest_zhang at 163.com  Mon Mar  1 00:10:31 2010
From: forrest_zhang at 163.com (forrest)
Date: Mon, 01 Mar 2010 13:10:31 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <4B8B4C47.108@163.com>

Hi all,

When I use threads to get Genbank format file, show some error. It is
shown as:

"Can't call method "get_taxon" on unblessed reference at
/opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."

=========================================
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;


my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");


my $seq_out = Bio::SeqIO->new(-format => "genbank",
                              -file => ">dhn_all.gb");
my @seq;

my $number = @id;

my $max_threads = 6;

for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;

    if ($number - $thread_number > $max_threads){
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;

        }
    }else{
        my $else_number = $number % $max_threads;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;

        }


    }

    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}

foreach (@seq){
    $seq_out->write_seq($_);
}
=========================================


How can I fix this error?
Thanks.


Zhang Tao

From cjfields at illinois.edu  Mon Mar  1 15:37:18 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 01 Mar 2010 14:37:18 -0600
Subject: [Bioperl-l] use threads to get seq file error.
In-Reply-To: <4B8B4C47.108@163.com>
References: <4B8B4C47.108@163.com>
Message-ID: <1267475838.16248.8.camel@pyrimidine.igb.uiuc.edu>

I get much nastier ones than that; a small taste:

--------------------- WARNING ---------------------
MSG: exception while parsing location line [1..680] in reading
EMBL/GenBank/SwissProt, ignoring feature source (seqid=AF031249):
Eval-group not allowed at runtime, use re 'eval' in regex
m/(.*?)\(((?x-ism: (?> [^()]+ | \( (??{.../ at
/home/cjfields/bioperl/live/Bio/Factory/FTLocationFactory.pm line 161,
line 36.
---------------------------------------------------
Thread 2 terminated abnormally: Can't call method "primary_tag" on an
undefined value at /home/cjfields/bioperl/live/Bio/SeqIO/genbank.pm
line 662, line 36.

Could you report this as a bug?

chris

On Mon, 2010-03-01 at 13:10 +0800, forrest wrote:
> Hi all,
>
> When I use threads to get Genbank format file, show some error. It is
> shown as:
>
> "Can't call method "get_taxon" on unblessed reference at
> /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."
>
> =========================================
> #!/usr/bin/perl -w
> use strict;
> use Bio::SeqIO;
> use Bio::Seq;
> use Bio::DB::GenBank;
> use threads;
>
>
> my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");
>
>
> my $seq_out = Bio::SeqIO->new(-format => "genbank",
>                               -file => ">dhn_all.gb");
> my @seq;
>
> my $number = @id;
>
> my $max_threads = 6;
>
> for (my $thread_number=0;$thread_number<$number;){
>     my %threads_seq_hash;
>
>     if ($number - $thread_number > $max_threads){
>         for (my $thread=0;$thread<$max_threads;){
>             $threads_seq_hash{$thread} = threads->new(sub {
>                 my $gb = Bio::DB::GenBank->new;
>                 my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
>             });
>             $thread_number++;
>             $thread++;
>
>         }
>     }else{
>         my $else_number = $number % $max_threads;
>         for (my $thread=0;$thread<$else_number;){
>             $threads_seq_hash{$thread} = threads->new(sub {
>                 my $gb = Bio::DB::GenBank->new;
>                 my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
>             });
>             $thread_number++;
>             $thread++;
>
>         }
>
>
>     }
>
>     foreach my $thread (sort keys %threads_seq_hash){
>         my ($seq) = $threads_seq_hash{$thread}->join;
>         push (@seq,$seq);
>     }
> }
>
> foreach (@seq){
>     $seq_out->write_seq($_);
> }
> =========================================
>
>
> How can I fix this error?
> Thanks.
>
>
> Zhang Tao
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From paolo.pavan at gmail.com  Mon Mar  1 18:07:33 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 00:07:33 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com>
    <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
Message-ID: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>

Dear all,
Sorry for pushing up my post but, please does anyone have an hint for me?
Maybe have I to send attached the report to the mailing list? I don't
know attachment policies of the list, if it is allowed and is needed I
can do that.

Thank you,
Paolo

2010/2/26 Paolo Pavan :
> Sorry,
> Maybe I forgot to add this is the megablast -m 5 output.
>
> Thank you again,
> Paolo
>
> 2010/2/26 Paolo Pavan :
>> Hi all,
>> I have just a brief question: I've got some megablast reports such the
>> one I've pasted below.
>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>> entire alignment represented as a Bio::SimpleAlign object or
>> Bio::Align::AlignI implementing one?
>>
>> Thank you all,
>> Paolo
>>
>>
>> MEGABLAST 2.2.16 [Mar-25-2007]
>>
>>
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>> "A greedy algorithm for aligning DNA sequences",
>> J Comput Biol 2000; 7(1-2):203-14.
>>
>> Database: 00038-00053.fasta
>>            2 sequences; 2001 total letters
>>
>> Searching..................................................done
>>
>> Query= 00038-00053
>>          (802 letters)
>>
>>
>>
>>                                                                  Score    E
>> Sequences producing significant alignments:                      (bits) Value
>>
>> ______00038                                                        226   1e-62
>> ______00053                                                        115   3e-29
>>
>> 1_0         472  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>> ______00038 883  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         532  aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>> ______00038 943  aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         591  attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         651  gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         711  acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>
>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>> ______00038 1000 -------------------------------- 1001
>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>   Database: 00038-00053.fasta
>>     Posted date:  Feb 25, 2010  4:47 PM
>>   Number of letters in database: 2001
>>   Number of sequences in database:  2
>>
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>> Gapped
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>>
>> Matrix: blastn matrix:1 -3
>> Gap Penalties: Existence: 0, Extension: 0
>> Number of Sequences: 2
>> Number of Hits to DB: 17
>> Number of extensions: 3
>> Number of successful extensions: 3
>> Number of sequences better than 10.0: 2
>> Number of HSP's gapped: 2
>> Number of HSP's successfully gapped: 2
>> Length of query: 802
>> Length of database: 2001
>> Length adjustment: 10
>> Effective length of query: 792
>> Effective length of database: 1981
>> Effective search space:  1568952
>> Effective search space used:  1568952
>> X1: 9 (17.8 bits)
>> X2: 20 (39.6 bits)
>> X3: 51 (101.1 bits)
>> S1: 9 (18.3 bits)
>> S2: 9 (18.3 bits)
>>
>

From cjfields at illinois.edu  Mon Mar  1 19:30:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Mar 2010 18:30:43 -0600
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com>
    <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
    <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID:

Paolo,

You can get a Bio::SimpleAlign from the HSP object. The first code
example in this section in the HOWTO demonstrates this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris

On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:

> Dear all,
> Sorry for pushing up my post but, please does anyone have an hint for me?
> Maybe have I to send attached the report to the mailing list? I don't
> know attachment policies of the list, if it is allowed and is needed I
> can do that.
>
> Thank you,
> Paolo
>
> 2010/2/26 Paolo Pavan :
>> Sorry,
>> Maybe I forgot to add this is the megablast -m 5 output.
>>
>> Thank you again,
>> Paolo
>>
>> 2010/2/26 Paolo Pavan :
>>> Hi all,
>>> I have just a brief question: I've got some megablast reports such the
>>> one I've pasted below.
>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>> entire alignment represented as a Bio::SimpleAlign object or
>>> Bio::Align::AlignI implementing one?
>>>
>>> Thank you all,
>>> Paolo
>>>
>>>
>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>
>>>
>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>>> "A greedy algorithm for aligning DNA sequences",
>>> J Comput Biol 2000; 7(1-2):203-14.
>>>
>>> Database: 00038-00053.fasta
>>>            2 sequences; 2001 total letters
>>>
>>> Searching..................................................done
>>>
>>> Query= 00038-00053
>>>          (802 letters)
>>>
>>>
>>>
>>>                                                                  Score    E
>>> Sequences producing significant alignments:                      (bits) Value
>>>
>>> ______00038                                                        226   1e-62
>>> ______00053                                                        115   3e-29
>>>
>>> 1_0         472  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>> ______00038 883  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>> ______00053      ------------------------------------------------------------
>>>
>>> 1_0         532  aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>> ______00038 943  aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>> ______00053      ------------------------------------------------------------
>>>
>>> 1_0         591  attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>> ______00038 1000 ------------------------------------------------------------ 1001
>>> ______00053      ------------------------------------------------------------
>>>
>>> 1_0         651  gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>> ______00038 1000 ------------------------------------------------------------ 1001
>>> ______00053      ------------------------------------------------------------
>>>
>>> 1_0         711  acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>> ______00038 1000 ------------------------------------------------------------ 1001
>>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>
>>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>>> ______00038 1000 -------------------------------- 1001
>>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>>   Database: 00038-00053.fasta
>>>     Posted date:  Feb 25, 2010  4:47 PM
>>>   Number of letters in database: 2001
>>>   Number of sequences in database:  2
>>>
>>> Lambda     K      H
>>>     1.37    0.711     1.31
>>>
>>> Gapped
>>> Lambda     K      H
>>>     1.37    0.711     1.31
>>>
>>>
>>> Matrix: blastn matrix:1 -3
>>> Gap Penalties: Existence: 0, Extension: 0
>>> Number of Sequences: 2
>>> Number of Hits to DB: 17
>>> Number of extensions: 3
>>> Number of successful extensions: 3
>>> Number of sequences better than 10.0: 2
>>> Number of HSP's gapped: 2
>>> Number of HSP's successfully gapped: 2
>>> Length of query: 802
>>> Length of database: 2001
>>> Length adjustment: 10
>>> Effective length of query: 792
>>> Effective length of database: 1981
>>> Effective search space:  1568952
>>> Effective search space used:  1568952
>>> X1: 9 (17.8 bits)
>>> X2: 20 (39.6 bits)
>>> X3: 51 (101.1 bits)
>>> S1: 9 (18.3 bits)
>>> S2: 9 (18.3 bits)
>>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org  Mon Mar  1 20:51:02 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 01 Mar 2010 17:51:02 -0800
Subject: [Bioperl-l] Any module for chromosome region analysis ?
In-Reply-To:
References: <1267131590.4355.2.camel@epistle>
    <1267131697.4355.3.camel@epistle>
Message-ID: <4B8C6F06.5050905@bioperl.org>

Like the ensembl perl API?

Robert Bradbury wrote:
> I'm not sure if the species being dealt with are "common", but it would seem
> to me that a logical addition to bioperl would be an extension that took a
> genome location (or locations) and interfaced one into a browser of those
> regions in external databases (e.g. UCSC Genome Browser, Ensembl, etc.).
> The only cases where that wouldn't work is if one is dealing with novel
> species that aren't in the databases yet.
>
> Robert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From rmb32 at cornell.edu  Tue Mar  2 01:21:31 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 01 Mar 2010 22:21:31 -0800
Subject: [Bioperl-l] call for project ideas - Google Summer of Code
Message-ID: <4B8CAE6B.4010807@cornell.edu>

Hi all,

Google's Summer of Code is coming round again, very soon now (mentoring
organization applications are due next week). We need project ideas for
prospective Summer of Code interns. There's a page on the BioPerl wiki,
please have a look and add your ideas for intern projects.

For more on Google Summer of Code, what it is and how it works, see
their FAQ at
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs

One of the summer intern ideas I have on the page so far is to help with
the tough grunt work of breaking BioPerl into smaller, more easily
managed distributions. I'm sure you all can think of plenty more!

Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From paolo.pavan at gmail.com  Tue Mar  2 09:37:59 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 15:37:59 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To:
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com>
    <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
    <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>

Hi Chris,
Thank you for your reply. So I have to understand that since the
get_aln method returns the HSP alignment, there is no way to retrieve
the whole alignment as in the example pasted, isn't it?
Basically I'm trying to use megablast as kind of multiple local
alignment engine and actually I'm not pretty sure this is a good idea
but in my particular case could be suitable. I mean that the example
below reports only the portions of the sequences that align, losing the
portions that do not; I'm not sure I gave the idea.
What do you think about? Can you give me your opinion? If there isn't
any module written yet, I can try to write a parser, it could be of any
interest?

Thank you,
Paolo

2010/3/2 Chris Fields :
> Paolo,
>
> You can get a Bio::SimpleAlign from the HSP object. The first code example in this section in the HOWTO demonstrates this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
> chris
>
> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>
>> Dear all,
>> Sorry for pushing up my post but, please does anyone have an hint for me?
>> Maybe have I to send attached the report to the mailing list? I don't
>> know attachment policies of the list, if it is allowed and is needed I
>> can do that.
>>
>> Thank you,
>> Paolo
>>
>> 2010/2/26 Paolo Pavan :
>>> Sorry,
>>> Maybe I forgot to add this is the megablast -m 5 output.
>>>
>>> Thank you again,
>>> Paolo
>>>
>>> 2010/2/26 Paolo Pavan :
>>>> Hi all,
>>>> I have just a brief question: I've got some megablast reports such the
>>>> one I've pasted below.
>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>>> entire alignment represented as a Bio::SimpleAlign object or
>>>> Bio::Align::AlignI implementing one?
>>>>
>>>> Thank you all,
>>>> Paolo
>>>>
>>>>
>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>
>>>>
>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>>>> "A greedy algorithm for aligning DNA sequences",
>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>
>>>> Database: 00038-00053.fasta
>>>>            2 sequences; 2001 total letters
>>>>
>>>> Searching..................................................done
>>>>
>>>> Query= 00038-00053
>>>>          (802 letters)
>>>>
>>>>
>>>>
>>>>                                                                  Score    E
>>>> Sequences producing significant alignments:                      (bits) Value
>>>>
>>>> ______00038                                                        226   1e-62
>>>> ______00053                                                        115   3e-29
>>>>
>>>> 1_0         472  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>> ______00038 883  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         532  aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>> ______00038 943  aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         591  attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         651  gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         711  acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>
>>>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>>>> ______00038 1000 -------------------------------- 1001
>>>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>>>   Database: 00038-00053.fasta
>>>>     Posted date:  Feb 25, 2010  4:47 PM
>>>>   Number of letters in database: 2001
>>>>   Number of sequences in database:  2
>>>>
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>> Gapped
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>>
>>>> Matrix: blastn matrix:1 -3
>>>> Gap Penalties: Existence: 0, Extension: 0
>>>> Number of Sequences: 2
>>>> Number of Hits to DB: 17
>>>> Number of extensions: 3
>>>> Number of successful extensions: 3
>>>> Number of sequences better than 10.0: 2
>>>> Number of HSP's gapped: 2
>>>> Number of HSP's successfully gapped: 2
>>>> Length of query: 802
>>>> Length of database: 2001
>>>> Length adjustment: 10
>>>> Effective length of query: 792
>>>> Effective length of database: 1981
>>>> Effective search space:  1568952
>>>> Effective search space used:  1568952
>>>> X1: 9 (17.8 bits)
>>>> X2: 20 (39.6 bits)
>>>> X3: 51 (101.1 bits)
>>>> S1: 9 (18.3 bits)
>>>> S2: 9 (18.3 bits)
>>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Zhang_tao at uestc.edu.cn  Mon Mar  1 00:02:12 2010
From: Zhang_tao at uestc.edu.cn (Zhang_tao)
Date: Mon, 01 Mar 2010 13:02:12 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <467416916.06375@eyou.net>

Hi all,

When I use threads to get Genbank format file, show some error. It is
shown as:

"Can't call method "get_taxon" on unblessed reference at
/opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;

my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");

my $seq_out = Bio::SeqIO->new(-format => "genbank",
                              -file => ">dhn_all.gb");
my @seq;
my $number = @id;
my $max_threads = 6;

for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;
    if ($number - $thread_number > $max_threads){
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }else{
        my $else_number = $number % $max_threads;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }
    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}

foreach (@seq){
    $seq_out->write_seq($_);
}

How can I fix this error?
Thanks.

Zhang Tao

From lpritc at scri.ac.uk  Mon Mar  1 06:32:10 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 01 Mar 2010 11:32:10 +0000
Subject: [Bioperl-l] Loading NCBI/GenBank bacteria into CHADO:
    Chromosome/Plasmid gene name conflicts
Message-ID:

Hi,

I've tried going back through the mailing list, Googling the answer, and
reading the documentation and wiki to find a solution for this. I've
either missed it, or it's not there yet. Hopefully there's a simple
solution, or an option that I'm just not seeing. I'm sure other people
must be using CHADO for bacterial genomes, and I would be interested in
hearing about best practice for using CHADO/GBROWSE with these sequences
(I've seen http://gmod.org/wiki/Chado_for_prokaryotes - but there's not
much in there...).

I have a working CHADO(GMOD-1.0)/GBROWSE2/BioPerl 1.6.1 setup on CentOS
5.4, and I'm trying to load some bacterial data. Specifically for this
example, I'm trying to get the GenBank sequences for E.coli S88:
NC_011742 and NC_011747 into CHADO.

I've been following instructions from a number of locations, including
http://gmod.org/wiki/Artemis-Chado_Integration_Tutorial and
http://gmod.org/wiki/Chado_Tutorial, but there's an issue with these two
files, in that the NC_011742 (chromosome) and NC_011747 (plasmid)
sequences contain genes that have the same names (and several genes with
the same name in the same sequence!), and this appears to be a problem.
Here's what's going wrong:

I start off with the two GenBank files:

"""
[lpritc at localhost ~]$ ls -1 *.gbk
NC_011742.gbk
NC_011747.gbk
"""

And convert these to .gff3 using the BioPerl script (it doesn't seem to
matter whether I pass them with the wildcard, or convert separately,
though passing multiple sequences for conversion might be a good place
to check for unique IDs):

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s *.gbk
# Input: NC_011742.gbk
# working on region:NC_011742, Escherichia coli S88, 19-DEC-2008, Escherichia coli S88, complete genome.
# GFF3 saved to ./NC_011742.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# mRNA                  4696
# gene                  4898
# region                1
# pseudogene            151
# CDS                   4696
# RESIDUES(tr)          1442813
# RESIDUES              5032268
# processed_transcript  89
# rRNA                  22
# pseudogenic_region    151
# exon                  4899
# tRNA                  91
#
# Input: NC_011747.gbk
# working on region:NC_011747, Escherichia coli S88, 18-AUG-2009, Escherichia coli S88 plasmid pECOS88, complete sequence.
# GFF3 saved to ./NC_011747.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# mRNA                  4832
# gene                  5037
# region                2
# pseudogene            159
# CDS                   4832
# RESIDUES(tr)          1477756
# RESIDUES              5166121
# processed_transcript  92
# rRNA                  22
# pseudogenic_region    159
# exon                  5038
# tRNA                  91
#
"""

I can then use the gmod_bulk_load_gff3.pl script to load either file,
but only singly.
This appears to work, and the result is visible and seemingly correctly
navigable in GBROWSE (using NC_011747 as the first sequence here, but
the order is unimportant):

"""
[lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011747.gbk.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Dropping cds temp tables...
Creating cds temp tables...
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler"
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
Loading data into feature table ...
Loading data into featureloc table ...
Loading data into feature_relationship table ...
Loading data into featureprop table ...
Skipping feature_cvterm table since the load file is empty...
Skipping synonym table since the load file is empty...
Skipping feature_synonym table since the load file is empty...
Skipping dbxref table since the load file is empty...
Loading data into feature_dbxref table ...
Skipping analysisfeature table since the load file is empty...
Skipping cvterm table since the load file is empty...
Skipping db table since the load file is empty...
Skipping cv table since the load file is empty...
Skipping analysis table since the load file is empty...
Skipping organism table since the load file is empty...
Adding cvtermprop=MapReferenceType for 'region' ...
Loading sequences (if any) ...
Optimizing database (this may take a while) ...
(feature featureloc feature_relationship featureprop feature_cvterm synonym feature_synonym dbxref feature_dbxref analysisfeature cvterm db cv analysis organism ) Done.

While this script has made an effort to optimize the database, you
should probably also run VACUUM FULL ANALYZE on the database as well
"""

"""
chado=> SELECT feature_id, organism_id, name, uniquename FROM feature WHERE name='NC_011747';
 feature_id | organism_id |   name    | uniquename
------------+-------------+-----------+------------
     146917 |          99 | NC_011747 | NC_011747
"""

However, attempting to load in the second sequence throws an error
(though this might also be a good point to check for ID uniqueness with
a database check, and appropriate modification to the ID, if necessary -
problems could arise if we were trying to add genuine duplicates,
though...):

"""
[lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011742.gbk.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Dropping cds temp tables...
Creating cds temp tables...
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler"
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
no parent yacC; you probably need to rerun the loader with the --recreate_cache option
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=chado;port=5432;host=localhost.
"""

This, of course, prevents the upload of the sequence and its
annotations, as a whole. The script recommends that the
--recreate_cache option should be used, but I am already using it. If
the same process is run, reversing the order of the input files, the
same error is reported, but for the gene with name 'int'. Both
sequences contain genes with the names 'int' and 'yacC' (NC_011742
appears to contain four genes with the name 'int'):

"""
[lpritc at localhost ~]$ grep 'ID=yacC;' *.gbk.gff
NC_011742.gbk.gff:NC_011742 GenBank gene 142755 143273 . - . ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131
NC_011747.gbk.gff:NC_011747 GenBank gene 85083 85931 . + . ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103
[lpritc at localhost ~]$ grep 'ID=int;' *.gbk.gff
NC_011742.gbk.gff:NC_011742 GenBank gene 1182443 1183585 . - . ID=int;Dbxref=GeneID:7131611;gene=int;locus_tag=ECS88_1152
NC_011742.gbk.gff:NC_011742 GenBank pseudogene 1998684 1999646 . + . ID=int;Dbxref=GeneID:7128964;gene=int;locus_tag=ECS88_2031;pseudo=_no_value
NC_011742.gbk.gff:NC_011742 GenBank gene 2829972 2830991 . + . ID=int;Dbxref=GeneID:7131911;gene=int;locus_tag=ECS88_2851
NC_011742.gbk.gff:NC_011742 GenBank gene 3220074 3221336 . + . ID=int;Dbxref=GeneID:7129893;gene=int;locus_tag=ECS88_3250
NC_011747.gbk.gff:NC_011747 GenBank gene 132 872 . + . ID=int;Dbxref=GeneID:7119360;gene=int;locus_tag=pECS88_0001
"""

Commenting out either of these genes, and their child features, defers
the error to another gene that has the same name in both sequences in
each case.

It seems that the problem might derive from attempting to uniquely
associate each gene uniquely with its 'gene' tag in the GenBank file
and, as there are several points in the process where it would be
sensible to check for name collisions, so that the feature:uniquename
column can be modified to reflect this, I looked for command-line
options to each script, but didn't see one that could help.

Examining the manual for gmod_bulk_load_gff3.pl suggests that this might
be the problem (though I might be misunderstanding it):

"""
Column 9 (group)
Here is where the magic happens.

Assigning feature.name, feature.uniquename
The values of feature.name and feature.uniquename are assigned according
to these simple rules:

If there is an ID tag, that is used as feature.uniquename otherwise, it
is assigned a uniquename that is equal to "auto" concatenated with the
feature_id. (Note that this is a potential problem as there is no check
to make sure that it is appropriately unique.)

If there is a Name tag, it's value is set to feature.name; otherwise it
is null.

Note that these rules are much more simple than that those that
Bio::DB::GFF uses, and may need to be revisited.
"""

I suspect that, as the bp_genbank2gff3.pl script converts gene names
(which are not guaranteed to be unique) to ID tags, the problem
recognised in the manual is cropping up at this point.

Luckily, the GenBank files come with locus_tag tags, which should be
unique for each gene (see
http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html#locus_tag).
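
One way to see how widespread these collisions are, before changing
either script, might be to count the ID values on gene-level features
across the converted files. What follows is only a minimal sketch in
shell and awk, not something taken from BioPerl or GMOD: the function
name dup_gene_ids is made up for illustration, and it assumes the .gff
files are tab-separated with the attributes in column 9, as GFF3
requires.

```shell
#!/bin/sh
# Hypothetical helper (not part of BioPerl or GMOD): print each ID= value
# that appears on more than one gene or pseudogene line in the given GFF3
# files, followed by a tab and the number of lines that use it.
dup_gene_ids() {
    awk -F '\t' '
        # Ignore comments and directives such as ##gff-version.
        /^#/ { next }
        # Only gene-level features; child features (mRNA, CDS, exon)
        # are deliberately left out of the count.
        $3 == "gene" || $3 == "pseudogene" {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/) {
                    count[substr(attrs[i], 4)]++
                }
            }
        }
        END {
            for (id in count) {
                if (count[id] > 1) {
                    printf "%s\t%d\n", id, count[id]
                }
            }
        }
    ' "$@" | sort
}
```

Called as `dup_gene_ids NC_011742.gbk.gff NC_011747.gbk.gff`, this
should list at least int and yacC, going by the grep output above. An
empty result would suggest the gene-level IDs are already unique across
the files passed in.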
For bacteria, at least, using the locus_tag values might be a more robust
option for bp_genbank2gff3.pl; this already appears to have been recognised
in the script comments:

"""
            #?? should gene_name from /locus_tag,/gene,/product,/transposon=xxx
            # be converted to or added as Name=xxx (if not ID= or as well)
            ## problematic: convert_to_name ($feature); # drops /locus_tag,/gene, tags
"""

I can get round the upload problem somewhat suckily by changing the priority
given to 'locus_tag' and 'gene' tags for generating the .gff ID tag in the
bp_genbank2gff3.pl script:

"""
[lpritc at localhost ~]$ diff bp_genbank2gff3.pl /usr/bin/bp_genbank2gff3.pl
976,977c976,977
<     if ($g->has_tag('locus_tag')) {
<         ($gene_id) = $g->get_tag_values('locus_tag');
---
>     if ($g->has_tag('gene')) {
>         ($gene_id) = $g->get_tag_values('gene');
979,980c979,980
<     elsif ($g->has_tag('gene')) {
<         ($gene_id) = $g->get_tag_values('gene');
---
>     elsif ($g->has_tag('locus_tag')) {
>         ($gene_id) = $g->get_tag_values('locus_tag');
"""

But this isn't a complete solution, as GBROWSE searches by gene name don't
work after making this change, and presumably some further configuration or
hacking about is required to sort that out (advice welcome).

So, what are other people doing to overcome this issue (if you've seen it),
and would a change to the bp_genbank2gff3.pl script along the lines I
mention be useful to others?

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.
DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views
expressed by the sender are not necessarily the views of SCRI and its
subsidiaries. This email and any files transmitted with it are confidential
to the intended recipient at the e-mail address to which it has been
addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this
confidentiality and you must not use, disclose, copy, print or rely on this
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name
of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are
present in this email, neither the Institute nor the sender accepts any
responsibility for any viruses, and it is your responsibility to scan the
email and the attachments (if any).
______________________________________________________

From janine.arloth at googlemail.com Mon Mar 1 11:25:09 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Mon, 1 Mar 2010 17:25:09 +0100
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>

Hello,

I am running blast+ and want to create a blastdb depending on a checkbox.
That is, when mydb is too old I want to rebuild the blastdb files and
create a ''new'' db. When the latest version of my files is OK, blast
should run with the existing db.

With the code below a new db is never built: the db is created once, and
after that it is never re-created.

    if ($checkbox eq 'yes') {
        $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
            -prog_dir => "/usr/local/ncbi/blast/bin",
            -db_name  => 'mydb',
            -db_data  => 'xxx.fa',
            -create   => 1);
    }
    else {
        $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
            -db_name => 'mydb');
    }

Thanks for helping

From jensen at fortinbras.us Mon Mar 1 22:58:09 2010
From: jensen at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Mar 2010 22:58:09 -0500
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
Message-ID: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>

Hi Janine--
You'll need to get the latest version of
Bio/Tools/Run/StandAloneBlastPlus.pm (rev. 16878). Then the -overwrite
parameter will actually work, and you can write

    if ($checkbox eq 'yes') {
        $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
            -prog_dir  => "/usr/local/ncbi/blast/bin",
            -db_name   => 'mydb',
            -db_data   => 'xxx.fa',
            -overwrite => 1);
    }
    else {
        $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
            -db_name => 'mydb');
    }

MAJ

From szy0931 at gmail.com Tue Mar 2 01:08:10 2010
From: szy0931 at gmail.com (Zhenyu Shen)
Date: Mon, 1 Mar 2010 22:08:10 -0800 (PST)
Subject: [Bioperl-l] how to convert a txt file to a bed file?
Message-ID:

I want to convert a txt file to a bed file and then load the bed file
into the UCSC genome browser. But how do I convert the txt file to a bed
file with perl?
thanks

From joaofadista at gmail.com Tue Mar 2 04:10:03 2010
From: joaofadista at gmail.com (fadista)
Date: Tue, 2 Mar 2010 01:10:03 -0800 (PST)
Subject: [Bioperl-l] Next-gen modules
Message-ID:

Hi,

I would like to know whether there are any next-gen sequencing modules in
BioPerl. Specifically, I would like to know if there is a perl script to
trim poor-quality sequence reads from the Illumina/Solexa platform.

Best regards,
Fadista

From maj at fortinbras.us Tue Mar 2 09:51:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 09:51:12 -0500
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com>
 <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
 <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
 <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
Message-ID: <18C0182252934619AD12E49243BE3C14@NewLife>

This might be a good method to have for Bio::Search::Tiling--
you want to stitch together all the hsps and have the
concatenated alignment returned as a Bio::SimpleAlign,
correct? Tiling would create the right set of hsps from
which to generate the composite alignment. I can
try to get something working, but it may take a while-
MAJ

----- Original Message -----
From: "Paolo Pavan"
To: "Chris Fields"
Cc:
Sent: Tuesday, March 02, 2010 9:37 AM
Subject: Re: [Bioperl-l] Alignment from blast report

Hi Chris,
Thank you for your reply. So I take it that, since the get_aln method
returns the HSP alignment, there is no way to retrieve the whole
alignment as in the example pasted, is there?
Basically I'm trying to use megablast as a kind of multiple local
alignment engine. I'm not really sure this is a good idea, but in my
particular case it could be suitable. I mean that the example below
reports only the portions of the sequences that align, losing the
portions that do not; I'm not sure I got the idea across. What do you
think? Can you give me your opinion?
If no such module has been written yet, I can try to write a parser;
would that be of any interest?

Thank you,
Paolo

2010/3/2 Chris Fields :
> Paolo,
>
> You can get a Bio::SimpleAlign from the HSP object. The first code example in
> this section in the HOWTO demonstrates this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
> chris
>
> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>
>> Dear all,
>> Sorry for pushing up my post but, please, does anyone have a hint for me?
>> Maybe I have to send the report attached to the mailing list? I don't
>> know the attachment policies of the list; if it is allowed and needed I
>> can do that.
>>
>> Thank you,
>> Paolo
>>
>> 2010/2/26 Paolo Pavan :
>>> Sorry,
>>> Maybe I forgot to add that this is the megablast -m 5 output.
>>>
>>> Thank you again,
>>> Paolo
>>>
>>> 2010/2/26 Paolo Pavan :
>>>> Hi all,
>>>> I have just a brief question: I've got some megablast reports such as the
>>>> one I've pasted below.
>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>>> entire alignment represented as a Bio::SimpleAlign object or
>>>> Bio::Align::AlignI implementing one?
>>>>
>>>> Thank you all,
>>>> Paolo
>>>>
>>>>
>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>
>>>>
>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller
>>>> (2000),
>>>> "A greedy algorithm for aligning DNA sequences",
>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>
>>>> Database: 00038-00053.fasta
>>>>            2 sequences; 2001 total letters
>>>>
>>>> Searching..................................................done
>>>>
>>>> Query= 00038-00053
>>>>          (802 letters)
>>>>
>>>>
>>>>
>>>>                                                          Score    E
>>>> Sequences producing significant alignments:              (bits) Value
>>>>
>>>> ______00038                                                226   1e-62
>>>> ______00053                                                115   3e-29
>>>>
>>>> 1_0          472 ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>> ______00038  883 ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0          532 aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>> ______00038  943 aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0          591 attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0          651 gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0          711 acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053    1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>
>>>> 1_0          771 ccttaagtatttactttaacttgtcttgatca 802
>>>> ______00038 1000 -------------------------------- 1001
>>>> ______00053   35 ccttaagtatt-actttaacttgtcttgatca 65
>>>>   Database: 00038-00053.fasta
>>>>     Posted date:  Feb 25, 2010  4:47 PM
>>>>   Number of letters in database: 2001
>>>>   Number of sequences in database:  2
>>>>
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>> Gapped
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>>
>>>> Matrix: blastn matrix:1 -3
>>>> Gap Penalties: Existence: 0, Extension: 0
>>>> Number of Sequences: 2
>>>> Number of Hits to DB: 17
>>>> Number of extensions: 3
>>>> Number of successful extensions: 3
>>>> Number of sequences better than 10.0: 2
>>>> Number of HSP's gapped: 2
>>>> Number of HSP's successfully gapped: 2
>>>> Length of query: 802
>>>> Length of database: 2001
>>>> Length adjustment: 10
>>>> Effective length of query: 792
>>>> Effective length of database: 1981
>>>> Effective search space: 1568952
>>>> Effective search space used: 1568952
>>>> X1: 9 (17.8 bits)
>>>> X2: 20 (39.6 bits)
>>>> X3: 51 (101.1 bits)
>>>> S1: 9 (18.3 bits)
>>>> S2: 9 (18.3 bits)
>>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Tue Mar 2 10:12:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 10:12:02 -0500
Subject: [Bioperl-l] Installing bioperl on windows
In-Reply-To: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com>
References: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com>
Message-ID:

The steps on the wiki are in fact quite detailed. What we need then is
details from you--the commands you ran and your error messages. Thanks.
----- Original Message -----
From: "disha"
To:
Sent: Friday, February 26, 2010 8:43 AM
Subject: [Bioperl-l] Installing bioperl on windows

> Please tell me the procedure (detailed) for installing bioperl on
> windows vista. I tried the steps mentioned on the site but failed at
> the initial steps

From scott at scottcain.net Tue Mar 2 11:11:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 2 Mar 2010 11:11:13 -0500
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To:
References:
Message-ID: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>

Hi Leighton,

Wow, that is a lot of text; I really appreciate your thoroughness in
describing the problem. I have a few suggestions to get the ball
rolling.

First, I am working on the 1.1 release of gmod/chado, and it may fix
some of the problems you are describing. Certainly, ID collisions
between GFF files should not be a problem (I didn't think they were in
the 1.0 release, but that was a long time ago). Please try a checkout
of the schema trunk in the gmod svn:

http://gmod.org/wiki/SVN

Another thing you may want to look at is that just last week, a
developer at Texas A&M, Nathan Liles, contributed code to the
bioperl-live trunk for the genbank2gff3.pl script that will do a much
better job of converting bacterial genbank files to GFF3; perhaps that
will help too. Working with a svn checkout of bioperl-live shouldn't
be too scary either; the pieces you are interested in (that work with
Chado and GBrowse) are quite stable.

Let us know how it goes,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From sdavis2 at mail.nih.gov Tue Mar 2 11:33:38 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 2 Mar 2010 11:33:38 -0500
Subject: [Bioperl-l] how to convert a txt file to a bed file?
In-Reply-To:
References:
Message-ID: <264855a01003020833v3e15dcb7vcdd876ce80468740@mail.gmail.com>

On Tue, Mar 2, 2010 at 1:08 AM, Zhenyu Shen wrote:
> I want to convert a txt file to a bed file and then load the bed file
> to USCS genome browser. But how to convert the txt file to a bed file
> with perl?

Hi, Zhenyu. A bed file IS a text file, with the format described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED

You just need to make your text file conform to that format and you
are set to go.

Sean

From paolo.pavan at gmail.com Tue Mar 2 10:17:35 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 16:17:35 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <18C0182252934619AD12E49243BE3C14@NewLife>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com>
 <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
 <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
 <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
 <18C0182252934619AD12E49243BE3C14@NewLife>
Message-ID: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>

I think you got the idea, thank you.
Of course, hsps from different hits would appear as different aligned
elements. I've attached the pasted example (as unix text) because it is
more readable, hoping it will not be held by the mailing server :-)

Thank you,
Paolo

2010/3/2 Mark A. Jensen :
> This might be a good method to have for Bio::Search::Tiling--
> you want to stitch together all the hsps and have the
> concatenated alignment returned as a Bio::SimpleAlign,
> correct? Tiling would create the right set of hsps from
> which to generate the composite alignment. I can
> try to get something working, but it may take a while-
> MAJ
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at
lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.megaout
Type: application/octet-stream
Size: 2918 bytes
Desc: not available
URL:

From Russell.Smithies at agresearch.co.nz Tue Mar 2 14:35:19 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 08:35:19 +1300
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com> <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E660@exchsth.agresearch.co.nz>

If you want to continue using your current version, you could try deleting your old blast db files first.

if($checkbox eq 'yes'){
    # unlink does not expand wildcards by itself, so glob the pattern first
    unlink glob "mydb.*";  #or maybe `rm -f mydb.*`
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name  => 'mydb',
        -db_data  => 'xxx.fa',
        -create   => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Tuesday, 2 March 2010 4:58 p.m.
> To: Janine Arloth
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
> Hi Janine--
> You'll need to get the latest version of
> Bio/Tools/Run/StandAloneBlastPlus.pm
> (rev. 16878).
> Then the -overwrite parameter will actually work, and you can write > > if($checkbox eq 'yes'){ > > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -prog_dir => "/usr/local/ncbi/blast/bin", > -db_name => 'mydb', > -db_data => 'xxx.fa', > -overwrite => 1); > } > else{ > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'mydb'); > } > > MAJ > > ----- Original Message ----- > From: "Janine Arloth" > To: > Cc: > Sent: Monday, March 01, 2010 11:25 AM > Subject: StandAloneBlastPlus > > > Hello, > > I am running blast+ and want to create blastdb, depending on a checkbox. > That > means when mydb is to old then I want to rebuilt the blastdb files and > create a > ''new'' db. > When the latest versions of my files is ok, then blast should ran with > the > existing db. > Using this code, there I will never built a new db. It is creating and > than it > does not create a new one. > > > if($checkbox eq 'yes'){ > > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -prog_dir => "/usr/local/ncbi/blast/bin", > -db_name => 'mydb', > -db_data => 'xxx.fa', > -create => 1); > } > else{ > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'mydb'); > } > > Thanks for helping > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From armendarez77 at hotmail.com Tue Mar 2 16:06:17 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Tue, 2 Mar 2010 13:06:17 -0800
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
Message-ID:

Hello,

I am writing a script to remotely access annotation files and parse information using Bio::DB::RefSeq and Bio::DB::GenBank. I was testing it with random RefSeq accession numbers (NC_######) when something odd happened. When I used the accession number 'NC_007092', the script seemed to freeze. After some time, 'Out of Memory' was printed to the terminal.

When I investigated the annotation file associated with NC_007092, a MapViewer page opened. It turns out that NC_007092 is a genome shotgun sequence, but it does not start with 'NZ' as I thought all shotgun sequences did.

Is this a random event that I don't have to worry much about, or is there a way to pre-screen accession numbers to ensure they are associated with complete genome RefSeq files?

I've included my script in case there is something I missed that could have prevented this.
Thank you,

Veronica

_________________

use strict;
use Bio::Perl;
use Bio::DB::RefSeq;    # loaded explicitly instead of relying on Bio::Perl
use Bio::DB::GenBank;
use Getopt::Long;
use IO::Handle;

my $accessionNumber;

GetOptions("accessionNumber=s"=>\$accessionNumber);
unless($accessionNumber){
    print<<"OPTIONS";
options for $0
    accessionNumber    -a    accession number
OPTIONS
    die;
}

my $description = annotation_info($accessionNumber);

print "$description\n";

sub annotation_info{

    my $seqObj;

    my $accNum = shift(@_);

    my $rs = Bio::DB::RefSeq->new();
    my $gb = Bio::DB::GenBank->new();

    if($accNum =~ /\w\w_\d{6}/){  #RefSeq annotations include an underscore in their accession number
        $seqObj = $rs->get_Seq_by_id($accNum);
    }
    elsif($accNum !~ /_/){  #GenBank annotation
        $seqObj = $gb->get_Seq_by_id($accNum);
    }

    # neither pattern matched, or the lookup returned nothing
    die "Could not retrieve a sequence for '$accNum'\n" unless $seqObj;

    return $seqObj->desc();
}

From maj at fortinbras.us Tue Mar 2 15:58:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 15:58:59 -0500
Subject: [Bioperl-l] bioperl job
Message-ID:

Hi All,

I have a contact looking for an individual with Bioperl experience who could do contractual on-site work in the Cambridge MA area. **I have no business interest in this whatever, just doing a friend a favor.** Let me know directly (not to the list) if you have interest.

thanks -- MAJ

From Russell.Smithies at agresearch.co.nz Tue Mar 2 18:08:51 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 12:08:51 +1300
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz>

NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
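
On the pre-screening question, a minimal sketch in plain Perl that classifies an accession by its prefix before any download is attempted. The prefix table is an assumption summarised from the NCBI RefSeq accession key, and classify_accession is a hypothetical helper, not something BioPerl provides.

```perl
use strict;
use warnings;

# Assumed mapping of RefSeq prefixes to molecule types (summarised from
# the NCBI RefSeq accession key; check it against the current key).
my %refseq_prefix = (
    NC => 'complete genomic molecule',
    NZ => 'WGS or unfinished genomic',
    NG => 'genomic region',
    NT => 'genomic contig',
    NW => 'genomic contig (WGS)',
    NM => 'mRNA',
    NR => 'non-coding RNA',
    NP => 'protein',
);

# Hypothetical helper: returns a short label for an accession string.
sub classify_accession {
    my ($acc) = @_;
    # two-letter prefix, underscore, optional letters, digits, optional version
    if ($acc =~ /^([A-Z]{2})_[A-Z]*\d+(?:\.\d+)?$/) {
        return exists $refseq_prefix{$1}
            ? "RefSeq: $refseq_prefix{$1}"
            : "RefSeq: unknown prefix $1";
    }
    return 'GenBank/other' if $acc !~ /_/;
    return 'unrecognized';
}

for my $acc (qw(NC_007092 NZ_AAAA01000001 AF031249)) {
    printf "%-16s %s\n", $acc, classify_accession($acc);
}
```

A script could refuse, or ask for confirmation, when the label is a whole genomic molecule. The prefix only says what kind of record it is, not how large it is, so a size check on the record summary remains the more reliable guard.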
Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html

Also, it may help to check the docsum first to see how big the file is going to be.
(the full GenBank file for this example is only 6MB in size)

===================
use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' );

my ($id) = $factory->get_ids;

# get a summary
$factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id);
my $ds = $factory->next_DocSum;
print "ID: $id\n";
# flattened mode
while (my $item = $ds->next_Item('flattened')) {
    # not all Items have content, so need to check...
    printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
}
print "\n";


# download the full genbank file
$factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                    -db      => 'nucleotide',
                                    -id      => $id,
                                    -rettype => 'gbwithparts');
$factory->get_Response(-file => "$id.gb");

================

Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> Sent: Wednesday, 3 March 2010 10:06 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
>
>
> Hello,
>
> I am writing a script to remotely access annotation files and parse
> information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it
> with random RefSeq accession numbers (NC_######) when something odd
> happened. When I used the accession number 'NC_007092', the script seemed
> to freeze. After some time, 'Out of Memory' was printed to the terminal.
> > When I investigated the annotation file associated with NC_007092, a > MapViewer page opened. It turns out that NC_007092 is a genome shotgun > sequence, but it does not start with 'NZ' as I though all shotgun > sequences did. > > Is this a random event that I don't have to worry much about or is there a > way to pre-screen accession numbers to ensure they are associated with > complete genome RefSeq files? > > I've included my script in case there is something I missed that could > have prevented this. > > Thank you, > > Veronica > > > _________________ > > use strict; > use Bio::Perl; > use Getopt::Long; > use IO::Handle; > > my $accessionNumber; > > GetOptions("accessionNumber=s"=>\$accessionNumber); > unless($accessionNumber){ > print<<"OPTIONS"; > options for $0 > accessionNumber -a accession number > OPTIONS > die; > } > > my $description = annotation_info($accessionNumber); > > print "$description\n"; > > > > sub annotation_info{ > > my $seqObj; > > my $accNum = shift(@_); > > my $rs = Bio::DB::RefSeq->new(); > my $gb = Bio::DB::GenBank->new(); > > > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore > in their accession number > > $seqObj = $rs->get_Seq_by_id($accNum); > } > elsif($accNum !~ /_/){ #GenBank annotation > $seqObj = $gb->get_Seq_by_id($accNum); > } > > return $seqObj->desc(); > } > > > _________________________________________________________________ > Hotmail: Trusted email with Microsoft's powerful SPAM protection. > http://clk.atdmt.com/GBL/go/201469226/direct/01/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From armendarez77 at hotmail.com Tue Mar 2 18:16:03 2010 From: armendarez77 at hotmail.com (armendarez77 at hotmail.com) Date: Tue, 2 Mar 2010 15:16:03 -0800 Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz> References: , <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz> Message-ID: I see. I work mostly in the bacteria world so mammalian chromosomes shouldn't be an issue. I just randomly picked it to test my script when it came up after I did a simple search for Bacillus in the Genome database. I'll look into docSum to help prevent unexpected large files from interrupting my script. Thank you. Veronica > From: Russell.Smithies at agresearch.co.nz > To: armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org > Date: Wed, 3 Mar 2010 12:08:51 +1300 > Subject: Re: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large. > Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html > > Also, it may help to check the docsum first to see how big the file is going to be? 
> (the full Genbank file for this example is only 6MB in size) > > =================== > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' ); > > my ($id) = $factory->get_ids; > > # get a summary > $factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id); > my $ds = $factory->next_DocSum; > print "ID: $id\n"; > # flattened mode > while (my $item = $ds->next_Item('flattened')) { > # not all Items have content, so need to check... > printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content; > } > print "\n"; > > > # download the full genbank file > $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'nucleotide', > -id => $id, > -rettype => 'gbwithparts'); > $factory->get_Response(-file => "$id.gb"); > > ================ > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com > > Sent: Wednesday, 3 March 2010 10:06 a.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > > > > > Hello, > > > > I am writing a script to remotely access annotation files and parse > > information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it > > with random RefSeq accession numbers (NC_######) when something odd > > happened. When I used the accession number 'NC_007092', the script seemed > > to freeze. After some time, 'Out of Memory' was printed to the terminal. > > > > When I investigated the annotation file associated with NC_007092, a > > MapViewer page opened. 
It turns out that NC_007092 is a genome shotgun > > sequence, but it does not start with 'NZ' as I though all shotgun > > sequences did. > > > > Is this a random event that I don't have to worry much about or is there a > > way to pre-screen accession numbers to ensure they are associated with > > complete genome RefSeq files? > > > > I've included my script in case there is something I missed that could > > have prevented this. > > > > Thank you, > > > > Veronica > > > > > > _________________ > > > > use strict; > > use Bio::Perl; > > use Getopt::Long; > > use IO::Handle; > > > > my $accessionNumber; > > > > GetOptions("accessionNumber=s"=>\$accessionNumber); > > unless($accessionNumber){ > > print<<"OPTIONS"; > > options for $0 > > accessionNumber -a accession number > > OPTIONS > > die; > > } > > > > my $description = annotation_info($accessionNumber); > > > > print "$description\n"; > > > > > > > > sub annotation_info{ > > > > my $seqObj; > > > > my $accNum = shift(@_); > > > > my $rs = Bio::DB::RefSeq->new(); > > my $gb = Bio::DB::GenBank->new(); > > > > > > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore > > in their accession number > > > > $seqObj = $rs->get_Seq_by_id($accNum); > > } > > elsif($accNum !~ /_/){ #GenBank annotation > > $seqObj = $gb->get_Seq_by_id($accNum); > > } > > > > return $seqObj->desc(); > > } > > > > > > _________________________________________________________________ > > Hotmail: Trusted email with Microsoft's powerful SPAM protection. 
> > http://clk.atdmt.com/GBL/go/201469226/direct/01/ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _________________________________________________________________ Your E-mail and More On-the-Go. Get Windows Live Hotmail Free. http://clk.atdmt.com/GBL/go/201469229/direct/01/ From csaba.ortutay at uta.fi Thu Mar 4 04:57:00 2010 From: csaba.ortutay at uta.fi (Csaba Ortutay) Date: Thu, 4 Mar 2010 11:57:00 +0200 Subject: [Bioperl-l] Bio::DB::CUTG problem Message-ID: <201003041157.01013.csaba.ortutay@uta.fi> Hello, We would use Bio::DB::CUTG module to get codon usage data for a large number of genomes. We have noticed that the module cannot findcertain organisms which are otherwise in the database. It happens when the name contains some non- alphabetic characters. A few examples: Streptococcus agalactiae 2603V/R Shigella flexneri 5 str. 
8401

I have located the corresponding part in the CUTG.pm code, and I would suggest a change:

222c222
< my $nameparts = join "+", $self->sp =~ /(\w+)/g;
---
> my $nameparts = join "+", $self->sp =~ /(\S+)/g;

With this I can now access the wanted tables.

Best regards,
Csaba

--
Csaba Ortutay PhD
Docent of Bioinformatics
IMT Bioinformatics
University of Tampere
Finland

From maj at fortinbras.us Thu Mar 4 08:10:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 4 Mar 2010 08:10:06 -0500
Subject: [Bioperl-l] Bio::DB::CUTG problem
In-Reply-To: <201003041157.01013.csaba.ortutay@uta.fi>
References: <201003041157.01013.csaba.ortutay@uta.fi>
Message-ID:

Thanks, Csaba - change made and committed at r16898
MAJA
----- Original Message -----
From: "Csaba Ortutay"
To:
Sent: Thursday, March 04, 2010 4:57 AM
Subject: [Bioperl-l] Bio::DB::CUTG problem

> Hello,
>
> We would use Bio::DB::CUTG module to get codon usage data for a large number
> of genomes.
>
> We have noticed that the module cannot findcertain organisms which are
> otherwise in the database. It happens when the name contains some non-
> alphabetic characters.
>
> A few examples:
>
> Streptococcus agalactiae 2603V/R
> Shigella flexneri 5 str. 8401
>
> I have located the corresponding part in the CUTG.pm code, and I would suggest
> a change:
>
> 222c222
> < my $nameparts = join "+", $self->sp =~ /(\w+)/g;
> ---
>> my $nameparts = join "+", $self->sp =~ /(\S+)/g;
>
>
> With this I can now access the wanted tables.
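
To see why the one-character change in the diff above matters, here is a minimal sketch in plain Perl (no BioPerl needed). The sub name name_parts is hypothetical; only the two patterns and the two species names come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "+"-joined string from a species name,
# using whichever token pattern is passed in.
sub name_parts {
    my ($name, $token_re) = @_;
    return join "+", $name =~ /($token_re)/g;
}

for my $sp ('Streptococcus agalactiae 2603V/R', 'Shigella flexneri 5 str. 8401') {
    # \w+ breaks tokens at '/' and drops '.', so the strain name is mangled
    print "old: ", name_parts($sp, qr/\w+/), "\n";
    # \S+ keeps every whitespace-delimited token intact
    print "new: ", name_parts($sp, qr/\S+/), "\n";
}
```

With \w+ the strain "2603V/R" becomes "2603V+R" and "str." loses its period, which would explain why those organisms could not be found; \S+ passes the tokens through unchanged.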
> > Best regards, > Csaba > > -- > Csaba Ortutay PhD > Docent of Bioinformatics > IMT Bioinformatics > University of Tampere > Finland > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Thu Mar 4 09:40:18 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 04 Mar 2010 14:40:18 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> Message-ID: <4B8FC652.2010607@bioperl.org> Palani - This should be directed to the mailing list. -------- Original Message -------- From: PalaniKannan K Subject: Enquiry about Remoteblast.pm Date: Thu, 4 Mar 2010 10:23:45 +0530 I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing through Remoteblast.pm script available through CPAN. When i am submitting my query... it shows waiting for much time. Ex. (waiting .....................) http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html This is the reference script i am using through Remoteblast perl module. It worked upto last 02/03/2010. Now it is not working We had developed 3 applications using this module. The same error comes in 3 applications we developed. So, i confim that our script dont have problem. Kindly help me in this regard. -- With Regards, palani kannan. k From maj at fortinbras.us Thu Mar 4 09:50:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 4 Mar 2010 09:50:54 -0500
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com><56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com><56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com><56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com><18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>
Message-ID: <2FB5C317605B48269256ABFABBED2239@NewLife>

Paolo -- Ok, there's now (r16900) an *experimental* method in Bio::Search::Tiling::MapTiling called get_tiled_alns(). POD is below. Try it out and let me know--
cheers,
MAJ

=head1 TILED ALIGNMENTS

The experimental method L</get_tiled_alns> will use a tiling to concatenate tiled hsps into a series of L<Bio::SimpleAlign> objects:

  @alns = $tiling->get_tiled_alns($type, $context);

Each alignment contains two sequences with ids 'query' and 'subject', and consists of a concatenation of tiling HSPs which overlap or are directly adjacent. The alignments are returned in C<$type> sequence order. When HSPs overlap, the alignment sequence is taken from the HSP which comes first in the coverage map array.

The sequences in each alignment contain features (even though they are L<Bio::LocatableSeq> objects) which map the original query/subject coordinates to the new alignment sequence coordinates.
You can determine the original BLAST fragments this way:

  $aln = ($tiling->get_tiled_alns)[0];
  $qseq = $aln->get_seq_by_id('query');
  $hseq = $aln->get_seq_by_id('subject');
  foreach my $feat ($qseq->get_SeqFeatures) {
     $org_start = ($feat->get_tag_values('query_start'))[0];
     $org_end = ($feat->get_tag_values('query_end'))[0];
     # original fragment as represented in the tiled alignment:
     $org_fragment = $feat->seq;
  }
  foreach my $feat ($hseq->get_SeqFeatures) {
     $org_start = ($feat->get_tag_values('subject_start'))[0];
     $org_end = ($feat->get_tag_values('subject_end'))[0];
     # original fragment as represented in the tiled alignment:
     $org_fragment = $feat->seq;
  }

----- Original Message -----
From: "Paolo Pavan"
To: "Mark A. Jensen"
Cc: "Chris Fields" ;
Sent: Tuesday, March 02, 2010 10:17 AM
Subject: Re: [Bioperl-l] Alignment from blast report

>I think you got the sense, thank you. Of course hsps from different
> hits will be reflected in different elements aligned. I've attached
> the example pasted (unix text) because is more readable, hoping will
> not be held by the mailing server :-)
>
> Thank you,
> Paolo
>
> 2010/3/2 Mark A. Jensen :
>> This might a good method to have for Bio::Search::Tiling--
>> you want to stitch together all the hsps and have the
>> concatenated alignment returned as a Bio::SimpleAlign,
>> correct? Tiling would create the right set of hsps from
>> which to generate the composite alignment. I can
>> try to get something working, but it may take a while-
>> MAJ
>> ----- Original Message ----- From: "Paolo Pavan"
>> To: "Chris Fields"
>> Cc:
>> Sent: Tuesday, March 02, 2010 9:37 AM
>> Subject: Re: [Bioperl-l] Alignment from blast report
>>
>>
>> Hi Chris,
>> Thank you for your reply. So I have to understand that since the
>> get_aln method returns the HSP alignment, there is no way to retrieve
>> the whole alignment as in the example pasted, isn't it?
>> Basically I'm trying to use megablast as kind of multiple local >> alignment engine and actually I'm not pretty sure this is a good idea >> but in my particular case could be suitable. I mean that the example >> below reports only the portions of the sequences that align loosing >> the portions that does not, I'm not sure I gave the idea. What do you >> think about? Can you give me your opinion? >> If there isn't any module written yet, I can try to write a parser, it >> could be of any interest? >> >> Thank you, >> Paolo >> >> 2010/3/2 Chris Fields : >>> >>> Paolo, >>> >>> You can get a Bio::SimpleAlign from the HSP object. The first code example >>> in this section in the HOWTO demonstrates this: >>> >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> >>> chris >>> >>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >>> >>>> Dear all, >>>> Sorry for pushing up my post but, please does anyone have an hint for me? >>>> Maybe have I to send attached the report to the mailing list? I don't >>>> know attachment policies of the list, if it is allowed and is needed I >>>> can do that. >>>> >>>> Thank you, >>>> Paolo >>>> >>>> 2010/2/26 Paolo Pavan : >>>>> >>>>> Sorry, >>>>> Maybe I forgot to add this is the megablast -m 5 output. >>>>> >>>>> Thank you again, >>>>> Paolo >>>>> >>>>> 2010/2/26 Paolo Pavan : >>>>>> >>>>>> Hi all, >>>>>> I have just a brief question: I've got some megablast reports such the >>>>>> one I've pasted below. >>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>>> Bio::Align::AlignI implementing one? 
>>>>>> >>>>>> Thank you all, >>>>>> Paolo >>>>>> >>>>>> >>>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>>> >>>>>> >>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>>> (2000), >>>>>> "A greedy algorithm for aligning DNA sequences", >>>>>> J Comput Biol 2000; 7(1-2):203-14. >>>>>> >>>>>> Database: 00038-00053.fasta >>>>>> 2 sequences; 2001 total letters >>>>>> >>>>>> Searching..................................................done >>>>>> >>>>>> Query= 00038-00053 >>>>>> (802 letters) >>>>>> >>>>>> >>>>>> >>>>>> Score E >>>>>> Sequences producing significant alignments: (bits) Value >>>>>> >>>>>> ______00038 >>>>>> 226 1e-62 >>>>>> ______00053 >>>>>> 115 3e-29 >>>>>> >>>>>> 1_0 472 >>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>>> ______00038 883 >>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 532 >>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>>> ______00038 943 >>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 591 >>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 651 >>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 711 >>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> 
______00053 1 >>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>>> >>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>>> ______00038 1000 -------------------------------- 1001 >>>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>>>> Database: 00038-00053.fasta >>>>>> Posted date: Feb 25, 2010 4:47 PM >>>>>> Number of letters in database: 2001 >>>>>> Number of sequences in database: 2 >>>>>> >>>>>> Lambda K H >>>>>> 1.37 0.711 1.31 >>>>>> >>>>>> Gapped >>>>>> Lambda K H >>>>>> 1.37 0.711 1.31 >>>>>> >>>>>> >>>>>> Matrix: blastn matrix:1 -3 >>>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>>> Number of Sequences: 2 >>>>>> Number of Hits to DB: 17 >>>>>> Number of extensions: 3 >>>>>> Number of successful extensions: 3 >>>>>> Number of sequences better than 10.0: 2 >>>>>> Number of HSP's gapped: 2 >>>>>> Number of HSP's successfully gapped: 2 >>>>>> Length of query: 802 >>>>>> Length of database: 2001 >>>>>> Length adjustment: 10 >>>>>> Effective length of query: 792 >>>>>> Effective length of database: 1981 >>>>>> Effective search space: 1568952 >>>>>> Effective search space used: 1568952 >>>>>> X1: 9 (17.8 bits) >>>>>> X2: 20 (39.6 bits) >>>>>> X3: 51 (101.1 bits) >>>>>> S1: 9 (18.3 bits) >>>>>> S2: 9 (18.3 bits) >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From janine.arloth at googlemail.com Wed Mar 3 04:44:18 2010 From: janine.arloth at 
googlemail.com (Janine Arloth)
Date: Wed, 3 Mar 2010 10:44:18 +0100
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>

Hello,

which arguments or results can I get from hits?

my $hit = $result->next_hit;
print $hit->name;

Is there more than the name? Is there a description somewhere where I can look this up?

Regards

From David.Messina at sbc.su.se Thu Mar 4 10:27:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 4 Mar 2010 16:27:46 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <4B8FC652.2010607@bioperl.org>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org>
Message-ID: <31C89CCE-25B8-492A-924D-A7401D415584@sbc.su.se>

Hi Palani,

You're using a very old version of BioPerl, 1.4:

> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html

The current release version is 1.6.1.

Also, NCBI is changing (or may have already changed) their remote access system to require an email address. The very latest builds of BioPerl should now be compatible with this change. Get it here:

http://www.bioperl.org/DIST/nightly_builds/

or directly via Subversion; instructions here:

http://www.bioperl.org/wiki/Getting_BioPerl

Dave

From cjfields at illinois.edu Thu Mar 4 10:30:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 04 Mar 2010 09:30:54 -0600
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <4B8FC652.2010607@bioperl.org>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org>
Message-ID: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu>

Palani,

We have a few regression tests that should have caught this but aren't
quite set up correctly (they silently pass if no report is returned).
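A check that cannot pass silently needs two things: a bound on how long it polls, and an explicit failure when nothing comes back. The following is a minimal sketch in plain Perl, not the actual BioPerl test code; `poll_for_report` and the `$fetch` callback are hypothetical names standing in for whatever retrieves the report.

```perl
use strict;
use warnings;

# poll_for_report: call $fetch up to $max_tries times, pausing $delay
# seconds between attempts. $fetch is a hypothetical callback that returns
# a report (any defined value), or undef while the job is still pending.
sub poll_for_report {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $report = $fetch->();
        return $report if defined $report;
        sleep $delay if $delay && $try < $max_tries;
    }
    return;    # gave up; the caller has to treat this as a failure
}

# Stand-in for a remote job that becomes ready on the third poll.
my $calls = 0;
my $ready_on_third = sub { return ++$calls >= 3 ? 'REPORT' : undef };

my $report = poll_for_report($ready_on_third, 5, 0);

# Fail loudly instead of passing when nothing was returned.
die "no report returned\n" unless defined $report;
print "got $report after $calls polls\n";
```

With a real remote job the callback would wrap the retrieval call, and the delay would be a few seconds rather than 0.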
This may be something on NCBI's end though; any remote database or
analysis is notoriously brittle, hence the need to skip these tests by
default when installing.

Final note: hopefully you aren't using BioPerl 1.4 (as indicated by
the docs). We're now on the 1.6 release series, currently at v. 1.6.1;
1.4 isn't supported anymore.

chris

On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote:
> Palani -
> This should be directed to the mailing list.
>
> -------- Original Message --------
> From: PalaniKannan K
> Subject: Enquiry about Remoteblast.pm
> Date: Thu, 4 Mar 2010 10:23:45 +0530
>
> I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing them through
> the Remoteblast.pm script available through CPAN. When I submit my
> query... it shows waiting for a long time. Ex. (waiting .....................)
>
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html
>
> This is the reference script I am using through the Remoteblast perl module.
>
> It worked up to 02/03/2010. Now it is not working.
>
> We had developed 3 applications using this module. The same error comes in all 3
> applications we developed. So, I confirm that our script doesn't have a problem.
> Kindly help me in this regard.
>
> --
> With Regards,
> palani kannan. k
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Thu Mar 4 10:27:16 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Thu, 4 Mar 2010 10:27:16 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Message-ID: Check out http://www.bioperl.org/wiki/HOWTO:SearchIO MAJ ----- Original Message ----- From: "Janine Arloth" To: Sent: Wednesday, March 03, 2010 4:44 AM Subject: [Bioperl-l] StandAloneBlastPlus > Hello, > > which arguments or result can I get from hits? > > hit = $result->next_hit; > print $hit->name; > > Are there more than the name? Exists a description, where I can look up this? > > Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Thu Mar 4 10:25:45 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 04 Mar 2010 10:25:45 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Message-ID: <90B9BFFC-73DA-469F-900C-70448A9B1C03@verizon.net> http://www.bioperl.org/wiki/HOWTO:SearchIO On Mar 3, 2010, at 4:44 AM, Janine Arloth wrote: > Hello, > > which arguments or result can I get from hits? > > hit = $result->next_hit; > print $hit->name; > > Are there more than the name? Exists a description, where I can look up this? 
> > Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 4 11:49:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 04 Mar 2010 10:49:01 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org> <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> Message-ID: <1267721341.23329.26.camel@pyrimidine.igb.uiuc.edu> Okay, I'm able to replicate this (and the tests now correctly attempt to catch it). It appears that this may be a general RemoteBlast issue, as regular RemoteBlast tests are also taking forever. This shouldn't be related to the email issue (this isn't in RemoteBlast.pm yet). At least, I would hope NCBI would pass back another status besides 'WAITING' in cases where the email isn't provided. chris On Thu, 2010-03-04 at 09:30 -0600, Chris Fields wrote: > Palani, > > We have a few regression tests that should have caught this but aren't > quite set up correctly (they silently pass if no report is returned). > This may be smoething on NCBI's end though; any remote database or > analyses are notoriously brittle, hence the need to skip these by > default when installing tests. > > Final note, but hopefully you aren't using bioperl 1.4 (as indicated by > the docs). We're now on the 1.6 release series and are now on v. 1.6.1; > 1.4 isn't supported anymore. > > chris > > On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote: > > Palani - > > This should be directed to the mailing list. > > > > -------- Original Message -------- > > From: PalaniKannan K > > Subject: Enquiry about Remoteblast.pm > > Date: Thu, 4 Mar 2010 10:23:45 +0530 > > > > > > > > > > > > I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. 
I am accessing through > > Remoteblast.pm script available through CPAN. When i am submitting my > > query... it shows waiting for much time. Ex. (waiting .....................) > > > > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html > > > > This is the reference script i am using through Remoteblast perl module. > > > > It worked upto last 02/03/2010. Now it is not working > > > > We had developed 3 applications using this module. The same error comes in 3 > > applications we developed. So, i confim that our script dont have problem. > > Kindly help me in this regard. > > > > -- > > With Regards, > > palani kannan. k > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 4 14:06:33 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 04 Mar 2010 11:06:33 -0800 Subject: [Bioperl-l] call for project ideas - Google Summer of Code In-Reply-To: References: <4B8CAE6B.4010807@cornell.edu> Message-ID: <4B9004B9.8090107@cornell.edu> Hello Luis, These are interesting ideas. Have a look at http://sswap.info and http://sadiframework.org, perhaps you might want to work with one of those technologies? Be warned, these are both in early-stage development, you are on the cutting edge here! It seems like your desire to work with semantic technologies as a GSoC student could fit under a number of different mentoring organizations, possibly OBF or NEScent, or maybe another organization entirely. I'll make some inquiries. In the mean time, please add a project idea for this on the bioperl GSoC page, to give the idea somewhere to coalesce. If you can, try to come up with a more concrete idea for what you want to do. 
http://www.bioperl.org/wiki/Google_Summer_of_Code

What do you think?

Rob

Luis M Rodriguez-R wrote:
> Hello Robert,
>
> I would like to know how to apply to GSoC-2010 and when it is planned to take place. I think there are great development opportunities in information discovery using the semantic web (I'm familiar with RDF in bio2rdf and uniprot, but it could also be useful to integrate OWL). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, especially with the implementation of SPARQL in mind. The people of bio2rdf already have some parsers, but their incompleteness is evident when working with their RDF as a primary source of data.
>
> Best regards,
> Luis.
>
> On 2/03/2010, at 1:21, Robert Buels wrote:
>
>> Hi all,
>>
>> Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due next week). We need project ideas for prospective Summer of Code interns.
>>
>> There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects.
>>
>> For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs
>>
>> One of the summer intern ideas I have on the page so far is to help with the tough grunt work of breaking BioPerl into smaller, more easily managed distributions. I'm sure you all can think of plenty more!
>>
>> Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code
>>
>> Rob
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Luis M.
Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
>
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
>
>

From joa2006 at med.cornell.edu Thu Mar 4 15:11:58 2010
From: joa2006 at med.cornell.edu (Josef Anrather)
Date: Thu, 04 Mar 2010 15:11:58 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
Message-ID: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>

Hi there,

same problems here. Bioperl 1.6.1 installed; RemoteBlast version
1.006001.
Could someone point me in the right direction. What is the put
parameter for the email address?

Does the supplied email address end up in an FBI data base if you
blast the B.anthracis genome?

Josef

Cornell Medical College

From maj at fortinbras.us Thu Mar 4 16:18:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 4 Mar 2010 16:18:48 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

we're not at liberty to say
----- Original Message ----- From: "Josef Anrather"
To:
Sent: Thursday, March 04, 2010 3:11 PM
Subject: Re: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]

> Hi there,
>
> same problems here. Bioperl 1.6.1 installed; RemoteBlast version
> 1.006001.
> Could someone point me in the right direction. What is the put
> parameter for the email address?
>
> Does the supplied email address end up in an FBI data base if you
> blast the B.anthracis genome?
>
> Josef
>
> Cornell Medical College
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From David.Messina at sbc.su.se Fri Mar 5 05:05:43 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 5 Mar 2010 11:05:43 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

My apologies for jumping the gun on the email thing -- that won't take
effect until June 1.

See full details here:

http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d

Looks like the problem with RemoteBlast (as Chris reported elsewhere in
this thread) is at NCBI's servers (and is probably temporary).

Dave

From robert.bradbury at gmail.com Fri Mar 5 08:20:36 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 5 Mar 2010 08:20:36 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To:
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote:

> My apologies for jumping the gun on the email thing -- that won't take
> effect until June 1.
>
> See full details here:
>
> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d
>
>
> Looks like the problem with RemoteBlast (as Chris reported elsewhere in
> this thread) is at NCBI's servers (and is probably temporary).
>
>
I would not be at all surprised if any problems involving RemoteBlast were
related to the recent changeovers to a Javascript requirement for all
interfaces to NCBI databases (this took place around mid-February and I
complained about this in a previous email to the BioPerl list).

I received a response back from Dr. Eric Sayers at NCBI on Feb.
26 that indicated that they were aware of the problem (involving a Javascript requirement) and indicated that NCBI developers were "investigating" ways to mitigate the problem. I've looked briefly at the new Javascript code that one is required to run when using PubMed, etc. and it looks like they may have completely changed the external interfaces to NCBI databases -- so I'm not surprised if that broke some or all other external interfaces used by BioPerl (RemoteBlast, Eutils, etc.). I'd suggest that you try to document the problems as best you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). It may be worth noting that it took ~3 weeks for me to receive any response to my reports. Also note, that (a) to the best of my knowledge there has been no public discussion regarding these recent changes at NCBI; and (b) under the Jan. 21, 2009 Memorandum on Transparency and Open Government, and under the Dec 8, 2009 Open Government Directive, NCBI *should* be doing a better job working with its end users (and the taxpayers) -- and at least thus far, while NIH seems to be making an effort that doesn't seem to have filtered down to NCBI. (For example, no open/public discussion regarding the email requirement for remote blasts...). It is also worth noting that it should be possible to file FOI requests with NIH/NCBI to find out exactly what they are doing and why they are doing it. I haven't taken such steps yet but I have given consideration to doing so. 
Robert

From biopython at maubp.freeserve.co.uk Fri Mar 5 08:31:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Mar 2010 13:31:57 +0000
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To:
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com>

On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote:
>
> (For example, no open/public discussion regarding the email
> requirement for remote blasts...).
>

Hi all,

What email requirement for remote blasts are you talking about?

Note that the email referred to earlier talks about two unrelated
issues: (1) changes to the BLAST output with the introduction
of BLAST+, and (2) the upcoming email requirement for Entrez
(aka E-utilities; they have been very clear about that, with
plenty of warning).

http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html
http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html

Is there a misunderstanding here?

Peter

From David.Messina at sbc.su.se Fri Mar 5 08:44:08 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 5 Mar 2010 14:44:08 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com>
Message-ID: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se>

> Is there a misunderstanding here?

Whoops, yes there is, and that's my fault, too. I did not
read carefully and conflated EUtilities and RemoteBLAST.

Just to be clear, the upcoming email requirement will
be for EUtilities, NOT for RemoteBLAST.

Thanks for clearing that up, Peter.
Dave

On Mar 5, 2010, at 14:31, Peter wrote:

> On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote:
>>
>> (For example, no open/public discussion regarding the email
>> requirement for remote blasts...).
>> > > Hi all, > > What email requirement for remote blasts are you talking about? > > Note that the email referred to earlier talks about to unrelated > issues, (1) changes to the BLAST output with the introduction > of BLAST+, and (2) the upcoming email requirement for Entrez > (aka E-utilities, they have been very clear about that with > plenty of warning). > > http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html > http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html > > > Peter From biopython at maubp.freeserve.co.uk Fri Mar 5 08:48:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Mar 2010 13:48:27 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> Message-ID: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: > >> Is there a misunderstanding here? > > Whoops, yes there is ? that's my fault, too. I did not > read carefully and conflated EUtilities and RemoteBLAST. > > Just to be clear, the upcoming email requirement will > be for EUtilities, NOT for RemoteBLAST. > > Thanks for clearing that up, Peter. > Dave No problem - you guys had me worried there for a minute ;) Peter From cjfields at illinois.edu Fri Mar 5 08:50:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 07:50:51 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> On Mar 5, 2010, at 7:20 AM, Robert Bradbury wrote: > On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote: > >> My apologies for jumping the gun on the email thing ? 
that won't take
>> effect until June 1.
>>
>> See full details here:
>>
>> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d
>>
>>
>> Looks like the problems with RemoteBlast (as Chris reported elsewhere in
>> this thread) is at NCBI's servers (and is probably temporary).
>>
>>
> I would not be at all surprised if any problems involving RemoteBlast were
> related to the recent changeovers to a Javascript requirement for all
> interfaces to NCBI databases (this took place around mid-February and I
> complained about this in a previous email to the BioPerl list).

Robert, according to Palani's recent response NCBI provided a perl script
that worked, so I don't think it's a Javascript issue. My guess is a change
in the returned page information that isn't caught by the current regex,
a problem that has happened in the past. I'll be looking into it today.

> I received a response back from Dr. Eric Sayers at NCBI on Feb. 26 that
> indicated that they were aware of the problem (involving a Javascript
> requirement) and indicated that NCBI developers were "investigating" ways to
> mitigate the problem.
>
> I've looked briefly at the new Javascript code that one is required to run
> when using PubMed, etc. and it looks like they may have completely changed
> the external interfaces to NCBI databases -- so I'm not surprised if that
> broke some or all other external interfaces used by BioPerl (RemoteBlast,
> Eutils, etc.). I'd suggest that you try to document the problems as best
> you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov).
> It may be worth noting that it took ~3 weeks for me to receive any response
> to my reports.

EUtilities works fine (both regular and SOAP); all regression tests are
passing, so it's not affecting everything.

> Also note, that (a) to the best of my knowledge there has been no public
> discussion regarding these recent changes at NCBI; and (b) under the Jan.
> 21, 2009 Memorandum on Transparency and Open Government, and under the Dec > 8, 2009 Open Government Directive, NCBI *should* be doing a better job > working with its end users (and the taxpayers) -- and at least thus far, > while NIH seems to be making an effort that doesn't seem to have filtered > down to NCBI. > > (For example, no open/public discussion regarding the email requirement for > remote blasts...). > > It is also worth noting that it should be possible to file FOI requests with > NIH/NCBI to find out exactly what they are doing and why they are doing it. > I haven't taken such steps yet but I have given consideration to doing so. > > Robert The email requirement has always been indicated, it was just never enforced. B/c of increased spamming issues on the NCBI server they took up the initiative to require users provide an email address (and enforce it starting in June). I just made a change to the BioPerl install that requests an email and bypasses Bio::DB::EUtilities tests if one is not provided, other tools will be following suit. I don't think there is anything insidious about this. My guess is they will be using them merely to track server usage per user and IP, and take necessary measures (i.e. contact or block) if needed. Finally, I'm not sure where the hostility is coming from. NCBI has provided a great service to the community for many years, even through many funding cuts, and they have had quite a few. Frankly, if one doesn't like their service requirements, there are other databases that one can use. 
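For anyone who wants to supply the address from their own scripts, a minimal sketch follows. It assumes the `-email` parameter of Bio::DB::EUtilities; the address, the search term, the `esearch_args` helper and the `RUN_NETWORK` environment variable are all made up for illustration. The remote call only runs if BioPerl is installed and that variable is set.

```perl
use strict;
use warnings;

# Build the argument list for an esearch query. The helper name and the
# default database are illustrative choices, not part of BioPerl.
sub esearch_args {
    my (%opt) = @_;
    die "an email address is required\n" unless $opt{email};
    return (
        -eutil  => 'esearch',
        -db     => $opt{db} || 'pubmed',
        -term   => $opt{term},
        -email  => $opt{email},    # assumed parameter name
        -retmax => 5,
    );
}

# Placeholder address and term; substitute your own.
my %args = esearch_args(term => 'BioPerl', email => 'you@example.org');

# Only touch the network when asked to, and only if BioPerl is available.
if ($ENV{RUN_NETWORK} && eval { require Bio::DB::EUtilities; 1 }) {
    my $factory = Bio::DB::EUtilities->new(%args);
    print "$_\n" for $factory->get_ids;
}
```

Refusing to build the query without an address keeps a script from depending on whether the server happens to enforce the requirement yet.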
chris From cjfields at illinois.edu Fri Mar 5 10:07:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 09:07:11 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> Message-ID: On Mar 5, 2010, at 7:48 AM, Peter wrote: > On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >> >>> Is there a misunderstanding here? >> >> Whoops, yes there is ? that's my fault, too. I did not >> read carefully and conflated EUtilities and RemoteBLAST. >> >> Just to be clear, the upcoming email requirement will >> be for EUtilities, NOT for RemoteBLAST. >> >> Thanks for clearing that up, Peter. >> Dave > > No problem - you guys had me worried there for a minute ;) > > Peter Just as an update, I can confirm it is a change with retrieve_blast() not catching the report (no Javascript, no email ;). Will try fixing this later today. chris From robert.bradbury at gmail.com Fri Mar 5 10:08:42 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 5 Mar 2010 10:08:42 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: Sorry, yes I too was reading quickly and not separating RemoteBlast from Eutilities requirements. With respect to "hostility", I do agree Chris that NCBI has provided a great service over the years (I've used it for over 15 as I'm sure many here have). However, the recent Javascript requirement (without any apparent discussion within the user community) has me very annoyed [1]. 
One could back it up a level and ask why NCBI doesn't have a "user community forum" (at least that I'm aware of) or even a bug database (it isn't like putting up a bugzilla bug database requires all that much work). Heck, even the phone companies (whom I consider to be the epitome of bureaucracy) issue me a trouble ticket # when I have a problem (something to the best of my knowledge NCBI does not do). There is also the fact that several months ago when I requested an explanation for what code/utilities were being used to generate the Homologene "homology" graphics (so I could consider extending it to other species, potentially in BioPerl) I was told in unspecific terms that a variety of utilities were used (and my impression was perhaps an underlying suggestion that it might be too complicated for me to understand -- but that could just be subjective impression on my part). [Of course such a response doesn't fit well my perspective of "open government".) Robert 1. There are a long list of reasons why Javascript is bad ranging from increasing memory and CPU requirements on the end user (one cannot run hundreds of open PubMed tabs, as I often may when doing research, on an "average" machine if all the tabs are running Javascript, downloading and running lots of Javascripts can hardly be considered "green", Javascript doesn't work in the lightest weight browsers such as Dillo, Javascript decreases the reliability and security of the browser, excessive reliance on Javascript may decrease web access for individuals with disabilities (potentially in violation of current laws I suspect), etc.) 
From roy.chaudhuri at gmail.com Fri Mar 5 10:52:12 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 05 Mar 2010 15:52:12 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: <4B9128AC.1000405@gmail.com> Hi Robert, Just a suggestion, maybe you could use HubMed (www.hubmed.org) as a PubMed alternative? It seems to work ok with JavaScript disabled. Roy. On 05/03/2010 15:08, Robert Bradbury wrote: > Sorry, yes I too was reading quickly and not separating RemoteBlast from > Eutilities requirements. > > With respect to "hostility", I do agree Chris that NCBI has provided a great > service over the years (I've used it for over 15 as I'm sure many here > have). However, the recent Javascript requirement (without any apparent > discussion within the user community) has me very annoyed [1]. One could > back it up a level and ask why NCBI doesn't have a "user community forum" > (at least that I'm aware of) or even a bug database (it isn't like putting > up a bugzilla bug database requires all that much work). Heck, even the > phone companies (whom I consider to be the epitome of bureaucracy) issue me > a trouble ticket # when I have a problem (something to the best of my > knowledge NCBI does not do). > > There is also the fact that several months ago when I requested an > explanation for what code/utilities were being used to generate the > Homologene "homology" graphics (so I could consider extending it to other > species, potentially in BioPerl) I was told in unspecific terms that a > variety of utilities were used (and my impression was perhaps an underlying > suggestion that it might be too complicated for me to understand -- but that > could just be subjective impression on my part). [Of course such a response > doesn't fit well my perspective of "open government".) > > Robert > > 1. 
There are a long list of reasons why Javascript is bad ranging from > increasing memory and CPU requirements on the end user (one cannot run > hundreds of open PubMed tabs, as I often may when doing research, on an > "average" machine if all the tabs are running Javascript, downloading and > running lots of Javascripts can hardly be considered "green", Javascript > doesn't work in the lightest weight browsers such as Dillo, Javascript > decreases the reliability and security of the browser, excessive reliance on > Javascript may decrease web access for individuals with disabilities > (potentially in violation of current laws I suspect), etc.) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From paolo.pavan at gmail.com Fri Mar 5 13:51:55 2010 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 5 Mar 2010 19:51:55 +0100 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <2FB5C317605B48269256ABFABBED2239@NewLife> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> <18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> <2FB5C317605B48269256ABFABBED2239@NewLife> Message-ID: <56be91b61003051051v6b06b872q9f59380b05492071@mail.gmail.com> Dear Mark, Thank you again for your efforts spent on this theme, I have read and tested carefully enough I hope, your new ads. I found they work perfectly but either I miss some feature of the Tiling API (and this is possible) or it could be that they don't entirely match what was the initial problem; for sure my fault, I can explain better. Let me start saying that what is needed is the merge of the alignments returned by the get_tiled_alns method. 
I have 2 seqs: h1, h2 (in the given example 00038 and 00053) and they
can be aligned against the same sequence q (named 1_0).
They cannot be aligned with common multiple sequence aligners like
clustalw, since in this case a local alignment algorithm is preferable
to a global one. This specific case cannot be handled by programs like
cap3 either. I found that megablast -m 5 can output a tiling of all the
hits found against the query, reporting the query in its entirety.
I hope I gave the idea; if needed I can provide the input sequences of
the megablast.

Thank you again and have a nice weekend,
Paolo

2010/3/4 Mark A. Jensen :
> Paolo -- Ok, there's now (r16900) an *experimental* method in
> Bio::Search::Tiling::MapTiling called get_tiled_alns().
> POD is below. Try it out and let me know--
> cheers,
> MAJ
>
>
> =head1 TILED ALIGNMENTS
>
> The experimental method C<get_tiled_alns()> will use a tiling
> to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
> objects:
>
>  @alns = $tiling->get_tiled_alns($type, $context);
>
> Each alignment contains two sequences with ids 'query' and 'subject',
> and consists of a concatenation of tiling HSPs which overlap or are
> directly adjacent. The alignments are returned in C<$type> sequence
> order. When HSPs overlap, the alignment sequence is taken from the HSP
> which comes first in the coverage map array.
>
> The sequences in each alignment contain features (even though they are
> L<Bio::LocatableSeq> objects) which map the original query/subject
> coordinates to the new alignment sequence coordinates. You can
> determine the original BLAST fragments this way:
>
>  $aln = ($tiling->get_tiled_alns)[0];
>  $qseq = $aln->get_seq_by_id('query');
>  $hseq = $aln->get_seq_by_id('subject');
>  foreach my $feat ($qseq->get_SeqFeatures) {
>    $org_start = ($feat->get_tag_values('query_start'))[0];
>    $org_end = ($feat->get_tag_values('query_end'))[0];
>    # original fragment as represented in the tiled alignment:
>    $org_fragment = $feat->seq;
>  }
>  foreach my $feat ($hseq->get_SeqFeatures) {
>    $org_start = ($feat->get_tag_values('subject_start'))[0];
>    $org_end = ($feat->get_tag_values('subject_end'))[0];
>    # original fragment as represented in the tiled alignment:
>    $org_fragment = $feat->seq;
>  }
>
>
> ----- Original Message ----- From: "Paolo Pavan"
> To: "Mark A. Jensen"
> Cc: "Chris Fields" ;
> Sent: Tuesday, March 02, 2010 10:17 AM
> Subject: Re: [Bioperl-l] Alignment from blast report
>
>
>> I think you got the sense, thank you. Of course hsps from different
>> hits will be reflected in different elements aligned. I've attached
>> the example pasted (unix text) because it is more readable, hoping it will
>> not be held by the mailing server :-)
>>
>> Thank you,
>> Paolo
>>
>> 2010/3/2 Mark A. Jensen :
>>>
>>> This might be a good method to have for Bio::Search::Tiling--
>>> you want to stitch together all the hsps and have the
>>> concatenated alignment returned as a Bio::SimpleAlign,
>>> correct? Tiling would create the right set of hsps from
>>> which to generate the composite alignment. I can
>>> try to get something working, but it may take a while-
>>> MAJ
>>> ----- Original Message ----- From: "Paolo Pavan"
>>> To: "Chris Fields"
>>> Cc:
>>> Sent: Tuesday, March 02, 2010 9:37 AM
>>> Subject: Re: [Bioperl-l] Alignment from blast report
>>>
>>>
>>> Hi Chris,
>>> Thank you for your reply. So I have to understand that since the
>>> get_aln method returns the HSP alignment, there is no way to retrieve
>>> the whole alignment as in the example pasted, is there?
>>> Basically I'm trying to use megablast as a kind of multiple local
>>> alignment engine and actually I'm not quite sure this is a good idea
>>> but in my particular case it could be suitable. I mean that the example
>>> below reports only the portions of the sequences that align, losing
>>> the portions that do not; I'm not sure I gave the idea. What do you
>>> think about it? Can you give me your opinion?
>>> If there isn't any module written yet, I can try to write a parser, it >>> could be of any interest? >>> >>> Thank you, >>> Paolo >>> >>> 2010/3/2 Chris Fields : >>>> >>>> Paolo, >>>> >>>> You can get a Bio::SimpleAlign from the HSP object. The first code >>>> example >>>> in this section in the HOWTO demonstrates this: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>>> >>>> chris >>>> >>>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >>>> >>>>> Dear all, >>>>> Sorry for pushing up my post but, please does anyone have an hint for >>>>> me? >>>>> Maybe have I to send attached the report to the mailing list? I don't >>>>> know attachment policies of the list, if it is allowed and is needed I >>>>> can do that. >>>>> >>>>> Thank you, >>>>> Paolo >>>>> >>>>> 2010/2/26 Paolo Pavan : >>>>>> >>>>>> Sorry, >>>>>> Maybe I forgot to add this is the megablast -m 5 output. >>>>>> >>>>>> Thank you again, >>>>>> Paolo >>>>>> >>>>>> 2010/2/26 Paolo Pavan : >>>>>>> >>>>>>> Hi all, >>>>>>> I have just a brief question: I've got some megablast reports such >>>>>>> the >>>>>>> one I've pasted below. >>>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>>>> Bio::Align::AlignI implementing one? >>>>>>> >>>>>>> Thank you all, >>>>>>> Paolo >>>>>>> >>>>>>> >>>>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>>>> >>>>>>> >>>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>>>> (2000), >>>>>>> "A greedy algorithm for aligning DNA sequences", >>>>>>> J Comput Biol 2000; 7(1-2):203-14. 
>>>>>>> >>>>>>> Database: 00038-00053.fasta >>>>>>> 2 sequences; 2001 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> Query= 00038-00053 >>>>>>> (802 letters) >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score E >>>>>>> Sequences producing significant alignments: (bits) Value >>>>>>> >>>>>>> ______00038 >>>>>>> 226 1e-62 >>>>>>> ______00053 >>>>>>> 115 3e-29 >>>>>>> >>>>>>> 1_0 472 >>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>>>> ______00038 883 >>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 532 >>>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>>>> ______00038 943 >>>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 591 >>>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 651 >>>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 711 >>>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 1 >>>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>>>> >>>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>>>> ______00038 1000 -------------------------------- 1001 >>>>>>> ______00053 35 
ccttaagtatt-actttaacttgtcttgatca 65 >>>>>>> Database: 00038-00053.fasta >>>>>>> Posted date: Feb 25, 2010 4:47 PM >>>>>>> Number of letters in database: 2001 >>>>>>> Number of sequences in database: 2 >>>>>>> >>>>>>> Lambda K H >>>>>>> 1.37 0.711 1.31 >>>>>>> >>>>>>> Gapped >>>>>>> Lambda K H >>>>>>> 1.37 0.711 1.31 >>>>>>> >>>>>>> >>>>>>> Matrix: blastn matrix:1 -3 >>>>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>>>> Number of Sequences: 2 >>>>>>> Number of Hits to DB: 17 >>>>>>> Number of extensions: 3 >>>>>>> Number of successful extensions: 3 >>>>>>> Number of sequences better than 10.0: 2 >>>>>>> Number of HSP's gapped: 2 >>>>>>> Number of HSP's successfully gapped: 2 >>>>>>> Length of query: 802 >>>>>>> Length of database: 2001 >>>>>>> Length adjustment: 10 >>>>>>> Effective length of query: 792 >>>>>>> Effective length of database: 1981 >>>>>>> Effective search space: 1568952 >>>>>>> Effective search space used: 1568952 >>>>>>> X1: 9 (17.8 bits) >>>>>>> X2: 20 (39.6 bits) >>>>>>> X3: 51 (101.1 bits) >>>>>>> S1: 9 (18.3 bits) >>>>>>> S2: 9 (18.3 bits) >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shalabh.sharma7 at gmail.com Fri Mar 5 15:06:30 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 15:06:30 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) Message-ID: 
<9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com>

Hi All,
I have a set of accession numbers. Is it possible to get
"isolation_source" from the GenBank records for all the accession numbers?

I would really appreciate it if anyone can help me out.

Thanks
Shalabh

From shalabh.sharma7 at gmail.com Fri Mar 5 15:29:17 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 5 Mar 2010 15:29:17 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net>
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net>
Message-ID: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com>

Hi Brian,
Thanks for your quick reply.
I was reading the document and I think it talks about parsing a GenBank
record. What I want is to submit a batch of accession numbers and
get "isolation_source" directly, without downloading all the GenBank files.
I am still reading the document; maybe I missed something.

Thanks a lot
shalabh

From bosborne11 at verizon.net Fri Mar 5 15:43:33 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 05 Mar 2010 15:43:33 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com>
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com>
Message-ID:

Shalabh,

I see. I think you could use EUtils then. Take a look at these:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

I'm not an expert on these, and I do not know if one can ask for just a tag value ("isolation_source"). Getting a tag value from the downloaded GenBank entry is not difficult though; that Feature-Annotation HOWTO shows you how.

Brian O.

From bosborne11 at verizon.net Fri Mar 5 15:13:45 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 05 Mar 2010 15:13:45 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com>
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com>
Message-ID: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net>

Shalabh,

You can start by reading about how Bioperl processes GenBank files and their annotations:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Brian O.

From cjfields at illinois.edu Fri Mar 5 16:22:47 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 05 Mar 2010 15:22:47 -0600
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To:
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com>
Message-ID: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu>

Regardless of what you try, it will only limit the records returned (i.e.
you will still get full records, unless you take steps to limit those
somehow, by adding sequence start/stop, etc).

Anyway, this worked to retrieve those with that tag:
"src isolation source"[Properties]

That gets a lot of hits.

If you are only interested in that one line you could just parse it out
w/o resorting to bioperl (believe it or not, it's not always the best
answer).

chris

From shalabh.sharma7 at gmail.com Fri Mar 5 17:06:41 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 5 Mar 2010 17:06:41 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu>
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com>

Thanks Brian and Chris,
I followed the example given here:
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
to retrieve raw data records from GenBank.
For example, I used the id 157091572 to get the GenBank record, but the
downloaded file does not contain "isolation_source", which is there when you
look at the record online:

http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=T2S9N0PJ01N&log%24=nuclalign&blast_rank=1&list_uids=157091572

Thanks
Shalabh

From shalabh.sharma7 at gmail.com Fri Mar 5 17:57:00 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 5 Mar 2010 17:57:00 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com>
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com>
Message-ID: <9fcc48c71003051457x7186e3e0y1c9b8ee5ea81e153@mail.gmail.com>

Thanks everyone, I got what I was looking for.
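For anyone who only needs that one qualifier, the "just parse it out" suggestion from earlier in the thread could look roughly like the sketch below. It is plain Perl working on GenBank flat-file text that has already been downloaded. The function name isolation_sources and the sample record are made up for illustration, and the sketch assumes the value contains no embedded double quotes.

```perl
use strict;
use warnings;

# Return every /isolation_source="..." value found in GenBank flat-file
# text. Values wrapped over several lines are joined with single spaces.
sub isolation_sources {
    my ($text) = @_;
    my @found;
    while ($text =~ m{/isolation_source="([^"]*)"}g) {
        (my $value = $1) =~ s/\s*\n\s*/ /g;   # undo flat-file line wrapping
        push @found, $value;
    }
    return @found;
}

# Hypothetical fragment of a source feature, not a real record.
my $record = join "\n",
    '     source          1..680',
    '                     /organism="Example organism"',
    '                     /isolation_source="coastal sea',
    '                     water"',
    '';

print "$_\n" for isolation_sources($record);
```

With a parsed Bio::Seq object, the usual BioPerl route is to loop over get_SeqFeatures, pick the feature whose primary_tag is 'source', and call get_tag_values('isolation_source') after checking has_tag, as described in the Feature-Annotation HOWTO.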
EUtilities helped me a lot.

Thanks
Shalabh

From cjfields at illinois.edu Fri Mar 5 23:14:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 5 Mar 2010 22:14:01 -0600
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se>
<320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com>
Message-ID: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu>

On Mar 5, 2010, at 7:48 AM, Peter wrote:

> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote:
>>
>>> Is there a misunderstanding here?
>>
>> Whoops, yes there is - that's my fault, too. I did not
>> read carefully and conflated EUtilities and RemoteBLAST.
>>
>> Just to be clear, the upcoming email requirement will
>> be for EUtilities, NOT for RemoteBLAST.
>>
>> Thanks for clearing that up, Peter.
>> Dave
>
> No problem - you guys had me worried there for a minute ;)
>
> Peter

Just to bring this thread full circle, I have committed a fix which (ironically) reduced the code a bit. I also added an attribute (get_rtoe) that returns the approximate time until the report is returned.

chris

From joa2006 at med.cornell.edu Sat Mar 6 17:13:45 2010
From: joa2006 at med.cornell.edu (Josef Anrather)
Date: Sat, 06 Mar 2010 17:13:45 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu>
Message-ID:

Chris,

the fix works flawlessly on my system. Thanks for the fast response.

Cheers,
Josef

From jarodpardon at yahoo.com.cn Sun Mar 7 04:13:40 2010
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sun, 7 Mar 2010 17:13:40 +0800 (CST)
Subject: [Bioperl-l] insertion code in pdb parser
Message-ID: <643595.96038.qm@web15003.mail.cnb.yahoo.com>

Hi all,

An insertion code on a residue number is very common, especially in the numbering schemes for antibody sequences, such as 82A, 82B.

When Bio::Structure::IO::pdb parses a PDB file containing residues with insertion codes, it assigns such a residue an id like 'PRO-52.A', where 'A' is the insertion code. However, the opposite operation (setting the id of the residue) does not work. For example, if the original residue number is 51, $res->id('PRO-52.A') will not append the insertion code after the residue number correctly, though it does change the residue number from 51 to 52.

In the end, the only way I found to set the insertion code for a residue was to assign it to all atoms of that residue with the method $atom->icode('A').

I think this is inconvenient and misleading, since the insertion code should not be a property of an atom; a residue never has atoms with different insertion codes. I strongly recommend some changes: add the icode method to the residue object rather than the atom. In the same way, the segment id should also belong to the residue.
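The workaround described above could be wrapped up roughly as follows. This is only a sketch: the helper names split_residue_id and set_residue_icode are made up for illustration, $entry is assumed to be a Bio::Structure::Entry and $res one of its residues, and passing the full 'NAME-NUM.ICODE' string to $res->id simply mirrors the id format the parser assigns when reading.

```perl
use strict;
use warnings;

# Split an id such as 'PRO-52.A' into (name, number, insertion code).
# The insertion code is returned as '' when the id carries none.
sub split_residue_id {
    my ($id) = @_;
    my ($name, $num, $icode) = $id =~ /^([^-]+)-(-?\d+)(?:\.(\S+))?$/
        or die "unrecognised residue id '$id'\n";
    return ($name, $num, defined $icode ? $icode : '');
}

# Give a residue a new id and copy the insertion code onto each of its
# atoms, since the atoms are where the insertion code is stored.
sub set_residue_icode {
    my ($entry, $res, $new_id) = @_;
    my (undef, undef, $icode) = split_residue_id($new_id);
    $res->id($new_id);
    $_->icode($icode) for $entry->get_atoms($res);
    return $icode;
}

print join(' ', split_residue_id('PRO-52.A')), "\n";
```

After reading a structure with Bio::Structure::IO, a call such as set_residue_icode($struc, $res, 'PRO-52.A') would then replace the manual loop over atoms.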
Jarod From rtbio.2009 at gmail.com Sun Mar 7 08:11:54 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Sun, 7 Mar 2010 14:11:54 +0100 Subject: [Bioperl-l] remoteblast Message-ID: Hello Mark and everybody, I have been trying to connect to remote blast to retrieve similar sequences to a given sequence. But my program is unable to retrieve the sequences from BLAST, i.e., it is getting executed till the remote blast ids, but it is not entering the else loop after collecting the rid. Please check this problem and help me in this regard. I think the problem is in getting the sequence and going to the 'else' part. i.e., else { open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.# print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output Please give me your reply. Thanks and regards, Roopa. My code is as follows. #!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use rnai_blast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$blastdebugfile); print OUTFILE @params; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); #submits the input data to BLAST# my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); #Retrieving the result ids# my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.# print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); #saving the output in blastdata.time.out file# # $random=rand(); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); # print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE 
$warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); #returning the sequences obtained on BLAST# } From cjfields at illinois.edu Sun Mar 7 09:57:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 7 Mar 2010 08:57:43 -0600 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use rnai_blast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds."; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes
> This page will automatically reload in 30 seconds
> \n > \n"; > > close(OUTFILE); > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$blastdebugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > "$organ\[ORGN]"); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => "$organ\[ORGN]"); > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > #submits the input data to BLAST# > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > #Retrieving the result ids# > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > #saving the output in blastdata.time.out file# > > # $random=rand(); > > my $filename = $serverpath."/blastdata_".time()."\.out"; > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > # print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a 
string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dna; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > close(OUTFILE); > > > return(@seqs); #returning the sequences obtained on BLAST# > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jdetras at gmail.com Fri Mar 5 01:17:40 2010 From: jdetras at gmail.com (Jeffrey Detras) Date: Fri, 5 Mar 2010 14:17:40 +0800 Subject: [Bioperl-l] distances between leaf nodes Message-ID: Hi, I am new at using the Bio::TreeIO module specifically using the newick format for a phylogenetic analysis. The sample_tree attached is Newick-formatted tree. My objective is to get all the distances between all the leaf nodes. I copied examples of the code from http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my knowledge) so that I understand how to assign the right array value for the nodes/leaves. The message would say must provide 2 root nodes. Here is what I have right now: #!/usr/bin/perl -w use strict; my $treefile = 'sample_tree'; use Bio::TreeIO; my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $treefile); while (my $tree = $treeio->next_tree) { my @leaves = $tree->get_leaf_nodes; for (my $dist = $tree->distance(-nodes => \@leaves)){ print "Distance between trees is $dist\n"; } } Thanks, Jeff -------------- next part -------------- A non-text attachment was scrubbed... 
Name: sample_tree
Type: application/octet-stream
Size: 418 bytes
Desc: not available
URL:

From janine.arloth at googlemail.com Fri Mar 5 04:43:57 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Fri, 5 Mar 2010 10:43:57 +0100
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To:
References:
Message-ID:

Hello,

using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf, I only get an alignment like this:

                 1                                                   50
test/1-85        ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
lcl|3013/20-104  ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA

                 51                                                 100
test/1-85        TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
lcl|3013/20-104  TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA

But I prefer this format:

Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79

Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
           |||||||||||||||||||||||||
Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104

How can I get this?

Best Regards

From elujan at stanford.edu Sun Mar 7 19:49:34 2010
From: elujan at stanford.edu (Ernesto George Lujan)
Date: Sun, 7 Mar 2010 16:49:34 -0800 (PST)
Subject: [Bioperl-l] Installing BioPerl
In-Reply-To: <1189627897.1477411268008644137.JavaMail.root@zm09.stanford.edu>
Message-ID: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>

Hi everyone,

I'm running Mac OS X 10.5.8 with Perl 5.8.8 and I'm having trouble installing the BioPerl module. I've downloaded and installed the BioPerl 1.5.1-2 binary through FinkCommander, but when I type

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

into the Terminal, it tells me that I'm using BioPerl version 1.006. How do I get this module to install correctly?

Once again, my specs:

Perl Version: 5.8.8
BioPerl Version: 1.006
Operating System: Mac OS X 10.5.8

Thanks!
-BioPerl Beginner From bimber at wisc.edu Sun Mar 7 22:57:12 2010 From: bimber at wisc.edu (Ben Bimber) Date: Sun, 7 Mar 2010 21:57:12 -0600 Subject: [Bioperl-l] Bioperl-run malformed svndiff Message-ID: <9f985cdc1003071957h6c82d4b8t1a6b9a3af7752bde@mail.gmail.com> I recently tried to check out a complete version of bioperl-run and received an error saying 'malformed svndiff'. I've tried this on two different machines, so unless I've doing something wrong, it should be reproducible. I cannot say where updating an existing repository would throw the same error or not. Below is the log: *** Check Out svn checkout "svn://code.open-bio.org/bioperl/bioperl-run/trunk/lib/Bio at HEAD" -r HEAD --depth infinity "C:\Projects\Bio" A C:/Projects/Bio/Tools A C:/Projects/Bio/Tools/Run A C:/Projects/Bio/Tools/Run/Genewise.pm A C:/Projects/Bio/Tools/Run/Analysis A C:/Projects/Bio/Tools/Run/Analysis/soap.pm A C:/Projects/Bio/Tools/Run/AssemblerBase.pm A C:/Projects/Bio/Tools/Run/BWA.pm A C:/Projects/Bio/Tools/Run/Phrap.pm A C:/Projects/Bio/Tools/Run/FootPrinter.pm A C:/Projects/Bio/Tools/Run/AnalysisFactory.pm A C:/Projects/Bio/Tools/Run/BEDTools.pm A C:/Projects/Bio/Tools/Run/EMBOSSApplication.pm A C:/Projects/Bio/Tools/Run/Genscan.pm A C:/Projects/Bio/Tools/Run/RNAMotif.pm A C:/Projects/Bio/Tools/Run/Phylo A C:/Projects/Bio/Tools/Run/Phylo/Phast A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhyloFit.pm A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhastCons.pm A C:/Projects/Bio/Tools/Run/Phylo/Semphy.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/FEL.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Base.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Modeltest.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/REL.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/SLAC.pm A C:/Projects/Bio/Tools/Run/Phylo/PhyloBase.pm A C:/Projects/Bio/Tools/Run/Phylo/Phyml.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawGram.pm A 
C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Base.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtPars.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Consense.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawTree.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm A C:/Projects/Bio/Tools/Run/Phylo/Njtree A C:/Projects/Bio/Tools/Run/Phylo/Njtree/Best.pm A C:/Projects/Bio/Tools/Run/Phylo/QuickTree.pm A C:/Projects/Bio/Tools/Run/Phylo/Gerp.pm A C:/Projects/Bio/Tools/Run/Phylo/Molphy A C:/Projects/Bio/Tools/Run/Phylo/Molphy/ProtML.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML A C:/Projects/Bio/Tools/Run/Phylo/PAML/Yn00.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML/Evolver.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML/Baseml.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML/Codeml.pm A C:/Projects/Bio/Tools/Run/Phylo/SLR.pm A C:/Projects/Bio/Tools/Run/Phylo/Gumby.pm A C:/Projects/Bio/Tools/Run/Phylo/LVB.pm A C:/Projects/Bio/Tools/Run/Primer3.pm A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus.pm A C:/Projects/Bio/Tools/Run/Meme.pm A C:/Projects/Bio/Tools/Run/RepeatMasker.pm A C:/Projects/Bio/Tools/Run/Analysis.pm A C:/Projects/Bio/Tools/Run/Cap3.pm A C:/Projects/Bio/Tools/Run/Vista.pm A C:/Projects/Bio/Tools/Run/Pseudowise.pm A C:/Projects/Bio/Tools/Run/Minimo.pm A C:/Projects/Bio/Tools/Run/Match.pm A C:/Projects/Bio/Tools/Run/Mdust.pm A C:/Projects/Bio/Tools/Run/Eponine.pm A C:/Projects/Bio/Tools/Run/Infernal.pm A C:/Projects/Bio/Tools/Run/BlastPlus A C:/Projects/Bio/Tools/Run/BlastPlus/Config.pm A C:/Projects/Bio/Tools/Run/EMBOSSacd.pm A C:/Projects/Bio/Tools/Run/Alignment A C:/Projects/Bio/Tools/Run/Alignment/Proda.pm A C:/Projects/Bio/Tools/Run/Alignment/Kalign.pm A C:/Projects/Bio/Tools/Run/Alignment/StandAloneFasta.pm A C:/Projects/Bio/Tools/Run/Alignment/TCoffee.pm A C:/Projects/Bio/Tools/Run/Alignment/Sim4.pm A 
C:/Projects/Bio/Tools/Run/Alignment/Probalign.pm A C:/Projects/Bio/Tools/Run/Alignment/Amap.pm A C:/Projects/Bio/Tools/Run/Alignment/Lagan.pm A C:/Projects/Bio/Tools/Run/Alignment/Blat.pm A C:/Projects/Bio/Tools/Run/Alignment/Gmap.pm A C:/Projects/Bio/Tools/Run/Alignment/Probcons.pm A C:/Projects/Bio/Tools/Run/Alignment/DBA.pm A C:/Projects/Bio/Tools/Run/Alignment/Muscle.pm A C:/Projects/Bio/Tools/Run/Alignment/Pal2Nal.pm A C:/Projects/Bio/Tools/Run/Alignment/Exonerate.pm A C:/Projects/Bio/Tools/Run/Alignment/MAFFT.pm A C:/Projects/Bio/Tools/Run/Alignment/Clustalw.pm A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm A C:/Projects/Bio/Tools/Run/Hmmer.pm A C:/Projects/Bio/Tools/Run/BlastPlus.pm A C:/Projects/Bio/Tools/Run/ERPIN.pm A C:/Projects/Bio/Tools/Run/Maq.pm A C:/Projects/Bio/Tools/Run/Bowtie A C:/Projects/Bio/Tools/Run/Bowtie/Config.pm A C:/Projects/Bio/Tools/Run/Seg.pm A C:/Projects/Bio/Tools/Run/Prints.pm A C:/Projects/Bio/Tools/Run/MCS.pm A C:/Projects/Bio/Tools/Run/Tmhmm.pm A C:/Projects/Bio/Tools/Run/Ensembl.pm A C:/Projects/Bio/Tools/Run/Coil.pm A C:/Projects/Bio/Tools/Run/Samtools A C:/Projects/Bio/Tools/Run/Samtools/Config.pm A C:/Projects/Bio/Tools/Run/Genemark.pm A C:/Projects/Bio/Tools/Run/Bowtie.pm A C:/Projects/Bio/Tools/Run/Glimmer.pm A C:/Projects/Bio/Tools/Run/Signalp.pm A C:/Projects/Bio/Tools/Run/Simprot.pm A C:/Projects/Bio/Tools/Run/BWA A C:/Projects/Bio/Tools/Run/BWA/Config.pm A C:/Projects/Bio/Tools/Run/Newbler.pm svn: Malformed svndiff data in representation *** Error (took 00:07.184) From David.Messina at sbc.su.se Mon Mar 8 02:01:13 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 8 Mar 2010 08:01:13 +0100 Subject: [Bioperl-l] Installing BioPerl In-Reply-To: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu> References: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu> Message-ID: <0483C203-3E81-4112-877B-BC7A439CB916@sbc.su.se> 
Hey Ernesto, I'm pretty sure you've got BioPerl version 1.6.0, which is actually more current than 1.5.2 that you were looking for. Due to oddities of Perl version numbers, 1.006 = 1.6.0 (or something like that). So I think you're probably good to go. I should also mention that direct installation (i.e. not via fink) works pretty well these days, and through that you can get the current BioPerl release, which is 1.6.2 (or 1.006002000000000). Dave From alex at bioinf.uni-leipzig.de Mon Mar 8 10:45:14 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Mon, 8 Mar 2010 16:45:14 +0100 (CET) Subject: [Bioperl-l] Problem with PAML/Codeml wrapper Message-ID: Hi, I do have a problem with the PAML/Codeml wrapper. I want to calculate all pairwise K_a,K_s values from a given alignment, using the example procedure of http://www.bioperl.org/wiki/HOWTO:PAML my $dna_aln = aa_to_dna_aln($aln, \%seqs); my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new( -params => { 'runmode' => -2, 'seqtype' => 1,} ); $kaks_factory->alignment($dna_aln); my ($rc,$parser) = $kaks_factory->run(); my $result = $parser->next_result(); But I receive an error: -------------------- WARNING --------------------- MSG: There was an error - see error_string for the program output --------------------------------------------------- ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Unknown format of PAML output did not see seqtype STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:359 STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:441 STACK: Bio::Tools::Phylo::PAML::next_result /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:257 I use PAML4.4. Could this be the reason? Best, Alex --- By the time you've read this, you've already read it! 
From David.Messina at sbc.su.se Mon Mar 8 11:29:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 8 Mar 2010 17:29:00 +0100
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
In-Reply-To:
References:
Message-ID: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>

Hi Alexander,

Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet.

Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem?

http://bugzilla.open-bio.org/

Thanks,
Dave

From alex at bioinf.uni-leipzig.de Mon Mar 8 12:11:42 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Mon, 8 Mar 2010 18:11:42 +0100 (CET)
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
In-Reply-To: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>
References: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>
Message-ID:

sure. thanks!

alex

On Mon, 8 Mar 2010, Dave Messina wrote:

> Hi Alexander,
>
> Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet.
>
> Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem?
>
> http://bugzilla.open-bio.org/
>
>
> Thanks,
> Dave
>

---
By the time you've read this, you've already read it!

From jovel_juan at hotmail.com Mon Mar 8 23:08:20 2010
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 9 Mar 2010 04:08:20 +0000
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To:
References: ,
Message-ID:

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.
I have been doing that with BioConductor, but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From jovel_juan at hotmail.com Mon Mar 8 23:50:45 2010
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 9 Mar 2010 04:50:45 +0000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
References: , , ,
Message-ID:

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From florent.angly at gmail.com Tue Mar 9 01:41:33 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Mar 2010 16:41:33 +1000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
References: , , ,
Message-ID: <4B95ED9D.6080307@gmail.com>

Hi Juan,

How about throwing away sequences that have a mismatch in the adapter? After all, if there is a mismatch in the first few bases, it does not bode well for the rest of the sequence, and there are so many sequences that it is not a big loss.

Florent

On 09/03/10 14:50, Juan Jovel wrote:
>
>
> Hello Guys!
>
> Does anybody has a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes specially difficult when the quality of the reads is poor at the 3' end.
> > I have been doing that with BioConductor (ShortRead library), but still is not good enough to fish adapters that contain mismatches in the Solexa reads. > > Any suggestion will be appreciated. Thanks! > > JUAN > > > _________________________________________________________________ > Discover the new Windows Vista > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From michael.watson at bbsrc.ac.uk Tue Mar 9 01:38:26 2010 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 9 Mar 2010 06:38:26 +0000 Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? In-Reply-To: References: , , , , Message-ID: <8D08960C647E64438CE5740657CBBDC501F910621D@iahcexch1.iah.bbsrc.ac.uk> Use fastx toolkit or something within emboss. Failing that, just write something in pure perl:) ________________________________________ From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open-bio.org] On Behalf Of Juan Jovel [jovel_juan at hotmail.com] Sent: 09 March 2010 04:50 To: bioperl Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? Hello Guys! Does anybody has a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes specially difficult when the quality of the reads is poor at the 3' end. I have been doing that with BioConductor (ShortRead library), but still is not good enough to fish adapters that contain mismatches in the Solexa reads. Any suggestion will be appreciated. Thanks! 
JUAN

From acn at stowers.org Tue Mar 9 01:31:49 2010
From: acn at stowers.org (Noll, Aaron)
Date: Tue, 9 Mar 2010 00:31:49 -0600
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
Message-ID:

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

try out the clipper tool

FASTA/Q Clipper

$ fastx_clipper -h
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
   [-l N]       = discard sequences shorter than N nucleotides. default is 5.
   [-d N]       = Keep the adapter and N bases after it.
                  (using '-d 0' is the same as not using '-d' at all. which is the default).
   [-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
   [-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
   [-k]         = Report Adapter-Only sequences.
   [-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
   [-v]         = Verbose - report number of sequences.
                  If [-o] is specified, report will be printed to STDOUT.
                  If [-o] is not specified (and output goes to STDOUT),
                  report will be printed to STDERR.
   [-z]         = Compress output with GZIP.
   [-D]         = DEBUG output.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.

This is a suite of nice utilities that can be downloaded and that by the way are also used by galaxy.
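For anyone who prefers the "write something in pure perl" route mentioned earlier in this thread, the idea of mismatch-tolerant 3' clipping can be sketched roughly as below. This is only an illustration and not how fastx_clipper works internally: the function names (trim_3p_adapter, count_mismatches), the max_mismatch_rate and min_overlap options, and the example adapter and read are all assumptions made up for this sketch. It slides the adapter along the read, allows partial adapters at the very 3' end, and reports the leftmost position where the overlap has few enough mismatches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count mismatching positions between two equal-length strings.
sub count_mismatches {
    my ($x, $y) = @_;
    my $diff = $x ^ $y;            # "\0" wherever the two characters agree
    return ($diff =~ tr/\0//c);    # count the characters that are not "\0"
}

# Return the position at which the read should be cut, or the read
# length if no adapter is found. Option names are hypothetical.
sub trim_3p_adapter {
    my ($read, $adapter, %opt) = @_;
    my $max_rate    = defined $opt{max_mismatch_rate} ? $opt{max_mismatch_rate} : 0.1;
    my $min_overlap = defined $opt{min_overlap}       ? $opt{min_overlap}       : 5;

    my ($r, $ad) = (uc $read, uc $adapter);
    for my $i (0 .. length($r) - $min_overlap) {
        # Overlap is limited by the adapter length and by the end of the read.
        my $olen = length($r) - $i;
        $olen = length($ad) if $olen > length($ad);
        my $mm = count_mismatches(substr($r, $i, $olen), substr($ad, 0, $olen));
        return $i if $mm <= int($max_rate * $olen);
    }
    return length($r);
}

# Toy example: insert followed by an adapter fragment with one mismatch.
my $adapter = 'TGGAATTCTCGGGTGCCAAGG';                      # example adapter (assumption)
my $seq     = 'ACCACACAACCACACC' . 'TGGAATACTCGGGTG';
my $qual    = 'I' x length $seq;                            # dummy quality string

my $cut = trim_3p_adapter($seq, $adapter);
print substr($seq,  0, $cut), "\n";                         # clipped sequence
print substr($qual, 0, $cut), "\n";                         # quality clipped to match
```

Because the function returns a cut position rather than a trimmed string, the same position can be applied to the quality line of a FASTQ record. With poor 3' quality it may help to raise max_mismatch_rate, at the cost of more false clips in short overlaps.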
-Aaron -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Juan Jovel Sent: Monday, March 08, 2010 10:51 PM To: bioperl Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? Hello Guys! Does anybody has a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes specially difficult when the quality of the reads is poor at the 3' end. I have been doing that with BioConductor (ShortRead library), but still is not good enough to fish adapters that contain mismatches in the Solexa reads. Any suggestion will be appreciated. Thanks! JUAN _________________________________________________________________ Discover the new Windows Vista http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From alex at bioinf.uni-leipzig.de Tue Mar 9 13:00:01 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Tue, 9 Mar 2010 19:00:01 +0100 (CET) Subject: [Bioperl-l] bootstrap values in cladogram Message-ID: Hi, using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and try to plot the tree as cladogram. But somehow I cannot print the bootstrap values. Short example: test.nwk ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); [..] use Bio::TreeIO; use Bio::Tree::Draw::Cladogram; [..] my $trees = Bio::TreeIO->new( -file => "test.nwk", -format => 'newick'); my $tree = $trees->next_tree(); [..] my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, -tree => $tree, -compact => 0); $out->print(-file => "test.eps"); I already tried it by copying the bootstrap values into the ids of the internal nodes - nothing. Any suggestions? Thanks, Alex --- By the time you've read this, you've already read it! 
From jason at bioperl.org Tue Mar 9 15:49:06 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 09 Mar 2010 12:49:06 -0800 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: References: Message-ID: <4B96B442.8070003@bioperl.org> SearchIO writer -> BLAST format. presumably something like Bio::SearchIO::Writer::TextResultWriter Janine Arloth wrote, On 3/5/10 1:43 AM: > Hello, > using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment: > > 1 50 > test/1-85 ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA > lcl|3013/20-104 ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA > > > 51 100 > test/1-85 TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA > lcl|3013/20-104 TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA > > > > But I prefer this format: > > > > Query 1 ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA 60 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 20 ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA 79 > > Query 61 TGGCCTTTTGTCTTTCTCCAAAGCA 85 > ||||||||||||||||||||||||| > Sbjct 80 TGGCCTTTTGTCTTTCTCCAAAGCA 104 > > > How can I get this? > > Best Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bhakti.dwivedi at gmail.com Tue Mar 9 15:58:34 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Tue, 9 Mar 2010 15:58:34 -0500 Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report? Message-ID: Hi, I have a blastn and blastx report (both in blast table m-8 format) against the ncbi nr database. Based on the Hits Start and End positions, how can I retrieve the gene name/acc/id? The blast table does show the hit organism accession number, but what I want is specifically the gene to which it is hitting to. Is there a way to do this in bioperl? 
Thanks

From David.Messina at sbc.su.se Tue Mar 9 16:39:08 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 9 Mar 2010 22:39:08 +0100
Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report?
In-Reply-To:
References:
Message-ID:

Hi Bhakti,

Forgive me if the below shows that I've totally misunderstood; it's late here.

> The blast table does show the hit organism
> accession number,

As you say, in BLAST -m 8 reports, the hit's accession number is the second column. I'm not sure when this would be different from the gene's accession number, at least for the entries in nr for which a gene name has been assigned (some are known only by their accession number).

> Based on the Hits Start and End positions, how can I
> retrieve the gene name/acc/id?

The short answer is 'you can't'.

But this makes me think that you're not going against the nr database, but instead whole genome or chromosome sequence records. In which case some of them will have genes annotated in the feature table, which you can get out using BioPerl:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

But many (most?) won't be annotated in this way, in which case you will need to find some file or database that has all the genes' start and stop positions in the sequence that you're searching.

Perhaps you could provide a couple of your hits as examples so the problem is clearer?

Dave

From till.bayer at kaust.edu.sa Wed Mar 10 03:20:15 2010
From: till.bayer at kaust.edu.sa (Till Bayer)
Date: Wed, 10 Mar 2010 11:20:15 +0300
Subject: [Bioperl-l] Bio::Index::Blast bug
Message-ID: <4B97563F.3020901@kaust.edu.sa>

Hi all!

I tried to use Bio::Index::Blast, but always got the first hit back, no matter what ID I used. The reason is that the Blast indexer seems to use 'BLAST' as a record separator in all cases, except for RPS-BLAST. I think however that for the current versions of blastall and blast+ 'Query=' should be used.
Thus, changing line 222 in Blast.pm from

    $indexpoint = tell($BLAST) - length $_ if ( $prefix eq 'RPS-' );

to

    $indexpoint = tell($BLAST) - length $_;

makes it work for me. However, I have no idea what RPS-BLAST may be, or what different versions of blast output are used, so maybe someone who knows should have a look at that before changing things, and write a cleaner version than the above hack.

Cheers,
Till

--
Till Bayer
4700 King Abdullah University for Science and Technology
Building 2, Room 4231-W16
Thuwal 23955-6900
Saudi Arabia
Phone: +96628082373

From avilella at gmail.com Wed Mar 10 03:55:09 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 10 Mar 2010 08:55:09 +0000
Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores
Message-ID: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com>

Hi all,

I would like to know if anyone knows of a script or method in bioperl to do an unambiguous assembly of fastq sequences, combining the q-scores to give assembled fastq sequences as the output.

By unambiguous I mean something like what abyss would produce with these options:

ABYSS -k$k -b0 -t0 -e0 -c0

but giving assembled fastq sequences with combined q-scores as output instead of simple fasta assembled sequences.
Thanks in advance From sdavis2 at mail.nih.gov Wed Mar 10 05:31:50 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 10 Mar 2010 05:31:50 -0500 Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores In-Reply-To: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> Message-ID: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote: > Hi all, > > I would like to know if anyone knows of a script or method in bioperl > to do an unambiguous assembly of fastq sequences, combining the q-scores to > give assembled fastq sequences as the output. > > By unambiguous I mean something like what abyss would produce with this options: > > ABYSS -k$k -b0 -t0 -e0 -c0 > > but giving assembled fastq sequences with combined q-scores as output > instead of simple > fasta assembled sequences. Hi, Albert. I'm not sure exactly what you want here, but have you looked at the Mosaik aligner? Also, look at samtools pileup; you can probably produce something similar to what you want from it as well. I certainly might have misunderstood the problem, though. Sean From biopython at maubp.freeserve.co.uk Wed Mar 10 05:35:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Mar 2010 10:35:56 +0000 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: <4B97563F.3020901@kaust.edu.sa> References: <4B97563F.3020901@kaust.edu.sa> Message-ID: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: > Hi all! > > I tried to use Bio::Index::Blast, but always got the first hit back, no > matter what ID I used. The reason is that the Blast indexer seems to use > 'BLAST' as a record separator in all cases, except for RPS-BLAST. > I think however that for the current versions of blastall and blast+ > 'Query=' should be used. 
That fits with changes I had to make in Biopython for breaking up the plain text BLAST output into each query. For a while only the RPS-BLAST report omitted the "header" (the BLAST line and the journal references users should cite) between records, but now all the NCBI BLAST tools do this - forcing us to look for the Query= line.

i.e. I can't comment on the BioPerl change itself, but your reasoning about the BLAST output makes sense.

Peter

From avilella at gmail.com Wed Mar 10 05:47:01 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 10 Mar 2010 10:47:01 +0000
Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores
In-Reply-To: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com>
References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com>
Message-ID: <358f4d651003100247k789344a2m2decd7283e658de9@mail.gmail.com>

Hi Sean,

By unambiguous assembly of reads I mean that one would not squash bubbles or trim branches, but simply collapse fully overlapping (embedded) reads by combining the q-scores, or raising the q-scores if you want, and keeping branching graphs separate.

This unambiguous de novo assembly would discard depth information, which is important if you are doing digital gene expression analysis, but would produce a collapsed fastq set of sequences that would be leaner for downstream processing.

I'll have a look at Mosaik. I tried samtools pileup, but it seems a bit overcomplicated to have to map back the reads if what you want to do is just have the assembled reads with fastq scores coming out of the assembler in the first place. That's why I was thinking it would be good if this unambiguous or "dummy" fastq assembly output could fit into a bioperl script or method.
Cheers

On Wed, Mar 10, 2010 at 10:31 AM, Sean Davis wrote:
> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote:
>> Hi all,
>>
>> I would like to know if anyone knows of a script or method in bioperl
>> to do an unambiguous assembly of fastq sequences, combining the q-scores to
>> give assembled fastq sequences as the output.
>>
>> By unambiguous I mean something like what abyss would produce with this options:
>>
>> ABYSS -k$k -b0 -t0 -e0 -c0
>>
>> but giving assembled fastq sequences with combined q-scores as output
>> instead of simple
>> fasta assembled sequences.
>
> Hi, Albert.
>
> I'm not sure exactly what you want here, but have you looked at the
> Mosaik aligner? Also, look at samtools pileup; you can probably
> produce something similar to what you want from it as well.
>
> I certainly might have misunderstood the problem, though.
>
> Sean
>

From adsj at novozymes.com Wed Mar 10 08:46:02 2010
From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=)
Date: Wed, 10 Mar 2010 14:46:02 +0100
Subject: [Bioperl-l] [PATCH] Fix infinite loop in EMBL writer.
Message-ID: <87k4tke1d1.fsf@topper.koldfront.dk>

This fix is an exact duplicate of the fix for bug #2915 - of the Genbank writer, which was fixed in revision 16275.
---
 Bio/SeqIO/embl.pm | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Bio/SeqIO/embl.pm b/Bio/SeqIO/embl.pm
index cfea1b6..de1bf11 100644
--- a/Bio/SeqIO/embl.pm
+++ b/Bio/SeqIO/embl.pm
@@ -1432,7 +1432,7 @@ sub _write_line_EMBL_regex {
     CHUNK: while($line) {
         foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
-            if ($line =~ m/^(.{1,$subl})($pat)(.*)/ ) {
+            if ($line =~ m/^(.{0,$subl})($pat)(.*)/ ) {
                 my $l = $1.$2;
                 $l =~ s/#/ /g # remove word wrap protection char '#'
                     if $pre1 eq "RA ";
@@ -1441,6 +1441,7 @@ sub _write_line_EMBL_regex {
                 # be strict about not padding spaces according to
                 # genbank format
                 $l =~ s/\s+$//;
+                next CHUNK if ($l eq '');
                 push(@lines, $l);
                 next CHUNK;
             }
--
1.6.3.3

--
Adam Sjøgren
adsj at novozymes.com

From cjfields at illinois.edu Wed Mar 10 09:27:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 08:27:59 -0600
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
Message-ID:

On Mar 10, 2010, at 4:35 AM, Peter wrote:

> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>> Hi all!
>>
>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>> matter what ID I used. The reason is that the Blast indexer seems to use
>> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
>> I think however that for the current versions of blastall and blast+
>> 'Query=' should be used.
>
> That fits with changes I had to make in Biopython for breaking
> up the plain text BLAST output into each query. For a while only
> the RPS-BLAST report omitted the "header" (the BLAST line
> and the journal references users should cite) between records,
> but now all the NCBI BLAST tools do this - forcing us to look
> for the Query= line.
>
> i.e.
I can't comment on the BioPerl change itself, but your > reasoning about the BLAST output makes sense. > > Peter One side-effect of this is we will be missing the search algorithm and a few small odds and ends from all but the first report; this trickles down into how we properly deal with HSP coordinates, but we can probably wrangle some magic there to get things working for the most part. This is similar to how XML format is currently dealt with (and another reason this format is the easiest to support, as it doesn't change based on NCBI's whims). Do we have example reports with multiple queries from BLAST+ available? It would be invaluable for the projects; if not I can probably generate a few locally. chris From biopython at maubp.freeserve.co.uk Wed Mar 10 09:40:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Mar 2010 14:40:16 +0000 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> Message-ID: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com> On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote: > On Mar 10, 2010, at 4:35 AM, Peter wrote: > >> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: >>> Hi all! >>> >>> I tried to use Bio::Index::Blast, but always got the first hit back, no >>> matter what ID I used. The reason is that the Blast indexer seems to use >>> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >>> I think however that for the current versions of blastall and blast+ >>> 'Query=' should be used. >> >> That fits with changes I had to make in Biopython for breaking >> up the plain text BLAST output into each query. For a while only >> the RPS-BLAST report omitted the "header" (the BLAST line >> and the journal references users should cite) between records, >> but now all the NCBI BLAST tools do this - forcing us to look >> for the Query= line. >> >> i.e. 
I can't comment on the BioPerl change itself, but your
>> reasoning about the BLAST output makes sense.
>>
>> Peter
>
> One side-effect of this is we will be missing the search
> algorithm and a few small odds and ends from all but
> the first report; this trickles down into how we properly
> deal with HSP coordinates, but we can probably wrangle
> some magic there to get things working for the most part.
> ...

Yeah - I had similar issues with the Biopython plain text BLAST parser. The hack/magic I used was to cache the header text from the first record and then re-insert it on subsequent records. Nasty, but works.

> This is similar to how XML format is currently dealt with
> (and another reason this format is the easiest to support,
> as it doesn't change based on NCBI's whims).

They may have changed a few things here too - watch out.

> Do we have example reports with multiple queries from
> BLAST+ available? It would be invaluable for the projects;
> if not I can probably generate a few locally.

I've got one example in Biopython's unit tests,
http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt

Peter

From cjfields at illinois.edu Wed Mar 10 10:19:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 09:19:42 -0600
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>
Message-ID: <27C91884-E910-4BDF-B777-B90E7B4F9103@illinois.edu>

On Mar 10, 2010, at 8:40 AM, Peter wrote:

> On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote:
>> On Mar 10, 2010, at 4:35 AM, Peter wrote:
>>
>>> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>>>> Hi all!
>>>>
>>>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>>>> matter what ID I used.
The reason is that the Blast indexer seems to use >>>> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >>>> I think however that for the current versions of blastall and blast+ >>>> 'Query=' should be used. >>> >>> That fits with changes I had to make in Biopython for breaking >>> up the plain text BLAST output into each query. For a while only >>> the RPS-BLAST report omitted the "header" (the BLAST line >>> and the journal references users should cite) between records, >>> but now all the NCBI BLAST tools do this - forcing us to look >>> for the Query= line. >>> >>> i.e. I can't comment on the BioPerl change itself, but your >>> reasoning about the BLAST output makes sense. >>> >>> Peter >> >> One side-effect of this is we will be missing the search >> algorithm and a few small odds and ends from all but >> the first report; this trickles down into how we properly >> deal with HSP coordinates, but we can probably wrangle >> some magic there to get things working for the most part. >> ... > > Yeah - I had similar issues with the Biopython plain > text BLAST parser. The hack/magic I used was to > cache the header text from the first record and then > re-insert it on subsequence records. Nasty, but works. Right, but here's the side-effect: unless that data is somehow stored when indexing, it will not be caught if one starts an IO stream at any point past the BLAST header (in other words, all but the first report). We could, in effect, store that as meta information somehow (I think Index may have some meta storage), or just parse it prior to initiating the stream and pass the information into the IO object. >> This is similar to how XML format is currently dealt with >> (and another reason this format is the easiest to support, >> as it doesn't change based on NCBI's whims). > > They may have changed a few things here too - watch out. Ugh. >> Do we have example reports with multiple queries from >> BLAST+ available? 
It would be invaluable for the projects;
>> if not I can probably generate a few locally.
>
> I've got one example in Biopython's unit tests,
> http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt
>
> Peter

Okay, will start up some work to work out tests, etc.

chris

From thomas.sharpton at gmail.com Wed Mar 10 10:30:37 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 10 Mar 2010 07:30:37 -0800
Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser
Message-ID:

Hey everyone,

Since HMMER version 3 went live in the middle of last month, I thought it a good time to update the SearchIO parser I've been working on for some time and submit the tool to the community (finally....). At the moment, the module seems capable of parsing hmmsearch and hmmscan outputs, both with and without the alignment option. Some aspects of functionality have yet to be fleshed out, but this one should be capable of doing most of your day to day procedures (at least it appears to on my end).

I'd love to have people play with it and I'm happy to hear feedback, criticism, development requests and bug reports. That said, this is the first code I've contributed to BioPerl, so please be gentle ;). You can find the bioperl-hmmer3 package in bioperl-dev. I've included a test script as well as sample hmmscan/hmmsearch report files and test data in the bioperl-hmmer3 root directory.

As an aside, BioPerl has been a wonderful resource for me and I'm glad to be giving back, even if only a little. I hope this helps out at least a few of you.

All the best,
Tom

From cjfields at illinois.edu Wed Mar 10 10:53:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 09:53:41 -0600
Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser
In-Reply-To:
References:
Message-ID: <1268236421.20872.21.camel@pyrimidine.igb.uiuc.edu>

Wonderful! Tom, thanks for your hard work!
chris On Wed, 2010-03-10 at 07:30 -0800, Thomas Sharpton wrote: > Hey everyone, > > Since HMMER version 3 went live in the middle of last month, I thought > it a good time to update the SearchIO parser I've been working on for > some time and submit the tool to the community (finally....). At the > moment, the module seems capable of parsing hmmsearch and hmmscan > outputs, both with and without the alignment option. Some aspects of > functionality have yet to be flushed out, but this one should be > capable of doing most of your day to day procedures (at least it > appears to on my end). > > I'd love to have people play with it and I'm happy to hear feedback, > criticism, development requests and bug reports. That said, this is > the first code I've contributed to BioPerl, so please be gentle ;). > You can find the bioperl-hmmer3 package in bioperl-dev. I've included > a test script as well as sample hmmscan/hmmsearch report files and > test data in the bioperl-hmmer3 root directory. > > As an aside, BioPerl has been a wonderful resource for me and I'm glad > to be giving back, even if only a little. I hope this helps out at > least a few of you. > > All the best, > Tom > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From asjo at koldfront.dk Wed Mar 10 12:04:00 2010 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Wed, 10 Mar 2010 18:04:00 +0100 Subject: [Bioperl-l] Fix infinite loop in EMBL writer. In-Reply-To: <87k4tke1d1.fsf@topper.koldfront.dk> ("Adam =?iso-8859-1?Q?Sj?= =?iso-8859-1?Q?=F8gren=22's?= message of "Wed, 10 Mar 2010 14:46:02 +0100") References: <87k4tke1d1.fsf@topper.koldfront.dk> Message-ID: <87wrxkw1kv.fsf@topper.koldfront.dk> On Wed, 10 Mar 2010 14:46:02 +0100, Adam wrote: > This fix is an exact duplicate of the fix for bug #2915 - of > the Genbank writer, which was fixed in revision 16275. 
I have created bug #3025 in bugzilla with the patch (I couldn't remember whether here or there is most appropriate).

Best regards,

Adam

--
"It isn't modern just because it's electric. Country          Adam Sjøgren
music was electric too."                                 asjo at koldfront.dk

From David.Messina at sbc.su.se Wed Mar 10 12:35:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 10 Mar 2010 18:35:52 +0100
Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser
In-Reply-To:
References:
Message-ID:

Thanks so much, Thomas! I expect to be using Hmmer 3 for my own work fairly soon, so I'm looking forward to taking advantage of this.

Dave

From rmb32 at cornell.edu Wed Mar 10 15:13:57 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 10 Mar 2010 12:13:57 -0800
Subject: [Bioperl-l] call for help - BioPerl GSoC wiki page
Message-ID: <4B97FD85.50402@cornell.edu>

Hi all,

BioPerl's Google Summer of Code page in support of the Open Bioinformatics Foundation's application to Google Summer of Code is shaping up, but still needs some polishing. We're coming up on the application deadline, and we need to make a good, polished show of it.

Please put in a little time to look at, edit, polish, and flesh out the BioPerl and OBF wiki pages in support of our application:

BioPerl: http://bioperl.org/wiki/Google_Summer_of_Code
OBF: http://open-bio.org/wiki/Google_Summer_of_Code

One specific item for the BioPerl page: the Bio::Assembly project on that page needs to either be fleshed out or removed.

Thanks for all the hard work from everyone so far (especially Chris!).

It would be *very* good to have some more project ideas and mentor volunteers. So if you haven't already, please consider volunteering to mentor a student. Also, we all know many things that BioPerl needs help with, so if you can think of a good intern project, add it to the page and maybe we can get a GSoC student to work on it.
Rob

From nml5566 at gmail.com Wed Mar 10 17:52:19 2010
From: nml5566 at gmail.com (Nathan Liles)
Date: Wed, 10 Mar 2010 16:52:19 -0600
Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks?
Message-ID: <4B9822A3.2050202@gmail.com>

I'm trying to patch Gbrowse to properly display circular segments. Currently, I'm working on getting the protein glyphs to display properly beyond the end of the track. I noticed when I turn on the protein track, it can sometimes affect another track. Specifically, turning on the protein track can either cause the gene glyphs to disappear or be duplicated. This only happens for features with two subfeatures that appear on the panel at opposite ends.

This seems strange since I can't imagine how one track could affect another. Has anyone noticed this behavior before? Can anybody think of a way that the protein glyph module can affect other glyphs?

Thanks,
Nathan Liles

From me at miguel.weapps.com Thu Mar 11 00:48:17 2010
From: me at miguel.weapps.com (Luis M Rodriguez-R)
Date: Thu, 11 Mar 2010 00:48:17 -0500
Subject: [Bioperl-l] PSI-BLAST uncommon result
Message-ID: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com>

Hello all,

I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception:

MSG: no data for midline ***** No hits found ******
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792

My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail.

Best regards,

Luis M.
Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com


BLASTP 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.


Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.


Reference for composition-based statistics starting in round 2:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.

Query= eff254
         (67 letters)

Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
           10,383,435 sequences; 3,542,477,638 total letters

Searching..................................................done


Results from round 1


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...   127   5e-28

>ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
 pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
 sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin
 hrpA
 gb|ABA39805.1| HrpA [Erwinia pyrifoliae]
 emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation
 pathway-hrp pilin [Erwinia pyrifoliae Ep1/96]
 emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM
 12163]
          Length = 67

 Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust.
 Identities = 67/67 (100%), Positives = 67/67 (100%)

Query: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60
          MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN
Sbjct: 1  MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60

Query: 61 AAKAIQF 67
          AAKAIQF
Sbjct: 61 AAKAIQF 67


Searching..................................................done



 ***** No hits found ******

  Database: All non-redundant GenBank CDS
  translations+PDB+SwissProt+PIR+PRF excluding environmental samples
  from WGS projects
    Posted date: Jan 24, 2010 4:41 AM
  Number of letters in database: 863,709,833
  Number of sequences in database: 2,562,282

  Database: /storage1/databases/ncbi-blast/nr.01
    Posted date: Jan 24, 2010 4:41 AM
  Number of letters in database: 936,189,781
  Number of sequences in database: 2,674,439

  Database: /storage1/databases/ncbi-blast/nr.02
    Posted date: Jan 24, 2010 4:41 AM
  Number of letters in database: 974,890,473
  Number of sequences in database: 2,826,395

  Database: /storage1/databases/ncbi-blast/nr.03
    Posted date: Jan 24, 2010 4:41 AM
  Number of letters in database: 767,687,551
  Number of sequences in database: 2,320,319

Lambda     K      H
   0.297    0.107    0.256

Lambda     K      H
   0.267   0.0344    0.140


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 480,706,425
Number of Sequences: 10383435
Number of extensions: 8598061
Number of successful extensions: 47335
Number of sequences better than 1.0e-25: 1
Number of HSP's better than 0.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 47333
Number of HSP's gapped (non-prelim): 2
length of query: 67
length of database: 3,542,477,638
effective HSP length: 39
effective length of query: 28
effective length of database: 3,137,523,673
effective search space: 87850662844
effective search space used: 87850662844
T: 11
A: 40
X1: 16 ( 6.9 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 43 (21.7 bits)
S2: 298 (119.7 bits)

From jason at bioperl.org Thu Mar 11 03:13:24 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 11 Mar 2010 00:13:24 -0800
Subject: [Bioperl-l] bootstrap values in cladogram
In-Reply-To:
References:
Message-ID: <4B98A624.7020102@bioperl.org>

Not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function.

See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to bootstrap slots, basically by using
-internal_node_id => 'bootstrap'
in the TreeIO initialization.

You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too.

-jason

Alexander Donath wrote, On 3/9/10 10:00 AM:
> Hi,
>
> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and
> bootstrap values and try to plot the tree as cladogram. But somehow I
> cannot print the bootstrap values.
>
> Short example:
>
> test.nwk
> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);
>
>
>
> [..]
> use Bio::TreeIO;
> use Bio::Tree::Draw::Cladogram;
> [..]
> my $trees = Bio::TreeIO->new( -file => "test.nwk",
> -format => 'newick');
> my $tree = $trees->next_tree();
> [..]
> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1,
> -tree => $tree,
> -compact => 0);
>
> $out->print(-file => "test.eps");
>
>
> I already tried it by copying the bootstrap values into the ids of the
> internal nodes - nothing. Any suggestions?
>
>
> Thanks,
> Alex
>
> ---
> By the time you've read this, you've already read it!

From cjfields at illinois.edu Thu Mar 11 09:27:33 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 08:27:33 -0600
Subject: [Bioperl-l] PSI-BLAST uncommon result
In-Reply-To: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com>
References: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com>
Message-ID: <70AF1FA5-FD88-48E3-A672-F72B9D3E1B3B@illinois.edu>

Luis,

The best way to handle this is to attach the problematic report (not append it) to a bug report on bugzilla. This ensures we aren't running into artifacts generated via the email client, etc.

chris

On Mar 10, 2010, at 11:48 PM, Luis M Rodriguez-R wrote:

> Hello all,
>
> I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception:
> MSG: no data for midline ***** No hits found ******
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357
> STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792
> My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail.
>
> Best regards,
>
> Luis M. Rodriguez-R
> [...]

From jason at bioperl.org Thu Mar 11 10:38:50 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 11 Mar 2010 07:38:50 -0800
Subject: [Bioperl-l] bootstrap values in cladogram
In-Reply-To:
References: <4B98A624.7020102@bioperl.org>
Message-ID: <4B990E8A.5060704@bioperl.org>

Yeah, sorry, then I don't know what the problem is. The usual "are you using the latest version" question applies, but it sounds like something else is wrong with this module. I don't have any time to try out any code, sorry, but maybe someone else can step in to give a hand.

-jason

Alexander Donath wrote, On 3/11/10 1:05 AM:
> I tried both, with -internal_node_id => 'bootstrap' and without. Nothing.
>
> Nevertheless, iterating through the tree and printing $node->bootstrap
> worked in both cases and gave me the correct bootstrap values of the
> inner nodes.
>
> I also called move_id_to_bootstrap on the tree. But this resulted in
> an error:
>
> Can't locate object method "move_id_to_bootstrap" via package
> "Bio::Tree::Tree".
> Even though it's inherited from the interface, as far as I can tell.
> > > alex > > > On Thu, 11 Mar 2010, Jason Stajich wrote: > >> not sure if the cladogram is printing bootstraps from the internal id >> or the bootstrap function. >> >> See the example code here http://bioperl.org/wiki/HOWTO:Trees that >> shows how to automatically convert internal IDs to boostrap slots >> basically by using >> -internal_node_id => 'bootstrap' >> in the TreeIO initialization. >> >> You may want to iterate through the tree and print $node->bootstrap >> where you think it should be so you can verify that it is working too. >> >> -jason >> >> Alexander Donath wrote, On 3/9/10 10:00 AM: >>> Hi, >>> >>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths >>> and bootstrap values and try to plot the tree as cladogram. But >>> somehow I cannot print the bootstrap values. >>> >>> Short example: >>> >>> test.nwk >>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >>> >>> >>> >>> >>> [..] >>> use Bio::TreeIO; >>> use Bio::Tree::Draw::Cladogram; >>> [..] >>> my $trees = Bio::TreeIO->new( -file => "test.nwk", >>> -format => 'newick'); >>> my $tree = $trees->next_tree(); >>> [..] >>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >>> -tree => $tree, >>> -compact => 0); >>> >>> $out->print(-file => "test.eps"); >>> >>> >>> I already tried it by copying the bootstrap values into the ids of the >>> internal nodes - nothing. Any suggestions? >>> >>> >>> Thanks, >>> Alex >>> >>> --- >>> By the time you've read this, you've already read it! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > --- > Alexander Donath > Professur f?r Bioinformatik > Institut f?r Informatik > Universit?t Leipzig > H?rtelstr. 16-18 > D-04107 Leipzig, Germany > > phone: +49 (0)341 97-16702 > fax: +49 (0)341 97-16679 > > By the time you've read this, you've already read it! 
From jason at bioperl.org Thu Mar 11 10:40:59 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 11 Mar 2010 07:40:59 -0800
Subject: [Bioperl-l] distances between leaf nodes
In-Reply-To:
References:
Message-ID: <4B990F0B.8010100@bioperl.org>

You should only have TWO nodes in the array not all the leaves.

=head2 distance

 Title   : distance
 Usage   : distance(-nodes => \@nodes )
 Function: returns the distance between TWO given nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test
           or ($node1, $node2)

=cut

Jeffrey Detras wrote, On 3/4/10 10:17 PM:
> Hi,
>
> I am new at using the Bio::TreeIO module specifically using the newick
> format for a phylogenetic analysis. The sample_tree attached is
> Newick-formatted tree. My objective is to get all the distances between all
> the leaf nodes. I copied examples of the code from
> http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my
> knowledge) so that I understand how to assign the right array value for the
> nodes/leaves. The message would say must provide 2 root nodes.
>
> Here is what I have right now:
>
> #!/usr/bin/perl -w
> use strict;
>
> my $treefile = 'sample_tree';
> use Bio::TreeIO;
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                               -file => $treefile);
>
> while (my $tree = $treeio->next_tree) {
>     my @leaves = $tree->get_leaf_nodes;
>     for (my $dist = $tree->distance(-nodes => \@leaves)){
>         print "Distance between trees is $dist\n";
>     }
> }
>
> Thanks,
> Jeff
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Thu Mar 11 11:11:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Thu, 11 Mar 2010 11:11:04 -0500
Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks?
In-Reply-To: <4B9822A3.2050202@gmail.com> References: <4B9822A3.2050202@gmail.com> Message-ID: <4536f7701003110811s79c30638x100ae521bce1084a@mail.gmail.com> Hi Nathan, Well, it certainly shouldn't! The tracks are supposed to be calculated independently without reusing anything. Debugging should be fun though. Does it matter if you change the adaptor (for instance, if you are using the memory adaptor for Bio::DB::SeqFeature::Store, try putting it in a mysql database (or vice versa) to help narrow down where the bug is. Scott On Wed, Mar 10, 2010 at 5:52 PM, Nathan Liles wrote: > I'm trying to patch Gbrowse to properly display circular segments. > Currently, I'm working on getting the protein glyphs to display properly > beyond the end of the track. > > I noticed when I turn on the protein track, it can sometimes affect another > track. Specifically, turning on the protein track can either cause the gene > glyphs to disappear or be duplicated. > This only happens for features with two subfeatures that appear on the panel > at opposite ends. > > This seems strange since I can't imagine how one track could affect another. > Has anyone noticed this behavior before? > Can anybody think of a way that the protein glyph module can affect other > glyphs? > > Thanks, > Nathan Liles > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu Mar 11 11:21:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 10:21:02 -0600 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B990E8A.5060704@bioperl.org> References: <4B98A624.7020102@bioperl.org> <4B990E8A.5060704@bioperl.org> Message-ID: <2BBC0220-4233-4EB7-81A8-FA8342ED9714@illinois.edu> Alex, The best thing to do is to file this as a bug so we don't lose track of it, including demonstration code. chris On Mar 11, 2010, at 9:38 AM, Jason Stajich wrote: > Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. > > I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. > -jason > > Alexander Donath wrote, On 3/11/10 1:05 AM: >> I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. >> >> Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. >> >> I also called move_id_to_bootstrap on the tree. But this resulted in an error: >> >> Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". >> Even though it's inherited from the interface, as far as I can tell. >> >> >> alex >> >> >> On Thu, 11 Mar 2010, Jason Stajich wrote: >> >>> not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function. >>> >>> See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to boostrap slots basically by using >>> -internal_node_id => 'bootstrap' >>> in the TreeIO initialization. 
>>> >>> You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too. >>> >>> -jason >>> >>> Alexander Donath wrote, On 3/9/10 10:00 AM: >>>> Hi, >>>> >>>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and try to plot the tree as cladogram. But somehow I cannot print the bootstrap values. >>>> >>>> Short example: >>>> >>>> test.nwk >>>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >>>> >>>> >>>> >>>> [..] >>>> use Bio::TreeIO; >>>> use Bio::Tree::Draw::Cladogram; >>>> [..] >>>> my $trees = Bio::TreeIO->new( -file => "test.nwk", >>>> -format => 'newick'); >>>> my $tree = $trees->next_tree(); >>>> [..] >>>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >>>> -tree => $tree, >>>> -compact => 0); >>>> >>>> $out->print(-file => "test.eps"); >>>> >>>> >>>> I already tried it by copying the bootstrap values into the ids of the >>>> internal nodes - nothing. Any suggestions? >>>> >>>> >>>> Thanks, >>>> Alex >>>> >>>> --- >>>> By the time you've read this, you've already read it! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> --- >> Alexander Donath >> Professur f?r Bioinformatik >> Institut f?r Informatik >> Universit?t Leipzig >> H?rtelstr. 16-18 >> D-04107 Leipzig, Germany >> >> phone: +49 (0)341 97-16702 >> fax: +49 (0)341 97-16679 >> >> By the time you've read this, you've already read it! 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Mon Mar 8 16:06:11 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 08 Mar 2010 16:06:11 -0500 Subject: [Bioperl-l] Next Gen Formats Message-ID: <4B9566C3.6000007@umdnj.edu> Does Bioperl support color-space sequences, or FASTA formatted quality value files? ABI's Solid platform generates a number of files, two of which are fairly important (at the moment): 1) .csfasta Color-space sequences in FASTA format 2) .qual Quality values of each color call, also in FASTA format. I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction? Ryan -------------- next part -------------- A non-text attachment was scrubbed... Name: golharam.vcf Type: text/x-vcard Size: 379 bytes Desc: not available URL: From alex at bioinf.uni-leipzig.de Thu Mar 11 04:05:13 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Thu, 11 Mar 2010 10:05:13 +0100 (CET) Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B98A624.7020102@bioperl.org> References: <4B98A624.7020102@bioperl.org> Message-ID: I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. I also called move_id_to_bootstrap on the tree. But this resulted in an error: Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". Even though it's inherited from the interface, as far as I can tell. alex On Thu, 11 Mar 2010, Jason Stajich wrote: > not sure if the cladogram is printing bootstraps from the internal id or the > bootstrap function. 
>
> See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how
> to automatically convert internal IDs to boostrap slots basically by using
>   -internal_node_id => 'bootstrap'
> in the TreeIO initialization.
>
> You may want to iterate through the tree and print $node->bootstrap where you
> think it should be so you can verify that it is working too.
>
> -jason
>
> Alexander Donath wrote, On 3/9/10 10:00 AM:
>> Hi,
>>
>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and
>> bootstrap values and try to plot the tree as cladogram. But somehow I
>> cannot print the bootstrap values.
>>
>> Short example:
>>
>> test.nwk
>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);
>>
>>
>>
>> [..]
>> use Bio::TreeIO;
>> use Bio::Tree::Draw::Cladogram;
>> [..]
>> my $trees = Bio::TreeIO->new( -file => "test.nwk",
>>                               -format => 'newick');
>> my $tree = $trees->next_tree();
>> [..]
>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1,
>>                                            -tree => $tree,
>>                                            -compact => 0);
>>
>> $out->print(-file => "test.eps");
>>
>>
>> I already tried it by copying the bootstrap values into the ids of the
>> internal nodes - nothing. Any suggestions?
>>
>>
>> Thanks,
>> Alex
>>
>> ---
>> By the time you've read this, you've already read it!
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

---
Alexander Donath
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig, Germany

phone: +49 (0)341 97-16702
fax: +49 (0)341 97-16679

By the time you've read this, you've already read it!
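
The check discussed in this message (read the tree with -internal_node_id => 'bootstrap', then print $node->bootstrap for the inner nodes) might look like the sketch below. It is not code from the thread: the helper name internal_support and the in-memory filehandle are assumptions made for illustration, and BioPerl must be installed. It only reports what the parser stored on each inner node; it does not explain why Bio::Tree::Draw::Cladogram leaves the values out of the drawing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Hypothetical helper: return [id, bootstrap] pairs for every non-leaf node
# of the first tree read from the given filehandle.
sub internal_support {
    my ($fh) = @_;
    my $treeio = Bio::TreeIO->new(
        -fh               => $fh,
        -format           => 'newick',
        -internal_node_id => 'bootstrap',    # option suggested in this thread
    );
    my $tree = $treeio->next_tree or die "no tree could be read\n";

    my @rows;
    for my $node ( $tree->get_nodes ) {
        next if $node->is_Leaf;              # only inner nodes (and the root)
        push @rows, [ $node->id, $node->bootstrap ];
    }
    return @rows;
}

# The example tree from the original post, read from an in-memory handle.
my $newick =
  "((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);\n";
open my $fh, '<', \$newick or die "cannot open in-memory handle: $!";

for my $row ( internal_support($fh) ) {
    my ( $id, $support ) = @$row;
    printf "inner node id=%s bootstrap=%s\n",
      defined $id      ? $id      : '(none)',
      defined $support ? $support : '(none)';
}
```

If the inner node reports its support value here but the EPS still shows nothing, the tree object is probably fine and the problem sits in the drawing module, which is worth stating in a bug report.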
From Alexander.Kanapin at oicr.on.ca Thu Mar 11 10:56:41 2010
From: Alexander.Kanapin at oicr.on.ca (Alexander Kanapin)
Date: Thu, 11 Mar 2010 10:56:41 -0500
Subject: [Bioperl-l] GFF to GTF converter
Message-ID:

Hi BioPerl gurus,

Does anybody know a reliable GFF to GTF converter which can generate files
acceptable to cufflinks?

We attempted to convert Drosophila and worm genome GFFs (taken from the
FlyBase and WormBase FTP sites) to GTF with Bio::FeatureIO:

    # read from a file
    my $in = Bio::FeatureIO->new(-file => $infile, -format => 'GFF');

    # write out features
    my $out = Bio::FeatureIO->new(-file    => ">$outfile",
                                  -format  => 'GFF',
                                  -version => 2.5);

However, we discovered that the resulting file is not compliant with the GTF
format specification as described here: http://mblab.wustl.edu/GTF22.html

Although this chunk of code produces CDS and exon entries in the output
file, it does not output start codon/stop codon annotations. We also think
it misinterprets annotations, so that one sees UTR entries annotated as
CDSs or exons.

Many thanks for ideas/notes.

Alex

--
Alexander Kanapin, PhD
Scientific Associate

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
Tel: 647-260-7993
Toll-free: 1-866-678-6427
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged
information for the sole use of the intended recipient. Any review or
distribution by anyone other than the person for whom it was originally
intended is strictly prohibited. If you have received this message in error,
please contact the sender and delete all copies. Opinions, conclusions or
other information contained in this message may not be that of the
organization.
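
For input that is GFF3 with a plain gene/mRNA/exon/CDS hierarchy linked through ID and Parent attributes, the core of such a conversion can be sketched in plain Perl without Bio::FeatureIO. This is a minimal sketch under stated assumptions, not a converter validated against cufflinks: the function names gff3_to_gtf and parse_attributes and the demo records are invented for illustration, only exon and CDS lines are written, and start_codon/stop_codon lines are not generated. Files that use other feature types for transcripts (ncRNA, tRNA and so on) or GFF2-style attributes would need extra handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the ninth GFF3 column ("ID=x;Parent=y") into a hash.
sub parse_attributes {
    my ($column) = @_;
    my %attr;
    for my $pair ( split /;/, $column ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        next unless defined $key && length $key && defined $value;
        $attr{$key} = $value;
    }
    return %attr;
}

# Convert GFF3 lines to GTF-style exon/CDS lines (hypothetical helper).
sub gff3_to_gtf {
    my (@lines) = @_;
    my ( %gene_of, @parts );

    # First pass: remember transcript -> gene links, collect exon/CDS rows.
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;
        my @col = split /\t/, $line;
        next unless @col == 9;
        my %attr = parse_attributes( $col[8] );
        if ( $col[2] eq 'mRNA' && defined $attr{ID} ) {
            $gene_of{ $attr{ID} } =
              defined $attr{Parent} ? $attr{Parent} : $attr{ID};
        }
        elsif ( ( $col[2] eq 'exon' || $col[2] eq 'CDS' )
            && defined $attr{Parent} )
        {
            push @parts, [ \@col, $attr{Parent} ];
        }
    }

    # Second pass: one GTF line per (feature, parent transcript) pair.
    my @gtf;
    for my $part (@parts) {
        my ( $col, $parents ) = @$part;
        for my $transcript ( split /,/, $parents ) {
            my $gene =
              exists $gene_of{$transcript} ? $gene_of{$transcript} : $transcript;
            push @gtf,
              join( "\t",
                @{$col}[ 0 .. 7 ],
                qq{gene_id "$gene"; transcript_id "$transcript";} );
        }
    }
    return @gtf;
}

# Invented demo records: one gene, one transcript, one exon, one CDS.
my @demo = (
    "##gff-version 3",
    join( "\t", qw(chr1 demo gene 1 100 . + .),  "ID=g1" ),
    join( "\t", qw(chr1 demo mRNA 1 100 . + .),  "ID=t1;Parent=g1" ),
    join( "\t", qw(chr1 demo exon 1 100 . + .),  "ID=e1;Parent=t1" ),
    join( "\t", qw(chr1 demo CDS 10 90 . + 0),   "ID=c1;Parent=t1" ),
);
print "$_\n" for gff3_to_gtf(@demo);
```

Collecting the exon and CDS rows first and resolving gene IDs afterwards means the result does not depend on whether the mRNA line appears before or after its children in the file.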
From cjfields at illinois.edu Thu Mar 11 12:27:35 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 11:27:35 -0600
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9566C3.6000007@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu>
Message-ID: <7D743CA2-80A1-42E3-81D2-03B7CD01FC69@illinois.edu>

Not that I know of, though we are certainly receptive to anyone wanting to work this into the current code.

chris

On Mar 8, 2010, at 3:06 PM, Ryan Golhar wrote:

> Does Bioperl support color-space sequences, or FASTA formatted quality value files?
>
> ABI's Solid platform generates a number of files, two of which are fairly important (at the moment):
>
> 1) .csfasta
>
> Color-space sequences in FASTA format
>
> 2) .qual
>
> Quality values of each color call, also in FASTA format.
>
> I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction?
>
> Ryan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Thu Mar 11 12:35:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Mar 2010 17:35:32 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9566C3.6000007@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu>
Message-ID: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com>

On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote:
> Does Bioperl support color-space sequences, or FASTA formatted quality value
> files?
>
> ABI's Solid platform generates a number of files, two of which are fairly
> important (at the moment):
>
> 1) .csfasta
>
> Color-space sequences in FASTA format
>
> 2) .qual
>
> Quality values of each color call, also in FASTA format.

You mean the QUAL format which was originally introduced by PHRED.
Try "qual" as the format name in SeqIO,
http://bioperl.org/wiki/HOWTO:SeqIO#Formats

> I didn't see (at quick glance) support for this in Bioperl, but maybe
> someone can point me in the right direction?

I expect that (like in Biopython) you can treat color space FASTA + QUAL
just like sequence space files, provided you are happy to interpret the
color space strings yourself.

Are you hoping to get BioPerl to convert the color space data into
sequence space data for you?

Peter

From cjfields at illinois.edu Thu Mar 11 13:02:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 12:02:43 -0600
Subject: [Bioperl-l] GFF to GTF converter
In-Reply-To:
References:
Message-ID: <8CB58FD4-633F-4711-A2F4-23D00AEB6FB8@illinois.edu>

On Mar 11, 2010, at 9:56 AM, Alexander Kanapin wrote:

> Hi BioPerl gurus,
>
> Does anybody knows a reliable GFF to GTF converter which can generate files acceptable by cufflinks ?
>
> We attempted to convert a drosophila and worm genome GFFs (taken from Flybase and Wormbase ftp) to GTF with Bio::FeatureIO
>
> #read from a file
> my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF');
>
> #write out features
> my $out = Bio::FeatureIO->new(-file => ">$outfile" ,
>                               -format => 'GFF' ,
>                               -version => 2.5);
>
> However, we discovered that the resulting file is not compliant with GTF format specifications as they are described here: http://mblab.wustl.edu/GTF22.html

Just so this is clear, even though the FeatureIO docs currently state (and I quote):

"[Bio::FeatureIO] is the officially sanctioned way of getting at the format objects, which most people should use."

it is nowhere near complete, so I have removed said quote from main trunk and replaced it with a very explicit caveat about its current state, i.e. highly experimental and not currently suggested for production use.
It's basically half-baked right now; I am in the midst of refactoring Bio::FeatureIO to try getting it up to speed and to add in flexibility when parsing this data (I'm actually working on it right now), but it's early days on that and may take a bit. Do realize that, even with a refactored FeatureIO, this is one of the more significant problems with GTF, e.g. there are too many definitions of what constitutes GTF or GFF2, so no clear path on how to go about this. At this point most users end up writing up their own parsers, unfortunately. > Although, this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations. > Also, we think it misinterprets annotations, so that one do see UTR entries annotated as CDS' or exons. The start/stop codons can normally be inferred from the CDS/UTRs and exons if they are provided, but again this is one of those issues where there isn't a lot of consistency with the data across various data sources (something addressed at the recent GMOD meeting). What is the source of your GFF? > Many thanks for ideas/notes. > > Alex > > -- > Alexander Kanapin, PhD > Scientific Associate > > Ontario Institute for Cancer Research > MaRS Centre, South Tower > 101 College Street, Suite 800 > Toronto, Ontario, Canada M5G 0A3 > Tel: 647-260-7993 > Toll-free: 1-866-678-6427 > www.oicr.on.ca > This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. 
chris

From jessica.sun at gmail.com Thu Mar 11 14:38:21 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Thu, 11 Mar 2010 14:38:21 -0500
Subject: [Bioperl-l] Bio-SCF from CPAN == error installation
Message-ID: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com>

I downloaded the Bio-SCF module from CPAN, and while trying to install it I
got the following message. Can someone help? Thanks much in advance.

Note (probably harmless): No library found for -lstaden-read
Writing Makefile for Bio::SCF

How can I obtain the missing library?

--
Jessica Jingping Sun

From cjfields at illinois.edu Thu Mar 11 14:49:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 13:49:51 -0600
Subject: [Bioperl-l] Bio-SCF from CPAN == error installation
In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com>
References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com>
Message-ID: <62CF899F-7C31-49F0-8F5E-C99B2179F3A5@illinois.edu>

Did you read the documentation for Bio-SCF?

http://cpansearch.perl.org/src/LDS/Bio-SCF-1.03/INSTALL

chris

On Mar 11, 2010, at 1:38 PM, Jessica Sun wrote:

> I downloaded module
> Bio-SCF from CPAN.
> And I am trying to install it when I got the following error. Can
> someone help?
Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Mar 11 15:00:58 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 15:00:58 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4536f7701003111200y7d194b3cp2aabb558dcbea5ca@mail.gmail.com> Hello Jessica, You need the Staden io-lib: http://staden.sourceforge.net/ It looks like 1.12.2 is the most recent release. Scott On Thu, Mar 11, 2010 at 2:38 PM, Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From rmb32 at cornell.edu Thu Mar 11 15:02:28 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 11 Mar 2010 12:02:28 -0800 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4B994C54.50501@cornell.edu> Hello Jessica, For Bio-SCF, you have to have the staden package installed. See the INSTALL notes included in the Bio-SCF distribution. The easiest way to view the INSTALL notes for a perl module's distribution: - go to http://search.cpan.org/ - search for 'Bio::SCF' - click the link to the Bio-SCF-1.03 distribution you see in the search results - the page linked here describes the installation package that Bio::SCF comes in. - On that page, you will see a link to the INSTALL notes for it. This is a good thing to know how to do when you have problems with other perl modules as well. But yes, as Chris said, those installation notes direct you to install the staden io-lib libraries from staden.sourceforge.net. Rob Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > From jessica.sun at gmail.com Thu Mar 11 15:49:49 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Thu, 11 Mar 2010 15:49:49 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <4B994C54.50501@cornell.edu> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> Message-ID: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Thanks, I got it resolve. 
Does anyone know how to add a scale to the BLAST hit image drawn through
Bio::Graphics? I mean the rectangles should be of different widths rather
than all the same as in the example shown here:

http://www.bioperl.org/wiki/HOWTO:Graphics

Thanks,

On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote:

> Hello Jessica,
>
> For Bio-SCF, you have to have the staden package installed. See the
> INSTALL notes included in the Bio-SCF distribution.
>
> The easiest way to view the INSTALL notes for a perl module's distribution:
> - go to http://search.cpan.org/
> - search for 'Bio::SCF'
> - click the link to the Bio-SCF-1.03 distribution you see in the search
> results
> - the page linked here describes the installation package that Bio::SCF
> comes in.
> - On that page, you will see a link to the INSTALL notes for it.
>
> This is a good thing to know how to do when you have problems with other
> perl modules as well.
>
>
> But yes, as Chris said, those installation notes direct you to install the
> staden io-lib libraries from staden.sourceforge.net.
>
> Rob
>
> Jessica Sun wrote:
>
>> I downloaded module
>> Bio-SCF from CPAN.
>> And I am trying to install it when I got the following error. Can
>> someone help?
Thanks much in advance >> Note (probably harmless): No library found for -lstaden-read >> Writing Makefile for Bio::SCF >> >> how to obtain the missing library >> >> >> * >> >> >> >> > -- Jessica Jingping Sun From scott at scottcain.net Thu Mar 11 16:33:47 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 16:33:47 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Message-ID: <4536f7701003111333q2105c71ftdab0c0b71372ba9f@mail.gmail.com> Hello Jessica, A few things: * It would be better to start a new thread to ask an unrelated question, since people may see the subject of this thread and ignore it if they don't know the answer to the original question. * Can you please try to ask your question again, with more details? Like what have you done already, what was the result, and what would you like for it to look like. If you want it to look like something that is on the wiki, link to that something. The Howto page you linked to has lots of pictures on it. Scott On Thu, Mar 11, 2010 at 3:49 PM, Jessica Sun wrote: > Thanks, I got it resolve. > > Do any one knows how to add a scale of the blast hit image through > Bio:Graphics, I mean the rectangle should be difference width rather than > the same at the example. shown here > > http://www.bioperl.org/wiki/HOWTO:Graphics > > > > Thanks, > > > > On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote: > >> Hello Jessica, >> >> For Bio-SCF, you have to have the staden package installed. ?See the >> INSTALL notes included in the Bio-SCF distribution. 
>> >> The easiest way to view the INSTALL notes for a perl module's distribution: >> ?- go to http://search.cpan.org/ >> ?- search for 'Bio::SCF' >> ?- click the link to the Bio-SCF-1.03 distribution you see in the search >> results >> ?- the page linked here describes the installation package that Bio::SCF >> comes in. >> ?- On that page, you will see a link to the INSTALL notes for it. >> >> This is a good thing to know how to do when you have problems with other >> perl modules as well. >> >> >> But yes, as Chris said, those installation notes direct you to install the >> staden io-lib libraries from staden.sourceforge.net. >> >> Rob >> >> Jessica Sun wrote: >> >>> *I downloaded module >>> >>> *>* > Bio-SCF from CPAN. >>> *>* > And I am trying to install it when I got the following error. Can >>> *>* someone help? Thanks much in advance >>> Note (probably harmless): No library found for -lstaden-read >>> Writing Makefile for Bio::SCF >>> >>> how to obtain the missing library >>> >>> >>> * >>> >>> >>> >>> >> > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From golharam at umdnj.edu Thu Mar 11 21:19:37 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 11 Mar 2010 21:19:37 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> Message-ID: <4B99A4B9.1070901@umdnj.edu> Not convert the sequences, just read the sequence file and allow me to process each one individually, sort of like: $seqio = new Bio::Seq(...) 
while ($seq = $seqio->next_seq) { ... } Peter wrote: > On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >> Does Bioperl support color-space sequences, or FASTA formatted quality value >> files? >> >> ABI's Solid platform generates a number of files, two of which are fairly >> important (at the moment): >> >> 1) .csfasta >> >> Color-space sequences in FASTA format >> >> 2) .qual >> >> Quality values of each color call, also in FASTA format. > > You mean the QUAL format which was originally introduced by PHRED. > Try "qual" as the format name in SeqIO, > http://bioperl.org/wiki/HOWTO:SeqIO#Formats > >> I didn't see (at quick glance) support for this in Bioperl, but maybe >> someone can point me in the right direction? > > I expect that (like in Biopython) you can treat color space FASTA + QUAL > just like sequence space files, provided you are happy to interpret the > color space strings yourself. > > Are you hoping to get BioPerl to convert the color space data into > sequence space data for you? > > Peter > From cjfields at illinois.edu Thu Mar 11 22:35:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 21:35:50 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B99A4B9.1070901@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> Message-ID: Ryan, We would have to see example files to get an idea of how feasible it is. You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual stream, and interleave the two somehow. However, BioPerl qual scores are PHRED-based by default, and I'm not sure how color-space data would work within that schematic. chris On Mar 11, 2010, at 8:19 PM, Ryan Golhar wrote: > Not convert the sequences, just read the sequence file and allow me to > process each one individually, sort of like: > > $seqio = new Bio::Seq(...) > while ($seq = $seqio->next_seq) { > ... 
> } > > Peter wrote: >> On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >>> Does Bioperl support color-space sequences, or FASTA formatted quality value >>> files? >>> >>> ABI's Solid platform generates a number of files, two of which are fairly >>> important (at the moment): >>> >>> 1) .csfasta >>> >>> Color-space sequences in FASTA format >>> >>> 2) .qual >>> >>> Quality values of each color call, also in FASTA format. >> You mean the QUAL format which was originally introduced by PHRED. >> Try "qual" as the format name in SeqIO, >> http://bioperl.org/wiki/HOWTO:SeqIO#Formats >>> I didn't see (at quick glance) support for this in Bioperl, but maybe >>> someone can point me in the right direction? >> I expect that (like in Biopython) you can treat color space FASTA + QUAL >> just like sequence space files, provided you are happy to interpret the >> color space strings yourself. >> Are you hoping to get BioPerl to convert the color space data into >> sequence space data for you? >> Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From avilella at gmail.com Fri Mar 12 02:28:20 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 12 Mar 2010 07:28:20 +0000 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A3969F1.8080002@sendu.me.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <4A3969F1.8080002@sendu.me.uk> Message-ID: <358f4d651003112328g2864ef1as7b8c44ce7bb77c82@mail.gmail.com> > I think not. Well, at least SeqFeature::Store doesn't scale. Try storing > millions of features in a database and watch it crawl to complete > unusability. I can't imagine a db scaling to holding hundreds of TB of data > either. I'm also not sure what the benefit is. 
There are already high-speed
> ways of indexing your fastq or bam files.

Hi Sendu,

What are the available options to have a quick indexing of fastq files that can be integrated into bioperl? Bio::Index::fastq can be painfully slow for the latest Illumina runs...

Cheers,

Albert.

From biopython at maubp.freeserve.co.uk Fri Mar 12 05:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Mar 2010 10:06:46 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To:
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu>
Message-ID: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com>

On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote:
> Ryan,
>
> We would have to see example files to get an idea of how feasible it is.
> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual
> stream, and interleave the two somehow. However, BioPerl qual
> scores are PHRED-based by default, and I'm not sure how color-space
> data would work within that schematic.
>
> chris

Chris,

I am under the (possibly mistaken) assumption that PHRED scores are used for SOLiD color space QUAL files - the key issue is each score corresponds to the color call in the color sequence.

Ignoring color-space for a moment, are there BioPerl examples of iterating over a pair of sequence-space FASTA and QUAL files? i.e. What you'd get if you had a FASTQ file to iterate over.
[I guess Ryan could just merge the color-space FASTA and QUAL into a color-space FASTQ file and iterate over that] Peter From cjfields at illinois.edu Fri Mar 12 08:04:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 07:04:53 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> Message-ID: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> On Mar 12, 2010, at 4:06 AM, Peter wrote: > On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >> Ryan, >> >> We would have to see example files to get an idea of how feasible it is. >> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >> stream, and interleave the two somehow. However, BioPerl qual >> scores are PHRED-based by default, and I'm not sure how color-space >> data would work within that schematic. >> >> chris > > Chris, > > I am under the (possibly mistaken) assumption that PHRED scores > are used for SOLiD color space QUAL files - the key issue is each > score corresponds to the color call in the color sequence. > > Ignoring color-space for a moment, are there BioPerl examples > of iterating over a pair of sequence-space FASTA and QUAL files? > i.e. What you'd get if you had a FASTQ file to iterate over. > > [I guess Ryan could just merge the color-space FASTA and > QUAL into a color-space FASTQ file and iterate over that] > > Peter If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things. Iterating over pairs is something that has popped up before. 
For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:

--------------------------------
my $in = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');

while (my $seq = $in->next_seq) {
    $out1->write_fasta($seq);   # sequence goes to foo.fasta
    $out2->write_qual($seq);    # quality scores go to foo.qual
}
--------------------------------

Note that all use the 'fastq' format instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise).

I'm assuming for input it would be something like:

--------------------------------
my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual');
my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq');

# 'qual' parser joins the two streams
while (my $seq = $in2->next_seq($in1)) {
    $out->write_seq($seq);
}
--------------------------------

chris

From biopython at maubp.freeserve.co.uk Fri Mar 12 08:26:39 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Mar 2010 13:26:39 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9A3D14.3010208@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> <4B9A3D14.3010208@umdnj.edu>
Message-ID: <320fb6e01003120526x7c0c3dddjb4e1422a41968894@mail.gmail.com>

On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote:
>
> Here is an example of a color-space sequence:
>
> In one file (something.csfasta):
>
>>1_30_226_F3
> T210320010.200.03.0110320320220212200122200.2220200
>>1_30_252_F3
> T322220212.133.00.2202322132022202221002011.0011020
>
> The '.'
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter From golharam at umdnj.edu Fri Mar 12 08:43:01 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 13:43:01 +0000 Subject: [Bioperl-l] Next Gen Formats Message-ID: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Direct from sequencing machine ------Original Message------ From: Peter Sender: p.j.a.cock at googlemail.com To: golharam at umdnj.edu Cc: Chris Fields Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Next Gen Formats Sent: Mar 12, 2010 8:26 AM On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: > > Here is an example of a color-space sequence: > > In one file (something.csfasta): > >>1_30_226_F3 > T210320010.200.03.0110320320220212200122200.2220200 >>1_30_252_F3 > T322220212.133.00.2202322132022202221002011.0011020 > > The '.' 
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter Sent from my Verizon Wireless BlackBerry From cjfields at illinois.edu Fri Mar 12 09:06:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 08:06:51 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... 
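A reader for this pair of files does not need much machinery. As a minimal sketch in plain Perl (no BioPerl classes are used; `read_records` and `pair_reads` are hypothetical helper names, and the two toy records are made up in the .csfasta/.qual layout shown in this thread), the reads and their per-color scores can be paired by position, keeping the key base separate and leaving '.' and -1 as they are:

```perl
use strict;
use warnings;

# Parse FASTA-style records from a filehandle: each '>' header starts a
# record and the following lines are collected (space-joined) as its data.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        # skip blank lines and any '#' comment lines
        next if $line =~ /^\s*$/ || $line =~ /^#/;
        if ($line =~ /^>\s*(\S+)/) {
            push @records, { id => $1, data => '' };
        }
        elsif (@records) {
            $records[-1]{data} .= " $line";
        }
    }
    return @records;
}

# Pair color-space reads with their per-color quality values by position.
sub pair_reads {
    my ($cs_fh, $qual_fh) = @_;
    my @cs   = read_records($cs_fh);
    my @qual = read_records($qual_fh);
    die "record counts differ\n" unless @cs == @qual;

    my @reads;
    for my $i (0 .. $#cs) {
        die "ID mismatch: $cs[$i]{id} vs $qual[$i]{id}\n"
            unless $cs[$i]{id} eq $qual[$i]{id};
        (my $seq = $cs[$i]{data}) =~ s/\s+//g;
        my @scores = split ' ', $qual[$i]{data};
        my $colors = substr($seq, 1);    # everything after the key base
        die "length mismatch for $cs[$i]{id}\n"
            unless length($colors) == @scores;
        push @reads, {
            id       => $cs[$i]{id},
            key_base => substr($seq, 0, 1),
            colors   => $colors,
            qual     => \@scores,
        };
    }
    return @reads;
}

# Toy data (made up), read through in-memory filehandles.
my $csfasta = ">r1\nT0123.1\n>r2\nT3.0021\n";
my $qualtxt = ">r1\n20 18 25\n30 -1 12\n>r2\n9 -1 4 4 17 22\n";
open my $cs_fh,   '<', \$csfasta or die "cannot open in-memory csfasta: $!";
open my $qual_fh, '<', \$qualtxt or die "cannot open in-memory qual: $!";

my @reads = pair_reads($cs_fh, $qual_fh);
for my $r (@reads) {
    # 0-based positions where the color was not called
    my @uncalled = grep { substr($r->{colors}, $_, 1) eq '.' }
                   0 .. length($r->{colors}) - 1;
    printf "%s key=%s colors=%s uncalled=[%s]\n",
        $r->{id}, $r->{key_base}, $r->{colors}, join(',', @uncalled);
}
```

This slurps both files into memory, which is fine for a sketch but not for a full SOLiD run; a real parser would read the two handles in lockstep.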
chris On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > Direct from sequencing machine > > ------Original Message------ > From: Peter > Sender: p.j.a.cock at googlemail.com > To: golharam at umdnj.edu > Cc: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Next Gen Formats > Sent: Mar 12, 2010 8:26 AM > > On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >> Here is an example of a color-space sequence: >> >> In one file (something.csfasta): >> >>> 1_30_226_F3 >> T210320010.200.03.0110320320220212200122200.2220200 >>> 1_30_252_F3 >> T322220212.133.00.2202322132022202221002011.0011020 >> >> The '.' means the color could not be called >> >> In another file (something.qual): >> >>> 1_30_226_F3 >> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> 1_30_252_F3 >> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >> >> The -1 represents those colors that could not be called. > > Now that is funny (using -1). True PHRED scores are defined with a > logarithm and can't be negative. A score of zero is normally used in > this situation since that maps to a probability of error of 1 (i.e. the > read is 100% wrong, or 0% true). > > Where did these files come from? Direct from a sequencing > machine or via some third party script? 
> > Peter > > > Sent from my Verizon Wireless BlackBerry > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Fri Mar 12 08:09:40 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 08:09:40 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> Message-ID: <4B9A3D14.3010208@umdnj.edu> Here is an example of a color-space sequence: In one file (something.csfasta): >1_30_226_F3 T210320010.200.03.0110320320220212200122200.2220200 >1_30_252_F3 T322220212.133.00.2202322132022202221002011.0011020 The '.' means the color could not be called In another file (something.qual): >1_30_226_F3 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >1_30_252_F3 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 The -1 represents those colors that could not be called. Chris Fields wrote: > On Mar 12, 2010, at 4:06 AM, Peter wrote: > >> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >>> Ryan, >>> >>> We would have to see example files to get an idea of how feasible it is. >>> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >>> stream, and interleave the two somehow. However, BioPerl qual >>> scores are PHRED-based by default, and I'm not sure how color-space >>> data would work within that schematic. 
>>>
>>> chris
>> Chris,
>>
>> I am under the (possibly mistaken) assumption that PHRED scores
>> are used for SOLiD color space QUAL files - the key issue is each
>> score corresponds to the color call in the color sequence.
>>
>> Ignoring color-space for a moment, are there BioPerl examples
>> of iterating over a pair of sequence-space FASTA and QUAL files?
>> i.e. What you'd get if you had a FASTQ file to iterate over.
>>
>> [I guess Ryan could just merge the color-space FASTA and
>> QUAL into a color-space FASTQ file and iterate over that]
>>
>> Peter
>
> If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things.
>
> Iterating over pairs is something that has popped up before. For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:
>
> --------------------------------
> my $in = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
> my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
> my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');
>
> while (my $seq = $in->next_seq) {
>     $out1->write_fasta($seq);
>     $out2->write_qual($seq);
> }
> --------------------------------
>
> Note that all use the 'fastq' format instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise).
>
> I'm assuming for input it would be something like:
>
> --------------------------------
> my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
> my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual');
> my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq');
>
> # 'qual' parser joins the two streams
> while (my $seq = $in2->next_seq($in1)) {
>     $out->write_seq($seq);
> }
> --------------------------------
>
> chris
>
>

From pmiguel at purdue.edu Fri Mar 12 09:56:33 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 12 Mar 2010 09:56:33 -0500
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To:
References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry>
Message-ID: <4B9A5621.2020006@purdue.edu>

Hi Chris,

Converting back and forth from color space is something that would be needed. However, a warning for anyone working with color space data: It is a really bad idea to convert raw color space reads into sequence. This is because conversion propagates from the key base on the left to the right. A sequence error *anywhere* in the sequence will ensure all bases farther down will be converted on the wrong track. Analogous to a "frame shift" -- except there are 4 "frames", not 3. Meanwhile, the converse is not true--sequence space bases can be converted into color space without error propagation. So you want to do all your work in color space and convert to real sequence only at the end, when your consensus is certain. A little more detail here:

http://seqanswers.com/forums/showthread.php?t=3367

For people wanting to use a non-color space aware program for analysis of color space data, it is possible to use a process called "double encoding", where 0,1,2,3 bases of color space are just replaced with A, C, G, T of a "fake" base space. This is nearly the same as working in color space and does not incur the propagation error issues.
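Both points can be seen in a few lines of plain Perl. The sketch below assumes the usual SOLiD two-base scheme, in which a color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3); `decode_colors` and `double_encode` are hypothetical helper names and the reads are toy data. It decodes a read with and without a single miscalled color, then double-encodes the same two reads:

```perl
use strict;
use warnings;

my @BASES = qw(A C G T);
my %CODE  = (A => 0, C => 1, G => 2, T => 3);

# Decode a color-space read (key base followed by colors 0-3) into bases.
# Each decoded base is derived from the previous base and the current
# color, so a wrong color affects every base after it.
sub decode_colors {
    my ($read) = @_;
    my ($key, @colors) = split //, $read;
    my $prev  = $CODE{$key};
    my $bases = '';
    for my $c (@colors) {
        if (!defined $prev || $c !~ /^[0-3]$/) {
            $prev   = undef;   # an uncalled color ('.') leaves the rest undetermined
            $bases .= 'N';
            next;
        }
        $prev   = $prev ^ ($c + 0);   # numeric XOR of base code and color
        $bases .= $BASES[$prev];
    }
    return $bases;
}

# "Double encoding": relabel colors as fake bases, position by position.
# Takes the colors only, without the key base.
sub double_encode {
    my ($colors) = @_;
    (my $fake = $colors) =~ tr/0123./ACGTN/;
    return $fake;
}

my $good = 'T0123';   # toy read
my $bad  = 'T0223';   # same read with the second color miscalled

print decode_colors($good), "\n";            # TGAT
print decode_colors($bad),  "\n";            # TCTA (bases 2, 3 and 4 all change)
print double_encode(substr($good, 1)), "\n"; # ACGT
print double_encode(substr($bad, 1)),  "\n"; # AGGT (only position 2 changes)
```

Decoding lets one wrong color change every base after it; double encoding keeps the damage to that one position.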
However it is fraught with the obvious problems: you might later confuse the double encoded sequence with true sequence space with likely maddening results. Also, to get the opposite strand of color space reads you reverse without complementing. So top and bottom strands will look different. Finally, Kevin McKernan said that the dual base encoding error-detection scheme was technically using "Perforated Convolutional Codes" and said these were used on 3G networks. I only mention this in case there are some engineering types who might be interested. Phillip Chris Fields wrote: > For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): > > http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf > > Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... > > chris > > On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > > >> Direct from sequencing machine >> >> ------Original Message------ >> From: Peter >> Sender: p.j.a.cock at googlemail.com >> To: golharam at umdnj.edu >> Cc: Chris Fields >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Next Gen Formats >> Sent: Mar 12, 2010 8:26 AM >> >> On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >>> Here is an example of a color-space sequence: >>> >>> In one file (something.csfasta): >>> >>> >>>> 1_30_226_F3 >>>> >>> T210320010.200.03.0110320320220212200122200.2220200 >>> >>>> 1_30_252_F3 >>>> >>> T322220212.133.00.2202322132022202221002011.0011020 >>> >>> The '.' 
means the color could not be called >>> >>> In another file (something.qual): >>> >>> >>>> 1_30_226_F3 >>>> >>> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >>> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> >>>> 1_30_252_F3 >>>> >>> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >>> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >>> >>> The -1 represents those colors that could not be called. >>> >> Now that is funny (using -1). True PHRED scores are defined with a >> logarithm and can't be negative. A score of zero is normally used in >> this situation since that maps to a probability of error of 1 (i.e. the >> read is 100% wrong, or 0% true). >> >> Where did these files come from? Direct from a sequencing >> machine or via some third party script? >> >> Peter >> >> >> Sent from my Verizon Wireless BlackBerry >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Fri Mar 12 10:44:35 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 12 Mar 2010 07:44:35 -0800 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com> References: <4B96B442.8070003@bioperl.org> <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com> Message-ID: <4B9A6163.9060407@bioperl.org> I'm sure it does, that what it is supposed to do. I don't know that there is any way to directly get what you want but the code since the format that you want is not a standard multiple-alignment output format. You might consider clustalw format which shows the identical columns with '*' and you can keep the start/stop of the alignment embedded in the sequence names. 
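If the BLAST-style Query/Sbjct layout is what is wanted, one option is to format the aligned strings directly. This is a minimal sketch, assuming the three strings and the two start coordinates are taken from an HSP object (for example via its query_string, homology_string and hit_string methods). `format_pairwise` is a hypothetical helper, not a BioPerl method, and the strings here are hard-coded toy data. It assumes a plus-strand hit and equal-length strings.

```perl
use strict;
use warnings;

# Format an alignment as BLAST-style Query/Sbjct blocks.
# Gap characters ('-') do not advance the coordinate of the sequence
# they occur in.
sub format_pairwise {
    my ($query, $midline, $hit, $q_start, $h_start, $width) = @_;
    $width ||= 60;
    my $out = '';
    for (my $pos = 0; $pos < length($query); $pos += $width) {
        my $q = substr($query,   $pos, $width);
        my $m = substr($midline, $pos, $width);
        my $h = substr($hit,     $pos, $width);
        my $q_len = ($q =~ tr/-//c);   # number of non-gap characters
        my $h_len = ($h =~ tr/-//c);
        my $q_end = $q_start + $q_len - 1;
        my $h_end = $h_start + $h_len - 1;
        $out .= sprintf("Query  %-5d %s  %d\n",   $q_start, $q, $q_end);
        $out .= sprintf("       %-5s %s\n",       '',       $m);
        $out .= sprintf("Sbjct  %-5d %s  %d\n\n", $h_start, $h, $h_end);
        ($q_start, $h_start) = ($q_end + 1, $h_end + 1);
    }
    return $out;
}

# Toy alignment: query starts at 1, hit at 20, six columns per block.
my $text = format_pairwise('ATGTGTGCAT', '||||||||||', 'ATGTGTGCAT', 1, 20, 6);
print $text;
```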
Or you can extract the code that does the writing from the writer module and dig out what you need. You're asking for a customized view that is not standard, and the tools for it are in the existing code, so it means you need to roll your own view from it. This would just mean another ResultWriter module that looks a lot like the existing one, but doesn't write the header and footer and hit table out - so those methods would just not do anything...

-jason

Janine Arloth wrote, On 3/12/10 12:40 AM:
> Hi,
> thanks...
> but
>
> use Bio::SearchIO;
> use Bio::SearchIO::Writer::TextResultWriter;
>
> my $in = Bio::SearchIO->new(-format => 'blast',
>                             -file => shift @ARGV);
>
> my $writer = Bio::SearchIO::Writer::TextResultWriter->new();
> my $out = Bio::SearchIO->new(-writer => $writer);
> $out->write_result($in->next_result);
>
> gives me the whole result, but I only need the alignment ;(
> On 09.03.2010 at 21:49, Jason Stajich wrote:
>
>
>> SearchIO writer -> BLAST format. presumably something like Bio::SearchIO::Writer::TextResultWriter
>>
>> Janine Arloth wrote, On 3/5/10 1:43 AM:
>>
>>> Hello,
>>> using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment:
>>>
>>>                  1                                                   50
>>> test/1-85        ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
>>> lcl|3013/20-104  ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
>>>
>>>
>>>                  51                                                 100
>>> test/1-85        TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
>>> lcl|3013/20-104  TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
>>>
>>>
>>>
>>> But I prefer this format:
>>>
>>>
>>>
>>> Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
>>>            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79
>>>
>>> Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
>>>            |||||||||||||||||||||||||
>>> Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104
>>>
>>>
>>> How can I get this?
>>> >>> Best Regards >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > From maj at fortinbras.us Fri Mar 12 10:45:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 10:45:15 -0500 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: References: Message-ID: <31AA49FD0FDD466CB349ABAE75591B26@NewLife> along with Jason's comment then you'll need to loop through the node pairs by hand: my @leaves = $tree->get_leaf_nodes; my @dists; while (my $l = shift @leaves) { foreach my $m (@leaves) { push @dists, $tree->distance( -nodes => [$l, $m] ); } } should give you all n(n-1)/2 pairwise distances. ----- Original Message ----- From: "Jeffrey Detras" To: Sent: Friday, March 05, 2010 1:17 AM Subject: [Bioperl-l] distances between leaf nodes > Hi, > > I am new at using the Bio::TreeIO module specifically using the newick > format for a phylogenetic analysis. The sample_tree attached is > Newick-formatted tree. My objective is to get all the distances between all > the leaf nodes. I copied examples of the code from > http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my > knowledge) so that I understand how to assign the right array value for the > nodes/leaves. The message would say must provide 2 root nodes. 
> > Here is what I have right now: > > #!/usr/bin/perl -w > use strict; > > my $treefile = 'sample_tree'; > use Bio::TreeIO; > my $treeio = Bio::TreeIO->new(-format => 'newick', > -file => $treefile); > > while (my $tree = $treeio->next_tree) { > my @leaves = $tree->get_leaf_nodes; > for (my $dist = $tree->distance(-nodes => \@leaves)){ > print "Distance between trees is $dist\n"; > } > } > > Thanks, > Jeff > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rtbio.2009 at gmail.com Fri Mar 12 12:36:44 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 12 Mar 2010 18:36:44 +0100 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa. On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: > Hi Chris, > > Thank you very much for the information. Could you please tell me how to > update it from SVN? > > Thanks and regards, > Roopa > > > On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: > >> Roopa, >> >> I committed a fix for this a few days ago; if you update from SVN it >> should work. The problem stemmed from server-side changes at NCBI. >> >> chris >> >> On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: >> >> > Hello Mark and everybody, >> > >> > I have been trying to connect to remote blast to retrieve similar >> sequences >> > to a given sequence. But my program is unable to retrieve the sequences >> from >> > BLAST, i.e., it is getting executed till the remote blast ids, but it is >> not >> > entering the else loop after collecting the rid. Please check this >> problem >> > and help me in this regard. 
I think the problem is in getting the >> sequence >> > and going to the 'else' part. i.e., >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > >> > Please give me your reply. >> > >> > Thanks and regards, >> > Roopa. >> > >> > My code is as follows. >> > >> > #!/usr/bin/perl >> > >> > #path for extra camel module >> > use lib "/srv/www/htdocs/rain/RNAi/"; >> > use rnai_blast; >> > >> > >> > use Bio::SearchIO; >> > use Bio::Search::Result::BlastResult; >> > use Bio::Perl; >> > use Bio::Tools::Run::RemoteBlast; >> > use Bio::Seq; >> > use Bio::SeqIO; >> > use Bio::DB::GenBank; >> > >> > $serverpath = "/srv/www/htdocs/rain/RNAi"; >> > $serverurl = "http://141.84.66.66/rain/RNAi"; >> > $outfile = $serverpath."/rnairesult_".time().".html"; >> > $nuc = $serverpath."/nuc".time().".txt"; >> > $debugfile = $serverpath."/debug_".time().".txt"; >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > my $outstring =""; >> > >> > &parse_form; >> > >> > print "Content-type: text/html\n\n"; >> > print "\n"; >> > print "RNAi Result"; >> > print "> > URL=$serverurl/rnairesult_".time().".html\"> \n"; >> > print "\n"; >> > print "\n"; >> > print " Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
"; >> > print " Please be patient, runtime can be up to 5 minutes
"; >> > print " This page will automatically reload in 30 seconds."; >> > print "\n"; >> > print "\n"; >> > >> > defined(my $pid = fork) or die "Can't fork: $!"; >> > exit if $pid; >> > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >> > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >> > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >> > >> > >> > >> > open(OUTFILE, '>',$outfile); >> > >> > print OUTFILE "\n >> > RNAi Result >> > > > URL=$serverurl//rnairesult_".time().".html\"> \n >> > >> > \n >> > \n >> > Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
>> > Please be patient, runtime can be up to 5 minutes
>> > This page will automatically reload in 30 seconds
>> > \n >> > \n"; >> > >> > close(OUTFILE); >> > >> > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >> > >> > $in{'Inputseq'} =~ s/>.*$//m; >> > $in{'Inputseq'} =~ s/[^TAGC]//gim; >> > $in{'Inputseq'} =~ tr/actg/ACTG/; >> > >> > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >> > $in{'Threshold'}); >> > >> > >> > sub blastcode >> > { >> > >> > $inpu1= $_[0]; >> > >> > $organ= $_[1]; >> > >> > open(NUC,'>',$nuc); >> > print NUC $inpu1,"\n"; >> > close(NUC); >> > >> > my $prog = 'blastn'; >> > my $db = 'refseq_rna'; >> > my $e_val= '1e-10'; >> > my $organism= $organ; >> > >> > $gb = new Bio::DB::GenBank; >> > >> > my @params = ( '-prog' => $prog, >> > '-data' => $db, >> > '-expect' => $e_val, >> > '-readmethod' => 'SearchIO', >> > '-Organism' => $organism ); >> > >> > open(OUTFILE,'>',$blastdebugfile); >> > print OUTFILE @params; >> > close(OUTFILE); >> > >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >> => >> > "$organ\[ORGN]"); >> > >> > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > #change a paramter >> > >> > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> > Brucei[ORGN]'; >> > >> > #change a paramter >> > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >> '$input2[ORGN]'; >> > >> > my $v = 1; >> > #$v is just to turn on and off the messages >> > >> > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >> > '-organism' => "$organ\[ORGN]"); >> > >> > while (my $input = $str->next_seq()) >> > { >> > #Blast a sequence against a database: >> > #Alternatively, you could pass in a file with many >> > #sequences rather than loop through sequence one at a time >> > #Remove the loop starting 'while (my $input = $str->next_seq())' >> > #and swap the two lines below for an example of that. 
>> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $input; >> > close(OUTFILE); >> > >> > #submits the input data to BLAST# >> > >> > my $r = $factory->submit_blast($input); >> > >> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $r; >> > close(OUTFILE); >> > >> > >> > print STDERR "waiting...." if($v>0); >> > >> > while ( my @rids = $factory->each_rid ) { >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "while entered"; >> > close(OUTFILE); >> > foreach my $rid ( @rids ) { >> > >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "foreach entered"; >> > close(OUTFILE); >> > #Retrieving the result ids# >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) >> > { >> > if( $rc < 0 ) >> > { >> > $factory->remove_rid($rid); >> > } >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "if entered"; >> > close(OUTFILE); >> > print STDERR "." if ( $v > 0 ); >> > sleep 5; >> > } >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > print BLASTDEBUGFILE $result->next_hit(); >> > close(BLASTDEBUGFILE); >> > #saving the output in blastdata.time.out file# >> > >> > # $random=rand(); >> > >> > my $filename = $serverpath."/blastdata_".time()."\.out"; >> > # open(DEBUGFILE,'>',$debugfile); >> > # open(new,'>',$filename); >> > # @arra=; >> > # print DEBUGFILE @arra; >> > # close(DEBUGFILE); >> > # close(new); >> > >> > $factory->save_output($filename); >> > >> > # open(BLASTDEBUGFILE,'>',$debugfile); >> > # print BLASTDEBUGFILE "Hello $rid"; >> > # close(BLASTDEBUGFILE); >> > >> > $factory->remove_rid($rid); >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > # print BLASTDEBUGFILE $organism; >> > 
close(BLASTDEBUGFILE); >> > >> > # open(OUTFILE,'>',$outfile); >> > # print OUTFILE "Test2 $result->database_name()"; >> > # close(OUTFILE); >> > >> > #$hit = $result->next_hit; >> > #open(new,'>',$debugfile); >> > #print $hit; >> > #close(new); >> > $dummy=0; >> > while ( my $hit = $result->next_hit ) { >> > >> > next unless ( $v >= 0); >> > >> > # open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "$hit in while hits"; >> > # close(OUTFILE); >> > >> > my $sequ = $gb->get_Seq_by_version($hit->name); >> > my $dna = $sequ->seq(); # get the sequence as a string >> > $dummy++; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $dna; >> > close(OUTFILE); >> > push(@seqs,$dna); >> > } >> > } >> > } >> > } >> > } >> > >> > $warum=@seqs; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $warum; >> > print OUTFILE @seqs; >> > close(OUTFILE); >> > >> > >> > return(@seqs); #returning the sequences obtained on BLAST# >> > } >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Fri Mar 12 12:46:52 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 12 Mar 2010 12:46:52 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Please google "svn update bioperl". On Mar 12, 2010, at 12:36 PM, Roopa Raghuveer wrote: > Hello all, > > I am trying remote blast program and connecting to NCBI Blast, but I am > unable to retrieve the sequences. Chris had suggested me to update from SVN. > Could you please tell me how to update it from SVN? > > Regards, > Roopa. > > On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: > >> Hi Chris, >> >> Thank you very much for the information. Could you please tell me how to >> update it from SVN? 
>> >> Thanks and regards, >> Roopa >> >> >> On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: >> >>> Roopa, >>> >>> I committed a fix for this a few days ago; if you update from SVN it >>> should work. The problem stemmed from server-side changes at NCBI. >>> >>> chris >>> >>> On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: >>> >>>> Hello Mark and everybody, >>>> >>>> I have been trying to connect to remote blast to retrieve similar >>> sequences >>>> to a given sequence. But my program is unable to retrieve the sequences >>> from >>>> BLAST, i.e., it is getting executed till the remote blast ids, but it is >>> not >>>> entering the else loop after collecting the rid. Please check this >>> problem >>>> and help me in this regard. I think the problem is in getting the >>> sequence >>>> and going to the 'else' part. i.e., >>>> >>>> else { >>>> >>>> open(OUTFILE,'>',$blastdebugfile); # I think the problem >>> is >>>> in else part, i.e., it is not taking the next result.# >>>> print OUTFILE "else entered"; >>>> close(OUTFILE); >>>> >>>> my $result = $rc->next_result(); >>>> >>>> #save the output >>>> >>>> Please give me your reply. >>>> >>>> Thanks and regards, >>>> Roopa. >>>> >>>> My code is as follows. 
>>>> >>>> #!/usr/bin/perl >>>> >>>> #path for extra camel module >>>> use lib "/srv/www/htdocs/rain/RNAi/"; >>>> use rnai_blast; >>>> >>>> >>>> use Bio::SearchIO; >>>> use Bio::Search::Result::BlastResult; >>>> use Bio::Perl; >>>> use Bio::Tools::Run::RemoteBlast; >>>> use Bio::Seq; >>>> use Bio::SeqIO; >>>> use Bio::DB::GenBank; >>>> >>>> $serverpath = "/srv/www/htdocs/rain/RNAi"; >>>> $serverurl = "http://141.84.66.66/rain/RNAi"; >>>> $outfile = $serverpath."/rnairesult_".time().".html"; >>>> $nuc = $serverpath."/nuc".time().".txt"; >>>> $debugfile = $serverpath."/debug_".time().".txt"; >>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>> >>>> my $outstring =""; >>>> >>>> &parse_form; >>>> >>>> print "Content-type: text/html\n\n"; >>>> print "\n"; >>>> print "RNAi Result"; >>>> print ">>> URL=$serverurl/rnairesult_".time().".html\"> \n"; >>>> print "\n"; >>>> print "\n"; >>>> print " Your results will appear >>> href=$serverurl/rnairesult_".time().".html>here
"; >>>> print " Please be patient, runtime can be up to 5 minutes
"; >>>> print " This page will automatically reload in 30 seconds."; >>>> print "\n"; >>>> print "\n"; >>>> >>>> defined(my $pid = fork) or die "Can't fork: $!"; >>>> exit if $pid; >>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >>>> >>>> >>>> >>>> open(OUTFILE, '>',$outfile); >>>> >>>> print OUTFILE "\n >>>> RNAi Result >>>> >>> URL=$serverurl//rnairesult_".time().".html\"> \n >>>> >>>> \n >>>> \n >>>> Your results will appear >>> href=$serverurl/rnairesult_".time().".html>here
>>>> Please be patient, runtime can be up to 5 minutes
>>>> This page will automatically reload in 30 seconds
>>>> \n >>>> \n"; >>>> >>>> close(OUTFILE); >>>> >>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >>>> >>>> $in{'Inputseq'} =~ s/>.*$//m; >>>> $in{'Inputseq'} =~ s/[^TAGC]//gim; >>>> $in{'Inputseq'} =~ tr/actg/ACTG/; >>>> >>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >>>> $in{'Threshold'}); >>>> >>>> >>>> sub blastcode >>>> { >>>> >>>> $inpu1= $_[0]; >>>> >>>> $organ= $_[1]; >>>> >>>> open(NUC,'>',$nuc); >>>> print NUC $inpu1,"\n"; >>>> close(NUC); >>>> >>>> my $prog = 'blastn'; >>>> my $db = 'refseq_rna'; >>>> my $e_val= '1e-10'; >>>> my $organism= $organ; >>>> >>>> $gb = new Bio::DB::GenBank; >>>> >>>> my @params = ( '-prog' => $prog, >>>> '-data' => $db, >>>> '-expect' => $e_val, >>>> '-readmethod' => 'SearchIO', >>>> '-Organism' => $organism ); >>>> >>>> open(OUTFILE,'>',$blastdebugfile); >>>> print OUTFILE @params; >>>> close(OUTFILE); >>>> >>>> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >>> => >>>> "$organ\[ORGN]"); >>>> >>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>> >>>> #change a paramter >>>> >>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >>>> Brucei[ORGN]'; >>>> >>>> #change a paramter >>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >>> '$input2[ORGN]'; >>>> >>>> my $v = 1; >>>> #$v is just to turn on and off the messages >>>> >>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >>>> '-organism' => "$organ\[ORGN]"); >>>> >>>> while (my $input = $str->next_seq()) >>>> { >>>> #Blast a sequence against a database: >>>> #Alternatively, you could pass in a file with many >>>> #sequences rather than loop through sequence one at a time >>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>> #and swap the two lines below for an example of that. 
>>>> open(OUTFILE,'>',$debugfile); >>>> print OUTFILE $input; >>>> close(OUTFILE); >>>> >>>> #submits the input data to BLAST# >>>> >>>> my $r = $factory->submit_blast($input); >>>> >>>> open(OUTFILE,'>',$debugfile); >>>> print OUTFILE $r; >>>> close(OUTFILE); >>>> >>>> >>>> print STDERR "waiting...." if($v>0); >>>> >>>> while ( my @rids = $factory->each_rid ) { >>>> open(OUTFILE,'>',$debugfile); >>>> # print OUTFILE "while entered"; >>>> close(OUTFILE); >>>> foreach my $rid ( @rids ) { >>>> >>>> open(OUTFILE,'>',$debugfile); >>>> # print OUTFILE "foreach entered"; >>>> close(OUTFILE); >>>> #Retrieving the result ids# >>>> >>>> my $rc = $factory->retrieve_blast($rid); >>>> >>>> if( !ref($rc) ) >>>> { >>>> if( $rc < 0 ) >>>> { >>>> $factory->remove_rid($rid); >>>> } >>>> open(OUTFILE,'>',$debugfile); >>>> # print OUTFILE "if entered"; >>>> close(OUTFILE); >>>> print STDERR "." if ( $v > 0 ); >>>> sleep 5; >>>> } >>>> >>>> else { >>>> >>>> open(OUTFILE,'>',$blastdebugfile); # I think the problem >>> is >>>> in else part, i.e., it is not taking the next result.# >>>> print OUTFILE "else entered"; >>>> close(OUTFILE); >>>> >>>> my $result = $rc->next_result(); >>>> >>>> #save the output >>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >>>> >>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>> print BLASTDEBUGFILE $result->next_hit(); >>>> close(BLASTDEBUGFILE); >>>> #saving the output in blastdata.time.out file# >>>> >>>> # $random=rand(); >>>> >>>> my $filename = $serverpath."/blastdata_".time()."\.out"; >>>> # open(DEBUGFILE,'>',$debugfile); >>>> # open(new,'>',$filename); >>>> # @arra=; >>>> # print DEBUGFILE @arra; >>>> # close(DEBUGFILE); >>>> # close(new); >>>> >>>> $factory->save_output($filename); >>>> >>>> # open(BLASTDEBUGFILE,'>',$debugfile); >>>> # print BLASTDEBUGFILE "Hello $rid"; >>>> # close(BLASTDEBUGFILE); >>>> >>>> $factory->remove_rid($rid); >>>> >>>> open(BLASTDEBUGFILE,'>',$blastdebugfile); >>>> # print BLASTDEBUGFILE $organism; >>>> 
close(BLASTDEBUGFILE); >>>> >>>> # open(OUTFILE,'>',$outfile); >>>> # print OUTFILE "Test2 $result->database_name()"; >>>> # close(OUTFILE); >>>> >>>> #$hit = $result->next_hit; >>>> #open(new,'>',$debugfile); >>>> #print $hit; >>>> #close(new); >>>> $dummy=0; >>>> while ( my $hit = $result->next_hit ) { >>>> >>>> next unless ( $v >= 0); >>>> >>>> # open(OUTFILE,'>',$debugfile); >>>> # print OUTFILE "$hit in while hits"; >>>> # close(OUTFILE); >>>> >>>> my $sequ = $gb->get_Seq_by_version($hit->name); >>>> my $dna = $sequ->seq(); # get the sequence as a string >>>> $dummy++; >>>> open(OUTFILE,'>',$debugfile); >>>> # print OUTFILE $dna; >>>> close(OUTFILE); >>>> push(@seqs,$dna); >>>> } >>>> } >>>> } >>>> } >>>> } >>>> >>>> $warum=@seqs; >>>> open(OUTFILE,'>',$debugfile); >>>> # print OUTFILE $warum; >>>> print OUTFILE @seqs; >>>> close(OUTFILE); >>>> >>>> >>>> return(@seqs); #returning the sequences obtained on BLAST# >>>> } >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Mar 12 12:41:23 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 12:41:23 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Look at http://www.bioperl.org/wiki/Using_Subversion ----- Original Message ----- From: Roopa Raghuveer To: Chris Fields ; Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Friday, March 12, 2010 12:36 PM Subject: Re: [Bioperl-l] remoteblast Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa. 
On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: Hi Chris, Thank you very much for the information. Could you please tell me how to update it from SVN? Thanks and regards, Roopa On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use rnai_blast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds."; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes
> This page will automatically reload in 30 seconds
> \n > \n"; > > close(OUTFILE); > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$blastdebugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > "$organ\[ORGN]"); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => "$organ\[ORGN]"); > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > #submits the input data to BLAST# > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > #Retrieving the result ids# > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > #saving the output in blastdata.time.out file# > > # $random=rand(); > > my $filename = $serverpath."/blastdata_".time()."\.out"; > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > # print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a 
string
> $dummy++;
> open(OUTFILE,'>',$debugfile);
> # print OUTFILE $dna;
> close(OUTFILE);
> push(@seqs,$dna);
> }
> }
> }
> }
> }
>
> $warum=@seqs;
> open(OUTFILE,'>',$debugfile);
> # print OUTFILE $warum;
> print OUTFILE @seqs;
> close(OUTFILE);
>
>
> return(@seqs); #returning the sequences obtained on BLAST#
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jessica.sun at gmail.com Fri Mar 12 16:28:11 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 12 Mar 2010 16:28:11 -0500
Subject: [Bioperl-l] RefSeq
Message-ID: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com>

I have a question: I have a refseq with NM_ number(mRNA), how can I get
the genomic sequences(NT_number) with Bioperl, if it can be done?

Thanks

-- 
Jessica Jingping Sun

From sidd.basu at gmail.com Sat Mar 13 15:29:52 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Sat, 13 Mar 2010 14:29:52 -0600
Subject: [Bioperl-l] Re: RefSeq
In-Reply-To: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com>
References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com>
Message-ID: <20100313202949.GA5621@Macintosh-74.local>

The following code works with 1.6.1 of bioperl. It uses eutils and the
workflow esearch -> elink -> esummary.
#!/usr/bin/perl -w

use strict;
use Bio::DB::EUtilities;

# RefSeq mRNA accession to look up (defaults to NM_001618)
my $id = $ARGV[0] || 'NM_001618';

# esearch: find the accession in the nucleotide database and keep the
# result on the NCBI history server
my $eutils = Bio::DB::EUtilities->new(
    -eutil      => 'esearch',
    -db         => 'nucleotide',
    -term       => $id,
    -usehistory => 'y'
);

my $hist = $eutils->next_History || die "no history\n";

# elink: follow the link from the nucleotide record to its gene record
$eutils->reset_parameters(
    -eutil   => 'elink',
    -db      => 'gene',
    -dbfrom  => 'nuccore',
    -history => $hist
);

my ($gene_id) = $eutils->next_LinkSet->get_ids;

# esummary: fetch the gene summary and print the chromosome accession.version
$eutils->reset_parameters(
    -eutil => 'esummary',
    -db    => 'gene',
    -id    => $gene_id,
);

my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType');
print $item->get_contents_by_name('ChrAccVer'), "\n";

-siddhartha

On Fri, 12 Mar 2010, Jessica Sun wrote:

> I have a question: I have a refseq with NM_ number(mRNA), how can I get
> the genomic sequences(NT_number) with Bioperl, if it can be done?
>
> Thanks
>
>
> --
> Jessica Jingping Sun
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From robby.hones at gmail.com Sat Mar 13 18:57:43 2010
From: robby.hones at gmail.com (robby jhones)
Date: Sat, 13 Mar 2010 15:57:43 -0800
Subject: [Bioperl-l] comparing fasta sequences in multiple files
Message-ID: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>

Dear Group,

Can anyone offer advice on comparing multiple fasta sequences in many
files. We have 1000's of fasta sequences in individual files of which I
would like to fish out and print to a new file (the sequence and ID), ONLY
the sequences which appear in at least a few of the files: 3 out of 4 runs,
perhaps all 4 runs ( as some are replicates).

Is there something out there which would do this?
Thanks for your helps

>>Robby

From sdavis2 at mail.nih.gov Sat Mar 13 19:49:46 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 Mar 2010 19:49:46 -0500
Subject: [Bioperl-l] comparing fasta sequences in multiple files
In-Reply-To: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>
References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>
Message-ID: <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com>

On Sat, Mar 13, 2010 at 6:57 PM, robby jhones wrote:
> Dear Group,
>
>  Can anyone offer advice on comparing multiple fasta sequences in many
> files. We have 1000's of fasta sequences in individual files of which I
> would like to fish out and print to a new file (the sequence and ID), ONLY
> the sequences which appear in at least a few of the files: 3 out of 4 runs,
> perhaps all 4 runs ( as some are replicates).
>
>  Is there something out there which would do this?

Hi, Robby.

It sounds like making a hash of IDs and then incrementing a count for
each as you loop over files would give you what you want?

Sean

From jessica.sun at gmail.com Sat Mar 13 20:29:08 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Sat, 13 Mar 2010 20:29:08 -0500
Subject: [Bioperl-l] RefSeq
In-Reply-To: <20100313202949.GA5621@Macintosh-74.local>
References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> <20100313202949.GA5621@Macintosh-74.local>
Message-ID: <9adc0e9b1003131729p4f78aa50kc1500cbbe01cd815@mail.gmail.com>

Great. Thanks.

On Sat, Mar 13, 2010 at 3:29 PM, Siddhartha Basu wrote:

> The following code works with 1.6.1 of bioperl. It uses eutils and the
> workflow esearch -> elink -> esummary.
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::DB::EUtilities;
>
> my $id = $ARGV[0] || 'NM_001618';
>
> my $eutils = Bio::DB::EUtilities->new(
>     -eutil      => 'esearch',
>     -db         => 'nucleotide',
>     -term       => $id,
>     -usehistory => 'y'
> );
>
> my $hist = $eutils->next_History || die "no history\n";
>
> $eutils->reset_parameters(
>     -eutil   => 'elink',
>     -db      => 'gene',
>     -dbfrom  => 'nuccore',
>     -history => $hist
> );
>
> my ($gene_id) = $eutils->next_LinkSet->get_ids;
>
> $eutils->reset_parameters(
>     -eutil => 'esummary',
>     -db    => 'gene',
>     -id    => $gene_id,
> );
>
> my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType');
> print $item->get_contents_by_name('ChrAccVer'), "\n";
>
> -siddhartha
>
> On Fri, 12 Mar 2010, Jessica Sun wrote:
>
> > I have a question: I have a refseq with NM_ number(mRNA), how can I get
> > the genomic sequences(NT_number) with Bioperl, if it can be done?
> >
> > Thanks
> >
> >
> > --
> > Jessica Jingping Sun
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
Jessica Jingping Sun

From sdavis2 at mail.nih.gov Sun Mar 14 08:38:15 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sun, 14 Mar 2010 07:38:15 -0500
Subject: [Bioperl-l] comparing fasta sequences in multiple files
In-Reply-To: <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com>
References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com> <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com>
Message-ID: <264855a01003140538m6cee0c27s823e45d02002d200@mail.gmail.com>

On Sun, Mar 14, 2010 at 2:12 AM, robby jhones wrote:
> I think that I'll need to write a hash of the IDs and sequences, then
>
iterate over the sequences to see if they are identical and if so push them
> and the ID into an output file. I was hoping there was something out there
> like this, but I suppose not.

Look in the mailing list archives for the last week or so. There was
some discussion about generating hashes of sequences; you could use
that to generate your hash of unique sequences.

Sean

> On Sat, Mar 13, 2010 at 4:49 PM, Sean Davis wrote:
>>
>> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones
>> wrote:
>> > Dear Group,
>> >
>> >  Can anyone offer advice on comparing multiple fasta sequences in many
>> > files. We have 1000's of fasta sequences in individual files of which I
>> > would like to fish out and print to a new file (the sequence and ID),
>> > ONLY
>> > the sequences which appear in at least a few of the files: 3 out of 4
>> > runs,
>> > perhaps all 4 runs ( as some are replicates).
>> >
>> >  Is there something out there which would do this?
>>
>> Hi, Robby.
>>
>> It sounds like making a hash of IDs and then incrementing a count for
>> each as you loop over files would give you what you want?
>>
>> Sean
>
>

From lpritc at scri.ac.uk Mon Mar 15 07:55:52 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 15 Mar 2010 11:55:52 +0000
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>
Message-ID:

Hi Scott,

Thanks for the reply. I tried your suggestions on a clean VM of CentOS 5.4
and the equally wordy outcome is below...

On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain" wrote:

> First, I am working on the 1.1 release of gmod/chado, and it
> may fix some of the problems you are describing. Certainly, ID
> collisions between GFF files should not be a problem (I didn't think
> they were in the 1.0 release, but that was a long time ago).
Please
> try a checkout of the schema trunk in the gmod svn:
>
>   http://gmod.org/wiki/SVN

As a note for anyone following this, when I downloaded the trunk/chado files
only, my build failed with

"""
$make
[...]
Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm
Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm
Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm
Manifying ../blib/man3/Bio::Chaos::Root.3pm
make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml'
make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by
`blib/script/gmod_gff2biomart5.pl'. Stop.
"""

I had to download the whole trunk for the installation to work. I came
across this thread:
http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html

while I was looking for a solution; someone else has had a similar problem.

> Another thing you may want to look at is that just last week, a
> developer at Texas A&M, Nathan Liles, contributed code to the
> bioperl-live trunk for the genbank2gff3.pl script that will do a much
> better job of converting bacterial genbank files to GFF3; perhaps that
> will help too. Working with a svn checkout of bioperl-live shouldn't
> be too scary either; the pieces you are interested in (that work with
> Chado and GBrowse) are quite stable.

I also checked out BioPerl-live. The svn server at code.open-bio.org was
unresponsive for a couple of days, but Peter pointed me to GitHub at
http://github.com/bioperl/bioperl-live so I went from there. The process
isn't quite as clean as using the latest stable version of BioPerl, however.

When I attempt to use the bp_genbank2gff3.pl script, I get the following
error message:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
Can't locate object method "FT_SO_map" via package
"Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line
374.
"""

This appears to be associated with the following code (l207 onwards...) in
TypeMapper:

"""
=head2 map_types_to_SO

[...]

hardcodes the genbank to SO mapping

[...]
dgg: separated out FT_SO_map for caller changes. Update with:

  open(FTSO,"curl -s http://sequenceontology.org/resources/mapping/FT_SO.txt|");
  while(<FTSO>){
    chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t";
    print "     '$ft' => '$so',\n" if($ft && $so && $ftdef);
  }

=cut

sub ft_so_map  {
  # $self= shift;
"""

The upper/lower case function declaration seems to be important, as changing
it back to "sub FT_SO_map" lets the script work:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
# Input: NC_004547.gbk
# working on region:NC_004547, Erwinia carotovora subsp. atroseptica
SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043,
complete genome.
# GFF3 saved to ./NC_004547.gbk.gff
# Summary:
# Feature       Count
# -------       -----
# repeat_region  19
# sequence_variant  2
# repeat_unit  2
# gene  4614
# region  17387
# exon  4597
# RESIDUES  5064019
#
"""

Obviously, this is another unsatisfactory sucky ad hoc post-install hack; I
hope I'm doing the right sort of thing, there. I'm not familiar with
BioPerl so I'm not clear on why this change was made to the interface (it's
part of the recent changes by Nathan Liles you referred to in your post:
http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37ddcd8a77a03),
but it also seems to break bp_genbank2gff3.pl. Also, the
--noCDS flag appears to have no effect at all when using the new version of
bp_genbank2gff3.pl.

The old version of bp_genbank2gff3.pl appears to recognise more feature
types in the summary:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
# Input: NC_004547.gbk
# working on region:NC_004547, Erwinia carotovora subsp. atroseptica
SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043,
complete genome.
# GFF3 saved to ./NC_004547.gbk.gff
# Summary:
# Feature       Count
# -------       -----
# mRNA  4472
# sequence_variant  2
# gene  4594
# region  8275
# pseudogene  20
# CDS  4472
# RESIDUES(tr)  1433791
# RESIDUES  5064019
# rRNA  22
# processed_transcript  24
# repeat_region  19
# pseudogenic_region  46
# repeat_unit  2
# exon  4597
# tRNA  76
#
"""

and this is reflected in the substantial difference in GFF3 output, for
issuing exactly the same command when moving from BioPerl 1.6.1 to
bioperl-live: we get different GFF3 output that represents a different gene
model. I wasn't expecting so radical a change, but at least the IDs are
based on the locus_tag with the new script, and this appears to solve my
problem with clashing feature IDs on the files I was using.

Many thanks for your help,

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by
guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland
Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER: This email is from the Scottish Crop Research Institute, but the
views expressed by the sender are not necessarily the views of SCRI and its
subsidiaries. This email and any files transmitted with it are confidential
to the intended recipient at the e-mail address to which it has been
addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this
confidentiality and you must not use, disclose, copy, print or rely on this
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name
of the sender and delete the email from your system.
Although SCRI has taken reasonable precautions to ensure no viruses are
present in this email, neither the Institute nor the sender accepts any
responsibility for any viruses, and it is your responsibility to scan the
email and the attachments (if any).
______________________________________________________
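
A note on the "FT_SO_map" error reported in Leighton Pritchard's message
above. Perl method names are case-sensitive, so a caller that still asks for
FT_SO_map fails once the sub is declared as ft_so_map. The sketch below only
illustrates that behaviour and one possible way to work around it from the
calling side, without editing the installed module. It assumes nothing about
the real TypeMapper beyond the rename described in the message:
My::TypeMapper is a hypothetical stand-in package, and its three map entries
are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqFeature::Tools::TypeMapper; the real
# module is not loaded here and its mapping table is much larger.
package My::TypeMapper;

sub new { return bless {}, shift }

# Lower-case name, as in the bioperl-live checkout described above.
# The entries are illustrative only.
sub ft_so_map {
    return { 'CDS' => 'CDS', 'rRNA' => 'rRNA', 'repeat_region' => 'repeat_region' };
}

package main;

my $mapper = My::TypeMapper->new;

# Method lookup is case-sensitive: only the lower-case name resolves.
print "ft_so_map: ", ( $mapper->can('ft_so_map') ? "found" : "missing" ), "\n";
print "FT_SO_map: ", ( $mapper->can('FT_SO_map') ? "found" : "missing" ), "\n";

# Calling the missing name dies with "Can't locate object method ...".
my $ok = eval { $mapper->FT_SO_map; 1 };
print "call failed: $@" unless $ok;

# Workaround in the caller: alias the old name to the new sub at run time
# instead of patching the installed module.
{
    no strict 'refs';
    *{'My::TypeMapper::FT_SO_map'} = \&My::TypeMapper::ft_so_map;
}

my $map = $mapper->FT_SO_map;
print "after alias: rRNA => $map->{rRNA}\n";
```

In a real script the same glob assignment would name
Bio::SeqFeature::Tools::TypeMapper rather than the stand-in. Which spelling
the module finally settles on is not known from this thread, so this is best
treated as a stopgap until the script and the module agree.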

From scott at scottcain.net Mon Mar 15 10:55:17 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 15 Mar 2010 10:55:17 -0400
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To:
References: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>
Message-ID: <4536f7701003150755w2c2875fbob004bc03cf3387ab@mail.gmail.com>

Hi Leighton,

Thanks for the feedback both on getting chado installed from svn and
on the genbank2gff3 converter.

About installing Chado from svn, I thought I'd modified the
Makefile.PL script to gracefully survive not having the GMODtools
directory present; I guess I'll have to revisit that. Since I
probably won't get to it today, I created a bug report for it:

https://sourceforge.net/tracker/?func=detail&aid=2970687&group_id=27707&atid=391291

About the genbank2gff3 script, I'm cc'ing Nathan to make sure he sees
your comments.

Thanks,
Scott

On Mon, Mar 15, 2010 at 7:55 AM, Leighton Pritchard wrote:
> Hi Scott,
>
> Thanks for the reply.  I tried your suggestions on a clean VM of CentOS 5.4
> and the equally wordy outcome is below...
>
> On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain"
> wrote:
>
>> First, I am working on the 1.1 release of gmod/chado, and it
>> may fix some of the problems you are describing.  Certainly, ID
>> collisions between GFF files should not be a problem (I didn't think
>> they were in the 1.0 release, but that was a long time ago).  Please
>> try a checkout of the schema trunk in the gmod svn:
>>
>>   http://gmod.org/wiki/SVN
>
> As a note for anyone following this, when I downloaded the trunk/chado files
> only, my build failed with
>
> """
> $make
> [...]
> Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm > Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm > Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm > Manifying ../blib/man3/Bio::Chaos::Root.3pm > make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml' > make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by > `blib/script/gmod_gff2biomart5.pl'. ?Stop. > """ > > I had to download the whole trunk for the installation to work. ?I came > across this thread: > http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html > > while I was looking for a solution; someone else has had a similar problem. > >> Another thing you may want to look at is that just last week, a >> developer at Texas A&M, Nathan Liles, contributed code to the >> bioperl-live trunk for the genbank2gff3.pl script that will do a much >> better job of converting bacterial genbank files to GFF3; perhaps that >> will help too. ?Working with a svn checkout of bioperl-live shouldn't >> be too scary either; the pieces you are interested in (that work with >> Chado and GBrowse) are quite stable. > > I also checked out BioPerl-live. ?The svn server at code.open-bio.org was > unresponsive for a couple of days, but Peter pointed me to GitHub at > http://github.com/bioperl/bioperl-live so I went from there. ?The process > isn't quite as clean as using the latest stable version of BioPerl, however. > > When I attempt to use the bp_genbank2gff3.pl script, I get the following > error message: > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk > Can't locate object method "FT_SO_map" via package > "Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line > 374. > """ > > This appears to be associated with the following code (l207 onwards...) in > TypeMapper: > > """ > =head2 map_types_to_SO > > [...] > > hardcodes the genbank to SO mapping > > [...] > dgg: separated out FT_SO_map for caller changes. 
Update with: > > ?open(FTSO,"curl -s > http://sequenceontology.org/resources/mapping/FT_SO.txt|"); > ?while(){ > ? ?chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t"; > ? ?print " ? ? '$ft' => '$so',\n" if($ft && $so && $ftdef); > ?} > > =cut > > sub ft_so_map ?{ > ?# $self= shift; > """ > > The upper/lower case function declaration seems to be important, as changing > it back to "sub FT_SO_map" lets the script work: > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk > # Input: NC_004547.gbk > # working on region:NC_004547, Erwinia carotovora subsp. atroseptica > SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, > complete genome. > # GFF3 saved to ./NC_004547.gbk.gff > # Summary: > # Feature ? ? ? Count > # ------- ? ? ? ----- > # repeat_region ?19 > # sequence_variant ?2 > # repeat_unit ?2 > # gene ?4614 > # region ?17387 > # exon ?4597 > # RESIDUES ?5064019 > # > """ > > Obviously, this is another unsatsifactory sucky ad hoc post-install hack; I > hope I'm doing the right sort of thing, there. ?I'm not familiar with > BioPerl so I'm not clear on why this change was made to the interface (it's > part of the recent changes by Nathan Liles you referred to in your post: > http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37d > dcd8a77a03), but it also seems to break bp_genbank2gff3.pl. ?Also, the > --noCDS flag appears to have no effect at all when using the new version of > bp_genbank2gff3.pl. > > The old version of bp_genbank2gff3.pl appears to recognise more feature > types in the summary: > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk > # Input: NC_004547.gbk > # working on region:NC_004547, Erwinia carotovora subsp. atroseptica > SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, > complete genome. > # GFF3 saved to ./NC_004547.gbk.gff > # Summary: > # Feature ? ? ? Count > # ------- ? ? ? 
----- > # mRNA ?4472 > # sequence_variant ?2 > # gene ?4594 > # region ?8275 > # pseudogene ?20 > # CDS ?4472 > # RESIDUES(tr) ?1433791 > # RESIDUES ?5064019 > # rRNA ?22 > # processed_transcript ?24 > # repeat_region ?19 > # pseudogenic_region ?46 > # repeat_unit ?2 > # exon ?4597 > # tRNA ?76 > # > """ > > and this is reflected in the substantial difference in GFF3 output, for > issuing exactly the same command when moving from BioPerl 1.6.1 to > bioperl-live: we get different GFF3 output that represents a different gene > model. ?I wasn't expecting so radical a change, but at least the IDs are > based on the locus_tag with the new script, and this appears to solve my > problem with clashing feature IDs on the files I was using. > > Many thanks for your help, > > L. > > -- > Dr Leighton Pritchard MRSC > D131, Plant Pathology Programme, SCRI > Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA > e:lpritc at scri.ac.uk ? ? ? w:http://www.scri.ac.uk/staff/leightonpritchard > gpg/pgp: 0xFEFC205C ? ? ? tel:+44(0)1382 562731 x2405 > > > ______________________________________________________ > SCRI, Invergowrie, Dundee, DD2 5DA. > The Scottish Crop Research Institute is a charitable company limited by guarantee. > Registered in Scotland No: SC 29367. > Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. > > > DISCLAIMER: > > This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. ?This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. ?It may not be disclosed or used by any other than that > addressee. > If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. 
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. > > Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). > ______________________________________________________ > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From kiekyon.huang at gmail.com Mon Mar 15 11:44:13 2010 From: kiekyon.huang at gmail.com (kiekyon.huang at gmail.com) Date: Mon, 15 Mar 2010 15:44:13 +0000 Subject: [Bioperl-l] Taxonomy report Message-ID: <0016e64be064b8211f0481d8c02d@google.com> Hi, just like to know if there is there any way to generate the taxonomy report from the standalone blast output? thanks From cjfields at illinois.edu Mon Mar 15 11:57:29 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 15 Mar 2010 10:57:29 -0500 Subject: [Bioperl-l] Taxonomy report In-Reply-To: <0016e64be064b8211f0481d8c02d@google.com> References: <0016e64be064b8211f0481d8c02d@google.com> Message-ID: <53CE22BE-38F4-4EC6-80A9-37228A9CF602@illinois.edu> Not that I know of, at least not w/o doing some mapping (the tax report is generated on NCBI's servers last I recall). chris On Mar 15, 2010, at 10:44 AM, kiekyon.huang at gmail.com wrote: > Hi, > > just like to know if there is there any way to generate the taxonomy report from the standalone blast output? 
>
> thanks
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Mon Mar 15 13:11:27 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 15 Mar 2010 10:11:27 -0700
Subject: [Bioperl-l] getting strand from Bio::Align::AlignI ??
In-Reply-To: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov>
References: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov>
Message-ID: <4B9E6A3F.6080104@bioperl.org>

Did you start with a Bio::SearchIO object and call get_aln on the HSP object? Strand is available from $hsp->query->strand and $hsp->hit->strand, and Bio::SearchIO is the preferred way of parsing pairwise alignment reports.

Either way, the sequences themselves have strands, not the alignment. Each sequence should have a strand, $seq->strand, since they are Bio::LocatableSeq objects.

for my $seq ( $aln->each_seq ) {
    print $seq->id, " ", $seq->strand, "\n";
}

-jason

Joan Pontius wrote, On 3/15/10 8:49 AM:
> I am looking into using Bio::Align::AlignI for an application that
> uses blast2seq
> and can't figure out how to get the strand of an alignment?
>
> Thanks in advance
>
>
>
> Joan Pontius-Contractor SAIC
> Laboratory of Genomic Diversity
> Bldg 560-NCI
> Frederick Maryland 21702
> phone (301)846-1761
> fax (301) 846-1686

From cjfields1 at gmail.com Mon Mar 15 14:57:08 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Mon, 15 Mar 2010 13:57:08 -0500
Subject: [Bioperl-l] Bioperl SVNconnection problem
In-Reply-To: <6C998BD2392E4BF594F041368D9456E4@BlackJack>
References: <6C998BD2392E4BF594F041368D9456E4@BlackJack>
Message-ID: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com>

Francisco,

In general, please address any questions directly to the bioperl mail list, in case I can't respond.

The anon. svn on code.open-bio.org is down at the moment. OBF support knows about this problem and it's being addressed.
There is a github mirror of the repos in case this happens: http://github.com/bioperl chris On Mar 15, 2010, at 10:38 AM, Francisco J. Ossand?n wrote: > Hello Chris Fields, > I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > There has been some change in the SVN address or configuration that I should update? I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it. > > I hope that you can help me with this. > > Regards, > > -- > Francisco J. Ossandon > Bioinformatician. > Ph.D. Student, University Andres Bello. > Center for Bioinformatics and Genome Biology, > Fundacion Ciencia para la Vida. > Santiago, Chile. > www.cienciavida.cl/CBGB.htm From hlapp at drycafe.net Tue Mar 16 16:03:50 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 16 Mar 2010 16:03:50 -0400 Subject: [Bioperl-l] [OT] Job opportunity: Training coordinator and Bioinformatics Project Manager Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net> Hi all - first off, sorry for the cross-posting, we're trying to advertise this as widely as possible. Second, apologies if this is committing an offense and considered spam. I thought though that there might be some people around here who may be interested and suitable. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== A unique position is available for a training coordinator and bioinformatics project manager at the U.S. 
National Evolutionary Synthesis Center in Durham, North Carolina (NESCent, http://nescent.org). NESCent is a National Science Foundation funded research center managed by Duke University, the University of North Carolina at Chapel Hill and North Carolina State University on behalf of the international evolutionary biology community. NESCent facilitates synthetic research by bringing together diverse expertise, data, tools and concepts (Sidlauskas et al. 2009). In addition to a resident population of 20-30 scientists, the Center hosts over 800 visitors a year.

An informatics staff is on-site to support resident and visiting scientists' needs in high-performance computing, electronic collaboration, scientific software and databases; this includes custom software development for a limited number of high-impact projects. NESCent's informatics training program includes a rotating series of open-application summer courses, ad-hoc short courses for resident scientists, and remote internships (including past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will provide oversight to the Center's training activities. The incumbent will also serve as the interface between scientists and software developers at NESCent. The position provides extensive opportunities for collaboration and intellectual engagement with both NESCent-sponsored scientists and informatics staff; however, this is not an independent research position. The incumbent will report to the Director, while overseeing the work of a small informatics team and coordinating activities among the Center's science, education and informatics programs.

Responsibilities:

* 50% - Consult with sponsored scientists (including scientists in residence and working group participants) about informatics resources and needs.
Manage software product development by gathering requirements from scientists, participating in conceptual design, monitoring implementation progress and product quality, facilitating communication between software developers and scientists, and researching software solutions.

* 25% - Oversee NESCent's course curriculum by identifying opportunities for onsite or online informatics courses that satisfy demand for advanced training of resident and visiting scientists, recruiting instructors, providing guidance to instructors in developing course syllabi, coordinating logistical and technical support requirements, conducting assessments, and serving as a liaison to course organizers at other institutions.

* 25% - Assist in the management of NESCent's summer informatics intern program by coordinating the recruitment, application & review process for students, communicating expectations to students and mentors, monitoring student progress, documenting student outcomes, and performing assessments.

Education:
Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary biology, or an equivalent combination of relevant education and/or experience.

Experience:
Required: Excellent communication, interpersonal, and organizational skills. Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open source software. Organization, coordination, development and delivery of courses and workshops appropriate for graduate-level participants.

Terms of Employment:
Salary will be competitive and commensurate with experience. As a full-time employee, the incumbent will receive Duke University's benefits package (http://hr.duke.edu/benefits/main.html). The position is available immediately and will remain open until filled. The position is currently funded through November 2014, contingent on annual renewal of the Center by the NSF.
How to Apply: Please send a C.V., including contact information for three references, and a brief statement of interest to Allen Rodrigo, Director, NESCent, at a.rodrigo at nescent.org. Inquiries about suitability for the position are welcome. Duke University is an Equal Opportunity/Affirmative Action employer. Additional information about NESCent: http://www.nescent.org References: Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The Continuing Promise of Evolutionary Synthesis. Evolution. http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x From hartzell at alerce.com Tue Mar 16 19:35:13 2010 From: hartzell at alerce.com (George Hartzell) Date: Tue, 16 Mar 2010 16:35:13 -0700 Subject: [Bioperl-l] What's to depend on for BioPerl-run version check Message-ID: <19360.5553.985550.996751@gargle.gargle.HOWL> Apologies if this is as silly of a question as it seems, I think that I must just be decaffeinated this morning.... I'm cleaning up some modules and would like to express a dependency on BioPerl-run version 1.6.1. For the main bioperl I use Bio::Root::Version and 1.006001. That works, although the course of investigating below I found that Bio::Root::RootI (which uses BR::Version) doesn't. A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) use Bio::Root::Version and thereby acquire a reasonable version number but: a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency when I want bioperl-run c) it's funny that PiseWorkflow uses Bio::Root::Version (which imports a $VERSION into it's package) then goes on to set one itself. b) there's something hinky going on, when I do 'perl Build.PL' on my Task it doesn't think that PiseWorkflow is up to date (it thinks I have version (0) if I understand correctly), but when I './Build installdeps' everything appears up to date. 
It looks like the trickiness of assigning $Bio::Root::Version::VERSION to $VERSION confuses Module::Build::ModuleInfo::_evaluate_version_line and the result is that VERSION appears to be 0. What's The Right Thing to do? Thanks, g. From maj at fortinbras.us Wed Mar 17 10:41:00 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 17 Mar 2010 10:41:00 -0400 Subject: [Bioperl-l] What's to depend on for BioPerl-run version check In-Reply-To: <19360.5553.985550.996751@gargle.gargle.HOWL> References: <19360.5553.985550.996751@gargle.gargle.HOWL> Message-ID: I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ ----- Original Message ----- From: "George Hartzell" To: "bioperl-l List" Sent: Tuesday, March 16, 2010 7:35 PM Subject: [Bioperl-l] What's to depend on for BioPerl-run version check > > Apologies if this is as silly of a question as it seems, I think that > I must just be decaffeinated this morning.... > > I'm cleaning up some modules and would like to express a dependency on > BioPerl-run version 1.6.1. > > For the main bioperl I use Bio::Root::Version and 1.006001. That > works, although the course of investigating below I found that > Bio::Root::RootI (which uses BR::Version) doesn't. > > A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) > use Bio::Root::Version and thereby acquire a reasonable version number > but: > > a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency > when I want bioperl-run > c) it's funny that PiseWorkflow uses Bio::Root::Version (which > imports a $VERSION into it's package) then goes on to set one > itself. > b) there's something hinky going on, when I do 'perl Build.PL' on my > Task it doesn't think that PiseWorkflow is up to date (it thinks > I have version (0) if I understand correctly), but when I > './Build installdeps' everything appears up to date. 
>
> It looks like the trickiness of assigning
> $Bio::Root::Version::VERSION to $VERSION confuses
> Module::Build::ModuleInfo::_evaluate_version_line and the result
> is that VERSION appears to be 0.
>
> What's The Right Thing to do?
>
> Thanks,
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From janine.arloth at googlemail.com Mon Mar 15 04:15:50 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Mon, 15 Mar 2010 09:15:50 +0100
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID:

Hello,

is there a way to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->hit_string;)

Best regards

From cjfields at illinois.edu Wed Mar 17 11:13:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Mar 2010 10:13:20 -0500
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
In-Reply-To:
References: <19360.5553.985550.996751@gargle.gargle.HOWL>
Message-ID: <32C28662-BD24-4270-A0B6-71CEB459172C@illinois.edu>

Probably the best thing to do is set up a stub module for each of the subdistributions that contains a proper version to match against. So, for BioPerl-Run, use Bio::Run or Bio::Tools::Run; for BioPerl-DB, use Bio::DB; etc. Distribution-specific general documentation would go in those stub modules. I sort of started this with the first alphas but didn't get around to finishing it up.

Just as a footnote, the universal $VERSION thingy was set up quite a while ago, prior to perl 5.8 I believe, and doesn't play very well with $VERSION (and version.pm) on newer perl versions. Once we move beyond 1.6.x towards breaking things up we'll have to assign new VERSIONs to anything released independently on CPAN anyway, so this may eventually be a moot point.
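A minimal sketch of why a static scanner can end up reporting version 0 in this situation, assuming (as George's message suggests) that the $VERSION line is evaluated on its own, without loading whatever package it copies the value from. This is a simplified stand-in, not Module::Build's actual code; version_from_line and the Hypothetical:: package names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for a static version scanner (NOT Module::Build's
# real implementation): evaluate a single $VERSION line in a throwaway
# package, without loading anything else the module would normally use.
sub version_from_line {
    my ($line) = @_;
    my $v = eval qq{
        package Hypothetical::Scanner::Sandbox;
        no strict;
        no warnings;
        local \$VERSION;
        $line;
        \$VERSION;
    };
    return defined $v ? $v : 0;
}

# A literal version survives isolated evaluation.
print version_from_line(q{our $VERSION = '1.006001';}), "\n";    # prints 1.006001

# A version copied from another package (hypothetical, never loaded here)
# is undef in isolation, so this scanner falls back to 0.
print version_from_line(
    q{our $VERSION = $Hypothetical::Dist::Version::VERSION;}
), "\n";                                                         # prints 0
```

A stub module carrying a literal $VERSION avoids the problem, since the scanner never has to resolve a second package.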
chris The inherited $VERSION thingy was set up a while back, basically as a way of assigning a common version across BioPerl. On Mar 17, 2010, at 9:41 AM, Mark A. Jensen wrote: > I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy > to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ > ----- Original Message ----- From: "George Hartzell" > To: "bioperl-l List" > Sent: Tuesday, March 16, 2010 7:35 PM > Subject: [Bioperl-l] What's to depend on for BioPerl-run version check > > >> Apologies if this is as silly of a question as it seems, I think that >> I must just be decaffeinated this morning.... >> I'm cleaning up some modules and would like to express a dependency on >> BioPerl-run version 1.6.1. >> For the main bioperl I use Bio::Root::Version and 1.006001. That >> works, although the course of investigating below I found that >> Bio::Root::RootI (which uses BR::Version) doesn't. >> A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) >> use Bio::Root::Version and thereby acquire a reasonable version number >> but: >> a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency >> when I want bioperl-run >> c) it's funny that PiseWorkflow uses Bio::Root::Version (which >> imports a $VERSION into it's package) then goes on to set one >> itself. >> b) there's something hinky going on, when I do 'perl Build.PL' on my >> Task it doesn't think that PiseWorkflow is up to date (it thinks >> I have version (0) if I understand correctly), but when I >> './Build installdeps' everything appears up to date. >> It looks like the trickiness of assigning >> $Bio::Root::Version::VERSION to $VERSION confuses >> Module::Build::ModuleInfo::_evaluate_version_line and the result >> is that VERSION appears to be 0. >> What's The Right Thing to do? >> Thanks, >> g. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robfsouza at gmail.com Wed Mar 17 11:20:21 2010 From: robfsouza at gmail.com (robfsouza) Date: Wed, 17 Mar 2010 08:20:21 -0700 (PDT) Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> Message-ID: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> Hi Chris, Any idea when the SVN is going to be fixed? I could not find tar.gz or other download methods in github... Robson On Mar 15, 2:57?pm, Christopher Fields wrote: > Francisco, > > In general, please address any questions directly to the bioperl mail list, in case I can't respond. ? > > The anon. svn on code.open-bio.org is down at the moment. ?OBF support knows about this problem and it's being addressed. ?There is a github mirror of the repos in case this happens: > > http://github.com/bioperl > > chris > > On Mar 15, 2010, at 10:38 AM, Francisco J. Ossand?n wrote: > > > > > Hello Chris Fields, > > I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > > There has been some change in the SVN address or configuration that I should update? 
I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it. > > > I hope that you can help me with this. > > > Regards, > > > -- > > Francisco J. Ossandon > > Bioinformatician. > > Ph.D. Student, University Andres Bello. > > Center for Bioinformatics and Genome Biology, > > Fundacion Ciencia para la Vida. > > Santiago, Chile. > >www.cienciavida.cl/CBGB.htm > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From adsj at novozymes.com Wed Mar 17 12:00:34 2010 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Wed, 17 Mar 2010 17:00:34 +0100 Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> (robfsouza@gmail.com's message of "Wed, 17 Mar 2010 08:20:21 -0700 (PDT)") References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> Message-ID: <874okfsztp.fsf@topper.koldfront.dk> On Wed, 17 Mar 2010 08:20:21 -0700 (PDT), robfsouza wrote: > Any idea when the SVN is going to be fixed? I could not find tar.gz or > other download methods in github... If you don't want to "git clone http://github.com/bioperl/bioperl-live.git", you can click on the "Download source" link in the upper right corner of http://github.com/bioperl/bioperl-live and you'll get to choose between downloading tar or zip. 
Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From cjfields at illinois.edu Wed Mar 17 12:12:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Mar 2010 11:12:42 -0500
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <53EECF69-E9CE-4619-BE0A-97BE55754D8E@illinois.edu>

Janine,

How would you go about doing that from the BLAST report alone (which doesn't store the whole sequence)? Unless you know something I don't, you'll need to pull the unique identifier for the sequence from the hit object while parsing the report and grab the seq from a local or remote database (or use fastacmd or its equivalent in blast+).

chris

On Mar 15, 2010, at 3:15 AM, Janine Arloth wrote:

> Hello,
>
> exists a possibility to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->$hit_string;)
>
> Best regards
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Wed Mar 17 15:48:27 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 18 Mar 2010 08:48:27 +1300
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz>

If you're running blast locally, use fastacmd to extract the sequences from the blast database.
Eg fastacmd -d nr -s AC147927

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Janine Arloth
> Sent: Monday, 15 March 2010 9:16 p.m.
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus > > Hello, > > exists a possibility to get/extract the whole hit sequences? (Not only the > hit string from the alignment with $hsp->$hit_string;) > > Best regards > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From michael.watson at bbsrc.ac.uk Wed Mar 17 16:47:57 2010 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 17 Mar 2010 20:47:57 +0000 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> Message-ID: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb.... 
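The retrieval route discussed in this thread (take the hit identifiers from the report, then fetch the full sequences from a database that was indexed with formatdb -o T) can be sketched in plain Perl. This is only an illustration under stated assumptions: it expects tabular BLAST output (blastall -m 8), where the second column is the subject ID; unique_hit_ids, fastacmd_args, and the sample rows are made up; and it only builds the fastacmd argument list (with -s, fastacmd's search-string option) rather than running it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect unique subject IDs (column 2) from tabular
# BLAST output, preserving first-seen order.
sub unique_hit_ids {
    my ($fh) = @_;
    my (%seen, @ids);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comment and blank lines
        my @col = split /\t/, $line;
        next unless @col >= 2;
        push @ids, $col[1] unless $seen{ $col[1] }++;
    }
    return @ids;
}

# Hypothetical helper: build (but do not run) a fastacmd argument list.
sub fastacmd_args {
    my ($db, @ids) = @_;
    return ('fastacmd', '-d', $db, '-s', join(',', @ids));
}

# Made-up sample rows; a real report would be read from a file instead.
my $report = join '', map { "$_\n" } (
    "# example tabular report",
    join("\t", qw(query1 AC147927 98.50 200 3 0 1 200 501 700 1e-100 380)),
    join("\t", qw(query1 AC147927 91.00 100 9 0 300 399 900 999 1e-30 150)),
    join("\t", qw(query1 XX000001 88.00 150 18 0 10 159 20 169 1e-40 170)),
);

open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my @ids = unique_hit_ids($fh);
close $fh;

print join(' ', fastacmd_args('nr', @ids)), "\n";
# prints: fastacmd -d nr -s AC147927,XX000001
```

Handing the returned list to system() in list form would avoid shell quoting problems with unusual identifiers.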
________________________________________ From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell [Russell.Smithies at agresearch.co.nz] Sent: 17 March 2010 19:48 To: 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus If you're running blast locally, use fastacmd to extract the sequences from the blast database. Eg fastacmd -d nr -S AC147927 Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Janine Arloth > Sent: Monday, 15 March 2010 9:16 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus > > Hello, > > exists a possibility to get/extract the whole hit sequences? (Not only the > hit string from the alignment with $hsp->$hit_string;) > > Best regards > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Wed Mar 17 17:07:29 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 18 Mar 2010 10:07:29 +1300 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> Precompiled databases from NCBI are built with "-o T" but when building them yourself, the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our to retrieve sequences for all your blast hits. Here's an example of our sequence retrieval: https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw --Russell > -----Original Message----- > From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] > Sent: Thursday, 18 March 2010 9:48 a.m. > To: Smithies, Russell; 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' > Subject: RE: [Bioperl-l] SearchIO, StandAloneBlastPlus > > I think that relies on the blast database being built with the "-o T" > option, which is not the default for formatdb.... 
> ________________________________________ > From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Smithies, Russell > [Russell.Smithies at agresearch.co.nz] > Sent: 17 March 2010 19:48 > To: 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus > > If you're running blast locally, use fastacmd to extract the sequences > from the blast database. > Eg fastacmd -d nr -S AC147927 > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Janine Arloth > > Sent: Monday, 15 March 2010 9:16 p.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus > > > > Hello, > > > > exists a possibility to get/extract the whole hit sequences? (Not only > the > > hit string from the alignment with $hsp->$hit_string;) > > > > Best regards > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Wed Mar 17 17:53:38 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 18 Mar 2010 10:53:38 +1300 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A72BD@exchsth.agresearch.co.nz> It's all a bit complicated as this page is on a public site but our blast server is internal and restricted so there's no direct communication between them. The public site takes the data from the blast requect and writes it to a template file then puts it in a folder that the internal blast server checks every 10 seconds. When a new request is found, it does the blast , creates the image and map with Bio::Graphics, then transfers it to a folder on the public server. As a sneaky bodge so I don't have to transfer the image, it's base64 encoded in the html then stripped out later. The blast result page keeps refreshing until it sees the required result has returned then displays the page. 
It sounds a bit odd but as blast runs on one of our main servers, we don't want anyone to be able to "accidently" run commands on it - no one has hacked our servers yet :) There's some good stuff in the BioPerl howtos http://www.bioperl.org/wiki/HOWTO:Graphics and http://www.bioperl.org/wiki/HOWTO:SearchIO Bio::SearchIO::Writer::HTMLResultWriter can be quite useful though ours is html-ized 'manually' as it's streamed through a post-processing script. --Russell From: Janine Arloth [mailto:janine.arloth at googlemail.com] Sent: Thursday, 18 March 2010 10:33 a.m. To: Smithies, Russell Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus Thank you very much. Can I ask you, how you get the figure in the blast output (blastmap)? I use use Bio::Graphics; But i did not see how to create this figure? Best Regards Am 17.03.2010 um 22:07 schrieb Smithies, Russell: Precompiled databases from NCBI are built with "-o T" but when building them yourself, the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our to retrieve sequences for all your blast hits. Here's an example of our sequence retrieval: https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw --Russell -----Original Message----- From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] Sent: Thursday, 18 March 2010 9:48 a.m. To: Smithies, Russell; 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' Subject: RE: [Bioperl-l] SearchIO, StandAloneBlastPlus I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb.... 
________________________________________ From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open- bio.org] On Behalf Of Smithies, Russell [Russell.Smithies at agresearch.co.nz] Sent: 17 March 2010 19:48 To: 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus If you're running blast locally, use fastacmd to extract the sequences from the blast database. Eg fastacmd -d nr -S AC147927 Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- bounces at lists.open-bio.org] On Behalf Of Janine Arloth Sent: Monday, 15 March 2010 9:16 p.m. To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus Hello, exists a possibility to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->$hit_string;) Best regards _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From armendarez77 at hotmail.com Thu Mar 18 12:27:20 2010 From: armendarez77 at hotmail.com (armendarez77 at hotmail.com) Date: Thu, 18 Mar 2010 09:27:20 -0700 Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter Message-ID: Hello, I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (include Bio::DB::RefSeq, Bio::DB::GenBank) via crontab, however the web filter requires me to log in every 8 hours. The administrator removed the filter however, my scripts still failed. I then logged into iPrism and the scripts worked. The system administrators say its the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast. I would include the script, but it's a bit involved and passes arguments to other scripts. Thank you, Veronica _________________________________________________________________ Hotmail: Trusted email with powerful SPAM protection. http://clk.atdmt.com/GBL/go/210850553/direct/01/ From cjfields at illinois.edu Thu Mar 18 13:21:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Mar 2010 12:21:22 -0500 Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter In-Reply-To: References: Message-ID: Veronica, No caching occurs that I know of. If you have a environment proxy set somehow it will use that, using LWP::UserAgent and env_proxy() (your logging in via iPrism makes me think it is something along those lines). Otherwise the proxy has to be explicitly set for each object, so no caching is apparent. Could you have a local environment proxy set that you're unaware of? 
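One quick way to check for an environment proxy is to list the variables that LWP::UserAgent's env_proxy() would pick up. This is only a minimal sketch in core Perl; the helper name `proxy_settings` is made up for illustration and is not part of BioPerl or LWP. It is worth running both from a login shell and from the cron job, since the two environments can differ.

```perl
use strict;
use warnings;

# Return the proxy-related entries of an environment hash, i.e. the
# variables whose names end in "_proxy" (http_proxy, ftp_proxy, no_proxy, ...).
# NOTE: proxy_settings() is a hypothetical helper, not a BioPerl/LWP call.
sub proxy_settings {
    my (%env) = @_;
    my %found;
    for my $name (keys %env) {
        $found{$name} = $env{$name} if $name =~ /_proxy\z/i;
    }
    return %found;
}

# Report what the current process can see.
my %found = proxy_settings(%ENV);
if (%found) {
    print "$_=$found{$_}\n" for sort keys %found;
}
else {
    print "no *_proxy variables set\n";
}
```

If the list differs between the interactive run and the cron run, that difference is a reasonable first thing to investigate.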
See here for examples: http://search.cpan.org/~gaas/libwww-perl-5.834/lib/LWP/UserAgent.pm#Proxy_attributes You could try something like this after you create the instances, which accesses the LWP::UserAgent instance cached in the relevant class and shuts off proxies: $db->ua->no_proxy(); Otherwise, you can try coming up with a minimal test case indicating what happens (including any output) and file a bug report, just in case. chris On Mar 18, 2010, at 11:27 AM, wrote: > > Hello, > > I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (include Bio::DB::RefSeq, Bio::DB::GenBank) via crontab, however the web filter requires me to log in every 8 hours. The administrator removed the filter however, my scripts still failed. I then logged into iPrism and the scripts worked. > > The system administrators say its the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast. > > I would include the script, but it's a bit involved and passes arguments to other scripts. > > Thank you, > > Veronica > > > > _________________________________________________________________ > Hotmail: Trusted email with powerful SPAM protection. > http://clk.atdmt.com/GBL/go/210850553/direct/01/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 18 17:11:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 18 Mar 2010 14:11:34 -0700 Subject: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <4BA29706.8040606@cornell.edu> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! 
GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! Rob Buels OBF GSoC 2010 Administrator From me at miguel.weapps.com Thu Mar 18 19:33:16 2010 From: me at miguel.weapps.com (Luis M Rodriguez-R) Date: Thu, 18 Mar 2010 18:33:16 -0500 Subject: [Bioperl-l] GSoC-2010 & the semantic web Message-ID: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com> Hello all, I would like to know how to apply to the GSoC-2010, and when it is planned to be performed. I think there are great development opportunities in information discovery using semantic web (I'm familiar with RDF in bio2rdf, uniprot and some onthologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, specially thinking in the implementation of SPARQL for a discoverable "bio-cloud". The people of bio2rdf already have some parsers, but there are still a lot of things to do. Best regards, Luis. Luis M. 
Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com

From rhythmbox-devel at maubp.freeserve.co.uk Thu Mar 18 20:25:05 2010
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Mar 2010 00:25:05 +0000
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <320fb6e01003181725j2aa1268am80ae7649bd873b46@mail.gmail.com>

On Thu, Mar 18, 2010 at 11:33 PM, Luis M Rodriguez-R wrote:
>
> I think there are great development opportunities in information
> discovery using semantic web (I'm familiar with RDF in bio2rdf,
> uniprot and some onthologies, ...

Have a read of the wiki pages from this recent hackathon - it should be
of interest to you:

http://hackathon3.dbcls.jp/

Peter

From cjfields at illinois.edu Thu Mar 18 20:29:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Mar 2010 19:29:19 -0500
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <0FADD2C6-9458-4E0C-ADB5-E4C0F18A79D8@illinois.edu>

Luis,

See this page for the specifics:

http://www.open-bio.org/wiki/Google_Summer_of_Code

There are several proposed projects already listed; feel free to add yours to the page. I'm assuming these will be OBF-focused, so tying your proposal to one of the OBF projects is probably a good idea.

chris

On Mar 18, 2010, at 6:33 PM, Luis M Rodriguez-R wrote:

> Hello all,
>
> I would like to know how to apply to the GSoC-2010, and when it is planned to be performed.
>
> I think there are great development opportunities in information discovery using semantic web (I'm familiar with RDF in bio2rdf, uniprot and some onthologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, specially thinking in the implementation of SPARQL for a discoverable "bio-cloud".
>
> The people of bio2rdf already have some parsers, but there are still a lot of things to do.
>
> Best regards,
> Luis.
>
> Luis M. Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
>
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From ross at cuhk.edu.hk Sat Mar 20 19:55:35 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 21 Mar 2010 07:55:35 +0800
Subject: [Bioperl-l] automation of translation based on alignment
Message-ID: <002c01cac888$d570fe20$8052fa60$@edu.hk>

Dear bioperl users,

I am working on virus sequences, and one of the GenBank files is here:

http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum

With 1000 such nucleotide sequences, I'd like to translate the corresponding protein coding sequences. The difficulties lie in:

1) The genome sequence is circular
2) The genes are overlapping

I don't have all the 1000 GenBank files, but I plan to use the one above as a guide to direct the automation process. Has bioperl implemented specialized functions to handle this kind of problem?
Thanks a lot for your advice, Ross From florent.angly at gmail.com Sun Mar 21 20:44:11 2010 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 22 Mar 2010 10:44:11 +1000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <002c01cac888$d570fe20$8052fa60$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> Message-ID: <4BA6BD5B.9010509@gmail.com> Hi Ross, It seems like your answer is in the link you put. On this link, all the coding sequences are already identified and their aminoacid sequence provided. You simply need to parse all the GenBank entries to extract this information. You may use EUtilities to achieve this online: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook Florent On 21/03/10 09:55, Ross KK Leung wrote: > Dear bioperl users, > > > > I am working on virus sequences and one of the Genbank file is here: > > > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 > tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > m > > > > with 1000 such nucleotide sequences, I'd like to translate the corresponding > protein coding sequences. The difficulties lie in: > > > > 1) The genome sequence is circular > > 2) The genes are overlapping > > > > I don't have all the 1000 Genbank files but I plan to use the above guide > one to direct the automation process. Has bioperl implemented specialized > functions to handle this kind of problem? 
> > > > Thanks a lot for your advice, Ross > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Sun Mar 21 21:14:27 2010 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 22 Mar 2010 11:14:27 +1000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <004d01cac95c$15c95250$415bf6f0$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> Message-ID: <4BA6C473.4090404@gmail.com> Hi Ross, Please keep relies on the BioPerl mailing list so that everyone benefits. You should give detailed explanations of what you are tying to achieve., e.g.: * What type of input file do you have? * Do you already know the location of the ORFs? * what is the multiple alignments you are talking about ... Florent On 22/03/10 11:07, Ross KK Leung wrote: > Dear Florent, > > Thanks for your response. While the one with Genbank file can be extracted, > those without have to rely on alignment. Scripts certainly can be written to > move forward and backward on the multiple alignment but it is an error-prone > process and that's why I raised this question. > > Rgds, Ross > > > > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Monday, March 22, 2010 8:44 AM > To: Ross KK Leung > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] automation of translation based on alignment > > Hi Ross, > It seems like your answer is in the link you put. On this link, all the > coding sequences are already identified and their aminoacid sequence > provided. You simply need to parse all the GenBank entries to extract > this information. 
You may use EUtilities to achieve this online: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > Florent > > On 21/03/10 09:55, Ross KK Leung wrote: > >> Dear bioperl users, >> >> >> >> I am working on virus sequences and one of the Genbank file is here: >> >> >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> >> > >> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> >> >> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > >> m >> >> >> >> with 1000 such nucleotide sequences, I'd like to translate the >> > corresponding > >> protein coding sequences. The difficulties lie in: >> >> >> >> 1) The genome sequence is circular >> >> 2) The genes are overlapping >> >> >> >> I don't have all the 1000 Genbank files but I plan to use the above guide >> one to direct the automation process. Has bioperl implemented specialized >> functions to handle this kind of problem? >> >> >> >> Thanks a lot for your advice, Ross >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From ross at cuhk.edu.hk Sun Mar 21 21:22:47 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 09:22:47 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <4BA6C473.4090404@gmail.com> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> Message-ID: <004e01cac95e$2e375f10$8aa61d30$@edu.hk> Dear Florent, Sorry for mis-clicking "reply" instead of "reply-all". Here are my problem details: Input: 1000 multiple aligned DNA sequences One of them has Genbank file http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 the remaining 999 ones only have genomic sequences. Objective: to derive the cognate protein aligned sequences. 
(here have 4 sets as there are 4 overlapping genes) Difficulties: 1) circular genome 2) there may be in-dels Hope now the problem has been clarified, Ross -----Original Message----- From: Florent Angly [mailto:florent.angly at gmail.com] Sent: Monday, March 22, 2010 9:14 AM To: Ross KK Leung; bioperl-l List Subject: Re: [Bioperl-l] automation of translation based on alignment Hi Ross, Please keep relies on the BioPerl mailing list so that everyone benefits. You should give detailed explanations of what you are tying to achieve., e.g.: * What type of input file do you have? * Do you already know the location of the ORFs? * what is the multiple alignments you are talking about ... Florent On 22/03/10 11:07, Ross KK Leung wrote: > Dear Florent, > > Thanks for your response. While the one with Genbank file can be extracted, > those without have to rely on alignment. Scripts certainly can be written to > move forward and backward on the multiple alignment but it is an error-prone > process and that's why I raised this question. > > Rgds, Ross > > > > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Monday, March 22, 2010 8:44 AM > To: Ross KK Leung > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] automation of translation based on alignment > > Hi Ross, > It seems like your answer is in the link you put. On this link, all the > coding sequences are already identified and their aminoacid sequence > provided. You simply need to parse all the GenBank entries to extract > this information. 
You may use EUtilities to achieve this online: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > Florent > > On 21/03/10 09:55, Ross KK Leung wrote: > >> Dear bioperl users, >> >> >> >> I am working on virus sequences and one of the Genbank file is here: >> >> >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> >> > >> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> >> >> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > >> m >> >> >> >> with 1000 such nucleotide sequences, I'd like to translate the >> > corresponding > >> protein coding sequences. The difficulties lie in: >> >> >> >> 1) The genome sequence is circular >> >> 2) The genes are overlapping >> >> >> >> I don't have all the 1000 Genbank files but I plan to use the above guide >> one to direct the automation process. Has bioperl implemented specialized >> functions to handle this kind of problem? >> >> >> >> Thanks a lot for your advice, Ross >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From cjfields at illinois.edu Sun Mar 21 23:40:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 21 Mar 2010 22:40:34 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <004e01cac95e$2e375f10$8aa61d30$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> Message-ID: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote: > Dear Florent, > > Sorry for mis-clicking "reply" instead of "reply-all". 
Here are my problem
> details:
>
> Input:
>
> 1000 multiple aligned DNA sequences
> One of them has Genbank file
> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1
>
> the remaining 999 ones only have genomic sequences.
>
> Objective: to derive the cognate protein aligned sequences. (here have 4
> sets as there are 4 overlapping genes)
>
> Difficulties:
> 1) circular genome
> 2) there may be in-dels

To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pulling sub-alignments out from there, then translating each sequence. There would be no need to re-retrieve sequences which are already present in the alignment, unless there is something not mentioned above that I'm missing.

Re: circular genomes: recent commits to bioperl should allow handling circular genomes with features and subsequence extraction. If not, I would consider that a serious bug that needs to be reported.

If you need to grab remote sequences from a larger set of sequences (either locally or remotely) and translate them, you can use Bio::DB::GenBank, which will directly return a Bio::Seq object. Note you would obviously have to reset these per ID based on the start/end/strand:

    my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                                   -seq_start => 100,
                                   -seq_stop  => 200,
                                   -strand    => 1);
    my $seqobj = $gb->get_Seq_by_id($id); # or get_Seq_by_acc($acc)
    # do any preprocessing here...
    my $protein_seqobj = $seqobj->translate;

If you want, you could also download the sequences and use one of the various flatfile database classes to work with them (I believe Bio::DB::Fasta extracts subsequences very rapidly). It might be faster. For those regions that cross the origin you may need to pull two sequences and join them somehow, as the sequences likely won't run a join automatically.
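The coordinate-conversion idea above (reference feature coordinates to alignment columns, then one slice per aligned sequence) can be sketched in core Perl. This is only an illustration under stated assumptions: the helper names `column_for_residue` and `ungapped_region`, the toy alignment, and the CDS coordinates are all invented here, gaps are assumed to be written as '-', and no BioPerl classes are used.

```perl
use strict;
use warnings;

# Map a 1-based residue number of a gapped sequence to its 1-based
# alignment column. Gaps are assumed to be '-' characters.
# (Hypothetical helper for illustration, not a BioPerl method.)
sub column_for_residue {
    my ($aligned, $residue) = @_;
    my $seen = 0;
    for my $col (1 .. length $aligned) {
        next if substr($aligned, $col - 1, 1) eq '-';
        return $col if ++$seen == $residue;
    }
    die "residue $residue is beyond the end of the sequence\n";
}

# Cut columns start_col..end_col (inclusive) out of one aligned row
# and remove the gap characters, leaving plain sequence.
sub ungapped_region {
    my ($aligned, $start_col, $end_col) = @_;
    my $piece = substr($aligned, $start_col - 1, $end_col - $start_col + 1);
    $piece =~ tr/-//d;
    return $piece;
}

# Toy alignment: the reference row is the one with known CDS coordinates.
my %alignment = (
    reference => 'ATG---AAACCCTAG',
    isolate_1 => 'ATGGGGAAACCCTAG',
    isolate_2 => 'ATG---AAA---TAG',
);

# Hypothetical CDS coordinates on the ungapped reference sequence.
my ($cds_start, $cds_end) = (1, 12);

my $start_col = column_for_residue($alignment{reference}, $cds_start);
my $end_col   = column_for_residue($alignment{reference}, $cds_end);

for my $name (sort keys %alignment) {
    print "$name\t", ungapped_region($alignment{$name}, $start_col, $end_col), "\n";
}
```

Each extracted string could then be wrapped in a sequence object and translated. Rows whose extracted length is not a multiple of three point to frame-shifting in-dels and probably deserve a manual look.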
> Hope now the problem has been clarified, Ross Hope this helps. chris From ross at cuhk.edu.hk Mon Mar 22 01:30:06 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 13:30:06 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> Message-ID: <006901cac980$bb60f190$3222d4b0$@edu.hk> Dear Chris, It seems that Bioperl is "clever" enough to "rectify" my start and stop by reversing the order. e.g. start = 2300 stop = 1600 It will reverse back to 1600 and then 2300. What else to tell that I'm now working on a circular genome? -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Monday, March 22, 2010 11:41 AM To: Ross KK Leung Cc: 'Florent Angly'; 'bioperl-l List' Subject: Re: [Bioperl-l] automation of translation based on alignment On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote: > Dear Florent, > > Sorry for mis-clicking "reply" instead of "reply-all". Here are my problem > details: > > Input: > > 1000 multiple aligned DNA sequences > One of them has Genbank file > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 > > the remaining 999 ones only have genomic sequences. > > Objective: to derive the cognate protein aligned sequences. (here have 4 > sets as there are 4 overlapping genes) > > Difficulties: > 1) circular genome > 2) there may be in-dels To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pull sub-alignments out from there, then translate each sequence. 
There would be no need to re-retrieve sequences which are already present in the alignment, unless there is something not mentioned above that I'm missing. Re: circular genomes: recent commits to bioperl should allow handling circular genomes with features and subsequence extraction. If not I would consider that a serious bug that needs to be reported. If you need to grab remote sequences from a larger set of sequences (either locally or remotely) and translate them, you can use Bio::DB::GenBank, which will directly return a Bio::Seq object. Note you would obviously have to reset these per ID based on the start/end/strand: my $gb = Bio::DB::GenBank->new(-format => 'Fasta', -seq_start => 100, -seq_stop => 200, -strand => 1); my $seqobj = $gb->get_Seq_by_id($id); # or get_Seq_by_acc($acc) # do any preprocessing here... my $protein_seqobj = $seq->translate; If you want you could also download the sequences and use one of the various flatfile database classes to work with them (I believe Bio::DB::Fasta extracts subsequences very rapidly). It might be faster. For those regions that cross the origin you may need to pull two sequences and join them somehow, as the sequences likely won't run a join automatically. > Hope now the problem has been clarified, Ross Hope this helps. 
chris From cjfields at illinois.edu Mon Mar 22 08:58:00 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 07:58:00 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <006901cac980$bb60f190$3222d4b0$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> Message-ID: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote: > Dear Chris, > > It seems that Bioperl is "clever" enough to "rectify" my start and stop by > reversing the order. > > e.g. > start = 2300 > stop = 1600 > > It will reverse back to 1600 and then 2300. > What else to tell that I'm now working on a circular genome? Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted: ... gene join(2307..3215,1..1623) /gene="P" ... This would be represented as a Bio::Location::SplitLocation in the feature; it would joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence. 
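To make "joined based on that order" concrete, here is a toy sketch in core Perl of what concatenating the segments of a join(...) location means for a feature that crosses the origin. It is not the BioPerl implementation: `parse_join` and `splice_segments` are hypothetical helpers, only forward-strand segments are handled, and the 12-base "genome" is made up.

```perl
use strict;
use warnings;

# Turn a forward-strand "join(a..b,c..d)" location string into a list
# of [start, end] pairs, in the order written.
# (Hypothetical helper; complement() and fuzzy locations are not handled.)
sub parse_join {
    my ($location) = @_;
    my @segments;
    while ($location =~ /(\d+)\.\.(\d+)/g) {
        push @segments, [ $1, $2 ];
    }
    return @segments;
}

# Concatenate 1-based, inclusive segments of a sequence in the order given.
sub splice_segments {
    my ($genome, @segments) = @_;
    my $out = '';
    for my $seg (@segments) {
        my ($start, $end) = @$seg;
        die "bad segment $start..$end\n"
            if $start < 1 || $end > length($genome) || $start > $end;
        $out .= substr($genome, $start - 1, $end - $start + 1);
    }
    return $out;
}

# Toy circular genome: the feature starts near the end and wraps
# around past position 1, like join(2307..3215,1..1623) above.
my $genome  = 'AAACCCGGGTTT';
my $feature = 'join(10..12,1..3)';

print splice_segments($genome, parse_join($feature)), "\n";
```

Because the segments are taken in the order listed, the part before the origin comes first and the part after it second, which is the order needed before translating.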
chris

From ross at cuhk.edu.hk Mon Mar 22 09:17:05 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Mon, 22 Mar 2010 21:17:05 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>
Message-ID: <011701cac9c1$f7b89260$e729b720$@edu.hk>

Chris,

The following code is what I use to retrieve sequences from GenBank. I know that I can use something like:

for my $feature ($seqobj->get_SeqFeatures){
    if ($feature->primary_tag eq "CDS") {
        ...

to get features, but how should Bio::Location::SplitLocation be used? Do you mean something like:

if ($seq->is_circular()) {
    spliced_seq();
}

? But the genome has several such spliced sequences, so how can I specify which one to retrieve?

Thanks for your advice again~

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank;

# arguments: ACCESSION START STOP
my ($acc, $start, $stop) = @ARGV;

# fetch only the requested region, forward strand, as FASTA
my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                               -seq_start => $start,
                               -seq_stop  => $stop,
                               -strand    => 1);

my $seq = $gb->get_Seq_by_acc($acc);
print "seq is ", $seq->seq, "\n";

# write the fragment to ACCESSION.fa
my $seqio_obj = Bio::SeqIO->new(-file => ">$acc.fa", -format => 'fasta');
$seqio_obj->write_seq($seq);
exit;

-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: Monday, March 22, 2010 8:58 PM
To: Ross KK Leung
Cc: 'Florent Angly'; 'bioperl-l List'
Subject: Re: [Bioperl-l] automation of translation based on alignment

On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:

> Dear Chris,
>
> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
> reversing the order.
> > e.g. > start = 2300 > stop = 1600 > > It will reverse back to 1600 and then 2300. > What else to tell that I'm now working on a circular genome? Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted: ... gene join(2307..3215,1..1623) /gene="P" ... This would be represented as a Bio::Location::SplitLocation in the feature; it would joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence. chris From jessica.sun at gmail.com Mon Mar 22 14:48:38 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 22 Mar 2010 14:48:38 -0400 Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener Message-ID: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> Does any know how to get CDS of the corresponding mRNA accession(NM_) using this function? *Bio::SeqFeature::Tools::Unflattener many thanks in advance. * -- Jessica Jingping Sun From cjfields at illinois.edu Mon Mar 22 14:56:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 13:56:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq() Message-ID: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> I have just noticed that spliced_seq() is borked with Bio::DB::SeqFeature and am thinking about implementing it. Or is similar functionality already implemented elsewhere? Currently, it is calling entire_seq(), which I plan on avoiding simply to prevent sucking in the entire sequence into memory. 
This is currently what happens: --------------------------- my $it = $store->get_seq_stream(-type => 'mRNA'); my $ct = 0; while (my $sf = $it->next_seq) { my $seq = $sf->spliced_seq; # dies with exception } --------------------------- ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented by package Bio::DB::SeqFeature. This is not your fault - author of Bio::DB::SeqFeature should be blamed! STACK: Error::throw STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368 STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739 STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325 STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458 STACK: beestore.pl:17 ---------------------------------------------------------------- chris From csembry at ualr.edu Mon Mar 22 15:48:56 2010 From: csembry at ualr.edu (Charles Embry) Date: Mon, 22 Mar 2010 14:48:56 -0500 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista Message-ID: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> I want to create a Gui that will use current bioperl modules(along with some I am writing). It will be on a windows machine that runs XP and maybe a laptop with Vista.(this is a project i am working on in Graduate school for a professor). It will be id'ing promoter types in eukaryote organisms and also do multiple alignments. What recommendations do yo suggest to use t develop this? A java application? If so how hard is it to get Java to use perl and bioperl modules? Another language? Is there a tool to directly develop a GUI for bioperl modules that does no use another language? I will need to tag certain sequences with user specified colors and such. 
Thanks for the help From cjfields at illinois.edu Mon Mar 22 16:20:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 15:20:24 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <011701cac9c1$f7b89260$e729b720$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> <011701cac9c1$f7b89260$e729b720$@edu.hk> Message-ID: On Mar 22, 2010, at 8:17 AM, Ross KK Leung wrote: > Chris, > > The following codes are what I use to retrieve sequences from GenBank. I > know that I can use something like: > > for my $feature ($seqobj->get_SeqFeatures){ > > if ($feature->primary_tag eq "CDS") { > ... > > To get features, but how should > > Bio::Location::SplitLocation > > be used? Do you mean something like: > > If ($seq->is_circular()) { > spliced_seq(); > } You probably won't directly see the SplitLocation itself unless you explicitly request it (it is contained in the sequence feature). Okay, so if you are trying to retrieve the sequence for a specific feature, you can use $sf->seq() (simple subsequence from start to end corrected for strand of feature). However, in the case where the feature crosses the origin it will contain a split location. In this case, you should call $sf->spliced_seq() to retrieve spliced sequence. For convenience, you could call spliced_seq on all sequence features; for simple locations it will just return the ordinary subseq(). So, if one had a generic sequence feature, one could call: $sf->spliced_seq->translate; to get the Bio::Seq object that is the translation of the seq feature region. > ? But the genome indeed has several such spliced sequences then how can I > specify which is to retrieve? 
Thanks for your advice again~ Do you mean alternatively spliced variants? These would be designated as separate features in a GenBank file, so you would check for those. Otherwise you'll have to clarify. If you haven't read them yet I suggest looking over the HOWTOs, specifically ones covering Seq/SeqIO and Feature/Annotation to get an idea of what is possible. chris > #!/usr/bin/perl > > use Bio::SeqIO::genbank; use Bio::DB::GenBank; > > use Bio::DB::RefSeq; > > > > $gb = new Bio::DB::GenBank; > > > > my ($acc, $start, $stop) = @ARGV; > > > > my $gb = Bio::DB::GenBank->new(-format => 'Fasta', > > -seq_start => "$start", > > -seq_stop => "$stop", > > -strand => 1); > > > > $gbout = $acc; > > > > $seq = $gb->get_Seq_by_acc($acc); > > print "seq is ", $seq->seq, "\n"; > > > > $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' ); > > $seqio_obj->write_seq($seq); > > exit; > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Monday, March 22, 2010 8:58 PM > To: Ross KK Leung > Cc: 'Florent Angly'; 'bioperl-l List' > Subject: Re: [Bioperl-l] automation of translation based on alignment > > On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote: > >> Dear Chris, >> >> It seems that Bioperl is "clever" enough to "rectify" my start and stop by >> reversing the order. >> >> e.g. >> start = 2300 >> stop = 1600 >> >> It will reverse back to 1600 and then 2300. >> What else to tell that I'm now working on a circular genome? > > Reverse it where, the alignment or the feature? The svn version of BioPerl, > for alignments, retains strand information (this was a bug that was fixed). > For features, start is always less than end, with directionality determined > by strand. For a circular genome, the feature is split across the origin, > as you have seen in the original sequence you posted: > > ... > gene join(2307..3215,1..1623) > /gene="P" > ... 
> > > This would be represented as a Bio::Location::SplitLocation in the feature; > it would joined based on that order if $seq->is_circular() is true (or at > least it should). In cases like this, the safe bet is to call spliced_seq() > to get the joined sequence in all cases, then call translate() to get the > protein sequence. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Mon Mar 22 16:23:50 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 09:23:50 +1300 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E8829C2@exchsth.agresearch.co.nz> I guess it depends on how complex you need your GUI. If you only need a few a few menus, input fields, buttons, and are getting text or images as output then I'd stick to a simple web interface. You could tart it up a bit with Dojo or YUI libraries so it didn't look like every other webpage. If you need something more complex, you could give TK a go but I'm not sure how good it is and it will look a bit dated. If you're going to write the GUI in Swing, try Inline::Java and Java::Swing - take a look here: http://www.perlmonks.org/?node_id=372197 It may be easier to call Perl from Java so take a look at PLJava http://search.cpan.org/~gmpassos/PLJava-0.04/README.pod I haven't tried a Java GUI for Perl yet - we tend to use web interfaces for scripts that are going to get used by the "public" (i.e. scientists, not developers). We've found Mobyle http://bioweb2.pasteur.fr/projects/mobyle/ to be a nice way to get something up fairly quickly and it keep a consistent look to all our scripts. 
Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809  F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Charles Embry
> Sent: Tuesday, 23 March 2010 8:49 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
>
> I want to create a Gui that will use current bioperl modules(along with
> some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista.(this is a project i am working on in Graduate school
> for
> a professor). It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do yo suggest to use t develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does no use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited.
If you have received this message in error, please notify the sender immediately. ======================================================================= From jason at bioperl.org Mon Mar 22 16:26:15 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 22 Mar 2010 13:26:15 -0700 Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq() In-Reply-To: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> Message-ID: <4BA7D267.6050704@bioperl.org> Yes it needs a special case I guess - since spliced_seq should work, however ... The only problem is that if both exons and CDS are sub-features you have to be smart enough to not grab both... So I have just relied on specialized dumping scripts for gff3_to_cds for my own needs (i.e. http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl ). But you might also see what the Gbrowse plugin dumpers do. -jason Chris Fields wrote, On 3/22/10 11:56 AM: > I have just noticed that spliced_seq() is borked with > Bio::DB::SeqFeature and am thinking about implementing it. Or is > similar functionality already implemented elsewhere? > > Currently, it is calling entire_seq(), which I plan on avoiding simply > to prevent sucking in the entire sequence into memory. This is > currently what happens: > > > --------------------------- > > my $it = $store->get_seq_stream(-type => 'mRNA'); > > my $ct = 0; > while (my $sf = $it->next_seq) { > my $seq = $sf->spliced_seq; # dies with exception > } > > --------------------------- > > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented > by package Bio::DB::SeqFeature. > This is not your fault - author of Bio::DB::SeqFeature should be blamed! 
> > STACK: Error::throw > STACK: > Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368 > STACK: > Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739 > STACK: > Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325 > STACK: > Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458 > STACK: beestore.pl:17 > ---------------------------------------------------------------- > > > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Mon Mar 22 16:33:48 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 22 Mar 2010 13:33:48 -0700 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> Message-ID: <4BA7D42C.5050602@cornell.edu> If I were doing a GUI for BioPerl, I would certainly not try to use Java. You could have a look at how Padre, the Perl IDE (written in Perl is implemented): http://search.cpan.org/~plaven/Padre-0.58/ They use wx, I think. But, a simple web or command-line application would be far easier to write, in any language, if you can find somewhere to host it. Rob Charles Embry wrote: > I want to create a Gui that will use current bioperl modules(along with some > I am writing). It will be on a windows machine that runs XP and maybe a > laptop with Vista.(this is a project i am working on in Graduate school for > a professor). It will be id'ing promoter types in eukaryote organisms and > also do multiple alignments. > > What recommendations do yo suggest to use t develop this? A java > application? If so how hard is it to get Java to use perl and bioperl > modules? Another language? 
Is there a tool to directly develop a GUI for > bioperl modules that does no use another language? > > I will need to tag certain sequences with user specified colors and such. > > > Thanks for the help > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Mar 22 16:33:51 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 22 Mar 2010 13:33:51 -0700 Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener In-Reply-To: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> References: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> Message-ID: <4BA7D42F.2060807@bioperl.org> you can try this but it is a bit of an involved script because it is setup for dealing with multiple genomes in multiple folders so you might want to simplify it. http://github.com/hyphaltip/genome-scripts/blob/master/data_format/genbank_gbk2gff3_unflatten.pl But I thought the perldoc was a good starting point - have you tried it Generally I do: GENBANK -> GFF3 --> genbank_gbk2gff3_unflatten.pl GFF3 -> {CDS,PEP,GENE} --> http://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/gff3_to_cdspep.pl (or equivalent) -jason Jessica Sun wrote, On 3/22/10 11:48 AM: > Does any know how to get CDS of the corresponding mRNA accession(NM_) using > this function? > *Bio::SeqFeature::Tools::Unflattener > > many thanks in advance. 
> > * > From Russell.Smithies at agresearch.co.nz Mon Mar 22 17:10:36 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 10:10:36 +1300 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4BA7D42C.5050602@cornell.edu> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> <4BA7D42C.5050602@cornell.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882A5B@exchsth.agresearch.co.nz> wx www.wxwidgets.org looks very interesting - I didn't realize Cn3D used it. wxPerl http://wxperl.sourceforge.net might be worth a look. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Robert Buels > Sent: Tuesday, 23 March 2010 9:34 a.m. > To: Charles Embry > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista > > If I were doing a GUI for BioPerl, I would certainly not try to use > Java. You could have a look at how Padre, the Perl IDE (written in Perl > is implemented): http://search.cpan.org/~plaven/Padre-0.58/ They use > wx, I think. > > But, a simple web or command-line application would be far easier to > write, in any language, if you can find somewhere to host it. > > Rob > > > Charles Embry wrote: > > I want to create a Gui that will use current bioperl modules(along with > some > > I am writing). It will be on a windows machine that runs XP and maybe a > > laptop with Vista.(this is a project i am working on in Graduate school > for > > a professor). It will be id'ing promoter types in eukaryote organisms > and > > also do multiple alignments. > > > > What recommendations do yo suggest to use t develop this? A java > > application? If so how hard is it to get Java to use perl and bioperl > > modules? Another language? Is there a tool to directly develop a GUI for > > bioperl modules that does no use another language? 
> > > > I will need to tag certain sequences with user specified colors and > such. > > > > > > Thanks for the help > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From clarsen at vecna.com Mon Mar 22 16:51:08 2010 From: clarsen at vecna.com (Chris Larsen) Date: Mon, 22 Mar 2010 16:51:08 -0400 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: Ross, Chris F, I'd like to just comment on this since we are working in parallel on a similar problem. See also the prior thread in archives for Peters work in BioPython that I instigated: "Polyproteins, robo slippage, viral mat_peptides" This dialog below is just to clarify the science that will guide the pseudocode and logic flow would be needed to be built out into a BioPerl module. There are plenty of comments on the string mashing required, and its a harrowing morass, but heres some other thoughts. Three line item comments first, and then some open general ideas for moving this block of concepts forward: 1. 
>> Ross Said:
>> I am working on virus sequences and one of the Genbank file is here:
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1
>>
> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum>

If you are transferring protein annotation, why not use the RefSeq one instead of a GenBank one? In our experience at Virusbrc.org we find that protein annotation transfer is only a valid idea if you have reference sequences for each serotype, or your annotations will have propagation errors from the reference. They just don't align more than 80% of the time, for instance in Dengue, and I assume you want better than that? Yes, this HepB is a decent sequence, but the problem is that HepB has four main serotypes, and yet there is only one RefSeq: NC_003977. My guess is that you will have to define reference peptide seqs for all four serotypes first, and then grab the Taxon_ID from the input unknown file so you align right, i.e. you need to do virus annotation below the species level or it isn't accurate. The number of reference sequences that you use is related to the conservation of your virus family. The script needs to know which one to align to, so we have pulled that from the taxon_ID field of the *.gbk file. You could also use blast and pull the high scorer. Your choice.

2.
>> Ross said:
>>
>> Thanks for your response. While the one with Genbank file can be
>> extracted,
>> those without have to rely on alignment. Scripts certainly can be
>> written to
>> move forward and backward on the multiple alignment but it is an
>> error-prone

We find also that viruses don't have the proteins annotated most of the time. It's just a genome file. Part of the problem is that /host/ proteases sometimes cleave the /viral/ polyproteins, in a species-specific way, and since there is only one database entry, but many hosts, you can /only/ give the genome code and still be right for everything it /might/ infect.
You can't define the peptides in the file, because they might be different, depending on the host. Sick, isn't it? The proteins produced in different animals, based on their proteases' cleavage specificity, help determine whether the virus affects that animal or not. This is my hunch based on experience; no, I cannot give an example.

3.
Chris F said:
> To preface this, any reason you're not translating the alignment
> sequences using the above sequence's features as a reference?

A logical place to start. But they are usually not given. In addition to the above reason, data for viral sequences is scarcer, since fewer grad students want to sequence things that maim you or make you hurl if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isn't really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? It's not standardized right now. Nobody has this nailed at NCBI or UniProt.

Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords, when provided, are accurate most of the time. After that, you end up comparing everything and making your choice.

4.
Last thoughts:

* We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, it's 1200 lines long, doing error checking; as you say, it's just not easy. Pulling an HSP from a blast report leaves one with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits.
It's not really appropriate to look at the ends of the HSP and say they are right. It requires that extra code. Still, we may open that code to the public after the April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp).

* I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you don't have to assign a reference sequence from scratch, every time you do this. Is there one?

* Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named differently in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together.

* If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code.

All ideas. Nothing set in stone. Dialog welcome. Good luck all.

Chris

--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
clarsen at vecna.com

From janine.arloth at googlemail.com Sun Mar 21 10:02:32 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Sun, 21 Mar 2010 15:02:32 +0100
Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs
In-Reply-To:
References:
Message-ID:

Hello all,

while running blast(n) I want to extend the -method_args like this:

..
$result = $fac->$blastprogramm_input(
    -query => $seq,
    -outfile => "blast.txt",
    -method_args => [ "-num_alignments" => $num_alignments_input,
                      "-evalue" => $evalue_input,
                      "-word_size" => $word_size_input,
                      "-?" => $match_score_input,
                      "-?" => $gapcosts_input
                      .....
                    ] );
...

In Bio/Tools/BlastPlus/Config.pm I found for gap costs:
bln| gapopen and bln| gapextend

So when I have the input value = "4 4", does that mean Existence: 4 = gapopen and Extension: 4 = gapextend ??
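On the gap-cost question above: the NCBI BLAST+ blastn program takes the gap existence and extension costs as the separate options -gapopen and -gapextend, so a form value such as "4 4" would have to be split before it goes into -method_args. A minimal sketch follows; the helper name gapcost_args is made up for illustration and is not part of BioPerl, and it assumes the input is two non-negative integers separated by whitespace or a comma, in the order existence then extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a web-form style gap-cost string such as "4 4"
# (existence cost, extension cost) into blastn command-line options.
# (gapcost_args is a hypothetical helper, not part of BioPerl.)
sub gapcost_args {
    my ($gapcosts) = @_;
    my ($open, $extend) = $gapcosts =~ /^\s*(\d+)[\s,]+(\d+)\s*$/
        or die "expected two integers, got '$gapcosts'\n";
    return ('-gapopen' => $open, '-gapextend' => $extend);
}

my @args = gapcost_args('4 4');
print "@args\n";    # prints: -gapopen 4 -gapextend 4
```

The returned list could then be spliced into the array reference passed as -method_args, next to the "-evalue" and "-word_size" pairs shown in the snippet above.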
Is there a similar usage for Match/Mismatch scores like value="1,-2" -> match=1 and mismatch=-2?? (I can't find it) Thanks for help. From nils.mueller0 at googlemail.com Sun Mar 21 11:17:06 2010 From: nils.mueller0 at googlemail.com (=?ISO-8859-1?Q?Nils_M=FCller?=) Date: Sun, 21 Mar 2010 16:17:06 +0100 Subject: [Bioperl-l] BlastPlus Masker Message-ID: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> Dear all, I am confused in handeling with maskers in blastplus: I have fasta seq. and want to run blast with a low complexity masker like dustmasker: $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -mask_data => 'maskseqs.fas', -create => 1); Is myseqs.fas the same as maskseqs.fas??? I don't want to create a maskfile , I only will run blast with a masked file?? From razi.khaja at gmail.com Mon Mar 22 20:55:42 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Mon, 22 Mar 2010 20:55:42 -0400 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO In-Reply-To: <201003191525.o2JFPIr3019479@portal.open-bio.org> References: <201003191525.o2JFPIr3019479@portal.open-bio.org> Message-ID: Hello All, I've submitted a patch (blast.pm.diff) to bugzilla to enhance Bio/SearchIO/ blast.pm to be able to parse the algorithm_reference from BLAST reports. I've also submitted a patch (blast.t.diff) of 26 additional tests to parse the algorithm_reference from many of the BLAST reports in the t/data dir in bioperl-live. I'd like to get the patch into bioperl-live and would like someone to review the patch and tests. If the architecture for BLAST report parsing is changing, can someone let me know and I can contribute my efforts there. Below are links to bugzilla. 
Thanks, Razi Khaja ---------- Forwarded message ---------- From: Date: Fri, Mar 19, 2010 at 11:25 AM Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO To: bioperl-guts-l at bioperl.org http://bugzilla.open-bio.org/show_bug.cgi?id=3031 ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST ------- Created an attachment (id=1462) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) patch for t/SearchIO/blast.t to perform 26 additional tests to parse algorithm_reference from many BLAST report files -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From Russell.Smithies at agresearch.co.nz Mon Mar 22 21:26:30 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 14:26:30 +1300 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO In-Reply-To: References: <201003191525.o2JFPIr3019479@portal.open-bio.org> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz> It's not really a bug if it was never implemented and it probably wasn't implemented because it wasn't needed. Is there actually a use case where you'd programmatically need to access the algorithm reference from Blast results?? I'm sure I can't think of one. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Razi Khaja > Sent: Tuesday, 23 March 2010 1:56 p.m. 
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse > algorithm_reference from BLAST reports using Bio::SearchIO > > Hello All, > > I've submitted a patch (blast.pm.diff) to bugzilla to enhance > Bio/SearchIO/ > blast.pm to be able to parse the algorithm_reference from BLAST reports. > I've also submitted a patch (blast.t.diff) of 26 additional tests to parse > the algorithm_reference from many of the BLAST reports in the t/data dir > in > bioperl-live. > > I'd like to get the patch into bioperl-live and would like someone to > review > the patch and tests. > > If the architecture for BLAST report parsing is changing, can someone let > me > know and I can contribute my efforts there. > > Below are links to bugzilla. > > Thanks, > > Razi Khaja > > ---------- Forwarded message ---------- > From: > Date: Fri, Mar 19, 2010 at 11:25 AM > Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference > from BLAST reports using Bio::SearchIO > To: bioperl-guts-l at bioperl.org > > > http://bugzilla.open-bio.org/show_bug.cgi?id=3031 > > > > > > ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST ------- > Created an attachment (id=1462) > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) > patch for t/SearchIO/blast.t to perform 26 additional tests to parse > algorithm_reference from many BLAST report files > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From ross at cuhk.edu.hk Mon Mar 22 21:32:06 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 23 Mar 2010 09:32:06 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <001201caca28$a5e325b0$f1a97110$@edu.hk> Chris L, Your comment is insightful and as a non-virologist, I have never known that before. My strategy is just to extract the genomic fragments encoding proteins and derive the putative translated sequences. I'll do another round of MSA for the protein sequences in order to discover any outliners. There may be truncations, but as long as the protease acts post-translationally, it's acceptable. Chris F, What makes me feel frustrated is the verisimilar data structures and naming of Bio objects in Bioperl. 
If I want to retrieve a genbank file over the internet by:

$gb = new Bio::DB::GenBank;
$seq = $gb->get_Seq_by_acc('J00522');

And from: http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html

it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq so I can't do something like:

my $seqobj = $seq->next_seq;
for my $feat_object ($seqobj->get_SeqFeatures) {
    if ($feat_object->primary_tag eq "CDS") {
        print $feat_object->spliced_seq->seq,"\n";
        if ($feat_object->has_tag('gene')) {
            for my $val ($feat_object->get_tag_values('gene')){
                print "gene: ",$val,"\n";
            }
        }
    }
}

From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the methods there mention nothing about how to get the features or inter-convert among the object types.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Larsen
Sent: Tuesday, March 23, 2010 4:51 AM
To: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] automation of translation based on alignment

Ross, Chris F,

I'd like to just comment on this since we are working in parallel on a similar problem. See also the prior thread in archives for Peters work in BioPython that I instigated: "Polyproteins, robo slippage, viral mat_peptides"

This dialog below is just to clarify the science that will guide the pseudocode and logic flow would be needed to be built out into a BioPerl module. There are plenty of comments on the string mashing required, and its a harrowing morass, but heres some other thoughts. Three line item comments first, and then some open general ideas for moving this block of concepts forward:

1.
>> Ross Said:
>> I am working on virus sequences and one of the Genbank file is here:
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1
>>
> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum>

If you are transferring protein annotation, why not use the RefSeq one instead of a GenBank one?
In our experience at Virusbrc.org we find that protein annotation transfer is only a valid idea if you have reference sequences for each serotype, or your annotations will have propagation errors from the reference. They just dont align more than 80% of the time for instance in Dengue, and I assume you want better then that? Yes this HepB is a decent sequence, but the problem is that HepB has four main serotypes, and yet there is only one RefSeq: NC_003977. My guess is that you will have to define reference peptide seqs for all four serotypes first, and then grab the Taxon_ID from the input unknown file so you align right i.e. you need to do virus annotation below the species level or it isnt accurate. The number of reference sequences that you use is related to the conservation of your virus family. The script needs to know which one to align to, so we have pulled that from the taxon_ID field of the *.gbk file. You could also use blast and pull the high scorer. Your choice. >> Ross said: >> >> Thanks for your response. While the one with Genbank file can be >> extracted, >> those without have to rely on alignment. Scripts certainly can be >> written to >> move forward and backward on the multiple alignment but it is an >> error-prone We find also that viruses dont have the proteins annotated most of the time. It's just genome file. Part of the problem is that /host/ proteases sometimes cleave the /viral/ polyproteins, in a species- specific way, and since there is only one database entry, but many hosts, you can /only/ give the genome code and still be right for everything it /might/ infect. You cant define the peptides in the file, because they might be different, depending on the host. Sick, isnt it? The proteins produced in different animals based on their proteases cleavage specificity help determine whether the virus effects that animal or not. This is my hunch based on experience, no, I cannot give an example. 3. 
Chris F said: > To preface this, any reason you're not translating the alignment > sequences using the above sequence's features as a reference? A logical place to start. But-they are usually not given. In addition to the above reason, the amount of data for viral sequences is rarer since fewer grad students want to sequence things that mame you or make you hurl, if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isnt really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? Its not standardized right now. Nobody has this nailed at NCBI or UniProt. Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords when provided, are accurate most of the time. AFter that, you end up comparing everything and making your choice. 4. Last thoughts: * We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, its 1200 lines long, doing error checking; as you say its just not easy. Pulling an HSP from a blast report leaves one with with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits. Its not really appropriate to look at the ends of the hsp and say they are right. It requires that extra code. Still, we may open that code to the public after April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp). 
* I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you dont have to assign a reference sequence from scratch, every time you do this. Is there one? *Sometimes the features on viruses are named differently: / mat_peptide, /sig_peptide; sometimes they are named different in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together. * If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code. All ideas. Nothing set in stone. Dialog welcome. Good luck all. Chris -- Christopher Larsen, Ph.D. Sr. Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 clarsen at vecna.com _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From razi.khaja at gmail.com Mon Mar 22 22:08:45 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Mon, 22 Mar 2010 22:08:45 -0400 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz> References: <201003191525.o2JFPIr3019479@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz> Message-ID: Nope, not a bug, It's an enhancement though ;) I implemented it so that I could do a loss less transformation from BLAST report format to other formats. You could consider that a use case. I also have additional patches that parse other details from BLAST reports that aren't currently implemented in Bio::SearchIO, and I'd like to contribute those as well, however, I thought I'd start with this one. 
Razi On Mon, Mar 22, 2010 at 9:26 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > It's not really a bug if it was never implemented and it probably wasn't > implemented because it wasn't needed. > Is there actually a use case where you'd programmatically need to access > the algorithm reference from Blast results?? > I'm sure I can't think of one. > > > --Russell > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Razi Khaja > > Sent: Tuesday, 23 March 2010 1:56 p.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse > > algorithm_reference from BLAST reports using Bio::SearchIO > > > > Hello All, > > > > I've submitted a patch (blast.pm.diff) to bugzilla to enhance > > Bio/SearchIO/ > > blast.pm to be able to parse the algorithm_reference from BLAST reports. > > I've also submitted a patch (blast.t.diff) of 26 additional tests to > parse > > the algorithm_reference from many of the BLAST reports in the t/data dir > > in > > bioperl-live. > > > > I'd like to get the patch into bioperl-live and would like someone to > > review > > the patch and tests. > > > > If the architecture for BLAST report parsing is changing, can someone let > > me > > know and I can contribute my efforts there. > > > > Below are links to bugzilla. 
> > > > Thanks, > > > > Razi Khaja > > > > ---------- Forwarded message ---------- > > From: > > Date: Fri, Mar 19, 2010 at 11:25 AM > > Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference > > from BLAST reports using Bio::SearchIO > > To: bioperl-guts-l at bioperl.org > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=3031 > > > > > > > > > > > > ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST > ------- > > Created an attachment (id=1462) > > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) > > patch for t/SearchIO/blast.t to perform 26 additional tests to parse > > algorithm_reference from many BLAST report files > > > > > > -- > > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > > ------- You are receiving this mail because: ------- > > You are the assignee for the bug, or are watching the assignee. > > _______________________________________________ > > Bioperl-guts-l mailing list > > Bioperl-guts-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From maj at fortinbras.us Mon Mar 22 22:51:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 22 Mar 2010 22:51:24 -0400 Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs In-Reply-To: References: Message-ID: Hi Janine-- The options you need are "reward" (for the match score) and "penalty" (for the mismatch score). Add them to -method_args. cheers MAJ ----- Original Message ----- From: "Janine Arloth" To: Sent: Sunday, March 21, 2010 10:02 AM Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs > Hello all, > > while running blast(n) I want to extend to method_arg like: > .. > $result = $fac->$blastprogramm_input( > -query => $seq, > -outfile => "blast.txt", > -method_args => [ > "-num_alignments" => $num_alignments_input, > "-evalue" => $evalue_input, > "-word_size" => $word_size_input, > "-?" => $match_score_input, > "-?" => $gapcosts_input > ..... > ] > ); > ... > > in Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln| gapopen and bln| > gapextend > so when I have the input value = "4 4" , then Existence: 4 = gapaopen and > Extension: 4 = gapextend ?? > > Is there a similar usage for Match/Mismatch scores like value="1,-2" -> > match=1 and mismatch=-2?? > (I can't find it) > > Thanks for help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Mar 22 22:59:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Mon, 22 Mar 2010 22:59:56 -0400
Subject: [Bioperl-l] BlastPlus Masker
In-Reply-To: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>
References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>
Message-ID:

Hi Nils,

You don't have to specify a mask_data file; the factory should make it for you; try simply

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'my_masked_db',
    -db_data => 'myseqs.fas',
    -masker => 'dustmasker',
    -create => 1);

-mask_data is there so that pre-made masks can be applied separately, or so you can name the file that is produced and preserve it; this is an "advanced feature", I suppose--

MAJ

----- Original Message -----
From: "Nils Müller"
To:
Sent: Sunday, March 21, 2010 11:17 AM
Subject: [Bioperl-l] BlastPlus Masker

> Dear all,
>
> I am confused in handeling with maskers in blastplus:
> I have fasta seq. and want to run blast with a low complexity masker like
> dustmasker:
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
> -db_name => 'my_masked_db',
> -db_data => 'myseqs.fas',
> -masker => 'dustmasker',
> -mask_data => 'maskseqs.fas',
> -create => 1);
>
> Is myseqs.fas the same as maskseqs.fas??? I don't want to create a
> maskfile , I only will run blast with a masked file??
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Tue Mar 23 00:43:03 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 23:43:03 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <001201caca28$a5e325b0$f1a97110$@edu.hk>
References: <001201caca28$a5e325b0$f1a97110$@edu.hk>
Message-ID: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu>

On Mar 22, 2010, at 8:32 PM, Ross KK Leung wrote:

> Chris L,
>
> Your comment is insightful and as a non-virologist, I have never known that
> before.
My strategy is just to extract the genomic fragments encoding
> proteins and derive the putative translated sequences. I'll do another round
> of MSA for the protein sequences in order to discover any outliners. There
> may be truncations, but as long as the protease acts post-translationally,
> it's acceptable.
>
> Chris F,
>
> What makes me feel frustrated is the verisimilar data structures and naming
> of Bio objects in Bioperl. If I want to retrieve a genbank file over the
> internet by:
>
> $gb = new Bio::DB::GenBank;
>
> $seq = $gb->get_Seq_by_acc('J00522');
>
> And from:
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html
>
> it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq
> so I can't do something like:

A Bio::Seq::RichSeq is-a Bio::Seq (it inherits Bio::Seq and augments it). I believe 'Bio::Seq' in the documents refers to the fact one can retrieve FASTA sequence data (which returns a simple Bio::Seq) or richer records, such as a GenBank record (which returns a Bio::Seq::RichSeq). In this case, it should probably read 'Bio::SeqI' to be more accurate (implements the Bio::SeqI interface). Beyond the addition of a few accessor methods they are essentially the same, in that they both have annotation, features, etc.

> my $seqobj = $seq->next_seq;

You're either not reading the demos or the relevant documentation correctly, or there is a spot in the docs that needs to be fixed (if the latter, please let us know). Bio::Seq does not implement a next_seq() method, but sequence *streams* (ala Bio::SeqIO) do. You are probably thinking of something like this:

my $streamobj = $gb->get_Stream_by_acc(\@ids);

while (my $seqobj = $streamobj->next_seq) {
    # do stuff here
}

The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$streamobj->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work.
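
As an offline illustration of the stream/sequence split described above, here is a minimal sketch. It assumes only that BioPerl is installed; the two FASTA records are made up, and an in-memory filehandle stands in for the stream that get_Stream_by_acc would hand back from NCBI.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up FASTA records held in memory, so no network access is needed.
my $fasta = ">seq1\nATGGCC\n>seq2\nATGTTTAAA\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";

# The stream object: this is the thing that implements next_seq().
my $stream = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @ids;
while (my $seqobj = $stream->next_seq) {
    # Each item pulled from the stream is a sequence object. It has
    # display_id(), seq(), length() and so on, but no next_seq() of its own.
    push @ids, $seqobj->display_id;
    printf "%s length=%d has_next_seq=%s\n",
        $seqobj->display_id, $seqobj->length,
        ($seqobj->can('next_seq') ? 'yes' : 'no');
}
```

The same pattern should carry over to Bio::DB::GenBank: get_Seq_by_acc gives you a sequence object directly, while get_Stream_by_acc gives you the stream that you then call next_seq() on.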
If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > for my $feat_object ($seqobj->get_SeqFeatures) { > > if ($feat_object->primary_tag eq "CDS") { > > print $feat_object->spliced_seq->seq,"\n"; > > if ($feat_object->has_tag('gene')) { > > for my $val ($feat_object->get_tag_values('gene')){ > > print "gene: ",$val,"\n"; > > } > > } > > } > > } > >> From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the > methods there mention nothing about how to get the features or inter-convert > among the object types. Just a note, but make sure to read up-to-date documentation, particularly if you are using the latest code. Here is the pdoc for the latest release: http://doc.bioperl.org/releases/bioperl-1.6.1/Bio/Seq/RichSeqI.html This is definitely worth pointing out, and is a good example where we can improve our documentation; I've added some links to classes that would explain more. In the meantime, the best thing to do in this case is to point you to the online documentation (which I think I did already, but just in case): http://www.bioperl.org/wiki/HOWTO:Beginners http://www.bioperl.org/wiki/HOWTO:Feature-Annotation chris From cjfields at illinois.edu Tue Mar 23 00:53:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 23:53:48 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <42E3E2EC-2226-44CE-995E-01B425B161F1@illinois.edu> On Mar 22, 2010, at 3:51 PM, Chris Larsen wrote: > ... > 3. > Chris F said: > >> To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? > > > A logical place to start. But-they are usually not given. 
In addition to the above reason, the amount of data for viral sequences is rarer since fewer grad students want to sequence things that mame you or make you hurl, if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isnt really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? Its not standardized right now. Nobody has this nailed at NCBI or UniProt. > > Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords when provided, are accurate most of the time. AFter that, you end up comparing everything and making your choice. Yes, in this case nothing will be a immediate, perfect solution. It will take some additional work. > 4. > Last thoughts: > > * We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, its 1200 lines long, doing error checking; as you say its just not easy. Pulling an HSP from a blast report leaves one with with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits. Its not really appropriate to look at the ends of the hsp and say they are right. It requires that extra code. Still, we may open that code to the public after April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp). Might be nice to see what you've done, whenever that is ready. 
> * I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you dont have to assign a reference sequence from scratch, every time you do this. Is there one? If you mean pulling out sets of sequences from a larger alignment or slices of alignments, there should be methods within Bio::SimpleAlign to do this, yes. > *Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named different in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together. > > * If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code. > > All ideas. Nothing set in stone. Dialog welcome. > > Good luck all. > > Chris > > > -- > > Christopher Larsen, Ph.D. > Sr. Scientist / Grants Manager > Vecna Technologies > 6404 Ivy Lane #500 > Greenbelt, MD 20770 > Phone: (240) 965-4525 > Fax: (240) 547-6133 > > clarsen at vecna.com Very nice summary of the problems in the field. thanks! chris From ross at cuhk.edu.hk Tue Mar 23 01:20:56 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 23 Mar 2010 13:20:56 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> Message-ID: <001501caca48$9db03f70$d910be50$@edu.hk> my $streamobj = $gb->get_Stream_by_acc(@ids); while (my $seqobj = $stream->next_seq) { # do stuff here } The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$stream->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work. 
If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work.

> for my $feat_object ($seqobj->get_SeqFeatures) {
>     if ($feat_object->primary_tag eq "CDS") {
>         print $feat_object->spliced_seq->seq,"\n";
>         if ($feat_object->has_tag('gene')) {
>             for my $val ($feat_object->get_tag_values('gene')){
>                 print "gene: ",$val,"\n";
>             }
>         }
>     }
> }

Chris, in fact I did have this code before, but then it goes back to the old problem that the spliced sequence is incorrect. Please try using the following code with "DQ089804" as the argument. If you check the printed result with:

http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum

you'll discover, for example, that the sequence of gene P is derived from splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), rather than 2307-3215 and 1-1623.

use strict;
use warnings;
use Bio::SeqIO::genbank;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Accession number comes from the command line, e.g. DQ089804
my ($acc) = @ARGV;

my $gb = Bio::DB::GenBank->new;
my $streamobj = $gb->get_Stream_by_acc($acc);
my $seqobj = $streamobj->next_seq;

# Print the spliced sequence and /gene name of every CDS feature
for my $feat_object ($seqobj->get_SeqFeatures) {
    if ($feat_object->primary_tag eq "CDS") {
        print $feat_object->spliced_seq->seq,"\n";
        if ($feat_object->has_tag('gene')) {
            for my $val ($feat_object->get_tag_values('gene')){
                print "gene: ",$val,"\n";
            }
        }
    }
}
exit;

From e.osimo at gmail.com Tue Mar 23 05:42:25 2010
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 23 Mar 2010 10:42:25 +0100
Subject: [Bioperl-l] Xyplot and multiple lines plots
Message-ID: <2ac05d0f1003230242o31779c30sffa42d8e99539b09@mail.gmail.com>

Hello everyone,

I would like to plot two data sets in Bio::Graphics using Xyplot, one superimposed on the other. I need to compare the differential expression of an Affy expression probeset in different subjects.
I successfully managed to plot one at a time with: $panel->add_track( $feat, -graph_type=>'linepoints', -glyph =>'xyplot', -fgcolor=>'gray', -max_score => 1, -min_score => 0, ); But I cannot understand how to plot two lines independently in the same track. Thank you in advance, Emanuele From biopython at maubp.freeserve.co.uk Tue Mar 23 06:58:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Mar 2010 10:58:58 +0000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <320fb6e01003230358w11ae8e5fxef140652c5cc9f1b@mail.gmail.com> On Mon, Mar 22, 2010 at 8:51 PM, Chris Larsen wrote: > Ross, Chris F, > > I'd like to just comment on this since we are working in parallel on a > similar problem. See also the prior thread in archives for Peters work in > BioPython that I instigated: "Polyproteins, robo slippage, viral > mat_peptides" Minor typo - the old thread title was about ribo (ribosomal) slippage: http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031479.html http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031484.html etc Triggered in part by my discussion with Chris Larsen (off list) about the biological problem of getting the mature peptide sequences from GenBank files, Biopython 1.53 ended up with a new method for extracting the sequence region described by a (complex) location, e.g. from parsing in an EMBL/GenBank file. There were several threads about this, this is perhaps the best summary if anyone is interested: http://lists.open-bio.org/pipermail/biopython/2009-November/005813.html http://lists.open-bio.org/pipermail/biopython/2009-December/005889.html > This dialog below is just to clarify the science that will guide the > pseudocode and logic flow would be needed to be built out into a BioPerl > module. There are plenty of comments on the string mashing required, and its > a harrowing morass, but heres some other thoughts. 
Three line item comments > first, and then some open general ideas for moving this block of concepts > forward: Thanks for the update - it sounds like you've got a better understanding of the complexities now, and some of the reasons why representing things like mature peptides is tricky (the issue of different cleavage patterns in different hosts is interesting). Peter From cjfields at illinois.edu Tue Mar 23 08:46:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 23 Mar 2010 07:46:37 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <001501caca48$9db03f70$d910be50$@edu.hk> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> <001501caca48$9db03f70$d910be50$@edu.hk> Message-ID: <3A94734B-CD43-4674-8DB6-82EA1C6530E4@illinois.edu> On Mar 23, 2010, at 12:20 AM, Ross KK Leung wrote: > my $streamobj = $gb->get_Stream_by_acc(@ids); > > while (my $seqobj = $stream->next_seq) { > # do stuff here > } > > The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO > stream). '$stream->next_seq()' iterates through them one at a time. Unless > you call a stream in some way, that code will not work. If you call the > methods below directly on the *sequence* object ($seqobj, retrieved from > get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > >> for my $feat_object ($seqobj->get_SeqFeatures) { >> >> if ($feat_object->primary_tag eq "CDS") { >> >> print $feat_object->spliced_seq->seq,"\n"; >> >> if ($feat_object->has_tag('gene')) { >> >> for my $val ($feat_object->get_tag_values('gene')){ >> >> print "gene: ",$val,"\n"; >> >> } >> >> } >> >> } >> >> } > > Chris, in fact I did have this code before, but then it goes back to the old > problem that the spliced sequence is incorrect. Please try using the > following codes with "DQ089804" as the argument.
If you check the printed > result with: > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSyst > em2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum > > you'll discover, for example, the sequence of gene P, is derived from > splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), > rather than 2307-3215 and 1-1623. Okay, as I mentioned before, then that would be a bug. The best way to handle this is to file it in Bugzilla: http://bugzilla.open-bio.org/ I can likely look at it today, whether it's filed or not, just need to make some time. Please file the bug report, though, just in case I can't get to it right away. BTW, we had some discussion about circular genome support recently at the GMOD conference, and some code was added that was supposed to address the issues raised. I'm guessing we'll need to add more tests just to be sure. chris ... From Jean-Marc.Frigerio at pierroton.inra.fr Tue Mar 23 12:29:11 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Tue, 23 Mar 2010 17:29:11 +0100 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: References: Message-ID: <4BA8EC57.7070802@pierroton.inra.fr> > I want to create a Gui that will use current bioperl modules(along with some > I am writing). It will be on a windows machine that runs XP and maybe a > laptop with Vista.(this is a project i am working on in Graduate school for > a professor). It will be id'ing promoter types in eukaryote organisms and > also do multiple alignments. > > What recommendations do yo suggest to use t develop this? A java > application? If so how hard is it to get Java to use perl and bioperl > modules? Another language? Is there a tool to directly develop a GUI for > bioperl modules that does no use another language? > > I will need to tag certain sequences with user specified colors and such. 
>
>
> Thanks for the help

Hi,

Also have a look at Gtk-perl and perl-qt

Best

From Leighton.Pritchard at scri.ac.uk Tue Mar 23 12:35:42 2010
From: Leighton.Pritchard at scri.ac.uk (Leighton Pritchard)
Date: Tue, 23 Mar 2010 16:35:42 -0000
Subject: [Bioperl-l] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
Message-ID: 

Hi,

I can't seem to find any discussion of this on the mailing list archives (if
anyone has a link, I'll happily follow it), so I was wondering what the
rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live
mapping CDS features to gene_component_region.

For example, if I use the script on the E.coli sequence/annotation
NC_000913.gbk, the gene:

     gene            190..255
                     /gene="thrL"
                     /locus_tag="b0001"
                     /note="synonyms: ECK0001, JW4367"
                     /db_xref="EcoGene:EG11277"
                     /db_xref="ECOCYC:EG11277"
                     /db_xref="GeneID:944742"
     CDS             190..255
                     /gene="thrL"
                     /locus_tag="b0001"
                     /function="leader; Amino acid biosynthesis: Threonine"
                     /function="1.5.1.8 metabolism; building block
                     biosynthesis; amino acids; threonine"
                     /note="GO_process: threonine biosynthetic process [goid
                     0009088]"
                     /codon_start=1
                     /transl_table=11
                     /product="thr operon leader peptide"
                     /protein_id="NP_414542.1"
                     /db_xref="ASAP:ABE-0000006"
                     /db_xref="UniProtKB/Swiss-Prot:P0AD86"
                     /db_xref="GI:16127995"
                     /db_xref="EcoGene:EG11277"
                     /db_xref="ECOCYC:EG11277"
                     /db_xref="GeneID:944742"
                     /translation="MKRISTTITTTITITTGNGAG"

Is mapped to

NC_000913	GenBank	region	190	255	.	+	.	ID=GenBank:region:NC_000913:190:255
NC_000913	GenBank	exon	190	255	.	+	.	ID=GenBank:exon:NC_000913:190:255
NC_000913	GenBank	gene	190	255	.	+	.	ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
NC_000913	GenBank	gene_component_region	190	255	.	+	.	Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTGNGAG

I understand the region-exon-gene part of the model, but not the
gene_component_region, which appears to be a catch-all. I would have
assumed that the CDS is better mapped to a polypeptide, as described in the
CHADO documentation:

http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model

There is no difference in script output whether --CDS or --noCDS is used.

Cheers,

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way.
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________

From djibrilo at yahoo.fr Tue Mar 23 13:38:25 2010
From: djibrilo at yahoo.fr (djibrilo)
Date: Tue, 23 Mar 2010 10:38:25 -0700 (PDT)
Subject: [Bioperl-l] Re : G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA8EC57.7070802@pierroton.inra.fr>
References: <4BA8EC57.7070802@pierroton.inra.fr>
Message-ID: <344176.4737.qm@web23001.mail.ird.yahoo.com>

Hi,

Also have a look at Perl/Tk.

Best Regards

________________________________
From: Jean-Marc Frigerio INRA
To: bioperl-l at lists.open-bio.org
Sent: Tue, 23 March 2010, 17:29:11
Subject: Re: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista

> I want to create a Gui that will use current bioperl modules(along with some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista.(this is a project i am working on in Graduate school for
> a professor). It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do you suggest to use to develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does not use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help

Hi,

Also have a look at Gtk-perl and perl-qt

Best
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Tue Mar 23 14:18:46 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 23 Mar 2010 14:18:46 -0400
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: 
References: 
Message-ID: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>

Hi Leighton,

I wonder if this is a change stemming from Nathan's work on this
script. Nathan?

Scott

On Tue, Mar 23, 2010 at 12:35 PM, Leighton Pritchard wrote:
> Hi,
>
> I can't seem to find any discussion of this on the mailing list archives (if
> anyone has a link, I'll happily follow it), so I was wondering what the
> rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live
> mapping CDS features to gene_component_region.
> ...
> I understand the region-exon-gene part of the model, but not the
> gene_component_region, which appears to be a catch-all. I would have
> assumed that the CDS is better mapped to a polypeptide, as described in the
> CHADO documentation:
>
> http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model
>
> There is no difference in script output whether --CDS or --noCDS is used.
>
> Cheers,
>
> L.

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From maj at fortinbras.us Tue Mar 23 14:15:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 23 Mar 2010 14:15:38 -0400
Subject: [Bioperl-l] BlastPlus Masker
In-Reply-To: <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com>
References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>
	<464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com>
Message-ID: 

Specifying 'dustmasker' for a nucleotide database is roughly the same as
"filter : low complexity regions" and "mask : lookup table only", I believe.

(There is also a facility for creating masks based on lowercase residues in
a mask data fasta file; the blast+ utility is 'convert2blastmask'. You can
run this with the SABlastPlus factory. I'm not very familiar with it, but
you should be able to take the output file from this utility and feed it in
to a new factory as the '-mask_data' to get what you want. (If anyone has
done this, a brief step-by-step would be appreciated.))

cheers
MAJ

----- Original Message -----
From: Nils M?ller
To: Mark A. Jensen
Sent: Tuesday, March 23, 2010 12:42 PM
Subject: Re: [Bioperl-l] BlastPlus Masker

Many thanks, is it the same as shown on the NCBI BLAST page (Filtering and
Masking - filter: Low complexity regions and mask: Mask for lookup table
only or Mask lower case letters)?

2010/3/23 Mark A.
Jensen

Hi Nils,

You don't have to specify a mask_data file; the factory should make it for
you; try simply

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'my_masked_db',
    -db_data => 'myseqs.fas',
    -masker  => 'dustmasker',
    -create  => 1);

-mask_data is there so that pre-made masks can be applied separately, or so
you can name the file that is produced and preserve it; this is an
"advanced feature", I suppose--

MAJ

----- Original Message -----
From: "Nils M?ller"
To: 
Sent: Sunday, March 21, 2010 11:17 AM
Subject: [Bioperl-l] BlastPlus Masker

Dear all,

I am confused about handling maskers in blastplus: I have fasta sequences
and want to run blast with a low complexity masker like dustmasker:

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name   => 'my_masked_db',
    -db_data   => 'myseqs.fas',
    -masker    => 'dustmasker',
    -mask_data => 'maskseqs.fas',
    -create    => 1);

Is myseqs.fas the same as maskseqs.fas??? I don't want to create a mask
file, I only want to run blast with a masked file??

From lpritc at scri.ac.uk Wed Mar 24 08:05:08 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 24 Mar 2010 12:05:08 +0000
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>
Message-ID: 

Hi,

I'm surprised that this issue hasn't come up already, as the change to the
gene model is quite significant. For comparison, this is what the old
bp_genbank2gff3.pl script would produce with --CDS:

NC_000913	GenBank	gene	190	255	.	+	.	ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
NC_000913	GenBank	mRNA	190	255	.	+	.	ID=thrL.t01;Parent=thrL
NC_000913	GenBank	CDS	190	255	.	+	.	ID=thrL.p01;Parent=thrL.t01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21
NC_000913	GenBank	exon	190	255	.	+	.	Parent=thrL.t01

and with --noCDS:

NC_000913	GenBank	gene	190	255	.	+	.	ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
NC_000913	GenBank	mRNA	190	255	.	+	.	ID=thrL.t01;Parent=thrL
NC_000913	GenBank	polypeptide	190	255	.	+	.	ID=thrL.p01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Derives_from=thrL.t01;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21
NC_000913	GenBank	exon	190	255	.	+	.	Parent=thrL.t01

The new script produces this identical output with both --CDS and --noCDS:

NC_000913	GenBank	region	190	255	.	+	.	ID=GenBank:region:NC_000913:190:255
NC_000913	GenBank	exon	190	255	.	+	.	ID=GenBank:exon:NC_000913:190:255
NC_000913	GenBank	gene	190	255	.	+	.	ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
NC_000913	GenBank	gene_component_region	190	255	.	+	.	Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTGNGAG

So, although the new script improves the parent-child relationships by
identifying parents on the locus_tag field (guaranteed to be unique), rather
than gene name (not guaranteed to be unique), the GFF3 gene model has
apparently changed from canonical:

gene <- mRNA <- {polypeptide/CDS, exon}

to this:

region ; exon ; gene <- gene_component_region

So I guess I don't understand the region-exon-gene part of the new model,
after all. This new model doesn't appear to be Sequence Ontology-compatible
any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon
is no longer considered part_of the transcript. In fact, there's not a
transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get
SO-compliant GFF3
(http://www.sequenceontology.org/resources/faq.html#convert), this might be
an issue requiring a prompt fix or reversion.

For now, due to the downstream problems this model causes with GBROWSE and
ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the
script to use the locus_tag field rather than the gene field for the feature
ID.

Cheers,

L.

On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote:

> Hi Leighton,
>
> I wonder if this is a change stemming from Nathan's work on this
> script. Nathan?
>
> Scott

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

From cjfields at illinois.edu Wed Mar 24 09:06:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 08:06:01 -0500
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: 
References: 
Message-ID: <3A556027-C8DB-4683-8376-A42AC8796156@illinois.edu>

On Mar 24, 2010, at 7:05 AM, Leighton Pritchard wrote:

> Hi,
>
> I'm surprised that this issue hasn't come up already, as the change to the
> gene model is quite significant. For comparison, this is what the old
> bp_genbank2gff3.pl script would produce with --CDS:
> ...
> So, although the new script improves the parent-child relationships by
> identifying parents on the locus_tag field (guaranteed to be unique), rather
> than gene name (not guaranteed to be unique), the GFF3 gene model has
> apparently changed from canonical:
>
> gene <- mRNA <- {polypeptide/CDS, exon}
>
> to this:
>
> region ; exon ; gene <- gene_component_region
>
> So I guess I don't understand the region-exon-gene part of the new model,
> after all. This new model doesn't appear to be Sequence Ontology-compatible
> any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon
> is no longer considered part_of the transcript. In fact, there's not a
> transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get
> SO-compliant GFF3
> (http://www.sequenceontology.org/resources/faq.html#convert), this might be
> an issue requiring a prompt fix or reversion.

I agree. I think this commit needs more code review to understand the
reasoning behind it, though it will be a little trickier than a simple
reversion (I think there have been additional unrelated commits since then).

Nathan, was this the intent, or is this a bug? I would agree with Leighton
that it's the latter.

chris

> For now, due to the downstream problems this model causes with GBROWSE and
> ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the
> script to use the locus_tag field rather than the gene field for the feature
> ID.
>
> Cheers,
>
> L.
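
To make the difference between the two gene models in this thread concrete, here is a minimal sketch in plain Perl (5.10 or later, no BioPerl needed). It is not part of bp_genbank2gff3.pl: the sub name `orphan_features` is hypothetical, and the sample lines are the thrL features quoted above with their attributes trimmed for brevity. It reports any CDS, polypeptide, or exon feature that is not attached to an mRNA through Parent or Derives_from, which is one simple way to spot output that has drifted from gene <- mRNA <- {polypeptide/CDS, exon}.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): given GFF3 feature lines,
# return a description of every CDS, polypeptide or exon whose Parent (or
# Derives_from) does not point at an mRNA feature in the same input.
sub orphan_features {
    my @lines = @_;
    my (%type_of, @children);
    for my $line (@lines) {
        chomp(my $rec = $line);
        next if $rec =~ /^#/ or $rec !~ /\S/;    # skip comments and blanks
        my @f = split /\t/, $rec;
        next unless @f >= 9;                     # need all nine GFF3 columns
        my ($type, $start, $end, $col9) = @f[2, 3, 4, 8];
        # Column 9 is a ';'-separated list of key=value attributes
        my %attr = map { split /=/, $_, 2 } grep { /=/ } split /;/, $col9;
        $type_of{ $attr{ID} } = $type if defined $attr{ID};
        push @children, [ $type, $start, $end, $attr{Parent} // $attr{Derives_from} ]
            if $type =~ /^(?:CDS|polypeptide|exon)$/;
    }
    my @orphans;
    for my $child (@children) {
        my ($type, $start, $end, $parent) = @$child;
        my $ptype = defined $parent ? ($type_of{$parent} // 'missing') : 'none';
        push @orphans, "$type $start..$end (parent: $ptype)" unless $ptype eq 'mRNA';
    }
    return @orphans;
}

# Old-style output (--CDS), attributes trimmed for brevity
my @old_model = (
    join("\t", qw(NC_000913 GenBank gene 190 255 . + .), 'ID=thrL;locus_tag=b0001'),
    join("\t", qw(NC_000913 GenBank mRNA 190 255 . + .), 'ID=thrL.t01;Parent=thrL'),
    join("\t", qw(NC_000913 GenBank CDS 190 255 . + .),  'ID=thrL.p01;Parent=thrL.t01'),
    join("\t", qw(NC_000913 GenBank exon 190 255 . + .), 'Parent=thrL.t01'),
);

# New-style output, attributes trimmed for brevity
my @new_model = (
    join("\t", qw(NC_000913 GenBank region 190 255 . + .), 'ID=GenBank:region:NC_000913:190:255'),
    join("\t", qw(NC_000913 GenBank exon 190 255 . + .),   'ID=GenBank:exon:NC_000913:190:255'),
    join("\t", qw(NC_000913 GenBank gene 190 255 . + .),   'ID=b0001;locus_tag=b0001'),
    join("\t", qw(NC_000913 GenBank gene_component_region 190 255 . + .), 'Parent=b0001'),
);

print "old model orphans: ", scalar(orphan_features(@old_model)), "\n";
print "new model: $_\n" for orphan_features(@new_model);
```

On these trimmed samples the old-style lines give no orphans, while the new-style lines report the exon as having no parent at all. The check is deliberately narrow: it only looks at direct parents and does not validate feature types against the Sequence Ontology.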
From pmiguel at purdue.edu Wed Mar 24 09:49:55 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Wed, 24 Mar 2010 09:49:55 -0400
Subject: [Bioperl-l] How to set "complexity" param using EUtilities
Message-ID: <4BAA1883.3010203@purdue.edu>

Just a little FYI that might help someone using GenBank efetch (here with
bioperl EUtilities) and, contrary to expectation, retrieving a bunch of
accessions (or GIs) when a single accession is what is wanted. The trick is
to change the "complexity" parameter from its apparent default of "0" to
"1". Actually, this parameter might be worth adding to the HOWTO because it
causes the EUtilities efetch to perform similarly to a normal Entrez search.
Which, to me, would be the expected behavior.

Details below.

Some accessions/GIs appear to be embedded in bundles of related sequences.
Here is an example:

gi|158819346|gb|EU011641.1|

If I search Entrez Nucleotide

http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar

with either "158819346" (the GI) or "EU011641.1", I get a single record
for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal RNA gene,
partial sequence". This is what I want.

If I use the following code derived from the Eutils HOWTO:

use Bio::DB::EUtilities;
use Bio::SeqIO;
my @ids;
my $id ='gb|EU011641.1|';
push @ids ,$id;
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);
my $file = "test.gb";
$factory->get_Response(-file => $file);

I get a bundle of accessions: EU011584-EU011663. Same result using the GI
number instead.

From reading:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam

it looks like I would get what I want were I to set the efetch "complexity"
parameter to "1". But how do I set that parameter?

Below is how I did it. Not the most efficient path, but did not take that
long to traverse...

The HowTo does not mention it.
I usually look to the Deobfuscator:

http://bioperl.org/cgi-bin/deob_interface.cgi

to help me when I want some documentation for a method. But this is a
parameter, not a class. What class sets this parameter? Not sure. So I
googled:

complexity eutil site:bioperl.org

The top ranked hit is actually to the deprecated 1.5.2 version of
EUtilities. But the 2nd hit is to the (auto generated?) email posted to the
bioperl-guts email list by Chris Fields upon his commit of the new
EUtilities overhaul:

http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html

From here it looks like the obvious way to set the parameter would be
possible. And indeed:

use Bio::DB::EUtilities;
use Bio::SeqIO;
my @ids;
my $id ='gb|EU011641.1|';
push @ids ,$id;
my $factory = Bio::DB::EUtilities->new(
    -eutil      => 'efetch',
    -db         => 'nucleotide',
    -rettype    => 'genbank',
    -complexity => 1,
    -id         => \@ids);
my $file = "test.gb";
$factory->get_Response(-file => $file);

works!

Also a good idea to add the -email parameter so that Genbank might chastise
me via email, rather than banning my IP, if I try to send more than 100
requests in a series outside of the acceptable 9PM-5AM Eastern Time hours.

Phillip

From peter at maubp.freeserve.co.uk Wed Mar 24 10:08:26 2010
From: peter at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:08:26 +0000
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: 
References: 
Message-ID: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>

Hi,

This is probably of interest to all the Bio* projects offering access to
the NCBI Entrez utilities. See forwarded message below.

I *think* the new guidelines basically say that the email & tool parameters
are optional BUT if your IP address ever gets banned for excessive use you
then have to register an email & tool combination.

Regarding the email address, the NCBI say to use the email of the developer
(not the end user).
However, they do not distinguish between the developers of a library (like
us), and the developers of an application or script using a library (who
may also be the end user). Currently we (Biopython) and I think BioPerl ask
developers using our libraries to populate the email address themselves.
I *think* this is still the right action.

Peter

---------- Forwarded message ----------
From: 
Date: Wed, Mar 24, 2010 at 1:53 PM
Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy
To: NLM/NCBI List utilities-announce

New E-utility documentation now on the NCBI Bookshelf

The Entrez Programming Utilities (E-Utilities) Help documentation has been
added to the NCBI Bookshelf, and so is now fully integrated with the Entrez
search and retrieval system as a part of the Bookshelf database. This help
document has been divided into chapters for better organization and includes
several new sample Perl scripts. At present this book covers the standard
URL interface for the E-utilities; material about the SOAP interface will be
added soon and is still available at the same URL:
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html.

Revised E-utility usage policy

In December, 2009 NCBI announced a change to the usage policy for the
E-utilities that would require all requests to contain non-null values for
both the &email and &tool parameters. After several consultations with our
users and developers, we have decided to revise this policy change, and the
revised policy is described in detail at the following link:

http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen

Please let us know if you have any questions or concerns about this policy
change.

Thank you,
The E-Utilities Team
NIH/NLM/NCBI
eutilities at ncbi.nlm.nih.gov.
_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From joseguillin at hotmail.com Tue Mar 23 13:30:44 2010
From: joseguillin at hotmail.com (Jose .)
Date: Tue, 23 Mar 2010 17:30:44 +0000
Subject: [Bioperl-l] Phylo/Phylip/Consense
Message-ID: 

Hello,

I'm trying to use Phylo/Phylip/Consense, but I get the following message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: SeqBoot did not create files correctly (/var/folders/+s/+srMEKriEiWM+Q7Qleiti++++TI/-Tmp-/v3no1dYNqE/outfile)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::_run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:389
STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:339
STACK: INDELVOLUTION_5.1consensus.pl:492
-----------------------------------------------------------

My code is a modification of the code I found at
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Phylo/Phylip/Consense.pm

use Bio::Tools::Run::Phylo::Phylip::Consense;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;
use Bio::Tools::Run::Phylo::Phylip::ProtDist;
use Bio::Tools::Run::Phylo::Phylip::Neighbor;
use Bio::Tools::Run::Phylo::Phylip::DrawTree;

my $aio = Bio::AlignIO->new(-file =>'yeah.clustalw', -format=> 'clustalw');
my $aln = $aio->next_aln;
my ($aln_safe, $ref_name)=$aln->set_displayname_safe();

#next use seqboot to generate multiple alignments
my @params = ('datatype'=>'SEQUENCE','replicates'=>10);
my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params);
my $aln_ref=
$seqboot_factory->run($aln); #my $aln_ref= $seqboot_factory->run($aln_safe); #next build distance matrices and construct trees my $pd_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(); my $ne_factory = Bio::Tools::Run::Phylo::Phylip::Neighbor->new(); my @tree; foreach my $a (@{$aln_ref}){ my $mat = $pd_factory->create_distance_matrix($a); push @tree, $ne_factory->create_tree($mat); } #now use consense to get a final tree my $con_factory = Bio::Tools::Run::Phylo::Phylip::Consense->new(); #you may set outgroup either by the number representing the order in #which species are entered or by the name of the species $con_factory->outgroup(1); my $tree = $con_factory->run(\@tree); # Restore original sequence names, after ALL phylip runs: my @nodes = $tree->get_nodes(); foreach my $nd (@nodes){ $nd->id($ref_name->{$nd->id_output}) if $nd->is_Leaf; } #now draw the tree my $draw_factory = Bio::Tools::Run::Phylo::Phylip::DrawTree->new(); my $image_filename = $draw_factory->draw_tree($tree); And my yeah.clustalw file is OK: CLUSTAL W(1.81) multiple sequence alignment A/1-474 G---CGGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C B/1-452 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C C/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C D/1-476 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGA-------------TC-CTA----C E/1-439 G---CCGTGGGAGA------TGAGGAACCTGAGGTAGTCC-----TATATCTCTAGCGGC F/1-434 G---CCGTGGGAGA------TGAGGAACCCGAGG---TCC-----TATATCTCTAGCGGC G/1-462 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTA---------------TCTAGCGGC H/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTAGTCC--------ATCTCTAGCGGC I/1-462 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C J/1-447 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C K/1-448 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCGA----C L/1-431 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCTA----C M/1-432 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCGGTATTATATCTCTA----C N/1-422 
G---CC------GAGCAACATGAGGAAC---AGGTTGTC---TATTATATCTCTA----C O/1-441 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C P/1-446 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C * * ** ******** *** * * * A/1-474 GCATCGCGGCCCTTGTC-GCTCCCACCCA--CCATC---GACGGC-ACA--TTTGCTTGT B/1-452 GCAT----------GTC-GCTC---------CCATCGCTGACGGC-ACATCTTTG---GT C/1-466 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTGT D/1-476 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTG- E/1-439 GCA-CGCGGCCCT--TC-GCTT---CCCATCCCATCGCTGACGGC-ACATCT----TTGT F/1-434 GCA-CGCGGCCCT--TCCGCTT---CCCATCCCATCGCTTACGGC-ACATCTTTGCTTGT G/1-462 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGTC-ACATCTTTG-TTGT H/1-466 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGGC-ACATCTTTGCTTGT I/1-462 GCAT-CCGGCCCTTGTC-GCTCCCA------CCATCGCTGACGGC-ACAT--TTGCTTGT J/1-447 GC------GCCCTTGTC-GCTCCCA---------TCGCTGACGGC-ACATCTTTGCTTGT K/1-448 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT L/1-431 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT M/1-432 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGAC-GC-ACATC-TTGCTTGT N/1-422 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGACAGCAACATCTTTGCTTGT O/1-441 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT P/1-446 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT ** ** *** ** ** * * A/1-474 ACGAGATTGCTTTCACACTA-TCTATTGTTCGGGTACCGAGAGTCGGCGGTGAATACATC B/1-452 ACGAGATTGCGTTCACACTA-TCCATTGTTCGGGTACCGAGAGTC-GCGGTGAATACATC C/1-466 ACGTG--TGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATG D/1-476 -CGTGATTGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATC E/1-439 ACGTGATTGCG----CA--AATCCATTGT---GGTACCGAGAGTCGGCGGTGAACT---C F/1-434 ACGTGATTGCG----CA--AATCCATTGTTCGGGTACCGAGAGTCG-----GAACT---C G/1-462 ACGT----GCGTTCCCA--AATCCATTGTTCGGGTACCGAGAGTCGGCGGTGAACT---C H/1-466 ACGT-------TTCCCA--AATCCAT---TCGGGTACCGAGAGTCGGCGGTGAACT---C I/1-462 ACGTGATTGC--TCCCACCAATCCAT-GTTCGGGTACCGAGAGTCGGCGGTGAACTCATC J/1-447 
ACGTGATTGC--TCCCACTAATCCAT-GTTCGGGTACCGA-----------GAACTCATC K/1-448 ACGTGATTGC--TCCCACTAATCCACTG--------CCGAGAGTCGGCGGTG---CCATC L/1-431 ACGTGATTGC--TC------ATC--TTGTTCGGGTACCGA-----GGCGGTGAACTCATC M/1-432 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC N/1-422 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC O/1-441 ACGTGATTGC--TCCCACTAATCCAT--TTCGGGTACCGAGAGTCGGCGGTGAACTCATC P/1-446 ACGTGATTGC--TCCCACTAATCCATTG--CGGGTACCGAGAGTCGGCGGTGAACTCATC ** ** * * * A/1-474 TCCGGAG--AAGTGTGCTAACCACAGTG--GAACGTATAATGCTGATCCCGCTTGTTT-- B/1-452 TCCGGAG--AA--GTGCTAACCACAGTG--GAACGTATAATGCTGAT-CCGCTT-TTT-- C/1-466 TCCGGAG--AAGTGTGCTAACCACAGTG--GAAAGTATAATGCT-----------TTT-- D/1-476 TCCGGAG--AAGTGT---AACCACAGTG--GAAAGTATAATGCTGATCCCGCTTGTTT-- E/1-439 TCCGG-----AGTGTGG-AACCACAGTG--GAACGTATAATGC--ATCTCGCGTGTTT-- F/1-434 TCCGG-----AGTGTGGTAACCACAGTG--GAACGTATAATGC--ATCCCGCGTGTTT-- G/1-462 TCCGGAG--AAGTGTGGTAACCACAGTG--GAACGTATAATGC--ATC--GCGTGTTT-- H/1-466 TCCGGAG--AAGTGTGGTAACCACAGT----AACGTAT-ATGC--ATCCCGCGTGTTT-- I/1-462 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC J/1-447 TCGGGAG--AAGTGTGCTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC K/1-448 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC L/1-431 TCCGGAG--AAGTGTG----CCACAGTGCCGAAC--ATAATGC--ATC--GCGTGTTTGC M/1-432 TCCGGAGGAAAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC N/1-422 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC O/1-441 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC P/1-446 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC ** ** * ** ******* ** ** A/1-474 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT B/1-452 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----AGGTAC-CCGAGGACAACTAACAGT C/1-466 --CTGTACCTAAAGTTCACCGGGTAGAGCCTCGTCATGTAC-CCG-----AACTAACAGT D/1-476 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT E/1-439 --CCGTACCTAAAGTT------GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT F/1-434 
--CCGTACCTAAAGTT-----GGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT G/1-462 --CCGTACCTAAAGTTCTCC--GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT H/1-466 --CCGTACCTAAAGTTCACCGGGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT I/1-462 GATCGTACCTAAAGTTCACC--------CC-----A-------CGAG----ACTAACAG- J/1-447 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAG- K/1-448 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT L/1-431 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT M/1-432 GACCGTACCT-----T-ACCG-GTAGCGCC-----ATGTACACCGAGC---ACTA----T N/1-422 GACCGTACCT-----TCACCG-GTAGTGCC-----ATGTACACCGAGC---ACTAACAGT O/1-441 GACCGTACCT-----TCACCG-GTAGCGCC-----ATGTACACCGAGC---ACTAACAGT P/1-446 GACCGTACCT-----TCACCG-GTAGCGCC-----ATG---ACCGAGC---ACTAACAGT ****** * ** * ** **** A/1-474 GATCCTCA----TCTAAGCGCCGCTTCAGGAC----ATTGCCACGTCTACATCG------ B/1-452 GATCCTCA----TTTAAGCGCCGCTTCAGGCC----ATTGCCACGTCTACATCG------ C/1-466 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCAT D/1-476 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCCT E/1-439 GATCCTCA----TTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT F/1-434 GATCCTCA----TTTAAGCGCCGC---AGGACTTTTATTGCCACGTCTACATCGTTTCAT G/1-462 GATCCTCACAATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT H/1-466 GATCCTC-CCATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT I/1-462 ---CCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATC---TCAT J/1-447 ---CCTCA----T-TAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATCGTTTCAT K/1-448 GATCCTCA----TTTAAGCGCCGCTGCAGG-------TTGCCACGTCTACATCGTTTCAT L/1-431 GATCCTCA----TTTAAGCGCCGCTGC----------TTGCCACGTCTACATCGTTTCAT M/1-432 GATC--CA----TTTAAGCGCCGCTGCAGG--------TGCCACGTCTACATCGTTTCAT N/1-422 GATC--CA----TTTAAGCGCCGCTGCAGGAA----ATTGCCACGTCTACATCGTTTCAT O/1-441 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTA---- P/1-446 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTTTCA- * * * ********** * ** ********* A/1-474 -CATCTACTCTT--AGGCAGCAACAATTTGTCTCGTTCGACGTACAG--CGAAC--ATGT B/1-452 
-CATCTACTCTT--AGGCAGCAACAATT-GTCTCGTTCGATGTACAG--CGAAC--ATGT C/1-466 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- D/1-476 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- E/1-439 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAACATATGT F/1-434 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAA------T G/1-462 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT H/1-466 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT I/1-462 TCACCTACTTTT--AGGGAGCAACAATCTGTATCC---G--GTACAGACCGAACATAGGA J/1-447 TC----AC-TTT--AGGGAGCAACAATCTGTATCC---G--GTAC---CCGAACATAGGT K/1-448 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT L/1-431 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT M/1-432 TCATTTACT-----AGGCAGCAACAATCTGTATC--------TATAGACCGAGCATATGT N/1-422 TCATCTACT-----AGGCAGCAACAATCTGTATCC---G--GTATAGACCAAGCATATGT O/1-441 ------ACTTTT--AGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT P/1-446 ------ACTTTTTGAGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT ** ** ***** ** ** * * A/1-474 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- B/1-452 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- C/1-466 TGGGCGTAAGACCAAAGTTGAT--CGTTGG---TATTCGACCCAATCAAGTCGCG----- D/1-476 TGGGCGTAAGACCAAAGTTGAT--CGTGGGCCTTATTCGACCCAATCAATTCGCG---A- E/1-439 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- F/1-434 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- G/1-462 T--GCGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- H/1-466 T--GAGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- I/1-462 TGTGCTTAAGACCAAAGTT--TATCGTT------ATATGACCCAAGCAATTCGCGGATA- J/1-447 -GTGCTTAAGACCAAAGTT--TATCGTT------ACATGACCCAAGCAATTCGCGGATA- K/1-448 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGCGGATAC L/1-431 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGC-GATA- M/1-432 TGGGCGTAAGACCAAAGTT--TATCGTTGGCTTT----GACCCAAGCAAT--GC------ N/1-422 
TGGGGGTAAGACCAA-------------GGCTTT----GACCCAAGCAAT--GC------
O/1-441 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA-
P/1-446 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA-
******** **** *** **

A/1-474 -A--AT-------TTATTCATTATTACCACTGATCAC--CCTG-CACCTATGCGGTTT--
B/1-452 -A--ATCCCGTCTTTATTC------ACCACTGATCAC--CCTG-CAC--ATGCGGTTT--
C/1-466 -----TCCCGTCTTTATTCATTATAACCACTGATCAC--CCTGGCAC--ATGCGCTTT--
D/1-476 -A--ATCCCGTCTTTATTCATTATAACCACTGATCACGACCTGGCAC--ATGCGCTAT--
E/1-439 -A---TCCCGTCTTTATT--TTTTTAGC-CTGATCTC--CCTGGCAC--AT---------
F/1-434 -A---TCCCGTCTTTATTCATTTTTACC-CTGATCTC--C---------AT---------
G/1-462 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA--
H/1-466 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA--
I/1-462 -AGGATCCTGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAT--ATGCGGTTTGC
J/1-447 -AGGATCCCGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC
K/1-448 AAGGATCCCGT-----GTCATTATAACC-CTGATCAC--ACTGGCAC--ATGCGGTTTGC
L/1-431 -AGGATCCCGT-----TTCATTAT--CC-CTG-TCAC--CCTGGCAC--ATGCGGTTTGC
M/1-432 --GGATCCCGT--TTATTCATTAAAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC
N/1-422 --GGATCCCGT--TTATTCATTATAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC
O/1-441 -ATGATCCCGT--TTATTCATTATAACC-CT---CAC--CCTGGCAC--ATGCGGTTTGC
P/1-446 -AGGATCCCGT--TTATTCATTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC
* * * ** * **

A/1-474 ACTTCGATGCC
B/1-452 ACTTCGATGCC
C/1-466 ACTTCGATG--
D/1-476 ACTTCGATGCC
E/1-439 -CTTCGATGCC
F/1-434 -CTTCGATGCC
G/1-462 ACTTCGATG--
H/1-466 ACTTCGATGCC
I/1-462 --TTCGATGCC
J/1-447 ACTTCGATGCC
K/1-448 ACTTCGATG--
L/1-431 ACTTCGATG--
M/1-432 ACTTCGATGCC
N/1-422 ACTTCGATGCC
O/1-441 ACTTCG-TGCC
P/1-446 ACTTCG-TGCC
**** **

I have tried different things, but I don't really know why I have this problem. Does anyone know?

Thank you very much in advance,
Jose G.

From cjfields at illinois.edu Wed Mar 24 10:37:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 09:37:13 -0500
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
Message-ID: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>

On Mar 24, 2010, at 9:08 AM, Peter wrote:

> Hi,
>
> This is probably of interest to all the Bio* projects offering access
> to the NCBI
> Entrez utilities. See forwarded message below.
>
> I *think* the new guidelines basically say that the email & tool parameters are
> optional BUT if your IP address ever gets banned for excessive use you then
> have to register an email & tool combination.
>
> Regarding the email address, the NCBI say to use the email of the developer
> (not the end user). However, they do not distinguish between the developers
> of a library (like us), and the developers of an application or script using a
> library (who may also be the end user).
>
> Currently we (Biopython) and I think BioPerl ask developers using our libraries
> to populate the email address themselves. I *think* this is still the
> right action.
>
> Peter

Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I think with the SOAP-based ones as well). We're providing a specific set of tools for users to write their own end applications. I can try contacting them regarding this to get an official response to clarify this somewhat.

Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a default, but always leave the email blank and issue a warning if it isn't set. We could just as easily leave both blank and issue warnings for both.
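A minimal sketch of the defaults described above, written in plain Perl and not taken from Bio::DB::EUtilities itself. The helper name build_efetch_url is hypothetical, and the base URL is an assumption about the E-utilities efetch endpoint. It sets tool to 'BioPerl' unless overridden, leaves email out when none is given, and warns in that case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): builds an efetch URL and
# applies the defaults discussed in the thread. The tool parameter
# defaults to 'BioPerl'; email is omitted when missing, with a warning.
sub build_efetch_url {
    my (%arg) = @_;
    my %param = (
        db   => $arg{db},
        id   => join(',', @{ $arg{ids} }),
        tool => defined $arg{tool} ? $arg{tool} : 'BioPerl',
    );
    if (defined $arg{email} && length $arg{email}) {
        $param{email} = $arg{email};
    }
    else {
        warn "No email set; NCBI asks developers to supply one\n";
    }
    # Percent-encode anything outside the unreserved URL character set.
    my $escape = sub {
        my $s = shift;
        $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
        return $s;
    };
    # Sort the keys so the query string is stable between runs.
    my $query = join '&',
        map { "$_=" . $escape->($param{$_}) } sort keys %param;
    # Assumed endpoint; adjust if NCBI documents a different base URL.
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?$query";
}

print build_efetch_url(
    db    => 'nucleotide',
    ids   => ['EU011641.1'],
    email => 'you@example.org',
), "\n";
```

Passing an email value silences the warning. The alternative mentioned above (leaving both blank and warning for both) would only need the tool default dropped and a second check added.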
chris

> ---------- Forwarded message ----------
> From:
> Date: Wed, Mar 24, 2010 at 1:53 PM
> Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy
> To: NLM/NCBI List utilities-announce
>
>
> New E-utility documentation now on the NCBI Bookshelf
>
> The Entrez Programming Utilities (E-Utilities) Help documentation has
> been added to the NCBI Bookshelf, and so is now fully integrated with
> the Entrez search and retrieval system as a part of the Bookshelf
> database. This help document has been divided into chapters for better
> organization and includes several new sample Perl scripts. At present
> this book covers the standard URL interface for the E-utilities;
> material about the SOAP interface will be added soon and is still
> available at the same URL:
> http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html.
>
>
>
> Revised E-utility usage policy
>
> In December, 2009 NCBI announced a change to the usage policy for the
> E-utilities that would require all requests to contain non-null values
> for both the &email and &tool parameters. After several consultations
> with our users and developers, we have decided to revise this policy
> change, and the revised policy is described in detail at the following
> link:
>
> http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen
>
> Please let us know if you have any questions or concerns about this
> policy change.
>
>
>
> Thank you,
>
> The E-Utilities Team
>
> NIH/NLM/NCBI
>
> eutilities at ncbi.nlm.nih.gov.
>
>
>
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Wed Mar 24 10:51:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:51:46 +0000
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>

On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote:
>
> On Mar 24, 2010, at 9:08 AM, Peter wrote:
>
>> Hi,
>>
>> This is probably of interest to all the Bio* projects offering access
>> to the NCBI Entrez utilities. See forwarded message below.
>>
>> I *think* the new guidelines basically say that the email & tool parameters are
>> optional BUT if your IP address ever gets banned for excessive use you then
>> have to register an email & tool combination.
>>
>> Regarding the email address, the NCBI say to use the email of the developer
>> (not the end user). However, they do not distinguish between the developers
>> of a library (like us), and the developers of an application or script using a
>> library (who may also be the end user).
>>
>> Currently we (Biopython) and I think BioPerl ask developers using our libraries
>> to populate the email address themselves. I *think* this is still the
>> right action.
>>
>> Peter
>
>
> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I
> think with the SOAP-based ones as well). We're providing a specific set of
> tools for user to write up their own applications end applications. I can try
> contacting them regarding this to get an official response to clarify this
> somewhat.

Please give the NCBI an email - you can CC me too if you like.

> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a
> default, but always leave the email blank and issue a warning if it isn't
> set. We could just as easily leave both blank and issue warnings for both.

We currently leave out the email and set the tool parameter to "Biopython"
by default but this can be overridden. Currently leaving out the email does
cause Biopython to give a warning.

Peter

From pmiguel at purdue.edu Wed Mar 24 10:59:50 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Wed, 24 Mar 2010 10:59:50 -0400
Subject: [Bioperl-l] How to set "complexity" param using EUtilities
In-Reply-To: <4BAA1883.3010203@purdue.edu>
References: <4BAA1883.3010203@purdue.edu>
Message-ID: <4BAA28E6.4090907@purdue.edu>

Sorry, I got that backwards. The default is "0", apparently. But to get Entrez-like performance you want "complexity" to be set to "1".

Phillip

Phillip San Miguel wrote:
> Just a little FYI that might help someone using GenBank efetch (here
> with bioperl EUtilities) and, contrary to expectation, retrieving a
> bunch of accessions (or GIs) when that single accession is what is
> wanted. The trick is to change the "complexity" parameter from its
> apparent default of "1" to "0".
>
> Actually, this parameter might be worth adding to the HOWTO because it
> causes the EUtilities efetch to perform similar to a normal Entrez
> search. Which, to me, would be the expected behavior.
>
> Details below.
>
> Some accessions/GIs appear to be embedded in bundles of related
> sequences.
Here is an example: > > gi|158819346|gb|EU011641.1| > > > If I search Entrez Nucleotide > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar > > with the either "158819346" (the GI) or "EU011641.1", I get a single > record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal > RNA gene, partial sequence". This what I want. > > If I use the following code derived from the Eutils HOWTO: > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $id ='gb|EU011641.1|'; > push @ids ,$id; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -id => \@ids); > > my $file = "test.gb"; > $factory->get_Response(-file => $file); > > I get a bundle of accessions: EU011584-EU011663. > Same result using the GI number instead. > > From reading: > > http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam > > > it looks like I would get what I want were I to set the efetch > "complexity" parameter to "1". > > But how do I set that parameter? Below is how I did it. Not the most > efficient path, but did not take that long to traverse... > > The HowTo does not mention it. I usually look to the the Deobfuscator: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > to help me when I want some documentation for a method. But this is a > parameter not a class. What class sets this parameter? Not sure. So I > googled: > > complexity eutil site:bioperl.org > > The top ranked hit is actually to the deprecated 1.5.2 version of > EUtilities. But the 2nd hit is to the (auto generatated?) email posted > to the bioperl-guts email list by Chris Fields upon his commit of the > new EUtilities overhaul: > > http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html > > > From here it looks like the obvious way to set the parameter would be > possible. 
And indeed: > > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $id ='gb|EU011641.1|'; > push @ids ,$id; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -complexity =>1, > -id => \@ids); > > my $file = "test.gb"; > $factory->get_Response(-file => $file); > > works! > > Also a good idea to add -email parameter so that Genbank might > chastise me via email, rather than banning my IP, if I try to send > more than 100 requests in a series outside of the acceptable 9PM-5AM > Eastern Time hours. > > Phillip > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at drycafe.net Wed Mar 24 11:27:37 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 24 Mar 2010 11:27:37 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> On Mar 24, 2010, at 10:51 AM, Peter wrote: > Please give the NCBI an email - you can CC me too if you like. Can't this be the developers' mailing list (or lists, the appropriate one for each toolkit)? We can even whitelist all NCBI sender addresses so they can easily email us if there are issues. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed Mar 24 11:44:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 10:44:21 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <338BDDD8-2A66-4086-BFB7-35EC8F8F0D66@illinois.edu> On Mar 24, 2010, at 9:51 AM, Peter wrote: > On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: >> >> On Mar 24, 2010, at 9:08 AM, Peter wrote: >> >>> Hi, >>> >>> This is probably of interest to all the Bio* projects offering access >>> to the NCBI Entrez utilities. See forwarded message below. >>> >>> I *think* the new guidelines basically say that the email & tool parameters are >>> optional BUT if your IP address ever gets banned for excessive use you then >>> have to register an email & tool combination. >>> >>> Regarding the email address, the NCBI say to use the email of the developer >>> (not the end user). However, they do not distinguish between the developers >>> of a library (like us), and the developers of an application or script using a >>> library (who may also be the end user). >>> >>> Currently we (Biopython) and I think BioPerl ask developers using our libraries >>> to populate the email address themselves. I *think* this is still the >>> right action. >>> >>> Peter >> >> >> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I >> think with the SOAP-based ones as well). We're providing a specific set of >> tools for user to write up their own applications end applications. 
I can try >> contacting them regarding this to get an official response to clarify this >> somewhat. > > Please give the NCBI an email - you can CC me too if you like. Sent, have cc'd the open-bio list. Don't want to cross-post this too much, so I think we should move the discussion there. >> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a >> default, but always leave the email blank and issue a warning if it isn't >> set. We could just as easily leave both blank and issue warnings for both. > > We currently leave out the email and set the tool parameter to "Biopython" > by default but this can be overridden. Currently leaving out the email does > cause Biopython to give a warning. > > Peter We follow the same, then (down to the warning). This is mentioned in my post to them, I'll wait to see what they say. My concern is the wording of the new rules. Each tool and email must be registered with them if an IP is blocked. Does this mean each tool is assigned one specific email? And an IP that is blocked can register it to be allowed back into the fold? With that in mind, should we register each of our toolkits with them? Probably not a bad thing (it might help us as devs to get an idea of use), but then if one user abuses the rules will their actions affect all toolkit users? Is this all done on a per-IP basis, per-toolkit basis, etc? Unfortunately, at least to me, none of this is made very clear, so I'm hoping there is some clarification from their end. chris From maj at fortinbras.us Wed Mar 24 12:37:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 24 Mar 2010 12:37:56 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Wed Mar 24 13:43:48 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 10:43:48 -0700 Subject: [Bioperl-l] Codeml runtime error Message-ID: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Hi Bioperl gurus, I'm trying to run PAML v4.3b on a series of orthologs, specifically by implementing codeml to detect signatures of positive selection between all orthologous pairs. In some of my files, I notice that I'm getting an EOF error that causes codeml to break. 
The weirdness is that I only get the EOF error under one hypothesis model (the null) and never under the alternative hypothesis model, even when run on the same initial data.

I've managed to track the problem down to the way BioPerl formats the temporary phylip alignment file that is fed into codeml. Apparently, PAML requires at least two spaces between the sequence identifier and the start of the sequence. However, for some files (and I don't know whether this is random or not) the temporary alignment file contains only one space after the sequence identifier. If I edit the phylip file accordingly and rerun codeml, the software runs and processes the data correctly.

Has anyone run into this problem before, and has anyone figured out a workaround using the kaks_factory in Bio::Tools::Run::Phylo::PAML::Codeml.pm? If this is something others have not seen, I'll submit a full bug report.

Best regards,
Tom

From Russell.Smithies at agresearch.co.nz Wed Mar 24 15:53:45 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 25 Mar 2010 08:53:45 +1300
Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy
In-Reply-To:
References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz>

The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services.
I've had an email from Scott McGinnis at NCBI before, after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful.
After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Thursday, 25 March 2010 5:38 a.m. > To: Hilmar Lapp; Peter > Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- > Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris > Fields > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > I think this is a great idea--- MAJ > ----- Original Message ----- > From: "Hilmar Lapp" > To: "Peter" > Cc: ; "Biopython-Dev Mailing List" > ; ; "bioperl- > l > list" ; "Chris Fields" > ; > > Sent: Wednesday, March 24, 2010 11:27 AM > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > > > > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > > > >> Please give the NCBI an email - you can CC me too if you like. > > > > > > Can't this be the developers' mailing list (or lists, the appropriate > one for > > each toolkit)? We can even whitelist all NCBI sender addresses so they > can > > easily email us if there are issues. 
> > > > -hilmar > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > > =========================================================== > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Wed Mar 24 16:01:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 15:01:50 -0500 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> Message-ID: Russell, The problem we're possibly running into now is that (acc. 
to the documents) we will likely have to define both the tool and email (or neither), as the tool and email are registered together. There are advantages and disadvantages to both scenarios, one that you point out. ATM I'm awaiting back word from NCBI for clarification (I popped 'em an email about this earlier) and will hopefully post their response here if they send one, then we'll hash out what needs to be done. And agreed about Scott, he's always been helpful. chris On Mar 24, 2010, at 2:53 PM, Smithies, Russell wrote: > The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. > I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. > > After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Thursday, 25 March 2010 5:38 a.m. 
>> To: Hilmar Lapp; Peter >> Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- >> Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris >> Fields >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> I think this is a great idea--- MAJ >> ----- Original Message ----- >> From: "Hilmar Lapp" >> To: "Peter" >> Cc: ; "Biopython-Dev Mailing List" >> ; ; "bioperl- >> l >> list" ; "Chris Fields" >> ; >> >> Sent: Wednesday, March 24, 2010 11:27 AM >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> >>> >>> On Mar 24, 2010, at 10:51 AM, Peter wrote: >>> >>>> Please give the NCBI an email - you can CC me too if you like. >>> >>> >>> Can't this be the developers' mailing list (or lists, the appropriate >> one for >>> each toolkit)? We can even whitelist all NCBI sender addresses so they >> can >>> easily email us if there are issues. >>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. 
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed Mar 24 15:53:48 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 24 Mar 2010 12:53:48 -0700 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com><5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <1A4207F8295607498283FE9E93B775B406A418BB@EX02.asurite.ad.asu.edu> Well, the problem with NCBI using the address to email about problem users is that the lists can't really identify the user since it isn't a specific program, but someone's specific implementation utilizing the toolkit that is causing problems. So, not sure how this would help with the problem of dealing with trouble users. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. 
Jensen Sent: Wednesday, March 24, 2010 9:38 AM To: Hilmar Lapp; Peter Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython-Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris Fields Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Mar 24 16:38:31 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:38:31 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Message-ID: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Hi Tom, Thanks for your note. From your description, it sounds like a bug report is in order. 
If you could include a little test case so we can reproduce it, that would be great. Dave From thomas.sharpton at gmail.com Wed Mar 24 16:40:55 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 13:40:55 -0700 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Message-ID: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Hi Dave, Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. If there's anyway I can help, do let me know. Best, Tom On Mar 24, 2010, at 1:38 PM, Dave Messina wrote: > Hi Tom, > > Thanks for your note. From your description, it sounds like a bug > report is in order. If you could include a little test case so we > can reproduce it, that would be great. > > > Dave > From David.Messina at sbc.su.se Wed Mar 24 16:52:59 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:52:59 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Message-ID: <4BEA53ED-87B6-4EE0-B5E6-AE304A335AA8@sbc.su.se> > Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. Terrific, thanks! > If there's anyway I can help, do let me know. Oh don't worry...I will. 
:)

D

From cjfields at illinois.edu Thu Mar 25 00:50:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 23:50:11 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] Bio::DB::SeqFeature spliced_seq()
In-Reply-To: <4BA7D267.6050704@bioperl.org>
References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
 <4BA7D267.6050704@bioperl.org>
Message-ID: <46D94C25-4E2D-4E64-A696-1C9D3F785EEB@illinois.edu>

Yes, that's essentially what I have working now. I suppose the best way to do this is to have an optional type supplied and splice only those, checking the subfeatures to ensure that type exists. I'll check against SeqFeatureI's spliced_seq() to see if there are any API issues.

chris

On Mar 22, 2010, at 3:26 PM, Jason Stajich wrote:

> Yes it needs a special case I guess - since spliced_seq should work,
> however ... The only problem is that if both exons and CDS are
> sub-features you have to be smart enough to not grab both...
>
> So I have just relied on specialized dumping scripts for gff3_to_cds for
> my own needs (i.e.
> http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl
> ).
> But you might also see what the Gbrowse plugin dumpers do.
>
> -jason
> Chris Fields wrote, On 3/22/10 11:56 AM:
>> I have just noticed that spliced_seq() is borked with
>> Bio::DB::SeqFeature and am thinking about implementing it. Or is
>> similar functionality already implemented elsewhere?
>>
>> Currently, it is calling entire_seq(), which I plan on avoiding simply
>> to prevent sucking in the entire sequence into memory. This is
>> currently what happens:
>>
>> ---------------------------
>>
>> my $it = $store->get_seq_stream(-type => 'mRNA');
>>
>> my $ct = 0;
>> while (my $sf = $it->next_seq) {
>>     my $seq = $sf->spliced_seq; # dies with exception
>> }
>>
>> ---------------------------
>>
>> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
>> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented
>> by package Bio::DB::SeqFeature.
>> This is not your fault - author of Bio::DB::SeqFeature should be blamed!
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
>> STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
>> STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
>> STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
>> STACK: beestore.pl:17
>> ----------------------------------------------------------------
>>
>> chris

From lpritc at scri.ac.uk Thu Mar 25 07:20:01 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 25 Mar 2010 11:20:01 +0000
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com> Message-ID: Hi, Nathan's been in touch to ask exactly what the command-line was that I was using, and this was missing from the thread so, for info: bp_genbank2gff3.pl --noCDS NC_000913.gbk And bp_genbank2gff3.pl --CDS NC_000913.gbk With occasional absolute paths to the input sequence. L. On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote: > Hi Leighton, > > I wonder if this is a change stemming from Nathan's work on this > script. Nathan? > > Scott > -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________ From aradwen at gmail.com Fri Mar 26 07:29:16 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 12:29:16 +0100 Subject: [Bioperl-l] aacomp.pl problem Message-ID: Hello, I'm facing a little problem with aacomp.pl in scripts examples that comes with Bioperl Here is the error message Can't locate object method "valid_aa" via package "Bio::Tools::CodonTable" at aacomp.pl line 16. Any Idea ? Thx Radwen From David.Messina at sbc.su.se Fri Mar 26 08:51:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 13:51:11 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: Message-ID: Hi Radwen, The latest version of aacomp (from subversion) worked fine for me. That version has this line near the top of the script: # $Id: aacomp.PLS 15088 2008-12-04 02:49:09Z bosborne $ If yours is different, you might try upgrading to the latest version. In fact, I'm almost certain that is the problem, since the valid_aa method is in the Bio::SeqUtils class, not Bio::Tools::CodonTable. Dave From David.Messina at sbc.su.se Fri Mar 26 10:24:25 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:24:25 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: Message-ID: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Hi, Yes, the subversion site is temporarily down. However, there are nightly builds http://www.bioperl.org/DIST/nightly_builds/ and the Github mirror http://github.com/bioperl Dave On Mar 26, 2010, at 15:20, Radwen Aniba wrote: > The subversion site is down?!!! 
From David.Messina at sbc.su.se Fri Mar 26 10:35:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:35:29 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Message-ID: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Radwen, Please be sure to 'reply all' so that everyone on the list can follow this discussion. > Sorry to ask beginners questions but how to configure these mirrors to upgrade ? > > I'm using ubuntu Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Dave From cjfields at illinois.edu Fri Mar 26 10:40:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 09:40:20 -0500 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Message-ID: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > Radwen, > > Please be sure to 'reply all' so that everyone on the list can follow this discussion. > > >> Sorry to ask beginners questions but how to configure these mirrors to upgrade ? >> >> I'm using ubuntu > > > > > Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > Dave You can also get tarballs of bioperl-live from the github mirror (via the 'Download Source' link): http://github.com/bioperl/bioperl-live These are updated every 15 minutes. 
chris From aradwen at gmail.com Fri Mar 26 10:41:51 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 15:41:51 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> Message-ID: Thank you 2010/3/26 Chris Fields > > On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > > > Radwen, > > > > Please be sure to 'reply all' so that everyone on the list can follow > this discussion. > > > > > >> Sorry to ask beginners questions but how to configure these mirrors to > upgrade ? > >> > >> I'm using ubuntu > > > > > > > > > > Step 1: download the bioperl-live tarball from, for example, > http://www.bioperl.org/DIST/nightly_builds/ > > > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > > > > > > Dave > > > You can also get tarballs of bioperl-live from the github mirror (via the > 'Download Source' link): > > http://github.com/bioperl/bioperl-live > > These are updated every 15 minutes. > > chris From maj at fortinbras.us Fri Mar 26 10:34:49 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 10:34:49 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <4BABB825.6010803@cse.msu.edu> References: <4BABB825.6010803@cse.msu.edu> Message-ID: <249674A825C14BB3801C6184DEEA7A82@NewLife> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with you on developing your application. I'm including the bioperl mailing list on this post; we'll continue to have this conversation on the list so that the helpful, friendly, knowledgeable, compassionate membership can participate. 
WrapperMaker code is currently available in svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run for an example of how Bio::Tools::Run::WrapperBase and CommandExts are used (er, by me...). cheers MAJ ----- Original Message ----- From: "Alok" To: Sent: Thursday, March 25, 2010 3:23 PM Subject: BioPerl Google SOC project > Hello Mark, > > My name is Alok Watve and I am currently pursuing PhD in Computer > Science at Michigan State University. I was going through the BioPerl > Wiki for Google SOC projects. I have good experience with Perl and was > wondering if I could work on the project "Perl Run Wrappers". > > Prior to joining MSU, I was working with D E Shaw India Software Pvt. > Ltd. My work was involved in writing Java programs and their perl > wrappers. We used perl scripts to fire java programs with all the > correct parameters. So I think I have some idea about what wrappers are. > However, I have not used BioPerl and may take some time to get familiar > with the structure. I am fairly confident that I will be able to do this. > > During my work here at MSU. I use perl a lot for doing basic text > analysis for my projects. Although I rarely use OO features of perl, I > have used them in past and never had any problems with it. I also > believe in writing well-documented and user/developer friendly code > (With comments, command line options for help/documentation). I have > attached a simple script I wrote for my project as an example. I have > also attached my resume for your consideration. > > Please let me know if you think that I am an appropriate candidate and > whether I should go ahead with submitting an application with BioPerl as > my Mentor Organization. 
>
> Thanks a lot,
> Alok
> www.cse.msu.edu/~watvealo/
> --------------------------------------------------------------------------------
> #!/usr/bin/perl
>
> =pod
>
> =head1 SYNOPSIS
>
> Script to edit existing box query files to enable random box query.
> This script inserts box size on each line corresponding to discrete
> dimension in the existing box query file. The maximum value of "box size"
> depends on the alphabet size.
>
> Example
> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile
>
> Use -perldoc for detailed help on options.
>
> =head1 OPTIONS
>
> =over
>
> =item -infile
>
> Specifies the name of the input box query file.
>
> =item -outfile
>
> Specifies the name of the output file.
>
> =item -uniform_box
>
> Specifies size of the uniform box query.
>
> =item -max_size
>
> Specifies the maximum box size for random sized box query.
>
> =item -help
>
> Displays a brief help message and exits.
>
> =item -perldoc
>
> Displays a detailed help.
>
> =back
>
> =cut
>
> use strict;
> use warnings 'all';
>
> use Getopt::Long;
> use Pod::Usage;
>
> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile,
>     'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox,
>     'help' => \my $help, 'perldoc' => \my $perldoc);
>
> if(defined($perldoc))
> {
>     pod2usage(-verbose => 2);
> }
>
> if(defined($help))
> {
>     pod2usage(-verbose=> 0);
> }
>
> if(! (defined($infile) && defined ($outfile) ))
> {
>     die('Please specify input, output files. Use -perldoc for more help');
> }
>
> # Some basic error checking to ensure script runs ....
> if(!(defined($uniformBox) ||defined($maxSize)))
> {
>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
> }
>
> # Initialize random number generator.
> srand();
>
> # Read Input file and find out lines we are interested in
> # Then prefix the line with correct box size as defined by
> # user choice
> open(IN, "<$infile");
> open(OUT, ">$outfile");
> my $count = 0;
> while(my $line = <IN>)
> {
>     if( ($count%64) < 32 )
>     {
>         if(defined($uniformBox))
>         {
>             $line = sprintf("%d ",$uniformBox) . $line;
>         }
>         elsif(defined($maxSize))
>         {
>             # This line corresponds to the discrete dimension.
>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>         }
>     }
>     $count ++;
>     print OUT $line;
> }
>
> close(OUT);
> close(IN);
>

From cjfields at illinois.edu Fri Mar 26 11:06:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:06:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
Message-ID:

Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter:

http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
http://use.perl.org/~cjfields/journal/40275

I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...).

chris

From awitney at sgul.ac.uk Fri Mar 26 11:20:36 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 26 Mar 2010 15:20:36 +0000
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
Message-ID: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>

Is the bioperl-ext package still being developed? I ask because I am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error

"The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
Please read the install the bioperl-ext package"

even though I did compile and install the Bio::Ext::Align package.

If not using the pSW module, what do other people use for this?

thanks

adam

From cjfields at illinois.edu Fri Mar 26 11:51:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:51:41 -0500
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
In-Reply-To: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>
References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>
Message-ID: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu>

It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on its own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by its author.

I know many users just use calls to outside programs, such as EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes; XS can be a bugbear.

chris

On Mar 26, 2010, at 10:20 AM, Adam Witney wrote:

> Is the bioperl-ext package still being developed? I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error
>
> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
> Please read the install the bioperl-ext package"
>
> even though i did compile and install the Bio::Ext::Align package
>
> If not using the pSW module, what do other people use for this?
>
> thanks
>
> adam

From pmiguel at purdue.edu Fri Mar 26 11:52:17 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 26 Mar 2010 11:52:17 -0400
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
Message-ID: <4BACD831.20506@purdue.edu>

Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.

I am trying to use the code provided at:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO

and modified to request gi228534658

The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything.

Nothing is printed when I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::EUtilities;

my @ids;
push @ids, '228534658';
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

my $file = 'myseqs.gb';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

my $seqin = Bio::SeqIO->new(-file   => $file,
                            -format => 'genbank');

while (my $seq = $seqin->next_seq) {
    print "I see a sequence\n";
    print $seq->species();
}


"myseqs.gb" does have content:

Seq-entry ::= seq {
  id {
    general {
      db "gpid:36555" ,
      tag
      str "contig49313" } ,
    genbank {
      accession "EZ113652" ,
      version 1 } ,
    gi 228534658 } ,
  descr {
    title "TSA: Zea mays contig49313, mRNA sequence." ,
    source {
      genome genomic ,
      org {
        taxname "Zea mays" ,
        db {
          {
            db "taxon" ,
            tag
            id 4577 } } ,
        orgname {
          name
          binomial {
            genus "Zea" ,
            species "mays" } ,
          lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
PACCAD clade; Panicoideae; Andropogoneae; Zea" ,
          gcode 1 ,
          mgcode 1 ,
          div "PLN" } } } ,
    molinfo {
      biomol mRNA ,
      tech tsa } ,
    pub {
      pub {
        article {
          title {
            name "Deep sampling of the Palomero maize transcriptome by a high
throughput strategy of pyrosequencing." } ,
          authors {
            names
            std {
              {
                name
                name {
                  last "Vega-Arreguin" ,
                  initials "J.C." } } ,
              {
                name
                name {
                  last "Ibarra-Laclette" ,
                  initials "E." } } ,
              {
                name
                name {
                  last "Jimenez-Moraila" ,
                  initials "B." } } ,
              {
                name
                name {
                  last "Martinez" ,
                  initials "O." } } ,
              {
                name
                name {
                  last "Vielle-Calzada" ,
                  initials "J.P." } } ,
              {
                name
                name {
                  last "Herrera-Estrella" ,
                  initials "L." } } ,
              {
                name
                name {
                  last "Herrera-Estrella" ,
                  initials "A." } } } } ,
          from
          journal {
            title {
              iso-jta "BMC Genomics" ,
              ml-jta "BMC Genomics" ,
              issn "1471-2164" ,
              name "BMC genomics" } ,
            imp {
              date
              std {
                year 2009 ,
                month 7 ,
                day 6 } ,
              volume "10" ,
              issue "1" ,
              pages "299" ,
              language "ENG" ,
              pubstatus aheadofprint ,
              history {
                {
                  pubstatus received ,
                  date
                  std {
                    year 2008 ,
                    month 12 ,
                    day 2 } } ,
                {
                  pubstatus accepted ,
                  date
                  std {
                    year 2009 ,
                    month 7 ,
                    day 6 } } ,
                {
                  pubstatus aheadofprint ,
                  date
                  std {
                    year 2009 ,
                    month 7 ,
                    day 6 } } ,
                {
                  pubstatus other ,
                  date
                  std {
                    year 2009 ,
                    month 7 ,
                    day 8 ,
                    hour 9 ,
                    minute 0 } } ,
                {
                  pubstatus pubmed ,
                  date
                  std {
                    year 2009 ,
                    month 7 ,
                    day 8 ,
                    hour 9 ,
                    minute 0 } } ,
                {
                  pubstatus medline ,
                  date
                  std {
                    year 2009 ,
                    month 7 ,
                    day 8 ,
                    hour 9 ,
                    minute 0 } } } } } ,
          ids {
            pii "1471-2164-10-299" ,
            doi "10.1186/1471-2164-10-299" ,
            pubmed 19580677 } } ,
        pmid 19580677 } } ,
    pub {
      pub {
        sub {
          authors {
            names
            std {
              {
                name
                name {
                  last "Vega-Arreguin" ,
                  first "Julio" ,
                  initials "J.C." } } ,
              {
                name
                name {
                  last "Ibarra-Laclette" ,
                  first "Enrique" ,
                  initials "E." } } ,
              {
                name
                name {
                  last "Jimenez-Moraila" ,
                  first "Beatriz" ,
                  initials "B." } } ,
              {
                name
                name {
                  last "Martinez" ,
                  first "Octavio" ,
                  initials "O." } } ,
              {
                name
                name {
                  last "Vielle-Calzada" ,
                  first "Jean" ,
                  initials "J.Philippe." } } ,
              {
                name
                name {
                  last "Herrera-Estrella" ,
                  first "Luis" ,
                  initials "L." } } ,
              {
                name
                name {
                  last "Herrera-Estrella" ,
                  first "Alfredo" ,
                  initials "A." } } } ,
            affil
            std {
              affil "Laboratorio Nacional de Genomica para la Biodiversidad" ,
              div "Cinvestav Campus Guanajuato" ,
              city "Irapuato" ,
              sub "Guanajuato" ,
              country "Mexico" ,
              street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" ,
              postal-code "36821" } } ,
          medium other ,
          date
          std {
            year 2009 ,
            month 3 ,
            day 23 } } } } ,
    user {
      type
      str "GenomeProjectsDB" ,
      data {
        {
          label
          str "ProjectID" ,
          data
          int 36555 } ,
        {
          label
          str "ParentID" ,
          data
          int 0 } } } ,
    create-date
    std {
      year 2009 ,
      month 5 ,
      day 5 } ,
    update-date
    std {
      year 2009 ,
      month 7 ,
      day 14 } } ,
  inst {
    repr raw ,
    mol rna ,
    length 450 ,
    seq-data
    ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02
0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A
A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63
760CFF0'H } }


Maybe I am using the wrong format? This looks more like ASN than genbank format to me.

Phillip

From maj at fortinbras.us Fri Mar 26 11:37:56 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 11:37:56 -0400
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
In-Reply-To:
References:
Message-ID: <648F9E90AF07449887FD4C420AA8B00E@NewLife>

and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ

----- Original Message -----
From: "Chris Fields"
To: "BioPerl List"
Sent: Friday, March 26, 2010 11:06 AM
Subject: [Bioperl-l] BioPerl and the Google Summer of Code

> Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via
> twitter:
>
> http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
> http://use.perl.org/~cjfields/journal/40275
>
> I'll update the BioPerl page with a couple more ideas later today (think:
> Moose and/or Perl6...).
> > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri Mar 26 12:16:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 11:16:22 -0500 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <4BACD831.20506@purdue.edu> References: <4BACD831.20506@purdue.edu> Message-ID: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. chris On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: > Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. > > I am trying to use the code provided at: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO > > and modified to request gi228534658 > > The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. 
> > Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: > > #!/usr/bin/perl > use strict; > use warnings; > > use Bio::SeqIO; > use Bio::DB::EUtilities; > > my @ids; > push @ids, '228534658'; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -id => \@ids); > > my $file = 'myseqs.gb'; > > # dump HTTP::Response content to a file (not retained in memory) > $factory->get_Response(-file => $file); > > my $seqin = Bio::SeqIO->new(-file => $file, > -format => 'genbank'); > > while (my $seq = $seqin->next_seq) { > print "I see a sequence\n"; > print $seq->species(); > } > > > "myseqs.gb" does have content: > > Seq-entry ::= seq { > id { > general { > db "gpid:36555" , > tag > str "contig49313" } , > genbank { > accession "EZ113652" , > version 1 } , > gi 228534658 } , > descr { > title "TSA: Zea mays contig49313, mRNA sequence." , > source { > genome genomic , > org { > taxname "Zea mays" , > db { > { > db "taxon" , > tag > id 4577 } } , > orgname { > name > binomial { > genus "Zea" , > species "mays" } , > lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; > Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; > PACCAD clade; Panicoideae; Andropogoneae; Zea" , > gcode 1 , > mgcode 1 , > div "PLN" } } } , > molinfo { > biomol mRNA , > tech tsa } , > pub { > pub { > article { > title { > name "Deep sampling of the Palomero maize transcriptome by a high > throughput strategy of pyrosequencing." } , > authors { > names > std { > { > name > name { > last "Vega-Arreguin" , > initials "J.C." } } , > { > name > name { > last "Ibarra-Laclette" , > initials "E." } } , > { > name > name { > last "Jimenez-Moraila" , > initials "B." } } , > { > name > name { > last "Martinez" , > initials "O." } } , > { > name > name { > last "Vielle-Calzada" , > initials "J.P." } } , > { > name > name { > last "Herrera-Estrella" , > initials "L." 
} } , > { > name > name { > last "Herrera-Estrella" , > initials "A." } } } } , > from > journal { > title { > iso-jta "BMC Genomics" , > ml-jta "BMC Genomics" , > issn "1471-2164" , > name "BMC genomics" } , > imp { > date > std { > year 2009 , > month 7 , > day 6 } , > volume "10" , > issue "1" , > pages "299" , > language "ENG" , > pubstatus aheadofprint , > history { > { > pubstatus received , > date > std { > year 2008 , > month 12 , > day 2 } } , > { > pubstatus accepted , > date > std { > year 2009 , > month 7 , > day 6 } } , > { > pubstatus aheadofprint , > date > std { > year 2009 , > month 7 , > day 6 } } , > { > pubstatus other , > date > std { > year 2009 , > month 7 , > day 8 , > hour 9 , > minute 0 } } , > { > pubstatus pubmed , > date > std { > year 2009 , > month 7 , > day 8 , > hour 9 , > minute 0 } } , > { > pubstatus medline , > date > std { > year 2009 , > month 7 , > day 8 , > hour 9 , > minute 0 } } } } } , > ids { > pii "1471-2164-10-299" , > doi "10.1186/1471-2164-10-299" , > pubmed 19580677 } } , > pmid 19580677 } } , > pub { > pub { > sub { > authors { > names > std { > { > name > name { > last "Vega-Arreguin" , > first "Julio" , > initials "J.C." } } , > { > name > name { > last "Ibarra-Laclette" , > first "Enrique" , > initials "E." } } , > { > name > name { > last "Jimenez-Moraila" , > first "Beatriz" , > initials "B." } } , > { > name > name { > last "Martinez" , > first "Octavio" , > initials "O." } } , > { > name > name { > last "Vielle-Calzada" , > first "Jean" , > initials "J.Philippe." } } , > { > name > name { > last "Herrera-Estrella" , > first "Luis" , > initials "L." } } , > { > name > name { > last "Herrera-Estrella" , > first "Alfredo" , > initials "A." 
} } } , > affil > std { > affil "Laboratorio Nacional de Genomica para la Biodiversidad" , > div "Cinvestav Campus Guanajuato" , > city "Irapuato" , > sub "Guanajuato" , > country "Mexico" , > street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , > postal-code "36821" } } , > medium other , > date > std { > year 2009 , > month 3 , > day 23 } } } } , > user { > type > str "GenomeProjectsDB" , > data { > { > label > str "ProjectID" , > data > int 36555 } , > { > label > str "ParentID" , > data > int 0 } } } , > create-date > std { > year 2009 , > month 5 , > day 5 } , > update-date > std { > year 2009 , > month 7 , > day 14 } } , > inst { > repr raw , > mol rna , > length 450 , > seq-data > ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 > 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A > A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 > 760CFF0'H } }
>
>
> Maybe I am using the wrong format? This looks more like ASN than genbank format to me.
>
> Phillip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Fri Mar 26 12:38:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 11:38:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
In-Reply-To: <648F9E90AF07449887FD4C420AA8B00E@NewLife>
References: <648F9E90AF07449887FD4C420AA8B00E@NewLife>
Message-ID: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu>

BioPerl GSoC page updated with the Moose/Modern Perl/BioPerl 6-based project:

http://www.bioperl.org/wiki/Google_Summer_of_Code#BioPerl_2.0_.28and_beyond.29

Feel free to add your name to the list of mentors if you are interested.

chris

On Mar 26, 2010, at 10:37 AM, Mark A.
Jensen wrote:

> and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ
> ----- Original Message ----- From: "Chris Fields"
> To: "BioPerl List"
> Sent: Friday, March 26, 2010 11:06 AM
> Subject: [Bioperl-l] BioPerl and the Google Summer of Code
>
>
>> Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter:
>>
>> http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
>> http://use.perl.org/~cjfields/journal/40275
>>
>> I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...).
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

From pmiguel at purdue.edu Fri Mar 26 13:28:09 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 26 Mar 2010 13:28:09 -0400
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
Message-ID: <4BACEEA9.2060407@purdue.edu>

Ah, yes. That does the trick. Actually I have already downloaded a few thousand records in whatever format is returned when 'genbank' is specified instead of 'gb' (it begins with 'Seq-entry ::= seq {'). Any idea what format that is and how to convert it to something SeqIO can use?

If not, I can just pull them all down again by sending about 200 gi's per request. That should not offend the genbank gods...

Thanks for your help,
Phillip

Chris Fields wrote:
> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end.
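
The change described here (rettype 'gb' plus an e-mail address), combined with re-fetching in batches of about 200 GIs, might be sketched as follows. This is only a sketch under assumptions: the helper names `batch_ids` and `fetch_batch`, the per-batch output files, and the e-mail value passed by the caller are illustrative and not part of the cookbook; the Bio::DB::EUtilities parameters are the ones used in the script from this thread, with `-email` added.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size IDs each.
# (Hypothetical helper for illustration; not part of BioPerl.)
sub batch_ids {
    my ($size, @ids) = @_;
    die "batch size must be a positive integer\n" unless $size && $size > 0;
    my @batches;
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Fetch one batch of nucleotide records as GenBank flat files into $file.
# Bio::DB::EUtilities is loaded inside the sub so that batch_ids() can be
# used on its own; calling fetch_batch() needs BioPerl and network access.
sub fetch_batch {
    my ($ids, $file, $email) = @_;
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => 'nucleotide',
        -rettype => 'gb',      # 'genbank' returned ASN.1 in this thread
        -email   => $email,    # address supplied by the caller
        -id      => $ids,      # array reference of GI numbers
    );
    # dump HTTP::Response content to a file, as in the cookbook example
    $factory->get_Response(-file => $file);
    return $file;
}
```

Usage could then look like `my $n = 0; fetch_batch($_, sprintf('myseqs_%03d.gb', ++$n), $my_email) for batch_ids(200, @ids);`, after which each file can be read with Bio::SeqIO using -format => 'genbank' as in the original script.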

From bioperlanand at yahoo.com Fri Mar 26 00:40:23 2010
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 25 Mar 2010 21:40:23 -0700 (PDT)
Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities
Message-ID: <27160.94644.qm@web114211.mail.gq1.yahoo.com>

Hi everybody,

I have a list of genome project ids & I have a need where I need to gather information from a specific field & store the output in a file. As regards what Info I want For example, for genome project id 30807
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807
I need to grab the text information that reads (this is found at the bottom of the page):

Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant.
Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis.

I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::EUtilities? If yes, what would be the fields/params?

I did try out this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F to find out what information is available for genomeprj, but I am unable to get the necessary field/param for my need. Please help.

Alternatively, is there a better way to address my need other than Bio::DB::EUtilities?

Thanks in advance,
Anand

From rmb32 at cornell.edu Fri Mar 26 03:44:09 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 00:44:09 -0700
Subject: [Bioperl-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not already been subscribed to the (mentors-only) gsoc-mentors mailing list, send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin

From rmb32 at cornell.edu Fri Mar 26 12:30:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 09:30:30 -0700
Subject: [Bioperl-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 9!

Please spread it widely; we need to reach lots of students with it!

Rob Buels
OBF GSoC 2010 Admin

============================================================
*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***
============================================================

OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010

Applications due 19:00 UTC, April 9, 2010.

http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3-month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project.

The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student.

We also welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves.

TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs

From cjfields at illinois.edu Fri Mar 26 14:28:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 13:28:46 -0500
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <4BACEEA9.2060407@purdue.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu>
Message-ID: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu>

That format is ASN.1, and there isn't a BioPerl parser for the GenBank ASN.1 format (it tends to be too cumbersome). However, there is a pure-Perl parser for the EntrezGene ASN.1 format (Bio::ASN1::EntrezGene).

chris

On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote:
> Ah, yes. That does the trick. Actually I have already downloaded a few
> thousand records in whatever that format that is returned when 'genbank'
> is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::=
> seq {') Any idea what format that is and how to convert it to something
> SeqIO can use?
>
> If not, I can just pull them all down again by sending about 200 gi's
> per request. That should not offend the genbank gods...
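
Since no BioPerl parser is available for these Seq-entry files, one pragmatic route is the one suggested in the thread: recover the GI numbers from the already downloaded ASN.1 text and fetch those records again with rettype 'gb'. A minimal sketch, assuming each record starts with `Seq-entry ::= ` at the beginning of a line and that the first `gi N` inside a record is that record's own GI, as in the example shown in this thread. `gis_from_asn1_text` is a hypothetical helper, not a BioPerl function, and this pattern matching is in no way a real ASN.1 parser.

```perl
use strict;
use warnings;

# Return the first GI number found in each Seq-entry record of $text.
# Assumption: records begin with "Seq-entry ::= " at the start of a line.
sub gis_from_asn1_text {
    my ($text) = @_;
    my @gis;
    # Splitting on the record start yields one chunk per record; the chunk
    # before the first record (if any) contains no "gi N" and is skipped.
    for my $record (split /^Seq-entry ::= /m, $text) {
        push @gis, $1 if $record =~ /\bgi\s+(\d+)/;
    }
    return @gis;
}
```

The resulting list could be written to a file and used as the ID list for a fresh efetch request; records whose first `gi` entry is not their own identifier would need closer inspection.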

From wollenbergk at niaid.nih.gov Fri Mar 26 16:47:06 2010
From: wollenbergk at niaid.nih.gov (Wollenberg, Kurt (NIH/NIAID) [C])
Date: Fri, 26 Mar 2010 16:47:06 -0400
Subject: [Bioperl-l] Error during installation of 1.6.1
Message-ID:

Hello:

I am trying to install BioPerl (after a recent system upgrade) and am getting the following error: "Catching error: "Can't execute q install q: No such file or directory at /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at /Library/Perl/Updates/5.8.8/CPAN.pm line 391". Previous to this I've run the CPAN upgrade, etc. as recommended on the Installation for Unix page. This happens when I try to do the actual install, both vanilla and "force"ed. I'm attempting this on a Mac G5 workstation running 10.5.8. Any clues what I may be missing or doing incorrectly?

Cheers,
Kurt Wollenberg, Ph.D.
Contractor - Lockheed Martin
Phylogenetics Specialist
Computational Biology Section
Bioinformatics and Computational Biosciences Branch (BCBB)
OCICB/OSMO/OD/NIAID/NIH
31 Center Drive, Room 3B62
Bethesda, MD 20892-0485
Office 301-402-8628
http://bioinformatics.niaid.nih.gov (Within NIH)
http://exon.niaid.nih.gov (Public)

Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient.
If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives From rmb32 at cornell.edu Fri Mar 26 18:22:42 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 15:22:42 -0700 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> References: <648F9E90AF07449887FD4C420AA8B00E@NewLife> <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> Message-ID: <4BAD33B2.1060309@cornell.edu> You guys are the best. Hugs all around. R From watvealo at cse.msu.edu Fri Mar 26 19:06:24 2010 From: watvealo at cse.msu.edu (Alok) Date: Fri, 26 Mar 2010 19:06:24 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <249674A825C14BB3801C6184DEEA7A82@NewLife> References: <4BABB825.6010803@cse.msu.edu> <249674A825C14BB3801C6184DEEA7A82@NewLife> Message-ID: <4BAD3DF0.7090006@cse.msu.edu> Hi Mark, Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-( I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion) But they too are non-responsive. Thanks, Alok Mark A. Jensen wrote: > Hi Alok-- Thanks for your interest! You should certainly consider > applying. I can work with > you on developing your application. I'm including the bioperl mailing > list on this > post; we'll continue to have this conversation on the list so that the > helpful, friendly, > knowledgeable, compassionate membership can participate. 
> WrapperMaker code is currently available in > svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker > > Probably you want to have a look at Bio::Tools::Run::Samtools in > bioperl-run > for an example of how Bio::Tools::Run::WrapperBase and CommandExts are > used (er, by me...). > cheers > MAJ > ----- Original Message ----- From: "Alok" > To: > Sent: Thursday, March 25, 2010 3:23 PM > Subject: BioPerl Google SOC project > > >> Hello Mark, >> >> My name is Alok Watve and I am currently pursuing PhD in Computer >> Science at Michigan State University. I was going through the BioPerl >> Wiki for Google SOC projects. I have good experience with Perl and was >> wondering if I could work on the project "Perl Run Wrappers". >> >> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >> Ltd. My work was involved in writing Java programs and their perl >> wrappers. We used perl scripts to fire java programs with all the >> correct parameters. So I think I have some idea about what wrappers are. >> However, I have not used BioPerl and may take some time to get familiar >> with the structure. I am fairly confident that I will be able to do >> this. >> >> During my work here at MSU. I use perl a lot for doing basic text >> analysis for my projects. Although I rarely use OO features of perl, I >> have used them in past and never had any problems with it. I also >> believe in writing well-documented and user/developer friendly code >> (With comments, command line options for help/documentation). I have >> attached a simple script I wrote for my project as an example. I have >> also attached my resume for your consideration. >> >> Please let me know if you think that I am an appropriate candidate and >> whether I should go ahead with submitting an application with BioPerl as >> my Mentor Organization. 
>> >> Thanks a lot, >> Alok >> www.cse.msu.edu/~watvealo/ >> > > > -------------------------------------------------------------------------------- > > > >> #!/usr/bin/perl >> >> =pod >> >> =head1 SYNOPSIS >> >> Script to edit existing box query files to enable random box query. >> This scripts inserts box size on each line corresponding to discrete >> dimension in the existing box query file. The maximum value of "box >> size" >> depends on the alphabet size. >> >> Example >> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >> >> Use -perldoc for detailed help on options. >> >> =head1 OPTIONS >> >> =over >> >> =item -infile >> >> Specifies the name of the input box query file. >> >> =item -outfile >> >> Specifies the name of the output file. >> >> =item -uniform_box >> >> Specifies size of the uniform box query. >> >> =item -max_size >> >> Specifies the maximum box size for random sized box query. >> >> =item -help >> >> Displays a brief help message and exits. >> >> =item -perldoc >> >> Displays a detailed help. >> >> =back >> >> =cut >> >> use strict; >> use warnings 'all'; >> >> use Getopt::Long; >> use Pod::Usage; >> >> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >> 'help' => \my $help, 'perldoc' => \my $perldoc); >> >> if(defined($perldoc)) >> { >> pod2usage(-verbose => 2); >> } >> >> if(defined($help)) >> { >> pod2usage(-verbose=> 0); >> } >> >> if(! (defined($infile) && defined ($outfile) )) >> { >> die('Please specify input, output files. Use -perldoc >> for more help'); >> } >> >> # Some basic error checking to ensure script runs .... >> if(!(defined($uniformBox) ||defined($maxSize))) >> { >> die('Specify either box size for uniform box queries or maximum >> box size for random box queries'); >> } >> >> # Initialize random number generator. 
>> srand(); >> >> # Read the input file and find the lines we are interested in, >> # then prefix each such line with the correct box size as defined by >> # user choice >> open(IN, "<", $infile) or die "Cannot open $infile: $!"; >> open(OUT, ">", $outfile) or die "Cannot open $outfile: $!"; >> my $count = 0; >> while(my $line = <IN>) >> { >> if( ($count%64) < 32 ) >> { >> if(defined($uniformBox)) >> { >> $line = sprintf("%d ",$uniformBox) . $line; >> } >> elsif(defined($maxSize)) >> { >> # This line corresponds to the discrete dimension. >> $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line; >> } >> } >> $count ++; >> print OUT $line; >> } >> >> close(OUT); >> close(IN); >> From maj at fortinbras.us Fri Mar 26 20:08:51 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 20:08:51 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <4BAD3DF0.7090006@cse.msu.edu> References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu> Message-ID: Hi Alok-- There has been trouble with the code node of late. You can get a tarball of all the latest code at http://bioperl.org/DIST/nightly_builds/ Download both bioperl-live and bioperl-run cheers, MAJ ----- Original Message ----- From: "Alok" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Friday, March 26, 2010 7:06 PM Subject: Re: [Bioperl-l] BioPerl Google SOC project > Hi Mark, > > Thanks a lot for the response. I tried to access the SVN but was unable to do > so. My SVN client just times out :-( > I even tried SVN links from the BioPerl Wiki > (http://www.bioperl.org/wiki/Using_Subversion) > But they too are non-responsive. > > Thanks, > Alok > > Mark A. Jensen wrote: >> Hi Alok-- Thanks for your interest! You should certainly consider applying. I >> can work with >> you on developing your application. I'm including the bioperl mailing list on >> this >> post; we'll continue to have this conversation on the list so that the >> helpful, friendly, >> knowledgeable, compassionate membership can participate.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bioperlanand at yahoo.com Fri Mar 26 21:40:04 2010 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Fri, 26 Mar 2010 18:40:04 -0700 (PDT) Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities Message-ID: <497143.33972.qm@web114218.mail.gq1.yahoo.com> Hi everybody, I have a list of genome project ids & I have a need where I need to gather information from a specific field & store the output in a file. As regards what Info I want For example, for genome project id 30807 http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807, I need to grab the text information that reads (this is found at the bottom of the page): Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant. Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis. I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::Eutilities. If yes, what would be the fields/params?
I did try out this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F to find out what information is available for genomeprj, but I am unable to get the necessary field/param for my need. Please help. Alternatively, is there a better way to address my need other than Bio::DB::Eutilities Thanks in advance, Anand From cjfields at illinois.edu Fri Mar 26 23:05:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 22:05:59 -0500 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu> Message-ID: <73AE1929-9920-4FD1-B36B-1C7244E20102@illinois.edu> You can also grab the code off the github mirror: http://github.com/bioperl/bioperl-live You can either run a checkout, or download the tarball using the 'Download Source' link. We'll have an SVN read-only mirror on Google Code as well very soon, if it isn't done already. chris On Mar 26, 2010, at 7:08 PM, Mark A. Jensen wrote: > Hi Alok-- There has been trouble with the code node > of late. You can get a tarball of all the latest code at > http://bioperl.org/DIST/nightly_builds/ > Download both bioperl-live and bioperl-run > cheers, > MAJ From maj at fortinbras.us Fri Mar 26 23:15:30 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 23:15:30 -0400 Subject: [Bioperl-l] Error during installation of 1.6.1 In-Reply-To: References: Message-ID: Is it really "q install q" ? Then you probably need to do some cpan configuring. It's possible your original CPAN/Config.pm file is lost or not where cpan expects it to be after your upgrade. Try this $ cpan cpan> o conf make /usr/bin/make cpan> o conf make_install_make_command /usr/bin/make cpan> o conf commit and rerun the install.
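Before rerunning the install, it can be worth confirming that the path given to `o conf make` really points at an executable file. The following is only a minimal sketch in plain Perl, assuming /usr/bin/make is the intended path; the helper name `usable_command` is made up for illustration and is not part of CPAN.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CPAN.pm): true if $path names an
# existing, executable regular file.
sub usable_command {
    my ($path) = @_;
    return 0 unless defined $path && length $path;
    # "-x _" reuses the stat information gathered by "-f $path".
    return (-f $path && -x _) ? 1 : 0;
}

# /usr/bin/make is the path used in the 'o conf make' step; adjust it
# if make lives somewhere else on your system.
for my $candidate ('/usr/bin/make') {
    printf "%-20s %s\n", $candidate,
        usable_command($candidate) ? 'ok' : 'missing or not executable';
}
```

If the path is reported as missing, pointing `o conf make` at the location printed by `which make` may be a better choice than the value shown above.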
If you get other strangeness, I would check the values of all the config variables by listing with cpan> o conf BTW, by the message I infer you've got v1.93 of CPAN; maybe upgrading to the current version (v1.9402) would solve some problems. cheers MAJ ----- Original Message ----- From: "Wollenberg, Kurt (NIH/NIAID) [C]" To: Sent: Friday, March 26, 2010 4:47 PM Subject: [Bioperl-l] Error during installation of 1.6.1 > Hello: > > I am trying to install BioPerl (after a recent system upgrade) and am > getting the following error: > > "Catching error: "Can't execute q install q: No such file or directory at > /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at > /Library/Perl/Updates/5.8.8/CPAN.pm line 391". > > Previous to this I've run the CPAN upgrade, etc. as recommended on the > Installation for Unix page. This happens when I try to do the actual > install, both vanilla and "force"ed. I'm attempting this on a Mac G5 > workstation running 10.5.8. Any clues what I may be missing or doing > incorrectly? > > Cheers, > Kurt Wollenberg, Ph.D. > Contractor - Lockheed Martin > Phylogenetics Specialist > Computational Biology Section > Bioinformatics and Computational Biosciences Branch (BCBB) > OCICB/OSMO/OD/NIAID/NIH > > 31 Center Drive, Room 3B62 > Bethesda, MD 20892-0485 > Office 301-402-8628 > http://bioinformatics.niaid.nih.gov (Within NIH) > http://exon.niaid.nih.gov (Public) > > Disclaimer: > The information in this e-mail and any of its attachments is confidential > and may contain sensitive information. It should not be used by anyone who > is not the original intended recipient. If you have received this e-mail in > error please inform the sender and delete it from your mailbox or any other > storage devices. 
National Institute of Allergy and Infectious Diseases shall > not accept liability for any statements made that are sender's own and not > expressly made on behalf of the NIAID by one of its representatives > > From biopython at maubp.freeserve.co.uk Sat Mar 27 08:42:12 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 27 Mar 2010 12:42:12 +0000 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> Message-ID: <320fb6e01003270542i1f3cd4d2x61c97bc7ccf1b917@mail.gmail.com> On Fri, Mar 26, 2010 at 4:16 PM, Chris Fields wrote: > Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the > latter is if you always want a full nucleotide sequence instead > of possibly getting contig files). 'genbank' used to be an alias > for 'gb', but apparently no longer, and appears to be something > that was changed on NCBI's end. Yeah, the NCBI changed that almost a year ago (Easter 2009). It broke one of the Biopython unit tests, and I asked the NCBI about this and if they could restore the alias "genbank". They declined, so in Biopython's efetch wrapper we spot anyone asking for rettype=genbank, issue a warning, and convert it to rettype=gb or rettype=gp (for the protein database) instead. The relevant Biopython code is here if anyone is interested: http://biopython.org/SRC/biopython/Bio/Entrez/__init__.py Peter From pmiguel at purdue.edu Sat Mar 27 09:51:14 2010 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sat, 27 Mar 2010 09:51:14 -0400 Subject: [Bioperl-l] SeqIO issue?
EUtilities Cookbook In-Reply-To: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu> <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> Message-ID: <4BAE0D52.60908@purdue.edu> Hi Chris, I also see there is a bunch of NCBI toolkit code that deals with asn.1 conversion. They even have some precompiled code: http://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html Thanks for your help, Phillip Chris Fields wrote: > That format is ASN.1. and there isn't a BioPerl parser for GenBank ASN.1 > format (it tends to be too cumbersome). > > However, there is a pure-perl-based one for the EntrezGene ASN.1 format > (Bio::ASN1::EntrezGene). > > chris > > > On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote: > >> Ah, yes. That does the trick. Actually I have already downloaded a few >> thousand records in whatever that format that is returned when 'genbank' >> is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= >> seq {') Any idea what format that is and how to convert it to something >> SeqIO can use? >> >> If not, I can just pull them all down again by sending about 200 gi's >> per request. That should not offend the genbank gods... >> >> Thanks for your help, >> Phillip >> >> Chris Fields wrote: >> >>> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. >>> >>> Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. >>> >>> chris >>> >>> On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: >>> >>> >>> >>>> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. 
>>>> >>>> I am trying to use the code provided at: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO >>>> >>>> and modified to request gi228534658 >>>> >>>> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. >>>> >>>> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> >>>> use Bio::SeqIO; >>>> use Bio::DB::EUtilities; >>>> >>>> my @ids; >>>> push @ids, '228534658'; >>>> my $factory = Bio::DB::EUtilities->new( >>>> -eutil => 'efetch', >>>> -db => 'nucleotide', >>>> -rettype => 'genbank', >>>> -id => \@ids); >>>> >>>> my $file = 'myseqs.gb'; >>>> >>>> # dump HTTP::Response content to a file (not retained in memory) >>>> $factory->get_Response(-file => $file); >>>> >>>> my $seqin = Bio::SeqIO->new(-file => $file, >>>> -format => 'genbank'); >>>> >>>> while (my $seq = $seqin->next_seq) { >>>> print "I see a sequence\n"; >>>> print $seq->species(); >>>> } >>>> >>>> >>>> "myseqs.gb" does have content: >>>> >>>> Seq-entry ::= seq { >>>> id { >>>> general { >>>> db "gpid:36555" , >>>> tag >>>> str "contig49313" } , >>>> genbank { >>>> accession "EZ113652" , >>>> version 1 } , >>>> gi 228534658 } , >>>> descr { >>>> title "TSA: Zea mays contig49313, mRNA sequence." 
, >>>> source { >>>> genome genomic , >>>> org { >>>> taxname "Zea mays" , >>>> db { >>>> { >>>> db "taxon" , >>>> tag >>>> id 4577 } } , >>>> orgname { >>>> name >>>> binomial { >>>> genus "Zea" , >>>> species "mays" } , >>>> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; >>>> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; >>>> PACCAD clade; Panicoideae; Andropogoneae; Zea" , >>>> gcode 1 , >>>> mgcode 1 , >>>> div "PLN" } } } , >>>> molinfo { >>>> biomol mRNA , >>>> tech tsa } , >>>> pub { >>>> pub { >>>> article { >>>> title { >>>> name "Deep sampling of the Palomero maize transcriptome by a high >>>> throughput strategy of pyrosequencing." } , >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> initials "J.P." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "L." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "A." 
} } } } , >>>> from >>>> journal { >>>> title { >>>> iso-jta "BMC Genomics" , >>>> ml-jta "BMC Genomics" , >>>> issn "1471-2164" , >>>> name "BMC genomics" } , >>>> imp { >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } , >>>> volume "10" , >>>> issue "1" , >>>> pages "299" , >>>> language "ENG" , >>>> pubstatus aheadofprint , >>>> history { >>>> { >>>> pubstatus received , >>>> date >>>> std { >>>> year 2008 , >>>> month 12 , >>>> day 2 } } , >>>> { >>>> pubstatus accepted , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus aheadofprint , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus other , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus pubmed , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus medline , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } } } } , >>>> ids { >>>> pii "1471-2164-10-299" , >>>> doi "10.1186/1471-2164-10-299" , >>>> pubmed 19580677 } } , >>>> pmid 19580677 } } , >>>> pub { >>>> pub { >>>> sub { >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> first "Julio" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> first "Enrique" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> first "Beatriz" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> first "Octavio" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> first "Jean" , >>>> initials "J.Philippe." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Luis" , >>>> initials "L." 
} } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Alfredo" , >>>> initials "A." } } } , >>>> affil >>>> std { >>>> affil "Laboratorio Nacional de Genomica para la Biodiversidad" , >>>> div "Cinvestav Campus Guanajuato" , >>>> city "Irapuato" , >>>> sub "Guanajuato" , >>>> country "Mexico" , >>>> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , >>>> postal-code "36821" } } , >>>> medium other , >>>> date >>>> std { >>>> year 2009 , >>>> month 3 , >>>> day 23 } } } } , >>>> user { >>>> type >>>> str "GenomeProjectsDB" , >>>> data { >>>> { >>>> label >>>> str "ProjectID" , >>>> data >>>> int 36555 } , >>>> { >>>> label >>>> str "ParentID" , >>>> data >>>> int 0 } } } , >>>> create-date >>>> std { >>>> year 2009 , >>>> month 5 , >>>> day 5 } , >>>> update-date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 14 } } , >>>> inst { >>>> repr raw , >>>> mol rna , >>>> length 450 , >>>> seq-data >>>> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 >>>> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A >>>> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 >>>> 760CFF0'H } } >>>> >>>> >>>> Maybe I am using the wrong format? This looks more like ASN than genbank format to me. 
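One way to tell the two formats apart before handing a saved file to Bio::SeqIO is to look at its first non-blank line: the text ASN.1 shown in the dump begins with `Seq-entry ::=`, whereas a GenBank flat file begins with `LOCUS`. The following is a minimal sketch in plain Perl; the function name `guess_record_format` and the 'unknown' fallback are illustrative assumptions, not BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): guess the format of a record
# from its first non-blank line.
sub guess_record_format {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                        # skip blank lines
        return 'genbank' if $line =~ /^LOCUS\s/;          # GenBank flat file
        return 'asn1'    if $line =~ /^\s*[\w-]+\s*::=/;  # e.g. "Seq-entry ::= seq {"
        return 'unknown';                                 # first real line decides
    }
    return 'unknown';                                     # empty input
}

# Two short samples: the opening of the ASN.1 dump, and a made-up LOCUS line.
print guess_record_format("Seq-entry ::= seq {\n  id {\n"), "\n";
print guess_record_format("LOCUS       EZ113652   450 bp    mRNA\n"), "\n";
```

Applied to the contents of myseqs.gb, a check like this would presumably report 'asn1', which is consistent with next_seq returning nothing when the file is read with -format => 'genbank'.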
>>>> >>>> Phillip From awitney at sgul.ac.uk Mon Mar 29 13:26:40 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 29 Mar 2010 18:26:40 +0100 Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl In-Reply-To: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk> <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> Message-ID: <6DD3E9BB-27AD-4241-94F9-476AE6525A7D@sgul.ac.uk> thanks Chris for the explanation. It looks like Exonerate may also do something similar thanks adam On 26 Mar 2010, at 15:51, Chris Fields wrote: > It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on its own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by its author. > > I know many users just use calls to outside programs, such as EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes; XS can be a bugbear. > > chris > > On Mar 26, 2010, at 10:20 AM, Adam Witney wrote: > >> Is the bioperl-ext package still being developed?
I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error >> >> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. >> Please read the install the bioperl-ext package" >> >> even though i did compile and install the Bio::Ext::Align package >> >> If not using the pSW module, what do other people use for this? >> >> thanks >> >> adam >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nicolas.turenne at jouy.inra.fr Mon Mar 29 14:09:53 2010 From: nicolas.turenne at jouy.inra.fr (Nicolas Turenne) Date: Mon, 29 Mar 2010 20:09:53 +0200 Subject: [Bioperl-l] about biblio Message-ID: <4BB0ECF1.6050308@jouy.inra.fr> Hello, I am using biblio module from bioperl to download pubmed abstract. if i do the query "actb" on the pubmed site (http://www.ncbi.nlm.nih.gov/sites/entrez) i get 165 hits But using bioperl, if i do use Bio::Biblio; my $biblio = Bio::Biblio->new (-access => 'soap', -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS', -destroy_on_exit => '0'); my @ListID = @{ $biblio->find ("actb")->get_all_ids }; i get 228 hits, so i dont understand the difference thank for help Nicolas From sj17m89 at gmail.com Mon Mar 29 13:47:38 2010 From: sj17m89 at gmail.com (Shweta Jha) Date: Mon, 29 Mar 2010 10:47:38 -0700 Subject: [Bioperl-l] Regarding Google Summer of Code Message-ID: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> Dear Sir / Madam , I , Shweta Jha , am a Third year B.Tech Bioinformatics student. I am interested to apply for the Google Summer of Code internship program. I am keen to work on project using Bioperl. 
Could you please let me know how do I apply for the program? Thanks and Regards Shweta Jha From rmb32 at cornell.edu Mon Mar 29 15:26:30 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 29 Mar 2010 12:26:30 -0700 Subject: [Bioperl-l] Regarding Google Summer of Code In-Reply-To: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> References: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> Message-ID: <4BB0FEE6.3080209@cornell.edu> Hi Shweta, See http://open-bio.org/wiki/Google_Summer_of_Code, and the GSoC FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs for details on the application process. Rob Shweta Jha wrote: > Dear Sir / Madam , > > I , Shweta Jha , am a Third year B.Tech Bioinformatics student. > > I am interested to apply for the Google Summer of Code internship program. > > I am keen to work on project using Bioperl. > > Could you please let me know how do I apply for the program? > > > > Thanks and Regards > Shweta Jha > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From martin.senger at gmail.com Mon Mar 29 17:02:02 2010 From: martin.senger at gmail.com (Martin Senger) Date: Mon, 29 Mar 2010 22:02:02 +0100 Subject: [Bioperl-l] about biblio In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr> References: <4BB0ECF1.6050308@jouy.inra.fr> Message-ID: <4d93f07c1003291402j5ab58216o3985157513d1820a@mail.gmail.com> Hi, I am actually not sure what is the correct answer - because I am not anymore maintaining the biblio server at EBI (I actually did not know that it was still running :-) - but I am very pleased that it does run). Mahmut, can I ask you a favor? Could you please pass the emailed question below to an appropriate person at EBI? Of course, if the result of this inquiry is that the problem is in the biblio module in bioperl I am quite happy and keen to fix it there. 
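One way to narrow down the 165-versus-228 discrepancy is to compare the two ID lists directly and see which IDs turn up in only one of them. The following is a minimal sketch in plain Perl, assuming both result sets have already been collected into arrays of numeric PubMed IDs; the IDs below are made-up placeholders and `only_in` is a hypothetical helper, not part of Bio::Biblio.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the elements of @$first that do not occur
# in @$second, sorted numerically and without duplicates.
sub only_in {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } @$second;
    my %seen;
    return sort { $a <=> $b }
           grep { !$in_second{$_} && !$seen{$_}++ } @$first;
}

# Placeholder IDs, not real query results.
my @from_pubmed  = (101, 102, 103);
my @from_openbqs = (102, 103, 104, 105);

print "only via OpenBQS: @{[ only_in(\@from_openbqs, \@from_pubmed) ]}\n";
print "only via PubMed:  @{[ only_in(\@from_pubmed, \@from_openbqs) ]}\n";
```

Looking up a handful of the IDs that appear on only one side may show whether the two services simply index different fields or different snapshots of MEDLINE.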
Cheers,
Martin

On Mon, Mar 29, 2010 at 7:09 PM, Nicolas Turenne <nicolas.turenne at jouy.inra.fr> wrote:

> Hello,
> I am using biblio module from bioperl to download pubmed abstract.
> if i do the query "actb" on the pubmed site (
> http://www.ncbi.nlm.nih.gov/sites/entrez)
> i get 165 hits
>
> But using bioperl, if i do
>
> use Bio::Biblio;
> my $biblio = Bio::Biblio->new
> (-access => 'soap',
> -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS',
> -destroy_on_exit => '0');
> my @ListID = @{ $biblio->find ("actb")->get_all_ids };
>
> i get 228 hits, so i dont understand the difference
>
> thank for help
> Nicolas
>

--
Martin Senger
email: martin.senger at gmail.com,martin.senger at kaust.edu.sa
skype: martinsenger

From click.xu at gmail.com Mon Mar 29 23:17:17 2010
From: click.xu at gmail.com (click xu)
Date: Tue, 30 Mar 2010 11:17:17 +0800
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
Message-ID:

Hi,
I have run into a problem when using the Clustalw module.
Here is the error message:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht -output=gcg
-matrix=BLOSUM -ktuple=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | cannot find the file or path
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/lf/data/BioPerl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:756
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515
STACK: test.txt:45
-----------------------------------------------------------
The test program is described as below:
-----------------------------------------------------------
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
# @seq_array is an array of Bio::Seq objects
$aln = $factory->align(\@seq_array);
-----------------------------------------------------------
The work path of clustalw2 has been configured:
export CLUSTALDIR=/usr/local/bin/clustalw2
So, what may be reason of the error?
Thanks!

From Russell.Smithies at agresearch.co.nz Mon Mar 29 23:25:03 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 30 Mar 2010 16:25:03 +1300
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>

Do you have enough temp space?
Will clustalw run 'manually' with your parameters from the command line?

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of click xu
> Sent: Tuesday, 30 March 2010 4:17 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
>
> Hi,
> I meet a problem when using Clustalw module.
> Here is the error message:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: ClustalW call ( align
-infile=/tmp/AeyAfdxGvH/YpcPbyhYht
> -output=gcg -matrix=BLOSUM -ktuple=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 |
> cannot find the file or path
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl-1.6.1/Bio/Root/Root.pm:368
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:756
> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515
> STACK: test.txt:45
> -----------------------------------------------------------
> The test program is described as below:
> -----------------------------------------------------------
> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> # @seq_array is an array of Bio::Seq objects
> $aln = $factory->align(\@seq_array);
> -----------------------------------------------------------
> The work path of clustalw2 has been configured:
> export CLUSTALDIR=/usr/local/bin/clustalw2
> So, what may be reason of the error?
> Thanks!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From click.xu at gmail.com Tue Mar 30 00:03:49 2010
From: click.xu at gmail.com (click xu)
Date: Tue, 30 Mar 2010 12:03:49 +0800
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>
Message-ID:

Russell,
Clustalw2 runs correctly from the command line, and there is enough /tmp space too.

2010/3/30 Smithies, Russell :
> Do you have enough temp space?
> Will clustalw run 'manually' with your parameters from the command line?
>
> --Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of click xu
>> Sent: Tuesday, 30 March 2010 4:17 p.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
>>
>> Hi,
>> I meet a problem when using Clustalw module.
>> Here is the error message:
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht
>> -output=gcg
-matrix=BLOSUM -ktuple=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 |
>> cannot find the file or path
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl-1.6.1/Bio/Root/Root.pm:368
>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:756
>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515
>> STACK: test.txt:45
>> -----------------------------------------------------------
>> The test program is described as below:
>> -----------------------------------------------------------
>> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
>> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
>> # @seq_array is an array of Bio::Seq objects
>> $aln = $factory->align(\@seq_array);
>> -----------------------------------------------------------
>> The work path of clustalw2 has been configured:
>> export CLUSTALDIR=/usr/local/bin/clustalw2
>> So, what may be reason of the error?
>> Thanks!
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From martin.senger at gmail.com Tue Mar 30 04:18:30 2010
From: martin.senger at gmail.com (Martin Senger)
Date: Tue, 30 Mar 2010 09:18:30 +0100
Subject: [Bioperl-l] about biblio
In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr>
References: <4BB0ECF1.6050308@jouy.inra.fr>
Message-ID: <4d93f07c1003300118q1c7b0551w4aa25a2a97fc35be@mail.gmail.com>

Here is the answer sent by Mr Hamish McWilliam from EBI (where the MEDLINE server is running):

> The difference is OpenBQS adds a wildcard when it builds the SRS query:
>
> - [medline-AllText:actb*] gives 228 entries
> - [medline-AllText:actb] gives 150 entries
>
> Performing the same query at PubMed (http://www.ncbi.nlm.nih.gov/pubmed/)
> gives similar answers:
>
> - "actb*" gives 255 entries
> - "actb" gives 165 entries
>
> The remaining differences are probably due to slight differences in the
> PubMed data at NCBI and the exported MEDLINE data.
>

Cheers,
Martin

--
Martin Senger
email: martin.senger at gmail.com,martin.senger at kaust.edu.sa
skype: martinsenger

From cjfields at illinois.edu Tue Mar 30 08:42:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Mar 2010 07:42:24 -0500
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>
Message-ID: <863E31F9-072B-4681-94C5-D2C8BEA82021@illinois.edu>

You may need to submit this as a bug. I got clustalw2 working fairly recently, but it's possible some other API change is breaking things.

chris

On Mar 29, 2010, at 11:03 PM, click xu wrote:

> Russell
> Clustalw2 can correctly run in command line, and the /tmp space is enough too.
>
>
> 2010/3/30 Smithies, Russell :
>> Do you have enough temp space?
>> Will clustalw run 'manually' with your parameters from the command line?
>> >> --Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of click xu >>> Sent: Tuesday, 30 March 2010 4:17 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw >>> >>> Hi, >>> I meet a problem when using Clustalw module. >>> Here is the error message: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht >>> -output=gcg -matrix=BLOSUM -ktup >>> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >>> cannot find the file or path >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >>> 1.6.1/Bio/Root/Root.pm:368 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >>> nment/Clustalw.pm:756 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::align >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >>> gnment/Clustalw.pm:515 >>> STACK: test.txt:45 >>> ----------------------------------------------------------- >>> The test program is described as below: >>> ----------------------------------------------------------- >>> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >>> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >>> # @seq_array is an array of Bio::Seq objects >>> $aln = $factory->align(\@seq_array); >>> ----------------------------------------------------------- >>> The work path of clustalw2 has been configured: >>> export CLUSTALDIR=/usr/local/bin/clustalw2 >>> So, what may be reason of the error? >>> Thanks! 
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Tue Mar 30 16:10:09 2010 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 30 Mar 2010 22:10:09 +0200 Subject: [Bioperl-l] AlignIO formats Message-ID: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com> Hi, Using GuessSeqFormat and AlignIO, I stumbled on some issues and am now wondering if the defined formats are actually OK. Esp. related to pfam, selex, stockholm formats it seems: pfam here is like selex without any comment lines, but with the /start-end after the seq id like myseq/1-111. The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam) actually defines Pfam and Stockholm to be the same formats. This makes me wonder: is the Pfam format actually defined as Selex or Stockholm? Within BioPerl it is like Selex. In addition, Selex (as used in HMMER 2.3.2) contains comment lines like #=AC, #=RF or #=ID. GuessSeq format uses this to detect Selex, however, they do not have to be present. 
GuessSeqFormat uses:

    return (($lineno == 1 && $line =~ /^#=ID /) ||
            ($lineno == 2 && $line =~ /^#=AC /) ||
            ($line =~ /^#=SQ /));

to detect the Selex format. At the same time, the Selex reader does not seem to get the aln id or accession:

    if( $entry =~ /^\#=GS\s+(\S+)\s+AC\s+(\S+)/ ) {
        $accession{ $1 } = $2;

Also a Selex file like:

seq1 ACGACGACGACG.
seq2 ..GGGAAAGG.GA
seq3 UUU..AAAUUU.A

is guessed to be phylip (whereas the seq1/1-11 format will be guessed as pfam).

I am not sure if the above is desired behaviour, though all sequences are read into the alignment object correctly. I was wondering whether all Selex variations could be guessed as Selex, not as phylip, pfam or selex (though in the selex case we can have more alignments in one file).

Regards,
Bernd

From p.j.a.cock at googlemail.com Tue Mar 30 17:12:46 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Mar 2010 22:12:46 +0100
Subject: [Bioperl-l] AlignIO formats
In-Reply-To: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
References: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
Message-ID: <320fb6e01003301412s6c90220el7a95bdc97dee03e6@mail.gmail.com>

On Tue, Mar 30, 2010 at 9:10 PM, Bernd Web wrote:
> Hi,
>
> Using GuessSeqFormat and AlignIO, I stumbled on some issues and
> am now wondering if the defined formats are actually OK. Esp. related to
> pfam, selex, stockholm formats it seems:
>
> pfam here is like selex without any comment lines, but with the
> /start-end after the seq id like myseq/1-111.
> The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam)
> actually defines Pfam and Stockholm to be the same formats. This makes
> me wonder: is the Pfam format actually defined as Selex or Stockholm?
> Within BioPerl it is like Selex.

I (and therefore the Biopython documentation) also think PFAM and Stockholm alignments are basically the same thing. The BioPerl wiki seems to agree with this interpretation too.
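
To make the distinction concrete, here is a rough Perl sketch of that kind of format guess. It is not the GuessSeqFormat code and not a BioPerl API: the function name guess_alignment_format, the labels it returns, and the restriction to the annotation tags mentioned in this thread (#=ID, #=AC, #=RF, #=SQ) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): guess an alignment format
# from the text of a file, using only the clues discussed in this thread.
sub guess_alignment_format {
    my ($text) = @_;

    # Ignore blank lines so leading whitespace does not hide a header.
    my @lines = grep { /\S/ } split /\n/, $text;
    return 'unknown' unless @lines;

    # A fixed first-line header marks a Stockholm file.
    return 'stockholm' if $lines[0] =~ /^#\s*STOCKHOLM\s+1\.0/;

    # Selex annotation lines are optional, but decisive when present.
    # Assumption: only the tags named in the thread are checked here.
    my $selex_tags = grep { /^#=(?:ID|AC|RF|SQ)\b/ } @lines;
    return 'selex' if $selex_tags;

    # "pfam" in the BioPerl sense: every data line is name/start-end + sequence.
    my @data = grep { !/^#/ } @lines;
    my $not_ranged = grep { !m{^\S+/\d+-\d+\s+\S+$} } @data;
    return 'pfam' if @data && !$not_ranged;

    # Bare "name sequence" lines fit several formats, so do not guess.
    return 'ambiguous';
}

print guess_alignment_format("seq1/1-4 ACGU\nseq2/1-4 AC.U\n"), "\n";
```

A file of bare "name sequence" lines is deliberately reported as ambiguous rather than forced into selex or phylip, since nothing in the text itself tells those apart.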
Looking at the HMMER2 examples, Selex is different but the comment style is similar. The obvious thing to check is the presence or absence of the "# STOCKHOLM 1.0" header if trying to tell them apart. See also: http://en.wikipedia.org/wiki/Stockholm_format and http://www.bioperl.org/wiki/Stockholm_multiple_alignment_format http://www.bioperl.org/wiki/SELEX_multiple_alignment_format Peter From jun.yin at ucd.ie Tue Mar 30 18:37:07 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Tue, 30 Mar 2010 23:37:07 +0100 Subject: [Bioperl-l] summer code project on Bioperl Message-ID: <7160acc75f99.4bb28b23@ucd.ie> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CV_JunYin.doc Type: application/msword Size: 27648 bytes Desc: not available URL: From ross at cuhk.edu.hk Wed Mar 31 17:28:59 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Thu, 1 Apr 2010 05:28:59 +0800 Subject: [Bioperl-l] BlastPlus usage inquiry In-Reply-To: References: Message-ID: <014401cad119$2d1467a0$873d36e0$@edu.hk> Dear all, I know it is inappropriate to raise this question in bioperl but as I received no better response from NCBI and so have to ask in this group (because finally I'll use bioperl to call blastplus). I have already been using the latest blastplus (the command is blastn directly) and found the problem of running slow and inability to run in a parallel/multithread manner. Previously I was using non blastplus version 2.2.22 with the command blastall -p blastn -a 8 etc. With similar arguments as below except the word size was 12, my shell script for the same input and database finishes almost instantly. I notice that except word size and min raw gapped score were changed by me, nothing appears to differ from the previous version parameters. Moreover, when I top my process, I find it uses only one CPU instead of 7. What may be the problem for the script that makes the job running for a day and still hasn't finished? 
blastn -query $1 -db $2 -out $1_$2.xml -num_threads 7 -word_size 4 -gapopen 3 -gapextend 1 -penalty -2 -outfmt 5 -xdrop_ungap 30 -xdrop_gap 30 -xdrop_gap_final 30 -min_raw_gapped_score 10

From anil_m_lal at yahoo.com Tue Mar 30 14:24:34 2010
From: anil_m_lal at yahoo.com (Anil Lal)
Date: Tue, 30 Mar 2010 11:24:34 -0700 (PDT)
Subject: [Bioperl-l] GSoC 2010
Message-ID: <717794.59615.qm@web37507.mail.mud.yahoo.com>

Hello,

I am a mid-career software programmer now transitioning into bioinformatics. I have always had a great interest in bioinformatics and only now am able to make a move to take classes. I am currently enrolled in University of Santa Cruz extension classes.

I am very interested in GSoC 2010 and have identified these two potential projects: "Lightweight Sequence objects and Lazy Parsing", mentored by Chris Fields, and "Perl Run Wrappers for External Programs in a Flash", mentored by Mark Jenson.

Please let me know if these projects are still available. If yes, I will send in my application with more details.

Thanks a lot for your help. It would be exciting to work on BioPerl and contribute.

Anil

From schae234 at gmail.com Tue Mar 30 12:33:42 2010
From: schae234 at gmail.com (Robert Schaefer)
Date: Tue, 30 Mar 2010 10:33:42 -0600
Subject: [Bioperl-l] Google Summer of Code
Message-ID: <60c593881003300933p46c7c295k69a21ee986ef5777@mail.gmail.com>

Hello,

I am looking for more information on your mentorship program for Google's SoC. Who should I contact for more information and to ask questions?

Thank you,
Rob Schaefer

From forrest_zhang at 163.com Mon Mar 1 00:10:31 2010
From: forrest_zhang at 163.com (forrest)
Date: Mon, 01 Mar 2010 13:10:31 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <4B8B4C47.108@163.com>

Hi all,

When I use threads to get Genbank format file, show some error. It is shown as:

"Can't call method "get_taxon" on unblessed reference at /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."
========================================= #!/usr/bin/perl -w use strict; use Bio::SeqIO; use Bio::Seq; use Bio::DB::GenBank; use threads; my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498"); my $seq_out = Bio::SeqIO->new(-format => "genbank", -file => ">dhn_all.gb"); my @seq; my $number = @id; my $max_threads = 6; for (my $thread_number=0;$thread_number<$number;){ my %threads_seq_hash; if ($number - $thread_number > $max_threads){ for (my $thread=0;$thread<$max_threads;){ $threads_seq_hash{$thread} = threads->new(sub { my $gb = Bio::DB::GenBank->new; my $seq = $gb->get_Seq_by_acc($id[$thread_number]); }); $thread_number++; $thread++; } }else{ my $else_number = $number % $max_threads; for (my $thread=0;$thread<$else_number;){ $threads_seq_hash{$thread} = threads->new(sub { my $gb = Bio::DB::GenBank->new; my $seq = $gb->get_Seq_by_acc($id[$thread_number]); }); $thread_number++; $thread++; } } foreach my $thread (sort keys %threads_seq_hash){ my ($seq) = $threads_seq_hash{$thread}->join; push (@seq,$seq); } } foreach (@seq){ $seq_out->write_seq($_); } ========================================= How can I fix this error? Thanks. Zhang Tao From cjfields at illinois.edu Mon Mar 1 15:37:18 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 01 Mar 2010 14:37:18 -0600 Subject: [Bioperl-l] use threads to get seq file error. In-Reply-To: <4B8B4C47.108@163.com> References: <4B8B4C47.108@163.com> Message-ID: <1267475838.16248.8.camel@pyrimidine.igb.uiuc.edu> I get much nastier ones than that; a small taste: --------------------- WARNING --------------------- MSG: exception while parsing location line [1..680] in reading EMBL/GenBank/SwissProt, ignoring feature source (seqid=AF031249): Eval-group not allowed at runtime, use re 'eval' in regex m/(.*?)\(((?x-ism: (?> [^()]+ | \( (??{.../ at /home/cjfields/bioperl/live/Bio/Factory/FTLocationFactory.pm line 161, line 36. 
--------------------------------------------------- Thread 2 terminated abnormally: Can't call method "primary_tag" on an undefined value at /home/cjfields/bioperl/live/Bio/SeqIO/genbank.pm line 662, line 36. Could you report this as a bug? chris On Mon, 2010-03-01 at 13:10 +0800, forrest wrote: > Hi all, > > When I use threads to get Genbank format file, show some error. It is > shown as: > > "Can't call method "get_taxon" on unblessed reference at > /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671." > > ========================================= > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > use Bio::Seq; > use Bio::DB::GenBank; > use threads; > > > my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498"); > > > my $seq_out = Bio::SeqIO->new(-format => "genbank", > -file => ">dhn_all.gb"); > my @seq; > > my $number = @id; > > my $max_threads = 6; > > for (my $thread_number=0;$thread_number<$number;){ > my %threads_seq_hash; > > if ($number - $thread_number > $max_threads){ > for (my $thread=0;$thread<$max_threads;){ > $threads_seq_hash{$thread} = threads->new(sub { > my $gb = Bio::DB::GenBank->new; > my $seq = $gb->get_Seq_by_acc($id[$thread_number]); > }); > $thread_number++; > $thread++; > > } > }else{ > my $else_number = $number % $max_threads; > for (my $thread=0;$thread<$else_number;){ > $threads_seq_hash{$thread} = threads->new(sub { > my $gb = Bio::DB::GenBank->new; > my $seq = $gb->get_Seq_by_acc($id[$thread_number]); > }); > $thread_number++; > $thread++; > > } > > > } > > foreach my $thread (sort keys %threads_seq_hash){ > my ($seq) = $threads_seq_hash{$thread}->join; > push (@seq,$seq); > } > } > > foreach (@seq){ > $seq_out->write_seq($_); > } > ========================================= > > > How can I fix this error? > Thanks. 
>
>
> Zhang Tao
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From paolo.pavan at gmail.com Mon Mar 1 18:07:33 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 00:07:33 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
Message-ID: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>

Dear all,
Sorry for pushing up my post but, please does anyone have an hint for me?
Maybe have I to send attached the report to the mailing list? I don't
know attachment policies of the list, if it is allowed and is needed I
can do that.

Thank you,
Paolo

2010/2/26 Paolo Pavan :
> Sorry,
> Maybe I forgot to add this is the megablast -m 5 output.
>
> Thank you again,
> Paolo
>
> 2010/2/26 Paolo Pavan :
>> Hi all,
>> I have just a brief question: I've got some megablast reports such the
>> one I've pasted below.
>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>> entire alignment represented as a Bio::SimpleAlign object or
>> Bio::Align::AlignI implementing one?
>>
>> Thank you all,
>> Paolo
>>
>>
>> MEGABLAST 2.2.16 [Mar-25-2007]
>>
>>
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>> "A greedy algorithm for aligning DNA sequences",
>> J Comput Biol 2000; 7(1-2):203-14.
>>
>> Database: 00038-00053.fasta
>>            2 sequences; 2001 total letters
>>
>> Searching..................................................done
>>
>> Query= 00038-00053
>>          (802 letters)
>>
>>
>>
>>                                                                  Score    E
>> Sequences producing significant alignments:                      (bits) Value
>>
>> ______00038                                                        226   1e-62
>> ______00053                                                        115   3e-29
>>
>> 1_0         472  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>> ______00038 883  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         532  aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>> ______00038 943  aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         591  attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         651  gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         711  acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>
>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>> ______00038 1000 -------------------------------- 1001
>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>   Database: 00038-00053.fasta
>>     Posted date:  Feb 25, 2010  4:47 PM
>>   Number of letters in database: 2001
>>   Number of sequences in database:  2
>>
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>> Gapped
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>>
>> Matrix: blastn matrix:1 -3
>> Gap Penalties: Existence: 0, Extension: 0
>> Number of Sequences: 2
>> Number of Hits to DB: 17
>> Number of extensions: 3
>> Number of successful extensions: 3
>> Number of sequences better than 10.0: 2
>> Number of HSP's gapped: 2
>> Number of HSP's successfully gapped: 2
>> Length of query: 802
>> Length of database: 2001
>> Length adjustment: 10
>> Effective length of query: 792
>> Effective length of database: 1981
>> Effective search space:  1568952
>> Effective search space used:  1568952
>> X1: 9 (17.8 bits)
>> X2: 20 (39.6 bits)
>> X3: 51 (101.1 bits)
>> S1: 9 (18.3 bits)
>> S2: 9 (18.3 bits)
>>
>

From cjfields at illinois.edu Mon Mar 1 19:30:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Mar 2010 18:30:43 -0600
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID:

Paolo,

You can get a Bio::SimpleAlign from the HSP object. The first code example in this section in the HOWTO demonstrates this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris

On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:

> Dear all,
> Sorry for pushing up my post but, please does anyone have an hint for me?
> Maybe have I to send attached the report to the mailing list? I don't
> know attachment policies of the list, if it is allowed and is needed I
> can do that.
>
> Thank you,
> Paolo
>
> 2010/2/26 Paolo Pavan :
>> Sorry,
>> Maybe I forgot to add this is the megablast -m 5 output.
>>
>> Thank you again,
>> Paolo
>>
>> 2010/2/26 Paolo Pavan :
>>> Hi all,
>>> I have just a brief question: I've got some megablast reports such the
>>> one I've pasted below.
>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>> entire alignment represented as a Bio::SimpleAlign object or >>> Bio::Align::AlignI implementing one? >>> >>> Thank you all, >>> Paolo >>> >>> >>> MEGABLAST 2.2.16 [Mar-25-2007] >>> >>> >>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), >>> "A greedy algorithm for aligning DNA sequences", >>> J Comput Biol 2000; 7(1-2):203-14. >>> >>> Database: 00038-00053.fasta >>> 2 sequences; 2001 total letters >>> >>> Searching..................................................done >>> >>> Query= 00038-00053 >>> (802 letters) >>> >>> >>> >>> Score E >>> Sequences producing significant alignments: (bits) Value >>> >>> ______00038 >>> 226 1e-62 >>> ______00053 >>> 115 3e-29 >>> >>> 1_0 472 >>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>> ______00038 883 >>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 532 >>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>> ______00038 943 >>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 591 >>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>> ______00038 1000 >>> ------------------------------------------------------------ 1001 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 651 >>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>> ______00038 1000 >>> ------------------------------------------------------------ 1001 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 711 >>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>> ______00038 1000 >>> 
------------------------------------------------------------ 1001 >>> ______00053 1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>> >>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>> ______00038 1000 -------------------------------- 1001 >>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>> Database: 00038-00053.fasta >>> Posted date: Feb 25, 2010 4:47 PM >>> Number of letters in database: 2001 >>> Number of sequences in database: 2 >>> >>> Lambda K H >>> 1.37 0.711 1.31 >>> >>> Gapped >>> Lambda K H >>> 1.37 0.711 1.31 >>> >>> >>> Matrix: blastn matrix:1 -3 >>> Gap Penalties: Existence: 0, Extension: 0 >>> Number of Sequences: 2 >>> Number of Hits to DB: 17 >>> Number of extensions: 3 >>> Number of successful extensions: 3 >>> Number of sequences better than 10.0: 2 >>> Number of HSP's gapped: 2 >>> Number of HSP's successfully gapped: 2 >>> Length of query: 802 >>> Length of database: 2001 >>> Length adjustment: 10 >>> Effective length of query: 792 >>> Effective length of database: 1981 >>> Effective search space: 1568952 >>> Effective search space used: 1568952 >>> X1: 9 (17.8 bits) >>> X2: 20 (39.6 bits) >>> X3: 51 (101.1 bits) >>> S1: 9 (18.3 bits) >>> S2: 9 (18.3 bits) >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Mar 1 20:51:02 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 01 Mar 2010 17:51:02 -0800 Subject: [Bioperl-l] Any module for chromosome region analysis ? In-Reply-To: References: <1267131590.4355.2.camel@epistle> <1267131697.4355.3.camel@epistle> Message-ID: <4B8C6F06.5050905@bioperl.org> Like the ensembl perl API? 
Robert Bradbury wrote: > I'm not sure if the species being dealt with are "common", but it would seem > to me that a logical addition to bioperl would be an extension that took a > genome location (or locations) and interfaced one into a browser of those > regions in external databases (e.g. UCSC Genome Browser, Ensemble, etc.). > The only cases where that wouldn't work is if one is dealing with novel > species that aren't in the databases yet. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Tue Mar 2 01:21:31 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 01 Mar 2010 22:21:31 -0800 Subject: [Bioperl-l] call for project ideas - Google Summer of Code Message-ID: <4B8CAE6B.4010807@cornell.edu> Hi all, Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due next week). We need project ideas for prospective Summer of Code interns. There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects. For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs One of the summer intern ideas I have on the page so far is to help with the tough grunt work of breaking BioPerl into smaller, more easily managed distributions. I'm sure you all can think of plenty more! 
Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From paolo.pavan at gmail.com Tue Mar 2 09:37:59 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 15:37:59 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To:
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>

Hi Chris,

Thank you for your reply. So, as I understand it, since the get_aln method returns only the HSP alignment, there is no way to retrieve the whole alignment as in the example I pasted, is there?

Basically, I'm trying to use megablast as a kind of multiple local alignment engine. I'm not really sure this is a good idea, but it could be suitable in my particular case. I mean that the example below reports only the portions of the sequences that align, losing the portions that do not; I'm not sure I got the idea across. What do you think? Can you give me your opinion?

If no module has been written yet, I can try to write a parser; would that be of any interest?

Thank you,
Paolo

2010/3/2 Chris Fields :
> Paolo,
>
> You can get a Bio::SimpleAlign from the HSP object.  The first code example in this section in the HOWTO demonstrates this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
> chris
>
> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>
>> Dear all,
>> Sorry for pushing up my post but, please does anyone have an hint for me?
>> Maybe have I to send attached the report to the mailing list? I don't
>> know attachment policies of the list, if it is allowed and is needed I
>> can do that.
>>
>> Thank you,
>> Paolo
>>
>> 2010/2/26 Paolo Pavan :
>>> Sorry,
>>> Maybe I forgot to add this is the megablast -m 5 output.
>>>
>>> Thank you again,
>>> Paolo
>>>
>>> 2010/2/26 Paolo Pavan :
>>>> Hi all,
>>>> I have just a brief question: I've got some megablast reports such the
>>>> one I've pasted below.
>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>>> entire alignment represented as a Bio::SimpleAlign object or
>>>> Bio::Align::AlignI implementing one?
>>>>
>>>> Thank you all,
>>>> Paolo
>>>>
>>>>
>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>
>>>>
>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>>>> "A greedy algorithm for aligning DNA sequences",
>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>
>>>> Database: 00038-00053.fasta
>>>>            2 sequences; 2001 total letters
>>>>
>>>> Searching..................................................done
>>>>
>>>> Query= 00038-00053
>>>>          (802 letters)
>>>>
>>>>
>>>>
>>>>                                                                  Score    E
>>>> Sequences producing significant alignments:                      (bits) Value
>>>>
>>>> ______00038
>>>> 226   1e-62
>>>> ______00053
>>>> 115   3e-29
>>>>
>>>> 1_0         472
>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>> ______00038 883
>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         532
>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>> ______00038 943
>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         591
>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>> ______00038 1000
>>>> ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         651
>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>> ______00038 1000
>>>> ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         711
>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>> ______00038 1000
>>>> ------------------------------------------------------------ 1001
>>>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>
>>>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>>>> ______00038 1000 -------------------------------- 1001
>>>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>>>   Database: 00038-00053.fasta
>>>>     Posted date:  Feb 25, 2010  4:47 PM
>>>>   Number of letters in database: 2001
>>>>   Number of sequences in database:  2
>>>>
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>> Gapped
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>>
>>>> Matrix: blastn matrix:1 -3
>>>> Gap Penalties: Existence: 0, Extension: 0
>>>> Number of Sequences: 2
>>>> Number of Hits to DB: 17
>>>> Number of extensions: 3
>>>> Number of successful extensions: 3
>>>> Number of sequences better than 10.0: 2
>>>> Number of HSP's gapped: 2
>>>> Number of HSP's successfully gapped: 2
>>>> Length of query: 802
>>>> Length of database: 2001
>>>> Length adjustment: 10
>>>> Effective length of query: 792
>>>> Effective length of database: 1981
>>>> Effective search space:  1568952
>>>> Effective search space used:  1568952
>>>> X1: 9 (17.8 bits)
>>>> X2: 20 (39.6 bits)
>>>> X3: 51 (101.1 bits)
>>>> S1: 9 (18.3 bits)
>>>> S2: 9 (18.3 bits)
>>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From Zhang_tao at uestc.edu.cn Mon Mar 1 00:02:12 2010
From: Zhang_tao at uestc.edu.cn (Zhang_tao)
Date: Mon, 01 Mar 2010 13:02:12 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <467416916.06375@eyou.net>

Hi all,

When I use threads to fetch GenBank-format files, I get an error. It is shown as:

"Can't call method "get_taxon" on unblessed reference at /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;

my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");
my $seq_out = Bio::SeqIO->new(-format => "genbank", -file => ">dhn_all.gb");
my @seq;
my $number = @id;
my $max_threads = 6;

for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;
    if ($number - $thread_number > $max_threads){
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }else{
        my $else_number = $number % $max_threads;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }
    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}
foreach (@seq){
    $seq_out->write_seq($_);
}

How can I fix this error?

Thanks.

Zhang Tao

From lpritc at scri.ac.uk Mon Mar 1 06:32:10 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 01 Mar 2010 11:32:10 +0000
Subject: [Bioperl-l] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
Message-ID:

Hi,

I've tried going back through the mailing list, Googling the answer, and reading the documentation and wiki to find a solution for this. I've either missed it, or it's not there yet. Hopefully there's a simple solution, or an option that I'm just not seeing. I'm sure other people must be using CHADO for bacterial genomes, and I would be interested in hearing about best practice for using CHADO/GBROWSE with these sequences (I've seen http://gmod.org/wiki/Chado_for_prokaryotes - but there's not much in there...).
I have a working CHADO(GMOD-1.0)/GBROWSE2/BioPerl 1.6.1 setup on CentOS 5.4, and I'm trying to load some bacterial data. Specifically for this example, I'm trying to get the GenBank sequences for E.coli S88: NC_011742 and NC_011747 into CHADO. I've been following instructions from a number of locations, including http://gmod.org/wiki/Artemis-Chado_Integration_Tutorial and http://gmod.org/wiki/Chado_Tutorial, but there's an issue with these two files, in that the NC_011742 (chromosome) and NC_011747 (plasmid) sequences contain genes that have the same names (and several genes with the same name in the same sequence!), and this appears to be a problem. Here's what's going wrong:

I start off with the two GenBank files:

"""
[lpritc at localhost ~]$ ls -1 *.gbk
NC_011742.gbk
NC_011747.gbk
"""

And convert these to .gff3 using the BioPerl script (it doesn't seem to matter whether I pass them with the wildcard, or convert separately, though passing multiple sequences for conversion might be a good place to check for unique IDs):

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s *.gbk
# Input: NC_011742.gbk
# working on region:NC_011742, Escherichia coli S88, 19-DEC-2008, Escherichia coli S88, complete genome.
# GFF3 saved to ./NC_011742.gbk.gff
# Summary:
# Feature    Count
# -------    -----
# mRNA  4696
# gene  4898
# region  1
# pseudogene  151
# CDS  4696
# RESIDUES(tr)  1442813
# RESIDUES  5032268
# processed_transcript  89
# rRNA  22
# pseudogenic_region  151
# exon  4899
# tRNA  91
#
# Input: NC_011747.gbk
# working on region:NC_011747, Escherichia coli S88, 18-AUG-2009, Escherichia coli S88 plasmid pECOS88, complete sequence.
# GFF3 saved to ./NC_011747.gbk.gff
# Summary:
# Feature    Count
# -------    -----
# mRNA  4832
# gene  5037
# region  2
# pseudogene  159
# CDS  4832
# RESIDUES(tr)  1477756
# RESIDUES  5166121
# processed_transcript  92
# rRNA  22
# pseudogenic_region  159
# exon  5038
# tRNA  91
#
"""

I can then use the gmod_bulk_load_gff3.pl script to load either file, but only singly.
This appears to work, and the result is visible and seemingly correctly navigable in GBROWSE (using NC_011747 as the first sequence here, but the order is unimportant):

"""
[lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011747.gbk.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Dropping cds temp tables...
Creating cds temp tables...
NOTICE:  CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler"
NOTICE:  CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
Loading data into feature table ...
Loading data into featureloc table ...
Loading data into feature_relationship table ...
Loading data into featureprop table ...
Skipping feature_cvterm table since the load file is empty...
Skipping synonym table since the load file is empty...
Skipping feature_synonym table since the load file is empty...
Skipping dbxref table since the load file is empty...
Loading data into feature_dbxref table ...
Skipping analysisfeature table since the load file is empty...
Skipping cvterm table since the load file is empty...
Skipping db table since the load file is empty...
Skipping cv table since the load file is empty...
Skipping analysis table since the load file is empty...
Skipping organism table since the load file is empty...
Adding cvtermprop=MapReferenceType for 'region' ...
Loading sequences (if any) ...
Optimizing database (this may take a while) ...
(feature featureloc feature_relationship featureprop feature_cvterm synonym feature_synonym dbxref feature_dbxref analysisfeature cvterm db cv analysis organism ) Done.

While this script has made an effort to optimize the database, you should probably also run VACUUM FULL ANALYZE on the database as well
"""

"""
chado=> SELECT feature_id, organism_id, name, uniquename FROM feature WHERE name='NC_011747';
 feature_id | organism_id |   name    | uniquename
------------+-------------+-----------+------------
     146917 |          99 | NC_011747 | NC_011747
"""

However, attempting to load in the second sequence throws an error (though this might also be a good point to check for ID uniqueness with a database check, and appropriate modification to the ID, if necessary - problems could arise if we were trying to add genuine duplicates, though...):

"""
[lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011742.gbk.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Dropping cds temp tables...
Creating cds temp tables...
NOTICE:  CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler"
NOTICE:  CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
no parent yacC; you probably need to rerun the loader with the --recreate_cache option
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=chado;port=5432;host=localhost.
"""

This, of course, prevents the upload of the sequence and its annotations, as a whole. The script recommends that the --recreate_cache option should be used, but I am already using it.

If the same process is run, reversing the order of the input files, the same error is reported, but for the gene with name 'int'. Both sequences contain genes with the names 'int' and 'yacC' (NC_011742 appears to contain four genes with the name 'int'):

"""
[lpritc at localhost ~]$ grep 'ID=yacC;' *.gbk.gff
NC_011742.gbk.gff:NC_011742 GenBank gene 142755 143273 . - . ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131
NC_011747.gbk.gff:NC_011747 GenBank gene 85083 85931 . + . ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103
[lpritc at localhost ~]$ grep 'ID=int;' *.gbk.gff
NC_011742.gbk.gff:NC_011742 GenBank gene 1182443 1183585 . - . ID=int;Dbxref=GeneID:7131611;gene=int;locus_tag=ECS88_1152
NC_011742.gbk.gff:NC_011742 GenBank pseudogene 1998684 1999646 . + . ID=int;Dbxref=GeneID:7128964;gene=int;locus_tag=ECS88_2031;pseudo=_no_value
NC_011742.gbk.gff:NC_011742 GenBank gene 2829972 2830991 . + . ID=int;Dbxref=GeneID:7131911;gene=int;locus_tag=ECS88_2851
NC_011742.gbk.gff:NC_011742 GenBank gene 3220074 3221336 . + . ID=int;Dbxref=GeneID:7129893;gene=int;locus_tag=ECS88_3250
NC_011747.gbk.gff:NC_011747 GenBank gene 132 872 . + . ID=int;Dbxref=GeneID:7119360;gene=int;locus_tag=pECS88_0001
"""

Commenting out either of these genes, and their child features, defers the error to another gene that has the same name in both sequences in each case.

It seems that the problem might derive from attempting to associate each gene uniquely with its 'gene' tag in the GenBank file. As there are several points in the process where it would be sensible to check for name collisions, so that the feature:uniquename column can be modified to reflect this, I looked for command-line options to each script, but didn't see one that could help. Examining the manual for gmod_bulk_load_gff3.pl suggests that this might be the problem (though I might be misunderstanding it):

"""
Column 9 (group)
Here is where the magic happens.

Assigning feature.name, feature.uniquename
The values of feature.name and feature.uniquename are assigned according to these simple rules:

If there is an ID tag, that is used as feature.uniquename otherwise, it is assigned a uniquename that is equal to 'auto' concatenated with the feature_id. (Note that this is a potential problem as there is no check to make sure that it is appropriately unique.)

If there is a Name tag, it's value is set to feature.name; otherwise it is null.

Note that these rules are much more simple than that those that Bio::DB::GFF uses, and may need to be revisited.
"""

I suspect that, as the bp_genbank2gff3.pl script converts gene names (which are not guaranteed to be unique) to ID tags, the problem recognised in the manual is cropping up at this point. Luckily, the GenBank files come with locus_tag tags, which should be unique for each gene (see http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html#locus_tag).
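As a minimal sketch of how such collisions could be spotted before loading, one could count the ID values on gene and pseudogene rows of the converted files. This is an illustration only: find_duplicate_ids is a hypothetical helper, not part of bp_genbank2gff3.pl or gmod_bulk_load_gff3.pl, and it assumes tab-delimited GFF3 with the ID= attribute layout shown in the grep output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or GMOD): given GFF3 lines,
# return a hash ref mapping each ID that occurs on more than one
# gene/pseudogene row to the number of rows that use it.
sub find_duplicate_ids {
    my @lines = @_;
    # Assumption: only gene-level rows are checked, because multi-line
    # features such as a split CDS may legitimately repeat an ID.
    my %gene_like = map { $_ => 1 } qw(gene pseudogene);
    my %count;
    for my $line (@lines) {
        chomp(my $row = $line);
        next if $row =~ /^\s*#/ || $row !~ /\S/;   # skip comments, directives, blank lines
        my @col = split /\t/, $row;
        next unless @col >= 9 && $gene_like{ $col[2] };
        my ($id) = $col[8] =~ /(?:^|;)ID=([^;]+)/; # ID attribute in column 9
        $count{$id}++ if defined $id;
    }
    my %dup = map { $_ => $count{$_} } grep { $count{$_} > 1 } keys %count;
    return \%dup;
}

# Usage sketch: perl check_ids.pl NC_011742.gbk.gff NC_011747.gbk.gff
if (@ARGV) {
    my @lines = <>;
    my $dups  = find_duplicate_ids(@lines);
    printf "%s\t%d\n", $_, $dups->{$_} for sort keys %$dups;
}
```

Run over both .gff files together, this should list names such as 'int' and 'yacC' with their counts; it only reports collisions and does not rename anything.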
For bacteria, at least, using the locus_tag values might be a more robust option for bp_genbank2gff3.pl; this already appears to have been recognised in the script comments:

"""
#?? should gene_name from /locus_tag,/gene,/product,/transposon=xxx
# be converted to or added as Name=xxx (if not ID= or as well)
## problematic: convert_to_name ($feature); # drops /locus_tag,/gene, tags
"""

I can get round the upload problem somewhat suckily by changing the priority given to 'locus_tag' and 'gene' tags for generating the .gff ID tag in the bp_genbank2gff3.pl script:

"""
[lpritc at localhost ~]$ diff bp_genbank2gff3.pl /usr/bin/bp_genbank2gff3.pl
976,977c976,977
< if ($g->has_tag('locus_tag')) {
< ($gene_id) = $g->get_tag_values('locus_tag');
---
> if ($g->has_tag('gene')) {
> ($gene_id) = $g->get_tag_values('gene');
979,980c979,980
< elsif ($g->has_tag('gene')) {
< ($gene_id) = $g->get_tag_values('gene');
---
> elsif ($g->has_tag('locus_tag')) {
> ($gene_id) = $g->get_tag_values('locus_tag');
"""

But this isn't a complete solution, as GBROWSE searches by gene name don't work after making this change, and presumably some further configuration or hacking about is required to sort that out (advice welcome).

So, what are other people doing to overcome this issue (if you've seen it), and would a change to the bp_genbank2gff3.pl script along the lines I mention be useful to others?

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.
______________________________________________________

From janine.arloth at googlemail.com Mon Mar 1 11:25:09 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Mon, 1 Mar 2010 17:25:09 +0100
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>

Hello,

I am running BLAST+ and want to create a BLAST database depending on a checkbox. That is, when mydb is too old I want to rebuild the BLAST database files and create a "new" db. When the latest version of my files is OK, BLAST should run with the existing db.

With this code, a new db is never built: it is created once, and after that it is not created again.

if($checkbox eq 'yes'){
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name => 'mydb',
        -db_data => 'xxx.fa',
        -create => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

Thanks for helping

From jensen at fortinbras.us Mon Mar 1 22:58:09 2010
From: jensen at fortinbras.us (Mark A.
Jensen)
Date: Mon, 1 Mar 2010 22:58:09 -0500
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
Message-ID: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>

Hi Janine--

You'll need to get the latest version of Bio/Tools/Run/StandAloneBlastPlus.pm (rev. 16878). Then the -overwrite parameter will actually work, and you can write

if($checkbox eq 'yes'){
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name => 'mydb',
        -db_data => 'xxx.fa',
        -overwrite => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

MAJ

----- Original Message -----
From: "Janine Arloth"
To:
Cc:
Sent: Monday, March 01, 2010 11:25 AM
Subject: StandAloneBlastPlus

Hello,

I am running blast+ and want to create blastdb, depending on a checkbox. That means when mydb is to old then I want to rebuilt the blastdb files and create a ''new'' db. When the latest versions of my files is ok, then blast should ran with the existing db.

Using this code, there I will never built a new db. It is creating and than it does not create a new one.

if($checkbox eq 'yes'){
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name => 'mydb',
        -db_data => 'xxx.fa',
        -create => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

Thanks for helping

From szy0931 at gmail.com Tue Mar 2 01:08:10 2010
From: szy0931 at gmail.com (Zhenyu Shen)
Date: Mon, 1 Mar 2010 22:08:10 -0800 (PST)
Subject: [Bioperl-l] how to convert a txt file to a bed file?
Message-ID:

I want to convert a txt file to a BED file and then load the BED file into the UCSC Genome Browser. How can I convert the txt file to a BED file with Perl?
Thanks

From joaofadista at gmail.com Tue Mar 2 04:10:03 2010
From: joaofadista at gmail.com (fadista)
Date: Tue, 2 Mar 2010 01:10:03 -0800 (PST)
Subject: [Bioperl-l] Next-gen modules
Message-ID:

Hi,

I would like to know if there are any next-gen sequencing modules in BioPerl. Specifically, I would like to know if there is a Perl script to trim poor-quality sequence reads from the Illumina/Solexa platform.

Best regards,
Fadista

From maj at fortinbras.us Tue Mar 2 09:51:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 09:51:12 -0500
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com><56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com><56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
Message-ID: <18C0182252934619AD12E49243BE3C14@NewLife>

This might be a good method to have for Bio::Search::Tiling-- you want to stitch together all the hsps and have the concatenated alignment returned as a Bio::SimpleAlign, correct? Tiling would create the right set of hsps from which to generate the composite alignment. I can try to get something working, but it may take a while-

MAJ

----- Original Message -----
From: "Paolo Pavan"
To: "Chris Fields"
Cc:
Sent: Tuesday, March 02, 2010 9:37 AM
Subject: Re: [Bioperl-l] Alignment from blast report

Hi Chris,

Thank you for your reply. So I have to understand that since the get_aln method returns the HSP alignment, there is no way to retrieve the whole alignment as in the example pasted, isn't it? Basically I'm trying to use megablast as kind of multiple local alignment engine and actually I'm not pretty sure this is a good idea but in my particular case could be suitable.
I mean that the example below reports only the portions of the sequences that align loosing the portions that does not, I'm not sure I gave the idea. What do you think about? Can you give me your opinion? If there isn't any module written yet, I can try to write a parser, it could be of any interest? Thank you, Paolo 2010/3/2 Chris Fields : > Paolo, > > You can get a Bio::SimpleAlign from the HSP object. The first code example in > this section in the HOWTO demonstrates this: > > http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods > > chris > > On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: > >> Dear all, >> Sorry for pushing up my post but, please does anyone have an hint for me? >> Maybe have I to send attached the report to the mailing list? I don't >> know attachment policies of the list, if it is allowed and is needed I >> can do that. >> >> Thank you, >> Paolo >> >> 2010/2/26 Paolo Pavan : >>> Sorry, >>> Maybe I forgot to add this is the megablast -m 5 output. >>> >>> Thank you again, >>> Paolo >>> >>> 2010/2/26 Paolo Pavan : >>>> Hi all, >>>> I have just a brief question: I've got some megablast reports such the >>>> one I've pasted below. >>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>> entire alignment represented as a Bio::SimpleAlign object or >>>> Bio::Align::AlignI implementing one? >>>> >>>> Thank you all, >>>> Paolo >>>> >>>> >>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>> >>>> >>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>> (2000), >>>> "A greedy algorithm for aligning DNA sequences", >>>> J Comput Biol 2000; 7(1-2):203-14. 
>>>> >>>> Database: 00038-00053.fasta >>>> 2 sequences; 2001 total letters >>>> >>>> Searching..................................................done >>>> >>>> Query= 00038-00053 >>>> (802 letters) >>>> >>>> >>>> >>>> Score E >>>> Sequences producing significant alignments: (bits) Value >>>> >>>> ______00038 >>>> 226 1e-62 >>>> ______00053 >>>> 115 3e-29 >>>> >>>> 1_0 472 >>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>> ______00038 883 >>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 532 >>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>> ______00038 943 >>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 591 >>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 651 >>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 711 >>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg >>>> 34 >>>> >>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>> ______00038 1000 -------------------------------- 1001 >>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>> Database: 00038-00053.fasta >>>> Posted date: Feb 25, 2010 4:47 PM >>>> Number of letters in database: 2001 >>>> Number of sequences in database: 2 >>>> >>>> Lambda 
K H >>>> 1.37 0.711 1.31 >>>> >>>> Gapped >>>> Lambda K H >>>> 1.37 0.711 1.31 >>>> >>>> >>>> Matrix: blastn matrix:1 -3 >>>> Gap Penalties: Existence: 0, Extension: 0 >>>> Number of Sequences: 2 >>>> Number of Hits to DB: 17 >>>> Number of extensions: 3 >>>> Number of successful extensions: 3 >>>> Number of sequences better than 10.0: 2 >>>> Number of HSP's gapped: 2 >>>> Number of HSP's successfully gapped: 2 >>>> Length of query: 802 >>>> Length of database: 2001 >>>> Length adjustment: 10 >>>> Effective length of query: 792 >>>> Effective length of database: 1981 >>>> Effective search space: 1568952 >>>> Effective search space used: 1568952 >>>> X1: 9 (17.8 bits) >>>> X2: 20 (39.6 bits) >>>> X3: 51 (101.1 bits) >>>> S1: 9 (18.3 bits) >>>> S2: 9 (18.3 bits) >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Mar 2 10:12:02 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 2 Mar 2010 10:12:02 -0500 Subject: [Bioperl-l] Installing bioperl on windows In-Reply-To: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com> References: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com> Message-ID: The steps on the wiki are in fact quite detailed. What we need then is details from you--the commands you ran and your error messages. Thanks. 
----- Original Message ----- From: "disha" To: Sent: Friday, February 26, 2010 8:43 AM Subject: [Bioperl-l] Installing bioperl on windows > Please tell me the procedure (detailed ) for installing bioperl on > windows vista.I tried the steps mentioned on the site but failed at > the initial steps > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From scott at scottcain.net Tue Mar 2 11:11:13 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 2 Mar 2010 11:11:13 -0500 Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts In-Reply-To: References: Message-ID: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com> Hi Leighton, Wow, that is a lot of text; I really appreciate your thoroughness in describing the problem. I have a few suggestions to get the ball rolling. First, I am working on the 1.1 release of gmod/chado, and it may fix some of the problems you are describing. Certainly, ID collisions between GFF files should not be a problem (I didn't think they were in the 1.0 release, but that was a long time ago). Please try a checkout of the schema trunk in the gmod svn: http://gmod.org/wiki/SVN Another thing you may want to look at is that just last week, a developer at Texas A&M, Nathan Liles, contributed code to the bioperl-live trunk for the genbank2gff3.pl script that will do a much better job of converting bacterial genbank files to GFF3; perhaps that will help too. Working with a svn checkout of bioperl-live shouldn't be too scary either; the pieces you are interested in (that work with Chado and GBrowse) are quite stable. Let us know how it goes, Scott On Mon, Mar 1, 2010 at 6:32 AM, Leighton Pritchard wrote: > Hi, > > I've tried going back through the mailing list, Googling the answer, and > reading the documentation and wiki to find a solution for this. 
?I've either > missed it, or it's not there yet. ?Hopefully there's a simple solution, or > an option that I'm just not seeing. ?I'm sure other people must be using > CHADO for bacterial genomes, and I would be interested in hearing about best > practice for using CHADO/GBROWSE with these sequences (I've seen > http://gmod.org/wiki/Chado_for_prokaryotes - but there's not much in > there...). > > I have a working CHADO(GMOD-1.0)/GBROWSE2/BioPerl 1.6.1 setup on CentOS 5.4, > and I'm trying to load some bacterial data. ?Specifically for this example, > I'm trying to get the GenBank sequences for E.coli S88: NC_011742 and > NC_011747 into CHADO. ?I've been following instructions from a number of > locations, including http://gmod.org/wiki/Artemis-Chado_Integration_Tutorial > and http://gmod.org/wiki/Chado_Tutorial, but there's an issue with these two > files, in that the NC_011742 (chromosome) and NC_011747 (plasmid) sequences > contain genes that have the same names (and several genes with the same name > in the same sequence!), and this appears to be a problem. ?Here's what's > going wrong: > > I start off with the two GenBank files: > > """ > [lpritc at localhost ~]$ ls -1 *.gbk > NC_011742.gbk > NC_011747.gbk > """ > > And convert these to .gff3 using the BioPerl script (it doesn't seem to > matter whether I pass them with the wildcard, or convert separately, though > passing multiple sequences for conversion might be a good place to check for > unique IDs): > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s *.gbk > # Input: NC_011742.gbk > # working on region:NC_011742, Escherichia coli S88, 19-DEC-2008, > Escherichia coli S88, complete genome. > # GFF3 saved to ./NC_011742.gbk.gff > # Summary: > # Feature ? ?Count > # ------- ? 
?----- > # mRNA ?4696 > # gene ?4898 > # region ?1 > # pseudogene ?151 > # CDS ?4696 > # RESIDUES(tr) ?1442813 > # RESIDUES ?5032268 > # processed_transcript ?89 > # rRNA ?22 > # pseudogenic_region ?151 > # exon ?4899 > # tRNA ?91 > # > # Input: NC_011747.gbk > # working on region:NC_011747, Escherichia coli S88, 18-AUG-2009, > Escherichia coli S88 plasmid pECOS88, complete sequence. > # GFF3 saved to ./NC_011747.gbk.gff > # Summary: > # Feature ? ?Count > # ------- ? ?----- > # mRNA ?4832 > # gene ?5037 > # region ?2 > # pseudogene ?159 > # CDS ?4832 > # RESIDUES(tr) ?1477756 > # RESIDUES ?5166121 > # processed_transcript ?92 > # rRNA ?22 > # pseudogenic_region ?159 > # exon ?5038 > # tRNA ?91 > # > """ > > I can then use the gmod_bulk_load_gff3.pl script to load either file, but > only singly. ?This appears to work, and the result is visible and seemingly > correctly navigable in GBROWSE (using NC_011747 as the first sequence here, > but the order is unimportant): > > """ > [lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref > GeneID --noexon --recreate_cache --gfffile NC_011747.gbk.gff > (Re)creating the uniquename cache in the database... > Creating table... > Populating table... > Creating indexes...Done. > Preparing data for inserting into the chado database > (This may take a while ...) > Dropping cds temp tables... > Creating cds temp tables... > NOTICE: ?CREATE TABLE will create implicit sequence > "tmp_cds_handler_cds_row_id_seq" for serial column > "tmp_cds_handler.cds_row_id" > NOTICE: ?CREATE TABLE / PRIMARY KEY will create implicit index > "tmp_cds_handler_pkey" for table "tmp_cds_handler" > NOTICE: ?CREATE TABLE will create implicit sequence > "tmp_cds_handler_relationship_rel_row_id_seq" for serial column > "tmp_cds_handler_relationship.rel_row_id" > NOTICE: ?CREATE TABLE / PRIMARY KEY will create implicit index > "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship" > Loading data into feature table ... 
> Loading data into featureloc table ... > Loading data into feature_relationship table ... > Loading data into featureprop table ... > Skipping feature_cvterm table since the load file is empty... > Skipping synonym table since the load file is empty... > Skipping feature_synonym table since the load file is empty... > Skipping dbxref table since the load file is empty... > Loading data into feature_dbxref table ... > Skipping analysisfeature table since the load file is empty... > Skipping cvterm table since the load file is empty... > Skipping db table since the load file is empty... > Skipping cv table since the load file is empty... > Skipping analysis table since the load file is empty... > Skipping organism table since the load file is empty... > Adding cvtermprop=MapReferenceType for 'region' ... > Loading sequences (if any) ... > Optimizing database (this may take a while) ... > ?(feature featureloc feature_relationship featureprop feature_cvterm > synonym feature_synonym dbxref feature_dbxref analysisfeature cvterm db cv > analysis organism ) Done. > > While this script has made an effort to optimize the database, you > should probably also run VACUUM FULL ANALYZE on the database as well > """ > > """ > chado=> SELECT feature_id, organism_id, name, uniquename FROM feature WHERE > name='NC_011747'; > ?feature_id | organism_id | ? name ? ?| uniquename > ------------+-------------+-----------+------------ > ? ? 146917 | ? ? ? ? ?99 | NC_011747 | NC_011747 > """ > > However, attempting to load in the second sequence throws an error (though > this might also be a good point to check for ID uniqueness with a database > check, and appropriate modification to the ID, if necessary - problems could > arise if we were trying to add genuine duplicates, though...): > > """ > [lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref > GeneID --noexon --recreate_cache --gfffile NC_011742.gbk.gff > (Re)creating the uniquename cache in the database... 
> Creating table... > Populating table... > Creating indexes...Done. > Preparing data for inserting into the chado database > (This may take a while ...) > Dropping cds temp tables... > Creating cds temp tables... > NOTICE: ?CREATE TABLE will create implicit sequence > "tmp_cds_handler_cds_row_id_seq" for serial column > "tmp_cds_handler.cds_row_id" > NOTICE: ?CREATE TABLE / PRIMARY KEY will create implicit index > "tmp_cds_handler_pkey" for table "tmp_cds_handler" > NOTICE: ?CREATE TABLE will create implicit sequence > "tmp_cds_handler_relationship_rel_row_id_seq" for serial column > "tmp_cds_handler_relationship.rel_row_id" > NOTICE: ?CREATE TABLE / PRIMARY KEY will create implicit index > "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship" > > no parent yacC; > you probably need to rerun the loader with the --recreate_cache option > > Issuing rollback() due to DESTROY without explicit disconnect() of > DBD::Pg::db handle dbname=chado;port=5432;host=localhost. > """ > > This, of course, prevents the upload of the sequence and its annotations, as > a whole. > > The script recommends that the --recreate_cache option should be used, but I > am already using it. ?If the same process is run, reversing the order of the > input files, the same error is reported, but for the gene with name 'int'. > Both sequences contain genes with the names 'int' and 'yacC' (NC_011742 > appears to contain four genes with the name 'int'): > > """ > [lpritc at localhost ~]$ grep 'ID=yacC;' *.gbk.gff > NC_011742.gbk.gff:NC_011742 ? ?GenBank ? ?gene ? ?142755 ? ?143273 ? ?. ? ?- > . ? ?ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131 > NC_011747.gbk.gff:NC_011747 ? ?GenBank ? ?gene ? ?85083 ? ?85931 ? ?. ? ?+ > . ? ?ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103 > > [lpritc at localhost ~]$ grep 'ID=int;' *.gbk.gff > NC_011742.gbk.gff:NC_011742 ? ?GenBank ? ?gene ? ?1182443 ? ?1183585 ? ?. > - ? ?. ? 
?ID=int;Dbxref=GeneID:7131611;gene=int;locus_tag=ECS88_1152 > NC_011742.gbk.gff:NC_011742 ? ?GenBank ? ?pseudogene ? ?1998684 ? ?1999646 > . ? ?+ ? ?. > ID=int;Dbxref=GeneID:7128964;gene=int;locus_tag=ECS88_2031;pseudo=_no_value > NC_011742.gbk.gff:NC_011742 ? ?GenBank ? ?gene ? ?2829972 ? ?2830991 ? ?. > + ? ?. ? ?ID=int;Dbxref=GeneID:7131911;gene=int;locus_tag=ECS88_2851 > NC_011742.gbk.gff:NC_011742 ? ?GenBank ? ?gene ? ?3220074 ? ?3221336 ? ?. > + ? ?. ? ?ID=int;Dbxref=GeneID:7129893;gene=int;locus_tag=ECS88_3250 > NC_011747.gbk.gff:NC_011747 ? ?GenBank ? ?gene ? ?132 ? ?872 ? ?. ? ?+ ? ?. > ID=int;Dbxref=GeneID:7119360;gene=int;locus_tag=pECS88_0001 > """ > > Commenting out either of these genes, and their child features, defers the > error to another gene that has the same name in both sequences in each case. > It seems that the problem might derive from attempting to uniquely associate > each gene uniquely with its 'gene' tag in the GenBank file and, as there are > several points in the process where it would be sensible to check for name > collisions, so that the feature:uniquename column can be modified to reflect > this, I looked for command-line options to each script, but didn't see one > that could help. ?Examining the manual for gmod_bulk_load_gff3.pl suggests > that this might be the problem (though I might be misunderstanding it): > > """ > ? ? ? Column 9 (group) > ? ? ? ? ? Here is where the magic happens. > > ? ? ? ? ? Assigning feature.name, feature.uniquename > ? ? ? ? ? ? ? The values of feature.name and feature.uniquename are > assigned according to these simple rules: > > ? ? ? ? ? ? ? If there is an ID tag, that is used as feature.uniquename > ? ? ? ? ? ? ? ? ? otherwise, it is assigned a uniquename that is equal to > ?auto? concatenated with the feature_id. > > ? ? ? ? ? ? ? ? ? (Note that this is a potential problem as there is no > check to make sure that it is appropriately unique.) > > ? ? ? ? ? ? ? 
If there is a Name tag, it's value is set to feature.name;
>                   otherwise it is null.
>
>                   Note that these rules are much more simple than that
> those that Bio::DB::GFF uses, and may need to be revisited.
> """
>
> I suspect that, as the bp_genbank2gff3.pl script converts gene names (which
> are not guaranteed to be unique) to ID tags, the problem recognised in the
> manual is cropping up at this point.  Luckily, the GenBank files come with
> locus_tag tags, which should be unique for each gene (see
> http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html#locus_tag).  For
> bacteria, at least, using the locus_tag values might be a more robust option
> for the bp_genbank2gff3.pl; this already appears to have been recognised in
> the script comments:
>
> """
>            #?? should gene_name from
> /locus_tag,/gene,/product,/transposon=xxx
>            # be converted to or added as  Name=xxx (if not ID= or as well)
>            ## problematic: convert_to_name ($feature); # drops
> /locus_tag,/gene, tags
> """
>
> I can get round the upload problem somewhat suckily by changing the priority
> given to 'locus_tag' and 'gene' tags for generating the .gff ID tag in the
> bp_genbank2gff3.pl script:
>
> """
> [lpritc at localhost ~]$ diff bp_genbank2gff3.pl /usr/bin/bp_genbank2gff3.pl
> 976,977c976,977
> <     if ($g->has_tag('locus_tag')) {
> <         ($gene_id) = $g->get_tag_values('locus_tag');
> ---
> >     if ($g->has_tag('gene')) {
> >         ($gene_id) = $g->get_tag_values('gene');
> 979,980c979,980
> <     elsif ($g->has_tag('gene')) {
> <         ($gene_id) = $g->get_tag_values('gene');
> ---
> >     elsif ($g->has_tag('locus_tag')) {
> >         ($gene_id) = $g->get_tag_values('locus_tag');
> """
>
> But this isn't a complete solution, as GBROWSE searches by gene name don't
> work after making this change, and presumably some further configuration or
> hacking about is required to sort that out (advice welcome).
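The ID collisions described above can be spotted before loading with a short check over the converted GFF3. What follows is only a minimal sketch under stated assumptions: find_duplicate_ids() is a hypothetical helper (it is not part of BioPerl or GMOD), it looks only at 'gene' and 'pseudogene' rows because multi-line features such as CDS segments may legitimately share an ID, and the two demo rows are the yacC lines from the grep output in this message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_duplicate_ids() is an illustrative helper, not a BioPerl or GMOD call.
# It takes GFF3 lines and returns a hash of ID => [ "seqid:start-end", ... ]
# for every ID that appears on more than one gene or pseudogene row.
sub find_duplicate_ids {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blanks
        my @col = split /\t/, $line;
        next unless @col == 9;                    # not a feature row
        next unless $col[2] eq 'gene' || $col[2] eq 'pseudogene';
        next unless $col[8] =~ /(?:^|;)ID=([^;]+)/;
        push @{ $seen{$1} }, "$col[0]:$col[3]-$col[4]";
    }
    my %dup;
    for my $id (keys %seen) {
        $dup{$id} = $seen{$id} if @{ $seen{$id} } > 1;
    }
    return %dup;
}

# Demo rows taken from the grep output quoted in this thread.
my @demo = (
    join("\t", 'NC_011742', 'GenBank', 'gene', 142755, 143273, '.', '-', '.',
         'ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131'),
    join("\t", 'NC_011747', 'GenBank', 'gene', 85083, 85931, '.', '+', '.',
         'ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103'),
);

my %dup = find_duplicate_ids(@demo);
for my $id (sort keys %dup) {
    print "$id: ", join(', ', @{ $dup{$id} }), "\n";
}
```

Feeding it the lines of both .gbk.gff files together would list every gene ID that has to be made unique (for example by switching to locus_tag) before the second file is passed to the bulk loader.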
>
> So, what are other people doing to overcome this issue (if you've seen it),
> and would a change to the bp_genbank2gff3.pl script along the lines I mention
> be useful to others?
>
> Cheers,
>
> L.
>
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                            scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)           216-392-3087
Ontario Institute for Cancer Research

From sdavis2 at mail.nih.gov  Tue Mar  2 11:33:38 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 2 Mar 2010 11:33:38 -0500
Subject: [Bioperl-l] how to convert a txt file to a bed file?
In-Reply-To: 
References: 
Message-ID: <264855a01003020833v3e15dcb7vcdd876ce80468740@mail.gmail.com>

On Tue, Mar 2, 2010 at 1:08 AM, Zhenyu Shen wrote:
> I want to convert a txt file to a bed file and then load the bed file
> to UCSC genome browser. But how to convert the txt file to a bed file
> with perl?

Hi, Zhenyu.

A bed file IS a text file, with the format described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED

You just need to make your text file conform to that format and you
are set to go.

Sean

From paolo.pavan at gmail.com  Tue Mar  2 10:17:35 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 16:17:35 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <18C0182252934619AD12E49243BE3C14@NewLife>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com>
 <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
 <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
 <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
 <18C0182252934619AD12E49243BE3C14@NewLife>
Message-ID: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>

I think you got the sense, thank you.
Of course hsps from different hits will be reflected in different elements aligned. I've attached the example pasted (unix text) because is more readable, hoping will not be held by the mailing server :-) Thank you, Paolo 2010/3/2 Mark A. Jensen : > This might a good method to have for Bio::Search::Tiling-- > you want to stitch together all the hsps and have the > concatenated alignment returned as a Bio::SimpleAlign, > correct? Tiling would create the right set of hsps from > which to generate the composite alignment. I can > try to get something working, but it may take a while- > MAJ > ----- Original Message ----- From: "Paolo Pavan" > To: "Chris Fields" > Cc: > Sent: Tuesday, March 02, 2010 9:37 AM > Subject: Re: [Bioperl-l] Alignment from blast report > > > Hi Chris, > Thank you for your reply. So I have to understand that since the > get_aln method returns the HSP alignment, there is no way to retrieve > the whole alignment as in the example pasted, isn't it? > Basically I'm trying to use megablast as kind of multiple local > alignment engine and actually I'm not pretty sure this is a good idea > but in my particular case could be suitable. I mean that the example > below reports only the portions of the sequences that align loosing > the portions that does not, I'm not sure I gave the idea. What do you > think about? Can you give me your opinion? > If there isn't any module written yet, I can try to write a parser, it > could be of any interest? > > Thank you, > Paolo > > 2010/3/2 Chris Fields : >> >> Paolo, >> >> You can get a Bio::SimpleAlign from the HSP object. The first code example >> in this section in the HOWTO demonstrates this: >> >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >> >> chris >> >> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >> >>> Dear all, >>> Sorry for pushing up my post but, please does anyone have an hint for me? >>> Maybe have I to send attached the report to the mailing list? 
I don't >>> know attachment policies of the list, if it is allowed and is needed I >>> can do that. >>> >>> Thank you, >>> Paolo >>> >>> 2010/2/26 Paolo Pavan : >>>> >>>> Sorry, >>>> Maybe I forgot to add this is the megablast -m 5 output. >>>> >>>> Thank you again, >>>> Paolo >>>> >>>> 2010/2/26 Paolo Pavan : >>>>> >>>>> Hi all, >>>>> I have just a brief question: I've got some megablast reports such the >>>>> one I've pasted below. >>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>> Bio::Align::AlignI implementing one? >>>>> >>>>> Thank you all, >>>>> Paolo >>>>> >>>>> >>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>> >>>>> >>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>> (2000), >>>>> "A greedy algorithm for aligning DNA sequences", >>>>> J Comput Biol 2000; 7(1-2):203-14. >>>>> >>>>> Database: 00038-00053.fasta >>>>> 2 sequences; 2001 total letters >>>>> >>>>> Searching..................................................done >>>>> >>>>> Query= 00038-00053 >>>>> (802 letters) >>>>> >>>>> >>>>> >>>>> Score E >>>>> Sequences producing significant alignments: (bits) Value >>>>> >>>>> ______00038 >>>>> 226 1e-62 >>>>> ______00053 >>>>> 115 3e-29 >>>>> >>>>> 1_0 472 >>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>> ______00038 883 >>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 532 >>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>> ______00038 943 >>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 591 >>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>> 
______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 651 >>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>> ______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 711 >>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>> ______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 1 >>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>> >>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>> ______00038 1000 -------------------------------- 1001 >>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>>> Database: 00038-00053.fasta >>>>> Posted date: Feb 25, 2010 4:47 PM >>>>> Number of letters in database: 2001 >>>>> Number of sequences in database: 2 >>>>> >>>>> Lambda K H >>>>> 1.37 0.711 1.31 >>>>> >>>>> Gapped >>>>> Lambda K H >>>>> 1.37 0.711 1.31 >>>>> >>>>> >>>>> Matrix: blastn matrix:1 -3 >>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>> Number of Sequences: 2 >>>>> Number of Hits to DB: 17 >>>>> Number of extensions: 3 >>>>> Number of successful extensions: 3 >>>>> Number of sequences better than 10.0: 2 >>>>> Number of HSP's gapped: 2 >>>>> Number of HSP's successfully gapped: 2 >>>>> Length of query: 802 >>>>> Length of database: 2001 >>>>> Length adjustment: 10 >>>>> Effective length of query: 792 >>>>> Effective length of database: 1981 >>>>> Effective search space: 1568952 >>>>> Effective search space used: 1568952 >>>>> X1: 9 (17.8 bits) >>>>> X2: 20 (39.6 bits) >>>>> X3: 51 (101.1 bits) >>>>> S1: 9 (18.3 bits) >>>>> S2: 9 (18.3 bits) >>>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at 
lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.megaout
Type: application/octet-stream
Size: 2918 bytes
Desc: not available
URL: 

From Russell.Smithies at agresearch.co.nz  Tue Mar  2 14:35:19 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 08:35:19 +1300
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
 <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E660@exchsth.agresearch.co.nz>

If you want to continue using your current version, you could try to delete your old blast db first.

if($checkbox eq 'yes'){
    # unlink does not expand wildcards itself, so glob the file names first
    unlink glob "mydb.*"; #or maybe `rm -f mydb.*`
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name  => 'mydb',
        -db_data  => 'xxx.fa',
        -create   => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Tuesday, 2 March 2010 4:58 p.m.
> To: Janine Arloth
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
> Hi Janine--
> You'll need to get the latest version of
> Bio/Tools/Run/StandAloneBlastPlus.pm
> (rev. 16878).
> Then the -overwrite parameter will actually work, and you can write
>
> if($checkbox eq 'yes'){
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -prog_dir  => "/usr/local/ncbi/blast/bin",
>         -db_name   => 'mydb',
>         -db_data   => 'xxx.fa',
>         -overwrite => 1);
> }
> else{
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -db_name => 'mydb');
> }
>
> MAJ
>
> ----- Original Message -----
> From: "Janine Arloth"
> To:
> Cc:
> Sent: Monday, March 01, 2010 11:25 AM
> Subject: StandAloneBlastPlus
>
>
> Hello,
>
> I am running blast+ and want to create a blastdb, depending on a checkbox.
> That means when mydb is too old, I want to rebuild the blastdb files and
> create a ''new'' db.
> When the latest version of my files is ok, blast should run with the
> existing db.
> With this code, a new db is never built: it is created once, and then it
> does not create a new one.
>
>
> if($checkbox eq 'yes'){
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -prog_dir => "/usr/local/ncbi/blast/bin",
>         -db_name  => 'mydb',
>         -db_data  => 'xxx.fa',
>         -create   => 1);
> }
> else{
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -db_name => 'mydb');
> }
>
> Thanks for helping

From armendarez77 at hotmail.com  Tue Mar  2 16:06:17 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Tue, 2 Mar 2010 13:06:17 -0800
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
Message-ID: 

Hello,

I am writing a script to remotely access annotation files and parse information using Bio::DB::RefSeq and Bio::DB::GenBank. I was testing it with random RefSeq accession numbers (NC_######) when something odd happened. When I used the accession number 'NC_007092', the script seemed to freeze. After some time, 'Out of Memory' was printed to the terminal.

When I investigated the annotation file associated with NC_007092, a MapViewer page opened. It turns out that NC_007092 is a genome shotgun sequence, but it does not start with 'NZ' as I thought all shotgun sequences did.

Is this a random event that I don't have to worry much about or is there a way to pre-screen accession numbers to ensure they are associated with complete genome RefSeq files?

I've included my script in case there is something I missed that could have prevented this.
Thank you,

Veronica

_________________

use strict;
use Bio::Perl;
use Bio::DB::RefSeq;    # loaded explicitly rather than relying on Bio::Perl
use Bio::DB::GenBank;
use Getopt::Long;
use IO::Handle;

my $accessionNumber;

GetOptions("accessionNumber=s"=>\$accessionNumber);
unless($accessionNumber){
    print<<"OPTIONS";
options for $0
    accessionNumber -a accession number
OPTIONS
    die;
}

my $description = annotation_info($accessionNumber);

print "$description\n";

sub annotation_info{

    my $seqObj;

    my $accNum = shift(@_);

    my $rs = Bio::DB::RefSeq->new();
    my $gb = Bio::DB::GenBank->new();

    if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore in their accession number
        $seqObj = $rs->get_Seq_by_id($accNum);
    }
    elsif($accNum !~ /_/){ #GenBank annotation
        $seqObj = $gb->get_Seq_by_id($accNum);
    }

    # neither pattern matched, or the lookup returned nothing
    die "No sequence record retrieved for '$accNum'\n" unless $seqObj;

    return $seqObj->desc();
}

From maj at fortinbras.us  Tue Mar  2 15:58:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 15:58:59 -0500
Subject: [Bioperl-l] bioperl job
Message-ID: 

Hi All,

I have a contact looking for an individual with Bioperl experience who could do contractual on-site work in the Cambridge MA area. **I have no business interest in this whatever, just doing a friend a favor.** Let me know directly (not to the list) if you have interest.

thanks -- MAJ

From Russell.Smithies at agresearch.co.nz  Tue Mar  2 18:08:51 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 12:08:51 +1300
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
In-Reply-To: 
References: 
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz>

NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
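As a rough illustration of pre-screening by accession prefix, here is a minimal sketch. It rests on assumptions: classify_accession() and the %REFSEQ_PREFIX table are hypothetical names made up for this example (neither is part of BioPerl), and the table covers only a handful of common RefSeq prefixes. The prefix gives the molecule type only; it says nothing about how large the record is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: a few common RefSeq accession prefixes and the
# kind of record each denotes. This is not the complete list.
my %REFSEQ_PREFIX = (
    NC => 'complete genomic molecule',
    NG => 'genomic region',
    NT => 'genomic contig',
    NW => 'genomic contig (WGS)',
    NZ => 'genomic record (often WGS)',
    NM => 'mRNA',
    NR => 'non-coding RNA',
    NP => 'protein',
);

# classify_accession() is an illustrative helper, not a BioPerl function.
sub classify_accession {
    my ($acc) = @_;
    # two capital letters, an underscore, then the identifier and an
    # optional ".version" suffix
    if ($acc =~ /^([A-Z]{2})_[A-Z0-9]+(?:\.\d+)?$/) {
        return exists $REFSEQ_PREFIX{$1}
            ? $REFSEQ_PREFIX{$1}
            : 'unrecognised RefSeq prefix';
    }
    return 'not a RefSeq-style accession';
}

for my $acc (qw(NC_007092 NM_000546 AF031249)) {
    printf "%-12s %s\n", $acc, classify_accession($acc);
}
```

A script could call something like this before fetching and skip, or at least warn about, anything that is not the record type it expects.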
Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html

Also, it may help to check the docsum first to see how big the file is going to be?
(the full Genbank file for this example is only 6MB in size)

===================
use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                       -db    => 'nucleotide',
                                       -term  => 'NC_007092' );

my ($id) = $factory->get_ids;

# get a summary
$factory->reset_parameters(-eutil => 'esummary',
                           -db    => 'nucleotide',
                           -id    => $id);
my $ds = $factory->next_DocSum;
print "ID: $id\n";
# flattened mode
while (my $item = $ds->next_Item('flattened')) {
    # not all Items have content, so need to check...
    printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
}
print "\n";


# download the full genbank file
$factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                    -db      => 'nucleotide',
                                    -id      => $id,
                                    -rettype => 'gbwithparts');
$factory->get_Response(-file => "$id.gb");

================

Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> Sent: Wednesday, 3 March 2010 10:06 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
>
>
> Hello,
>
> I am writing a script to remotely access annotation files and parse
> information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it
> with random RefSeq accession numbers (NC_######) when something odd
> happened. When I used the accession number 'NC_007092', the script seemed
> to freeze. After some time, 'Out of Memory' was printed to the terminal.
> > When I investigated the annotation file associated with NC_007092, a > MapViewer page opened. It turns out that NC_007092 is a genome shotgun > sequence, but it does not start with 'NZ' as I though all shotgun > sequences did. > > Is this a random event that I don't have to worry much about or is there a > way to pre-screen accession numbers to ensure they are associated with > complete genome RefSeq files? > > I've included my script in case there is something I missed that could > have prevented this. > > Thank you, > > Veronica > > > _________________ > > use strict; > use Bio::Perl; > use Getopt::Long; > use IO::Handle; > > my $accessionNumber; > > GetOptions("accessionNumber=s"=>\$accessionNumber); > unless($accessionNumber){ > print<<"OPTIONS"; > options for $0 > accessionNumber -a accession number > OPTIONS > die; > } > > my $description = annotation_info($accessionNumber); > > print "$description\n"; > > > > sub annotation_info{ > > my $seqObj; > > my $accNum = shift(@_); > > my $rs = Bio::DB::RefSeq->new(); > my $gb = Bio::DB::GenBank->new(); > > > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore > in their accession number > > $seqObj = $rs->get_Seq_by_id($accNum); > } > elsif($accNum !~ /_/){ #GenBank annotation > $seqObj = $gb->get_Seq_by_id($accNum); > } > > return $seqObj->desc(); > } > > > _________________________________________________________________ > Hotmail: Trusted email with Microsoft's powerful SPAM protection. > http://clk.atdmt.com/GBL/go/201469226/direct/01/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From armendarez77 at hotmail.com Tue Mar 2 18:16:03 2010 From: armendarez77 at hotmail.com (armendarez77 at hotmail.com) Date: Tue, 2 Mar 2010 15:16:03 -0800 Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz> References: , <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz> Message-ID: I see. I work mostly in the bacteria world so mammalian chromosomes shouldn't be an issue. I just randomly picked it to test my script when it came up after I did a simple search for Bacillus in the Genome database. I'll look into docSum to help prevent unexpected large files from interrupting my script. Thank you. Veronica > From: Russell.Smithies at agresearch.co.nz > To: armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org > Date: Wed, 3 Mar 2010 12:08:51 +1300 > Subject: Re: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large. > Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html > > Also, it may help to check the docsum first to see how big the file is going to be? 
> (the full Genbank file for this example is only 6MB in size) > > =================== > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' ); > > my ($id) = $factory->get_ids; > > # get a summary > $factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id); > my $ds = $factory->next_DocSum; > print "ID: $id\n"; > # flattened mode > while (my $item = $ds->next_Item('flattened')) { > # not all Items have content, so need to check... > printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content; > } > print "\n"; > > > # download the full genbank file > $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'nucleotide', > -id => $id, > -rettype => 'gbwithparts'); > $factory->get_Response(-file => "$id.gb"); > > ================ > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com > > Sent: Wednesday, 3 March 2010 10:06 a.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > > > > > Hello, > > > > I am writing a script to remotely access annotation files and parse > > information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it > > with random RefSeq accession numbers (NC_######) when something odd > > happened. When I used the accession number 'NC_007092', the script seemed > > to freeze. After some time, 'Out of Memory' was printed to the terminal. > > > > When I investigated the annotation file associated with NC_007092, a > > MapViewer page opened. 
It turns out that NC_007092 is a genome shotgun
> > sequence, but it does not start with 'NZ' as I thought all shotgun
> > sequences did.
> >
> > Is this a random event that I don't have to worry much about or is there a
> > way to pre-screen accession numbers to ensure they are associated with
> > complete genome RefSeq files?
> >
> > I've included my script in case there is something I missed that could
> > have prevented this.
> >
> > Thank you,
> >
> > Veronica
> >
> >
> > _________________
> >
> > use strict;
> > use Bio::Perl;
> > use Getopt::Long;
> > use IO::Handle;
> >
> > my $accessionNumber;
> >
> > GetOptions("accessionNumber=s"=>\$accessionNumber);
> > unless($accessionNumber){
> > print<<"OPTIONS";
> > options for $0
> > accessionNumber -a accession number
> > OPTIONS
> > die;
> > }
> >
> > my $description = annotation_info($accessionNumber);
> >
> > print "$description\n";
> >
> >
> >
> > sub annotation_info{
> >
> > my $seqObj;
> >
> > my $accNum = shift(@_);
> >
> > my $rs = Bio::DB::RefSeq->new();
> > my $gb = Bio::DB::GenBank->new();
> >
> >
> > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore
> > in their accession number
> >
> > $seqObj = $rs->get_Seq_by_id($accNum);
> > }
> > elsif($accNum !~ /_/){ #GenBank annotation
> > $seqObj = $gb->get_Seq_by_id($accNum);
> > }
> >
> > return $seqObj->desc();
> > }
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From csaba.ortutay at uta.fi Thu Mar 4 04:57:00 2010
From: csaba.ortutay at uta.fi (Csaba Ortutay)
Date: Thu, 4 Mar 2010 11:57:00 +0200
Subject: [Bioperl-l] Bio::DB::CUTG problem
Message-ID: <201003041157.01013.csaba.ortutay@uta.fi>

Hello,

We would use Bio::DB::CUTG module to get codon usage data for a large number of genomes.

We have noticed that the module cannot find certain organisms which are otherwise in the database. It happens when the name contains some non-alphabetic characters.

A few examples:

Streptococcus agalactiae 2603V/R
Shigella flexneri 5 str. 8401

I have located the corresponding part in the CUTG.pm code, and I would suggest a change:

222c222
< my $nameparts = join "+", $self->sp =~ /(\w+)/g;
---
> my $nameparts = join "+", $self->sp =~ /(\S+)/g;

With this I can now access the wanted tables.

Best regards,
Csaba

--
Csaba Ortutay PhD
Docent of Bioinformatics
IMT Bioinformatics
University of Tampere
Finland

From maj at fortinbras.us Thu Mar 4 08:10:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 4 Mar 2010 08:10:06 -0500
Subject: [Bioperl-l] Bio::DB::CUTG problem
In-Reply-To: <201003041157.01013.csaba.ortutay@uta.fi>
References: <201003041157.01013.csaba.ortutay@uta.fi>
Message-ID:

Thanks, Csaba - change made and committed at r16898
MAJA
----- Original Message -----
From: "Csaba Ortutay"
To:
Sent: Thursday, March 04, 2010 4:57 AM
Subject: [Bioperl-l] Bio::DB::CUTG problem

> Hello,
>
> We would use Bio::DB::CUTG module to get codon usage data for a large number
> of genomes.
>
> We have noticed that the module cannot find certain organisms which are
> otherwise in the database. It happens when the name contains some
> non-alphabetic characters.
>
> A few examples:
>
> Streptococcus agalactiae 2603V/R
> Shigella flexneri 5 str. 8401
>
> I have located the corresponding part in the CUTG.pm code, and I would suggest
> a change:
>
> 222c222
> < my $nameparts = join "+", $self->sp =~ /(\w+)/g;
> ---
>> my $nameparts = join "+", $self->sp =~ /(\S+)/g;
>
>
> With this I can now access the wanted tables.
> > Best regards, > Csaba > > -- > Csaba Ortutay PhD > Docent of Bioinformatics > IMT Bioinformatics > University of Tampere > Finland > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Thu Mar 4 09:40:18 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 04 Mar 2010 14:40:18 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> Message-ID: <4B8FC652.2010607@bioperl.org> Palani - This should be directed to the mailing list. -------- Original Message -------- From: PalaniKannan K Subject: Enquiry about Remoteblast.pm Date: Thu, 4 Mar 2010 10:23:45 +0530 I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing through Remoteblast.pm script available through CPAN. When i am submitting my query... it shows waiting for much time. Ex. (waiting .....................) http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html This is the reference script i am using through Remoteblast perl module. It worked upto last 02/03/2010. Now it is not working We had developed 3 applications using this module. The same error comes in 3 applications we developed. So, i confim that our script dont have problem. Kindly help me in this regard. -- With Regards, palani kannan. k From maj at fortinbras.us Thu Mar 4 09:50:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Thu, 4 Mar 2010 09:50:54 -0500
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com><56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com><56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com><56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com><18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>
Message-ID: <2FB5C317605B48269256ABFABBED2239@NewLife>

Paolo -- Ok, there's now (r16900) an *experimental* method in Bio::Search::Tiling::MapTiling called get_tiled_alns(). POD is below. Try it out and let me know--
cheers, MAJ

=head1 TILED ALIGNMENTS

The experimental method C<get_tiled_alns()> will use a tiling to concatenate tiled hsps into a series of L<Bio::SimpleAlign> objects:

 @alns = $tiling->get_tiled_alns($type, $context);

Each alignment contains two sequences with ids 'query' and 'subject', and consists of a concatenation of tiling HSPs which overlap or are directly adjacent. The alignments are returned in C<$type> sequence order. When HSPs overlap, the alignment sequence is taken from the HSP which comes first in the coverage map array.

The sequences in each alignment contain features (even though they are L<Bio::LocatableSeq> objects) which map the original query/subject coordinates to the new alignment sequence coordinates.
You can determine the original BLAST fragments this way:

 $aln = ($tiling->get_tiled_alns)[0];
 $qseq = $aln->get_seq_by_id('query');
 $hseq = $aln->get_seq_by_id('subject');
 foreach my $feat ($qseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('query_start'))[0];
    $org_end = ($feat->get_tag_values('query_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }
 foreach my $feat ($hseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('subject_start'))[0];
    $org_end = ($feat->get_tag_values('subject_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }

----- Original Message -----
From: "Paolo Pavan"
To: "Mark A. Jensen"
Cc: "Chris Fields" ;
Sent: Tuesday, March 02, 2010 10:17 AM
Subject: Re: [Bioperl-l] Alignment from blast report

>I think you got the sense, thank you. Of course hsps from different
> hits will be reflected in different elements aligned. I've attached
> the example pasted (unix text) because is more readable, hoping will
> not be held by the mailing server :-)
>
> Thank you,
> Paolo
>
> 2010/3/2 Mark A. Jensen :
>> This might be a good method to have for Bio::Search::Tiling--
>> you want to stitch together all the hsps and have the
>> concatenated alignment returned as a Bio::SimpleAlign,
>> correct? Tiling would create the right set of hsps from
>> which to generate the composite alignment. I can
>> try to get something working, but it may take a while-
>> MAJ
>> ----- Original Message ----- From: "Paolo Pavan"
>> To: "Chris Fields"
>> Cc:
>> Sent: Tuesday, March 02, 2010 9:37 AM
>> Subject: Re: [Bioperl-l] Alignment from blast report
>>
>>
>> Hi Chris,
>> Thank you for your reply. So I have to understand that since the
>> get_aln method returns the HSP alignment, there is no way to retrieve
>> the whole alignment as in the example pasted, isn't it?
>> Basically I'm trying to use megablast as kind of multiple local >> alignment engine and actually I'm not pretty sure this is a good idea >> but in my particular case could be suitable. I mean that the example >> below reports only the portions of the sequences that align loosing >> the portions that does not, I'm not sure I gave the idea. What do you >> think about? Can you give me your opinion? >> If there isn't any module written yet, I can try to write a parser, it >> could be of any interest? >> >> Thank you, >> Paolo >> >> 2010/3/2 Chris Fields : >>> >>> Paolo, >>> >>> You can get a Bio::SimpleAlign from the HSP object. The first code example >>> in this section in the HOWTO demonstrates this: >>> >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> >>> chris >>> >>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >>> >>>> Dear all, >>>> Sorry for pushing up my post but, please does anyone have an hint for me? >>>> Maybe have I to send attached the report to the mailing list? I don't >>>> know attachment policies of the list, if it is allowed and is needed I >>>> can do that. >>>> >>>> Thank you, >>>> Paolo >>>> >>>> 2010/2/26 Paolo Pavan : >>>>> >>>>> Sorry, >>>>> Maybe I forgot to add this is the megablast -m 5 output. >>>>> >>>>> Thank you again, >>>>> Paolo >>>>> >>>>> 2010/2/26 Paolo Pavan : >>>>>> >>>>>> Hi all, >>>>>> I have just a brief question: I've got some megablast reports such the >>>>>> one I've pasted below. >>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>>> Bio::Align::AlignI implementing one? 
>>>>>> >>>>>> Thank you all, >>>>>> Paolo >>>>>> >>>>>> >>>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>>> >>>>>> >>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>>> (2000), >>>>>> "A greedy algorithm for aligning DNA sequences", >>>>>> J Comput Biol 2000; 7(1-2):203-14. >>>>>> >>>>>> Database: 00038-00053.fasta >>>>>> 2 sequences; 2001 total letters >>>>>> >>>>>> Searching..................................................done >>>>>> >>>>>> Query= 00038-00053 >>>>>> (802 letters) >>>>>> >>>>>> >>>>>> >>>>>> Score E >>>>>> Sequences producing significant alignments: (bits) Value >>>>>> >>>>>> ______00038 >>>>>> 226 1e-62 >>>>>> ______00053 >>>>>> 115 3e-29 >>>>>> >>>>>> 1_0 472 >>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>>> ______00038 883 >>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 532 >>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>>> ______00038 943 >>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 591 >>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 651 >>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 711 >>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> 
______00053 1 >>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>>> >>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>>> ______00038 1000 -------------------------------- 1001 >>>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>>>> Database: 00038-00053.fasta >>>>>> Posted date: Feb 25, 2010 4:47 PM >>>>>> Number of letters in database: 2001 >>>>>> Number of sequences in database: 2 >>>>>> >>>>>> Lambda K H >>>>>> 1.37 0.711 1.31 >>>>>> >>>>>> Gapped >>>>>> Lambda K H >>>>>> 1.37 0.711 1.31 >>>>>> >>>>>> >>>>>> Matrix: blastn matrix:1 -3 >>>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>>> Number of Sequences: 2 >>>>>> Number of Hits to DB: 17 >>>>>> Number of extensions: 3 >>>>>> Number of successful extensions: 3 >>>>>> Number of sequences better than 10.0: 2 >>>>>> Number of HSP's gapped: 2 >>>>>> Number of HSP's successfully gapped: 2 >>>>>> Length of query: 802 >>>>>> Length of database: 2001 >>>>>> Length adjustment: 10 >>>>>> Effective length of query: 792 >>>>>> Effective length of database: 1981 >>>>>> Effective search space: 1568952 >>>>>> Effective search space used: 1568952 >>>>>> X1: 9 (17.8 bits) >>>>>> X2: 20 (39.6 bits) >>>>>> X3: 51 (101.1 bits) >>>>>> S1: 9 (18.3 bits) >>>>>> S2: 9 (18.3 bits) >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From janine.arloth at googlemail.com Wed Mar 3 04:44:18 2010 From: janine.arloth at 
googlemail.com (Janine Arloth)
Date: Wed, 3 Mar 2010 10:44:18 +0100
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>

Hello,

which arguments or result can I get from hits?

my $hit = $result->next_hit;
print $hit->name;

Are there more than the name? Exists a description, where I can look up this?

Regards

From David.Messina at sbc.su.se Thu Mar 4 10:27:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 4 Mar 2010 16:27:46 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <4B8FC652.2010607@bioperl.org>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org>
Message-ID: <31C89CCE-25B8-492A-924D-A7401D415584@sbc.su.se>

Hi Palani,

You're using a very old version of BioPerl, 1.4:

> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html

The current release version is 1.6.1.

Also, NCBI is changing (or may have already changed) their remote access system to require an email address. The very latest builds of BioPerl should now be compatible with this change. Get it here:

http://www.bioperl.org/DIST/nightly_builds/

or directly via Subversion - instructions here:

http://www.bioperl.org/wiki/Getting_BioPerl

Dave

From cjfields at illinois.edu Thu Mar 4 10:30:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 04 Mar 2010 09:30:54 -0600
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <4B8FC652.2010607@bioperl.org>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org>
Message-ID: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu>

Palani,

We have a few regression tests that should have caught this but aren't quite set up correctly (they silently pass if no report is returned). This may be something on NCBI's end though; any remote database or analyses are notoriously brittle, hence the need to skip these by default when installing tests.

Final note, but hopefully you aren't using bioperl 1.4 (as indicated by the docs). We're now on the 1.6 release series and are now on v. 1.6.1; 1.4 isn't supported anymore.

chris

On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote:
> Palani -
> This should be directed to the mailing list.
>
> -------- Original Message --------
> From: PalaniKannan K
> Subject: Enquiry about Remoteblast.pm
> Date: Thu, 4 Mar 2010 10:23:45 +0530
>
>
>
>
>
> I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing through
> Remoteblast.pm script available through CPAN. When i am submitting my
> query... it shows waiting for much time. Ex. (waiting .....................)
>
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html
>
> This is the reference script i am using through Remoteblast perl module.
>
> It worked upto last 02/03/2010. Now it is not working
>
> We had developed 3 applications using this module. The same error comes in 3
> applications we developed. So, i confim that our script dont have problem.
> Kindly help me in this regard.
>
> --
> With Regards,
> palani kannan. k
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Thu Mar 4 10:27:16 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Thu, 4 Mar 2010 10:27:16 -0500
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>
References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>
Message-ID:

Check out http://www.bioperl.org/wiki/HOWTO:SearchIO
MAJ
----- Original Message -----
From: "Janine Arloth"
To:
Sent: Wednesday, March 03, 2010 4:44 AM
Subject: [Bioperl-l] StandAloneBlastPlus

> Hello,
>
> which arguments or result can I get from hits?
>
> my $hit = $result->next_hit;
> print $hit->name;
>
> Are there more than the name? Exists a description, where I can look up this?
>
> Regards
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bosborne11 at verizon.net Thu Mar 4 10:25:45 2010
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 04 Mar 2010 10:25:45 -0500
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>
References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>
Message-ID: <90B9BFFC-73DA-469F-900C-70448A9B1C03@verizon.net>

http://www.bioperl.org/wiki/HOWTO:SearchIO

On Mar 3, 2010, at 4:44 AM, Janine Arloth wrote:

> Hello,
>
> which arguments or result can I get from hits?
>
> my $hit = $result->next_hit;
> print $hit->name;
>
> Are there more than the name? Exists a description, where I can look up this?
>
> Regards
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Thu Mar 4 11:49:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 04 Mar 2010 10:49:01 -0600
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org> <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <1267721341.23329.26.camel@pyrimidine.igb.uiuc.edu>

Okay, I'm able to replicate this (and the tests now correctly attempt to catch it). It appears that this may be a general RemoteBlast issue, as regular RemoteBlast tests are also taking forever.

This shouldn't be related to the email issue (this isn't in RemoteBlast.pm yet). At least, I would hope NCBI would pass back another status besides 'WAITING' in cases where the email isn't provided.

chris

On Thu, 2010-03-04 at 09:30 -0600, Chris Fields wrote:
> Palani,
>
> We have a few regression tests that should have caught this but aren't
> quite set up correctly (they silently pass if no report is returned).
> This may be something on NCBI's end though; any remote database or
> analyses are notoriously brittle, hence the need to skip these by
> default when installing tests.
>
> Final note, but hopefully you aren't using bioperl 1.4 (as indicated by
> the docs). We're now on the 1.6 release series and are now on v. 1.6.1;
> 1.4 isn't supported anymore.
>
> chris
>
> On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote:
> > Palani -
> > This should be directed to the mailing list.
> >
> > -------- Original Message --------
> > From: PalaniKannan K
> > Subject: Enquiry about Remoteblast.pm
> > Date: Thu, 4 Mar 2010 10:23:45 +0530
> >
> >
> >
> >
> >
> > I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM.
I am accessing through > > Remoteblast.pm script available through CPAN. When i am submitting my > > query... it shows waiting for much time. Ex. (waiting .....................) > > > > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html > > > > This is the reference script i am using through Remoteblast perl module. > > > > It worked upto last 02/03/2010. Now it is not working > > > > We had developed 3 applications using this module. The same error comes in 3 > > applications we developed. So, i confim that our script dont have problem. > > Kindly help me in this regard. > > > > -- > > With Regards, > > palani kannan. k > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 4 14:06:33 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 04 Mar 2010 11:06:33 -0800 Subject: [Bioperl-l] call for project ideas - Google Summer of Code In-Reply-To: References: <4B8CAE6B.4010807@cornell.edu> Message-ID: <4B9004B9.8090107@cornell.edu> Hello Luis, These are interesting ideas. Have a look at http://sswap.info and http://sadiframework.org, perhaps you might want to work with one of those technologies? Be warned, these are both in early-stage development, you are on the cutting edge here! It seems like your desire to work with semantic technologies as a GSoC student could fit under a number of different mentoring organizations, possibly OBF or NEScent, or maybe another organization entirely. I'll make some inquiries. In the mean time, please add a project idea for this on the bioperl GSoC page, to give the idea somewhere to coalesce. If you can, try to come up with a more concrete idea for what you want to do. 
http://www.bioperl.org/wiki/Google_Summer_of_Code

What do you think?

Rob

Luis M Rodriguez-R wrote:
> Hello Robert,
>
> I would like to know how to apply to and when the GSoC-2010 is planned to be performed. I think there are great development opportunities in information discovery using semantic web (I'm familiar with RDF in bio2rdf and uniprot, but it could also be useful to integrate OWL). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, especially thinking of the implementation of SPARQL. The people of bio2rdf already have some parsers, but their incompleteness is evident when working with their RDF as primary source of data.
>
> Best regards,
> Luis.
>
> On 2/03/2010, at 1:21, Robert Buels wrote:
>
>> Hi all,
>>
>> Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due next week). We need project ideas for prospective Summer of Code interns.
>>
>> There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects.
>>
>> For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs
>>
>> One of the summer intern ideas I have on the page so far is to help with the tough grunt work of breaking BioPerl into smaller, more easily managed distributions. I'm sure you all can think of plenty more!
>>
>> Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code
>>
>> Rob
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Luis M. Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
>
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
>
>

From joa2006 at med.cornell.edu Thu Mar 4 15:11:58 2010
From: joa2006 at med.cornell.edu (Josef Anrather)
Date: Thu, 04 Mar 2010 15:11:58 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
Message-ID: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>

Hi there,

same problems here. Bioperl 1.6.1 installed; RemoteBlast version 1.006001.
Could someone point me in the right direction. What is the put parameter for the email address?

Does the supplied email address end up in an FBI data base if you blast the B.anthracis genome?

Josef

Cornell Medical College

From maj at fortinbras.us Thu Mar 4 16:18:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 4 Mar 2010 16:18:48 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

we're not at liberty to say
----- Original Message -----
From: "Josef Anrather"
To:
Sent: Thursday, March 04, 2010 3:11 PM
Subject: Re: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]

> Hi there,
>
> same problems here. Bioperl 1.6.1 installed; RemoteBlast version
> 1.006001.
> Could someone point me in the right direction. What is the put
> parameter for the email address?
>
> Does the supplied email address end up in an FBI data base if you
> blast the B.anthracis genome?
>
> Josef
>
> Cornell Medical College
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From David.Messina at sbc.su.se Fri Mar 5 05:05:43 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 5 Mar 2010 11:05:43 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

My apologies for jumping the gun on the email thing - that won't take effect until June 1.

See full details here:

http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d

Looks like the problems with RemoteBlast (as Chris reported elsewhere in this thread) are at NCBI's servers (and are probably temporary).

Dave

From robert.bradbury at gmail.com Fri Mar 5 08:20:36 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 5 Mar 2010 08:20:36 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To:
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote:

> My apologies for jumping the gun on the email thing - that won't take
> effect until June 1.
>
> See full details here:
>
> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d
>
>
> Looks like the problems with RemoteBlast (as Chris reported elsewhere in
> this thread) are at NCBI's servers (and are probably temporary).
>
>
I would not be at all surprised if any problems involving RemoteBlast were related to the recent changeovers to a Javascript requirement for all interfaces to NCBI databases (this took place around mid-February and I complained about this in a previous email to the BioPerl list). I received a response back from Dr. Eric Sayers at NCBI on Feb.
26 that indicated that they were aware of the problem (involving a Javascript requirement) and indicated that NCBI developers were "investigating" ways to mitigate the problem. I've looked briefly at the new Javascript code that one is required to run when using PubMed, etc. and it looks like they may have completely changed the external interfaces to NCBI databases -- so I'm not surprised if that broke some or all other external interfaces used by BioPerl (RemoteBlast, Eutils, etc.). I'd suggest that you try to document the problems as best you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). It may be worth noting that it took ~3 weeks for me to receive any response to my reports. Also note, that (a) to the best of my knowledge there has been no public discussion regarding these recent changes at NCBI; and (b) under the Jan. 21, 2009 Memorandum on Transparency and Open Government, and under the Dec 8, 2009 Open Government Directive, NCBI *should* be doing a better job working with its end users (and the taxpayers) -- and at least thus far, while NIH seems to be making an effort that doesn't seem to have filtered down to NCBI. (For example, no open/public discussion regarding the email requirement for remote blasts...). It is also worth noting that it should be possible to file FOI requests with NIH/NCBI to find out exactly what they are doing and why they are doing it. I haven't taken such steps yet but I have given consideration to doing so. 
Robert From biopython at maubp.freeserve.co.uk Fri Mar 5 08:31:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Mar 2010 13:31:57 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote: > > (For example, no open/public discussion regarding the email > requirement for remote blasts...). > Hi all, What email requirement for remote blasts are you talking about? Note that the email referred to earlier talks about two unrelated issues: (1) changes to the BLAST output with the introduction of BLAST+, and (2) the upcoming email requirement for Entrez (aka E-utilities, they have been very clear about that with plenty of warning). http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html Is there a misunderstanding here? Peter From David.Messina at sbc.su.se Fri Mar 5 08:44:08 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 5 Mar 2010 14:44:08 +0100 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> Message-ID: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> > Is there a misunderstanding here? Whoops, yes there is - that's my fault, too. I did not read carefully and conflated EUtilities and RemoteBLAST. Just to be clear, the upcoming email requirement will be for EUtilities, NOT for RemoteBLAST. Thanks for clearing that up, Peter. Dave On Mar 5, 2010, at 14:31, Peter wrote: > On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote: >> >> (For example, no open/public discussion regarding the email >> requirement for remote blasts...). 
>> > > Hi all, > > What email requirement for remote blasts are you talking about? > > Note that the email referred to earlier talks about two unrelated > issues: (1) changes to the BLAST output with the introduction > of BLAST+, and (2) the upcoming email requirement for Entrez > (aka E-utilities, they have been very clear about that with > plenty of warning). > > http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html > http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html > > > Peter From biopython at maubp.freeserve.co.uk Fri Mar 5 08:48:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Mar 2010 13:48:27 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> Message-ID: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: > >> Is there a misunderstanding here? > > Whoops, yes there is - that's my fault, too. I did not > read carefully and conflated EUtilities and RemoteBLAST. > > Just to be clear, the upcoming email requirement will > be for EUtilities, NOT for RemoteBLAST. > > Thanks for clearing that up, Peter. > Dave No problem - you guys had me worried there for a minute ;) Peter From cjfields at illinois.edu Fri Mar 5 08:50:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 07:50:51 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> On Mar 5, 2010, at 7:20 AM, Robert Bradbury wrote: > On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote: > >> My apologies for jumping the gun on the email thing - 
that won't take >> effect until June 1. >> >> See full details here: >> >> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d >> >> >> Looks like the problems with RemoteBlast (as Chris reported elsewhere in >> this thread) is at NCBI's servers (and is probably temporary). >> >> > I would not be at all surprised if any problems involving RemoteBlast were > related to the recent changeovers to a Javascript requirement for all > interfaces to NCBI databases (this took place around mid-February and I > complained about this in a previous email to the BioPerl list). Robert, according to Palani's recent response NCBI provided a perl script that worked, so I don't think it a Javascript issue. My guess is a change in the returned page information that isn't caught by the current regex, a problem that has happened in the past. I'll be looking into it today. > I received a response back from Dr. Eric Sayers at NCBI on Feb. 26 that > indicated that they were aware of the problem (involving a Javascript > requirement) and indicated that NCBI developers were "investigating" ways to > mitigate the problem. > > I've looked briefly at the new Javascript code that one is required to run > when using PubMed, etc. and it looks like they may have completely changed > the external interfaces to NCBI databases -- so I'm not surprised if that > broke some or all other external interfaces used by BioPerl (RemoteBlast, > Eutils, etc.). I'd suggest that you try to document the problems as best > you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). > It may be worth noting that it took ~3 weeks for me to receive any response > to my reports. EUtilities works fine (both regular and SOAP); all regression tests are passing, so it's not affecting everything. > Also note, that (a) to the best of my knowledge there has been no public > discussion regarding these recent changes at NCBI; and (b) under the Jan. 
> 21, 2009 Memorandum on Transparency and Open Government, and under the Dec > 8, 2009 Open Government Directive, NCBI *should* be doing a better job > working with its end users (and the taxpayers) -- and at least thus far, > while NIH seems to be making an effort that doesn't seem to have filtered > down to NCBI. > > (For example, no open/public discussion regarding the email requirement for > remote blasts...). > > It is also worth noting that it should be possible to file FOI requests with > NIH/NCBI to find out exactly what they are doing and why they are doing it. > I haven't taken such steps yet but I have given consideration to doing so. > > Robert The email requirement has always been indicated, it was just never enforced. B/c of increased spamming issues on the NCBI server they took up the initiative to require users provide an email address (and enforce it starting in June). I just made a change to the BioPerl install that requests an email and bypasses Bio::DB::EUtilities tests if one is not provided, other tools will be following suit. I don't think there is anything insidious about this. My guess is they will be using them merely to track server usage per user and IP, and take necessary measures (i.e. contact or block) if needed. Finally, I'm not sure where the hostility is coming from. NCBI has provided a great service to the community for many years, even through many funding cuts, and they have had quite a few. Frankly, if one doesn't like their service requirements, there are other databases that one can use. 
chris From cjfields at illinois.edu Fri Mar 5 10:07:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 09:07:11 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> Message-ID: On Mar 5, 2010, at 7:48 AM, Peter wrote: > On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >> >>> Is there a misunderstanding here? >> >> Whoops, yes there is ? that's my fault, too. I did not >> read carefully and conflated EUtilities and RemoteBLAST. >> >> Just to be clear, the upcoming email requirement will >> be for EUtilities, NOT for RemoteBLAST. >> >> Thanks for clearing that up, Peter. >> Dave > > No problem - you guys had me worried there for a minute ;) > > Peter Just as an update, I can confirm it is a change with retrieve_blast() not catching the report (no Javascript, no email ;). Will try fixing this later today. chris From robert.bradbury at gmail.com Fri Mar 5 10:08:42 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 5 Mar 2010 10:08:42 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: Sorry, yes I too was reading quickly and not separating RemoteBlast from Eutilities requirements. With respect to "hostility", I do agree Chris that NCBI has provided a great service over the years (I've used it for over 15 as I'm sure many here have). However, the recent Javascript requirement (without any apparent discussion within the user community) has me very annoyed [1]. 
One could back it up a level and ask why NCBI doesn't have a "user community forum" (at least that I'm aware of) or even a bug database (it isn't like putting up a bugzilla bug database requires all that much work). Heck, even the phone companies (whom I consider to be the epitome of bureaucracy) issue me a trouble ticket # when I have a problem (something to the best of my knowledge NCBI does not do). There is also the fact that several months ago when I requested an explanation for what code/utilities were being used to generate the Homologene "homology" graphics (so I could consider extending it to other species, potentially in BioPerl) I was told in unspecific terms that a variety of utilities were used (and my impression was perhaps an underlying suggestion that it might be too complicated for me to understand -- but that could just be subjective impression on my part). [Of course such a response doesn't fit well my perspective of "open government".) Robert 1. There are a long list of reasons why Javascript is bad ranging from increasing memory and CPU requirements on the end user (one cannot run hundreds of open PubMed tabs, as I often may when doing research, on an "average" machine if all the tabs are running Javascript, downloading and running lots of Javascripts can hardly be considered "green", Javascript doesn't work in the lightest weight browsers such as Dillo, Javascript decreases the reliability and security of the browser, excessive reliance on Javascript may decrease web access for individuals with disabilities (potentially in violation of current laws I suspect), etc.) 
From roy.chaudhuri at gmail.com Fri Mar 5 10:52:12 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 05 Mar 2010 15:52:12 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: <4B9128AC.1000405@gmail.com> Hi Robert, Just a suggestion, maybe you could use HubMed (www.hubmed.org) as a PubMed alternative? It seems to work ok with JavaScript disabled. Roy. On 05/03/2010 15:08, Robert Bradbury wrote: > Sorry, yes I too was reading quickly and not separating RemoteBlast from > Eutilities requirements. > > With respect to "hostility", I do agree Chris that NCBI has provided a great > service over the years (I've used it for over 15 as I'm sure many here > have). However, the recent Javascript requirement (without any apparent > discussion within the user community) has me very annoyed [1]. One could > back it up a level and ask why NCBI doesn't have a "user community forum" > (at least that I'm aware of) or even a bug database (it isn't like putting > up a bugzilla bug database requires all that much work). Heck, even the > phone companies (whom I consider to be the epitome of bureaucracy) issue me > a trouble ticket # when I have a problem (something to the best of my > knowledge NCBI does not do). > > There is also the fact that several months ago when I requested an > explanation for what code/utilities were being used to generate the > Homologene "homology" graphics (so I could consider extending it to other > species, potentially in BioPerl) I was told in unspecific terms that a > variety of utilities were used (and my impression was perhaps an underlying > suggestion that it might be too complicated for me to understand -- but that > could just be subjective impression on my part). [Of course such a response > doesn't fit well my perspective of "open government".) > > Robert > > 1. 
There are a long list of reasons why Javascript is bad ranging from > increasing memory and CPU requirements on the end user (one cannot run > hundreds of open PubMed tabs, as I often may when doing research, on an > "average" machine if all the tabs are running Javascript, downloading and > running lots of Javascripts can hardly be considered "green", Javascript > doesn't work in the lightest weight browsers such as Dillo, Javascript > decreases the reliability and security of the browser, excessive reliance on > Javascript may decrease web access for individuals with disabilities > (potentially in violation of current laws I suspect), etc.) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From paolo.pavan at gmail.com Fri Mar 5 13:51:55 2010 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 5 Mar 2010 19:51:55 +0100 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <2FB5C317605B48269256ABFABBED2239@NewLife> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> <18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> <2FB5C317605B48269256ABFABBED2239@NewLife> Message-ID: <56be91b61003051051v6b06b872q9f59380b05492071@mail.gmail.com> Dear Mark, Thank you again for your efforts spent on this theme, I have read and tested carefully enough I hope, your new ads. I found they work perfectly but either I miss some feature of the Tiling API (and this is possible) or it could be that they don't entirely match what was the initial problem; for sure my fault, I can explain better. Let me start saying that what is needed is the merge of the alignments returned by the get_tiled_alns method. 
I have 2 seqs: h1, h2 (in the given example 00038 and 00053) and they could be aligned against the same sequence q (named 1_0). They cannot be aligned with common multiple sequence aligners like clustalw, since in this case a local alignment algorithm is preferable to a global one. This specific case cannot be handled by programs like cap3 either. I found that megablast -m 5 can output a tiling of all the hits found versus the query, reporting the query in its entirety. I hope I gave the idea; if needed I can provide the input sequences of the megablast. Thank you again and have a nice weekend, Paolo 2010/3/4 Mark A. Jensen : > Paolo -- Ok, there's now (r16900) an *experimental* method in > Bio::Search::Tiling::MapTiling called get_tiled_alns(). > POD is below. Try it out and let me know-- > cheers, > MAJ > > > =head1 TILED ALIGNMENTS > > The experimental method L<get_tiled_alns()> will use a tiling > to concatenate tiled hsps into a series of L<Bio::SimpleAlign> > objects: > > @alns = $tiling->get_tiled_alns($type, $context); > > Each alignment contains two sequences with ids 'query' and 'subject', > and consists of a concatenation of tiling HSPs which overlap or are > directly adjacent. The alignments are returned in C<$type> sequence > order. When HSPs overlap, the alignment sequence is taken from the HSP > which comes first in the coverage map array. > > The sequences in each alignment contain features (even though they are > L<Bio::LocatableSeq> objects) which map the original query/subject > coordinates to the new alignment sequence coordinates. You can > determine the original BLAST fragments this way: > > $aln = ($tiling->get_tiled_alns)[0]; > $qseq = $aln->get_seq_by_id('query'); > $hseq = $aln->get_seq_by_id('subject'); > foreach my $feat ($qseq->get_SeqFeatures) { > $org_start = ($feat->get_tag_values('query_start'))[0]; > $org_end = ($feat->get_tag_values('query_end'))[0]; > # original fragment as represented in the tiled alignment: > 
$org_fragment = $feat->seq; > } > foreach my $feat ($hseq->get_SeqFeatures) { > $org_start = ($feat->get_tag_values('subject_start'))[0]; > $org_end = ($feat->get_tag_values('subject_end'))[0]; > # original fragment as represented in the tiled alignment: > $org_fragment = $feat->seq; > } > > > ----- Original Message ----- From: "Paolo Pavan" > To: "Mark A. Jensen" > Cc: "Chris Fields" ; > Sent: Tuesday, March 02, 2010 10:17 AM > Subject: Re: [Bioperl-l] Alignment from blast report > > >> I think you got the sense, thank you. Of course hsps from different >> hits will be reflected in different elements aligned. I've attached >> the example pasted (unix text) because it is more readable, hoping it will >> not be held by the mailing server :-) >> >> Thank you, >> Paolo >> >> 2010/3/2 Mark A. Jensen : >>> >>> This might be a good method to have for Bio::Search::Tiling-- >>> you want to stitch together all the hsps and have the >>> concatenated alignment returned as a Bio::SimpleAlign, >>> correct? Tiling would create the right set of hsps from >>> which to generate the composite alignment. I can >>> try to get something working, but it may take a while- >>> MAJ >>> ----- Original Message ----- From: "Paolo Pavan" >>> To: "Chris Fields" >>> Cc: >>> Sent: Tuesday, March 02, 2010 9:37 AM >>> Subject: Re: [Bioperl-l] Alignment from blast report >>> >>> >>> Hi Chris, >>> Thank you for your reply. So I have to understand that since the >>> get_aln method returns the HSP alignment, there is no way to retrieve >>> the whole alignment as in the example pasted, is there? >>> Basically I'm trying to use megablast as a kind of multiple local >>> alignment engine and actually I'm not quite sure this is a good idea >>> but in my particular case it could be suitable. I mean that the example >>> below reports only the portions of the sequences that align, losing >>> the portions that do not; I'm not sure I gave the idea. What do you >>> think? Can you give me your opinion? 
>>> If there isn't any module written yet, I can try to write a parser, it >>> could be of any interest? >>> >>> Thank you, >>> Paolo >>> >>> 2010/3/2 Chris Fields : >>>> >>>> Paolo, >>>> >>>> You can get a Bio::SimpleAlign from the HSP object. The first code >>>> example >>>> in this section in the HOWTO demonstrates this: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>>> >>>> chris >>>> >>>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >>>> >>>>> Dear all, >>>>> Sorry for pushing up my post but, please does anyone have an hint for >>>>> me? >>>>> Maybe have I to send attached the report to the mailing list? I don't >>>>> know attachment policies of the list, if it is allowed and is needed I >>>>> can do that. >>>>> >>>>> Thank you, >>>>> Paolo >>>>> >>>>> 2010/2/26 Paolo Pavan : >>>>>> >>>>>> Sorry, >>>>>> Maybe I forgot to add this is the megablast -m 5 output. >>>>>> >>>>>> Thank you again, >>>>>> Paolo >>>>>> >>>>>> 2010/2/26 Paolo Pavan : >>>>>>> >>>>>>> Hi all, >>>>>>> I have just a brief question: I've got some megablast reports such >>>>>>> the >>>>>>> one I've pasted below. >>>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>>>> Bio::Align::AlignI implementing one? >>>>>>> >>>>>>> Thank you all, >>>>>>> Paolo >>>>>>> >>>>>>> >>>>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>>>> >>>>>>> >>>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>>>> (2000), >>>>>>> "A greedy algorithm for aligning DNA sequences", >>>>>>> J Comput Biol 2000; 7(1-2):203-14. 
>>>>>>> >>>>>>> Database: 00038-00053.fasta >>>>>>> 2 sequences; 2001 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> Query= 00038-00053 >>>>>>> (802 letters) >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score E >>>>>>> Sequences producing significant alignments: (bits) Value >>>>>>> >>>>>>> ______00038 >>>>>>> 226 1e-62 >>>>>>> ______00053 >>>>>>> 115 3e-29 >>>>>>> >>>>>>> 1_0 472 >>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>>>> ______00038 883 >>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 532 >>>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>>>> ______00038 943 >>>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 591 >>>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 651 >>>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 711 >>>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 1 >>>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>>>> >>>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>>>> ______00038 1000 -------------------------------- 1001 >>>>>>> ______00053 35 
ccttaagtatt-actttaacttgtcttgatca 65 >>>>>>> Database: 00038-00053.fasta >>>>>>> Posted date: Feb 25, 2010 4:47 PM >>>>>>> Number of letters in database: 2001 >>>>>>> Number of sequences in database: 2 >>>>>>> >>>>>>> Lambda K H >>>>>>> 1.37 0.711 1.31 >>>>>>> >>>>>>> Gapped >>>>>>> Lambda K H >>>>>>> 1.37 0.711 1.31 >>>>>>> >>>>>>> >>>>>>> Matrix: blastn matrix:1 -3 >>>>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>>>> Number of Sequences: 2 >>>>>>> Number of Hits to DB: 17 >>>>>>> Number of extensions: 3 >>>>>>> Number of successful extensions: 3 >>>>>>> Number of sequences better than 10.0: 2 >>>>>>> Number of HSP's gapped: 2 >>>>>>> Number of HSP's successfully gapped: 2 >>>>>>> Length of query: 802 >>>>>>> Length of database: 2001 >>>>>>> Length adjustment: 10 >>>>>>> Effective length of query: 792 >>>>>>> Effective length of database: 1981 >>>>>>> Effective search space: 1568952 >>>>>>> Effective search space used: 1568952 >>>>>>> X1: 9 (17.8 bits) >>>>>>> X2: 20 (39.6 bits) >>>>>>> X3: 51 (101.1 bits) >>>>>>> S1: 9 (18.3 bits) >>>>>>> S2: 9 (18.3 bits) >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shalabh.sharma7 at gmail.com Fri Mar 5 15:06:30 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 15:06:30 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) Message-ID: 
<9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> Hi All, I have a set of accession numbers. Is it possible to get "isolation_source" from the GenBank records for all the Accession numbers. I would really appreciate if anyone can help me out. Thanks Shalabh From shalabh.sharma7 at gmail.com Fri Mar 5 15:29:17 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 15:29:17 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> Message-ID: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> HI Brian, Thanks for your quick reply. I was reading the document and it think it talks about parsing a GenBank record. What i exactly want is to submit a batch of accession numbers and get "isolation_source" directly without downloading all the Genbank files. I am still reading the document may be i missed something. Thanks a lot shalabh On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > Shalabh, > > You can start by reading about how Bioperl processes Genbank files and > their annotations: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > > > Brian O. > > On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > > > Hi All, > > I have a set of accession numbers. Is it possible to get > > "isolation_source" from the GenBank records for all the Accession > numbers. > > > > I would really appreciate if anyone can help me out. 
> > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Fri Mar 5 15:43:33 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 05 Mar 2010 15:43:33 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> Message-ID: Shalabh, I see. I think you could use EUtils then. Take a look at these: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service I'm not an expert on these, and I do not know if one can ask for just a tag value ("isolation_source"). Getting a tag value from the downloaded Genbank entry is not difficult though, that Feature-Annotation HOWTO shows you how. Brian O. On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: > HI Brian, > Thanks for your quick reply. > I was reading the document and it think it talks about parsing a GenBank > record. What i exactly want is to submit a batch of accession numbers and > get "isolation_source" directly without downloading all the Genbank files. > I am still reading the document may be i missed something. > > Thanks a lot > shalabh > > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > >> Shalabh, >> >> You can start by reading about how Bioperl processes Genbank files and >> their annotations: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation >> >> >> >> Brian O. >> >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: >> >>> Hi All, >>> I have a set of accession numbers. Is it possible to get >>> "isolation_source" from the GenBank records for all the Accession >> numbers. 
>>> >>> I would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Mar 5 15:13:45 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 05 Mar 2010 15:13:45 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> Message-ID: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> Shalabh, You can start by reading about how Bioperl processes Genbank files and their annotations: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Brian O. On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > Hi All, > I have a set of accession numbers. Is it possible to get > "isolation_source" from the GenBank records for all the Accession numbers. > > I would really appreciate if anyone can help me out. 
> > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Mar 5 16:22:47 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 05 Mar 2010 15:22:47 -0600 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> Message-ID: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> Regardless of what you try, it will only limit records returned (e.g. you will still get full records, unless you take steps to limit those somehow, by adding sequence start/stop, etc). Anyway, this worked to retrieve those with that tag: "src isolation source"[Properties] That gets a lot of hits. If you are only interested in that one line you could just parse it out w/o resorting to bioperl (believe it or not, it's not always the best answer). chris On Fri, 2010-03-05 at 15:43 -0500, Brian Osborne wrote: > Shalabh, > > I see. I think you could use EUtils then. Take a look at these: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > I'm not an expert on these, and I do not know if one can ask for just a tag value ("isolation_source"). Getting a tag value from the downloaded Genbank entry is not difficult though, that Feature-Annotation HOWTO shows you how. > > Brian O. > > > On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: > > > Hi Brian, > > Thanks for your quick reply. > > I was reading the document and I think it talks about parsing a GenBank > > record. What I want exactly is to submit a batch of accession numbers and > > get "isolation_source" directly without downloading all the Genbank files. 
> > I am still reading the document may be i missed something. > > > > Thanks a lot > > shalabh > > > > > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > > > >> Shalabh, > >> > >> You can start by reading about how Bioperl processes Genbank files and > >> their annotations: > >> > >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > >> > >> > >> > >> Brian O. > >> > >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > >> > >>> Hi All, > >>> I have a set of accession numbers. Is it possible to get > >>> "isolation_source" from the GenBank records for all the Accession > >> numbers. > >>> > >>> I would really appreciate if anyone can help me out. > >>> > >>> Thanks > >>> Shalabh > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Fri Mar 5 17:06:41 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 17:06:41 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> Message-ID: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> Thanks Bran and Chris, I followed the example given here : http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook to retrieve raw data records from genbank. 
For example i used the id : 157091572 to get the genbank record, but the downloaded file does not contain "isolation_source" which is there when you look for the record online: http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=T2S9N0PJ01N&log%24=nuclalign&blast_rank=1&list_uids=157091572 Thanks Shalabh On Fri, Mar 5, 2010 at 4:22 PM, Chris Fields wrote: > Regardless on what you try, it will only limit records returned (e.g. > you will still get full records, unless you take steps to limit those > somehow, by adding sequence start/stop, etc). > > Anyway, this worked to retrieve those with that tag: > "src isolation source"[Properties] > > That get a lot of hits. > > If you are only interested in that one line you could just parse it out > w/o resorting to bioperl (beleiev it or not, it's not always the best > answer). > > chris > > On Fri, 2010-03-05 at 15:43 -0500, Brian Osborne wrote: > > Shalabh, > > > > I see. I think you could use EUtils then. Take a look at these: > > > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > > > I'm not an expert on these, and I do not know if one can ask for just a > tag value ("isolation_source"). Getting a tag value from the downloaded > Genbank entry is not difficult though, that Feature-Annotation HOWTO shows > you how. > > > > Brian O. > > > > > > > On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: > > > > > HI Brian, > > > Thanks for your quick reply. > > > I was reading the document and it think it talks about parsing a > GenBank > > > record. What i exactly want is to submit a batch of accession numbers > and > > > get "isolation_source" directly without downloading all the Genbank > files. > > > I am still reading the document may be i missed something. 
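
One way to read Chris's suggestion of parsing the line out without BioPerl, as a minimal sketch only: scan the GenBank flat-file text for /isolation_source qualifiers with a regular expression. The sub name isolation_sources and the sample record below are made up for illustration (the record is not real GenBank data), and the sketch assumes the qualifier is actually present in the downloaded text, which was not the case for the file described above.

```perl
use strict;
use warnings;

# Return every /isolation_source value found in GenBank flat-file text.
# (Hypothetical helper, not part of BioPerl.)
sub isolation_sources {
    my ($text) = @_;
    my @values;
    while ($text =~ m{/isolation_source="([^"]*)"}g) {
        my $value = $1;
        $value =~ s/\s+/ /g;    # qualifier values may wrap across lines
        push @values, $value;
    }
    return @values;
}

# Made-up record fragment, for illustration only.
my $record = <<'END';
LOCUS       EXAMPLE01    120 bp    DNA     linear   ENV 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="uncultured bacterium"
                     /isolation_source="coastal seawater, 5 m
                     depth"
ORIGIN
//
END

print "$_\n" for isolation_sources($record);
```

For a batch download, the same sub could be called once per record after splitting the text on the // terminator lines.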

From shalabh.sharma7 at gmail.com Fri Mar 5 17:57:00 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 5 Mar 2010 17:57:00 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com>
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com>
Message-ID: <9fcc48c71003051457x7186e3e0y1c9b8ee5ea81e153@mail.gmail.com>

Thanks everyone, I got what I was looking for.
EUtilities helped me a lot.

Thanks
Shalabh

From cjfields at illinois.edu Fri Mar 5 23:14:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 5 Mar 2010 22:14:01 -0600
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com>
Message-ID: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu>

On Mar 5, 2010, at 7:48 AM, Peter wrote:

> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote:
>>
>>> Is there a misunderstanding here?
>>
>> Whoops, yes there is - that's my fault, too. I did not
>> read carefully and conflated EUtilities and RemoteBLAST.
>>
>> Just to be clear, the upcoming email requirement will
>> be for EUtilities, NOT for RemoteBLAST.
>>
>> Thanks for clearing that up, Peter.
>> Dave
>
> No problem - you guys had me worried there for a minute ;)
>
> Peter

Just to bring this thread full circle, I have committed a fix which
(ironically) reduced the code down a bit. I also added an attribute
(get_rtoe) that returns the approximate time until the report is
returned.

chris

From joa2006 at med.cornell.edu Sat Mar 6 17:13:45 2010
From: joa2006 at med.cornell.edu (Josef Anrather)
Date: Sat, 06 Mar 2010 17:13:45 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu>
Message-ID:

Chris,

the fix works flawlessly on my system. Thanks for the fast response.

Cheers,
Josef

From jarodpardon at yahoo.com.cn Sun Mar 7 04:13:40 2010
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sun, 7 Mar 2010 17:13:40 +0800 (CST)
Subject: [Bioperl-l] insertion code in pdb parser
Message-ID: <643595.96038.qm@web15003.mail.cnb.yahoo.com>

Hi all,

Insertion codes on residue numbers are common, especially in the numbering
schemes used for antibody sequences, such as 82A, 82B. When
Bio::Structure::IO::pdb parses a PDB file containing residues with insertion
codes, it assigns such a residue an id like 'PRO-52.A', where 'A' is the
insertion code. However, the opposite operation (setting the id of the
residue) does not work. For example, if the original residue number is 51,
$res->id('PRO-52.A') will not append the insertion code after the residue
number correctly, though it does change the residue number from 51 to 52.

In the end, the only way I found to set the insertion code for a residue is
to assign it to all atoms of that residue with $atom->icode('A'). I think
this is inconvenient and misleading: an insertion code should not be a
property of an atom, and a residue never has atoms with different insertion
codes.

I strongly recommend a change: add an icode method to the residue object
rather than the atom. In the same way, the segment id should also belong to
the residue.
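
For context on why a record-by-record reader ends up with a per-atom insertion code: in the PDB flat-file format the insertion code is a column of every ATOM record (column 27, directly after the residue sequence number in columns 23-26). The sketch below is plain Perl and does not use Bio::Structure; the sub name residue_id and the sample records are made up for illustration, and the 'PRO-52.A' id form is simply copied from the message above.

```perl
use strict;
use warnings;

# Build a residue id such as 'PRO-52.A' from one fixed-width ATOM record.
# (Hypothetical helper, not part of BioPerl.)
sub residue_id {
    my ($line) = @_;
    my $res_name = substr($line, 17, 3);    # columns 18-20: residue name
    my $res_seq  = substr($line, 22, 4);    # columns 23-26: residue number
    my $icode    = substr($line, 26, 1);    # column 27: insertion code
    $res_seq =~ s/\s+//g;
    my $id = "$res_name-$res_seq";
    $id .= ".$icode" if $icode ne ' ';      # blank column means no code
    return $id;
}

# Made-up records, built with sprintf so that the columns are exact.
my $fmt = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f";
my @records = (
    sprintf($fmt, 411, ' N',  '', 'PRO', 'H', 52, 'A', 10.0, 20.0, 30.0),
    sprintf($fmt, 412, ' CA', '', 'PRO', 'H', 52, 'A', 11.0, 21.0, 31.0),
    sprintf($fmt, 420, ' CA', '', 'GLY', 'H', 53, '',  12.0, 22.0, 32.0),
);

print residue_id($_), "\n" for @records;
```

Both PRO records carry the same code, which matches the observation that the atoms of one residue never disagree on it.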
Jarod

From rtbio.2009 at gmail.com Sun Mar 7 08:11:54 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sun, 7 Mar 2010 14:11:54 +0100
Subject: [Bioperl-l] remoteblast
Message-ID:

Hello Mark and everybody,

I have been trying to connect to remote blast to retrieve similar sequences
to a given sequence. But my program is unable to retrieve the sequences from
BLAST, i.e., it is getting executed till the remote blast ids, but it is not
entering the else loop after collecting the rid. Please check this problem
and help me in this regard. I think the problem is in getting the sequence
and going to the 'else' part. i.e.,

    else {

        open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.#
        print OUTFILE "else entered";
        close(OUTFILE);

        my $result = $rc->next_result();

        #save the output

Please give me your reply.

Thanks and regards,
Roopa.

My code is as follows.

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use rnai_blast;

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "\n";
print "RNAi Result";
print " \n";
print "\n";
print "\n";
print " Your results will appear here
";
print " Please be patient, runtime can be up to 5 minutes
";
print " This page will automatically reload in 30 seconds.";
print "\n";
print "\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "\n
 RNAi Result
 \n
 \n
 \n
 Your results will appear here
 Please be patient, runtime can be up to 5 minutes
 This page will automatically reload in 30 seconds
 \n
 \n";

close(OUTFILE);

@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'});

sub blastcode
{
    $inpu1= $_[0];
    $organ= $_[1];

    open(NUC,'>',$nuc);
    print NUC $inpu1,"\n";
    close(NUC);

    my $prog = 'blastn';
    my $db = 'refseq_rna';
    my $e_val= '1e-10';
    my $organism= $organ;

    $gb = new Bio::DB::GenBank;

    my @params = ( '-prog' => $prog,
                   '-data' => $db,
                   '-expect' => $e_val,
                   '-readmethod' => 'SearchIO',
                   '-Organism' => $organism );

    open(OUTFILE,'>',$blastdebugfile);
    print OUTFILE @params;
    close(OUTFILE);

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]");

    #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

    #change a paramter
    #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

    #change a paramter
    # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

    my $v = 1;
    #$v is just to turn on and off the messages

    my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]");

    while (my $input = $str->next_seq())
    {
        #Blast a sequence against a database:
        #Alternatively, you could pass in a file with many
        #sequences rather than loop through sequence one at a time
        #Remove the loop starting 'while (my $input = $str->next_seq())'
        #and swap the two lines below for an example of that.
        open(OUTFILE,'>',$debugfile);
        print OUTFILE $input;
        close(OUTFILE);

        #submits the input data to BLAST#

        my $r = $factory->submit_blast($input);

        open(OUTFILE,'>',$debugfile);
        print OUTFILE $r;
        close(OUTFILE);

        print STDERR "waiting...." if($v>0);

        while ( my @rids = $factory->each_rid ) {
            open(OUTFILE,'>',$debugfile);
            # print OUTFILE "while entered";
            close(OUTFILE);
            foreach my $rid ( @rids ) {

                open(OUTFILE,'>',$debugfile);
                # print OUTFILE "foreach entered";
                close(OUTFILE);
                #Retrieving the result ids#

                my $rc = $factory->retrieve_blast($rid);

                if( !ref($rc) )
                {
                    if( $rc < 0 )
                    {
                        $factory->remove_rid($rid);
                    }
                    open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "if entered";
                    close(OUTFILE);
                    print STDERR "." if ( $v > 0 );
                    sleep 5;
                }

                else {

                    open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.#
                    print OUTFILE "else entered";
                    close(OUTFILE);

                    my $result = $rc->next_result();

                    #save the output
                    $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

                    open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    print BLASTDEBUGFILE $result->next_hit();
                    close(BLASTDEBUGFILE);
                    #saving the output in blastdata.time.out file#

                    # $random=rand();

                    my $filename = $serverpath."/blastdata_".time()."\.out";
                    # open(DEBUGFILE,'>',$debugfile);
                    # open(new,'>',$filename);
                    # @arra=;
                    # print DEBUGFILE @arra;
                    # close(DEBUGFILE);
                    # close(new);

                    $factory->save_output($filename);

                    # open(BLASTDEBUGFILE,'>',$debugfile);
                    # print BLASTDEBUGFILE "Hello $rid";
                    # close(BLASTDEBUGFILE);

                    $factory->remove_rid($rid);

                    open(BLASTDEBUGFILE,'>',$blastdebugfile);
                    # print BLASTDEBUGFILE $organism;
                    close(BLASTDEBUGFILE);

                    # open(OUTFILE,'>',$outfile);
                    # print OUTFILE "Test2 $result->database_name()";
                    # close(OUTFILE);

                    #$hit = $result->next_hit;
                    #open(new,'>',$debugfile);
                    #print $hit;
                    #close(new);
                    $dummy=0;
                    while ( my $hit = $result->next_hit ) {

                        next unless ( $v >= 0);

                        # open(OUTFILE,'>',$debugfile);
                        # print OUTFILE "$hit in while hits";
                        # close(OUTFILE);

                        my $sequ = $gb->get_Seq_by_version($hit->name);
                        my $dna = $sequ->seq(); # get the sequence as a string
                        $dummy++;
                        open(OUTFILE,'>',$debugfile);
                        # print OUTFILE $dna;
                        close(OUTFILE);
                        push(@seqs,$dna);
                    }
                }
            }
        }
    }

    $warum=@seqs;
    open(OUTFILE,'>',$debugfile);
    # print OUTFILE $warum;
    print OUTFILE @seqs;
    close(OUTFILE);

    return(@seqs); #returning the sequences obtained on BLAST#
}

From cjfields at illinois.edu Sun Mar 7 09:57:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 7 Mar 2010 08:57:43 -0600
Subject: [Bioperl-l] remoteblast
In-Reply-To:
References:
Message-ID:

Roopa,

I committed a fix for this a few days ago; if you update from SVN it should
work. The problem stemmed from server-side changes at NCBI.

chris

From jdetras at gmail.com Fri Mar 5 01:17:40 2010
From: jdetras at gmail.com (Jeffrey Detras)
Date: Fri, 5 Mar 2010 14:17:40 +0800
Subject: [Bioperl-l] distances between leaf nodes
Message-ID:

Hi,

I am new to the Bio::TreeIO module, specifically to using it with the
newick format for a phylogenetic analysis. The attached sample_tree is a
Newick-formatted tree. My objective is to get the distances between all the
leaf nodes. I copied examples of the code from
http://www.bioperl.org/wiki/HOWTO:Trees but they do not tell me enough (to
my knowledge) to understand how to pass the right array of nodes/leaves. The
error message says I must provide 2 root nodes. Here is what I have right
now:

#!/usr/bin/perl -w
use strict;

my $treefile = 'sample_tree';

use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick',
                              -file => $treefile);

while (my $tree = $treeio->next_tree) {
    my @leaves = $tree->get_leaf_nodes;
    for (my $dist = $tree->distance(-nodes => \@leaves)){
        print "Distance between trees is $dist\n";
    }
}

Thanks,
Jeff
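
The error about providing 2 nodes suggests that distance() wants exactly two nodes per call, so one way to get all leaf-to-leaf distances is to call it once per pair of leaves rather than once with the whole leaf list. A minimal sketch under that assumption follows; all_pairs is a helper name made up here, and the BioPerl part only runs when Bio::TreeIO and the sample_tree file are both available.

```perl
use strict;
use warnings;

# Return all unordered pairs from a list, each as an array ref.
# (Hypothetical helper, plain Perl.)
sub all_pairs {
    my @items = @_;
    my @pairs;
    for my $i (0 .. $#items - 1) {
        for my $j ($i + 1 .. $#items) {
            push @pairs, [ $items[$i], $items[$j] ];
        }
    }
    return @pairs;
}

my $treefile = 'sample_tree';    # file name taken from the message above

# Only attempt the BioPerl part when the file and the module are present.
if (-e $treefile && eval { require Bio::TreeIO; 1 }) {
    my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $treefile);
    while (my $tree = $treeio->next_tree) {
        my @leaves = $tree->get_leaf_nodes;
        for my $pair (all_pairs(@leaves)) {
            # distance() is given exactly two nodes per call
            my $dist = $tree->distance(-nodes => $pair);
            printf "%s\t%s\t%s\n", $pair->[0]->id, $pair->[1]->id, $dist;
        }
    }
}
```

For n leaves this makes n*(n-1)/2 calls, which should be fine for small trees but may be slow for large ones.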

From janine.arloth at googlemail.com Fri Mar 5 04:43:57 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Fri, 5 Mar 2010 10:43:57 +0100
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To:
References:
Message-ID:

Hello,

using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO with
format msf, I only get an alignment like this:

                   1                                                   50
test/1-85          ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
lcl|3013/20-104    ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA

                   51                                                 100
test/1-85          TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
lcl|3013/20-104    TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA

But I prefer this format:

Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79

Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
           |||||||||||||||||||||||||
Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104

How can I get this?

Best Regards

From elujan at stanford.edu Sun Mar 7 19:49:34 2010
From: elujan at stanford.edu (Ernesto George Lujan)
Date: Sun, 7 Mar 2010 16:49:34 -0800 (PST)
Subject: [Bioperl-l] Installing BioPerl
In-Reply-To: <1189627897.1477411268008644137.JavaMail.root@zm09.stanford.edu>
Message-ID: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>

Hi everyone,

I'm running Mac OS X 10.5.8 with Perl 5.8.8 and I'm having trouble
installing the BioPerl module. I've downloaded and installed the BioPerl
1.5.1-2 binary through FinkCommander, but when I type

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

into the Terminal, it tells me that I'm using BioPerl Version 1.006. How do
I get this module to install correctly? Once again, my specs:

Perl Version: 5.8.8
BioPerl Version: 1.006
Operating System: Mac OS X 10.5.8

Thanks!
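
On the 1.006 reading: Perl treats a decimal version such as 1.006 as groups of three digits after the point, so it corresponds to the dotted form 1.6.0. A small sketch using the version module, which is bundled with Perl from 5.10 onwards; on the 5.8.8 setup described above it would presumably have to be installed from CPAN, which is an assumption here.

```perl
use strict;
use warnings;
use version;

# Show the dotted ("normal") form of a few decimal version strings.
for my $decimal ('1.006', '1.006002') {
    printf "%-10s => %s\n", $decimal, version->parse($decimal)->normal;
}

# Decimal and dotted forms of the same version compare as equal.
print "same version\n"
    if version->parse('1.006') == version->parse('v1.6.0');
```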
-BioPerl Beginner

From bimber at wisc.edu Sun Mar 7 22:57:12 2010
From: bimber at wisc.edu (Ben Bimber)
Date: Sun, 7 Mar 2010 21:57:12 -0600
Subject: [Bioperl-l] Bioperl-run malformed svndiff
Message-ID: <9f985cdc1003071957h6c82d4b8t1a6b9a3af7752bde@mail.gmail.com>

I recently tried to check out a complete version of bioperl-run and
received an error saying 'malformed svndiff'. I've tried this on two
different machines, so unless I'm doing something wrong, it should be
reproducible. I cannot say whether updating an existing repository would
throw the same error or not. Below is the log:

*** Check Out
svn checkout "svn://code.open-bio.org/bioperl/bioperl-run/trunk/lib/Bio@HEAD" -r HEAD --depth infinity "C:\Projects\Bio"
A C:/Projects/Bio/Tools
A C:/Projects/Bio/Tools/Run
A C:/Projects/Bio/Tools/Run/Genewise.pm
A C:/Projects/Bio/Tools/Run/Analysis
A C:/Projects/Bio/Tools/Run/Analysis/soap.pm
A C:/Projects/Bio/Tools/Run/AssemblerBase.pm
A C:/Projects/Bio/Tools/Run/BWA.pm
A C:/Projects/Bio/Tools/Run/Phrap.pm
A C:/Projects/Bio/Tools/Run/FootPrinter.pm
A C:/Projects/Bio/Tools/Run/AnalysisFactory.pm
A C:/Projects/Bio/Tools/Run/BEDTools.pm
A C:/Projects/Bio/Tools/Run/EMBOSSApplication.pm
A C:/Projects/Bio/Tools/Run/Genscan.pm
A C:/Projects/Bio/Tools/Run/RNAMotif.pm
A C:/Projects/Bio/Tools/Run/Phylo
A C:/Projects/Bio/Tools/Run/Phylo/Phast
A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhyloFit.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhastCons.pm
A C:/Projects/Bio/Tools/Run/Phylo/Semphy.pm
A C:/Projects/Bio/Tools/Run/Phylo/Hyphy
A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/FEL.pm
A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Base.pm
A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Modeltest.pm
A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/REL.pm
A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/SLAC.pm
A C:/Projects/Bio/Tools/Run/Phylo/PhyloBase.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phyml.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawGram.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Base.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtPars.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Consense.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawTree.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm
A C:/Projects/Bio/Tools/Run/Phylo/Njtree
A C:/Projects/Bio/Tools/Run/Phylo/Njtree/Best.pm
A C:/Projects/Bio/Tools/Run/Phylo/QuickTree.pm
A C:/Projects/Bio/Tools/Run/Phylo/Gerp.pm
A C:/Projects/Bio/Tools/Run/Phylo/Molphy
A C:/Projects/Bio/Tools/Run/Phylo/Molphy/ProtML.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Yn00.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Evolver.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Baseml.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Codeml.pm
A C:/Projects/Bio/Tools/Run/Phylo/SLR.pm
A C:/Projects/Bio/Tools/Run/Phylo/Gumby.pm
A C:/Projects/Bio/Tools/Run/Phylo/LVB.pm
A C:/Projects/Bio/Tools/Run/Primer3.pm
A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus.pm
A C:/Projects/Bio/Tools/Run/Meme.pm
A C:/Projects/Bio/Tools/Run/RepeatMasker.pm
A C:/Projects/Bio/Tools/Run/Analysis.pm
A C:/Projects/Bio/Tools/Run/Cap3.pm
A C:/Projects/Bio/Tools/Run/Vista.pm
A C:/Projects/Bio/Tools/Run/Pseudowise.pm
A C:/Projects/Bio/Tools/Run/Minimo.pm
A C:/Projects/Bio/Tools/Run/Match.pm
A C:/Projects/Bio/Tools/Run/Mdust.pm
A C:/Projects/Bio/Tools/Run/Eponine.pm
A C:/Projects/Bio/Tools/Run/Infernal.pm
A C:/Projects/Bio/Tools/Run/BlastPlus
A C:/Projects/Bio/Tools/Run/BlastPlus/Config.pm
A C:/Projects/Bio/Tools/Run/EMBOSSacd.pm
A C:/Projects/Bio/Tools/Run/Alignment
A C:/Projects/Bio/Tools/Run/Alignment/Proda.pm
A C:/Projects/Bio/Tools/Run/Alignment/Kalign.pm
A C:/Projects/Bio/Tools/Run/Alignment/StandAloneFasta.pm
A C:/Projects/Bio/Tools/Run/Alignment/TCoffee.pm
A C:/Projects/Bio/Tools/Run/Alignment/Sim4.pm
A C:/Projects/Bio/Tools/Run/Alignment/Probalign.pm
A C:/Projects/Bio/Tools/Run/Alignment/Amap.pm
A C:/Projects/Bio/Tools/Run/Alignment/Lagan.pm
A C:/Projects/Bio/Tools/Run/Alignment/Blat.pm
A C:/Projects/Bio/Tools/Run/Alignment/Gmap.pm
A C:/Projects/Bio/Tools/Run/Alignment/Probcons.pm
A C:/Projects/Bio/Tools/Run/Alignment/DBA.pm
A C:/Projects/Bio/Tools/Run/Alignment/Muscle.pm
A C:/Projects/Bio/Tools/Run/Alignment/Pal2Nal.pm
A C:/Projects/Bio/Tools/Run/Alignment/Exonerate.pm
A C:/Projects/Bio/Tools/Run/Alignment/MAFFT.pm
A C:/Projects/Bio/Tools/Run/Alignment/Clustalw.pm
A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus
A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm
A C:/Projects/Bio/Tools/Run/Hmmer.pm
A C:/Projects/Bio/Tools/Run/BlastPlus.pm
A C:/Projects/Bio/Tools/Run/ERPIN.pm
A C:/Projects/Bio/Tools/Run/Maq.pm
A C:/Projects/Bio/Tools/Run/Bowtie
A C:/Projects/Bio/Tools/Run/Bowtie/Config.pm
A C:/Projects/Bio/Tools/Run/Seg.pm
A C:/Projects/Bio/Tools/Run/Prints.pm
A C:/Projects/Bio/Tools/Run/MCS.pm
A C:/Projects/Bio/Tools/Run/Tmhmm.pm
A C:/Projects/Bio/Tools/Run/Ensembl.pm
A C:/Projects/Bio/Tools/Run/Coil.pm
A C:/Projects/Bio/Tools/Run/Samtools
A C:/Projects/Bio/Tools/Run/Samtools/Config.pm
A C:/Projects/Bio/Tools/Run/Genemark.pm
A C:/Projects/Bio/Tools/Run/Bowtie.pm
A C:/Projects/Bio/Tools/Run/Glimmer.pm
A C:/Projects/Bio/Tools/Run/Signalp.pm
A C:/Projects/Bio/Tools/Run/Simprot.pm
A C:/Projects/Bio/Tools/Run/BWA
A C:/Projects/Bio/Tools/Run/BWA/Config.pm
A C:/Projects/Bio/Tools/Run/Newbler.pm
svn: Malformed svndiff data in representation
*** Error (took 00:07.184)

From David.Messina at sbc.su.se Mon Mar 8 02:01:13 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 8 Mar 2010 08:01:13 +0100
Subject: [Bioperl-l] Installing BioPerl
In-Reply-To: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>
References: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>
Message-ID: <0483C203-3E81-4112-877B-BC7A439CB916@sbc.su.se>
Hey Ernesto, I'm pretty sure you've got BioPerl version 1.6.0, which is actually more current than 1.5.2 that you were looking for. Due to oddities of Perl version numbers, 1.006 = 1.6.0 (or something like that). So I think you're probably good to go. I should also mention that direct installation (i.e. not via fink) works pretty well these days, and through that you can get the current BioPerl release, which is 1.6.2 (or 1.006002000000000). Dave
From alex at bioinf.uni-leipzig.de Mon Mar 8 10:45:14 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Mon, 8 Mar 2010 16:45:14 +0100 (CET) Subject: [Bioperl-l] Problem with PAML/Codeml wrapper Message-ID: Hi, I do have a problem with the PAML/Codeml wrapper. I want to calculate all pairwise K_a,K_s values from a given alignment, using the example procedure of http://www.bioperl.org/wiki/HOWTO:PAML

my $dna_aln = aa_to_dna_aln($aln, \%seqs);
my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new(
    -params => { 'runmode' => -2, 'seqtype' => 1,}
);
$kaks_factory->alignment($dna_aln);
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result();

But I receive an error:

-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output did not see seqtype
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:359
STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:441
STACK: Bio::Tools::Phylo::PAML::next_result /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:257

I use PAML4.4. Could this be the reason? Best, Alex --- By the time you've read this, you've already read it!
From David.Messina at sbc.su.se Mon Mar 8 11:29:00 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 8 Mar 2010 17:29:00 +0100 Subject: [Bioperl-l] Problem with PAML/Codeml wrapper In-Reply-To: References: Message-ID: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se> Hi Alexander, Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet. Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem? http://bugzilla.open-bio.org/ Thanks, Dave From alex at bioinf.uni-leipzig.de Mon Mar 8 12:11:42 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Mon, 8 Mar 2010 18:11:42 +0100 (CET) Subject: [Bioperl-l] Problem with PAML/Codeml wrapper In-Reply-To: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se> References: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se> Message-ID: sure. thanks! alex On Mon, 8 Mar 2010, Dave Messina wrote: > Hi Alexander, > > Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet. > > Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem? > > http://bugzilla.open-bio.org/ > > > Thanks, > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > --- By the time you've read this, you've already read it! From jovel_juan at hotmail.com Mon Mar 8 23:08:20 2010 From: jovel_juan at hotmail.com (Juan Jovel) Date: Tue, 9 Mar 2010 04:08:20 +0000 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: References: , Message-ID: Hello Guys! Does anybody have a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.
I have been doing that with BioConductor, but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads. Any suggestion will be appreciated. Thanks! JUAN From jovel_juan at hotmail.com Mon Mar 8 23:50:45 2010 From: jovel_juan at hotmail.com (Juan Jovel) Date: Tue, 9 Mar 2010 04:50:45 +0000 Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? In-Reply-To: References: , , , Message-ID: Hello Guys! Does anybody have a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end. I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads. Any suggestion will be appreciated. Thanks! JUAN From florent.angly at gmail.com Tue Mar 9 01:41:33 2010 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 09 Mar 2010 16:41:33 +1000 Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? In-Reply-To: References: , , , Message-ID: <4B95ED9D.6080307@gmail.com> Hi Juan, How about you throw away sequences that have a mismatch in the adapter? After all, if there is a mismatch in the first few bases, it does not bode well for the rest of the sequence and there are so many sequences that it is not a big loss. Florent On 09/03/10 14:50, Juan Jovel wrote: > > > Hello Guys! > > Does anybody have a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.
> > I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads. > > Any suggestion will be appreciated. Thanks! > > JUAN > From michael.watson at bbsrc.ac.uk Tue Mar 9 01:38:26 2010 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Tue, 9 Mar 2010 06:38:26 +0000 Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? In-Reply-To: References: , , , , Message-ID: <8D08960C647E64438CE5740657CBBDC501F910621D@iahcexch1.iah.bbsrc.ac.uk> Use fastx toolkit or something within emboss. Failing that, just write something in pure perl:) ________________________________________ From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open-bio.org] On Behalf Of Juan Jovel [jovel_juan at hotmail.com] Sent: 09 March 2010 04:50 To: bioperl Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? Hello Guys! Does anybody have a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end. I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads. Any suggestion will be appreciated. Thanks!
JUAN
From acn at stowers.org Tue Mar 9 01:31:49 2010 From: acn at stowers.org (Noll, Aaron) Date: Tue, 9 Mar 2010 00:31:49 -0600 Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? In-Reply-To: Message-ID: http://hannonlab.cshl.edu/fastx_toolkit/commandline.html try out the clipper tool

FASTA/Q Clipper

$ fastx_clipper -h
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]
version 0.0.6
   [-h]         = This helpful help screen.
   [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
   [-l N]       = discard sequences shorter than N nucleotides. default is 5.
   [-d N]       = Keep the adapter and N bases after it. (using '-d 0' is the same as not using '-d' at all. which is the default).
   [-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
   [-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
   [-k]         = Report Adapter-Only sequences.
   [-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
   [-v]         = Verbose - report number of sequences. If [-o] is specified, report will be printed to STDOUT. If [-o] is not specified (and output goes to STDOUT), report will be printed to STDERR.
   [-z]         = Compress output with GZIP.
   [-D]         = DEBUG output.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.

This is a suite of nice utilities that can be downloaded and that by the way are also used by galaxy.
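For readers who want the "pure perl" route suggested earlier in this thread, here is a minimal sketch of 3' adapter trimming that tolerates mismatches. It is not taken from BioPerl or the FASTX-Toolkit: the subroutine name `trim_3p_adapter`, its default thresholds, and the adapter string are all hypothetical and chosen only for illustration. It does not look at quality scores, so it does not address the poor-quality 3' end problem by itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or the FASTX-Toolkit).
# Returns the read with its 3' adapter removed, or the read unchanged
# when no acceptable adapter match is found.
sub trim_3p_adapter {
    my ($read, $adapter, $max_mismatch, $min_overlap) = @_;
    $max_mismatch = 1 unless defined $max_mismatch;   # assumed default
    $min_overlap  = 5 unless defined $min_overlap;    # assumed default

    my $read_len    = length $read;
    my $adapter_len = length $adapter;

    # Try every start position, leftmost first, so that the earliest
    # acceptable adapter match decides where the read is cut.
    for my $start (0 .. $read_len - $min_overlap) {
        my $overlap = $read_len - $start;
        $overlap = $adapter_len if $overlap > $adapter_len;
        next if $overlap < $min_overlap;

        my $read_part    = uc substr($read, $start, $overlap);
        my $adapter_part = uc substr($adapter, 0, $overlap);

        # String XOR gives "\0" wherever the two strings agree, so
        # counting the non-NUL bytes counts the mismatches.
        my $diff       = $read_part ^ $adapter_part;
        my $mismatches = ($diff =~ tr/\0//c);

        return substr($read, 0, $start) if $mismatches <= $max_mismatch;
    }
    return $read;
}

# Example adapter string; substitute the adapter used in your own run.
my $adapter = 'TCGTATGCCGTCTTCTGCTTG';

print trim_3p_adapter('ACGTACGTACGTTCGTATGCCGTCTT', $adapter), "\n";  # exact adapter prefix
print trim_3p_adapter('ACGTACGTACGTTCGTATGACGTCTT', $adapter), "\n";  # one mismatch in adapter
print trim_3p_adapter('ACGTACGTACGTACGTACGT',       $adapter), "\n";  # no adapter present
```

A fixed mismatch allowance is permissive for very short overlaps at the end of a read, so in practice one would probably raise the minimum overlap or scale the allowance with the overlap length.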
-Aaron -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Juan Jovel Sent: Monday, March 08, 2010 10:51 PM To: bioperl Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads? Hello Guys! Does anybody have a good suggestion on how to trim 3' adapters from reads coming out from the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end. I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads. Any suggestion will be appreciated. Thanks! JUAN
From alex at bioinf.uni-leipzig.de Tue Mar 9 13:00:01 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Tue, 9 Mar 2010 19:00:01 +0100 (CET) Subject: [Bioperl-l] bootstrap values in cladogram Message-ID: Hi, using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and trying to plot the tree as a cladogram. But somehow I cannot print the bootstrap values. Short example:

test.nwk
((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);

[..]
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;
[..]
my $trees = Bio::TreeIO->new( -file => "test.nwk", -format => 'newick');
my $tree = $trees->next_tree();
[..]
my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, -tree => $tree, -compact => 0);
$out->print(-file => "test.eps");

I already tried copying the bootstrap values into the ids of the internal nodes - nothing. Any suggestions? Thanks, Alex --- By the time you've read this, you've already read it!
From jason at bioperl.org Tue Mar 9 15:49:06 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 09 Mar 2010 12:49:06 -0800 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: References: Message-ID: <4B96B442.8070003@bioperl.org> SearchIO writer -> BLAST format. presumably something like Bio::SearchIO::Writer::TextResultWriter Janine Arloth wrote, On 3/5/10 1:43 AM:
> Hello,
> using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment:
>
>                  1                                                   50
> test/1-85        ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
> lcl|3013/20-104  ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
>
>
>                  51                                                 100
> test/1-85        TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
> lcl|3013/20-104  TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
>
>
>
> But I prefer this format:
>
>
>
> Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
>            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79
>
> Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
>            |||||||||||||||||||||||||
> Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104
>
>
> How can I get this?
>
> Best Regards
>
From bhakti.dwivedi at gmail.com Tue Mar 9 15:58:34 2010 From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi) Date: Tue, 9 Mar 2010 15:58:34 -0500 Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report? Message-ID: Hi, I have a blastn and blastx report (both in blast table m-8 format) against the ncbi nr database. Based on the Hits Start and End positions, how can I retrieve the gene name/acc/id? The blast table does show the hit organism accession number, but what I want is specifically the gene that it hits. Is there a way to do this in bioperl?
Thanks From David.Messina at sbc.su.se Tue Mar 9 16:39:08 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 9 Mar 2010 22:39:08 +0100 Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report? In-Reply-To: References: Message-ID: Hi Bhakti, Forgive me if the below shows that I've totally misunderstood; it's late here. > The blast table does show the hit organism > accession number, As you say, in BLAST -m 8 reports, the hit's accession number is the second column. I'm not sure when this would be different from the gene's accession number, at least for the entries in nr for which a gene name has been assigned (some are known only by their accession number). > Based on the Hits Start and End positions, how can I > retrieve the gene name/acc/id? The short answer is 'you can't'. But this makes me think that you're not going against the nr database, but instead whole genome or chromosome sequence records. In which case some of them will have genes annotated in the feature table, which you can get out using BioPerl: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation But many (most?) won't be annotated in this way, in which case you will need to find some file or database that has all the genes' start and stop positions in the sequence that you're searching. Perhaps you could provide a couple of your hits as examples so the problem is clearer? Dave From till.bayer at kaust.edu.sa Wed Mar 10 03:20:15 2010 From: till.bayer at kaust.edu.sa (Till Bayer) Date: Wed, 10 Mar 2010 11:20:15 +0300 Subject: [Bioperl-l] Bio::Index::Blast bug Message-ID: <4B97563F.3020901@kaust.edu.sa> Hi all! I tried to use Bio::Index::Blast, but always got the first hit back, no matter what ID I used. The reason is that the Blast indexer seems to use 'BLAST' as a record separator in all cases, except for RPS-BLAST. I think however that for the current versions of blastall and blast+ 'Query=' should be used.
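To illustrate the record-separator point above, here is a small stand-alone sketch of indexing a multi-query plain-text report by the byte offset of each 'Query=' line. It is not the Bio::Index::Blast implementation: the subroutine `index_queries` and the toy report text are hypothetical, and only core Perl (`tell`, `seek`, an in-memory filehandle) is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::Index::Blast code.
# Returns a hash reference mapping each query ID to the byte offset
# of its "Query=" line in the already-opened report filehandle.
sub index_queries {
    my ($fh) = @_;
    my %offset_of;
    while (my $line = <$fh>) {
        if ($line =~ /^Query=\s*(\S+)/) {
            # tell() points just past the line that was read, so step
            # back by the line length to get the start of the record.
            $offset_of{$1} = tell($fh) - length $line;
        }
    }
    return \%offset_of;
}

# Toy report (made up for this sketch): one program header followed by
# two query records, mimicking output where the header is not repeated.
my $report = <<'END';
BLASTN 2.2.22+

Query= seqA
 (hits for seqA)

Query= seqB
 (hits for seqB)
END

open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
my $index = index_queries($fh);

# Jump straight to the second record and print its first line.
seek $fh, $index->{seqB}, 0;
my $first_line = <$fh>;
print $first_line;
close $fh;
```

Because each record is keyed on its own 'Query=' line, a lookup for the second ID lands on the second record rather than on the single header at the top of the file. The trade-off, raised later in this thread, is that anything found only in the header is not part of records after the first.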
Thus, changing line 222 in Blast.pm from

$indexpoint = tell($BLAST) - length $_ if ( $prefix eq 'RPS-' );

to

$indexpoint = tell($BLAST) - length $_;

makes it work for me. However I have no idea what RPS-BLAST may be, or what different versions of blast output are used, so maybe someone who knows should have a look at that before changing things, and writing a cleaner version than the above hack. Cheers, Till -- Till Bayer 4700 King Abdullah University for Science and Technology Building 2, Room 4231-W16 Thuwal 23955-6900 Saudi Arabia Phone: +96628082373
From avilella at gmail.com Wed Mar 10 03:55:09 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 10 Mar 2010 08:55:09 +0000 Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores Message-ID: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> Hi all, I would like to know if anyone knows of a script or method in bioperl to do an unambiguous assembly of fastq sequences, combining the q-scores to give assembled fastq sequences as the output. By unambiguous I mean something like what abyss would produce with these options: ABYSS -k$k -b0 -t0 -e0 -c0 but giving assembled fastq sequences with combined q-scores as output instead of simple fasta assembled sequences.
Thanks in advance From sdavis2 at mail.nih.gov Wed Mar 10 05:31:50 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 10 Mar 2010 05:31:50 -0500 Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores In-Reply-To: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> Message-ID: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote: > Hi all, > > I would like to know if anyone knows of a script or method in bioperl > to do an unambiguous assembly of fastq sequences, combining the q-scores to > give assembled fastq sequences as the output. > > By unambiguous I mean something like what abyss would produce with these options: > > ABYSS -k$k -b0 -t0 -e0 -c0 > > but giving assembled fastq sequences with combined q-scores as output > instead of simple > fasta assembled sequences. Hi, Albert. I'm not sure exactly what you want here, but have you looked at the Mosaik aligner? Also, look at samtools pileup; you can probably produce something similar to what you want from it as well. I certainly might have misunderstood the problem, though. Sean From biopython at maubp.freeserve.co.uk Wed Mar 10 05:35:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Mar 2010 10:35:56 +0000 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: <4B97563F.3020901@kaust.edu.sa> References: <4B97563F.3020901@kaust.edu.sa> Message-ID: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: > Hi all! > > I tried to use Bio::Index::Blast, but always got the first hit back, no > matter what ID I used. The reason is that the Blast indexer seems to use > 'BLAST' as a record separator in all cases, except for RPS-BLAST. > I think however that for the current versions of blastall and blast+ > 'Query=' should be used.
That fits with changes I had to make in Biopython for breaking up the plain text BLAST output into each query. For a while only the RPS-BLAST report omitted the "header" (the BLAST line and the journal references users should cite) between records, but now all the NCBI BLAST tools do this - forcing us to look for the Query= line. i.e. I can't comment on the BioPerl change itself, but your reasoning about the BLAST output makes sense. Peter From avilella at gmail.com Wed Mar 10 05:47:01 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 10 Mar 2010 10:47:01 +0000 Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores In-Reply-To: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> Message-ID: <358f4d651003100247k789344a2m2decd7283e658de9@mail.gmail.com> Hi Sean, By unambiguous assembly of reads I mean that one would not squash bubbles or trim branches, but simply collapse fully overlapping (embedded) reads by combining the q-scores, or raising the q-scores if you want, and keeping branching graphs separate. This unambiguous de novo assembly would discard depth information, which is important if you are doing digital gene expression analysis, but would produce a collapsed fastq set of sequences that would be leaner for downstream processing. I'll have a look at Mosaik. I tried samtools pileup, but it seems a bit overcomplicated to have to map back the reads if what you want to do is just have the assembled reads with fastq scores coming out of the assembler in the first place. That's why I was thinking it would be good if this unambiguous or "dummy" fastq assembly could fit into a bioperl script or method.
Cheers On Wed, Mar 10, 2010 at 10:31 AM, Sean Davis wrote: > On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote: >> Hi all, >> >> I would like to know if anyone knows of a script or method in bioperl >> to do an unambiguous assembly of fastq sequences, combining the q-scores to >> give assembled fastq sequences as the output. >> >> By unambiguous I mean something like what abyss would produce with these options: >> >> ABYSS -k$k -b0 -t0 -e0 -c0 >> >> but giving assembled fastq sequences with combined q-scores as output >> instead of simple >> fasta assembled sequences. > > Hi, Albert. > > I'm not sure exactly what you want here, but have you looked at the > Mosaik aligner? Also, look at samtools pileup; you can probably > produce something similar to what you want from it as well. > > I certainly might have misunderstood the problem, though. > > Sean > From adsj at novozymes.com Wed Mar 10 08:46:02 2010 From: adsj at novozymes.com (Adam Sjøgren) Date: Wed, 10 Mar 2010 14:46:02 +0100 Subject: [Bioperl-l] [PATCH] Fix infinite loop in EMBL writer. Message-ID: <87k4tke1d1.fsf@topper.koldfront.dk> This fix is an exact duplicate of the fix for bug #2915 - of the Genbank writer, which was fixed in revision 16275.
---
 Bio/SeqIO/embl.pm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Bio/SeqIO/embl.pm b/Bio/SeqIO/embl.pm
index cfea1b6..de1bf11 100644
--- a/Bio/SeqIO/embl.pm
+++ b/Bio/SeqIO/embl.pm
@@ -1432,7 +1432,7 @@ sub _write_line_EMBL_regex {
 
     CHUNK: while($line) {
         foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
-            if ($line =~ m/^(.{1,$subl})($pat)(.*)/ ) {
+            if ($line =~ m/^(.{0,$subl})($pat)(.*)/ ) {
                 my $l = $1.$2;
                 $l =~ s/#/ /g # remove word wrap protection char '#'
                     if $pre1 eq "RA ";
@@ -1441,6 +1441,7 @@ sub _write_line_EMBL_regex {
                 # be strict about not padding spaces according to
                 # genbank format
                 $l =~ s/\s+$//;
+                next CHUNK if ($l eq '');
                 push(@lines, $l);
                 next CHUNK;
             }
-- 
1.6.3.3

-- Adam Sjøgren adsj at novozymes.com
From cjfields at illinois.edu Wed Mar 10 09:27:59 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Mar 2010 08:27:59 -0600 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> Message-ID: On Mar 10, 2010, at 4:35 AM, Peter wrote: > On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: >> Hi all! >> >> I tried to use Bio::Index::Blast, but always got the first hit back, no >> matter what ID I used. The reason is that the Blast indexer seems to use >> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >> I think however that for the current versions of blastall and blast+ >> 'Query=' should be used. > > That fits with changes I had to make in Biopython for breaking > up the plain text BLAST output into each query. For a while only > the RPS-BLAST report omitted the "header" (the BLAST line > and the journal references users should cite) between records, > but now all the NCBI BLAST tools do this - forcing us to look > for the Query= line. > > i.e.
I can't comment on the BioPerl change itself, but your > reasoning about the BLAST output makes sense. > > Peter One side-effect of this is we will be missing the search algorithm and a few small odds and ends from all but the first report; this trickles down into how we properly deal with HSP coordinates, but we can probably wrangle some magic there to get things working for the most part. This is similar to how XML format is currently dealt with (and another reason this format is the easiest to support, as it doesn't change based on NCBI's whims). Do we have example reports with multiple queries from BLAST+ available? It would be invaluable for the projects; if not I can probably generate a few locally. chris From biopython at maubp.freeserve.co.uk Wed Mar 10 09:40:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Mar 2010 14:40:16 +0000 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> Message-ID: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com> On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote: > On Mar 10, 2010, at 4:35 AM, Peter wrote: > >> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: >>> Hi all! >>> >>> I tried to use Bio::Index::Blast, but always got the first hit back, no >>> matter what ID I used. The reason is that the Blast indexer seems to use >>> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >>> I think however that for the current versions of blastall and blast+ >>> 'Query=' should be used. >> >> That fits with changes I had to make in Biopython for breaking >> up the plain text BLAST output into each query. For a while only >> the RPS-BLAST report omitted the "header" (the BLAST line >> and the journal references users should cite) between records, >> but now all the NCBI BLAST tools do this - forcing us to look >> for the Query= line. >> >> i.e. 
I can't comment on the BioPerl change itself, but your >> reasoning about the BLAST output makes sense. >> >> Peter > > One side-effect of this is we will be missing the search > algorithm and a few small odds and ends from all but > the first report; this trickles down into how we properly > deal with HSP coordinates, but we can probably wrangle > some magic there to get things working for the most part. > ... Yeah - I had similar issues with the Biopython plain text BLAST parser. The hack/magic I used was to cache the header text from the first record and then re-insert it on subsequent records. Nasty, but works. > This is similar to how XML format is currently dealt with > (and another reason this format is the easiest to support, > as it doesn't change based on NCBI's whims). They may have changed a few things here too - watch out. > Do we have example reports with multiple queries from > BLAST+ available? It would be invaluable for the projects; > if not I can probably generate a few locally. I've got one example in Biopython's unit tests, http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt Peter From cjfields at illinois.edu Wed Mar 10 10:19:42 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Mar 2010 09:19:42 -0600 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com> References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com> Message-ID: <27C91884-E910-4BDF-B777-B90E7B4F9103@illinois.edu> On Mar 10, 2010, at 8:40 AM, Peter wrote: > On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote: >> On Mar 10, 2010, at 4:35 AM, Peter wrote: >> >>> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: >>>> Hi all! >>>> >>>> I tried to use Bio::Index::Blast, but always got the first hit back, no >>>> matter what ID I used.
The reason is that the Blast indexer seems to use >>>> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >>>> I think however that for the current versions of blastall and blast+ >>>> 'Query=' should be used. >>> >>> That fits with changes I had to make in Biopython for breaking >>> up the plain text BLAST output into each query. For a while only >>> the RPS-BLAST report omitted the "header" (the BLAST line >>> and the journal references users should cite) between records, >>> but now all the NCBI BLAST tools do this - forcing us to look >>> for the Query= line. >>> >>> i.e. I can't comment on the BioPerl change itself, but your >>> reasoning about the BLAST output makes sense. >>> >>> Peter >> >> One side-effect of this is we will be missing the search >> algorithm and a few small odds and ends from all but >> the first report; this trickles down into how we properly >> deal with HSP coordinates, but we can probably wrangle >> some magic there to get things working for the most part. >> ... > > Yeah - I had similar issues with the Biopython plain > text BLAST parser. The hack/magic I used was to > cache the header text from the first record and then > re-insert it on subsequent records. Nasty, but works. Right, but here's the side-effect: unless that data is somehow stored when indexing, it will not be caught if one starts an IO stream at any point past the BLAST header (in other words, all but the first report). We could, in effect, store that as meta information somehow (I think Index may have some meta storage), or just parse it prior to initiating the stream and pass the information into the IO object. >> This is similar to how XML format is currently dealt with >> (and another reason this format is the easiest to support, >> as it doesn't change based on NCBI's whims). > > They may have changed a few things here too - watch out. Ugh. >> Do we have example reports with multiple queries from >> BLAST+ available?
It would be invaluable for the projects; >> if not I can probably generate a few locally. > > I've got one example in Biopython's unit tests, > http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt > > Peter Okay, will start up some work to work out tests, etc. chris From thomas.sharpton at gmail.com Wed Mar 10 10:30:37 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 10 Mar 2010 07:30:37 -0800 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser Message-ID: Hey everyone, Since HMMER version 3 went live in the middle of last month, I thought it a good time to update the SearchIO parser I've been working on for some time and submit the tool to the community (finally....). At the moment, the module seems capable of parsing hmmsearch and hmmscan outputs, both with and without the alignment option. Some aspects of functionality have yet to be fleshed out, but this one should be capable of doing most of your day to day procedures (at least it appears to on my end). I'd love to have people play with it and I'm happy to hear feedback, criticism, development requests and bug reports. That said, this is the first code I've contributed to BioPerl, so please be gentle ;). You can find the bioperl-hmmer3 package in bioperl-dev. I've included a test script as well as sample hmmscan/hmmsearch report files and test data in the bioperl-hmmer3 root directory. As an aside, BioPerl has been a wonderful resource for me and I'm glad to be giving back, even if only a little. I hope this helps out at least a few of you. All the best, Tom From cjfields at illinois.edu Wed Mar 10 10:53:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Mar 2010 09:53:41 -0600 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser In-Reply-To: References: Message-ID: <1268236421.20872.21.camel@pyrimidine.igb.uiuc.edu> Wonderful! Tom, thanks for your hard work!
chris On Wed, 2010-03-10 at 07:30 -0800, Thomas Sharpton wrote: > Hey everyone, > > Since HMMER version 3 went live in the middle of last month, I thought > it a good time to update the SearchIO parser I've been working on for > some time and submit the tool to the community (finally....). At the > moment, the module seems capable of parsing hmmsearch and hmmscan > outputs, both with and without the alignment option. Some aspects of > functionality have yet to be flushed out, but this one should be > capable of doing most of your day to day procedures (at least it > appears to on my end). > > I'd love to have people play with it and I'm happy to hear feedback, > criticism, development requests and bug reports. That said, this is > the first code I've contributed to BioPerl, so please be gentle ;). > You can find the bioperl-hmmer3 package in bioperl-dev. I've included > a test script as well as sample hmmscan/hmmsearch report files and > test data in the bioperl-hmmer3 root directory. > > As an aside, BioPerl has been a wonderful resource for me and I'm glad > to be giving back, even if only a little. I hope this helps out at > least a few of you. > > All the best, > Tom > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From asjo at koldfront.dk Wed Mar 10 12:04:00 2010 From: asjo at koldfront.dk (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Wed, 10 Mar 2010 18:04:00 +0100 Subject: [Bioperl-l] Fix infinite loop in EMBL writer. In-Reply-To: <87k4tke1d1.fsf@topper.koldfront.dk> ("Adam =?iso-8859-1?Q?Sj?= =?iso-8859-1?Q?=F8gren=22's?= message of "Wed, 10 Mar 2010 14:46:02 +0100") References: <87k4tke1d1.fsf@topper.koldfront.dk> Message-ID: <87wrxkw1kv.fsf@topper.koldfront.dk> On Wed, 10 Mar 2010 14:46:02 +0100, Adam wrote: > This fix is an exact duplicate of the fix for bug #2915 - of > the Genbank writer, which was fixed in revision 16275. 
I have created bug #3025 in bugzilla with the patch (I couldn't remember whether here or there is most appropriate). Best regards, Adam -- "It isn't modern just because it's electric. Country Adam Sj?gren music was electric too." asjo at koldfront.dk From David.Messina at sbc.su.se Wed Mar 10 12:35:52 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 10 Mar 2010 18:35:52 +0100 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser In-Reply-To: References: Message-ID: Thanks so much, Thomas! I expect to be using Hmmer 3 for my own work fairly soon, so I'm looking forward to taking advantage of this. Dave From rmb32 at cornell.edu Wed Mar 10 15:13:57 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 10 Mar 2010 12:13:57 -0800 Subject: [Bioperl-l] call for help - BioPerl GSoC wiki page Message-ID: <4B97FD85.50402@cornell.edu> Hi all, BioPerl's Google Summer of Code page in support of the Open Bioinformatics Foundation's application to Google Summer of Code is shaping up, but still needs some polishing. We're coming up on the application deadline, and we need to make a good, polished show of it. Please put in a little time to look at, edit, polish, and flesh out the BioPerl and OBF wiki pages in support of our application: BioPerl: http://bioperl.org/wiki/Google_Summer_of_Code OBF: http://open-bio.org/wiki/Google_Summer_of_Code Specific things for the BioPerl page, the Bio::Assembly project on that page needs to either be fleshed out or removed. Thanks for all the hard work from everyone so far (especially Chris!). It would be *very* good to have some more project ideas and mentor volunteers. So if you haven't already, please consider volunteering to mentor a student. Also, we all know many things that BioPerl needs help with, so if you can think of a good intern project, add it to the page and maybe we can get a GSoC student to work on it. 
Rob From nml5566 at gmail.com Wed Mar 10 17:52:19 2010 From: nml5566 at gmail.com (Nathan Liles) Date: Wed, 10 Mar 2010 16:52:19 -0600 Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks? Message-ID: <4B9822A3.2050202@gmail.com> I'm trying to patch Gbrowse to properly display circular segments. Currently, I'm working on getting the protein glyphs to display properly beyond the end of the track. I noticed when I turn on the protein track, it can sometimes affect another track. Specifically, turning on the protein track can either cause the gene glyphs to disappear or be duplicated. This only happens for features with two subfeatures that appear on the panel at opposite ends. This seems strange since I can't imagine how one track could affect another. Has anyone noticed this behavior before? Can anybody think of a way that the protein glyph module can affect other glyphs? Thanks, Nathan Liles From me at miguel.weapps.com Thu Mar 11 00:48:17 2010 From: me at miguel.weapps.com (Luis M Rodriguez-R) Date: Thu, 11 Mar 2010 00:48:17 -0500 Subject: [Bioperl-l] PSI-BLAST uncommon result Message-ID: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> Hello all, I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception: MSG: no data for midline ***** No hits found ****** STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792 My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail. Best regards, Luis M. 
Rodriguez-R [http://bioinf.uniandes.edu.co/~miguel/] --------------------------------- Unidad de Bioinform?tica del Laboratorio de Micolog?a y Fitopatolog?a Universidad de Los Andes, Colombia [http://bioinf.uniandes.edu.co] + 57 1 3394949 ext 2619 luisrodr at uniandes.edu.co me at miguel.weapps.com BLASTP 2.2.18 [Mar-02-2008] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. Reference for composition-based statistics starting in round 2: Schaffer, Alejandro A., L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. Query= eff254 (67 letters) Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects 10,383,435 sequences; 3,542,477,638 total letters Searching..................................................done Results from round 1 Score E Sequences producing significant alignments: (bits) Value ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se... 
127 5e-28 >ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin hrpA gb|ABA39805.1| HrpA [Erwinia pyrifoliae] emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM 12163] Length = 67 Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust. Identities = 67/67 (100%), Positives = 67/67 (100%) Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 Query: 61 AAKAIQF 67 AAKAIQF Sbjct: 61 AAKAIQF 67 Searching..................................................done ***** No hits found ****** Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 863,709,833 Number of sequences in database: 2,562,282 Database: /storage1/databases/ncbi-blast/nr.01 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 936,189,781 Number of sequences in database: 2,674,439 Database: /storage1/databases/ncbi-blast/nr.02 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 974,890,473 Number of sequences in database: 2,826,395 Database: /storage1/databases/ncbi-blast/nr.03 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 767,687,551 Number of sequences in database: 2,320,319 Lambda K H 0.297 0.107 0.256 Lambda K H 0.267 0.0344 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 480,706,425 Number of Sequences: 10383435 Number of extensions: 8598061 Number of successful extensions: 47335 Number of sequences 
better than 1.0e-25: 1 Number of HSP's better than 0.0 without gapping: 2 Number of HSP's successfully gapped in prelim test: 0 Number of HSP's that attempted gapping in prelim test: 47333 Number of HSP's gapped (non-prelim): 2 length of query: 67 length of database: 3,542,477,638 effective HSP length: 39 effective length of query: 28 effective length of database: 3,137,523,673 effective search space: 87850662844 effective search space used: 87850662844 T: 11 A: 40 X1: 16 ( 6.9 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 43 (21.7 bits) S2: 298 (119.7 bits) From jason at bioperl.org Thu Mar 11 03:13:24 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 00:13:24 -0800 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: References: Message-ID: <4B98A624.7020102@bioperl.org> not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function. See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to boostrap slots basically by using -internal_node_id => 'bootstrap' in the TreeIO initialization. You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too. -jason Alexander Donath wrote, On 3/9/10 10:00 AM: > Hi, > > using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and > bootstrap values and try to plot the tree as cladogram. But somehow I > cannot print the bootstrap values. > > Short example: > > test.nwk > ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); > > > > [..] > use Bio::TreeIO; > use Bio::Tree::Draw::Cladogram; > [..] > my $trees = Bio::TreeIO->new( -file => "test.nwk", > -format => 'newick'); > my $tree = $trees->next_tree(); > [..] 
> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, > -tree => $tree, > -compact => 0); > > $out->print(-file => "test.eps"); > > > I already tried it by copying the bootstrap values into the ids of the > internal nodes - nothing. Any suggestions? > > > Thanks, > Alex > > --- > By the time you've read this, you've already read it! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 11 09:27:33 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 08:27:33 -0600 Subject: [Bioperl-l] PSI-BLAST uncommon result In-Reply-To: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> References: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> Message-ID: <70AF1FA5-FD88-48E3-A672-F72B9D3E1B3B@illinois.edu> Luis, The best way to handle this is to attach the problematic report (not append it) to a bug report on bugzilla. This ensures we aren't running into artifacts generated via the email client, etc. chris On Mar 10, 2010, at 11:48 PM, Luis M Rodriguez-R wrote: > Hello all, > > I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception: > MSG: no data for midline ***** No hits found ****** > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792 > My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail. > > Best regards, > > Luis M. 
Rodriguez-R > [http://bioinf.uniandes.edu.co/~miguel/] > --------------------------------- > Unidad de Bioinform?tica del Laboratorio de Micolog?a y Fitopatolog?a > Universidad de Los Andes, Colombia > [http://bioinf.uniandes.edu.co] > > + 57 1 3394949 ext 2619 > luisrodr at uniandes.edu.co > me at miguel.weapps.com > > > BLASTP 2.2.18 [Mar-02-2008] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. > > > Reference for compositional score matrix adjustment: Altschul, Stephen F., > John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, > Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches > using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. > > > Reference for composition-based statistics starting in round 2: > Schaffer, Alejandro A., L. Aravind, Thomas L. Madden, > Sergei Shavirin, John L. Spouge, Yuri I. Wolf, > Eugene V. Koonin, and Stephen F. Altschul (2001), > "Improving the accuracy of PSI-BLAST protein database searches with > composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > Query= eff254 > (67 letters) > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 10,383,435 sequences; 3,542,477,638 total letters > > Searching..................................................done > > > Results from round 1 > > > Score E > Sequences producing significant alignments: (bits) Value > > ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se... 
127 5e-28 > >> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation > pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] > sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin > hrpA > gb|ABA39805.1| HrpA [Erwinia pyrifoliae] > emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation > pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] > emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM > 12163] > Length = 67 > > Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust. > Identities = 67/67 (100%), Positives = 67/67 (100%) > > Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 > MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN > Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 > > Query: 61 AAKAIQF 67 > AAKAIQF > Sbjct: 61 AAKAIQF 67 > > > Searching..................................................done > > > > ***** No hits found ****** > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 863,709,833 > Number of sequences in database: 2,562,282 > > Database: /storage1/databases/ncbi-blast/nr.01 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 936,189,781 > Number of sequences in database: 2,674,439 > > Database: /storage1/databases/ncbi-blast/nr.02 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 974,890,473 > Number of sequences in database: 2,826,395 > > Database: /storage1/databases/ncbi-blast/nr.03 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 767,687,551 > Number of sequences in database: 2,320,319 > > Lambda K H > 0.297 0.107 0.256 > > Lambda K H > 0.267 0.0344 0.140 > > > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Hits to DB: 
480,706,425 > Number of Sequences: 10383435 > Number of extensions: 8598061 > Number of successful extensions: 47335 > Number of sequences better than 1.0e-25: 1 > Number of HSP's better than 0.0 without gapping: 2 > Number of HSP's successfully gapped in prelim test: 0 > Number of HSP's that attempted gapping in prelim test: 47333 > Number of HSP's gapped (non-prelim): 2 > length of query: 67 > length of database: 3,542,477,638 > effective HSP length: 39 > effective length of query: 28 > effective length of database: 3,137,523,673 > effective search space: 87850662844 > effective search space used: 87850662844 > T: 11 > A: 40 > X1: 16 ( 6.9 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 43 (21.7 bits) > S2: 298 (119.7 bits) > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Mar 11 10:38:50 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 07:38:50 -0800 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: References: <4B98A624.7020102@bioperl.org> Message-ID: <4B990E8A.5060704@bioperl.org> Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. -jason Alexander Donath wrote, On 3/11/10 1:05 AM: > I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. > > Nevertheless, iterating through the tree and printing $node->bootstrap > worked in both cases and gave me the correct bootstrap values of the > inner nodes. > > I also called move_id_to_bootstrap on the tree. But this resulted in > an error: > > Can't locate object method "move_id_to_bootstrap" via package > "Bio::Tree::Tree". > Even though it's inherited from the interface, as far as I can tell. 
> > > alex > > > On Thu, 11 Mar 2010, Jason Stajich wrote: > >> not sure if the cladogram is printing bootstraps from the internal id >> or the bootstrap function. >> >> See the example code here http://bioperl.org/wiki/HOWTO:Trees that >> shows how to automatically convert internal IDs to boostrap slots >> basically by using >> -internal_node_id => 'bootstrap' >> in the TreeIO initialization. >> >> You may want to iterate through the tree and print $node->bootstrap >> where you think it should be so you can verify that it is working too. >> >> -jason >> >> Alexander Donath wrote, On 3/9/10 10:00 AM: >>> Hi, >>> >>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths >>> and bootstrap values and try to plot the tree as cladogram. But >>> somehow I cannot print the bootstrap values. >>> >>> Short example: >>> >>> test.nwk >>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >>> >>> >>> >>> >>> [..] >>> use Bio::TreeIO; >>> use Bio::Tree::Draw::Cladogram; >>> [..] >>> my $trees = Bio::TreeIO->new( -file => "test.nwk", >>> -format => 'newick'); >>> my $tree = $trees->next_tree(); >>> [..] >>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >>> -tree => $tree, >>> -compact => 0); >>> >>> $out->print(-file => "test.eps"); >>> >>> >>> I already tried it by copying the bootstrap values into the ids of the >>> internal nodes - nothing. Any suggestions? >>> >>> >>> Thanks, >>> Alex >>> >>> --- >>> By the time you've read this, you've already read it! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > --- > Alexander Donath > Professur f?r Bioinformatik > Institut f?r Informatik > Universit?t Leipzig > H?rtelstr. 16-18 > D-04107 Leipzig, Germany > > phone: +49 (0)341 97-16702 > fax: +49 (0)341 97-16679 > > By the time you've read this, you've already read it! 
From jason at bioperl.org Thu Mar 11 10:40:59 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 07:40:59 -0800 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: References: Message-ID: <4B990F0B.8010100@bioperl.org> You should only have TWO nodes in the array not all the leaves. =head2 distance Title : distance Usage : distance(-nodes => \@nodes ) Function: returns the distance between TWO given nodes Returns : numerical distance Args : -nodes => arrayref of nodes to test or ($node1, $node2) =cut Jeffrey Detras wrote, On 3/4/10 10:17 PM: > Hi, > > I am new at using the Bio::TreeIO module specifically using the newick > format for a phylogenetic analysis. The sample_tree attached is > Newick-formatted tree. My objective is to get all the distances between all > the leaf nodes. I copied examples of the code from > http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my > knowledge) so that I understand how to assign the right array value for the > nodes/leaves. The message would say must provide 2 root nodes. > > Here is what I have right now: > > #!/usr/bin/perl -w > use strict; > > my $treefile = 'sample_tree'; > use Bio::TreeIO; > my $treeio = Bio::TreeIO->new(-format => 'newick', > -file => $treefile); > > while (my $tree = $treeio->next_tree) { > my @leaves = $tree->get_leaf_nodes; > for (my $dist = $tree->distance(-nodes => \@leaves)){ > print "Distance between trees is $dist\n"; > } > } > > Thanks, > Jeff > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Mar 11 11:11:04 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 11:11:04 -0500 Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks? 
In-Reply-To: <4B9822A3.2050202@gmail.com> References: <4B9822A3.2050202@gmail.com> Message-ID: <4536f7701003110811s79c30638x100ae521bce1084a@mail.gmail.com> Hi Nathan, Well, it certainly shouldn't! The tracks are supposed to be calculated independently without reusing anything. Debugging should be fun though. Does it matter if you change the adaptor (for instance, if you are using the memory adaptor for Bio::DB::SeqFeature::Store, try putting it in a mysql database (or vice versa) to help narrow down where the bug is. Scott On Wed, Mar 10, 2010 at 5:52 PM, Nathan Liles wrote: > I'm trying to patch Gbrowse to properly display circular segments. > Currently, I'm working on getting the protein glyphs to display properly > beyond the end of the track. > > I noticed when I turn on the protein track, it can sometimes affect another > track. Specifically, turning on the protein track can either cause the gene > glyphs to disappear or be duplicated. > This only happens for features with two subfeatures that appear on the panel > at opposite ends. > > This seems strange since I can't imagine how one track could affect another. > Has anyone noticed this behavior before? > Can anybody think of a way that the protein glyph module can affect other > glyphs? > > Thanks, > Nathan Liles > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu Mar 11 11:21:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 10:21:02 -0600 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B990E8A.5060704@bioperl.org> References: <4B98A624.7020102@bioperl.org> <4B990E8A.5060704@bioperl.org> Message-ID: <2BBC0220-4233-4EB7-81A8-FA8342ED9714@illinois.edu> Alex, The best thing to do is to file this as a bug so we don't lose track of it, including demonstration code. chris On Mar 11, 2010, at 9:38 AM, Jason Stajich wrote: > Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. > > I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. > -jason > > Alexander Donath wrote, On 3/11/10 1:05 AM: >> I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. >> >> Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. >> >> I also called move_id_to_bootstrap on the tree. But this resulted in an error: >> >> Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". >> Even though it's inherited from the interface, as far as I can tell. >> >> >> alex >> >> >> On Thu, 11 Mar 2010, Jason Stajich wrote: >> >>> not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function. >>> >>> See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to boostrap slots basically by using >>> -internal_node_id => 'bootstrap' >>> in the TreeIO initialization. 
>>> >>> You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too. >>> >>> -jason >>> >>> Alexander Donath wrote, On 3/9/10 10:00 AM: >>>> Hi, >>>> >>>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and try to plot the tree as cladogram. But somehow I cannot print the bootstrap values. >>>> >>>> Short example: >>>> >>>> test.nwk >>>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >>>> >>>> >>>> >>>> [..] >>>> use Bio::TreeIO; >>>> use Bio::Tree::Draw::Cladogram; >>>> [..] >>>> my $trees = Bio::TreeIO->new( -file => "test.nwk", >>>> -format => 'newick'); >>>> my $tree = $trees->next_tree(); >>>> [..] >>>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >>>> -tree => $tree, >>>> -compact => 0); >>>> >>>> $out->print(-file => "test.eps"); >>>> >>>> >>>> I already tried it by copying the bootstrap values into the ids of the >>>> internal nodes - nothing. Any suggestions? >>>> >>>> >>>> Thanks, >>>> Alex >>>> >>>> --- >>>> By the time you've read this, you've already read it! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> --- >> Alexander Donath >> Professur f?r Bioinformatik >> Institut f?r Informatik >> Universit?t Leipzig >> H?rtelstr. 16-18 >> D-04107 Leipzig, Germany >> >> phone: +49 (0)341 97-16702 >> fax: +49 (0)341 97-16679 >> >> By the time you've read this, you've already read it! 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Mon Mar 8 16:06:11 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 08 Mar 2010 16:06:11 -0500 Subject: [Bioperl-l] Next Gen Formats Message-ID: <4B9566C3.6000007@umdnj.edu> Does Bioperl support color-space sequences, or FASTA formatted quality value files? ABI's Solid platform generates a number of files, two of which are fairly important (at the moment): 1) .csfasta Color-space sequences in FASTA format 2) .qual Quality values of each color call, also in FASTA format. I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction? Ryan From alex at bioinf.uni-leipzig.de Thu Mar 11 04:05:13 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Thu, 11 Mar 2010 10:05:13 +0100 (CET) Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B98A624.7020102@bioperl.org> References: <4B98A624.7020102@bioperl.org> Message-ID: I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. I also called move_id_to_bootstrap on the tree. But this resulted in an error: Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". Even though it's inherited from the interface, as far as I can tell. alex On Thu, 11 Mar 2010, Jason Stajich wrote: > not sure if the cladogram is printing bootstraps from the internal id or the > bootstrap function.
> > See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how > to automatically convert internal IDs to bootstrap slots basically by using > -internal_node_id => 'bootstrap' > in the TreeIO initialization. > > You may want to iterate through the tree and print $node->bootstrap where you > think it should be so you can verify that it is working too. > > -jason > > Alexander Donath wrote, On 3/9/10 10:00 AM: >> Hi, >> >> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and >> bootstrap values and try to plot the tree as cladogram. But somehow I >> cannot print the bootstrap values. >> >> Short example: >> >> test.nwk >> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >> >> >> >> [..] >> use Bio::TreeIO; >> use Bio::Tree::Draw::Cladogram; >> [..] >> my $trees = Bio::TreeIO->new( -file => "test.nwk", >> -format => 'newick'); >> my $tree = $trees->next_tree(); >> [..] >> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >> -tree => $tree, >> -compact => 0); >> >> $out->print(-file => "test.eps"); >> >> >> I already tried it by copying the bootstrap values into the ids of the >> internal nodes - nothing. Any suggestions? >> >> >> Thanks, >> Alex >> >> --- >> By the time you've read this, you've already read it! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l --- Alexander Donath Professur für Bioinformatik Institut für Informatik Universität Leipzig Härtelstr. 16-18 D-04107 Leipzig, Germany phone: +49 (0)341 97-16702 fax: +49 (0)341 97-16679 By the time you've read this, you've already read it!
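A quick way to cross-check what $node->bootstrap reports is to pull the bracketed values straight out of the Newick text and compare. What follows is only a minimal sketch in plain Perl, not BioPerl code and not a fix for the Cladogram problem: bracketed_support_values is a hypothetical helper name, and it assumes the file uses the ":length[value]" notation seen in test.nwk above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return every bracketed value
# that directly follows a branch length, e.g. ":0.17826[879]" yields 879.
sub bracketed_support_values {
    my ($newick) = @_;
    my @values;
    while ($newick =~ /:\s*([0-9.eE+-]+)\s*\[([^\]]+)\]/g) {
        push @values, { branch_length => $1, support => $2 };
    }
    return @values;
}

# The example tree quoted in the thread (contents of test.nwk).
my $newick = '((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);';

for my $v (bracketed_support_values($newick)) {
    print "branch length $v->{branch_length}, support $v->{support}\n";
}
```

If the values listed here match what $node->bootstrap prints for the inner nodes, the parsing side is probably fine and the problem is more likely in the drawing step.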
From Alexander.Kanapin at oicr.on.ca Thu Mar 11 10:56:41 2010 From: Alexander.Kanapin at oicr.on.ca (Alexander Kanapin) Date: Thu, 11 Mar 2010 10:56:41 -0500 Subject: [Bioperl-l] GFF to GTF converter Message-ID: Hi BioPerl gurus, Does anybody know of a reliable GFF to GTF converter which can generate files acceptable to cufflinks? We attempted to convert Drosophila and worm genome GFFs (taken from Flybase and Wormbase ftp) to GTF with Bio::FeatureIO #read from a file my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF'); #write out features my $out = Bio::FeatureIO->new(-file => ">$outfile" , -format => 'GFF' , -version => 2.5); However, we discovered that the resulting file is not compliant with GTF format specifications as they are described here: http://mblab.wustl.edu/GTF22.html Although this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations. Also, we think it misinterprets annotations, so that UTR entries end up annotated as CDSs or exons. Many thanks for ideas/notes. Alex -- Alexander Kanapin, PhD Scientific Associate Ontario Institute for Cancer Research MaRS Centre, South Tower 101 College Street, Suite 800 Toronto, Ontario, Canada M5G 0A3 Tel: 647-260-7993 Toll-free: 1-866-678-6427 www.oicr.on.ca
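To make the attribute difference between the two formats concrete, here is a minimal sketch in plain Perl of rewriting GFF3 exon and CDS rows with GTF-style gene_id/transcript_id attributes. It is not Bio::FeatureIO and not a complete converter: parse_gff3_attributes and gff3_to_gtf_lines are hypothetical helper names, the input rows are made up for illustration, it assumes each mRNA row precedes its children, and it does not create start_codon/stop_codon rows or adjust any coordinates. Whether cufflinks accepts the result has not been tested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a GFF3 attribute column ("ID=x;Parent=y") into a hash.
sub parse_gff3_attributes {
    my ($column) = @_;
    my %attr;
    for my $pair (split /;/, $column) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        $attr{$key} = $value;
    }
    return %attr;
}

# Turn GFF3 exon/CDS rows into GTF-style rows.
# Assumption: every mRNA row appears before its exon/CDS children.
sub gff3_to_gtf_lines {
    my (@gff3_lines) = @_;
    my %gene_of;    # transcript ID => gene ID, learned from mRNA rows
    my @gtf;
    for my $line (@gff3_lines) {
        next if $line =~ /^#/ || $line !~ /\S/;
        chomp(my $copy = $line);
        my @col = split /\t/, $copy;
        next unless @col == 9;
        my %attr = parse_gff3_attributes($col[8]);
        if ($col[2] eq 'mRNA') {
            $gene_of{ $attr{ID} } = $attr{Parent}
                if defined $attr{ID} && defined $attr{Parent};
            next;
        }
        next unless $col[2] eq 'exon' || $col[2] eq 'CDS';
        next unless defined $attr{Parent};
        # A feature shared by several transcripts lists them comma-separated.
        for my $transcript (split /,/, $attr{Parent}) {
            # Fall back to the transcript ID when no gene is known.
            my $gene = exists $gene_of{$transcript}
                ? $gene_of{$transcript}
                : $transcript;
            push @gtf, join("\t", @col[0 .. 7],
                qq{gene_id "$gene"; transcript_id "$transcript";});
        }
    }
    return @gtf;
}

# Made-up GFF3 rows, for illustration only.
my @example = (
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=tx1",
    "chr1\tsrc\tCDS\t150\t300\t.\t+\t0\tParent=tx1",
);
print "$_\n" for gff3_to_gtf_lines(@example);
```

The gene and mRNA rows are consumed only to learn which gene each transcript belongs to; they are not written out, since the sketch emits exon and CDS rows only.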
From cjfields at illinois.edu Thu Mar 11 12:27:35 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 11:27:35 -0600
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9566C3.6000007@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu>
Message-ID: <7D743CA2-80A1-42E3-81D2-03B7CD01FC69@illinois.edu>

Not that I know of, though we are certainly receptive to anyone wanting to work this into the current code.

chris

On Mar 8, 2010, at 3:06 PM, Ryan Golhar wrote:

> Does Bioperl support color-space sequences, or FASTA formatted quality value files?
>
> ABI's Solid platform generates a number of files, two of which are fairly important (at the moment):
>
> 1) .csfasta
>
> Color-space sequences in FASTA format
>
> 2) .qual
>
> Quality values of each color call, also in FASTA format.
>
> I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction?
>
> Ryan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Thu Mar 11 12:35:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Mar 2010 17:35:32 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9566C3.6000007@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu>
Message-ID: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com>

On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote:
> Does Bioperl support color-space sequences, or FASTA formatted quality value
> files?
>
> ABI's Solid platform generates a number of files, two of which are fairly
> important (at the moment):
>
> 1) .csfasta
>
> Color-space sequences in FASTA format
>
> 2) .qual
>
> Quality values of each color call, also in FASTA format.

You mean the QUAL format which was originally introduced by PHRED.
Try "qual" as the format name in SeqIO,
http://bioperl.org/wiki/HOWTO:SeqIO#Formats

> I didn't see (at quick glance) support for this in Bioperl, but maybe
> someone can point me in the right direction?

I expect that (like in Biopython) you can treat color space FASTA + QUAL just like sequence space files, provided you are happy to interpret the color space strings yourself.

Are you hoping to get BioPerl to convert the color space data into sequence space data for you?

Peter

From cjfields at illinois.edu Thu Mar 11 13:02:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 12:02:43 -0600
Subject: [Bioperl-l] GFF to GTF converter
In-Reply-To:
References:
Message-ID: <8CB58FD4-633F-4711-A2F4-23D00AEB6FB8@illinois.edu>

On Mar 11, 2010, at 9:56 AM, Alexander Kanapin wrote:

> Hi BioPerl gurus,
>
> Does anybody knows a reliable GFF to GTF converter which can generate files acceptable by cufflinks ?
>
> We attempted to convert a drosophila and worm genome GFFs (taken from Flybase and Wormbase ftp) to GTF with Bio::FeatureIO
>
> #read from a file
> my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF');
>
> #write out features
> my $out = Bio::FeatureIO->new(-file => ">$outfile" ,
>                               -format => 'GFF' ,
>                               -version => 2.5);
>
> However, we discovered that the resulting file is not compliant with GTF format specifications as they are described here: http://mblab.wustl.edu/GTF22.html

Just so this is clear, even though the FeatureIO docs currently state (and I quote):

"[Bio::FeatureIO] is the officially sanctioned way of getting at the format objects, which most people should use."

it is nowhere near complete, so I have removed said quote from main trunk and replaced it with a very explicit caveat about its current state, i.e. highly experimental and not currently suggested for production use.
It's basically half-baked right now; I am in the midst of refactoring Bio::FeatureIO to try getting it up to speed and to add in flexibility when parsing this data (I'm actually working on it right now), but it's early days on that and may take a bit. Do realize that, even with a refactored FeatureIO, this is one of the more significant problems with GTF, e.g. there are too many definitions of what constitutes GTF or GFF2, so no clear path on how to go about this. At this point most users end up writing up their own parsers, unfortunately. > Although, this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations. > Also, we think it misinterprets annotations, so that one do see UTR entries annotated as CDS' or exons. The start/stop codons can normally be inferred from the CDS/UTRs and exons if they are provided, but again this is one of those issues where there isn't a lot of consistency with the data across various data sources (something addressed at the recent GMOD meeting). What is the source of your GFF? > Many thanks for ideas/notes. > > Alex > > -- > Alexander Kanapin, PhD > Scientific Associate > > Ontario Institute for Cancer Research > MaRS Centre, South Tower > 101 College Street, Suite 800 > Toronto, Ontario, Canada M5G 0A3 > Tel: 647-260-7993 > Toll-free: 1-866-678-6427 > www.oicr.on.ca > This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. 
chris From jessica.sun at gmail.com Thu Mar 11 14:38:21 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Thu, 11 Mar 2010 14:38:21 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation Message-ID: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> *I downloaded module *>* > Bio-SCF from CPAN. *>* > And I am trying to install it when I got the following error. Can *>* someone help? Thanks much in advance Note (probably harmless): No library found for -lstaden-read Writing Makefile for Bio::SCF how to obtain the missing library * -- Jessica Jingping Sun From cjfields at illinois.edu Thu Mar 11 14:49:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 13:49:51 -0600 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <62CF899F-7C31-49F0-8F5E-C99B2179F3A5@illinois.edu> Did you read the documentation for Bio-SCF? http://cpansearch.perl.org/src/LDS/Bio-SCF-1.03/INSTALL chris On Mar 11, 2010, at 1:38 PM, Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? 
Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Mar 11 15:00:58 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 15:00:58 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4536f7701003111200y7d194b3cp2aabb558dcbea5ca@mail.gmail.com> Hello Jessica, You need the Staden io-lib: http://staden.sourceforge.net/ It looks like 1.12.2 is the most recent release. Scott On Thu, Mar 11, 2010 at 2:38 PM, Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From rmb32 at cornell.edu Thu Mar 11 15:02:28 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 11 Mar 2010 12:02:28 -0800 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4B994C54.50501@cornell.edu> Hello Jessica, For Bio-SCF, you have to have the staden package installed. See the INSTALL notes included in the Bio-SCF distribution. The easiest way to view the INSTALL notes for a perl module's distribution: - go to http://search.cpan.org/ - search for 'Bio::SCF' - click the link to the Bio-SCF-1.03 distribution you see in the search results - the page linked here describes the installation package that Bio::SCF comes in. - On that page, you will see a link to the INSTALL notes for it. This is a good thing to know how to do when you have problems with other perl modules as well. But yes, as Chris said, those installation notes direct you to install the staden io-lib libraries from staden.sourceforge.net. Rob Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > From jessica.sun at gmail.com Thu Mar 11 15:49:49 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Thu, 11 Mar 2010 15:49:49 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <4B994C54.50501@cornell.edu> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> Message-ID: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Thanks, I got it resolve. 
Does anyone know how to add a scale to the blast hit image drawn through Bio::Graphics? I mean the rectangles should be of different widths rather than all the same width as in the example shown here:

http://www.bioperl.org/wiki/HOWTO:Graphics

Thanks,

On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote:

> Hello Jessica,
>
> For Bio-SCF, you have to have the staden package installed. See the
> INSTALL notes included in the Bio-SCF distribution.
>
> The easiest way to view the INSTALL notes for a perl module's distribution:
>  - go to http://search.cpan.org/
>  - search for 'Bio::SCF'
>  - click the link to the Bio-SCF-1.03 distribution you see in the search
> results
>  - the page linked here describes the installation package that Bio::SCF
> comes in.
>  - On that page, you will see a link to the INSTALL notes for it.
>
> This is a good thing to know how to do when you have problems with other
> perl modules as well.
>
>
> But yes, as Chris said, those installation notes direct you to install the
> staden io-lib libraries from staden.sourceforge.net.
>
> Rob
>
> Jessica Sun wrote:
>
>> *I downloaded module
>>
>> *>* > Bio-SCF from CPAN.
>> *>* > And I am trying to install it when I got the following error. Can
>> *>* someone help?
Thanks much in advance >> Note (probably harmless): No library found for -lstaden-read >> Writing Makefile for Bio::SCF >> >> how to obtain the missing library >> >> >> * >> >> >> >> > -- Jessica Jingping Sun From scott at scottcain.net Thu Mar 11 16:33:47 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 16:33:47 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Message-ID: <4536f7701003111333q2105c71ftdab0c0b71372ba9f@mail.gmail.com> Hello Jessica, A few things: * It would be better to start a new thread to ask an unrelated question, since people may see the subject of this thread and ignore it if they don't know the answer to the original question. * Can you please try to ask your question again, with more details? Like what have you done already, what was the result, and what would you like for it to look like. If you want it to look like something that is on the wiki, link to that something. The Howto page you linked to has lots of pictures on it. Scott On Thu, Mar 11, 2010 at 3:49 PM, Jessica Sun wrote: > Thanks, I got it resolve. > > Do any one knows how to add a scale of the blast hit image through > Bio:Graphics, I mean the rectangle should be difference width rather than > the same at the example. shown here > > http://www.bioperl.org/wiki/HOWTO:Graphics > > > > Thanks, > > > > On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote: > >> Hello Jessica, >> >> For Bio-SCF, you have to have the staden package installed. ?See the >> INSTALL notes included in the Bio-SCF distribution. 
>>
>> The easiest way to view the INSTALL notes for a perl module's distribution:
>>  - go to http://search.cpan.org/
>>  - search for 'Bio::SCF'
>>  - click the link to the Bio-SCF-1.03 distribution you see in the search
>> results
>>  - the page linked here describes the installation package that Bio::SCF
>> comes in.
>>  - On that page, you will see a link to the INSTALL notes for it.
>>
>> This is a good thing to know how to do when you have problems with other
>> perl modules as well.
>>
>>
>> But yes, as Chris said, those installation notes direct you to install the
>> staden io-lib libraries from staden.sourceforge.net.
>>
>> Rob
>>
>> Jessica Sun wrote:
>>
>>> *I downloaded module
>>>
>>> *>* > Bio-SCF from CPAN.
>>> *>* > And I am trying to install it when I got the following error. Can
>>> *>* someone help? Thanks much in advance
>>> Note (probably harmless): No library found for -lstaden-read
>>> Writing Makefile for Bio::SCF
>>>
>>> how to obtain the missing library
>>>
>>>
>>> *
>>>
>>>
>>>
>>>
>>
>
>
> --
> Jessica Jingping Sun
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From golharam at umdnj.edu Thu Mar 11 21:19:37 2010
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 11 Mar 2010 21:19:37 -0500
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com>
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com>
Message-ID: <4B99A4B9.1070901@umdnj.edu>

Not convert the sequences, just read the sequence file and allow me to process each one individually, sort of like:

$seqio = Bio::SeqIO->new(...);
while ($seq = $seqio->next_seq) { ... } Peter wrote: > On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >> Does Bioperl support color-space sequences, or FASTA formatted quality value >> files? >> >> ABI's Solid platform generates a number of files, two of which are fairly >> important (at the moment): >> >> 1) .csfasta >> >> Color-space sequences in FASTA format >> >> 2) .qual >> >> Quality values of each color call, also in FASTA format. > > You mean the QUAL format which was originally introduced by PHRED. > Try "qual" as the format name in SeqIO, > http://bioperl.org/wiki/HOWTO:SeqIO#Formats > >> I didn't see (at quick glance) support for this in Bioperl, but maybe >> someone can point me in the right direction? > > I expect that (like in Biopython) you can treat color space FASTA + QUAL > just like sequence space files, provided you are happy to interpret the > color space strings yourself. > > Are you hoping to get BioPerl to convert the color space data into > sequence space data for you? > > Peter > From cjfields at illinois.edu Thu Mar 11 22:35:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 21:35:50 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B99A4B9.1070901@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> Message-ID: Ryan, We would have to see example files to get an idea of how feasible it is. You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual stream, and interleave the two somehow. However, BioPerl qual scores are PHRED-based by default, and I'm not sure how color-space data would work within that schematic. chris On Mar 11, 2010, at 8:19 PM, Ryan Golhar wrote: > Not convert the sequences, just read the sequence file and allow me to > process each one individually, sort of like: > > $seqio = new Bio::Seq(...) > while ($seq = $seqio->next_seq) { > ... 
> } > > Peter wrote: >> On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >>> Does Bioperl support color-space sequences, or FASTA formatted quality value >>> files? >>> >>> ABI's Solid platform generates a number of files, two of which are fairly >>> important (at the moment): >>> >>> 1) .csfasta >>> >>> Color-space sequences in FASTA format >>> >>> 2) .qual >>> >>> Quality values of each color call, also in FASTA format. >> You mean the QUAL format which was originally introduced by PHRED. >> Try "qual" as the format name in SeqIO, >> http://bioperl.org/wiki/HOWTO:SeqIO#Formats >>> I didn't see (at quick glance) support for this in Bioperl, but maybe >>> someone can point me in the right direction? >> I expect that (like in Biopython) you can treat color space FASTA + QUAL >> just like sequence space files, provided you are happy to interpret the >> color space strings yourself. >> Are you hoping to get BioPerl to convert the color space data into >> sequence space data for you? >> Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From avilella at gmail.com Fri Mar 12 02:28:20 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 12 Mar 2010 07:28:20 +0000 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A3969F1.8080002@sendu.me.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <4A3969F1.8080002@sendu.me.uk> Message-ID: <358f4d651003112328g2864ef1as7b8c44ce7bb77c82@mail.gmail.com> > I think not. Well, at least SeqFeature::Store doesn't scale. Try storing > millions of features in a database and watch it crawl to complete > unusability. I can't imagine a db scaling to holding hundreds of TB of data > either. I'm also not sure what the benefit is. 
There are already high-speed
> ways of indexing your fastq or bam files.

Hi Sendu,

What are the available options to have a quick indexing of fastq files that can be integrated into bioperl? Bio::Index::fastq can be painfully slow for the latest Illumina runs...

Cheers,

Albert.

From biopython at maubp.freeserve.co.uk Fri Mar 12 05:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Mar 2010 10:06:46 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To:
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu>
Message-ID: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com>

On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote:
> Ryan,
>
> We would have to see example files to get an idea of how feasible it is.
> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual
> stream, and interleave the two somehow. However, BioPerl qual
> scores are PHRED-based by default, and I'm not sure how color-space
> data would work within that schematic.
>
> chris

Chris,

I am under the (possibly mistaken) assumption that PHRED scores are used for SOLiD color space QUAL files - the key issue is each score corresponds to the color call in the color sequence.

Ignoring color-space for a moment, are there BioPerl examples of iterating over a pair of sequence-space FASTA and QUAL files? i.e. What you'd get if you had a FASTQ file to iterate over.
[I guess Ryan could just merge the color-space FASTA and QUAL into a color-space FASTQ file and iterate over that] Peter From cjfields at illinois.edu Fri Mar 12 08:04:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 07:04:53 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> Message-ID: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> On Mar 12, 2010, at 4:06 AM, Peter wrote: > On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >> Ryan, >> >> We would have to see example files to get an idea of how feasible it is. >> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >> stream, and interleave the two somehow. However, BioPerl qual >> scores are PHRED-based by default, and I'm not sure how color-space >> data would work within that schematic. >> >> chris > > Chris, > > I am under the (possibly mistaken) assumption that PHRED scores > are used for SOLiD color space QUAL files - the key issue is each > score corresponds to the color call in the color sequence. > > Ignoring color-space for a moment, are there BioPerl examples > of iterating over a pair of sequence-space FASTA and QUAL files? > i.e. What you'd get if you had a FASTQ file to iterate over. > > [I guess Ryan could just merge the color-space FASTA and > QUAL into a color-space FASTQ file and iterate over that] > > Peter If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things. Iterating over pairs is something that has popped up before. 
For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:

--------------------------------
my $in   = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');

while (my $seq = $in->next_seq) {
    $out1->write_fasta($seq);   # sequence stream
    $out2->write_qual($seq);    # quality stream
}
--------------------------------

Note that all use the 'fastq' format instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise).

I'm assuming for input it would be something like:

--------------------------------
my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual');
my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq');

# 'qual' parser joins the two streams
while (my $seq = $in2->next_seq($in1)) {
    $out->write_seq($seq);
}
--------------------------------

chris

From biopython at maubp.freeserve.co.uk Fri Mar 12 08:26:39 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Mar 2010 13:26:39 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9A3D14.3010208@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> <4B9A3D14.3010208@umdnj.edu>
Message-ID: <320fb6e01003120526x7c0c3dddjb4e1422a41968894@mail.gmail.com>

On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote:
>
> Here is an example of a color-space sequence:
>
> In one file (something.csfasta):
>
>>1_30_226_F3
> T210320010.200.03.0110320320220212200122200.2220200
>>1_30_252_F3
> T322220212.133.00.2202322132022202221002011.0011020
>
> The '.'
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter From golharam at umdnj.edu Fri Mar 12 08:43:01 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 13:43:01 +0000 Subject: [Bioperl-l] Next Gen Formats Message-ID: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Direct from sequencing machine ------Original Message------ From: Peter Sender: p.j.a.cock at googlemail.com To: golharam at umdnj.edu Cc: Chris Fields Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Next Gen Formats Sent: Mar 12, 2010 8:26 AM On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: > > Here is an example of a color-space sequence: > > In one file (something.csfasta): > >>1_30_226_F3 > T210320010.200.03.0110320320220212200122200.2220200 >>1_30_252_F3 > T322220212.133.00.2202322132022202221002011.0011020 > > The '.' 
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter Sent from my Verizon Wireless BlackBerry From cjfields at illinois.edu Fri Mar 12 09:06:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 08:06:51 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... 
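The mapping referred to above can be sketched in a few lines of plain Perl. This is an illustration only, not BioPerl code: decode_colorspace is a hypothetical helper, and it assumes the usual SOLiD di-base scheme in which, with A, C, G, T numbered 0 to 3, each color is the XOR of the codes of the two adjacent bases. A '.' (no call) makes every later base unknown, so those positions are reported as N.

```perl
use strict;
use warnings;

# 2-bit base codes under which a color is the XOR of two adjacent bases
# (assumed di-base scheme: 0 = same base, e.g. AA; 1 = AC/GT; 2 = AG/CT; 3 = AT/CG).
my %code  = (A => 0, C => 1, G => 2, T => 3);
my @bases = qw(A C G T);

# Hypothetical helper: decode a color-space read such as "T2103" into bases.
# The leading letter is the known key base and is not part of the output.
sub decode_colorspace {
    my ($read) = @_;
    my ($key, @colors) = split //, uc $read;
    die "bad key base\n" unless defined $key && exists $code{$key};
    my $prev = $code{$key};
    my $out  = '';
    for my $c (@colors) {
        if (defined $prev && $c =~ /^[0-3]$/) {
            $prev = $prev ^ ($c + 0);   # numeric XOR gives the next base code
            $out .= $bases[$prev];
        }
        else {
            undef $prev;                # a missing color breaks the chain
            $out .= 'N';
        }
    }
    return $out;
}

print decode_colorspace('T210320010.200'), "\n";
```

Because each base depends on the one before it, this kind of direct decoding is only as good as the color calls it is given.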
chris On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > Direct from sequencing machine > > ------Original Message------ > From: Peter > Sender: p.j.a.cock at googlemail.com > To: golharam at umdnj.edu > Cc: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Next Gen Formats > Sent: Mar 12, 2010 8:26 AM > > On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >> Here is an example of a color-space sequence: >> >> In one file (something.csfasta): >> >>> 1_30_226_F3 >> T210320010.200.03.0110320320220212200122200.2220200 >>> 1_30_252_F3 >> T322220212.133.00.2202322132022202221002011.0011020 >> >> The '.' means the color could not be called >> >> In another file (something.qual): >> >>> 1_30_226_F3 >> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> 1_30_252_F3 >> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >> >> The -1 represents those colors that could not be called. > > Now that is funny (using -1). True PHRED scores are defined with a > logarithm and can't be negative. A score of zero is normally used in > this situation since that maps to a probability of error of 1 (i.e. the > read is 100% wrong, or 0% true). > > Where did these files come from? Direct from a sequencing > machine or via some third party script? 
> > Peter > > > Sent from my Verizon Wireless BlackBerry > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Fri Mar 12 08:09:40 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 08:09:40 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> Message-ID: <4B9A3D14.3010208@umdnj.edu> Here is an example of a color-space sequence: In one file (something.csfasta): >1_30_226_F3 T210320010.200.03.0110320320220212200122200.2220200 >1_30_252_F3 T322220212.133.00.2202322132022202221002011.0011020 The '.' means the color could not be called In another file (something.qual): >1_30_226_F3 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >1_30_252_F3 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 The -1 represents those colors that could not be called. Chris Fields wrote: > On Mar 12, 2010, at 4:06 AM, Peter wrote: > >> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >>> Ryan, >>> >>> We would have to see example files to get an idea of how feasible it is. >>> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >>> stream, and interleave the two somehow. However, BioPerl qual >>> scores are PHRED-based by default, and I'm not sure how color-space >>> data would work within that schematic. 
>>>
>>> chris
>> Chris,
>>
>> I am under the (possibly mistaken) assumption that PHRED scores
>> are used for SOLiD color space QUAL files - the key issue is each
>> score corresponds to the color call in the color sequence.
>>
>> Ignoring color-space for a moment, are there BioPerl examples
>> of iterating over a pair of sequence-space FASTA and QUAL files?
>> i.e. What you'd get if you had a FASTQ file to iterate over.
>>
>> [I guess Ryan could just merge the color-space FASTA and
>> QUAL into a color-space FASTQ file and iterate over that]
>>
>> Peter
>
> If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things.
>
> Iterating over pairs is something that has popped up before. For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:
>
> --------------------------------
> my $in   = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
> my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
> my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');
>
> while (my $seq = $in->next_seq) {
>     $out1->write_fasta($seq);
>     $out2->write_qual($seq);
> }
> --------------------------------
>
> Note that all use the 'fastq' format instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise).
> > I'm assuming for input it would be something like: > > -------------------------------- > my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta'); > my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual'); > my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq'); > > # 'qual' parser joins the two streams > while (my $seq = $in2->next_seq($in1)) { > $out->write_seq($seq); > } > -------------------------------- > > chris > > From pmiguel at purdue.edu Fri Mar 12 09:56:33 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 12 Mar 2010 09:56:33 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: <4B9A5621.2020006@purdue.edu> Hi Chris, Converting back and forth from color space is something that would be needed. However, a warning for anyone working with color space data: It is a really bad idea to convert raw color space reads into sequence. This is because conversion propagates from the key base on the left to the right. A sequence error *anywhere* in the sequence will ensure all bases farther down will be converted on the wrong track. Analogous to a "frame shift" -- except there are 4 "frames", not 3. Meanwhile, the converse is not true--sequence space bases can be converted into color space without error propagation. So you want to do all your work in color space and convert to real sequence only at the end, when your consensus certain. A little more detail here: http://seqanswers.com/forums/showthread.php?t=3367 For people wanting to use a non-color space aware program for analysis of color space data, it is possible to use a process called "double encoding", where 0,1,2,3 bases of color space are just replaced with A, C, G, T of a "fake" base space. This is nearly the same as working in color space and does not incur the propagation error issues. 
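To make the propagation point concrete, here is a minimal Perl sketch (not from the original post). It assumes the usual SOLiD two-bit convention A=0, C=1, G=2, T=3, under which each color is the XOR of the codes of the two adjacent bases, and it assumes upper-case A/C/G/T input. The sub names cs_to_bases, bases_to_cs and double_encode are made up for illustration and are not BioPerl methods.

```perl
use strict;
use warnings;

# Assumed two-bit base codes; a color is the XOR of two adjacent base codes.
my @BASES = qw(A C G T);
my %CODE  = (A => 0, C => 1, G => 2, T => 3);

# Decode a color-space read such as 'T2103' (key base, then colors) into bases.
# The key base itself is not part of the returned sequence. Decoding stops at
# the first uncalled color ('.'), because nothing after it can be trusted.
sub cs_to_bases {
    my ($read) = @_;
    my ($key, @colors) = split //, $read;
    my $prev = $CODE{$key};
    my $seq  = '';
    for my $color (@colors) {
        last if $color eq '.';
        $prev = $prev ^ (0 + $color);    # 0 + forces a numeric XOR
        $seq .= $BASES[$prev];
    }
    return $seq;
}

# Encode bases into color space, given the key base that precedes them.
sub bases_to_cs {
    my ($key, $seq) = @_;
    my $prev = $CODE{$key};
    my $read = $key;
    for my $base (split //, $seq) {
        my $cur = $CODE{$base};
        $read .= ($prev ^ $cur);
        $prev = $cur;
    }
    return $read;
}

# "Double encoding": drop the key base and rename colors 0-3 as A, C, G, T
# (an uncalled '.' becomes N). No decoding is involved.
sub double_encode {
    my ($read) = @_;
    my $colors = substr($read, 1);
    $colors =~ tr/0123./ACGTN/;
    return $colors;
}

print cs_to_bases('T2103'), "\n";       # CAAT
print cs_to_bases('T3103'), "\n";       # ACCG (one wrong color, every base differs)
print bases_to_cs('T', 'CAAT'), "\n";   # T2103
print double_encode('T2103'), "\n";     # GCAT
print double_encode('T3103'), "\n";     # TCAT (differs at one position only)
```

A single wrong color at the start of the read changes every decoded base, whereas the double-encoded string changes at that one position only, since double encoding is a plain character substitution.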
However it is fraught with the obvious problems: you might later confuse the double encoded sequence with true sequence space with likely maddening results. Also, to get the opposite strand of color space reads you reverse without complementing. So top and bottom strands will look different. Finally, Kevin McKernan said that the dual base encoding error-detection scheme was technically using "Perforated Convolutional Codes" and said these were used on 3G networks. I only mention this in case there are some engineering types who might be interested. Phillip Chris Fields wrote: > For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): > > http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf > > Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... > > chris > > On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > > >> Direct from sequencing machine >> >> ------Original Message------ >> From: Peter >> Sender: p.j.a.cock at googlemail.com >> To: golharam at umdnj.edu >> Cc: Chris Fields >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Next Gen Formats >> Sent: Mar 12, 2010 8:26 AM >> >> On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >>> Here is an example of a color-space sequence: >>> >>> In one file (something.csfasta): >>> >>> >>>> 1_30_226_F3 >>>> >>> T210320010.200.03.0110320320220212200122200.2220200 >>> >>>> 1_30_252_F3 >>>> >>> T322220212.133.00.2202322132022202221002011.0011020 >>> >>> The '.' 
means the color could not be called
>>>
>>> In another file (something.qual):
>>>
>>>
>>>> 1_30_226_F3
>>>>
>>> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20
>>> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4
>>>
>>>> 1_30_252_F3
>>>>
>>> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5
>>> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12
>>>
>>> The -1 represents those colors that could not be called.
>>>
>> Now that is funny (using -1). True PHRED scores are defined with a
>> logarithm and can't be negative. A score of zero is normally used in
>> this situation since that maps to a probability of error of 1 (i.e. the
>> read is 100% wrong, or 0% true).
>>
>> Where did these files come from? Direct from a sequencing
>> machine or via some third party script?
>>
>> Peter
>>
>>
>> Sent from my Verizon Wireless BlackBerry
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jason at bioperl.org Fri Mar 12 10:44:35 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 12 Mar 2010 07:44:35 -0800
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To: <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com>
References: <4B96B442.8070003@bioperl.org> <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com>
Message-ID: <4B9A6163.9060407@bioperl.org>

I'm sure it does; that is what it is supposed to do. I don't know of any way to get what you want directly from the code, since the format you want is not a standard multiple-alignment output format. You might consider clustalw format, which shows the identical columns with '*', and you can keep the start/stop of the alignment embedded in the sequence names.

Or you can extract the code that does the writing from the writer module and dig out what you need. You're asking for a customized view that is not standard, and the tools for it are in the existing code, so you need to roll your own view from it. This would just mean another ResultWriter module that looks a lot like the existing one but doesn't write the header, footer and hit table out - so those methods would just not do anything...

-jason

Janine Arloth wrote, On 3/12/10 12:40 AM:
> Hi,
> thanks...
> but
>
> use Bio::SearchIO;
> use Bio::SearchIO::Writer::TextResultWriter;
>
> my $in = Bio::SearchIO->new(-format => 'blast',
>                             -file   => shift @ARGV);
>
> my $writer = Bio::SearchIO::Writer::TextResultWriter->new();
> my $out = Bio::SearchIO->new(-writer => $writer);
> $out->write_result($in->next_result);
>
> gives me the whole result, but I only need the alignment ;(
> On 09.03.2010 at 21:49, Jason Stajich wrote:
>
>
>> SearchIO writer -> BLAST format. presumably something like Bio::SearchIO::Writer::TextResultWriter
>>
>> Janine Arloth wrote, On 3/5/10 1:43 AM:
>>
>>> Hello,
>>> using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment:
>>>
>>>                  1                                                   50
>>> test/1-85        ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
>>> lcl|3013/20-104  ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
>>>
>>>
>>>                  51                                                 100
>>> test/1-85        TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
>>> lcl|3013/20-104  TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
>>>
>>>
>>>
>>> But I prefer this format:
>>>
>>>
>>>
>>> Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
>>>            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>>> Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79
>>>
>>> Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
>>>            |||||||||||||||||||||||||
>>> Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104
>>>
>>>
>>> How can I get this?
>>>
>>> Best Regards
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>
>

From maj at fortinbras.us Fri Mar 12 10:45:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 12 Mar 2010 10:45:15 -0500
Subject: [Bioperl-l] distances between leaf nodes
In-Reply-To:
References:
Message-ID: <31AA49FD0FDD466CB349ABAE75591B26@NewLife>

Along with Jason's comment, you'll need to loop through the node pairs by hand:

my @leaves = $tree->get_leaf_nodes;
my @dists;
while (my $l = shift @leaves) {
    foreach my $m (@leaves) {
        push @dists, $tree->distance( -nodes => [$l, $m] );
    }
}

This should give you all n(n-1)/2 pairwise distances.

----- Original Message -----
From: "Jeffrey Detras"
To:
Sent: Friday, March 05, 2010 1:17 AM
Subject: [Bioperl-l] distances between leaf nodes

> Hi,
>
> I am new at using the Bio::TreeIO module specifically using the newick
> format for a phylogenetic analysis. The sample_tree attached is a
> Newick-formatted tree. My objective is to get all the distances between all
> the leaf nodes. I copied examples of the code from
> http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my
> knowledge) so that I understand how to assign the right array value for the
> nodes/leaves. The message would say must provide 2 root nodes.
>
> Here is what I have right now:
>
> #!/usr/bin/perl -w
> use strict;
>
> my $treefile = 'sample_tree';
> use Bio::TreeIO;
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                               -file   => $treefile);
>
> while (my $tree = $treeio->next_tree) {
>     my @leaves = $tree->get_leaf_nodes;
>     for (my $dist = $tree->distance(-nodes => \@leaves)){
>         print "Distance between trees is $dist\n";
>     }
> }
>
> Thanks,
> Jeff
> --------------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From rtbio.2009 at gmail.com Fri Mar 12 12:36:44 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 12 Mar 2010 18:36:44 +0100
Subject: [Bioperl-l] remoteblast
In-Reply-To:
References:
Message-ID:

Hello all,

I am trying the remote BLAST program and connecting to NCBI BLAST, but I am unable to retrieve the sequences. Chris had suggested that I update from SVN. Could you please tell me how to update it from SVN?

Regards,
Roopa.

On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote:

> Hi Chris,
>
> Thank you very much for the information. Could you please tell me how to
> update it from SVN?
>
> Thanks and regards,
> Roopa
>
>
> On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote:
>
>> Roopa,
>>
>> I committed a fix for this a few days ago; if you update from SVN it
>> should work. The problem stemmed from server-side changes at NCBI.
>>
>> chris
>>
>> On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote:
>>
>> > Hello Mark and everybody,
>> >
>> > I have been trying to connect to remote blast to retrieve similar sequences
>> > to a given sequence. But my program is unable to retrieve the sequences from
>> > BLAST, i.e., it is getting executed till the remote blast ids, but it is not
>> > entering the else loop after collecting the rid. Please check this problem
>> > and help me in this regard.
I think the problem is in getting the >> sequence >> > and going to the 'else' part. i.e., >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > >> > Please give me your reply. >> > >> > Thanks and regards, >> > Roopa. >> > >> > My code is as follows. >> > >> > #!/usr/bin/perl >> > >> > #path for extra camel module >> > use lib "/srv/www/htdocs/rain/RNAi/"; >> > use rnai_blast; >> > >> > >> > use Bio::SearchIO; >> > use Bio::Search::Result::BlastResult; >> > use Bio::Perl; >> > use Bio::Tools::Run::RemoteBlast; >> > use Bio::Seq; >> > use Bio::SeqIO; >> > use Bio::DB::GenBank; >> > >> > $serverpath = "/srv/www/htdocs/rain/RNAi"; >> > $serverurl = "http://141.84.66.66/rain/RNAi"; >> > $outfile = $serverpath."/rnairesult_".time().".html"; >> > $nuc = $serverpath."/nuc".time().".txt"; >> > $debugfile = $serverpath."/debug_".time().".txt"; >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > my $outstring =""; >> > >> > &parse_form; >> > >> > print "Content-type: text/html\n\n"; >> > print "\n"; >> > print "RNAi Result"; >> > print "> > URL=$serverurl/rnairesult_".time().".html\"> \n"; >> > print "\n"; >> > print "\n"; >> > print " Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
"; >> > print " Please be patient, runtime can be up to 5 minutes
"; >> > print " This page will automatically reload in 30 seconds."; >> > print "\n"; >> > print "\n"; >> > >> > defined(my $pid = fork) or die "Can't fork: $!"; >> > exit if $pid; >> > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >> > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >> > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >> > >> > >> > >> > open(OUTFILE, '>',$outfile); >> > >> > print OUTFILE "\n >> > RNAi Result >> > > > URL=$serverurl//rnairesult_".time().".html\"> \n >> > >> > \n >> > \n >> > Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
>> > Please be patient, runtime can be up to 5 minutes
>> > This page will automatically reload in 30 seconds
>> > \n >> > \n"; >> > >> > close(OUTFILE); >> > >> > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >> > >> > $in{'Inputseq'} =~ s/>.*$//m; >> > $in{'Inputseq'} =~ s/[^TAGC]//gim; >> > $in{'Inputseq'} =~ tr/actg/ACTG/; >> > >> > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >> > $in{'Threshold'}); >> > >> > >> > sub blastcode >> > { >> > >> > $inpu1= $_[0]; >> > >> > $organ= $_[1]; >> > >> > open(NUC,'>',$nuc); >> > print NUC $inpu1,"\n"; >> > close(NUC); >> > >> > my $prog = 'blastn'; >> > my $db = 'refseq_rna'; >> > my $e_val= '1e-10'; >> > my $organism= $organ; >> > >> > $gb = new Bio::DB::GenBank; >> > >> > my @params = ( '-prog' => $prog, >> > '-data' => $db, >> > '-expect' => $e_val, >> > '-readmethod' => 'SearchIO', >> > '-Organism' => $organism ); >> > >> > open(OUTFILE,'>',$blastdebugfile); >> > print OUTFILE @params; >> > close(OUTFILE); >> > >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >> => >> > "$organ\[ORGN]"); >> > >> > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > #change a paramter >> > >> > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> > Brucei[ORGN]'; >> > >> > #change a paramter >> > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >> '$input2[ORGN]'; >> > >> > my $v = 1; >> > #$v is just to turn on and off the messages >> > >> > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >> > '-organism' => "$organ\[ORGN]"); >> > >> > while (my $input = $str->next_seq()) >> > { >> > #Blast a sequence against a database: >> > #Alternatively, you could pass in a file with many >> > #sequences rather than loop through sequence one at a time >> > #Remove the loop starting 'while (my $input = $str->next_seq())' >> > #and swap the two lines below for an example of that. 
>> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $input; >> > close(OUTFILE); >> > >> > #submits the input data to BLAST# >> > >> > my $r = $factory->submit_blast($input); >> > >> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $r; >> > close(OUTFILE); >> > >> > >> > print STDERR "waiting...." if($v>0); >> > >> > while ( my @rids = $factory->each_rid ) { >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "while entered"; >> > close(OUTFILE); >> > foreach my $rid ( @rids ) { >> > >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "foreach entered"; >> > close(OUTFILE); >> > #Retrieving the result ids# >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) >> > { >> > if( $rc < 0 ) >> > { >> > $factory->remove_rid($rid); >> > } >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "if entered"; >> > close(OUTFILE); >> > print STDERR "." if ( $v > 0 ); >> > sleep 5; >> > } >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > print BLASTDEBUGFILE $result->next_hit(); >> > close(BLASTDEBUGFILE); >> > #saving the output in blastdata.time.out file# >> > >> > # $random=rand(); >> > >> > my $filename = $serverpath."/blastdata_".time()."\.out"; >> > # open(DEBUGFILE,'>',$debugfile); >> > # open(new,'>',$filename); >> > # @arra=; >> > # print DEBUGFILE @arra; >> > # close(DEBUGFILE); >> > # close(new); >> > >> > $factory->save_output($filename); >> > >> > # open(BLASTDEBUGFILE,'>',$debugfile); >> > # print BLASTDEBUGFILE "Hello $rid"; >> > # close(BLASTDEBUGFILE); >> > >> > $factory->remove_rid($rid); >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > # print BLASTDEBUGFILE $organism; >> > 
close(BLASTDEBUGFILE); >> > >> > # open(OUTFILE,'>',$outfile); >> > # print OUTFILE "Test2 $result->database_name()"; >> > # close(OUTFILE); >> > >> > #$hit = $result->next_hit; >> > #open(new,'>',$debugfile); >> > #print $hit; >> > #close(new); >> > $dummy=0; >> > while ( my $hit = $result->next_hit ) { >> > >> > next unless ( $v >= 0); >> > >> > # open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "$hit in while hits"; >> > # close(OUTFILE); >> > >> > my $sequ = $gb->get_Seq_by_version($hit->name); >> > my $dna = $sequ->seq(); # get the sequence as a string >> > $dummy++; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $dna; >> > close(OUTFILE); >> > push(@seqs,$dna); >> > } >> > } >> > } >> > } >> > } >> > >> > $warum=@seqs; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $warum; >> > print OUTFILE @seqs; >> > close(OUTFILE); >> > >> > >> > return(@seqs); #returning the sequences obtained on BLAST# >> > } >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >
From bosborne11 at verizon.net Fri Mar 12 12:46:52 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 12 Mar 2010 12:46:52 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Please google "svn update bioperl". On Mar 12, 2010, at 12:36 PM, Roopa Raghuveer wrote: > Hello all, > > I am trying remote blast program and connecting to NCBI Blast, but I am > unable to retrieve the sequences. Chris had suggested me to update from SVN. > Could you please tell me how to update it from SVN? > > Regards, > Roopa.
From maj at fortinbras.us Fri Mar 12 12:41:23 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 12:41:23 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Look at http://www.bioperl.org/wiki/Using_Subversion ----- Original Message ----- From: Roopa Raghuveer To: Chris Fields ; Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Friday, March 12, 2010 12:36 PM Subject: Re: [Bioperl-l] remoteblast Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa.
On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: Hi Chris, Thank you very much for the information. Could you please tell me how to update it from SVN? Thanks and regards, Roopa On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use rnai_blast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds."; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes
> This page will automatically reload in 30 seconds
> \n > \n"; > > close(OUTFILE); > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$blastdebugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > "$organ\[ORGN]"); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => "$organ\[ORGN]"); > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > #submits the input data to BLAST# > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > #Retrieving the result ids# > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > #saving the output in blastdata.time.out file# > > # $random=rand(); > > my $filename = $serverpath."/blastdata_".time()."\.out"; > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > # print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a 
string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dna; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > close(OUTFILE); > > > return(@seqs); #returning the sequences obtained on BLAST# > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jessica.sun at gmail.com Fri Mar 12 16:28:11 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 12 Mar 2010 16:28:11 -0500 Subject: [Bioperl-l] RefSeq Message-ID: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> I have a question: I have a refseq with NM_ number(mRNA), how can I get the genomic sequences(NT_number) with Bioperl, if it can be done? Thanks -- Jessica Jingping Sun From sidd.basu at gmail.com Sat Mar 13 15:29:52 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Sat, 13 Mar 2010 14:29:52 -0600 Subject: [Bioperl-l] Re: RefSeq In-Reply-To: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> Message-ID: <20100313202949.GA5621@Macintosh-74.local> The following code works with 1.6.1 of bioperl. It uses eutils and the workflow esearch -> elink -> esummary.
#!/usr/bin/perl -w use strict; use Bio::DB::EUtilities; my $id = $ARGV[0] || 'NM_001618'; my $eutils = Bio::DB::EUtilities->new( -eutil => 'esearch', -db => 'nucleotide', -term => $id, -usehistory => 'y' ); my $hist = $eutils->next_History || die "no history\n"; $eutils->reset_parameters( -eutil => 'elink', -db => 'gene', -dbfrom => 'nuccore', -history => $hist ); my ($gene_id) = $eutils->next_LinkSet->get_ids; $eutils->reset_parameters( -eutil => 'esummary', -db => 'gene', -id => $gene_id, ); my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType'); print $item->get_contents_by_name('ChrAccVer'), "\n"; -siddhartha On Fri, 12 Mar 2010, Jessica Sun wrote: > I have a question: I have a refseq with NM_ number(mRNA), how can I get > the genomic sequences(NT_number) with Bioperl, if it can be done? > > Thanks > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robby.hones at gmail.com Sat Mar 13 18:57:43 2010 From: robby.hones at gmail.com (robby jhones) Date: Sat, 13 Mar 2010 15:57:43 -0800 Subject: [Bioperl-l] comparing fasta sequences in multiple files Message-ID: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> Dear Group, Can anyone offer advice on comparing multiple fasta sequences in many files. We have 1000's of fasta sequences in individual files of which I would like to fish out and print to a new file (the sequence and ID), ONLY the sequences which appear in at least a few of the files: 3 out of 4 runs, perhaps all 4 runs ( as some are replicates). Is there something out there which would do this? 
Thanks for your help >>Robby From sdavis2 at mail.nih.gov Sat Mar 13 19:49:46 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat, 13 Mar 2010 19:49:46 -0500 Subject: [Bioperl-l] comparing fasta sequences in multiple files In-Reply-To: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> Message-ID: <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones wrote: > Dear Group, > > Can anyone offer advice on comparing multiple fasta sequences in many > files. We have 1000's of fasta sequences in individual files of which I > would like to fish out and print to a new file (the sequence and ID), ONLY > the sequences which appear in at least a few of the files: 3 out of 4 runs, > perhaps all 4 runs ( as some are replicates). > > Is there something out there which would do this? Hi, Robby. It sounds like making a hash of IDs and then incrementing a count for each as you loop over files would give you what you want? Sean From jessica.sun at gmail.com Sat Mar 13 20:29:08 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Sat, 13 Mar 2010 20:29:08 -0500 Subject: [Bioperl-l] RefSeq In-Reply-To: <20100313202949.GA5621@Macintosh-74.local> References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> <20100313202949.GA5621@Macintosh-74.local> Message-ID: <9adc0e9b1003131729p4f78aa50kc1500cbbe01cd815@mail.gmail.com> Great. Thanks. On Sat, Mar 13, 2010 at 3:29 PM, Siddhartha Basu wrote: > The following code works with 1.6.1 of bioperl. It uses eutils and the > workflow esearch -> elink -> esummary.
> > #!/usr/bin/perl -w > > use strict; > use Bio::DB::EUtilities; > > my $id = $ARGV[0] || 'NM_001618'; > > my $eutils = Bio::DB::EUtilities->new( > -eutil => 'esearch', > -db => 'nucleotide', > -term => $id, > -usehistory => 'y' > ); > > my $hist = $eutils->next_History || die "no history\n"; > > $eutils->reset_parameters( > -eutil => 'elink', > -db => 'gene', > -dbfrom => 'nuccore', > -history => $hist > ); > > my ($gene_id) = $eutils->next_LinkSet->get_ids; > > $eutils->reset_parameters( > -eutil => 'esummary', > -db => 'gene', > -id => $gene_id, > ); > > my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType'); > print $item->get_contents_by_name('ChrAccVer'), "\n"; > > -siddhartha > > On Fri, 12 Mar 2010, Jessica Sun wrote: > > > I have a question: I have a refseq with NM_ number(mRNA), how can I get > > the genomic sequences(NT_number) with Bioperl, if it can be done? > > > > Thanks > > > > > > -- > > Jessica Jingping Sun > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jessica Jingping Sun From sdavis2 at mail.nih.gov Sun Mar 14 08:38:15 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sun, 14 Mar 2010 07:38:15 -0500 Subject: [Bioperl-l] comparing fasta sequences in multiple files In-Reply-To: <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com> References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com> <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com> Message-ID: <264855a01003140538m6cee0c27s823e45d02002d200@mail.gmail.com> On Sun, Mar 14, 2010 at 2:12 AM, robby jhones wrote: > I think that I'll need to write a hash of the IDs and sequences, then > 
iterate over the sequences to see if they are identical and if so push them > and the ID into an output file. I was hoping there was something out there > like this, but I suppose not. Look in the mailing list archives for the last week or so. There was some discussion about generating hashes of sequences; you could use that to generate your hash of unique sequences. Sean > On Sat, Mar 13, 2010 at 4:49 PM, Sean Davis wrote: >> >> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones >> wrote: >> > Dear Group, >> > >> > Can anyone offer advice on comparing multiple fasta sequences in many >> > files. We have 1000's of fasta sequences in individual files of which I >> > would like to fish out and print to a new file (the sequence and ID), >> > ONLY >> > the sequences which appear in at least a few of the files: 3 out of 4 >> > runs, >> > perhaps all 4 runs ( as some are replicates). >> > >> > Is there something out there which would do this? >> >> Hi, Robby. >> >> It sounds like making a hash of IDs and then incrementing a count for >> each as you loop over files would give you what you want? >> >> Sean > > From lpritc at scri.ac.uk Mon Mar 15 07:55:52 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 15 Mar 2010 11:55:52 +0000 Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts In-Reply-To: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com> Message-ID: Hi Scott, Thanks for the reply. I tried your suggestions on a clean VM of CentOS 5.4 and the equally wordy outcome is below... On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain" wrote: > First, I am working on the 1.1 release of gmod/chado, and it > may fix some of the problems you are describing. Certainly, ID > collisions between GFF files should not be a problem (I didn't think > they were in the 1.0 release, but that was a long time ago).
Please > try a checkout of the schema trunk in the gmod svn: > > http://gmod.org/wiki/SVN As a note for anyone following this, when I downloaded the trunk/chado files only, my build failed with """ $make [...] Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm Manifying ../blib/man3/Bio::Chaos::Root.3pm make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml' make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by `blib/script/gmod_gff2biomart5.pl'. Stop. """ I had to download the whole trunk for the installation to work. I came across this thread: http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html while I was looking for a solution; someone else has had a similar problem. > Another thing you may want to look at is that just last week, a > developer at Texas A&M, Nathan Liles, contributed code to the > bioperl-live trunk for the genbank2gff3.pl script that will do a much > better job of converting bacterial genbank files to GFF3; perhaps that > will help too. Working with a svn checkout of bioperl-live shouldn't > be too scary either; the pieces you are interested in (that work with > Chado and GBrowse) are quite stable. I also checked out BioPerl-live. The svn server at code.open-bio.org was unresponsive for a couple of days, but Peter pointed me to GitHub at http://github.com/bioperl/bioperl-live so I went from there. The process isn't quite as clean as using the latest stable version of BioPerl, however. When I attempt to use the bp_genbank2gff3.pl script, I get the following error message: """ [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk Can't locate object method "FT_SO_map" via package "Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line 374. """ This appears to be associated with the following code (l207 onwards...) in TypeMapper: """ =head2 map_types_to_SO [...] 
hardcodes the genbank to SO mapping [...] dgg: separated out FT_SO_map for caller changes. Update with: open(FTSO,"curl -s http://sequenceontology.org/resources/mapping/FT_SO.txt|"); while(<FTSO>){ chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t"; print " '$ft' => '$so',\n" if($ft && $so && $ftdef); } =cut sub ft_so_map { # $self= shift; """ The upper/lower case function declaration seems to be important, as changing it back to "sub FT_SO_map" lets the script work: """ [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk # Input: NC_004547.gbk # working on region:NC_004547, Erwinia carotovora subsp. atroseptica SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, complete genome. # GFF3 saved to ./NC_004547.gbk.gff # Summary: # Feature Count # ------- ----- # repeat_region 19 # sequence_variant 2 # repeat_unit 2 # gene 4614 # region 17387 # exon 4597 # RESIDUES 5064019 # """ Obviously, this is another unsatisfactory sucky ad hoc post-install hack; I hope I'm doing the right sort of thing, there. I'm not familiar with BioPerl so I'm not clear on why this change was made to the interface (it's part of the recent changes by Nathan Liles you referred to in your post: http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37ddcd8a77a03), but it also seems to break bp_genbank2gff3.pl. Also, the --noCDS flag appears to have no effect at all when using the new version of bp_genbank2gff3.pl. The old version of bp_genbank2gff3.pl appears to recognise more feature types in the summary: """ [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk # Input: NC_004547.gbk # working on region:NC_004547, Erwinia carotovora subsp. atroseptica SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, complete genome.
# GFF3 saved to ./NC_004547.gbk.gff # Summary: # Feature Count # ------- ----- # mRNA 4472 # sequence_variant 2 # gene 4594 # region 8275 # pseudogene 20 # CDS 4472 # RESIDUES(tr) 1433791 # RESIDUES 5064019 # rRNA 22 # processed_transcript 24 # repeat_region 19 # pseudogenic_region 46 # repeat_unit 2 # exon 4597 # tRNA 76 # """ and this is reflected in the substantial difference in GFF3 output, for issuing exactly the same command when moving from BioPerl 1.6.1 to bioperl-live: we get different GFF3 output that represents a different gene model. I wasn't expecting so radical a change, but at least the IDs are based on the locus_tag with the new script, and this appears to solve my problem with clashing feature IDs on the files I was using. Many thanks for your help, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. 
Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________
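A minimal sketch of the method-name mismatch reported in the bp_genbank2gff3.pl message above, not taken from BioPerl itself. Perl method names are case-sensitive, so a caller asking for FT_SO_map fails when the class only defines ft_so_map. The package My::TypeMapper and the helper lookup_map are hypothetical stand-ins (the real class is Bio::SeqFeature::Tools::TypeMapper), and the two map entries are made up for illustration. The helper uses the standard can() method to try either spelling, which is one way a calling script might tolerate both versions of a module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqFeature::Tools::TypeMapper: only the
# lower-case method name is defined, as in the snippet quoted above.
package My::TypeMapper;
sub new       { return bless {}, shift }
sub ft_so_map { return { CDS => 'CDS', gene => 'gene' } }   # illustrative entries only

package main;

# Hypothetical helper: return the feature-type map using whichever
# spelling of the method the object actually provides.
sub lookup_map {
    my ($mapper) = @_;
    for my $name (qw(FT_SO_map ft_so_map)) {
        my $method = $mapper->can($name) or next;   # undef if not defined
        return $mapper->$method();
    }
    die "no FT_SO_map/ft_so_map method on ", ref($mapper), "\n";
}

my $mapper = My::TypeMapper->new;

# Calling the upper-case name directly dies at run time with a
# "Can't locate object method" error, like the one reported above.
my $ok = eval { $mapper->FT_SO_map; 1 };
print "direct call failed: $@" unless $ok;

# The tolerant lookup succeeds with either spelling.
my $map = lookup_map($mapper);
print join(", ", sort keys %$map), "\n";
```

Renaming the sub back to FT_SO_map in the module, as described in the message, addresses the same mismatch from the other side; the can() lookup is only an assumption about how a caller could cope with both.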
From scott at scottcain.net Mon Mar 15 10:55:17 2010 From: scott at scottcain.net (Scott Cain) Date: Mon, 15 Mar 2010 10:55:17 -0400 Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts In-Reply-To: References: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com> Message-ID: <4536f7701003150755w2c2875fbob004bc03cf3387ab@mail.gmail.com> Hi Leighton, Thanks for the feedback both on getting chado installed from svn and on the genbank2gff3 converter. About installing Chado from svn, I thought I'd modified the Makefile.PL script to gracefully survive not having the GMODtools directory present; I guess I'll have to revisit that. Since I probably won't get to it today, I created a bug report for it: https://sourceforge.net/tracker/?func=detail&aid=2970687&group_id=27707&atid=391291 About the genbank2gff3 script, I'm cc'ing Nathan to make sure he sees your comments. Thanks, Scott On Mon, Mar 15, 2010 at 7:55 AM, Leighton Pritchard wrote: > Hi Scott, > > Thanks for the reply. I tried your suggestions on a clean VM of CentOS 5.4 > and the equally wordy outcome is below... > > On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain" > wrote: > >> First, I am working on the 1.1 release of gmod/chado, and it >> may fix some of the problems you are describing. Certainly, ID >> collisions between GFF files should not be a problem (I didn't think >> they were in the 1.0 release, but that was a long time ago). Please >> try a checkout of the schema trunk in the gmod svn: >> >> http://gmod.org/wiki/SVN > > As a note for anyone following this, when I downloaded the trunk/chado files > only, my build failed with > > """ > $make > [...]
> Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm > Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm > Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm > Manifying ../blib/man3/Bio::Chaos::Root.3pm > make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml' > make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by > `blib/script/gmod_gff2biomart5.pl'. Stop. > """ > > I had to download the whole trunk for the installation to work. I came > across this thread: > http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html > > while I was looking for a solution; someone else has had a similar problem. > >> Another thing you may want to look at is that just last week, a >> developer at Texas A&M, Nathan Liles, contributed code to the >> bioperl-live trunk for the genbank2gff3.pl script that will do a much >> better job of converting bacterial genbank files to GFF3; perhaps that >> will help too. Working with a svn checkout of bioperl-live shouldn't >> be too scary either; the pieces you are interested in (that work with >> Chado and GBrowse) are quite stable. > > I also checked out BioPerl-live. The svn server at code.open-bio.org was > unresponsive for a couple of days, but Peter pointed me to GitHub at > http://github.com/bioperl/bioperl-live so I went from there. The process > isn't quite as clean as using the latest stable version of BioPerl, however. > > When I attempt to use the bp_genbank2gff3.pl script, I get the following > error message: > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk > Can't locate object method "FT_SO_map" via package > "Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line > 374. > """ > > This appears to be associated with the following code (l207 onwards...) in > TypeMapper: > > """ > =head2 map_types_to_SO > > [...] > > hardcodes the genbank to SO mapping > > [...] > dgg: separated out FT_SO_map for caller changes.
Update with: > > open(FTSO,"curl -s > http://sequenceontology.org/resources/mapping/FT_SO.txt|"); > while(<FTSO>){ > chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t"; > print " '$ft' => '$so',\n" if($ft && $so && $ftdef); > } > > =cut > > sub ft_so_map { > # $self= shift; > """ > > The upper/lower case function declaration seems to be important, as changing > it back to "sub FT_SO_map" lets the script work: > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk > # Input: NC_004547.gbk > # working on region:NC_004547, Erwinia carotovora subsp. atroseptica > SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, > complete genome. > # GFF3 saved to ./NC_004547.gbk.gff > # Summary: > # Feature Count > # ------- ----- > # repeat_region 19 > # sequence_variant 2 > # repeat_unit 2 > # gene 4614 > # region 17387 > # exon 4597 > # RESIDUES 5064019 > # > """ > > Obviously, this is another unsatisfactory sucky ad hoc post-install hack; I > hope I'm doing the right sort of thing, there. I'm not familiar with > BioPerl so I'm not clear on why this change was made to the interface (it's > part of the recent changes by Nathan Liles you referred to in your post: > http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37ddcd8a77a03), but it also seems to break bp_genbank2gff3.pl. Also, the > --noCDS flag appears to have no effect at all when using the new version of > bp_genbank2gff3.pl. > > The old version of bp_genbank2gff3.pl appears to recognise more feature > types in the summary: > > """ > [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk > # Input: NC_004547.gbk > # working on region:NC_004547, Erwinia carotovora subsp. atroseptica > SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, > complete genome. > # GFF3 saved to ./NC_004547.gbk.gff > # Summary: > # Feature Count > # -------
----- > # mRNA 4472 > # sequence_variant 2 > # gene 4594 > # region 8275 > # pseudogene 20 > # CDS 4472 > # RESIDUES(tr) 1433791 > # RESIDUES 5064019 > # rRNA 22 > # processed_transcript 24 > # repeat_region 19 > # pseudogenic_region 46 > # repeat_unit 2 > # exon 4597 > # tRNA 76 > # > """ > > and this is reflected in the substantial difference in GFF3 output, for > issuing exactly the same command when moving from BioPerl 1.6.1 to > bioperl-live: we get different GFF3 output that represents a different gene > model. I wasn't expecting so radical a change, but at least the IDs are > based on the locus_tag with the new script, and this appears to solve my > problem with clashing feature IDs on the files I was using. > > Many thanks for your help, > > L. > > -- > Dr Leighton Pritchard MRSC > D131, Plant Pathology Programme, SCRI > Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA > e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard > gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 > > > ______________________________________________________ > SCRI, Invergowrie, Dundee, DD2 5DA. > The Scottish Crop Research Institute is a charitable company limited by guarantee. > Registered in Scotland No: SC 29367. > Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. > > > DISCLAIMER: > > This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that > addressee. > If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way.
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. > > Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). > ______________________________________________________ > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From kiekyon.huang at gmail.com Mon Mar 15 11:44:13 2010 From: kiekyon.huang at gmail.com (kiekyon.huang at gmail.com) Date: Mon, 15 Mar 2010 15:44:13 +0000 Subject: [Bioperl-l] Taxonomy report Message-ID: <0016e64be064b8211f0481d8c02d@google.com> Hi, I would just like to know if there is any way to generate the taxonomy report from the standalone blast output? thanks From cjfields at illinois.edu Mon Mar 15 11:57:29 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 15 Mar 2010 10:57:29 -0500 Subject: [Bioperl-l] Taxonomy report In-Reply-To: <0016e64be064b8211f0481d8c02d@google.com> References: <0016e64be064b8211f0481d8c02d@google.com> Message-ID: <53CE22BE-38F4-4EC6-80A9-37228A9CF602@illinois.edu> Not that I know of, at least not w/o doing some mapping (the tax report is generated on NCBI's servers last I recall). chris On Mar 15, 2010, at 10:44 AM, kiekyon.huang at gmail.com wrote: > Hi, > > I would just like to know if there is any way to generate the taxonomy report from the standalone blast output?
> > thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Mar 15 13:11:27 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 15 Mar 2010 10:11:27 -0700 Subject: [Bioperl-l] getting strand from Bio::Align::AlignI ?? In-Reply-To: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov> References: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov> Message-ID: <4B9E6A3F.6080104@bioperl.org> Did you start with Bio::SearchIO object and call get_aln on the HSP object? Strand is available from the $hsp->query->strand and $hsp->hit->strand and Bio::SearchIO is the preferred way of parsing pairwise alignment reports. Either way the sequences themselves have strands not the alignment. Each sequence should have a strand $seq->strand since they are Bio::LocatableSeq objects. for my $seq ( $aln->each_seq ) { print $seq->id, " ", $seq->strand, "\n"; } -jason Joan Pontius wrote, On 3/15/10 8:49 AM: > I am looking into using Bio::Align::AlignI for an application that > uses blast2seq > and can't figure out how to get the strand of an alignment? > > Thanks in advance > > > > Joan Pontius-Contractor SAIC > Laboratory of Genomic Diversity > Bldg 560-NCI > Frederick Maryland 21702 > phone (301)846-1761 > fax (301) 846-1686 From cjfields1 at gmail.com Mon Mar 15 14:57:08 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Mon, 15 Mar 2010 13:57:08 -0500 Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <6C998BD2392E4BF594F041368D9456E4@BlackJack> References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> Message-ID: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> Francisco, In general, please address any questions directly to the bioperl mail list, in case I can't respond. The anon. svn on code.open-bio.org is down at the moment. OBF support knows about this problem and it's being addressed. 
There is a github mirror of the repos in case this happens: http://github.com/bioperl chris On Mar 15, 2010, at 10:38 AM, Francisco J. Ossandón wrote: > Hello Chris Fields, > I have posted before in the Bugzilla about Bioperl bugs, but this time it is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > Has there been some change in the SVN address or configuration that I should update? I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it. > > I hope that you can help me with this. > > Regards, > > -- > Francisco J. Ossandon > Bioinformatician. > Ph.D. Student, University Andres Bello. > Center for Bioinformatics and Genome Biology, > Fundacion Ciencia para la Vida. > Santiago, Chile. > www.cienciavida.cl/CBGB.htm From hlapp at drycafe.net Tue Mar 16 16:03:50 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 16 Mar 2010 16:03:50 -0400 Subject: [Bioperl-l] [OT] Job opportunity: Training coordinator and Bioinformatics Project Manager Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net> Hi all - first off, sorry for the cross-posting, we're trying to advertise this as widely as possible. Second, apologies if this is committing an offense and considered spam. I thought though that there might be some people around here who may be interested and suitable. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== A unique position is available for a training coordinator and bioinformatics project manager at the U.S.
National Evolutionary Synthesis Center in Durham, North Carolina (NESCent, http://nescent.org). NESCent is a National Science Foundation funded research center managed by Duke University, the University of North Carolina at Chapel Hill and North Carolina State University on behalf of the international evolutionary biology community. NESCent facilitates synthetic research by bringing together diverse expertise, data, tools and concepts (Sidlauskas et al. 2009). In addition to a resident population of 20-30 scientists, the Center hosts over 800 visitors a year. An informatics staff is on-site to support resident and visiting scientists' needs in high-performance computing, electronic collaboration, scientific software and databases; this includes custom software development for a limited number of high-impact projects. NESCent's informatics training program includes a rotating series of open-application summer courses, ad-hoc short courses for resident scientists, and remote internships (including past participation in the Google Summer of Code). The training coordinator and bioinformatics project manager will provide oversight to the Center's training activities. The incumbent will also serve as the interface between scientists and software developers at NESCent. The position provides extensive opportunities for collaboration and intellectual engagement with both NESCent-sponsored scientists and informatics staff; however, this is not an independent research position. The incumbent will report to the Director, while overseeing the work of a small informatics team and coordinating activities among the Center's science, education and informatics programs. Responsibilities: * 50% - Consult with sponsored scientists (including scientists in residence and working group participants) about informatics resources and needs.
Manage software product development by gathering requirements from scientists, participating in conceptual design, monitoring implementation progress and product quality, facilitating communication between software developers and scientists, and researching software solutions.
* 25% - Oversee NESCent's course curriculum by identifying opportunities for onsite or online informatics courses that satisfy demand for advanced training of resident and visiting scientists, recruiting instructors, providing guidance to instructors in developing course syllabi, coordinating logistical and technical support requirements, conducting assessments, and serving as a liaison to course organizers at other institutions.
* 25% - Assisting in the management of NESCent's summer informatics intern program, by coordinating the recruitment, application & review process for students, communicating expectations to students and mentors, monitoring student progress, documenting student outcomes, and performing assessments.

Education:
Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary biology, or an equivalent combination of relevant education and/or experience.

Experience:
Required: Excellent communication, interpersonal, and organizational skills. Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open source software. Organization, coordination, development and delivery of courses and workshops appropriate for graduate-level participants.

Terms of Employment: Salary will be competitive and commensurate with experience. As a full-time employee, the incumbent will receive Duke University's benefits package (http://hr.duke.edu/benefits/main.html). The position is available immediately and will remain open until filled. The position is currently funded through November 2014, contingent on annual renewal of the Center by the NSF.
How to Apply: Please send a C.V., including contact information for three references, and a brief statement of interest to Allen Rodrigo, Director, NESCent, at a.rodrigo at nescent.org. Inquiries about suitability for the position are welcome. Duke University is an Equal Opportunity/Affirmative Action employer.

Additional information about NESCent: http://www.nescent.org

References:
Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The Continuing Promise of Evolutionary Synthesis. Evolution. http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x

From hartzell at alerce.com Tue Mar 16 19:35:13 2010
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 16 Mar 2010 16:35:13 -0700
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
Message-ID: <19360.5553.985550.996751@gargle.gargle.HOWL>

Apologies if this is as silly a question as it seems, I think that I must just be decaffeinated this morning....

I'm cleaning up some modules and would like to express a dependency on BioPerl-run version 1.6.1.

For the main bioperl I use Bio::Root::Version and 1.006001. That works, although in the course of investigating the issues below I found that Bio::Root::RootI (which uses BR::Version) doesn't.

A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) use Bio::Root::Version and thereby acquire a reasonable version number but:

a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency when I want bioperl-run
b) it's funny that PiseWorkflow uses Bio::Root::Version (which imports a $VERSION into its package) then goes on to set one itself.
c) there's something hinky going on: when I do 'perl Build.PL' on my Task it doesn't think that PiseWorkflow is up to date (it thinks I have version (0) if I understand correctly), but when I './Build installdeps' everything appears up to date.
It looks like the trickiness of assigning $Bio::Root::Version::VERSION to $VERSION confuses Module::Build::ModuleInfo::_evaluate_version_line and the result is that VERSION appears to be 0. What's The Right Thing to do? Thanks, g. From maj at fortinbras.us Wed Mar 17 10:41:00 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 17 Mar 2010 10:41:00 -0400 Subject: [Bioperl-l] What's to depend on for BioPerl-run version check In-Reply-To: <19360.5553.985550.996751@gargle.gargle.HOWL> References: <19360.5553.985550.996751@gargle.gargle.HOWL> Message-ID: I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ ----- Original Message ----- From: "George Hartzell" To: "bioperl-l List" Sent: Tuesday, March 16, 2010 7:35 PM Subject: [Bioperl-l] What's to depend on for BioPerl-run version check > > Apologies if this is as silly of a question as it seems, I think that > I must just be decaffeinated this morning.... > > I'm cleaning up some modules and would like to express a dependency on > BioPerl-run version 1.6.1. > > For the main bioperl I use Bio::Root::Version and 1.006001. That > works, although the course of investigating below I found that > Bio::Root::RootI (which uses BR::Version) doesn't. > > A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) > use Bio::Root::Version and thereby acquire a reasonable version number > but: > > a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency > when I want bioperl-run > c) it's funny that PiseWorkflow uses Bio::Root::Version (which > imports a $VERSION into it's package) then goes on to set one > itself. > b) there's something hinky going on, when I do 'perl Build.PL' on my > Task it doesn't think that PiseWorkflow is up to date (it thinks > I have version (0) if I understand correctly), but when I > './Build installdeps' everything appears up to date. 
>
> It looks like the trickiness of assigning
> $Bio::Root::Version::VERSION to $VERSION confuses
> Module::Build::ModuleInfo::_evaluate_version_line and the result
> is that VERSION appears to be 0.
>
> What's The Right Thing to do?
>
> Thanks,
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From janine.arloth at googlemail.com Mon Mar 15 04:15:50 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Mon, 15 Mar 2010 09:15:50 +0100
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID:

Hello,

is there a way to get/extract the whole hit sequences (not only the hit string from the alignment with $hsp->hit_string)?

Best regards

From cjfields at illinois.edu Wed Mar 17 11:13:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Mar 2010 10:13:20 -0500
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
In-Reply-To:
References: <19360.5553.985550.996751@gargle.gargle.HOWL>
Message-ID: <32C28662-BD24-4270-A0B6-71CEB459172C@illinois.edu>

What is probably the best thing to do is set up a stub module for each of the subdistributions that contains a proper version to match against. So, for BioPerl-Run, use Bio::Run or Bio::Tools::Run; for BioPerl-DB, use Bio::DB; etc. Distribution-specific general documentation would go in those stub modules. I sort of started this with the first alphas but didn't get around to finishing it up.

Just as a footnote, the universal $VERSION thingy was set up quite a while ago, prior to perl 5.8 I believe, and doesn't play very well with $VERSION (and version.pm) on newer perl versions. Once we move beyond 1.6.x towards breaking things up we'll have to assign new VERSIONs to anything released independently on CPAN, anyway, so this may eventually be a moot point.
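A minimal sketch of the stub-module idea, for illustration only. The package name Bio::Tools::Run::Stub, the version number, and the helper meets_version are all hypothetical (none is an existing BioPerl-Run module or function). The point is that the stub carries a literal $VERSION on one line, so a dependency check does not rely on a value copied from another package at run time.

```perl
use strict;
use warnings;

# Hypothetical stub module for a sub-distribution. The package name and
# version number are assumptions made up for this example.
package Bio::Tools::Run::Stub;

# A literal, single-line assignment; nothing is imported from another package.
our $VERSION = '1.006001';

package main;

# Return 1 if package $pkg declares a version of at least $min, else 0.
# Uses the built-in UNIVERSAL::VERSION method, which dies when the
# declared version is too low.
sub meets_version {
    my ($pkg, $min) = @_;
    return eval { $pkg->VERSION($min); 1 } ? 1 : 0;
}

print "1.006001 satisfied: ", meets_version('Bio::Tools::Run::Stub', '1.006001'), "\n";
print "1.007 satisfied: ",    meets_version('Bio::Tools::Run::Stub', '1.007'), "\n";
```

With a stub like this in a distribution, a dependent Build.PL could presumably name the stub module and a minimum version in its requires list instead of naming an unrelated module such as PiseWorkflow.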
chris The inherited $VERSION thingy was set up a while back, basically as a way of assigning a common version across BioPerl. On Mar 17, 2010, at 9:41 AM, Mark A. Jensen wrote: > I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy > to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ > ----- Original Message ----- From: "George Hartzell" > To: "bioperl-l List" > Sent: Tuesday, March 16, 2010 7:35 PM > Subject: [Bioperl-l] What's to depend on for BioPerl-run version check > > >> Apologies if this is as silly of a question as it seems, I think that >> I must just be decaffeinated this morning.... >> I'm cleaning up some modules and would like to express a dependency on >> BioPerl-run version 1.6.1. >> For the main bioperl I use Bio::Root::Version and 1.006001. That >> works, although the course of investigating below I found that >> Bio::Root::RootI (which uses BR::Version) doesn't. >> A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) >> use Bio::Root::Version and thereby acquire a reasonable version number >> but: >> a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency >> when I want bioperl-run >> c) it's funny that PiseWorkflow uses Bio::Root::Version (which >> imports a $VERSION into it's package) then goes on to set one >> itself. >> b) there's something hinky going on, when I do 'perl Build.PL' on my >> Task it doesn't think that PiseWorkflow is up to date (it thinks >> I have version (0) if I understand correctly), but when I >> './Build installdeps' everything appears up to date. >> It looks like the trickiness of assigning >> $Bio::Root::Version::VERSION to $VERSION confuses >> Module::Build::ModuleInfo::_evaluate_version_line and the result >> is that VERSION appears to be 0. >> What's The Right Thing to do? >> Thanks, >> g. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robfsouza at gmail.com Wed Mar 17 11:20:21 2010 From: robfsouza at gmail.com (robfsouza) Date: Wed, 17 Mar 2010 08:20:21 -0700 (PDT) Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> Message-ID: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> Hi Chris, Any idea when the SVN is going to be fixed? I could not find tar.gz or other download methods in github... Robson On Mar 15, 2:57?pm, Christopher Fields wrote: > Francisco, > > In general, please address any questions directly to the bioperl mail list, in case I can't respond. ? > > The anon. svn on code.open-bio.org is down at the moment. ?OBF support knows about this problem and it's being addressed. ?There is a github mirror of the repos in case this happens: > > http://github.com/bioperl > > chris > > On Mar 15, 2010, at 10:38 AM, Francisco J. Ossand?n wrote: > > > > > Hello Chris Fields, > > I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > > There has been some change in the SVN address or configuration that I should update? 
I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it. > > > I hope that you can help me with this. > > > Regards, > > > -- > > Francisco J. Ossandon > > Bioinformatician. > > Ph.D. Student, University Andres Bello. > > Center for Bioinformatics and Genome Biology, > > Fundacion Ciencia para la Vida. > > Santiago, Chile. > >www.cienciavida.cl/CBGB.htm > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l From adsj at novozymes.com Wed Mar 17 12:00:34 2010 From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=) Date: Wed, 17 Mar 2010 17:00:34 +0100 Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> (robfsouza@gmail.com's message of "Wed, 17 Mar 2010 08:20:21 -0700 (PDT)") References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> Message-ID: <874okfsztp.fsf@topper.koldfront.dk> On Wed, 17 Mar 2010 08:20:21 -0700 (PDT), robfsouza wrote: > Any idea when the SVN is going to be fixed? I could not find tar.gz or > other download methods in github... If you don't want to "git clone http://github.com/bioperl/bioperl-live.git", you can click on the "Download source" link in the upper right corner of http://github.com/bioperl/bioperl-live and you'll get to choose between downloading tar or zip. 
Best regards,

  Adam

--
Adam Sjøgren
adsj at novozymes.com

From cjfields at illinois.edu Wed Mar 17 12:12:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Mar 2010 11:12:42 -0500
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <53EECF69-E9CE-4619-BE0A-97BE55754D8E@illinois.edu>

Janine,

How would you go about doing that from the BLAST report alone (which doesn't store the whole sequence)? Unless you know something I don't, you'll need to pull the unique identifier for the sequence from the hit object while parsing the report and grab the seq from a local or remote database (or use fastacmd or its equivalent in blast+).

chris

On Mar 15, 2010, at 3:15 AM, Janine Arloth wrote:

> Hello,
>
> exists a possibility to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->$hit_string;)
>
> Best regards
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Russell.Smithies at agresearch.co.nz Wed Mar 17 15:48:27 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 18 Mar 2010 08:48:27 +1300
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz>

If you're running blast locally, use fastacmd to extract the sequences from the blast database.
Eg fastacmd -d nr -s AC147927

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Janine Arloth
> Sent: Monday, 15 March 2010 9:16 p.m.
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus > > Hello, > > exists a possibility to get/extract the whole hit sequences? (Not only the > hit string from the alignment with $hsp->$hit_string;) > > Best regards > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From michael.watson at bbsrc.ac.uk Wed Mar 17 16:47:57 2010 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 17 Mar 2010 20:47:57 +0000 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> Message-ID: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb.... 
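As a rough sketch of the approach suggested earlier in the thread (take each hit's identifier from the parsed report, then fetch the full sequence from the database), the following collects the unique hit identifiers with Bio::SearchIO. The helper name hit_ids and the report file name are placeholders, and BioPerl is assumed to be installed. The retrieval step itself is left out; the resulting list could be handed to fastacmd or another lookup, which as noted above needs a database indexed with "-o T".

```perl
use strict;
use warnings;

# Collect the unique hit identifiers from a BLAST report file.
# hit_ids is a hypothetical helper; Bio::SearchIO is loaded lazily so the
# script still compiles where BioPerl is not installed.
sub hit_ids {
    my ($report_file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report_file);
    my (%seen, @ids);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            # Prefer the accession; fall back to the full hit name.
            my $id = $hit->accession || $hit->name;
            push @ids, $id unless $seen{$id}++;
        }
    }
    return @ids;
}

# Usage: perl hit_ids.pl report.bls > ids.txt
if (@ARGV) {
    print "$_\n" for hit_ids($ARGV[0]);
}
```

Writing one identifier per line keeps the list usable as batch input for whichever retrieval tool is available locally.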
======================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Wed Mar 17 17:07:29 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 18 Mar 2010 10:07:29 +1300 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> Precompiled databases from NCBI are built with "-o T" but when building them yourself, the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our to retrieve sequences for all your blast hits. Here's an example of our sequence retrieval: https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw --Russell > -----Original Message----- > From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] > Sent: Thursday, 18 March 2010 9:48 a.m. > To: Smithies, Russell; 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' > Subject: RE: [Bioperl-l] SearchIO, StandAloneBlastPlus > > I think that relies on the blast database being built with the "-o T" > option, which is not the default for formatdb.... 
From Russell.Smithies at agresearch.co.nz Wed Mar 17 17:53:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 18 Mar 2010 10:53:38 +1300
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To: <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com>
References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A72BD@exchsth.agresearch.co.nz>

It's all a bit complicated as this page is on a public site but our blast server is internal and restricted so there's no direct communication between them. The public site takes the data from the blast request and writes it to a template file then puts it in a folder that the internal blast server checks every 10 seconds. When a new request is found, it does the blast, creates the image and map with Bio::Graphics, then transfers it to a folder on the public server. As a sneaky bodge so I don't have to transfer the image, it's base64 encoded in the html then stripped out later. The blast result page keeps refreshing until it sees the required result has returned then displays the page.
It sounds a bit odd but as blast runs on one of our main servers, we don't want anyone to be able to "accidentally" run commands on it - no one has hacked our servers yet :)

There's some good stuff in the BioPerl howtos http://www.bioperl.org/wiki/HOWTO:Graphics and http://www.bioperl.org/wiki/HOWTO:SearchIO
Bio::SearchIO::Writer::HTMLResultWriter can be quite useful though ours is html-ized 'manually' as it's streamed through a post-processing script.

--Russell

From: Janine Arloth [mailto:janine.arloth at googlemail.com]
Sent: Thursday, 18 March 2010 10:33 a.m.
To: Smithies, Russell
Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus

Thank you very much. May I ask how you get the figure in the BLAST output (blastmap)? I use Bio::Graphics, but I did not see how to create this figure.

Best Regards

On 17.03.2010 at 22:07, Smithies, Russell wrote:

Precompiled databases from NCBI are built with "-o T" but when building them yourself, the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our to retrieve sequences for all your blast hits. Here's an example of our sequence retrieval: https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw

--Russell

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Thursday, 18 March 2010 9:48 a.m.
To: Smithies, Russell; 'Janine Arloth'; 'bioperl-l at lists.open-bio.org'
Subject: RE: [Bioperl-l] SearchIO, StandAloneBlastPlus

I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb....
From armendarez77 at hotmail.com Thu Mar 18 12:27:20 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Thu, 18 Mar 2010 09:27:20 -0700
Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter
Message-ID:

Hello,

I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (which include Bio::DB::RefSeq and Bio::DB::GenBank) via crontab; however, the web filter requires me to log in every 8 hours. The administrator removed the filter; however, my scripts still failed. I then logged into iPrism and the scripts worked.

The system administrators say it's the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast.

I would include the script, but it's a bit involved and passes arguments to other scripts.

Thank you,

Veronica

From cjfields at illinois.edu Thu Mar 18 13:21:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Mar 2010 12:21:22 -0500
Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter
In-Reply-To:
References:
Message-ID:

Veronica,

No caching occurs that I know of. If you have an environment proxy set somehow it will use that, using LWP::UserAgent and env_proxy() (your logging in via iPrism makes me think it is something along those lines). Otherwise the proxy has to be explicitly set for each object, so no caching is apparent. Could you have a local environment proxy set that you're unaware of?
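One quick check, sketched here with core Perl only: list any proxy-related environment variables visible to the process. Running it both from an interactive shell and from the crontab entry may be informative, since cron jobs often see a different environment. The helper name proxy_env is hypothetical; the pattern simply matches variable names ending in _proxy (http_proxy, no_proxy and so on), which is the naming convention for environment proxy settings.

```perl
use strict;
use warnings;

# Return the names of environment variables that look like proxy settings
# (for example http_proxy, HTTPS_PROXY, no_proxy), sorted for stable output.
# Takes a hash reference so it can be checked against any environment.
sub proxy_env {
    my ($env) = @_;
    my @names = grep { /_proxy$/i } keys %$env;
    return sort @names;
}

my @found = proxy_env(\%ENV);
if (@found) {
    print "$_=$ENV{$_}\n" for @found;
} else {
    print "no proxy variables set\n";
}
```

If the two runs print different settings, that difference is a likely place to start looking.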
See here for examples: http://search.cpan.org/~gaas/libwww-perl-5.834/lib/LWP/UserAgent.pm#Proxy_attributes You could try something like this after you create the instances, which accesses the LWP::UserAgent instance cached in the relevant class and shuts off proxies: $db->ua->no_proxy(); Otherwise, you can try coming up with a minimal test case indicating what happens (including any output) and file a bug report, just in case. chris On Mar 18, 2010, at 11:27 AM, wrote: > > Hello, > > I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (include Bio::DB::RefSeq, Bio::DB::GenBank) via crontab, however the web filter requires me to log in every 8 hours. The administrator removed the filter however, my scripts still failed. I then logged into iPrism and the scripts worked. > > The system administrators say its the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast. > > I would include the script, but it's a bit involved and passes arguments to other scripts. > > Thank you, > > Veronica > > > > _________________________________________________________________ > Hotmail: Trusted email with powerful SPAM protection. > http://clk.atdmt.com/GBL/go/210850553/direct/01/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 18 17:11:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 18 Mar 2010 14:11:34 -0700 Subject: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <4BA29706.8040606@cornell.edu> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! 
GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see the GSoC 2010 FAQ at http://tinyurl.com/yzemdfo

Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas and who to contact about applying.

For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on the OBF wiki and on the relevant project's GSoC wiki page.

Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code!

Rob Buels
OBF GSoC 2010 Administrator

From me at miguel.weapps.com Thu Mar 18 19:33:16 2010
From: me at miguel.weapps.com (Luis M Rodriguez-R)
Date: Thu, 18 Mar 2010 18:33:16 -0500
Subject: [Bioperl-l] GSoC-2010 & the semantic web
Message-ID: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>

Hello all,

I would like to know how to apply to GSoC 2010, and when it is scheduled to take place.

I think there are great development opportunities in information discovery using the semantic web (I'm familiar with RDF in bio2rdf, uniprot and some ontologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, especially with a view to implementing SPARQL for a discoverable "bio-cloud".

The people of bio2rdf already have some parsers, but there are still a lot of things to do.

Best regards,
Luis.

Luis M.
Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com

From rhythmbox-devel at maubp.freeserve.co.uk Thu Mar 18 20:25:05 2010
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Mar 2010 00:25:05 +0000
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <320fb6e01003181725j2aa1268am80ae7649bd873b46@mail.gmail.com>

On Thu, Mar 18, 2010 at 11:33 PM, Luis M Rodriguez-R wrote:
>
> I think there are great development opportunities in information
> discovery using semantic web (I'm familiar with RDF in bio2rdf,
> uniprot and some onthologies, ...

Have a read of the wiki pages from this recent hackathon; they should be of interest to you: http://hackathon3.dbcls.jp/

Peter

From cjfields at illinois.edu Thu Mar 18 20:29:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Mar 2010 19:29:19 -0500
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <0FADD2C6-9458-4E0C-ADB5-E4C0F18A79D8@illinois.edu>

Luis,

See this page for the specifics: http://www.open-bio.org/wiki/Google_Summer_of_Code

There are several proposed projects already listed; feel free to add yours to the page. I'm assuming these will be OBF-focused, so tying your proposal to one of the OBF projects is probably a good idea.

chris

On Mar 18, 2010, at 6:33 PM, Luis M Rodriguez-R wrote:

> Hello all,
>
> I would like to know how to apply to the GSoC-2010, and when it is planned to be performed.
> > I think there are great development opportunities in information discovery using semantic web (I'm familiar with RDF in bio2rdf, uniprot and some onthologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, specially thinking in the implementation of SPARQL for a discoverable "bio-cloud". > > The people of bio2rdf already have some parsers, but there are still a lot of things to do. > > Best regards, > Luis. > > Luis M. Rodriguez-R > [http://bioinf.uniandes.edu.co/~miguel/] > --------------------------------- > Unidad de Bioinform?tica del Laboratorio de Micolog?a y Fitopatolog?a > Universidad de Los Andes, Colombia > [http://bioinf.uniandes.edu.co] > > + 57 1 3394949 ext 2619 > luisrodr at uniandes.edu.co > me at miguel.weapps.com > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ross at cuhk.edu.hk Sat Mar 20 19:55:35 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 21 Mar 2010 07:55:35 +0800 Subject: [Bioperl-l] automation of translation based on alignment Message-ID: <002c01cac888$d570fe20$8052fa60$@edu.hk> Dear bioperl users, I am working on virus sequences and one of the Genbank file is here: http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu m with 1000 such nucleotide sequences, I'd like to translate the corresponding protein coding sequences. The difficulties lie in: 1) The genome sequence is circular 2) The genes are overlapping I don't have all the 1000 Genbank files but I plan to use the above guide one to direct the automation process. Has bioperl implemented specialized functions to handle this kind of problem? 
Thanks a lot for your advice, Ross From florent.angly at gmail.com Sun Mar 21 20:44:11 2010 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 22 Mar 2010 10:44:11 +1000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <002c01cac888$d570fe20$8052fa60$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> Message-ID: <4BA6BD5B.9010509@gmail.com> Hi Ross, It seems like your answer is in the link you put. On this link, all the coding sequences are already identified and their aminoacid sequence provided. You simply need to parse all the GenBank entries to extract this information. You may use EUtilities to achieve this online: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook Florent On 21/03/10 09:55, Ross KK Leung wrote: > Dear bioperl users, > > > > I am working on virus sequences and one of the Genbank file is here: > > > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 > tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > m > > > > with 1000 such nucleotide sequences, I'd like to translate the corresponding > protein coding sequences. The difficulties lie in: > > > > 1) The genome sequence is circular > > 2) The genes are overlapping > > > > I don't have all the 1000 Genbank files but I plan to use the above guide > one to direct the automation process. Has bioperl implemented specialized > functions to handle this kind of problem? 
> > > > Thanks a lot for your advice, Ross > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Sun Mar 21 21:14:27 2010 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 22 Mar 2010 11:14:27 +1000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <004d01cac95c$15c95250$415bf6f0$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> Message-ID: <4BA6C473.4090404@gmail.com> Hi Ross, Please keep relies on the BioPerl mailing list so that everyone benefits. You should give detailed explanations of what you are tying to achieve., e.g.: * What type of input file do you have? * Do you already know the location of the ORFs? * what is the multiple alignments you are talking about ... Florent On 22/03/10 11:07, Ross KK Leung wrote: > Dear Florent, > > Thanks for your response. While the one with Genbank file can be extracted, > those without have to rely on alignment. Scripts certainly can be written to > move forward and backward on the multiple alignment but it is an error-prone > process and that's why I raised this question. > > Rgds, Ross > > > > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Monday, March 22, 2010 8:44 AM > To: Ross KK Leung > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] automation of translation based on alignment > > Hi Ross, > It seems like your answer is in the link you put. On this link, all the > coding sequences are already identified and their aminoacid sequence > provided. You simply need to parse all the GenBank entries to extract > this information. 
You may use EUtilities to achieve this online: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > Florent > > On 21/03/10 09:55, Ross KK Leung wrote: > >> Dear bioperl users, >> >> >> >> I am working on virus sequences and one of the Genbank file is here: >> >> >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> >> > >> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> >> >> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > >> m >> >> >> >> with 1000 such nucleotide sequences, I'd like to translate the >> > corresponding > >> protein coding sequences. The difficulties lie in: >> >> >> >> 1) The genome sequence is circular >> >> 2) The genes are overlapping >> >> >> >> I don't have all the 1000 Genbank files but I plan to use the above guide >> one to direct the automation process. Has bioperl implemented specialized >> functions to handle this kind of problem? >> >> >> >> Thanks a lot for your advice, Ross >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From ross at cuhk.edu.hk Sun Mar 21 21:22:47 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 09:22:47 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <4BA6C473.4090404@gmail.com> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> Message-ID: <004e01cac95e$2e375f10$8aa61d30$@edu.hk> Dear Florent, Sorry for mis-clicking "reply" instead of "reply-all". Here are my problem details: Input: 1000 multiple aligned DNA sequences One of them has Genbank file http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 the remaining 999 ones only have genomic sequences. Objective: to derive the cognate protein aligned sequences. 
(here have 4 sets as there are 4 overlapping genes) Difficulties: 1) circular genome 2) there may be in-dels Hope now the problem has been clarified, Ross -----Original Message----- From: Florent Angly [mailto:florent.angly at gmail.com] Sent: Monday, March 22, 2010 9:14 AM To: Ross KK Leung; bioperl-l List Subject: Re: [Bioperl-l] automation of translation based on alignment Hi Ross, Please keep relies on the BioPerl mailing list so that everyone benefits. You should give detailed explanations of what you are tying to achieve., e.g.: * What type of input file do you have? * Do you already know the location of the ORFs? * what is the multiple alignments you are talking about ... Florent On 22/03/10 11:07, Ross KK Leung wrote: > Dear Florent, > > Thanks for your response. While the one with Genbank file can be extracted, > those without have to rely on alignment. Scripts certainly can be written to > move forward and backward on the multiple alignment but it is an error-prone > process and that's why I raised this question. > > Rgds, Ross > > > > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Monday, March 22, 2010 8:44 AM > To: Ross KK Leung > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] automation of translation based on alignment > > Hi Ross, > It seems like your answer is in the link you put. On this link, all the > coding sequences are already identified and their aminoacid sequence > provided. You simply need to parse all the GenBank entries to extract > this information. 
You may use EUtilities to achieve this online: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > Florent > > On 21/03/10 09:55, Ross KK Leung wrote: > >> Dear bioperl users, >> >> >> >> I am working on virus sequences and one of the Genbank file is here: >> >> >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> >> > >> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> >> >> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > >> m >> >> >> >> with 1000 such nucleotide sequences, I'd like to translate the >> > corresponding > >> protein coding sequences. The difficulties lie in: >> >> >> >> 1) The genome sequence is circular >> >> 2) The genes are overlapping >> >> >> >> I don't have all the 1000 Genbank files but I plan to use the above guide >> one to direct the automation process. Has bioperl implemented specialized >> functions to handle this kind of problem? >> >> >> >> Thanks a lot for your advice, Ross >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From cjfields at illinois.edu Sun Mar 21 23:40:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 21 Mar 2010 22:40:34 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <004e01cac95e$2e375f10$8aa61d30$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> Message-ID: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote: > Dear Florent, > > Sorry for mis-clicking "reply" instead of "reply-all". 
Here are my problem
> details:
>
> Input:
>
> 1000 multiple aligned DNA sequences
> One of them has Genbank file
> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1
>
> the remaining 999 ones only have genomic sequences.
>
> Objective: to derive the cognate protein aligned sequences. (here have 4
> sets as there are 4 overlapping genes)
>
> Difficulties:
> 1) circular genome
> 2) there may be in-dels

To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pull sub-alignments out from there, then translate each sequence. There would be no need to re-retrieve sequences which are already present in the alignment, unless there is something not mentioned above that I'm missing.

Re: circular genomes: recent commits to bioperl should allow handling circular genomes with features and subsequence extraction. If not, I would consider that a serious bug that needs to be reported.

If you need to grab remote sequences from a larger set of sequences (either locally or remotely) and translate them, you can use Bio::DB::GenBank, which will directly return a Bio::Seq object. Note you would obviously have to reset these per ID based on the start/end/strand:

    use Bio::DB::GenBank;

    # fetch only the region of interest, as FASTA
    my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                                   -seq_start => 100,
                                   -seq_stop  => 200,
                                   -strand    => 1);
    my $seqobj = $gb->get_Seq_by_id($id); # or get_Seq_by_acc($acc)
    # do any preprocessing here...
    my $protein_seqobj = $seqobj->translate;

If you want, you could also download the sequences and use one of the various flatfile database classes to work with them (I believe Bio::DB::Fasta extracts subsequences very rapidly). It might be faster.

For those regions that cross the origin you may need to pull two sequences and join them yourself, as they likely won't be joined automatically.
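
As a rough illustration of the coordinate-mapping idea above (reference feature coordinates converted to alignment columns, with an origin-crossing region pulled as two pieces and joined), here is a minimal sketch in plain Perl. It assumes the alignment has already been loaded into a hash of gapped strings with '-' as the gap character. The helper names residue_to_column_map and extract_region and the toy sequences are made up for this example; they are not part of BioPerl.

```perl
use strict;
use warnings;

# Map 1-based residue positions of a gapped sequence to 1-based
# alignment columns: $col_of->[$residue] = $column.
sub residue_to_column_map {
    my ($gapped) = @_;
    my @col_of;
    my $residue = 0;
    for my $col (1 .. length $gapped) {
        next if substr($gapped, $col - 1, 1) eq '-';   # skip gap columns
        $col_of[++$residue] = $col;
    }
    return \@col_of;
}

# Extract a region from every aligned sequence. The region is given as
# reference-coordinate segments in join order, e.g.
# join(2307..3215,1..1623) becomes ([2307, 3215], [1, 1623]).
sub extract_region {
    my ($aln, $ref_id, @segments) = @_;   # $aln: hashref of id => gapped string
    my $col_of = residue_to_column_map($aln->{$ref_id});
    my %out;
    for my $id (keys %$aln) {
        my $region = '';
        for my $seg (@segments) {
            my ($from, $to) = ($col_of->[ $seg->[0] ], $col_of->[ $seg->[1] ]);
            $region .= substr($aln->{$id}, $from - 1, $to - $from + 1);
        }
        $region =~ tr/-//d;               # drop gaps before translating
        $out{$id} = $region;
    }
    return \%out;
}

# Toy alignment: the hypothetical CDS is join(10..12,1..6) on the reference.
my %aln = (
    ref  => 'GCT-TAAGGGATG',
    seq2 => 'GCTATAAGG-ATG',
);
my $cds = extract_region(\%aln, 'ref', [10, 12], [1, 6]);
print "$_\t$cds->{$_}\n" for sort keys %$cds;
# ref     ATGGCTTAA
# seq2    ATGGCTATAA
```

Note that seq2 comes out one base longer than the reference because of an insertion, so a naive translation of it would be frameshifted. With real in-dels, each extracted region would need checking (for example, that its length is a multiple of three) before translation.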
> Hope now the problem has been clarified, Ross Hope this helps. chris From ross at cuhk.edu.hk Mon Mar 22 01:30:06 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 13:30:06 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> Message-ID: <006901cac980$bb60f190$3222d4b0$@edu.hk> Dear Chris, It seems that Bioperl is "clever" enough to "rectify" my start and stop by reversing the order. e.g. start = 2300 stop = 1600 It will reverse back to 1600 and then 2300. What else to tell that I'm now working on a circular genome? -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Monday, March 22, 2010 11:41 AM To: Ross KK Leung Cc: 'Florent Angly'; 'bioperl-l List' Subject: Re: [Bioperl-l] automation of translation based on alignment On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote: > Dear Florent, > > Sorry for mis-clicking "reply" instead of "reply-all". Here are my problem > details: > > Input: > > 1000 multiple aligned DNA sequences > One of them has Genbank file > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 > > the remaining 999 ones only have genomic sequences. > > Objective: to derive the cognate protein aligned sequences. (here have 4 > sets as there are 4 overlapping genes) > > Difficulties: > 1) circular genome > 2) there may be in-dels To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pull sub-alignments out from there, then translate each sequence. 
chris

From cjfields at illinois.edu Mon Mar 22 08:58:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 07:58:00 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <006901cac980$bb60f190$3222d4b0$@edu.hk>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk>
Message-ID: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>

On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:

> Dear Chris,
>
> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
> reversing the order.
>
> e.g.
> start = 2300
> stop = 1600
>
> It will reverse back to 1600 and then 2300.
> What else to tell that I'm now working on a circular genome?

Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted:

    ...
    gene            join(2307..3215,1..1623)
                    /gene="P"
    ...

This would be represented as a split location (Bio::Location::Split) in the feature; it would be joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence.
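
The spliced_seq()/translate() advice above could look roughly like the following. This is only a sketch and assumes BioPerl is installed. The 12-bp "genome" and its CDS are invented so that the example needs no input file. Whether the two pieces of an origin-crossing feature come back in join order may depend on the BioPerl version in use, so no output is claimed here.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Toy circular genome (hypothetical). The CDS below is join(10..12,1..6),
# i.e. it starts near the end of the sequence and wraps past the origin.
my $genome = Bio::Seq->new(-id       => 'toy',
                           -seq      => 'GCTTAAGGGATG',
                           -alphabet => 'dna');
$genome->is_circular(1);

# Build the split location piece by piece, in join order.
my $loc = Bio::Location::Split->new();
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 10, -end => 12, -strand => 1));
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 6, -strand => 1));

my $cds = Bio::SeqFeature::Generic->new(-primary  => 'CDS',
                                        -location => $loc);
$genome->add_SeqFeature($cds);

# Same loop one would run over features parsed from a GenBank file.
for my $feat ($genome->get_SeqFeatures) {
    next unless $feat->primary_tag eq 'CDS';
    my $nt = $feat->spliced_seq;   # joined nucleotide sequence (Bio::Seq-like)
    my $aa = $nt->translate;       # protein sequence object
    print $nt->seq, "\t", $aa->seq, "\n";
}
```

If the pieces are joined in the listed order, the nucleotide sequence would be ATGGCTTAA; if the result starts with GCTTAA instead, the installed version is sorting the sub-locations by start position, which is the kind of behaviour worth reporting as a bug.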
chris

From ross at cuhk.edu.hk Mon Mar 22 09:17:05 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Mon, 22 Mar 2010 21:17:05 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>
Message-ID: <011701cac9c1$f7b89260$e729b720$@edu.hk>

Chris,

The following code is what I use to retrieve sequences from GenBank. I know that I can use something like:

    for my $feature ($seqobj->get_SeqFeatures){
        if ($feature->primary_tag eq "CDS") {
            ...

to get features, but how should Bio::Location::SplitLocation be used? Do you mean something like:

    if ($seq->is_circular()) {
        spliced_seq();
    }

? But the genome has several such spliced sequences, so how can I specify which one to retrieve? Thanks for your advice again~

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Bio::SeqIO;
    use Bio::DB::GenBank;

    # usage: script.pl ACCESSION START STOP
    my ($acc, $start, $stop) = @ARGV;

    # fetch only the requested region, as FASTA
    my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                                   -seq_start => $start,
                                   -seq_stop  => $stop,
                                   -strand    => 1);

    my $gbout = $acc;
    my $seq   = $gb->get_Seq_by_acc($acc);
    print "seq is ", $seq->seq, "\n";

    # write the region to ACCESSION.fa
    my $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta');
    $seqio_obj->write_seq($seq);
    exit;

-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: Monday, March 22, 2010 8:58 PM
To: Ross KK Leung
Cc: 'Florent Angly'; 'bioperl-l List'
Subject: Re: [Bioperl-l] automation of translation based on alignment

On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:

> Dear Chris,
>
> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
> reversing the order.
> > e.g. > start = 2300 > stop = 1600 > > It will reverse back to 1600 and then 2300. > What else to tell that I'm now working on a circular genome? Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted: ... gene join(2307..3215,1..1623) /gene="P" ... This would be represented as a Bio::Location::SplitLocation in the feature; it would joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence. chris From jessica.sun at gmail.com Mon Mar 22 14:48:38 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 22 Mar 2010 14:48:38 -0400 Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener Message-ID: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> Does any know how to get CDS of the corresponding mRNA accession(NM_) using this function? *Bio::SeqFeature::Tools::Unflattener many thanks in advance. * -- Jessica Jingping Sun From cjfields at illinois.edu Mon Mar 22 14:56:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 13:56:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq() Message-ID: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> I have just noticed that spliced_seq() is borked with Bio::DB::SeqFeature and am thinking about implementing it. Or is similar functionality already implemented elsewhere? Currently, it is calling entire_seq(), which I plan on avoiding simply to prevent sucking in the entire sequence into memory. 
This is currently what happens:

---------------------------

my $it = $store->get_seq_stream(-type => 'mRNA');

my $ct = 0;
while (my $sf = $it->next_seq) {
    my $seq = $sf->spliced_seq; # dies with exception
}

---------------------------

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented by package Bio::DB::SeqFeature.
This is not your fault - author of Bio::DB::SeqFeature should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
STACK: beestore.pl:17
----------------------------------------------------------------

chris

From csembry at ualr.edu Mon Mar 22 15:48:56 2010
From: csembry at ualr.edu (Charles Embry)
Date: Mon, 22 Mar 2010 14:48:56 -0500
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
Message-ID: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>

I want to create a GUI that will use current bioperl modules (along with some I am writing). It will run on a Windows machine with XP and maybe a laptop with Vista (this is a project I am working on in graduate school for a professor). It will be id'ing promoter types in eukaryote organisms and also doing multiple alignments.

What do you recommend for developing this? A Java application? If so, how hard is it to get Java to use perl and bioperl modules? Another language? Is there a tool to directly develop a GUI for bioperl modules that does not use another language?

I will need to tag certain sequences with user-specified colors and such.
Thanks for the help From cjfields at illinois.edu Mon Mar 22 16:20:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 15:20:24 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <011701cac9c1$f7b89260$e729b720$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> <011701cac9c1$f7b89260$e729b720$@edu.hk> Message-ID: On Mar 22, 2010, at 8:17 AM, Ross KK Leung wrote: > Chris, > > The following codes are what I use to retrieve sequences from GenBank. I > know that I can use something like: > > for my $feature ($seqobj->get_SeqFeatures){ > > if ($feature->primary_tag eq "CDS") { > ... > > To get features, but how should > > Bio::Location::SplitLocation > > be used? Do you mean something like: > > If ($seq->is_circular()) { > spliced_seq(); > } You probably won't directly see the SplitLocation itself unless you explicitly request it (it is contained in the sequence feature). Okay, so if you are trying to retrieve the sequence for a specific feature, you can use $sf->seq() (simple subsequence from start to end corrected for strand of feature). However, in the case where the feature crosses the origin it will contain a split location. In this case, you should call $sf->spliced_seq() to retrieve spliced sequence. For convenience, you could call spliced_seq on all sequence features; for simple locations it will just return the ordinary subseq(). So, if one had a generic sequence feature, one could call: $sf->spliced_seq->translate; to get the Bio::Seq object that is the translation of the seq feature region. > ? But the genome indeed has several such spliced sequences then how can I > specify which is to retrieve? 
Thanks for your advice again~ Do you mean alternatively spliced variants? These would be designated as separate features in a GenBank file, so you would check for those. Otherwise you'll have to clarify. If you haven't read them yet I suggest looking over the HOWTOs, specifically ones covering Seq/SeqIO and Feature/Annotation to get an idea of what is possible. chris > #!/usr/bin/perl > > use Bio::SeqIO::genbank; use Bio::DB::GenBank; > > use Bio::DB::RefSeq; > > > > $gb = new Bio::DB::GenBank; > > > > my ($acc, $start, $stop) = @ARGV; > > > > my $gb = Bio::DB::GenBank->new(-format => 'Fasta', > > -seq_start => "$start", > > -seq_stop => "$stop", > > -strand => 1); > > > > $gbout = $acc; > > > > $seq = $gb->get_Seq_by_acc($acc); > > print "seq is ", $seq->seq, "\n"; > > > > $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' ); > > $seqio_obj->write_seq($seq); > > exit; > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Monday, March 22, 2010 8:58 PM > To: Ross KK Leung > Cc: 'Florent Angly'; 'bioperl-l List' > Subject: Re: [Bioperl-l] automation of translation based on alignment > > On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote: > >> Dear Chris, >> >> It seems that Bioperl is "clever" enough to "rectify" my start and stop by >> reversing the order. >> >> e.g. >> start = 2300 >> stop = 1600 >> >> It will reverse back to 1600 and then 2300. >> What else to tell that I'm now working on a circular genome? > > Reverse it where, the alignment or the feature? The svn version of BioPerl, > for alignments, retains strand information (this was a bug that was fixed). > For features, start is always less than end, with directionality determined > by strand. For a circular genome, the feature is split across the origin, > as you have seen in the original sequence you posted: > > ... > gene join(2307..3215,1..1623) > /gene="P" > ... 
> > > This would be represented as a Bio::Location::SplitLocation in the feature; > it would joined based on that order if $seq->is_circular() is true (or at > least it should). In cases like this, the safe bet is to call spliced_seq() > to get the joined sequence in all cases, then call translate() to get the > protein sequence. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Mon Mar 22 16:23:50 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 09:23:50 +1300 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E8829C2@exchsth.agresearch.co.nz> I guess it depends on how complex you need your GUI. If you only need a few a few menus, input fields, buttons, and are getting text or images as output then I'd stick to a simple web interface. You could tart it up a bit with Dojo or YUI libraries so it didn't look like every other webpage. If you need something more complex, you could give TK a go but I'm not sure how good it is and it will look a bit dated. If you're going to write the GUI in Swing, try Inline::Java and Java::Swing - take a look here: http://www.perlmonks.org/?node_id=372197 It may be easier to call Perl from Java so take a look at PLJava http://search.cpan.org/~gmpassos/PLJava-0.04/README.pod I haven't tried a Java GUI for Perl yet - we tend to use web interfaces for scripts that are going to get used by the "public" (i.e. scientists, not developers). We've found Mobyle http://bioweb2.pasteur.fr/projects/mobyle/ to be a nice way to get something up fairly quickly and it keep a consistent look to all our scripts. 
Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Charles Embry
> Sent: Tuesday, 23 March 2010 8:49 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
>
> I want to create a Gui that will use current bioperl modules(along with
> some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista.(this is a project i am working on in Graduate school
> for
> a professor). It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do yo suggest to use t develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does no use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited.
If you have received this message in error, please notify the sender immediately.
=======================================================================

From jason at bioperl.org Mon Mar 22 16:26:15 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 22 Mar 2010 13:26:15 -0700
Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq()
In-Reply-To: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <4BA7D267.6050704@bioperl.org>

Yes, it needs a special case I guess - since spliced_seq should work. However, the only problem is that if both exons and CDS are sub-features, you have to be smart enough not to grab both.

So I have just relied on specialized dumping scripts for gff3_to_cds for my own needs (i.e. http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl ).

But you might also see what the GBrowse plugin dumpers do.

-jason

Chris Fields wrote, On 3/22/10 11:56 AM:
> I have just noticed that spliced_seq() is borked with
> Bio::DB::SeqFeature and am thinking about implementing it. Or is
> similar functionality already implemented elsewhere?
>
> Currently, it is calling entire_seq(), which I plan on avoiding simply
> to prevent sucking in the entire sequence into memory. This is
> currently what happens:
>
> ---------------------------
>
> my $it = $store->get_seq_stream(-type => 'mRNA');
>
> my $ct = 0;
> while (my $sf = $it->next_seq) {
>     my $seq = $sf->spliced_seq; # dies with exception
> }
>
> ---------------------------
>
> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented
> by package Bio::DB::SeqFeature.
> This is not your fault - author of Bio::DB::SeqFeature should be blamed!
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
> STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
> STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
> STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
> STACK: beestore.pl:17
> ----------------------------------------------------------------
>
> chris

From rmb32 at cornell.edu Mon Mar 22 16:33:48 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 22 Mar 2010 13:33:48 -0700
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>
References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>
Message-ID: <4BA7D42C.5050602@cornell.edu>

If I were doing a GUI for BioPerl, I would certainly not try to use Java. You could have a look at how Padre, the Perl IDE (written in Perl), is implemented: http://search.cpan.org/~plaven/Padre-0.58/ They use wx, I think.

But a simple web or command-line application would be far easier to write, in any language, if you can find somewhere to host it.

Rob

Charles Embry wrote:
> I want to create a Gui that will use current bioperl modules(along with some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista.(this is a project i am working on in Graduate school for
> a professor). It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do you suggest to use to develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language?
> Is there a tool to directly develop a GUI for
> bioperl modules that does not use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help

From jason at bioperl.org Mon Mar 22 16:33:51 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 22 Mar 2010 13:33:51 -0700
Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener
In-Reply-To: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com>
References: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com>
Message-ID: <4BA7D42F.2060807@bioperl.org>

You can try this, but it is a somewhat involved script because it is set up for dealing with multiple genomes in multiple folders, so you might want to simplify it.
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/genbank_gbk2gff3_unflatten.pl

But I thought the perldoc was a good starting point - have you tried it?

Generally I do:
GENBANK -> GFF3 --> genbank_gbk2gff3_unflatten.pl
GFF3 -> {CDS,PEP,GENE} --> http://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/gff3_to_cdspep.pl (or equivalent)

-jason

Jessica Sun wrote, On 3/22/10 11:48 AM:
> Does anyone know how to get the CDS of the corresponding mRNA accession (NM_) using
> this function?
> Bio::SeqFeature::Tools::Unflattener
>
> many thanks in advance.

From Russell.Smithies at agresearch.co.nz Mon Mar 22 17:10:36 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 23 Mar 2010 10:10:36 +1300
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA7D42C.5050602@cornell.edu>
References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> <4BA7D42C.5050602@cornell.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882A5B@exchsth.agresearch.co.nz>

wx (www.wxwidgets.org) looks very interesting - I didn't realize Cn3D used it.
wxPerl (http://wxperl.sourceforge.net) might be worth a look.

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Robert Buels
> Sent: Tuesday, 23 March 2010 9:34 a.m.
> To: Charles Embry
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
>
> If I were doing a GUI for BioPerl, I would certainly not try to use
> Java. You could have a look at how Padre, the Perl IDE (written in Perl),
> is implemented: http://search.cpan.org/~plaven/Padre-0.58/ They use
> wx, I think.
>
> But, a simple web or command-line application would be far easier to
> write, in any language, if you can find somewhere to host it.
>
> Rob
>
>
> Charles Embry wrote:
> > I want to create a Gui that will use current bioperl modules(along with some
> > I am writing). It will be on a windows machine that runs XP and maybe a
> > laptop with Vista.(this is a project i am working on in Graduate school for
> > a professor). It will be id'ing promoter types in eukaryote organisms and
> > also do multiple alignments.
> >
> > What recommendations do you suggest to use to develop this? A java
> > application? If so how hard is it to get Java to use perl and bioperl
> > modules? Another language? Is there a tool to directly develop a GUI for
> > bioperl modules that does not use another language?
> >
> > I will need to tag certain sequences with user specified colors and such.
> >
> >
> > Thanks for the help

From clarsen at vecna.com Mon Mar 22 16:51:08 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 22 Mar 2010 16:51:08 -0400
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID:

Ross, Chris F,

I'd like to just comment on this since we are working in parallel on a similar problem. See also the prior thread in the archives, which I instigated, on Peter's work in BioPython: "Polyproteins, robo slippage, viral mat_peptides"

This dialog below is just to clarify the science that will guide the pseudocode and logic flow that would need to be built out into a BioPerl module. There are plenty of comments on the string mashing required, and it's a harrowing morass, but here are some other thoughts. Three line-item comments first, and then some open general ideas for moving this block of concepts forward:

1.
>> Ross said:
>> I am working on virus sequences and one of the GenBank files is here:
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1

If you are transferring protein annotation, why not use the RefSeq one instead of a GenBank one? In our experience at Virusbrc.org we find that protein annotation transfer is only a valid idea if you have reference sequences for each serotype, or your annotations will have propagation errors from the reference. They just don't align more than 80% of the time, for instance in Dengue, and I assume you want better than that? Yes, this HepB is a decent sequence, but the problem is that HepB has four main serotypes, and yet there is only one RefSeq: NC_003977.

My guess is that you will have to define reference peptide seqs for all four serotypes first, and then grab the Taxon_ID from the input unknown file so you align right, i.e. you need to do virus annotation below the species level or it isn't accurate. The number of reference sequences that you use is related to the conservation of your virus family. The script needs to know which one to align to, so we have pulled that from the taxon_ID field of the *.gbk file. You could also use blast and pull the high scorer. Your choice.

2.
>> Ross said:
>>
>> Thanks for your response. While the one with Genbank file can be extracted,
>> those without have to rely on alignment. Scripts certainly can be written to
>> move forward and backward on the multiple alignment but it is an
>> error-prone

We find also that viruses don't have the proteins annotated most of the time. It's just a genome file. Part of the problem is that /host/ proteases sometimes cleave the /viral/ polyproteins, in a species-specific way, and since there is only one database entry, but many hosts, you can /only/ give the genome code and still be right for everything it /might/ infect.
You can't define the peptides in the file, because they might be different, depending on the host. Sick, isn't it? The proteins produced in different animals, based on their proteases' cleavage specificity, help determine whether the virus affects that animal or not. This is my hunch based on experience; no, I cannot give an example.

3.
Chris F said:
> To preface this, any reason you're not translating the alignment
> sequences using the above sequence's features as a reference?

A logical place to start. But they are usually not given. In addition to the above reason, data for viral sequences is scarcer, since fewer grad students want to sequence things that maim you or make you hurl if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So the GenBank file isn't really very good as a reference, 5% of the time.

Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? It's not standardized right now. Nobody has this nailed at NCBI or UniProt.

Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go-round, is best. The GenBank coords, when provided, are accurate most of the time. After that, you end up comparing everything and making your choice.

4.
Last thoughts:

* We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And it's 1200 lines long, doing error checking; as you say, it's just not easy. Pulling an HSP from a blast report leaves one with a lot of end trimming and comparing to do, since the HSP ends in an identity, and, well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits.
It's not really appropriate to look at the ends of the HSP and say they are right. It requires that extra code. Still, we may open that code to the public after the April database release. It only works for well-conserved viruses. (I know... Jumbo Shrimp).

* I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you don't have to assign a reference sequence from scratch every time you do this. Is there one?

* Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named differently in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together.

* If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code.

All ideas. Nothing set in stone. Dialog welcome. Good luck all.

Chris

--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
clarsen at vecna.com

From janine.arloth at googlemail.com Sun Mar 21 10:02:32 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Sun, 21 Mar 2010 15:02:32 +0100
Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs
In-Reply-To:
References:
Message-ID:

Hello all,

while running blast(n) I want to extend -method_args like this:
..
$result = $fac->$blastprogramm_input(
    -query       => $seq,
    -outfile     => "blast.txt",
    -method_args => [
        "-num_alignments" => $num_alignments_input,
        "-evalue"         => $evalue_input,
        "-word_size"      => $word_size_input,
        "-?"              => $match_score_input,
        "-?"              => $gapcosts_input
        .....
    ]
);
...

In Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln|gapopen and bln|gapextend.
So when I have the input value = "4 4", then Existence: 4 = gapopen and Extension: 4 = gapextend ??
Is there a similar usage for Match/Mismatch scores, like value="1,-2" -> match=1 and mismatch=-2??
(I can't find it)

Thanks for help.

From nils.mueller0 at googlemail.com Sun Mar 21 11:17:06 2010
From: nils.mueller0 at googlemail.com (Nils Müller)
Date: Sun, 21 Mar 2010 16:17:06 +0100
Subject: [Bioperl-l] BlastPlus Masker
Message-ID: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>

Dear all,

I am confused about handling maskers in blastplus:
I have fasta seq. and want to run blast with a low complexity masker like dustmasker:

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name   => 'my_masked_db',
    -db_data   => 'myseqs.fas',
    -masker    => 'dustmasker',
    -mask_data => 'maskseqs.fas',
    -create    => 1);

Is myseqs.fas the same as maskseqs.fas??? I don't want to create a maskfile, I only want to run blast with a masked file??

From razi.khaja at gmail.com Mon Mar 22 20:55:42 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Mon, 22 Mar 2010 20:55:42 -0400
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To: <201003191525.o2JFPIr3019479@portal.open-bio.org>
References: <201003191525.o2JFPIr3019479@portal.open-bio.org>
Message-ID:

Hello All,

I've submitted a patch (blast.pm.diff) to bugzilla to enhance Bio/SearchIO/blast.pm to be able to parse the algorithm_reference from BLAST reports.
I've also submitted a patch (blast.t.diff) of 26 additional tests to parse the algorithm_reference from many of the BLAST reports in the t/data dir in bioperl-live.

I'd like to get the patch into bioperl-live and would like someone to review the patch and tests.

If the architecture for BLAST report parsing is changing, can someone let me know and I can contribute my efforts there.

Below are links to bugzilla.
Thanks,

Razi Khaja

---------- Forwarded message ----------
From:
Date: Fri, Mar 19, 2010 at 11:25 AM
Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
To: bioperl-guts-l at bioperl.org

http://bugzilla.open-bio.org/show_bug.cgi?id=3031

------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST -------
Created an attachment (id=1462)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view)
patch for t/SearchIO/blast.t to perform 26 additional tests to parse
algorithm_reference from many BLAST report files

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l

From Russell.Smithies at agresearch.co.nz Mon Mar 22 21:26:30 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 23 Mar 2010 14:26:30 +1300
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To:
References: <201003191525.o2JFPIr3019479@portal.open-bio.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz>

It's not really a bug if it was never implemented and it probably wasn't implemented because it wasn't needed.
Is there actually a use case where you'd programmatically need to access the algorithm reference from Blast results??
I'm sure I can't think of one.

--Russell

From ross at cuhk.edu.hk Mon Mar 22 21:32:06 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 23 Mar 2010 09:32:06 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID: <001201caca28$a5e325b0$f1a97110$@edu.hk>

Chris L,

Your comment is insightful, and as a non-virologist I never knew that before. My strategy is just to extract the genomic fragments encoding proteins and derive the putative translated sequences. I'll do another round of MSA for the protein sequences in order to discover any outliers. There may be truncations, but as long as the protease acts post-translationally, it's acceptable.

Chris F,

What frustrates me is how similar the data structures and names of Bio objects in BioPerl are.
If I want to retrieve a GenBank file over the internet by:

$gb = new Bio::DB::GenBank;
$seq = $gb->get_Seq_by_acc('J00522');

And from:
http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html

it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq, so I can't do something like:

my $seqobj = $seq->next_seq;
for my $feat_object ($seqobj->get_SeqFeatures) {
    if ($feat_object->primary_tag eq "CDS") {
        print $feat_object->spliced_seq->seq, "\n";
        if ($feat_object->has_tag('gene')) {
            for my $val ($feat_object->get_tag_values('gene')) {
                print "gene: ", $val, "\n";
            }
        }
    }
}

The methods listed at http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html mention nothing about how to get the features or inter-convert among the object types.

From razi.khaja at gmail.com Mon Mar 22 22:08:45 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Mon, 22 Mar 2010 22:08:45 -0400
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz>
References: <201003191525.o2JFPIr3019479@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz>
Message-ID:

Nope, not a bug. It's an enhancement though ;)

I implemented it so that I could do a lossless transformation from BLAST report format to other formats. You could consider that a use case.

I also have additional patches that parse other details from BLAST reports that aren't currently implemented in Bio::SearchIO, and I'd like to contribute those as well; however, I thought I'd start with this one.
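
For anyone wondering what "parsing the algorithm reference" amounts to, here is a rough sketch in plain Perl. It is not the code from the attached patch and it does not use Bio::SearchIO; the helper name extract_algorithm_reference and the abbreviated sample header are made up for illustration, and it assumes the citation is the first paragraph that starts with "Reference:" and ends at a blank line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or of the attached patch):
# pull the first "Reference:" paragraph out of a plain-text BLAST report
# and return it as a single whitespace-normalised string.
sub extract_algorithm_reference {
    my ($report_text) = @_;
    my @collected;
    my $in_reference = 0;
    for my $line (split /\n/, $report_text) {
        if (!$in_reference) {
            # The citation starts on a line beginning with "Reference:".
            if ($line =~ /^Reference:\s*(.*)$/) {
                $in_reference = 1;
                push @collected, $1;
            }
            next;
        }
        # A blank line ends the citation paragraph.
        last if $line =~ /^\s*$/;
        push @collected, $line;
    }
    return unless @collected;
    my $reference = join ' ', @collected;
    $reference =~ s/\s+/ /g;        # collapse runs of whitespace
    $reference =~ s/^\s+|\s+$//g;   # trim both ends
    return $reference;
}

# Abbreviated sample header, used only to exercise the helper.
my $report = <<'END_REPORT';
BLASTN 2.2.18 [Mar-02-2008]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= test_query
END_REPORT

print extract_algorithm_reference($report), "\n";
```

Newer reports can carry more than one "Reference:" paragraph; this sketch only returns the first, so a lossless round trip would need to collect all of them.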
Razi

On Mon, Mar 22, 2010 at 9:26 PM, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

> It's not really a bug if it was never implemented and it probably wasn't
> implemented because it wasn't needed.
> Is there actually a use case where you'd programmatically need to access
> the algorithm reference from Blast results??
> I'm sure I can't think of one.
>
>
> --Russell

From maj at fortinbras.us Mon Mar 22 22:51:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Mar 2010 22:51:24 -0400
Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs
In-Reply-To:
References:
Message-ID:

Hi Janine--

The options you need are "reward" (for the match score) and "penalty" (for the mismatch score). Add them to -method_args.

cheers
MAJ

----- Original Message -----
From: "Janine Arloth"
To:
Sent: Sunday, March 21, 2010 10:02 AM
Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs

> Hello all,
>
> while running blast(n) I want to extend to method_arg like:
> ..
> $result = $fac->$blastprogramm_input(
>     -query       => $seq,
>     -outfile     => "blast.txt",
>     -method_args => [
>         "-num_alignments" => $num_alignments_input,
>         "-evalue"         => $evalue_input,
>         "-word_size"      => $word_size_input,
>         "-?"              => $match_score_input,
>         "-?"              => $gapcosts_input
>         .....
>     ]
> );
> ...
>
> in Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln|gapopen and
> bln|gapextend
> so when I have the input value = "4 4" , then Existence: 4 = gapopen and
> Extension: 4 = gapextend ??
>
> Is there a similar usage for Match/Mismatch scores like value="1,-2" ->
> match=1 and mismatch=-2??
> (I can't find it)
>
> Thanks for help.

From maj at fortinbras.us Mon Mar 22 22:59:56 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 22 Mar 2010 22:59:56 -0400 Subject: [Bioperl-l] BlastPlus Masker In-Reply-To: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> Message-ID: Hi Nils, You don't have to specify a mask_data file; the factory should make it for you; try simply $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -create => 1); -mask_data is there so that pre-made masks can be applied separately, or so you can name the file that is produced and preserve it; this is an "advanced feature", I suppose-- MAJ ----- Original Message ----- From: "Nils M?ller" To: Sent: Sunday, March 21, 2010 11:17 AM Subject: [Bioperl-l] BlastPlus Masker > Dear all, > > I am confused in handeling with maskers in blastplus: > I have fasta seq. and want to run blast with a low complexity masker like > dustmasker: > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'my_masked_db', > -db_data => 'myseqs.fas', > -masker => 'dustmasker', > -mask_data => 'maskseqs.fas', > -create => 1); > > Is myseqs.fas the same as maskseqs.fas??? I don't want to create a > maskfile , I only will run blast with a masked file?? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue Mar 23 00:43:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 23:43:03 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <001201caca28$a5e325b0$f1a97110$@edu.hk> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> Message-ID: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> On Mar 22, 2010, at 8:32 PM, Ross KK Leung wrote: > Chris L, > > Your comment is insightful and as a non-virologist, I have never known that > before. 
> My strategy is just to extract the genomic fragments encoding
> proteins and derive the putative translated sequences. I'll do another round
> of MSA for the protein sequences in order to discover any outliers. There
> may be truncations, but as long as the protease acts post-translationally,
> it's acceptable.
>
> Chris F,
>
> What makes me feel frustrated is the verisimilar data structures and naming
> of Bio objects in Bioperl. If I want to retrieve a genbank file over the
> internet by:
>
> $gb = new Bio::DB::GenBank;
>
> $seq = $gb->get_Seq_by_acc('J00522');
>
> And from:
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html
>
> it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq
> so I can't do something like:

A Bio::Seq::RichSeq is-a Bio::Seq (it inherits Bio::Seq and augments it). I believe 'Bio::Seq' in the documents refers to the fact one can retrieve FASTA sequence data (which returns a simple Bio::Seq) or richer records, such as a GenBank record (which returns a Bio::Seq::RichSeq). In this case, it should probably read 'Bio::SeqI' to be more accurate (implements the Bio::SeqI interface). Beyond the addition of a few accessor methods they are essentially the same, in that they both have annotation, features, etc.

> my $seqobj = $seq->next_seq;

You're either not reading the demos or the relevant documentation correctly, or there is a spot in the docs that needs to be fixed (if the latter, please let us know). Bio::Seq does not implement a next_seq() method, but sequence *streams* (ala Bio::SeqIO) do. You are probably thinking of something like this:

    # get_Stream_by_acc takes a single accession or a reference to an array of them
    my $streamobj = $gb->get_Stream_by_acc(\@ids);

    while (my $seqobj = $streamobj->next_seq) {
        # do stuff here
    }

The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$streamobj->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work. If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work.

> for my $feat_object ($seqobj->get_SeqFeatures) {
>     if ($feat_object->primary_tag eq "CDS") {
>         print $feat_object->spliced_seq->seq,"\n";
>         if ($feat_object->has_tag('gene')) {
>             for my $val ($feat_object->get_tag_values('gene')){
>                 print "gene: ",$val,"\n";
>             }
>         }
>     }
> }
>
> From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the
> methods there mention nothing about how to get the features or inter-convert
> among the object types.

Just a note, but make sure to read up-to-date documentation, particularly if you are using the latest code. Here is the pdoc for the latest release:

http://doc.bioperl.org/releases/bioperl-1.6.1/Bio/Seq/RichSeqI.html

This is definitely worth pointing out, and is a good example where we can improve our documentation; I've added some links to classes that would explain more. In the meantime, the best thing to do in this case is to point you to the online documentation (which I think I did already, but just in case):

http://www.bioperl.org/wiki/HOWTO:Beginners
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

chris

From cjfields at illinois.edu Tue Mar 23 00:53:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 23:53:48 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID: <42E3E2EC-2226-44CE-995E-01B425B161F1@illinois.edu>

On Mar 22, 2010, at 3:51 PM, Chris Larsen wrote:

> ...
> 3.
> Chris F said:
>
>> To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference?
>
>
> A logical place to start. But they are usually not given. In addition to the above reason, the amount of data for viral sequences is rarer since fewer grad students want to sequence things that maim you or make you hurl, if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isn't really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? It's not standardized right now. Nobody has this nailed at NCBI or UniProt.
>
> Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords, when provided, are accurate most of the time. After that, you end up comparing everything and making your choice.

Yes, in this case nothing will be an immediate, perfect solution. It will take some additional work.

> 4.
> Last thoughts:
>
> * We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, it's 1200 lines long, doing error checking; as you say it's just not easy. Pulling an HSP from a blast report leaves one with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits. It's not really appropriate to look at the ends of the hsp and say they are right. It requires that extra code. Still, we may open that code to the public after April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp).

Might be nice to see what you've done, whenever that is ready.
> * I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you dont have to assign a reference sequence from scratch, every time you do this. Is there one? If you mean pulling out sets of sequences from a larger alignment or slices of alignments, there should be methods within Bio::SimpleAlign to do this, yes. > *Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named different in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together. > > * If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code. > > All ideas. Nothing set in stone. Dialog welcome. > > Good luck all. > > Chris > > > -- > > Christopher Larsen, Ph.D. > Sr. Scientist / Grants Manager > Vecna Technologies > 6404 Ivy Lane #500 > Greenbelt, MD 20770 > Phone: (240) 965-4525 > Fax: (240) 547-6133 > > clarsen at vecna.com Very nice summary of the problems in the field. thanks! chris From ross at cuhk.edu.hk Tue Mar 23 01:20:56 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 23 Mar 2010 13:20:56 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> Message-ID: <001501caca48$9db03f70$d910be50$@edu.hk> my $streamobj = $gb->get_Stream_by_acc(@ids); while (my $seqobj = $stream->next_seq) { # do stuff here } The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$stream->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work. 
If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > for my $feat_object ($seqobj->get_SeqFeatures) { > > if ($feat_object->primary_tag eq "CDS") { > > print $feat_object->spliced_seq->seq,"\n"; > > if ($feat_object->has_tag('gene')) { > > for my $val ($feat_object->get_tag_values('gene')){ > > print "gene: ",$val,"\n"; > > } > > } > > } > > } Chris, in fact I did have this code before, but then it goes back to the old problem that the spliced sequence is incorrect. Please try using the following codes with "DQ089804" as the argument. If you check the printed result with: http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSyst em2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum you'll discover, for example, the sequence of gene P, is derived from splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), rather than 2307-3215 and 1-1623. use Bio::SeqIO::genbank; use Bio::DB::GenBank; use Bio::SeqIO; my ($acc) = @ARGV; $gb = new Bio::DB::GenBank; $streamobj = $gb->get_Stream_by_acc($acc); my $seqobj = $streamobj->next_seq; for my $feat_object ($seqobj->get_SeqFeatures) { if ($feat_object->primary_tag eq "CDS") { print $feat_object->spliced_seq->seq,"\n"; if ($feat_object->has_tag('gene')) { for my $val ($feat_object->get_tag_values('gene')){ print "gene: ",$val,"\n"; } } } } exit; From e.osimo at gmail.com Tue Mar 23 05:42:25 2010 From: e.osimo at gmail.com (Emanuele Osimo) Date: Tue, 23 Mar 2010 10:42:25 +0100 Subject: [Bioperl-l] Xyplot and multiple lines plots Message-ID: <2ac05d0f1003230242o31779c30sffa42d8e99539b09@mail.gmail.com> Hello everyone, I would like to plot two data sets in Bio::Graphics using Xyplot, one superimposed on the other. I need to compare the differential expression of an Affy expression probeset in different subjects. 
I successfully managed to plot one at a time with: $panel->add_track( $feat, -graph_type=>'linepoints', -glyph =>'xyplot', -fgcolor=>'gray', -max_score => 1, -min_score => 0, ); But I cannot understand how to plot two lines independently in the same track. Thank you in advance, Emanuele From biopython at maubp.freeserve.co.uk Tue Mar 23 06:58:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Mar 2010 10:58:58 +0000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <320fb6e01003230358w11ae8e5fxef140652c5cc9f1b@mail.gmail.com> On Mon, Mar 22, 2010 at 8:51 PM, Chris Larsen wrote: > Ross, Chris F, > > I'd like to just comment on this since we are working in parallel on a > similar problem. See also the prior thread in archives for Peters work in > BioPython that I instigated: "Polyproteins, robo slippage, viral > mat_peptides" Minor typo - the old thread title was about ribo (ribosomal) slippage: http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031479.html http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031484.html etc Triggered in part by my discussion with Chris Larsen (off list) about the biological problem of getting the mature peptide sequences from GenBank files, Biopython 1.53 ended up with a new method for extracting the sequence region described by a (complex) location, e.g. from parsing in an EMBL/GenBank file. There were several threads about this, this is perhaps the best summary if anyone is interested: http://lists.open-bio.org/pipermail/biopython/2009-November/005813.html http://lists.open-bio.org/pipermail/biopython/2009-December/005889.html > This dialog below is just to clarify the science that will guide the > pseudocode and logic flow would be needed to be built out into a BioPerl > module. There are plenty of comments on the string mashing required, and its > a harrowing morass, but heres some other thoughts. 
> Three line item comments
> first, and then some open general ideas for moving this block of concepts
> forward:

Thanks for the update - it sounds like you've got a better understanding of the complexities now, and some of the reasons why representing things like mature peptides is tricky (the issue of different cleavage patterns in different hosts is interesting).

Peter

From cjfields at illinois.edu Tue Mar 23 08:46:37 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Mar 2010 07:46:37 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <001501caca48$9db03f70$d910be50$@edu.hk>
References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> <001501caca48$9db03f70$d910be50$@edu.hk>
Message-ID: <3A94734B-CD43-4674-8DB6-82EA1C6530E4@illinois.edu>

On Mar 23, 2010, at 12:20 AM, Ross KK Leung wrote:

> my $streamobj = $gb->get_Stream_by_acc(@ids);
>
> while (my $seqobj = $stream->next_seq) {
> # do stuff here
> }
>
> The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO
> stream). '$stream->next_seq()' iterates through them one at a time. Unless
> you call a stream in some way, that code will not work. If you call the
> methods below directly on the *sequence* object ($seqobj, retrieved from
> get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work.
>
>> for my $feat_object ($seqobj->get_SeqFeatures) {
>>
>> if ($feat_object->primary_tag eq "CDS") {
>>
>> print $feat_object->spliced_seq->seq,"\n";
>>
>> if ($feat_object->has_tag('gene')) {
>>
>> for my $val ($feat_object->get_tag_values('gene')){
>>
>> print "gene: ",$val,"\n";
>>
>> }
>>
>> }
>>
>> }
>>
>> }
>
> Chris, in fact I did have this code before, but then it goes back to the old
> problem that the spliced sequence is incorrect. Please try using the
> following codes with "DQ089804" as the argument.
If you check the printed > result with: > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSyst > em2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum > > you'll discover, for example, the sequence of gene P, is derived from > splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), > rather than 2307-3215 and 1-1623. Okay, as I mentioned before, then that would be a bug. The best way to handle this is to file it in Bugzilla: http://bugzilla.open-bio.org/ I can likely look at it today, whether it's filed or not, just need to make some time. Please file the bug report, though, just in case I can't get to it right away. BTW, we had some discussion about circular genome support recently at the GMOD conference, and some code was added that was supposed to address the issues raised. I'm guessing we'll need to add more tests just to be sure. chris ... From Jean-Marc.Frigerio at pierroton.inra.fr Tue Mar 23 12:29:11 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Tue, 23 Mar 2010 17:29:11 +0100 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: References: Message-ID: <4BA8EC57.7070802@pierroton.inra.fr> > I want to create a Gui that will use current bioperl modules(along with some > I am writing). It will be on a windows machine that runs XP and maybe a > laptop with Vista.(this is a project i am working on in Graduate school for > a professor). It will be id'ing promoter types in eukaryote organisms and > also do multiple alignments. > > What recommendations do yo suggest to use t develop this? A java > application? If so how hard is it to get Java to use perl and bioperl > modules? Another language? Is there a tool to directly develop a GUI for > bioperl modules that does no use another language? > > I will need to tag certain sequences with user specified colors and such. 
> > > Thanks for the help Hi, Have also a look to Gtk-perl and perl-qt Best From Leighton.Pritchard at scri.ac.uk Tue Mar 23 12:35:42 2010 From: Leighton.Pritchard at scri.ac.uk (Leighton Pritchard) Date: Tue, 23 Mar 2010 16:35:42 -0000 Subject: [Bioperl-l] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? Message-ID: Hi, I can't seem to find any discussion of this on the mailing list archives (if anyone has a link, I'll happily follow it), so I was wondering what the rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live mapping CDS features to gene_component_region. For example, if I use the script on the E.coli sequence/annotation NC_000913.gbk, the gene: gene 190..255 /gene="thrL" /locus_tag="b0001" /note="synonyms: ECK0001, JW4367" /db_xref="EcoGene:EG11277" /db_xref="ECOCYC:EG11277" /db_xref="GeneID:944742" CDS 190..255 /gene="thrL" /locus_tag="b0001" /function="leader; Amino acid biosynthesis: Threonine" /function="1.5.1.8 metabolism; building block biosynthesis; amino acids; threonine" /note="GO_process: threonine biosynthetic process [goid 0009088]" /codon_start=1 /transl_table=11 /product="thr operon leader peptide" /protein_id="NP_414542.1" /db_xref="ASAP:ABE-0000006" /db_xref="UniProtKB/Swiss-Prot:P0AD86" /db_xref="GI:16127995" /db_xref="EcoGene:EG11277" /db_xref="ECOCYC:EG11277" /db_xref="GeneID:944742" /translation="MKRISTTITTTITITTGNGAG" Is mapped to NC_000913 GenBank region 190 255 . + . ID=GenBank:region:NC_000913:190:255 NC_000913 GenBank exon 190 255 . + . ID=GenBank:exon:NC_000913:190:255 NC_000913 GenBank gene 190 255 . + . ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank gene_component_region 190 255 . + . 
Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995 ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG NGAG I understand the region-exon-gene part of the model, but not the gene_component_region, which appears to be a catch-all. I would have assumed that the CDS is better mapped to a polypeptide, as described in the CHADO documentation: http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model There is no difference in script output whether --CDS or --noCDS is used. Cheers, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. 
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________

From djibrilo at yahoo.fr Tue Mar 23 13:38:25 2010
From: djibrilo at yahoo.fr (djibrilo)
Date: Tue, 23 Mar 2010 10:38:25 -0700 (PDT)
Subject: [Bioperl-l] Re : G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA8EC57.7070802@pierroton.inra.fr>
References: <4BA8EC57.7070802@pierroton.inra.fr>
Message-ID: <344176.4737.qm@web23001.mail.ird.yahoo.com>

Hi,

Also have a look at Perl/Tk.

Best Regards

________________________________
From: Jean-Marc Frigerio INRA
To: bioperl-l at lists.open-bio.org
Sent: Tue, 23 March 2010, 17:29:11
Subject: Re: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista

> I want to create a Gui that will use current bioperl modules(along with some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista.(this is a project i am working on in Graduate school for
> a professor). It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do you suggest to use to develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does not use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help

Hi,
Have also a look to Gtk-perl and perl-qt
Best

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Tue Mar 23 14:18:46 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 23 Mar 2010 14:18:46 -0400
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To:
References:
Message-ID: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>

Hi Leighton,

I wonder if this is a change stemming from Nathan's work on this
script. Nathan?

Scott

On Tue, Mar 23, 2010 at 12:35 PM, Leighton Pritchard wrote:
> Hi,
>
> I can't seem to find any discussion of this on the mailing list archives (if
> anyone has a link, I'll happily follow it), so I was wondering what the
> rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live
> mapping CDS features to gene_component_region.
>
> For example, if I use the script on the E.coli sequence/annotation
> NC_000913.gbk, the gene:
>
>      gene            190..255
>                      /gene="thrL"
>                      /locus_tag="b0001"
>                      /note="synonyms: ECK0001, JW4367"
>                      /db_xref="EcoGene:EG11277"
>                      /db_xref="ECOCYC:EG11277"
>                      /db_xref="GeneID:944742"
>      CDS             190..255
>                      /gene="thrL"
>                      /locus_tag="b0001"
>                      /function="leader; Amino acid biosynthesis: Threonine"
>                      /function="1.5.1.8 metabolism; building block
>                      biosynthesis; amino acids; threonine"
>                      /note="GO_process: threonine biosynthetic process [goid
>                      0009088]"
>                      /codon_start=1
>                      /transl_table=11
>                      /product="thr operon leader peptide"
>                      /protein_id="NP_414542.1"
>                      /db_xref="ASAP:ABE-0000006"
>                      /db_xref="UniProtKB/Swiss-Prot:P0AD86"
>                      /db_xref="GI:16127995"
>                      /db_xref="EcoGene:EG11277"
>                      /db_xref="ECOCYC:EG11277"
>                      /db_xref="GeneID:944742"
>                      /translation="MKRISTTITTTITITTGNGAG"
>
> Is mapped to
>
> NC_000913 GenBank region 190 255 . + .
> ID=GenBank:region:NC_000913:190:255
> NC_000913 GenBank exon 190 255 . + .
> ID=GenBank:exon:NC_000913:190:255
> NC_000913 GenBank gene 190 255 . + .
> ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms:
> ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
> NC_000913 GenBank gene_component_region 190 255 . +
> .
> Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995
> ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine
> biosynthetic process [goid
> 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino
> acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block
> biosynthesis%3B amino acids%3B
> threonine;gene=thrL;locus_tag=b0001;product=thr operon leader
> peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG
> NGAG
>
> I understand the region-exon-gene part of the model, but not the
> gene_component_region, which appears to be a catch-all. I would have
> assumed that the CDS is better mapped to a polypeptide, as described in the
> CHADO documentation:
>
> http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model
>
> There is no difference in script output whether --CDS or --noCDS is used.
>
> Cheers,
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
>
>
> ______________________________________________________
> SCRI, Invergowrie, Dundee, DD2 5DA.
> The Scottish Crop Research Institute is a charitable company limited by guarantee.
> Registered in Scotland No: SC 29367.
> Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.
>
>
> DISCLAIMER:
>
> This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that
> addressee.
> If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
>
> Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
> ______________________________________________________
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Tue Mar 23 14:15:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 23 Mar 2010 14:15:38 -0400 Subject: [Bioperl-l] BlastPlus Masker In-Reply-To: <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com> References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com> Message-ID: Specifying 'dustmasker' for a nucleotide database is roughly the same as "filter : low complexity regions" and "mask : lookup table only", I believe. (There is also a facility for creating masks based on lowercase residues in a mask data fasta file; the blast+ utility is 'convert2blastmask'. You can run this with the SABlastPlus factory. I'm not very familiar with it, but you should be able to take the output file from this utility and feed it in to a new factory as the '-mask_data' to get what you want. (If anyone has done this, a brief step-by-step would be appreciated.)) cheers MAJ ----- Original Message ----- From: Nils M?ller To: Mark A. Jensen Sent: Tuesday, March 23, 2010 12:42 PM Subject: Re: [Bioperl-l] BlastPlus Masker Many thanks, is it the same as showed on the ncbi blast page (Filtering and Masking- filter: Low complexity regions and mask:Mask for lookup table only or Mask lower case letters)? 2010/3/23 Mark A. 
Jensen Hi Nils, You don't have to specify a mask_data file; the factory should make it for you; try simply $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -create => 1); -mask_data is there so that pre-made masks can be applied separately, or so you can name the file that is produced and preserve it; this is an "advanced feature", I suppose-- MAJ ----- Original Message ----- From: "Nils M?ller" To: Sent: Sunday, March 21, 2010 11:17 AM Subject: [Bioperl-l] BlastPlus Masker Dear all, I am confused in handeling with maskers in blastplus: I have fasta seq. and want to run blast with a low complexity masker like dustmasker: $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -mask_data => 'maskseqs.fas', -create => 1); Is myseqs.fas the same as maskseqs.fas??? I don't want to create a maskfile , I only will run blast with a masked file?? _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From lpritc at scri.ac.uk Wed Mar 24 08:05:08 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 24 Mar 2010 12:05:08 +0000 Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com> Message-ID: Hi, I'm surprised that this issue hasn't come up already, as the change to the gene model is quite significant. For comparison, this is what the old bp_genbank2gff3.pl script would produce with --CDS: NC_000913 GenBank gene 190 255 . + . ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank mRNA 190 255 . + . ID=thrL.t01;Parent=thrL NC_000913 GenBank CDS 190 255 . + . 
ID=thrL.p01;Parent=thrL.t01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0A D86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process : threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21 NC_000913 GenBank exon 190 255 . + . Parent=thrL.t01 and with --noCDS: NC_000913 GenBank gene 190 255 . + . ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank mRNA 190 255 . + . ID=thrL.t01;Parent=thrL NC_000913 GenBank polypeptide 190 255 . + . ID=thrL.p01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995, EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Derives_from=thrL.t01;Note=GO_p rocess: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21 NC_000913 GenBank exon 190 255 . + . Parent=thrL.t01 The new script produces this identical output with both --CDS and --noCDS: NC_000913 GenBank region 190 255 . + . ID=GenBank:region:NC_000913:190:255 NC_000913 GenBank exon 190 255 . + . ID=GenBank:exon:NC_000913:190:255 NC_000913 GenBank gene 190 255 . + . ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank gene_component_region 190 255 . + . 
Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995 ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG NGAG So, although the new script improves the parent-child relationships by identifying parents on the locus_tag field (guaranteed to be unique), rather than gene name (not guaranteed to be unique), the GFF3 gene model has apparently changed from canonical: gene <- mRNA <- {polypeptide/CDS, exon} to this: region ; exon ; gene <- gene_component_region So I guess I don't understand the region-exon-gene part of the new model, after all. This new model doesn't appear to be Sequence Ontology-compatible any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon is no longer considered part_of the transcript. In fact, there's not a transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get SO-compliant GFF3 (http://www.sequenceontology.org/resources/faq.html#convert), this might be an issue requiring a prompt fix or reversion. For now, due to the downstream problems this model causes with GBROWSE and ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the script to use the locus_tag field rather than the gene field for the feature ID. Cheers, L. On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote: > Hi Leighton, > > I wonder if this is a change stemming from Nathan's work on this > script. Nathan? 
> > Scott -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 From cjfields at illinois.edu Wed Mar 24 09:06:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 08:06:01 -0500 Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: References: Message-ID: <3A556027-C8DB-4683-8376-A42AC8796156@illinois.edu> On Mar 24, 2010, at 7:05 AM, Leighton Pritchard wrote: > Hi, > > I'm surprised that this issue hasn't come up already, as the change to the > gene model is quite significant. For comparison, this is what the old > bp_genbank2gff3.pl script would produce with --CDS: > ... > So, although the new script improves the parent-child relationships by > identifying parents on the locus_tag field (guaranteed to be unique), rather > than gene name (not guaranteed to be unique), the GFF3 gene model has > apparently changed from canonical: > > gene <- mRNA <- {polypeptide/CDS, exon} > > to this: > > region ; exon ; gene <- gene_component_region > > So I guess I don't understand the region-exon-gene part of the new model, > after all. This new model doesn't appear to be Sequence Ontology-compatible > any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon > is no longer considered part_of the transcript. In fact, there's not a > transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get > SO-compliant GFF3 > (http://www.sequenceontology.org/resources/faq.html#convert), this might be > an issue requiring a prompt fix or reversion. I agree. I think this commit needs more code review to understand the reasoning behind it, though it will be a little trickier than a simple reversion (I think there have been additional unrelated commits since then). Nathan, was this the intent, or is this a bug? I would agree with Leighton that it's the latter. chris > For now, due to the downstream problems this model causes with GBROWSE and > ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the > script to use the locus_tag field rather than the gene field for the feature > ID. > > Cheers, > > L. 
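For anyone wanting to check which gene model a converted file follows, here is a minimal sketch in plain Perl (no BioPerl needed). It assumes tab-separated GFF3 with at most one Parent per feature, and it only looks at the gene <- mRNA <- {CDS, exon} chain described above. The helper names (parse_attributes, check_gene_model, gff_line) and the trimmed-down sample lines are made up for illustration; they are not part of bp_genbank2gff3.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the ninth GFF3 column ("key=value;key=value") into a hash ref.
sub parse_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        $attr{$key} = $value;
    }
    return \%attr;
}

# Return one message per feature that breaks the
# gene <- mRNA <- {CDS, exon} chain; an empty list means the chain holds.
# Assumption: each feature names at most one Parent.
sub check_gene_model {
    my @lines = @_;
    my (%type_of, @features);
    for my $line (@lines) {
        (my $text = $line) =~ s/\r?\n\z//;    # work on a copy, drop the line ending
        next if $text =~ /^#/;                # skip directives and comments
        my @col = split /\t/, $text;
        next unless @col == 9;                # not a feature line
        my $attr = parse_attributes($col[8]);
        push @features, { type => $col[2], attr => $attr };
        $type_of{ $attr->{ID} } = $col[2] if defined $attr->{ID};
    }

    my %expected_parent = (mRNA => 'gene', CDS => 'mRNA', exon => 'mRNA');
    my @problems;
    for my $feature (@features) {
        my $want = $expected_parent{ $feature->{type} };
        next unless defined $want;            # feature type not part of the chain
        my $parent = $feature->{attr}{Parent};
        if (!defined $parent) {
            push @problems, "$feature->{type} has no Parent";
            next;
        }
        my $got = defined $type_of{$parent} ? $type_of{$parent} : 'unknown';
        push @problems, "$feature->{type} has a $got Parent, expected $want"
            if $got ne $want;
    }
    return @problems;
}

# Build a tab-separated GFF3 line for the thrL example (coordinates from the thread).
sub gff_line {
    my ($type, $attributes) = @_;
    return join "\t", 'NC_000913', 'GenBank', $type, 190, 255, '.', '+', '.', $attributes;
}

# Attribute columns cut down to ID/Parent only.
my @old_model = (
    gff_line(gene => 'ID=thrL'),
    gff_line(mRNA => 'ID=thrL.t01;Parent=thrL'),
    gff_line(CDS  => 'ID=thrL.p01;Parent=thrL.t01'),
    gff_line(exon => 'Parent=thrL.t01'),
);
my @new_model = (
    gff_line(region => 'ID=GenBank:region:NC_000913:190:255'),
    gff_line(exon   => 'ID=GenBank:exon:NC_000913:190:255'),
    gff_line(gene   => 'ID=b0001'),
    gff_line(gene_component_region => 'Parent=b0001'),
);

for my $set ([ old => \@old_model ], [ new => \@new_model ]) {
    my ($label, $lines) = @$set;
    my @problems = check_gene_model(@$lines);
    print "$label model: ", (@problems ? join('; ', @problems) : 'ok'), "\n";
}
```

On the cut-down lines above, the old output passes and the new output is flagged because its exon has no Parent at all, which is the "exon is no longer part_of the transcript" point made in the thread. A real check would also need to handle comma-separated Parent lists and Derives_from.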
From pmiguel at purdue.edu Wed Mar 24 09:49:55 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Wed, 24 Mar 2010 09:49:55 -0400
Subject: [Bioperl-l] How to set "complexity" param using EUtilities
Message-ID: <4BAA1883.3010203@purdue.edu>

Just a little FYI that might help someone using GenBank efetch (here with BioPerl EUtilities) who, contrary to expectation, retrieves a bunch of accessions (or GIs) when a single accession is what is wanted. The trick is to change the "complexity" parameter from its apparent default of "1" to "0".

Actually, this parameter might be worth adding to the HOWTO because it makes the EUtilities efetch behave like a normal Entrez search. Which, to me, would be the expected behavior.

Details below.

Some accessions/GIs appear to be embedded in bundles of related sequences. Here is an example:

gi|158819346|gb|EU011641.1|

If I search Entrez Nucleotide

http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar

with either "158819346" (the GI) or "EU011641.1", I get a single record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal RNA gene, partial sequence". This is what I want.

If I use the following code derived from the Eutils HOWTO:

use Bio::DB::EUtilities;
use Bio::SeqIO;
my @ids;
my $id = 'gb|EU011641.1|';
push @ids, $id;
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

my $file = "test.gb";
$factory->get_Response(-file => $file);

I get a bundle of accessions: EU011584-EU011663. Same result using the GI number instead.

From reading:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam

it looks like I would get what I want were I to set the efetch "complexity" parameter to "1".

But how do I set that parameter? Below is how I did it. Not the most efficient path, but it did not take that long to traverse...

The HowTo does not mention it.
I usually look to the Deobfuscator:

http://bioperl.org/cgi-bin/deob_interface.cgi

to help me when I want some documentation for a method. But this is a parameter, not a class. What class sets this parameter? Not sure. So I googled:

complexity eutil site:bioperl.org

The top-ranked hit is actually to the deprecated 1.5.2 version of EUtilities. But the 2nd hit is to the (auto-generated?) email posted to the bioperl-guts email list by Chris Fields upon his commit of the new EUtilities overhaul:

http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html

From here it looks like the obvious way to set the parameter would be possible. And indeed:

use Bio::DB::EUtilities;
use Bio::SeqIO;
my @ids;
my $id = 'gb|EU011641.1|';
push @ids, $id;
my $factory = Bio::DB::EUtilities->new(
    -eutil      => 'efetch',
    -db         => 'nucleotide',
    -rettype    => 'genbank',
    -complexity => 1,
    -id         => \@ids);

my $file = "test.gb";
$factory->get_Response(-file => $file);

works!

It is also a good idea to add the -email parameter so that GenBank might chastise me via email, rather than banning my IP, if I try to send more than 100 requests in a series outside of the acceptable 9PM-5AM Eastern Time hours.

Phillip

From peter at maubp.freeserve.co.uk Wed Mar 24 10:08:26 2010
From: peter at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:08:26 +0000
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To:
References:
Message-ID: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>

Hi,

This is probably of interest to all the Bio* projects offering access to the NCBI Entrez utilities. See forwarded message below.

I *think* the new guidelines basically say that the email & tool parameters are optional BUT if your IP address ever gets banned for excessive use you then have to register an email & tool combination.

Regarding the email address, the NCBI say to use the email of the developer (not the end user).
However, they do not distinguish between the developers of a library (like us), and the developers of an application or script using a library (who may also be the end user).

Currently we (Biopython) and I think BioPerl ask developers using our libraries to populate the email address themselves. I *think* this is still the right action.

Peter

---------- Forwarded message ----------
From:
Date: Wed, Mar 24, 2010 at 1:53 PM
Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy
To: NLM/NCBI List utilities-announce

New E-utility documentation now on the NCBI Bookshelf

The Entrez Programming Utilities (E-Utilities) Help documentation has been added to the NCBI Bookshelf, and so is now fully integrated with the Entrez search and retrieval system as a part of the Bookshelf database. This help document has been divided into chapters for better organization and includes several new sample Perl scripts. At present this book covers the standard URL interface for the E-utilities; material about the SOAP interface will be added soon and is still available at the same URL: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html.

Revised E-utility usage policy

In December, 2009 NCBI announced a change to the usage policy for the E-utilities that would require all requests to contain non-null values for both the &email and &tool parameters. After several consultations with our users and developers, we have decided to revise this policy change, and the revised policy is described in detail at the following link:

http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen

Please let us know if you have any questions or concerns about this policy change.

Thank you,

The E-Utilities Team
NIH/NLM/NCBI
eutilities at ncbi.nlm.nih.gov.
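To make the parameters mentioned in the messages above concrete, here is a rough sketch (plain Perl, no BioPerl) of how email, tool and complexity end up as query parameters on an efetch request. Assumptions: the base URL is the current public efetch endpoint, 'gb' is used as the rettype, and the helper names, the tool name and the email address are placeholders invented for the example. It only builds and prints the URL; it does not contact NCBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
# Good enough for the ASCII values used here.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Turn a hash of efetch parameters into a request URL.
# Keys are sorted so the result is reproducible.
sub efetch_url {
    my (%param) = @_;
    my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    my @pairs = map { $_ . '=' . uri_escape_simple($param{$_}) } sort keys %param;
    return $base . '?' . join('&', @pairs);
}

my $url = efetch_url(
    db         => 'nucleotide',
    id         => 'EU011641.1',       # accession from Phillip's message
    rettype    => 'gb',               # assumption: GenBank flat file
    complexity => 1,                  # only the requested sequence, not the whole set
    tool       => 'my_script',        # placeholder: name of your own script
    email      => 'you@example.org',  # placeholder: the script author's address
);
print "$url\n";
```

With Bio::DB::EUtilities the corresponding values go in as named arguments to new(), for example -complexity and -email as used elsewhere in these messages, so a script author can set the email once in their own code rather than relying on a library default.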
_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From joseguillin at hotmail.com Tue Mar 23 13:30:44 2010
From: joseguillin at hotmail.com (Jose .)
Date: Tue, 23 Mar 2010 17:30:44 +0000
Subject: [Bioperl-l] Phylo/Phylip/Consense
Message-ID:

Hello,

I'm trying to use Phylo/Phylip/Consense, but I get the following message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: SeqBoot did not create files correctly (/var/folders/+s/+srMEKriEiWM+Q7Qleiti++++TI/-Tmp-/v3no1dYNqE/outfile)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::_run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:389
STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:339
STACK: INDELVOLUTION_5.1consensus.pl:492
-----------------------------------------------------------

My code is a modification of the code I found at
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Phylo/Phylip/Consense.pm

use Bio::AlignIO;  # used directly below, so load it explicitly
use Bio::Tools::Run::Phylo::Phylip::Consense;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;
use Bio::Tools::Run::Phylo::Phylip::ProtDist;
use Bio::Tools::Run::Phylo::Phylip::Neighbor;
use Bio::Tools::Run::Phylo::Phylip::DrawTree;

my $aio = Bio::AlignIO->new(-file => 'yeah.clustalw', -format => 'clustalw');
my $aln = $aio->next_aln;
my ($aln_safe, $ref_name) = $aln->set_displayname_safe();

# next use seqboot to generate multiple alignments
my @params = ('datatype' => 'SEQUENCE', 'replicates' => 10);
my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params);
my $aln_ref=
$seqboot_factory->run($aln); #my $aln_ref= $seqboot_factory->run($aln_safe); #next build distance matrices and construct trees my $pd_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(); my $ne_factory = Bio::Tools::Run::Phylo::Phylip::Neighbor->new(); my @tree; foreach my $a (@{$aln_ref}){ my $mat = $pd_factory->create_distance_matrix($a); push @tree, $ne_factory->create_tree($mat); } #now use consense to get a final tree my $con_factory = Bio::Tools::Run::Phylo::Phylip::Consense->new(); #you may set outgroup either by the number representing the order in #which species are entered or by the name of the species $con_factory->outgroup(1); my $tree = $con_factory->run(\@tree); # Restore original sequence names, after ALL phylip runs: my @nodes = $tree->get_nodes(); foreach my $nd (@nodes){ $nd->id($ref_name->{$nd->id_output}) if $nd->is_Leaf; } #now draw the tree my $draw_factory = Bio::Tools::Run::Phylo::Phylip::DrawTree->new(); my $image_filename = $draw_factory->draw_tree($tree); And my yeah.clustalw file is OK: CLUSTAL W(1.81) multiple sequence alignment A/1-474 G---CGGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C B/1-452 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C C/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C D/1-476 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGA-------------TC-CTA----C E/1-439 G---CCGTGGGAGA------TGAGGAACCTGAGGTAGTCC-----TATATCTCTAGCGGC F/1-434 G---CCGTGGGAGA------TGAGGAACCCGAGG---TCC-----TATATCTCTAGCGGC G/1-462 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTA---------------TCTAGCGGC H/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTAGTCC--------ATCTCTAGCGGC I/1-462 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C J/1-447 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C K/1-448 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCGA----C L/1-431 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCTA----C M/1-432 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCGGTATTATATCTCTA----C N/1-422 
G---CC------GAGCAACATGAGGAAC---AGGTTGTC---TATTATATCTCTA----C O/1-441 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C P/1-446 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C * * ** ******** *** * * * A/1-474 GCATCGCGGCCCTTGTC-GCTCCCACCCA--CCATC---GACGGC-ACA--TTTGCTTGT B/1-452 GCAT----------GTC-GCTC---------CCATCGCTGACGGC-ACATCTTTG---GT C/1-466 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTGT D/1-476 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTG- E/1-439 GCA-CGCGGCCCT--TC-GCTT---CCCATCCCATCGCTGACGGC-ACATCT----TTGT F/1-434 GCA-CGCGGCCCT--TCCGCTT---CCCATCCCATCGCTTACGGC-ACATCTTTGCTTGT G/1-462 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGTC-ACATCTTTG-TTGT H/1-466 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGGC-ACATCTTTGCTTGT I/1-462 GCAT-CCGGCCCTTGTC-GCTCCCA------CCATCGCTGACGGC-ACAT--TTGCTTGT J/1-447 GC------GCCCTTGTC-GCTCCCA---------TCGCTGACGGC-ACATCTTTGCTTGT K/1-448 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT L/1-431 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT M/1-432 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGAC-GC-ACATC-TTGCTTGT N/1-422 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGACAGCAACATCTTTGCTTGT O/1-441 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT P/1-446 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT ** ** *** ** ** * * A/1-474 ACGAGATTGCTTTCACACTA-TCTATTGTTCGGGTACCGAGAGTCGGCGGTGAATACATC B/1-452 ACGAGATTGCGTTCACACTA-TCCATTGTTCGGGTACCGAGAGTC-GCGGTGAATACATC C/1-466 ACGTG--TGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATG D/1-476 -CGTGATTGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATC E/1-439 ACGTGATTGCG----CA--AATCCATTGT---GGTACCGAGAGTCGGCGGTGAACT---C F/1-434 ACGTGATTGCG----CA--AATCCATTGTTCGGGTACCGAGAGTCG-----GAACT---C G/1-462 ACGT----GCGTTCCCA--AATCCATTGTTCGGGTACCGAGAGTCGGCGGTGAACT---C H/1-466 ACGT-------TTCCCA--AATCCAT---TCGGGTACCGAGAGTCGGCGGTGAACT---C I/1-462 ACGTGATTGC--TCCCACCAATCCAT-GTTCGGGTACCGAGAGTCGGCGGTGAACTCATC J/1-447 
ACGTGATTGC--TCCCACTAATCCAT-GTTCGGGTACCGA-----------GAACTCATC K/1-448 ACGTGATTGC--TCCCACTAATCCACTG--------CCGAGAGTCGGCGGTG---CCATC L/1-431 ACGTGATTGC--TC------ATC--TTGTTCGGGTACCGA-----GGCGGTGAACTCATC M/1-432 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC N/1-422 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC O/1-441 ACGTGATTGC--TCCCACTAATCCAT--TTCGGGTACCGAGAGTCGGCGGTGAACTCATC P/1-446 ACGTGATTGC--TCCCACTAATCCATTG--CGGGTACCGAGAGTCGGCGGTGAACTCATC ** ** * * * A/1-474 TCCGGAG--AAGTGTGCTAACCACAGTG--GAACGTATAATGCTGATCCCGCTTGTTT-- B/1-452 TCCGGAG--AA--GTGCTAACCACAGTG--GAACGTATAATGCTGAT-CCGCTT-TTT-- C/1-466 TCCGGAG--AAGTGTGCTAACCACAGTG--GAAAGTATAATGCT-----------TTT-- D/1-476 TCCGGAG--AAGTGT---AACCACAGTG--GAAAGTATAATGCTGATCCCGCTTGTTT-- E/1-439 TCCGG-----AGTGTGG-AACCACAGTG--GAACGTATAATGC--ATCTCGCGTGTTT-- F/1-434 TCCGG-----AGTGTGGTAACCACAGTG--GAACGTATAATGC--ATCCCGCGTGTTT-- G/1-462 TCCGGAG--AAGTGTGGTAACCACAGTG--GAACGTATAATGC--ATC--GCGTGTTT-- H/1-466 TCCGGAG--AAGTGTGGTAACCACAGT----AACGTAT-ATGC--ATCCCGCGTGTTT-- I/1-462 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC J/1-447 TCGGGAG--AAGTGTGCTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC K/1-448 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC L/1-431 TCCGGAG--AAGTGTG----CCACAGTGCCGAAC--ATAATGC--ATC--GCGTGTTTGC M/1-432 TCCGGAGGAAAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC N/1-422 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC O/1-441 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC P/1-446 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC ** ** * ** ******* ** ** A/1-474 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT B/1-452 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----AGGTAC-CCGAGGACAACTAACAGT C/1-466 --CTGTACCTAAAGTTCACCGGGTAGAGCCTCGTCATGTAC-CCG-----AACTAACAGT D/1-476 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT E/1-439 --CCGTACCTAAAGTT------GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT F/1-434 
--CCGTACCTAAAGTT-----GGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT G/1-462 --CCGTACCTAAAGTTCTCC--GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT H/1-466 --CCGTACCTAAAGTTCACCGGGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT I/1-462 GATCGTACCTAAAGTTCACC--------CC-----A-------CGAG----ACTAACAG- J/1-447 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAG- K/1-448 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT L/1-431 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT M/1-432 GACCGTACCT-----T-ACCG-GTAGCGCC-----ATGTACACCGAGC---ACTA----T N/1-422 GACCGTACCT-----TCACCG-GTAGTGCC-----ATGTACACCGAGC---ACTAACAGT O/1-441 GACCGTACCT-----TCACCG-GTAGCGCC-----ATGTACACCGAGC---ACTAACAGT P/1-446 GACCGTACCT-----TCACCG-GTAGCGCC-----ATG---ACCGAGC---ACTAACAGT ****** * ** * ** **** A/1-474 GATCCTCA----TCTAAGCGCCGCTTCAGGAC----ATTGCCACGTCTACATCG------ B/1-452 GATCCTCA----TTTAAGCGCCGCTTCAGGCC----ATTGCCACGTCTACATCG------ C/1-466 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCAT D/1-476 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCCT E/1-439 GATCCTCA----TTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT F/1-434 GATCCTCA----TTTAAGCGCCGC---AGGACTTTTATTGCCACGTCTACATCGTTTCAT G/1-462 GATCCTCACAATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT H/1-466 GATCCTC-CCATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT I/1-462 ---CCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATC---TCAT J/1-447 ---CCTCA----T-TAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATCGTTTCAT K/1-448 GATCCTCA----TTTAAGCGCCGCTGCAGG-------TTGCCACGTCTACATCGTTTCAT L/1-431 GATCCTCA----TTTAAGCGCCGCTGC----------TTGCCACGTCTACATCGTTTCAT M/1-432 GATC--CA----TTTAAGCGCCGCTGCAGG--------TGCCACGTCTACATCGTTTCAT N/1-422 GATC--CA----TTTAAGCGCCGCTGCAGGAA----ATTGCCACGTCTACATCGTTTCAT O/1-441 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTA---- P/1-446 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTTTCA- * * * ********** * ** ********* A/1-474 -CATCTACTCTT--AGGCAGCAACAATTTGTCTCGTTCGACGTACAG--CGAAC--ATGT B/1-452 
-CATCTACTCTT--AGGCAGCAACAATT-GTCTCGTTCGATGTACAG--CGAAC--ATGT C/1-466 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- D/1-476 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- E/1-439 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAACATATGT F/1-434 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAA------T G/1-462 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT H/1-466 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT I/1-462 TCACCTACTTTT--AGGGAGCAACAATCTGTATCC---G--GTACAGACCGAACATAGGA J/1-447 TC----AC-TTT--AGGGAGCAACAATCTGTATCC---G--GTAC---CCGAACATAGGT K/1-448 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT L/1-431 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT M/1-432 TCATTTACT-----AGGCAGCAACAATCTGTATC--------TATAGACCGAGCATATGT N/1-422 TCATCTACT-----AGGCAGCAACAATCTGTATCC---G--GTATAGACCAAGCATATGT O/1-441 ------ACTTTT--AGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT P/1-446 ------ACTTTTTGAGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT ** ** ***** ** ** * * A/1-474 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- B/1-452 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- C/1-466 TGGGCGTAAGACCAAAGTTGAT--CGTTGG---TATTCGACCCAATCAAGTCGCG----- D/1-476 TGGGCGTAAGACCAAAGTTGAT--CGTGGGCCTTATTCGACCCAATCAATTCGCG---A- E/1-439 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- F/1-434 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- G/1-462 T--GCGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- H/1-466 T--GAGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- I/1-462 TGTGCTTAAGACCAAAGTT--TATCGTT------ATATGACCCAAGCAATTCGCGGATA- J/1-447 -GTGCTTAAGACCAAAGTT--TATCGTT------ACATGACCCAAGCAATTCGCGGATA- K/1-448 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGCGGATAC L/1-431 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGC-GATA- M/1-432 TGGGCGTAAGACCAAAGTT--TATCGTTGGCTTT----GACCCAAGCAAT--GC------ N/1-422 
TGGGGGTAAGACCAA-------------GGCTTT----GACCCAAGCAAT--GC------ O/1-441 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA- P/1-446 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA- ******** **** *** ** A/1-474 -A--AT-------TTATTCATTATTACCACTGATCAC--CCTG-CACCTATGCGGTTT-- B/1-452 -A--ATCCCGTCTTTATTC------ACCACTGATCAC--CCTG-CAC--ATGCGGTTT-- C/1-466 -----TCCCGTCTTTATTCATTATAACCACTGATCAC--CCTGGCAC--ATGCGCTTT-- D/1-476 -A--ATCCCGTCTTTATTCATTATAACCACTGATCACGACCTGGCAC--ATGCGCTAT-- E/1-439 -A---TCCCGTCTTTATT--TTTTTAGC-CTGATCTC--CCTGGCAC--AT--------- F/1-434 -A---TCCCGTCTTTATTCATTTTTACC-CTGATCTC--C---------AT--------- G/1-462 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA-- H/1-466 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA-- I/1-462 -AGGATCCTGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAT--ATGCGGTTTGC J/1-447 -AGGATCCCGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC K/1-448 AAGGATCCCGT-----GTCATTATAACC-CTGATCAC--ACTGGCAC--ATGCGGTTTGC L/1-431 -AGGATCCCGT-----TTCATTAT--CC-CTG-TCAC--CCTGGCAC--ATGCGGTTTGC M/1-432 --GGATCCCGT--TTATTCATTAAAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC N/1-422 --GGATCCCGT--TTATTCATTATAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC O/1-441 -ATGATCCCGT--TTATTCATTATAACC-CT---CAC--CCTGGCAC--ATGCGGTTTGC P/1-446 -AGGATCCCGT--TTATTCATTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC * * * ** * ** A/1-474 ACTTCGATGCC B/1-452 ACTTCGATGCC C/1-466 ACTTCGATG-- D/1-476 ACTTCGATGCC E/1-439 -CTTCGATGCC F/1-434 -CTTCGATGCC G/1-462 ACTTCGATG-- H/1-466 ACTTCGATGCC I/1-462 --TTCGATGCC J/1-447 ACTTCGATGCC K/1-448 ACTTCGATG-- L/1-431 ACTTCGATG-- M/1-432 ACTTCGATGCC N/1-422 ACTTCGATGCC O/1-441 ACTTCG-TGCC P/1-446 ACTTCG-TGCC **** ** I have tried different things, but I don't really know why do I have this problem... Does anyone knows? Thank you very much in advance, Jose G. _________________________________________________________________ ?Quieres saber qu? PC eres? ?Desc?brelo aqu?! 
http://www.quepceres.com/ From cjfields at illinois.edu Wed Mar 24 10:37:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 09:37:13 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> Message-ID: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> On Mar 24, 2010, at 9:08 AM, Peter wrote: > Hi, > > This is probably of interest to all the Bio* projects offering access > to the NCBI > Entrez utilities. See forwarded message below. > > I *think* the new guidelines basically say that the email & tool parameters are > optional BUT if your IP address ever gets banned for excessive use you then > have to register an email & tool combination. > > Regarding the email address, the NCBI say to use the email of the developer > (not the end user). However, they do not distinguish between the developers > of a library (like us), and the developers of an application or script using a > library (who may also be the end user). > > Currently we (Biopython) and I think BioPerl ask developers using our libraries > to populate the email address themselves. I *think* this is still the > right action. > > Peter Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I think with the SOAP-based ones as well). We're providing a specific set of tools for user to write up their own applications end applications. I can try contacting them regarding this to get an official response to clarify this somewhat. Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a default, but always leave the email blank and issue a warning if it isn't set. We could just as easily leave both blank and issue warnings for both. 
chris > ---------- Forwarded message ---------- > From: > Date: Wed, Mar 24, 2010 at 1:53 PM > Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy > To: NLM/NCBI List utilities-announce > > > New E-utility documentation now on the NCBI Bookshelf > > The Entrez Programming Utilities (E-Utilities) Help documentation has > been added to the NCBI Bookshelf, and so is now fully integrated with > the Entrez search and retrieval system as a part of the Bookshelf > database. This help document has been divided into chapters for better > organization and includes several new sample Perl scripts. At present > this book covers the standard URL interface for the E-utilties; > material about the SOAP interface will be added soon and is still > available at the same URL: > http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html. > > > > Revised E-utility usage policy > > In December, 2009 NCBI announced a change to the usage policy for the > E-utilities that would require all requests to contain non-null values > for both the &email and &tool parameters. After several consultations > with our users and developers, we have decided to revise this policy > change, and the revised policy is described in detail at the following > link: > > http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen > > Please let us know if you have any questions or concerns about this > policy change. > > > > Thank you, > > The E-Utilities Team > > NIH/NLM/NCBI > > eutilities at ncbi.nlm.nih.gov. 
> > > > _______________________________________________ > Utilities-announce mailing list > http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Wed Mar 24 10:51:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Mar 2010 14:51:46 +0000 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: > > On Mar 24, 2010, at 9:08 AM, Peter wrote: > >> Hi, >> >> This is probably of interest to all the Bio* projects offering access >> to the NCBI Entrez utilities. See forwarded message below. >> >> I *think* the new guidelines basically say that the email & tool parameters are >> optional BUT if your IP address ever gets banned for excessive use you then >> have to register an email & tool combination. >> >> Regarding the email address, the NCBI say to use the email of the developer >> (not the end user). However, they do not distinguish between the developers >> of a library (like us), and the developers of an application or script using a >> library (who may also be the end user). >> >> Currently we (Biopython) and I think BioPerl ask developers using our libraries >> to populate the email address themselves. I *think* this is still the >> right action. >> >> Peter > > > Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I > think with the SOAP-based ones as well). ?We're providing a specific set of > tools for user to write up their own applications end applications. 
I can try > contacting them regarding this to get an official response to clarify this > somewhat. Please give the NCBI an email - you can CC me too if you like. > Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a > default, but always leave the email blank and issue a warning if it isn't > set. We could just as easily leave both blank and issue warnings for both. We currently leave out the email and set the tool parameter to "Biopython" by default but this can be overridden. Currently leaving out the email does cause Biopython to give a warning. Peter From pmiguel at purdue.edu Wed Mar 24 10:59:50 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Wed, 24 Mar 2010 10:59:50 -0400 Subject: [Bioperl-l] How to set "complexity" param using EUtilities In-Reply-To: <4BAA1883.3010203@purdue.edu> References: <4BAA1883.3010203@purdue.edu> Message-ID: <4BAA28E6.4090907@purdue.edu> Sorry, I got that backwards. The default is "0", apparently. But to get entrez-like performance you want "complexity" to be set to "1". Phillip Phillip San Miguel wrote: > Just a little FYI that might help someone using GenBank efetch (here > with bioperl EUtilities) and, contrary to expectation, retrieving a > bunch of accessions (or GIs) when that single accession is what is > wanted. The trick is to change the "complexity" parameter from its > apparent default of "1" to "0". > > Actually, this parameter might be worth adding to the HOWTO because it > causes the EUtilities efetch to perform similar to a normal Entrez > search. Which, to me, would be the expected behavior. > > Details below. > > Some accessions/GIs appear to be embedded in bundles of related > sequences.
> Here is an example:
>
> gi|158819346|gb|EU011641.1|
>
>
> If I search Entrez Nucleotide
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar
>
> with either "158819346" (the GI) or "EU011641.1", I get a single
> record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal
> RNA gene, partial sequence". This is what I want.
>
> If I use the following code derived from the Eutils HOWTO:
>
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> my @ids;
> my $id = 'gb|EU011641.1|';
> push @ids, $id;
> my $factory = Bio::DB::EUtilities->new(
>     -eutil   => 'efetch',
>     -db      => 'nucleotide',
>     -rettype => 'genbank',
>     -id      => \@ids);
>
> my $file = "test.gb";
> $factory->get_Response(-file => $file);
>
> I get a bundle of accessions: EU011584-EU011663.
> Same result using the GI number instead.
>
> From reading:
>
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam
>
>
> it looks like I would get what I want were I to set the efetch
> "complexity" parameter to "1".
>
> But how do I set that parameter? Below is how I did it. Not the most
> efficient path, but did not take that long to traverse...
>
> The HowTo does not mention it. I usually look to the Deobfuscator:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> to help me when I want some documentation for a method. But this is a
> parameter not a class. What class sets this parameter? Not sure. So I
> googled:
>
> complexity eutil site:bioperl.org
>
> The top ranked hit is actually to the deprecated 1.5.2 version of
> EUtilities. But the 2nd hit is to the (auto-generated?) email posted
> to the bioperl-guts email list by Chris Fields upon his commit of the
> new EUtilities overhaul:
>
> http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html
>
>
> From here it looks like the obvious way to set the parameter would be
> possible.
> And indeed:
>
>
> use Bio::DB::EUtilities;
> use Bio::SeqIO;
> my @ids;
> my $id ='gb|EU011641.1|';
> push @ids ,$id;
> my $factory = Bio::DB::EUtilities->new(
>     -eutil => 'efetch',
>     -db => 'nucleotide',
>     -rettype => 'genbank',
>     -complexity =>1,
>     -id => \@ids);
>
> my $file = "test.gb";
> $factory->get_Response(-file => $file);
>
> works!
>
> Also a good idea to add -email parameter so that Genbank might
> chastise me via email, rather than banning my IP, if I try to send
> more than 100 requests in a series outside of the acceptable 9PM-5AM
> Eastern Time hours.
>
> Phillip
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From hlapp at drycafe.net Wed Mar 24 11:27:37 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 24 Mar 2010 11:27:37 -0400
Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>
Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net>

On Mar 24, 2010, at 10:51 AM, Peter wrote:

> Please give the NCBI an email - you can CC me too if you like.

Can't this be the developers' mailing list (or lists, the appropriate one for each toolkit)? We can even whitelist all NCBI sender addresses so they can easily email us if there are issues.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed Mar 24 11:44:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 10:44:21 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <338BDDD8-2A66-4086-BFB7-35EC8F8F0D66@illinois.edu> On Mar 24, 2010, at 9:51 AM, Peter wrote: > On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: >> >> On Mar 24, 2010, at 9:08 AM, Peter wrote: >> >>> Hi, >>> >>> This is probably of interest to all the Bio* projects offering access >>> to the NCBI Entrez utilities. See forwarded message below. >>> >>> I *think* the new guidelines basically say that the email & tool parameters are >>> optional BUT if your IP address ever gets banned for excessive use you then >>> have to register an email & tool combination. >>> >>> Regarding the email address, the NCBI say to use the email of the developer >>> (not the end user). However, they do not distinguish between the developers >>> of a library (like us), and the developers of an application or script using a >>> library (who may also be the end user). >>> >>> Currently we (Biopython) and I think BioPerl ask developers using our libraries >>> to populate the email address themselves. I *think* this is still the >>> right action. >>> >>> Peter >> >> >> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I >> think with the SOAP-based ones as well). We're providing a specific set of >> tools for user to write up their own applications end applications. 
I can try >> contacting them regarding this to get an official response to clarify this >> somewhat. > > Please give the NCBI an email - you can CC me too if you like. Sent, have cc'd the open-bio list. Don't want to cross-post this too much, so I think we should move the discussion there. >> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a >> default, but always leave the email blank and issue a warning if it isn't >> set. We could just as easily leave both blank and issue warnings for both. > > We currently leave out the email and set the tool parameter to "Biopython" > by default but this can be overridden. Currently leaving out the email does > cause Biopython to give a warning. > > Peter We follow the same, then (down to the warning). This is mentioned in my post to them, I'll wait to see what they say. My concern is the wording of the new rules. Each tool and email must be registered with them if an IP is blocked. Does this mean each tool is assigned one specific email? And an IP that is blocked can register it to be allowed back into the fold? With that in mind, should we register each of our toolkits with them? Probably not a bad thing (it might help us as devs to get an idea of use), but then if one user abuses the rules will their actions affect all toolkit users? Is this all done on a per-IP basis, per-toolkit basis, etc? Unfortunately, at least to me, none of this is made very clear, so I'm hoping there is some clarification from their end. chris From maj at fortinbras.us Wed Mar 24 12:37:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 24 Mar 2010 12:37:56 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Wed Mar 24 13:43:48 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 10:43:48 -0700 Subject: [Bioperl-l] Codeml runtime error Message-ID: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Hi Bioperl gurus, I'm trying to run PAML v4.3b on a series of orthologs, specifically by implementing codeml to detect signatures of positive selection between all orthologous pairs. In some of my files, I notice that I'm getting an EOF error that causes codeml to break. 
The weirdness is that I only get the EOF error under one hypothesis model (the null) and never on the alternative hypothesis model - even when run on the same initial data. I've managed to track the problem down to the way BioPerl formats the temporary phylip alignment file that is fed into codeml. Apparently, PAML requires there to be at least two spaces between the sequence identifier and the start of the sequence. However, for some files - and I don't know if this is random or not - the temporary alignment file only contains one space after the sequence identifier. If I edit the phylip file accordingly and rerun codeml, the software compiles and processes the data correctly. Has anyone run into this problem before and has someone figured a work around using the kaks_factory in Bio::Tools::Run::Phylo::PAML::Codeml.pm? If this is something others have not seen, I'll submit a full bug report. Best regards, Tom From Russell.Smithies at agresearch.co.nz Wed Mar 24 15:53:45 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 25 Mar 2010 08:53:45 +1300 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. 
After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Thursday, 25 March 2010 5:38 a.m. > To: Hilmar Lapp; Peter > Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- > Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris > Fields > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > I think this is a great idea--- MAJ > ----- Original Message ----- > From: "Hilmar Lapp" > To: "Peter" > Cc: ; "Biopython-Dev Mailing List" > ; ; "bioperl- > l > list" ; "Chris Fields" > ; > > Sent: Wednesday, March 24, 2010 11:27 AM > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > > > > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > > > >> Please give the NCBI an email - you can CC me too if you like. > > > > > > Can't this be the developers' mailing list (or lists, the appropriate > one for > > each toolkit)? We can even whitelist all NCBI sender addresses so they > can > > easily email us if there are issues. 
> > > > -hilmar > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > > =========================================================== > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Wed Mar 24 16:01:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 15:01:50 -0500 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> Message-ID: Russell, The problem we're possibly running into now is that (acc. 
to the documents) we will likely have to define both the tool and email (or neither), as the tool and email are registered together. There are advantages and disadvantages to both scenarios, one that you point out. ATM I'm awaiting back word from NCBI for clarification (I popped 'em an email about this earlier) and will hopefully post their response here if they send one, then we'll hash out what needs to be done. And agreed about Scott, he's always been helpful. chris On Mar 24, 2010, at 2:53 PM, Smithies, Russell wrote: > The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. > I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. > > After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Thursday, 25 March 2010 5:38 a.m. 
>> To: Hilmar Lapp; Peter >> Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- >> Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris >> Fields >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> I think this is a great idea--- MAJ >> ----- Original Message ----- >> From: "Hilmar Lapp" >> To: "Peter" >> Cc: ; "Biopython-Dev Mailing List" >> ; ; "bioperl- >> l >> list" ; "Chris Fields" >> ; >> >> Sent: Wednesday, March 24, 2010 11:27 AM >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> >>> >>> On Mar 24, 2010, at 10:51 AM, Peter wrote: >>> >>>> Please give the NCBI an email - you can CC me too if you like. >>> >>> >>> Can't this be the developers' mailing list (or lists, the appropriate >> one for >>> each toolkit)? We can even whitelist all NCBI sender addresses so they >> can >>> easily email us if there are issues. >>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. 
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed Mar 24 15:53:48 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 24 Mar 2010 12:53:48 -0700 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com><5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <1A4207F8295607498283FE9E93B775B406A418BB@EX02.asurite.ad.asu.edu> Well, the problem with NCBI using the address to email about problem users is that the lists can't really identify the user since it isn't a specific program, but someone's specific implementation utilizing the toolkit that is causing problems. So, not sure how this would help with the problem of dealing with trouble users. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. 
Jensen Sent: Wednesday, March 24, 2010 9:38 AM To: Hilmar Lapp; Peter Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython-Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris Fields Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Mar 24 16:38:31 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:38:31 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Message-ID: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Hi Tom, Thanks for your note. From your description, it sounds like a bug report is in order. 
If you could include a little test case so we can reproduce it, that would be great. Dave From thomas.sharpton at gmail.com Wed Mar 24 16:40:55 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 13:40:55 -0700 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Message-ID: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Hi Dave, Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. If there's anyway I can help, do let me know. Best, Tom On Mar 24, 2010, at 1:38 PM, Dave Messina wrote: > Hi Tom, > > Thanks for your note. From your description, it sounds like a bug > report is in order. If you could include a little test case so we > can reproduce it, that would be great. > > > Dave > From David.Messina at sbc.su.se Wed Mar 24 16:52:59 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:52:59 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Message-ID: <4BEA53ED-87B6-4EE0-B5E6-AE304A335AA8@sbc.su.se> > Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. Terrific, thanks! > If there's anyway I can help, do let me know. Oh don't worry...I will. 
:) D From cjfields at illinois.edu Thu Mar 25 00:50:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 23:50:11 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] Bio::DB::SeqFeature spliced_seq() In-Reply-To: <4BA7D267.6050704@bioperl.org> References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> <4BA7D267.6050704@bioperl.org> Message-ID: <46D94C25-4E2D-4E64-A696-1C9D3F785EEB@illinois.edu> Yes, that's essentially what I have working now. I suppose the best way to do this is have an optional type supplied and splice only those, checking the subfeatures to ensure that type exists. I'll check against SeqFeatureI's spliced_seq() to see if there are any API issues. chris On Mar 22, 2010, at 3:26 PM, Jason Stajich wrote: > Yes it needs a special case I guess - since spliced_seq should work, > however ... The only problem is that if both exons and CDS are > sub-features you have to be smart enough to not grab both... > > So I have just relied on specialized dumping scripts for gff3_to_cds for > my own needs (i.e. > http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl > ). > But you might also see what the Gbrowse plugin dumpers do. > > -jason > Chris Fields wrote, On 3/22/10 11:56 AM: >> I have just noticed that spliced_seq() is borked with >> Bio::DB::SeqFeature and am thinking about implementing it. Or is >> similar functionality already implemented elsewhere? >> >> Currently, it is calling entire_seq(), which I plan on avoiding simply >> to prevent sucking in the entire sequence into memory. 
>> This is currently what happens:
>>
>>
>> ---------------------------
>>
>> my $it = $store->get_seq_stream(-type => 'mRNA');
>>
>> my $ct = 0;
>> while (my $sf = $it->next_seq) {
>>     my $seq = $sf->spliced_seq; # dies with exception
>> }
>>
>> ---------------------------
>>
>> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
>> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented
>> by package Bio::DB::SeqFeature.
>> This is not your fault - author of Bio::DB::SeqFeature should be blamed!
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
>> STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
>> STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
>> STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
>> STACK: beestore.pl:17
>> ----------------------------------------------------------------
>>
>>
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

From lpritc at scri.ac.uk Thu Mar 25 07:20:01 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 25 Mar 2010 11:20:01 +0000
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com> Message-ID: Hi, Nathan's been in touch to ask exactly what the command-line was that I was using, and this was missing from the thread so, for info: bp_genbank2gff3.pl --noCDS NC_000913.gbk And bp_genbank2gff3.pl --CDS NC_000913.gbk With occasional absolute paths to the input sequence. L. On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote: > Hi Leighton, > > I wonder if this is a change stemming from Nathan's work on this > script. Nathan? > > Scott > -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). 
______________________________________________________ From aradwen at gmail.com Fri Mar 26 07:29:16 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 12:29:16 +0100 Subject: [Bioperl-l] aacomp.pl problem Message-ID: Hello, I'm facing a little problem with aacomp.pl in scripts examples that comes with Bioperl Here is the error message Can't locate object method "valid_aa" via package "Bio::Tools::CodonTable" at aacomp.pl line 16. Any Idea ? Thx Radwen From David.Messina at sbc.su.se Fri Mar 26 08:51:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 13:51:11 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: Message-ID: Hi Radwen, The latest version of aacomp (from subversion) worked fine for me. That version has this line near the top of the script: # $Id: aacomp.PLS 15088 2008-12-04 02:49:09Z bosborne $ If yours is different, you might try upgrading to the latest version. In fact, I'm almost certain that is the problem, since the valid_aa method is in the Bio::SeqUtils class, not Bio::Tools::CodonTable. Dave From David.Messina at sbc.su.se Fri Mar 26 10:24:25 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:24:25 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: Message-ID: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Hi, Yes, the subversion site is temporarily down. However, there are nightly builds http://www.bioperl.org/DIST/nightly_builds/ and the Github mirror http://github.com/bioperl Dave On Mar 26, 2010, at 15:20, Radwen Aniba wrote: > The subversion site is down?!!! 
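[Editor's sketch, not part of the original thread.] The aacomp.pl error above comes down to a method being called on a class that does not provide it. One way to check this on a given installation, before upgrading, is to ask each candidate class whether it can do the method. The helper name classes_providing below is hypothetical; it relies only on Perl's built-in require and the universal can method, and it assumes nothing about BioPerl beyond the two class names mentioned in the thread. What it reports depends on the BioPerl version that is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the names of the classes in @classes that
# load successfully and provide $method. Classes that are not installed
# (or fail to compile) are silently skipped.
sub classes_providing {
    my ($method, @classes) = @_;
    my @found;
    for my $class (@classes) {
        # Turn Some::Class into Some/Class.pm so require() accepts it at runtime.
        (my $file = "$class.pm") =~ s{::}{/}g;
        next unless eval { require $file; 1 };
        push @found, $class if $class->can($method);
    }
    return @found;
}

# Check the two classes discussed in the thread.
my @providers = classes_providing('valid_aa',
    'Bio::Tools::CodonTable', 'Bio::SeqUtils');
print @providers
    ? "valid_aa is provided by: @providers\n"
    : "valid_aa not found (is BioPerl installed?)\n";
```

If the class named in the error message is missing from the output while another class is listed, the script and the installed library are probably from different BioPerl versions, which is the situation Dave describes.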
From David.Messina at sbc.su.se Fri Mar 26 10:35:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:35:29 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Message-ID: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Radwen, Please be sure to 'reply all' so that everyone on the list can follow this discussion. > Sorry to ask beginners questions but how to configure these mirrors to upgrade ? > > I'm using ubuntu Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Dave From cjfields at illinois.edu Fri Mar 26 10:40:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 09:40:20 -0500 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Message-ID: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > Radwen, > > Please be sure to 'reply all' so that everyone on the list can follow this discussion. > > >> Sorry to ask beginners questions but how to configure these mirrors to upgrade ? >> >> I'm using ubuntu > > > > > Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > Dave You can also get tarballs of bioperl-live from the github mirror (via the 'Download Source' link): http://github.com/bioperl/bioperl-live These are updated every 15 minutes. 
chris From aradwen at gmail.com Fri Mar 26 10:41:51 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 15:41:51 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> Message-ID: Thank you 2010/3/26 Chris Fields > > On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > > > Radwen, > > > > Please be sure to 'reply all' so that everyone on the list can follow > this discussion. > > > > > >> Sorry to ask beginners questions but how to configure these mirrors to > upgrade ? > >> > >> I'm using ubuntu > > > > > > > > > > Step 1: download the bioperl-live tarball from, for example, > http://www.bioperl.org/DIST/nightly_builds/ > > > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > > > > > > Dave > > > You can also get tarballs of bioperl-live from the github mirror (via the > 'Download Source' link): > > http://github.com/bioperl/bioperl-live > > These are updated every 15 minutes. > > chris From maj at fortinbras.us Fri Mar 26 10:34:49 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 10:34:49 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <4BABB825.6010803@cse.msu.edu> References: <4BABB825.6010803@cse.msu.edu> Message-ID: <249674A825C14BB3801C6184DEEA7A82@NewLife> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with you on developing your application. I'm including the bioperl mailing list on this post; we'll continue to have this conversation on the list so that the helpful, friendly, knowledgeable, compassionate membership can participate. 
WrapperMaker code is currently available in svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run for an example of how Bio::Tools::Run::WrapperBase and CommandExts are used (er, by me...). cheers MAJ ----- Original Message ----- From: "Alok" To: Sent: Thursday, March 25, 2010 3:23 PM Subject: BioPerl Google SOC project > Hello Mark, > > My name is Alok Watve and I am currently pursuing PhD in Computer > Science at Michigan State University. I was going through the BioPerl > Wiki for Google SOC projects. I have good experience with Perl and was > wondering if I could work on the project "Perl Run Wrappers". > > Prior to joining MSU, I was working with D E Shaw India Software Pvt. > Ltd. My work was involved in writing Java programs and their perl > wrappers. We used perl scripts to fire java programs with all the > correct parameters. So I think I have some idea about what wrappers are. > However, I have not used BioPerl and may take some time to get familiar > with the structure. I am fairly confident that I will be able to do this. > > During my work here at MSU. I use perl a lot for doing basic text > analysis for my projects. Although I rarely use OO features of perl, I > have used them in past and never had any problems with it. I also > believe in writing well-documented and user/developer friendly code > (With comments, command line options for help/documentation). I have > attached a simple script I wrote for my project as an example. I have > also attached my resume for your consideration. > > Please let me know if you think that I am an appropriate candidate and > whether I should go ahead with submitting an application with BioPerl as > my Mentor Organization. 
>
> Thanks a lot,
> Alok
> www.cse.msu.edu/~watvealo/
>

--------------------------------------------------------------------------------

> #!/usr/bin/perl
>
> =pod
>
> =head1 SYNOPSIS
>
> Script to edit existing box query files to enable random box query.
> This script inserts box size on each line corresponding to discrete
> dimension in the existing box query file. The maximum value of "box size"
> depends on the alphabet size.
>
> Example
> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile
>
> Use -perldoc for detailed help on options.
>
> =head1 OPTIONS
>
> =over
>
> =item -infile
>
> Specifies the name of the input box query file.
>
> =item -outfile
>
> Specifies the name of the output file.
>
> =item -uniform_box
>
> Specifies size of the uniform box query.
>
> =item -max_size
>
> Specifies the maximum box size for random sized box query.
>
> =item -help
>
> Displays a brief help message and exits.
>
> =item -perldoc
>
> Displays a detailed help.
>
> =back
>
> =cut
>
> use strict;
> use warnings 'all';
>
> use Getopt::Long;
> use Pod::Usage;
>
> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile,
>     'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox,
>     'help' => \my $help, 'perldoc' => \my $perldoc);
>
> if(defined($perldoc))
> {
>     pod2usage(-verbose => 2);
> }
>
> if(defined($help))
> {
>     pod2usage(-verbose => 0);
> }
>
> if(! (defined($infile) && defined($outfile) ))
> {
>     die('Please specify input, output files. Use -perldoc for more help');
> }
>
> # Some basic error checking to ensure script runs ....
> if(!(defined($uniformBox) || defined($maxSize)))
> {
>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
> }
>
> # Initialize random number generator.
> srand();
>
> # Read input file and find out lines we are interested in.
> # Then prefix the line with correct box size as defined by
> # user choice.
> open(IN, "<$infile") or die "Cannot open $infile: $!";
> open(OUT, ">$outfile") or die "Cannot open $outfile: $!";
> my $count = 0;
> while(my $line = <IN>)
> {
>     if( ($count%64) < 32 )
>     {
>         if(defined($uniformBox))
>         {
>             $line = sprintf("%d ", $uniformBox) . $line;
>         }
>         elsif(defined($maxSize))
>         {
>             # This line corresponds to the discrete dimension.
>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>         }
>     }
>     $count++;
>     print OUT $line;
> }
>
> close(OUT);
> close(IN);
>

From cjfields at illinois.edu Fri Mar 26 11:06:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:06:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
Message-ID:

Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter:

http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
http://use.perl.org/~cjfields/journal/40275

I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...).

chris

From awitney at sgul.ac.uk Fri Mar 26 11:20:36 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 26 Mar 2010 15:20:36 +0000
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
Message-ID: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>

Is the bioperl-ext package still being developed? I ask because I am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error

"The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
Please read the install the bioperl-ext package"

even though I did compile and install the Bio::Ext::Align package.

If not using the pSW module, what do other people use for this?

thanks

adam

From cjfields at illinois.edu Fri Mar 26 11:51:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:51:41 -0500
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
In-Reply-To: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>
References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>
Message-ID: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu>

It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on its own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by its author.

I know many users just use calls to outside programs, such as EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes; XS can be a bugbear.

chris

On Mar 26, 2010, at 10:20 AM, Adam Witney wrote:

> Is the bioperl-ext package still being developed? I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error
>
> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
> Please read the install the bioperl-ext package"
>
> even though i did compile and install the Bio::Ext::Align package
>
> If not using the pSW module, what do other people use for this?
>
> thanks
>
> adam
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From pmiguel at purdue.edu Fri Mar 26 11:52:17 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 26 Mar 2010 11:52:17 -0400
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
Message-ID: <4BACD831.20506@purdue.edu>

Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.

I am trying to use the code provided at:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO

and modified to request gi228534658

The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything.

Nothing is printed when I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::EUtilities;

my @ids;
push @ids, '228534658';
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

my $file = 'myseqs.gb';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

my $seqin = Bio::SeqIO->new(-file   => $file,
                            -format => 'genbank');

while (my $seq = $seqin->next_seq) {
    print "I see a sequence\n";
    print $seq->species();
}

"myseqs.gb" does have content:

Seq-entry ::= seq { id { general { db "gpid:36555" , tag str "contig49313" } , genbank { accession "EZ113652" , version 1 } , gi 228534658 } , descr { title "TSA: Zea mays contig49313, mRNA sequence." , source { genome genomic , org { taxname "Zea mays" , db { { db "taxon" , tag id 4577 } } , orgname { name binomial { genus "Zea" , species "mays" } , lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACCAD clade; Panicoideae; Andropogoneae; Zea" , gcode 1 , mgcode 1 , div "PLN" } } } , molinfo { biomol mRNA , tech tsa } , pub { pub { article { title { name "Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing." } , authors { names std { { name name { last "Vega-Arreguin" , initials "J.C." } } , { name name { last "Ibarra-Laclette" , initials "E." } } , { name name { last "Jimenez-Moraila" , initials "B."
} } , { name name { last "Martinez" , initials "O." } } , { name name { last "Vielle-Calzada" , initials "J.P." } } , { name name { last "Herrera-Estrella" , initials "L." } } , { name name { last "Herrera-Estrella" , initials "A." } } } } , from journal { title { iso-jta "BMC Genomics" , ml-jta "BMC Genomics" , issn "1471-2164" , name "BMC genomics" } , imp { date std { year 2009 , month 7 , day 6 } , volume "10" , issue "1" , pages "299" , language "ENG" , pubstatus aheadofprint , history { { pubstatus received , date std { year 2008 , month 12 , day 2 } } , { pubstatus accepted , date std { year 2009 , month 7 , day 6 } } , { pubstatus aheadofprint , date std { year 2009 , month 7 , day 6 } } , { pubstatus other , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } , { pubstatus pubmed , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } , { pubstatus medline , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } } } } , ids { pii "1471-2164-10-299" , doi "10.1186/1471-2164-10-299" , pubmed 19580677 } } , pmid 19580677 } } , pub { pub { sub { authors { names std { { name name { last "Vega-Arreguin" , first "Julio" , initials "J.C." } } , { name name { last "Ibarra-Laclette" , first "Enrique" , initials "E." } } , { name name { last "Jimenez-Moraila" , first "Beatriz" , initials "B." } } , { name name { last "Martinez" , first "Octavio" , initials "O." } } , { name name { last "Vielle-Calzada" , first "Jean" , initials "J.Philippe." } } , { name name { last "Herrera-Estrella" , first "Luis" , initials "L." } } , { name name { last "Herrera-Estrella" , first "Alfredo" , initials "A." 
} } } , affil std { affil "Laboratorio Nacional de Genomica para la Biodiversidad" , div "Cinvestav Campus Guanajuato" , city "Irapuato" , sub "Guanajuato" , country "Mexico" , street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , postal-code "36821" } } , medium other , date std { year 2009 , month 3 , day 23 } } } } , user { type str "GenomeProjectsDB" , data { { label str "ProjectID" , data int 36555 } , { label str "ParentID" , data int 0 } } } , create-date std { year 2009 , month 5 , day 5 } , update-date std { year 2009 , month 7 , day 14 } } , inst { repr raw , mol rna , length 450 , seq-data ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 760CFF0'H } } Maybe I am using the wrong format? This looks more like ASN than genbank format to me. Phillip From maj at fortinbras.us Fri Mar 26 11:37:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 11:37:56 -0400 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: References: Message-ID: <648F9E90AF07449887FD4C420AA8B00E@NewLife> and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, March 26, 2010 11:06 AM Subject: [Bioperl-l] BioPerl and the Google Summer of Code > Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via > twitter: > > http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html > http://use.perl.org/~cjfields/journal/40275 > > I'll update the BioPerl page with a couple more ideas later today (think: > Moose and/or Perl6...). 
> > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri Mar 26 12:16:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 11:16:22 -0500 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <4BACD831.20506@purdue.edu> References: <4BACD831.20506@purdue.edu> Message-ID: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. chris On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: > Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. > > I am trying to use the code provided at: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO > > and modified to request gi228534658 > > The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. 
>
> [...]
>
> Maybe I am using the wrong format? This looks more like ASN than genbank format to me.
>
> Phillip

From cjfields at illinois.edu Fri Mar 26 12:38:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 11:38:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
In-Reply-To: <648F9E90AF07449887FD4C420AA8B00E@NewLife>
References: <648F9E90AF07449887FD4C420AA8B00E@NewLife>
Message-ID: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu>

BioPerl GSoC page updated with the Moose/Modern Perl/BioPerl 6-based project:

http://www.bioperl.org/wiki/Google_Summer_of_Code#BioPerl_2.0_.28and_beyond.29

Feel free to add your name to the list of mentors if you are interested.

chris

On Mar 26, 2010, at 10:37 AM, Mark A.
Jensen wrote: > and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Friday, March 26, 2010 11:06 AM > Subject: [Bioperl-l] BioPerl and the Google Summer of Code > > >> Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter: >> >> http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html >> http://use.perl.org/~cjfields/journal/40275 >> >> I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...). >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From pmiguel at purdue.edu Fri Mar 26 13:28:09 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 26 Mar 2010 13:28:09 -0400 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> Message-ID: <4BACEEA9.2060407@purdue.edu> Ah, yes. That does the trick. Actually I have already downloaded a few thousand records in whatever that format that is returned when 'genbank' is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= seq {') Any idea what format that is and how to convert it to something SeqIO can use? If not, I can just pull them all down again by sending about 200 gi's per request. That should not offend the genbank gods... Thanks for your help, Phillip Chris Fields wrote: > Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. 
>
> Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both.
>
> chris
>
> On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote:
>
>> [...]

From bioperlanand at yahoo.com Fri Mar 26 00:40:23 2010
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 25 Mar 2010 21:40:23 -0700 (PDT)
Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities
Message-ID: <27160.94644.qm@web114211.mail.gq1.yahoo.com>

Hi everybody,

I have a list of genome project ids, and I need to gather information from a specific field and store the output in a file.

As regards what info I want: for example, for genome project id 30807
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807, I need to grab the text information that reads (this is found at the bottom of the page):

Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant.

Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis.

I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::EUtilities? If yes, what would be the fields/params?

I did try out this:
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F
to find out what information is available for genomeprj, but I am unable to get the necessary field/param for my need. Please help.

Alternatively, is there a better way to address my need other than Bio::DB::EUtilities?

Thanks in advance,
Anand

From rmb32 at cornell.edu Fri Mar 26 03:44:09 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 00:44:09 -0700
Subject: [Bioperl-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not already been subscribed to the (mentors-only) gsoc-mentors mailing list, send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin

From rmb32 at cornell.edu Fri Mar 26 12:30:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 09:30:30 -0700
Subject: [Bioperl-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 9!

Please spread it widely, we need to reach lots of students with it!


Rob Buels
OBF GSoC 2010 Admin

============================================================
*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***
============================================================

OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010

Applications due 19:00 UTC, April 9, 2010.

http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3 month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project.

The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student.
We also welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves.

TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code
Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs

From cjfields at illinois.edu Fri Mar 26 14:28:46 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 13:28:46 -0500
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <4BACEEA9.2060407@purdue.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu>
Message-ID: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu>

That format is ASN.1, and there isn't a BioPerl parser for GenBank ASN.1 format (it tends to be too cumbersome). However, there is a pure-perl-based one for the EntrezGene ASN.1 format (Bio::ASN1::EntrezGene).

chris

On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote:
> Ah, yes. That does the trick. Actually I have already downloaded a few
> thousand records in whatever that format that is returned when 'genbank'
> is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::=
> seq {') Any idea what format that is and how to convert it to something
> SeqIO can use?
>
> If not, I can just pull them all down again by sending about 200 gi's
> per request. That should not offend the genbank gods...
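
The "about 200 gi's per request" idea can be sketched in plain Perl. This is only an illustration: `batch_ids` is a hypothetical helper, not part of BioPerl, and the ids below are made-up numbers rather than real GI numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a list of ids into batches of at most
# $size ids each, preserving the original order.
sub batch_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    while (@ids) {
        # splice removes (up to) the first $size ids from @ids
        push @batches, [ splice(@ids, 0, $size) ];
    }
    return @batches;
}

# Illustration only: 450 placeholder ids, not real GI numbers.
my @gis     = (1 .. 450);
my @batches = batch_ids(200, @gis);

printf "%d batches, sizes: %s\n",
    scalar(@batches), join(', ', map { scalar @$_ } @batches);
```

Each array reference in `@batches` could then presumably be handed to the factory as `-id => $batch`, in the same way the script in this thread passes `-id => \@ids`, with one request per batch.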
> > Thanks for your help, > Phillip > > Chris Fields wrote: > > Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. > > > > Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. > > > > chris > > > > On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: > > > > > >> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. > >> > >> I am trying to use the code provided at: > >> > >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO > >> > >> and modified to request gi228534658 > >> > >> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. 
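
A script can guard against the retired 'genbank' alias mentioned in Chris's reply before it builds the request. The sketch below is an assumption-laden illustration: `normalize_rettype` is a hypothetical helper, not a BioPerl function, and the 'gp' mapping for the protein database follows what Peter describes for Biopython later in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): map the retired 'genbank'
# alias to a rettype that efetch still understands, and leave every
# other rettype untouched.
sub normalize_rettype {
    my ($rettype, $db) = @_;
    return $rettype unless lc($rettype) eq 'genbank';
    my $fixed = (defined $db && lc($db) eq 'protein') ? 'gp' : 'gb';
    warn "rettype 'genbank' is no longer an alias; using '$fixed'\n";
    return $fixed;
}

print normalize_rettype('genbank',     'nucleotide'), "\n";
print normalize_rettype('genbank',     'protein'),    "\n";
print normalize_rettype('gbwithparts', 'nucleotide'), "\n";
```

The returned value would then be used for the `-rettype` argument in place of the literal string.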
> >> > >> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: > >> > >> #!/usr/bin/perl > >> use strict; > >> use warnings; > >> > >> use Bio::SeqIO; > >> use Bio::DB::EUtilities; > >> > >> my @ids; > >> push @ids, '228534658'; > >> my $factory = Bio::DB::EUtilities->new( > >> -eutil => 'efetch', > >> -db => 'nucleotide', > >> -rettype => 'genbank', > >> -id => \@ids); > >> > >> my $file = 'myseqs.gb'; > >> > >> # dump HTTP::Response content to a file (not retained in memory) > >> $factory->get_Response(-file => $file); > >> > >> my $seqin = Bio::SeqIO->new(-file => $file, > >> -format => 'genbank'); > >> > >> while (my $seq = $seqin->next_seq) { > >> print "I see a sequence\n"; > >> print $seq->species(); > >> } > >> > >> > >> "myseqs.gb" does have content: > >> > >> Seq-entry ::= seq { > >> id { > >> general { > >> db "gpid:36555" , > >> tag > >> str "contig49313" } , > >> genbank { > >> accession "EZ113652" , > >> version 1 } , > >> gi 228534658 } , > >> descr { > >> title "TSA: Zea mays contig49313, mRNA sequence." , > >> source { > >> genome genomic , > >> org { > >> taxname "Zea mays" , > >> db { > >> { > >> db "taxon" , > >> tag > >> id 4577 } } , > >> orgname { > >> name > >> binomial { > >> genus "Zea" , > >> species "mays" } , > >> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; > >> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; > >> PACCAD clade; Panicoideae; Andropogoneae; Zea" , > >> gcode 1 , > >> mgcode 1 , > >> div "PLN" } } } , > >> molinfo { > >> biomol mRNA , > >> tech tsa } , > >> pub { > >> pub { > >> article { > >> title { > >> name "Deep sampling of the Palomero maize transcriptome by a high > >> throughput strategy of pyrosequencing." } , > >> authors { > >> names > >> std { > >> { > >> name > >> name { > >> last "Vega-Arreguin" , > >> initials "J.C." } } , > >> { > >> name > >> name { > >> last "Ibarra-Laclette" , > >> initials "E." 
} } , > >> { > >> name > >> name { > >> last "Jimenez-Moraila" , > >> initials "B." } } , > >> { > >> name > >> name { > >> last "Martinez" , > >> initials "O." } } , > >> { > >> name > >> name { > >> last "Vielle-Calzada" , > >> initials "J.P." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> initials "L." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> initials "A." } } } } , > >> from > >> journal { > >> title { > >> iso-jta "BMC Genomics" , > >> ml-jta "BMC Genomics" , > >> issn "1471-2164" , > >> name "BMC genomics" } , > >> imp { > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 6 } , > >> volume "10" , > >> issue "1" , > >> pages "299" , > >> language "ENG" , > >> pubstatus aheadofprint , > >> history { > >> { > >> pubstatus received , > >> date > >> std { > >> year 2008 , > >> month 12 , > >> day 2 } } , > >> { > >> pubstatus accepted , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 6 } } , > >> { > >> pubstatus aheadofprint , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 6 } } , > >> { > >> pubstatus other , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 8 , > >> hour 9 , > >> minute 0 } } , > >> { > >> pubstatus pubmed , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 8 , > >> hour 9 , > >> minute 0 } } , > >> { > >> pubstatus medline , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 8 , > >> hour 9 , > >> minute 0 } } } } } , > >> ids { > >> pii "1471-2164-10-299" , > >> doi "10.1186/1471-2164-10-299" , > >> pubmed 19580677 } } , > >> pmid 19580677 } } , > >> pub { > >> pub { > >> sub { > >> authors { > >> names > >> std { > >> { > >> name > >> name { > >> last "Vega-Arreguin" , > >> first "Julio" , > >> initials "J.C." } } , > >> { > >> name > >> name { > >> last "Ibarra-Laclette" , > >> first "Enrique" , > >> initials "E." 
} } , > >> { > >> name > >> name { > >> last "Jimenez-Moraila" , > >> first "Beatriz" , > >> initials "B." } } , > >> { > >> name > >> name { > >> last "Martinez" , > >> first "Octavio" , > >> initials "O." } } , > >> { > >> name > >> name { > >> last "Vielle-Calzada" , > >> first "Jean" , > >> initials "J.Philippe." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> first "Luis" , > >> initials "L." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> first "Alfredo" , > >> initials "A." } } } , > >> affil > >> std { > >> affil "Laboratorio Nacional de Genomica para la Biodiversidad" , > >> div "Cinvestav Campus Guanajuato" , > >> city "Irapuato" , > >> sub "Guanajuato" , > >> country "Mexico" , > >> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , > >> postal-code "36821" } } , > >> medium other , > >> date > >> std { > >> year 2009 , > >> month 3 , > >> day 23 } } } } , > >> user { > >> type > >> str "GenomeProjectsDB" , > >> data { > >> { > >> label > >> str "ProjectID" , > >> data > >> int 36555 } , > >> { > >> label > >> str "ParentID" , > >> data > >> int 0 } } } , > >> create-date > >> std { > >> year 2009 , > >> month 5 , > >> day 5 } , > >> update-date > >> std { > >> year 2009 , > >> month 7 , > >> day 14 } } , > >> inst { > >> repr raw , > >> mol rna , > >> length 450 , > >> seq-data > >> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 > >> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A > >> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 > >> 760CFF0'H } } > >> > >> > >> Maybe I am using the wrong format? This looks more like ASN than genbank format to me. 
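
One way to catch this situation early is to look at the first non-blank line of the downloaded file before handing it to Bio::SeqIO. The following is a heuristic sketch only: `sniff_record_format` is a hypothetical helper, it relies on the 'Seq-entry ::=' opening shown in this thread and on GenBank flat files starting with a LOCUS line, and the LOCUS line in the example is abbreviated for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess the format of a downloaded record from its
# first non-blank line. Returns 'asn1', 'genbank', or 'unknown'.
sub sniff_record_format {
    my ($text) = @_;
    my ($first) = grep { /\S/ } split /\n/, $text;
    return 'unknown' unless defined $first;
    return 'asn1'    if $first =~ /^\s*Seq-entry\s*::=/;   # ASN.1 text dump
    return 'genbank' if $first =~ /^LOCUS\s/;              # GenBank flat file
    return 'unknown';
}

print sniff_record_format("Seq-entry ::= seq {\n  id {\n"), "\n";
print sniff_record_format("LOCUS       EZ113652   450 bp\n"), "\n";
print sniff_record_format(""), "\n";
```

If the result is 'asn1', the file would need to be fetched again with a different rettype rather than parsed with `-format => 'genbank'`.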
> >> > >> Phillip > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From wollenbergk at niaid.nih.gov Fri Mar 26 16:47:06 2010 From: wollenbergk at niaid.nih.gov (Wollenberg, Kurt (NIH/NIAID) [C]) Date: Fri, 26 Mar 2010 16:47:06 -0400 Subject: [Bioperl-l] Error during installation of 1.6.1 Message-ID: Hello: I am trying to install BioPerl (after a recent system upgrade) and am getting the following error: "Catching error: "Can't execute q install q: No such file or directory at /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at /Library/Perl/Updates/5.8.8/CPAN.pm line 391". Previous to this I've run the CPAN upgrade, etc. as recommended on the Installation for Unix page. This happens when I try to do the actual install, both vanilla and "force"ed. I'm attempting this on a Mac G5 workstation running 10.5.8. Any clues what I may be missing or doing incorrectly? Cheers, Kurt Wollenberg, Ph.D. Contractor - Lockheed Martin Phylogenetics Specialist Computational Biology Section Bioinformatics and Computational Biosciences Branch (BCBB) OCICB/OSMO/OD/NIAID/NIH 31 Center Drive, Room 3B62 Bethesda, MD 20892-0485 Office 301-402-8628 http://bioinformatics.niaid.nih.gov (Within NIH) http://exon.niaid.nih.gov (Public) Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. 
If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives From rmb32 at cornell.edu Fri Mar 26 18:22:42 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 15:22:42 -0700 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> References: <648F9E90AF07449887FD4C420AA8B00E@NewLife> <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> Message-ID: <4BAD33B2.1060309@cornell.edu> You guys are the best. Hugs all around. R From watvealo at cse.msu.edu Fri Mar 26 19:06:24 2010 From: watvealo at cse.msu.edu (Alok) Date: Fri, 26 Mar 2010 19:06:24 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <249674A825C14BB3801C6184DEEA7A82@NewLife> References: <4BABB825.6010803@cse.msu.edu> <249674A825C14BB3801C6184DEEA7A82@NewLife> Message-ID: <4BAD3DF0.7090006@cse.msu.edu> Hi Mark, Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-( I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion) But they too are non-responsive. Thanks, Alok Mark A. Jensen wrote: > Hi Alok-- Thanks for your interest! You should certainly consider > applying. I can work with > you on developing your application. I'm including the bioperl mailing > list on this > post; we'll continue to have this conversation on the list so that the > helpful, friendly, > knowledgeable, compassionate membership can participate. 
> WrapperMaker code is currently available in > svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker > > Probably you want to have a look at Bio::Tools::Run::Samtools in > bioperl-run > for an example of how Bio::Tools::Run::WrapperBase and CommandExts are > used (er, by me...). > cheers > MAJ > ----- Original Message ----- From: "Alok" > To: > Sent: Thursday, March 25, 2010 3:23 PM > Subject: BioPerl Google SOC project > > >> Hello Mark, >> >> My name is Alok Watve and I am currently pursuing PhD in Computer >> Science at Michigan State University. I was going through the BioPerl >> Wiki for Google SOC projects. I have good experience with Perl and was >> wondering if I could work on the project "Perl Run Wrappers". >> >> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >> Ltd. My work was involved in writing Java programs and their perl >> wrappers. We used perl scripts to fire java programs with all the >> correct parameters. So I think I have some idea about what wrappers are. >> However, I have not used BioPerl and may take some time to get familiar >> with the structure. I am fairly confident that I will be able to do >> this. >> >> During my work here at MSU. I use perl a lot for doing basic text >> analysis for my projects. Although I rarely use OO features of perl, I >> have used them in past and never had any problems with it. I also >> believe in writing well-documented and user/developer friendly code >> (With comments, command line options for help/documentation). I have >> attached a simple script I wrote for my project as an example. I have >> also attached my resume for your consideration. >> >> Please let me know if you think that I am an appropriate candidate and >> whether I should go ahead with submitting an application with BioPerl as >> my Mentor Organization. 
>> >> Thanks a lot, >> Alok >> www.cse.msu.edu/~watvealo/ >> > > > -------------------------------------------------------------------------------- > > > >> #!/usr/bin/perl >> >> =pod >> >> =head1 SYNOPSIS >> >> Script to edit existing box query files to enable random box query. >> This scripts inserts box size on each line corresponding to discrete >> dimension in the existing box query file. The maximum value of "box >> size" >> depends on the alphabet size. >> >> Example >> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >> >> Use -perldoc for detailed help on options. >> >> =head1 OPTIONS >> >> =over >> >> =item -infile >> >> Specifies the name of the input box query file. >> >> =item -outfile >> >> Specifies the name of the output file. >> >> =item -uniform_box >> >> Specifies size of the uniform box query. >> >> =item -max_size >> >> Specifies the maximum box size for random sized box query. >> >> =item -help >> >> Displays a brief help message and exits. >> >> =item -perldoc >> >> Displays a detailed help. >> >> =back >> >> =cut >> >> use strict; >> use warnings 'all'; >> >> use Getopt::Long; >> use Pod::Usage; >> >> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >> 'help' => \my $help, 'perldoc' => \my $perldoc); >> >> if(defined($perldoc)) >> { >> pod2usage(-verbose => 2); >> } >> >> if(defined($help)) >> { >> pod2usage(-verbose=> 0); >> } >> >> if(! (defined($infile) && defined ($outfile) )) >> { >> die('Please specify input, output files. Use -perldoc >> for more help'); >> } >> >> # Some basic error checking to ensure script runs .... >> if(!(defined($uniformBox) ||defined($maxSize))) >> { >> die('Specify either box size for uniform box queries or maximum >> box size for random box queries'); >> } >> >> # Initialize random number generator. 
>> srand();
>>
>> # Read Input file and find out lines we are interested in
>> # Then prefix the line with correct box size as defined by
>> # user choice
>> open(IN, "<$infile");
>> open(OUT, ">$outfile");
>> my $count = 0;
>> while(my $line = <IN>)
>> {
>>     if( ($count%64) < 32 )
>>     {
>>         if(defined($uniformBox))
>>         {
>>             $line = sprintf("%d ",$uniformBox) . $line;
>>         }
>>         elsif(defined($maxSize))
>>         {
>>             # This line corresponds to the discrete dimension.
>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>         }
>>     }
>>     $count ++;
>>     print OUT $line
>> }
>>
>> close(OUT);
>> close(IN);
>>

From maj at fortinbras.us Fri Mar 26 20:08:51 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 20:08:51 -0400
Subject: [Bioperl-l] BioPerl Google SOC project
In-Reply-To: <4BAD3DF0.7090006@cse.msu.edu>
References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu>
Message-ID:

Hi Alok-- There has been trouble with the code node
of late. You can get a tarball of all the latest code at
http://bioperl.org/DIST/nightly_builds/
Download both bioperl-live and bioperl-run
cheers,
MAJ
----- Original Message -----
From: "Alok"
To: "Mark A. Jensen"
Cc: "BioPerl List"
Sent: Friday, March 26, 2010 7:06 PM
Subject: Re: [Bioperl-l] BioPerl Google SOC project

> Hi Mark,
>
> Thanks a lot for the response. I tried to access the SVN but was unable to do
> so. My SVN client just times out :-(
> I even tried SVN links from the BioPerl Wiki
> (http://www.bioperl.org/wiki/Using_Subversion)
> But they too are non-responsive.
>
> Thanks,
> Alok
>
> Mark A. Jensen wrote:
>> Hi Alok-- Thanks for your interest! You should certainly consider applying. I
>> can work with
>> you on developing your application. I'm including the bioperl mailing list on
>> this
>> post; we'll continue to have this conversation on the list so that the
>> helpful, friendly,
>> knowledgeable, compassionate membership can participate.
>> WrapperMaker code is currently available in >> svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker >> Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run >> for an example of how Bio::Tools::Run::WrapperBase and CommandExts are >> used (er, by me...). >> cheers >> MAJ >> ----- Original Message ----- From: "Alok" >> To: >> Sent: Thursday, March 25, 2010 3:23 PM >> Subject: BioPerl Google SOC project >> >> >>> Hello Mark, >>> >>> My name is Alok Watve and I am currently pursuing PhD in Computer >>> Science at Michigan State University. I was going through the BioPerl >>> Wiki for Google SOC projects. I have good experience with Perl and was >>> wondering if I could work on the project "Perl Run Wrappers". >>> >>> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >>> Ltd. My work was involved in writing Java programs and their perl >>> wrappers. We used perl scripts to fire java programs with all the >>> correct parameters. So I think I have some idea about what wrappers are. >>> However, I have not used BioPerl and may take some time to get familiar >>> with the structure. I am fairly confident that I will be able to do this. >>> >>> During my work here at MSU. I use perl a lot for doing basic text >>> analysis for my projects. Although I rarely use OO features of perl, I >>> have used them in past and never had any problems with it. I also >>> believe in writing well-documented and user/developer friendly code >>> (With comments, command line options for help/documentation). I have >>> attached a simple script I wrote for my project as an example. I have >>> also attached my resume for your consideration. >>> >>> Please let me know if you think that I am an appropriate candidate and >>> whether I should go ahead with submitting an application with BioPerl as >>> my Mentor Organization. 
>>> >>> Thanks a lot, >>> Alok >>> www.cse.msu.edu/~watvealo/ >>> >> >> >> -------------------------------------------------------------------------------- >> >> >> >>> #!/usr/bin/perl >>> >>> =pod >>> >>> =head1 SYNOPSIS >>> >>> Script to edit existing box query files to enable random box query. >>> This scripts inserts box size on each line corresponding to discrete >>> dimension in the existing box query file. The maximum value of "box size" >>> depends on the alphabet size. >>> >>> Example >>> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >>> >>> Use -perldoc for detailed help on options. >>> >>> =head1 OPTIONS >>> >>> =over >>> >>> =item -infile >>> >>> Specifies the name of the input box query file. >>> >>> =item -outfile >>> >>> Specifies the name of the output file. >>> >>> =item -uniform_box >>> >>> Specifies size of the uniform box query. >>> >>> =item -max_size >>> >>> Specifies the maximum box size for random sized box query. >>> >>> =item -help >>> >>> Displays a brief help message and exits. >>> >>> =item -perldoc >>> >>> Displays a detailed help. >>> >>> =back >>> >>> =cut >>> >>> use strict; >>> use warnings 'all'; >>> >>> use Getopt::Long; >>> use Pod::Usage; >>> >>> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >>> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >>> 'help' => \my $help, 'perldoc' => \my $perldoc); >>> >>> if(defined($perldoc)) >>> { >>> pod2usage(-verbose => 2); >>> } >>> >>> if(defined($help)) >>> { >>> pod2usage(-verbose=> 0); >>> } >>> >>> if(! (defined($infile) && defined ($outfile) )) >>> { >>> die('Please specify input, output files. Use -perldoc >>> for more help'); >>> } >>> >>> # Some basic error checking to ensure script runs .... >>> if(!(defined($uniformBox) ||defined($maxSize))) >>> { >>> die('Specify either box size for uniform box queries or maximum box size >>> for random box queries'); >>> } >>> >>> # Initialize random number generator. 
>>> srand();
>>>
>>> # Read Input file and find out lines we are interested in
>>> # Then prefix the line with correct box size as defined by
>>> # user choice
>>> open(IN, "<$infile");
>>> open(OUT, ">$outfile");
>>> my $count = 0;
>>> while(my $line = <IN>)
>>> {
>>>     if( ($count%64) < 32 )
>>>     {
>>>         if(defined($uniformBox))
>>>         {
>>>             $line = sprintf("%d ",$uniformBox) . $line;
>>>         }
>>>         elsif(defined($maxSize))
>>>         {
>>>             # This line corresponds to the discrete dimension.
>>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>>         }
>>>     }
>>>     $count ++;
>>>     print OUT $line
>>> }
>>>
>>> close(OUT);
>>> close(IN);
>>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bioperlanand at yahoo.com Fri Mar 26 21:40:04 2010
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Fri, 26 Mar 2010 18:40:04 -0700 (PDT)
Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities
Message-ID: <497143.33972.qm@web114218.mail.gq1.yahoo.com>

Hi everybody,

I have a list of genome project ids & I have a need where I need to gather information from a specific field & store the output in a file. As regards what Info I want
For example, for genome project id 30807
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807,
I need to grab the text information that reads (this is found at the bottom of the page):

Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant.
Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis.

I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::EUtilities? If yes, what would be the fields/params?

I did try out this:
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F
to find out what information is available for genomeprj, but I am unable to get the necessary field/param for my need. Please help.

Alternatively, is there a better way to address my need other than Bio::DB::EUtilities?

Thanks in advance,
Anand

From cjfields at illinois.edu Fri Mar 26 23:05:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 22:05:59 -0500
Subject: [Bioperl-l] BioPerl Google SOC project
In-Reply-To:
References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu>
Message-ID: <73AE1929-9920-4FD1-B36B-1C7244E20102@illinois.edu>

You can also grab the code off the github mirror:

http://github.com/bioperl/bioperl-live

You can either run a checkout, or download the tarball using the 'Download Source' link. We'll have an SVN read-only mirror on Google Code as well very soon, if it isn't done already.

chris

On Mar 26, 2010, at 7:08 PM, Mark A. Jensen wrote:

> Hi Alok-- There has been trouble with the code node
> of late. You can get a tarball of all the latest code at
> http://bioperl.org/DIST/nightly_builds/
> Download both bioperl-live and bioperl-run
> cheers,
> MAJ
> ----- Original Message ----- From: "Alok"
> To: "Mark A. Jensen"
> Cc: "BioPerl List"
> Sent: Friday, March 26, 2010 7:06 PM
> Subject: Re: [Bioperl-l] BioPerl Google SOC project
>
>
>> Hi Mark,
>>
>> Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-(
>> I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion)
>> But they too are non-responsive.
>>
>> Thanks,
>> Alok
>>
>> Mark A. Jensen wrote:
>>> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with
>>> you on developing your application.
I'm including the bioperl mailing list on this >>> post; we'll continue to have this conversation on the list so that the helpful, friendly, >>> knowledgeable, compassionate membership can participate. >>> WrapperMaker code is currently available in >>> svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker >>> Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run >>> for an example of how Bio::Tools::Run::WrapperBase and CommandExts are >>> used (er, by me...). >>> cheers >>> MAJ >>> ----- Original Message ----- From: "Alok" >>> To: >>> Sent: Thursday, March 25, 2010 3:23 PM >>> Subject: BioPerl Google SOC project >>> >>> >>>> Hello Mark, >>>> >>>> My name is Alok Watve and I am currently pursuing PhD in Computer >>>> Science at Michigan State University. I was going through the BioPerl >>>> Wiki for Google SOC projects. I have good experience with Perl and was >>>> wondering if I could work on the project "Perl Run Wrappers". >>>> >>>> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >>>> Ltd. My work was involved in writing Java programs and their perl >>>> wrappers. We used perl scripts to fire java programs with all the >>>> correct parameters. So I think I have some idea about what wrappers are. >>>> However, I have not used BioPerl and may take some time to get familiar >>>> with the structure. I am fairly confident that I will be able to do this. >>>> >>>> During my work here at MSU. I use perl a lot for doing basic text >>>> analysis for my projects. Although I rarely use OO features of perl, I >>>> have used them in past and never had any problems with it. I also >>>> believe in writing well-documented and user/developer friendly code >>>> (With comments, command line options for help/documentation). I have >>>> attached a simple script I wrote for my project as an example. I have >>>> also attached my resume for your consideration. 
>>>> >>>> Please let me know if you think that I am an appropriate candidate and >>>> whether I should go ahead with submitting an application with BioPerl as >>>> my Mentor Organization. >>>> >>>> Thanks a lot, >>>> Alok >>>> www.cse.msu.edu/~watvealo/ >>>> >>> >>> >>> -------------------------------------------------------------------------------- >>> >>> >>> >>>> #!/usr/bin/perl >>>> >>>> =pod >>>> >>>> =head1 SYNOPSIS >>>> >>>> Script to edit existing box query files to enable random box query. >>>> This scripts inserts box size on each line corresponding to discrete >>>> dimension in the existing box query file. The maximum value of "box size" >>>> depends on the alphabet size. >>>> >>>> Example >>>> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >>>> >>>> Use -perldoc for detailed help on options. >>>> >>>> =head1 OPTIONS >>>> >>>> =over >>>> >>>> =item -infile >>>> >>>> Specifies the name of the input box query file. >>>> >>>> =item -outfile >>>> >>>> Specifies the name of the output file. >>>> >>>> =item -uniform_box >>>> >>>> Specifies size of the uniform box query. >>>> >>>> =item -max_size >>>> >>>> Specifies the maximum box size for random sized box query. >>>> >>>> =item -help >>>> >>>> Displays a brief help message and exits. >>>> >>>> =item -perldoc >>>> >>>> Displays a detailed help. >>>> >>>> =back >>>> >>>> =cut >>>> >>>> use strict; >>>> use warnings 'all'; >>>> >>>> use Getopt::Long; >>>> use Pod::Usage; >>>> >>>> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >>>> 'help' => \my $help, 'perldoc' => \my $perldoc); >>>> >>>> if(defined($perldoc)) >>>> { >>>> pod2usage(-verbose => 2); >>>> } >>>> >>>> if(defined($help)) >>>> { >>>> pod2usage(-verbose=> 0); >>>> } >>>> >>>> if(! (defined($infile) && defined ($outfile) )) >>>> { >>>> die('Please specify input, output files. 
Use -perldoc
>>>> for more help');
>>>> }
>>>>
>>>> # Some basic error checking to ensure script runs ....
>>>> if(!(defined($uniformBox) ||defined($maxSize)))
>>>> {
>>>>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
>>>> }
>>>>
>>>> # Initialize random number generator.
>>>> srand();
>>>>
>>>> # Read Input file and find out lines we are interested in
>>>> # Then prefix the line with correct box size as defined by
>>>> # user choice
>>>> open(IN, "<$infile");
>>>> open(OUT, ">$outfile");
>>>> my $count = 0;
>>>> while(my $line = <IN>)
>>>> {
>>>>     if( ($count%64) < 32 )
>>>>     {
>>>>         if(defined($uniformBox))
>>>>         {
>>>>             $line = sprintf("%d ",$uniformBox) . $line;
>>>>         }
>>>>         elsif(defined($maxSize))
>>>>         {
>>>>             # This line corresponds to the discrete dimension.
>>>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>>>         }
>>>>     }
>>>>     $count ++;
>>>>     print OUT $line
>>>> }
>>>>
>>>> close(OUT);
>>>> close(IN);
>>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Fri Mar 26 23:15:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 23:15:30 -0400
Subject: [Bioperl-l] Error during installation of 1.6.1
In-Reply-To:
References:
Message-ID:

Is it really "q install q" ? Then you probably need to do some cpan configuring. It's possible your original CPAN/Config.pm file is lost or not where cpan expects it to be after your upgrade. Try this

$ cpan
cpan> o conf make /usr/bin/make
cpan> o conf make_install_make_command /usr/bin/make
cpan> o conf commit

and rerun the install.
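
Before pointing cpan at /usr/bin/make, it may be worth confirming that a make executable is actually present. The sketch below is an assumption on my part, not something confirmed in this thread as the cause of the error: it simply searches the directories on PATH, and `find_executable` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable
# named $prog found in the given directories, or nothing if none exists.
sub find_executable {
    my ($prog, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $prog);
        # -x _ reuses the stat information gathered by -f
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in $ENV{PATH}.
my $make = find_executable('make', File::Spec->path);
print defined $make ? "make found at $make\n" : "make not found on PATH\n";
```

If a path is printed, that is the value to give to the two `o conf` settings above; if not, make would have to be installed first.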
If you get other strangeness, I would check the values of all the config variables by listing with cpan> o conf BTW, by the message I infer you've got v1.93 of CPAN; maybe upgrading to the current version (v1.9402) would solve some problems. cheers MAJ ----- Original Message ----- From: "Wollenberg, Kurt (NIH/NIAID) [C]" To: Sent: Friday, March 26, 2010 4:47 PM Subject: [Bioperl-l] Error during installation of 1.6.1 > Hello: > > I am trying to install BioPerl (after a recent system upgrade) and am > getting the following error: > > "Catching error: "Can't execute q install q: No such file or directory at > /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at > /Library/Perl/Updates/5.8.8/CPAN.pm line 391". > > Previous to this I've run the CPAN upgrade, etc. as recommended on the > Installation for Unix page. This happens when I try to do the actual > install, both vanilla and "force"ed. I'm attempting this on a Mac G5 > workstation running 10.5.8. Any clues what I may be missing or doing > incorrectly? > > Cheers, > Kurt Wollenberg, Ph.D. > Contractor - Lockheed Martin > Phylogenetics Specialist > Computational Biology Section > Bioinformatics and Computational Biosciences Branch (BCBB) > OCICB/OSMO/OD/NIAID/NIH > > 31 Center Drive, Room 3B62 > Bethesda, MD 20892-0485 > Office 301-402-8628 > http://bioinformatics.niaid.nih.gov (Within NIH) > http://exon.niaid.nih.gov (Public) > > Disclaimer: > The information in this e-mail and any of its attachments is confidential > and may contain sensitive information. It should not be used by anyone who > is not the original intended recipient. If you have received this e-mail in > error please inform the sender and delete it from your mailbox or any other > storage devices. 
National Institute of Allergy and Infectious Diseases shall
> not accept liability for any statements made that are sender's own and not
> expressly made on behalf of the NIAID by one of its representatives
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From biopython at maubp.freeserve.co.uk Sat Mar 27 08:42:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 27 Mar 2010 12:42:12 +0000
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
Message-ID: <320fb6e01003270542i1f3cd4d2x61c97bc7ccf1b917@mail.gmail.com>

On Fri, Mar 26, 2010 at 4:16 PM, Chris Fields wrote:
> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the
> latter is if you always want a full nucleotide sequence instead
> of possibly getting contig files). 'genbank' used to be an alias
> for 'gb', but apparently no longer, and appears to be something
> that was changed on NCBI's end.

Yeah, the NCBI changed that almost a year ago (Easter 2009).
It broke one of the Biopython unit tests, and I asked the NCBI
about this and if they could restore the alias "genbank". They
declined, so in Biopython's efetch wrapper we spot anyone
asking for rettype=genbank, issue a warning, and convert it to
rettype=gb or rettype=gp (for the protein database) instead.

The relevant Biopython code is here if anyone is interested:
http://biopython.org/SRC/biopython/Bio/Entrez/__init__.py

Peter

From pmiguel at purdue.edu Sat Mar 27 09:51:14 2010
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 27 Mar 2010 09:51:14 -0400
Subject: [Bioperl-l] SeqIO issue?
EUtilities Cookbook In-Reply-To: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu> <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> Message-ID: <4BAE0D52.60908@purdue.edu> Hi Chris, I also see there is a bunch of NCBI toolkit code that deals with asn.1 conversion. They even have some precompiled code: http://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html Thanks for your help, Phillip Chris Fields wrote: > That format is ASN.1. and there isn't a BioPerl parser for GenBank ASN.1 > format (it tends to be too cumbersome). > > However, there is a pure-perl-based one for the EntrezGene ASN.1 format > (Bio::ASN1::EntrezGene). > > chris > > > On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote: > >> Ah, yes. That does the trick. Actually I have already downloaded a few >> thousand records in whatever that format that is returned when 'genbank' >> is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= >> seq {') Any idea what format that is and how to convert it to something >> SeqIO can use? >> >> If not, I can just pull them all down again by sending about 200 gi's >> per request. That should not offend the genbank gods... >> >> Thanks for your help, >> Phillip >> >> Chris Fields wrote: >> >>> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. >>> >>> Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. >>> >>> chris >>> >>> On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: >>> >>> >>> >>>> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. 
>>>> >>>> I am trying to use the code provided at: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO >>>> >>>> and modified to request gi228534658 >>>> >>>> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. >>>> >>>> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> >>>> use Bio::SeqIO; >>>> use Bio::DB::EUtilities; >>>> >>>> my @ids; >>>> push @ids, '228534658'; >>>> my $factory = Bio::DB::EUtilities->new( >>>> -eutil => 'efetch', >>>> -db => 'nucleotide', >>>> -rettype => 'genbank', >>>> -id => \@ids); >>>> >>>> my $file = 'myseqs.gb'; >>>> >>>> # dump HTTP::Response content to a file (not retained in memory) >>>> $factory->get_Response(-file => $file); >>>> >>>> my $seqin = Bio::SeqIO->new(-file => $file, >>>> -format => 'genbank'); >>>> >>>> while (my $seq = $seqin->next_seq) { >>>> print "I see a sequence\n"; >>>> print $seq->species(); >>>> } >>>> >>>> >>>> "myseqs.gb" does have content: >>>> >>>> Seq-entry ::= seq { >>>> id { >>>> general { >>>> db "gpid:36555" , >>>> tag >>>> str "contig49313" } , >>>> genbank { >>>> accession "EZ113652" , >>>> version 1 } , >>>> gi 228534658 } , >>>> descr { >>>> title "TSA: Zea mays contig49313, mRNA sequence." 
, >>>> source { >>>> genome genomic , >>>> org { >>>> taxname "Zea mays" , >>>> db { >>>> { >>>> db "taxon" , >>>> tag >>>> id 4577 } } , >>>> orgname { >>>> name >>>> binomial { >>>> genus "Zea" , >>>> species "mays" } , >>>> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; >>>> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; >>>> PACCAD clade; Panicoideae; Andropogoneae; Zea" , >>>> gcode 1 , >>>> mgcode 1 , >>>> div "PLN" } } } , >>>> molinfo { >>>> biomol mRNA , >>>> tech tsa } , >>>> pub { >>>> pub { >>>> article { >>>> title { >>>> name "Deep sampling of the Palomero maize transcriptome by a high >>>> throughput strategy of pyrosequencing." } , >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> initials "J.P." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "L." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "A." 
} } } } , >>>> from >>>> journal { >>>> title { >>>> iso-jta "BMC Genomics" , >>>> ml-jta "BMC Genomics" , >>>> issn "1471-2164" , >>>> name "BMC genomics" } , >>>> imp { >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } , >>>> volume "10" , >>>> issue "1" , >>>> pages "299" , >>>> language "ENG" , >>>> pubstatus aheadofprint , >>>> history { >>>> { >>>> pubstatus received , >>>> date >>>> std { >>>> year 2008 , >>>> month 12 , >>>> day 2 } } , >>>> { >>>> pubstatus accepted , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus aheadofprint , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus other , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus pubmed , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus medline , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } } } } , >>>> ids { >>>> pii "1471-2164-10-299" , >>>> doi "10.1186/1471-2164-10-299" , >>>> pubmed 19580677 } } , >>>> pmid 19580677 } } , >>>> pub { >>>> pub { >>>> sub { >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> first "Julio" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> first "Enrique" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> first "Beatriz" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> first "Octavio" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> first "Jean" , >>>> initials "J.Philippe." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Luis" , >>>> initials "L." 
} } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Alfredo" , >>>> initials "A." } } } , >>>> affil >>>> std { >>>> affil "Laboratorio Nacional de Genomica para la Biodiversidad" , >>>> div "Cinvestav Campus Guanajuato" , >>>> city "Irapuato" , >>>> sub "Guanajuato" , >>>> country "Mexico" , >>>> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , >>>> postal-code "36821" } } , >>>> medium other , >>>> date >>>> std { >>>> year 2009 , >>>> month 3 , >>>> day 23 } } } } , >>>> user { >>>> type >>>> str "GenomeProjectsDB" , >>>> data { >>>> { >>>> label >>>> str "ProjectID" , >>>> data >>>> int 36555 } , >>>> { >>>> label >>>> str "ParentID" , >>>> data >>>> int 0 } } } , >>>> create-date >>>> std { >>>> year 2009 , >>>> month 5 , >>>> day 5 } , >>>> update-date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 14 } } , >>>> inst { >>>> repr raw , >>>> mol rna , >>>> length 450 , >>>> seq-data >>>> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 >>>> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A >>>> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 >>>> 760CFF0'H } } >>>> >>>> >>>> Maybe I am using the wrong format? This looks more like ASN than genbank format to me. 
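A minimal sketch of two guards for the fetch-then-parse workflow discussed in this thread, using core Perl only. `normalize_rettype` and `sniff_record_format` are hypothetical helper names, not part of Bio::DB::EUtilities or Bio::SeqIO. The first mirrors the alias handling Peter describes for Biopython's efetch wrapper; the second checks whether saved text starts like the `Seq-entry ::=` ASN.1 output shown above or like a GenBank flat file (a `LOCUS` line) before it is handed to a parser.

```perl
use strict;
use warnings;

# Hypothetical helper: map the retired 'genbank' alias to a rettype that
# efetch still accepts ('gb' for nucleotide, 'gp' for protein), with a warning.
sub normalize_rettype {
    my ($rettype, $db) = @_;
    return $rettype unless lc($rettype) eq 'genbank';
    my $fixed = (defined($db) && lc($db) eq 'protein') ? 'gp' : 'gb';
    warn "rettype 'genbank' is no longer an alias; using '$fixed' instead\n";
    return $fixed;
}

# Hypothetical helper: classify the start of a downloaded record.
sub sniff_record_format {
    my ($text) = @_;
    return 'asn1'    if $text =~ /\A\s*Seq-entry\s*::=/;
    return 'genbank' if $text =~ /\A\s*LOCUS\s/;
    return 'unknown';
}

# Example: decide what was downloaded before calling a parser.
my $start = "Seq-entry ::= seq {\n  id {\n";
printf "rettype: %s, saved file looks like: %s\n",
    normalize_rettype('genbank', 'nucleotide'),
    sniff_record_format($start);
```

Checking the first line this way would turn the silent "nothing is printed" case into an explicit message that the file is not in the format the parser was told to expect.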
>>>> >>>> Phillip >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From awitney at sgul.ac.uk Mon Mar 29 13:26:40 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 29 Mar 2010 18:26:40 +0100 Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl In-Reply-To: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk> <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> Message-ID: <6DD3E9BB-27AD-4241-94F9-476AE6525A7D@sgul.ac.uk> thanks Chris for the explanation. It looks like Exonerate may also do something similar thanks adam On 26 Mar 2010, at 15:51, Chris Fields wrote: > It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on it's own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by it's author. > > I know many users just use calls to outside programs, such EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes, XS can be a bugbear. > > chris > > On Mar 26, 2010, at 10:20 AM, Adam Witney wrote: > >> Is the bioperl-ext package still being developed? 
I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error
>>
>> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
>> Please read the install the bioperl-ext package"
>>
>> even though i did compile and install the Bio::Ext::Align package
>>
>> If not using the pSW module, what do other people use for this?
>>
>> thanks
>>
>> adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From nicolas.turenne at jouy.inra.fr Mon Mar 29 14:09:53 2010
From: nicolas.turenne at jouy.inra.fr (Nicolas Turenne)
Date: Mon, 29 Mar 2010 20:09:53 +0200
Subject: [Bioperl-l] about biblio
Message-ID: <4BB0ECF1.6050308@jouy.inra.fr>

Hello,
I am using the biblio module from bioperl to download PubMed abstracts.
If I run the query "actb" on the PubMed site
(http://www.ncbi.nlm.nih.gov/sites/entrez)
I get 165 hits.

But using bioperl, if I do

use Bio::Biblio;
my $biblio = Bio::Biblio->new
    (-access => 'soap',
     -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS',
     -destroy_on_exit => '0');
my @ListID = @{ $biblio->find ("actb")->get_all_ids };

I get 228 hits, so I don't understand the difference.

Thanks for the help,
Nicolas

From sj17m89 at gmail.com Mon Mar 29 13:47:38 2010
From: sj17m89 at gmail.com (Shweta Jha)
Date: Mon, 29 Mar 2010 10:47:38 -0700
Subject: [Bioperl-l] Regarding Google Summer of Code
Message-ID: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com>

Dear Sir / Madam,

I, Shweta Jha, am a third-year B.Tech Bioinformatics student.

I am interested in applying for the Google Summer of Code internship program.

I am keen to work on a project using Bioperl.
Could you please let me know how do I apply for the program? Thanks and Regards Shweta Jha From rmb32 at cornell.edu Mon Mar 29 15:26:30 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 29 Mar 2010 12:26:30 -0700 Subject: [Bioperl-l] Regarding Google Summer of Code In-Reply-To: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> References: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> Message-ID: <4BB0FEE6.3080209@cornell.edu> Hi Shweta, See http://open-bio.org/wiki/Google_Summer_of_Code, and the GSoC FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs for details on the application process. Rob Shweta Jha wrote: > Dear Sir / Madam , > > I , Shweta Jha , am a Third year B.Tech Bioinformatics student. > > I am interested to apply for the Google Summer of Code internship program. > > I am keen to work on project using Bioperl. > > Could you please let me know how do I apply for the program? > > > > Thanks and Regards > Shweta Jha > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From martin.senger at gmail.com Mon Mar 29 17:02:02 2010 From: martin.senger at gmail.com (Martin Senger) Date: Mon, 29 Mar 2010 22:02:02 +0100 Subject: [Bioperl-l] about biblio In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr> References: <4BB0ECF1.6050308@jouy.inra.fr> Message-ID: <4d93f07c1003291402j5ab58216o3985157513d1820a@mail.gmail.com> Hi, I am actually not sure what is the correct answer - because I am not anymore maintaining the biblio server at EBI (I actually did not know that it was still running :-) - but I am very pleased that it does run). Mahmut, can I ask you a favor? Could you please pass the emailed question below to an appropriate person at EBI? Of course, if the result of this inquiry is that the problem is in the biblio module in bioperl I am quite happy and keen to fix it there. 
Cheers, Martin On Mon, Mar 29, 2010 at 7:09 PM, Nicolas Turenne < nicolas.turenne at jouy.inra.fr> wrote: > Hello, > I am using biblio module from bioperl to download pubmed abstract. > if i do the query "actb" on the pubmed site ( > http://www.ncbi.nlm.nih.gov/sites/entrez) > i get 165 hits > > But using bioperl, if i do > > use Bio::Biblio; > my $biblio = Bio::Biblio->new > (-access => 'soap', > -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS', > -destroy_on_exit => '0'); > my @ListID = @{ $biblio->find ("actb")->get_all_ids }; > > i get 228 hits, so i dont understand the difference > > thank for help > Nicolas > -- Martin Senger email: martin.senger at gmail.com,martin.senger at kaust.edu.sa skype: martinsenger From click.xu at gmail.com Mon Mar 29 23:17:17 2010 From: click.xu at gmail.com (click xu) Date: Tue, 30 Mar 2010 11:17:17 +0800 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw Message-ID: Hi, I meet a problem when using Clustalw module. Here is the error message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: ClustalW call ( align? -infile=/tmp/AeyAfdxGvH/YpcPbyhYht -output=gcg?? 
-matrix=BLOSUM -ktup le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | cannot find the file or path STACK: Error::throw STACK: Bio::Root::Root::throw /home/lf/data/BioPerl-1.6.1/Bio/Root/Root.pm:368 STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig nment/Clustalw.pm:756 STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali gnment/Clustalw.pm:515 STACK: test.txt:45 ----------------------------------------------------------- The test program is described as below: ----------------------------------------------------------- @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); # @seq_array is an array of Bio::Seq objects $aln = $factory->align(\@seq_array); ----------------------------------------------------------- The work path of clustalw2 has been configured: export CLUSTALDIR=/usr/local/bin/clustalw2 So, what may be reason of the error? Thanks! From Russell.Smithies at agresearch.co.nz Mon Mar 29 23:25:03 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 30 Mar 2010 16:25:03 +1300 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> Do you have enough temp space? Will clustalw run 'manually' with your parameters from the command line? --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of click xu > Sent: Tuesday, 30 March 2010 4:17 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw > > Hi, > I meet a problem when using Clustalw module. > Here is the error message: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: ClustalW call ( align? 
-infile=/tmp/AeyAfdxGvH/YpcPbyhYht > -output=gcg?? -matrix=BLOSUM -ktup > le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | > cannot find the file or path > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- > 1.6.1/Bio/Root/Root.pm:368 > STACK: Bio::Tools::Run::Alignment::Clustalw::_run > /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig > nment/Clustalw.pm:756 > STACK: Bio::Tools::Run::Alignment::Clustalw::align > /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali > gnment/Clustalw.pm:515 > STACK: test.txt:45 > ----------------------------------------------------------- > The test program is described as below: > ----------------------------------------------------------- > @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); > $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > # @seq_array is an array of Bio::Seq objects > $aln = $factory->align(\@seq_array); > ----------------------------------------------------------- > The work path of clustalw2 has been configured: > export CLUSTALDIR=/usr/local/bin/clustalw2 > So, what may be reason of the error? > Thanks! > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From click.xu at gmail.com Tue Mar 30 00:03:49 2010 From: click.xu at gmail.com (click xu) Date: Tue, 30 Mar 2010 12:03:49 +0800 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> Message-ID: Russell Clustalw2 can correctly run in command line, and the /tmp space is enough too. 2010/3/30 Smithies, Russell : > Do you have enough temp space? > Will clustalw run 'manually' with your parameters from the command line? > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of click xu >> Sent: Tuesday, 30 March 2010 4:17 p.m. >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw >> >> Hi, >> I meet a problem when using Clustalw module. >> Here is the error message: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: ClustalW call ( align? -infile=/tmp/AeyAfdxGvH/YpcPbyhYht >> -output=gcg?? 
-matrix=BLOSUM -ktup >> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >> cannot find the file or path >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >> 1.6.1/Bio/Root/Root.pm:368 >> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >> nment/Clustalw.pm:756 >> STACK: Bio::Tools::Run::Alignment::Clustalw::align >> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >> gnment/Clustalw.pm:515 >> STACK: test.txt:45 >> ----------------------------------------------------------- >> The test program is described as below: >> ----------------------------------------------------------- >> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >> # @seq_array is an array of Bio::Seq objects >> $aln = $factory->align(\@seq_array); >> ----------------------------------------------------------- >> The work path of clustalw2 has been configured: >> export CLUSTALDIR=/usr/local/bin/clustalw2 >> So, what may be reason of the error? >> Thanks! >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From martin.senger at gmail.com Tue Mar 30 04:18:30 2010 From: martin.senger at gmail.com (Martin Senger) Date: Tue, 30 Mar 2010 09:18:30 +0100 Subject: [Bioperl-l] about biblio In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr> References: <4BB0ECF1.6050308@jouy.inra.fr> Message-ID: <4d93f07c1003300118q1c7b0551w4aa25a2a97fc35be@mail.gmail.com> Here is the answer sent by Mr Hamish McWilliam from EBI (where the MEDLINE server is running): The difference is OpenBQS adds a wildcard when it builds the SRS query: > > - [medline-AllText:actb*] gives 228 entries > - [medline-AllText:actb] gives 150 entries > > Performing the same query at PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) > gives similar answers: > > - "actb*" gives 255 entries > - "actb" gives 165 entries > > The remaining differences are probably due to slight differences in the > PubMed data at NCBI and the exported MEDLINE data. > Cheers, Martin -- Martin Senger email: martin.senger at gmail.com,martin.senger at kaust.edu.sa skype: martinsenger From cjfields at illinois.edu Tue Mar 30 08:42:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 30 Mar 2010 07:42:24 -0500 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> Message-ID: <863E31F9-072B-4681-94C5-D2C8BEA82021@illinois.edu> You may need to submit this as a bug. I got clustalw2 working fairly recently, but it's possible some other API change is breaking things. chris On Mar 29, 2010, at 11:03 PM, click xu wrote: > Russell > Clustalw2 can correctly run in command line, and the /tmp space is enough too. > > > 2010/3/30 Smithies, Russell : >> Do you have enough temp space? >> Will clustalw run 'manually' with your parameters from the command line? 
>> >> --Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of click xu >>> Sent: Tuesday, 30 March 2010 4:17 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw >>> >>> Hi, >>> I meet a problem when using Clustalw module. >>> Here is the error message: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht >>> -output=gcg -matrix=BLOSUM -ktup >>> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >>> cannot find the file or path >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >>> 1.6.1/Bio/Root/Root.pm:368 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >>> nment/Clustalw.pm:756 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::align >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >>> gnment/Clustalw.pm:515 >>> STACK: test.txt:45 >>> ----------------------------------------------------------- >>> The test program is described as below: >>> ----------------------------------------------------------- >>> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >>> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >>> # @seq_array is an array of Bio::Seq objects >>> $aln = $factory->align(\@seq_array); >>> ----------------------------------------------------------- >>> The work path of clustalw2 has been configured: >>> export CLUSTALDIR=/usr/local/bin/clustalw2 >>> So, what may be reason of the error? >>> Thanks! 
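One thing that may be worth ruling out for the error quoted above: the wrapper's POD appears to describe CLUSTALDIR as the directory that contains the executable, whereas the export shown points at the `clustalw2` binary itself. That reading is an assumption about this failure and is not confirmed anywhere in the thread. A minimal sketch in core Perl, where `find_program_in_dir` is a hypothetical helper (not a BioPerl call) that reports whether a directory holds any of the candidate program names:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first candidate program
# found as an executable file inside $dir, or undef if $dir is not a
# directory or contains none of the candidates.
sub find_program_in_dir {
    my ($dir, @names) = @_;
    return undef unless defined($dir) && -d $dir;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        return $path if -f $path && -x $path;
    }
    return undef;
}

# Report on the current environment setting.
my $dir   = $ENV{CLUSTALDIR};
my $found = find_program_in_dir($dir, 'clustalw', 'clustalw2');
if (defined $found) {
    print "CLUSTALDIR looks usable: $found\n";
}
else {
    print "CLUSTALDIR (", (defined($dir) ? $dir : 'unset'),
          ") is not a directory containing clustalw or clustalw2\n";
}
```

If the check fails for a value such as /usr/local/bin/clustalw2, trying the parent directory (/usr/local/bin) would be the obvious next experiment.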
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Tue Mar 30 16:10:09 2010 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 30 Mar 2010 22:10:09 +0200 Subject: [Bioperl-l] AlignIO formats Message-ID: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com> Hi, Using GuessSeqFormat and AlignIO, I stumbled on some issues and am now wondering if the defined formats are actually OK. Esp. related to pfam, selex, stockholm formats it seems: pfam here is like selex without any comment lines, but with the /start-end after the seq id like myseq/1-111. The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam) actually defines Pfam and Stockholm to be the same formats. This makes me wonder: is the Pfam format actually defined as Selex or Stockholm? Within BioPerl it is like Selex. In addition, Selex (as used in HMMER 2.3.2) contains comment lines like #=AC, #=RF or #=ID. GuessSeq format uses this to detect Selex, however, they do not have to be present. 
GuessSeqFormat uses:

    return (($lineno == 1 && $line =~ /^#=ID /) ||
            ($lineno == 2 && $line =~ /^#=AC /) ||
            ($line =~ /^#=SQ /));

to detect the Selex format. At the same time, the Selex reader does not
seem to get the aln id or accession:

    if( $entry =~ /^\#=GS\s+(\S+)\s+AC\s+(\S+)/ ) {
        $accession{ $1 } = $2;

Also a Selex file like:

seq1 ACGACGACGACG.
seq2 ..GGGAAAGG.GA
seq3 UUU..AAAUUU.A

is guessed to be phylip (whereas the seq1/1-11 format will be guessed as pfam).

I am not sure if the above is desired behaviour, though all sequences
are read into the alignment object correctly. I was wondering whether
all Selex variations could be guessed as Selex, not as phylip, pfam or
selex (though in the selex case we can have more alignments in one file).

Regards,
Bernd

From p.j.a.cock at googlemail.com Tue Mar 30 17:12:46 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Mar 2010 22:12:46 +0100
Subject: [Bioperl-l] AlignIO formats
In-Reply-To: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
References: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
Message-ID: <320fb6e01003301412s6c90220el7a95bdc97dee03e6@mail.gmail.com>

On Tue, Mar 30, 2010 at 9:10 PM, Bernd Web wrote:
> Hi,
>
> Using GuessSeqFormat and AlignIO, I stumbled on some issues and
> am now wondering if the defined formats are actually OK. Esp. related to
> pfam, selex, stockholm formats it seems:
>
> pfam here is like selex without any comment lines, but with the
> /start-end after the seq id like myseq/1-111.
> The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam)
> actually defines Pfam and Stockholm to be the same formats. This makes
> me wonder: is the Pfam format actually defined as Selex or Stockholm?
> Within BioPerl it is like Selex.

I (and therefore the Biopython documentation) also think PFAM and
Stockholm alignments are basically the same thing. The BioPerl wiki
seems to agree with this interpretation too.
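A minimal sketch of header-based guessing for the formats discussed here, in core Perl only. `guess_alignment_format` is a hypothetical function and not the Bio::Tools::GuessSeqFormat implementation: it treats a leading `# STOCKHOLM 1.0`-style line as Stockholm, reuses the `#=ID` / `#=AC` / `#=SQ` tests quoted above for Selex, and otherwise returns 'unknown' rather than guessing, which is what happens for a marker-free Selex file such as the three-sequence example.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the Bio::Tools::GuessSeqFormat implementation.
sub guess_alignment_format {
    my (@lines) = @_;
    my $lineno = 0;
    for my $line (@lines) {
        $lineno++;
        # Stockholm files open with a version header line.
        return 'stockholm'
            if $lineno == 1 && $line =~ /^#\s*STOCKHOLM\s+\d+\.\d+/;
        # Same markers as the GuessSeqFormat test quoted in the thread.
        return 'selex'
            if ($lineno == 1 && $line =~ /^#=ID /)
            || ($lineno == 2 && $line =~ /^#=AC /)
            || ($line =~ /^#=SQ /);
    }
    return 'unknown';
}

my @stockholm = ("# STOCKHOLM 1.0", "seq1/1-12 ACGACGACGACG", "//");
my @selex     = ("#=ID demo", "#=AC X00001", "seq1 ACGACGACGACG.");
my @bare      = ("seq1 ACGACGACGACG.", "seq2 ..GGGAAAGG.GA", "seq3 UUU..AAAUUU.A");

for my $case (\@stockholm, \@selex, \@bare) {
    print guess_alignment_format(@$case), "\n";
}
```

The bare file stays 'unknown' here because nothing in its header distinguishes it; telling it apart from phylip or pfam would need a look at the sequence lines themselves, which this sketch does not attempt.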
Looking at the HMMER2 examples, Selex is different but the comment
style is similar. The obvious thing to check is the presence or
absence of the "# STOCKHOLM 1.0" header if trying to tell them apart.

See also: http://en.wikipedia.org/wiki/Stockholm_format and
http://www.bioperl.org/wiki/Stockholm_multiple_alignment_format
http://www.bioperl.org/wiki/SELEX_multiple_alignment_format

Peter

From jun.yin at ucd.ie Tue Mar 30 18:37:07 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 30 Mar 2010 23:37:07 +0100
Subject: [Bioperl-l] summer code project on Bioperl
Message-ID: <7160acc75f99.4bb28b23@ucd.ie>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CV_JunYin.doc
Type: application/msword
Size: 27648 bytes
Desc: not available
URL:

From ross at cuhk.edu.hk Wed Mar 31 17:28:59 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Thu, 1 Apr 2010 05:28:59 +0800
Subject: [Bioperl-l] BlastPlus usage inquiry
In-Reply-To:
References:
Message-ID: <014401cad119$2d1467a0$873d36e0$@edu.hk>

Dear all,

I know it is inappropriate to raise this question in bioperl but as I
received no better response from NCBI and so have to ask in this group
(because finally I'll use bioperl to call blastplus).

I have already been using the latest blastplus (the command is blastn
directly) and found the problem of running slow and inability to run in a
parallel/multithread manner. Previously I was using non blastplus version
2.2.22 with the command blastall -p blastn -a 8 etc. With similar arguments
as below except the word size was 12, my shell script for the same input and
database finishes almost instantly. I notice that except word size and min
raw gapped score were changed by me, nothing appears to differ from the
previous version parameters. Moreover, when I top my process, I find it uses
only one CPU instead of 7. What may be the problem for the script that makes
the job running for a day and still hasn't finished?

blastn -query $1 -db $2 -out $1_$2.xml -num_threads 7 -word_size 4 -gapopen 3 -gapextend 1 -penalty -2 -outfmt 5 -xdrop_ungap 30 -xdrop_gap 30 -xdrop_gap_final 30 -min_raw_gapped_score 10

From anil_m_lal at yahoo.com Tue Mar 30 14:24:34 2010
From: anil_m_lal at yahoo.com (Anil Lal)
Date: Tue, 30 Mar 2010 11:24:34 -0700 (PDT)
Subject: [Bioperl-l] GSoC 2010
Message-ID: <717794.59615.qm@web37507.mail.mud.yahoo.com>

Hello,

I am a mid career software programmer and now transitioning in
bioinformatics. I always had great interest in bioinformatics and only now
am able to make a move to take classes. I am currently enrolled in
University of santa cruz extension classes.

I am very interested in GSoC 2010 and have identified potentially these two
projects: Lightweight Sequence objects and Lazy Parsing, mentored by Chris
Fields, and Perl Run Wrappers for External Programs in a Flash, mentored by
Mark Jenson.

Please let me know if these projects are still available. If yes, I will
send in my application with more details.

Thanks a lot for your help. It would be exciting to work in BioPerl and
contribute.

Anil

From schae234 at gmail.com Tue Mar 30 12:33:42 2010
From: schae234 at gmail.com (Robert Schaefer)
Date: Tue, 30 Mar 2010 10:33:42 -0600
Subject: [Bioperl-l] Google Summer of Code
Message-ID: <60c593881003300933p46c7c295k69a21ee986ef5777@mail.gmail.com>

Hello,

I am looking for more information of your mentorship program for google's
SOC. Who would I contact for more information and to ask questions?

Thank you,
Rob Schaefer

From forrest_zhang at 163.com Mon Mar 1 00:10:31 2010
From: forrest_zhang at 163.com (forrest)
Date: Mon, 01 Mar 2010 13:10:31 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <4B8B4C47.108@163.com>

Hi all,

When I use threads to get Genbank format file, show some error. It is
shown as:

"Can't call method "get_taxon" on unblessed reference at
/opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."

=========================================
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;


my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");


my $seq_out = Bio::SeqIO->new(-format => "genbank",
                              -file => ">dhn_all.gb");
my @seq;

my $number = @id;

my $max_threads = 6;

for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;

    if ($number - $thread_number > $max_threads){
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;

        }
    }else{
        my $else_number = $number % $max_threads;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;

        }


    }

    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}

foreach (@seq){
    $seq_out->write_seq($_);
}
=========================================


How can I fix this error?
Thanks.


Zhang Tao

From cjfields at illinois.edu Mon Mar 1 15:37:18 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 01 Mar 2010 14:37:18 -0600
Subject: [Bioperl-l] use threads to get seq file error.
In-Reply-To: <4B8B4C47.108@163.com>
References: <4B8B4C47.108@163.com>
Message-ID: <1267475838.16248.8.camel@pyrimidine.igb.uiuc.edu>

I get much nastier ones than that; a small taste:

--------------------- WARNING ---------------------
MSG: exception while parsing location line [1..680] in reading
EMBL/GenBank/SwissProt, ignoring feature source (seqid=AF031249):
Eval-group not allowed at runtime, use re 'eval' in regex
m/(.*?)\(((?x-ism: (?> [^()]+ | \( (??{.../ at
/home/cjfields/bioperl/live/Bio/Factory/FTLocationFactory.pm line 161,
line 36.
---------------------------------------------------
Thread 2 terminated abnormally: Can't call method "primary_tag" on an
undefined value at /home/cjfields/bioperl/live/Bio/SeqIO/genbank.pm line
662, line 36.

Could you report this as a bug?

chris

On Mon, 2010-03-01 at 13:10 +0800, forrest wrote:
> Hi all,
>
> When I use threads to get Genbank format file, show some error. It is
> shown as:
>
> "Can't call method "get_taxon" on unblessed reference at
> /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."
>
> How can I fix this error?
> Thanks.
>
>
> Zhang Tao

From paolo.pavan at gmail.com Mon Mar 1 18:07:33 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 00:07:33 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
Message-ID: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>

Dear all,
Sorry for pushing up my post but, please does anyone have an hint for me?
Maybe have I to send attached the report to the mailing list? I don't
know attachment policies of the list, if it is allowed and is needed I
can do that.

Thank you,
Paolo

2010/2/26 Paolo Pavan :
> Sorry,
> Maybe I forgot to add this is the megablast -m 5 output.
>
> Thank you again,
> Paolo
>
> 2010/2/26 Paolo Pavan :
>> Hi all,
>> I have just a brief question: I've got some megablast reports such the
>> one I've pasted below.
>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>> entire alignment represented as a Bio::SimpleAlign object or
>> Bio::Align::AlignI implementing one?
>>
>> Thank you all,
>> Paolo
>>
>>
>> MEGABLAST 2.2.16 [Mar-25-2007]
>>
>>
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>> "A greedy algorithm for aligning DNA sequences",
>> J Comput Biol 2000; 7(1-2):203-14.
>>
>> Database: 00038-00053.fasta
>>            2 sequences; 2001 total letters
>>
>> Searching..................................................done
>>
>> Query= 00038-00053
>>          (802 letters)
>>
>>
>>
>>                                                                  Score    E
>> Sequences producing significant alignments:                      (bits) Value
>>
>> ______00038                                                        226   1e-62
>> ______00053                                                        115   3e-29
>>
>> 1_0         472  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>> ______00038 883  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         532  aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>> ______00038 943  aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         591  attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         651  gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         711  acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>> ______00038 1000 ------------------------------------------------------------ 1001
>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>
>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>> ______00038 1000 -------------------------------- 1001
>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>   Database: 00038-00053.fasta
>>     Posted date:  Feb 25, 2010  4:47 PM
>>   Number of letters in database: 2001
>>   Number of sequences in database:  2
>>
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>> Gapped
>> Lambda     K      H
>>     1.37    0.711     1.31
>>
>>
>> Matrix: blastn matrix:1 -3
>> Gap Penalties: Existence: 0, Extension: 0
>> Number of Sequences: 2
>> Number of Hits to DB: 17
>> Number of extensions: 3
>> Number of successful extensions: 3
>> Number of sequences better than 10.0: 2
>> Number of HSP's gapped: 2
>> Number of HSP's successfully gapped: 2
>> Length of query: 802
>> Length of database: 2001
>> Length adjustment: 10
>> Effective length of query: 792
>> Effective length of database: 1981
>> Effective search space:  1568952
>> Effective search space used:  1568952
>> X1: 9 (17.8 bits)
>> X2: 20 (39.6 bits)
>> X3: 51 (101.1 bits)
>> S1: 9 (18.3 bits)
>> S2: 9 (18.3 bits)
>>
>

From cjfields at illinois.edu Mon Mar 1 19:30:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Mar 2010 18:30:43 -0600
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID:

Paolo,

You can get a Bio::SimpleAlign from the HSP object. The first code example
in this section in the HOWTO demonstrates this:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris

From jason at bioperl.org Mon Mar 1 20:51:02 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 01 Mar 2010 17:51:02 -0800
Subject: [Bioperl-l] Any module for chromosome region analysis ?
In-Reply-To:
References: <1267131590.4355.2.camel@epistle> <1267131697.4355.3.camel@epistle>
Message-ID: <4B8C6F06.5050905@bioperl.org>

Like the ensembl perl API?

Robert Bradbury wrote:
> I'm not sure if the species being dealt with are "common", but it would seem
> to me that a logical addition to bioperl would be an extension that took a
> genome location (or locations) and interfaced one into a browser of those
> regions in external databases (e.g. UCSC Genome Browser, Ensemble, etc.).
> The only cases where that wouldn't work is if one is dealing with novel
> species that aren't in the databases yet.
>
> Robert

From rmb32 at cornell.edu Tue Mar 2 01:21:31 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 01 Mar 2010 22:21:31 -0800
Subject: [Bioperl-l] call for project ideas - Google Summer of Code
Message-ID: <4B8CAE6B.4010807@cornell.edu>

Hi all,

Google's Summer of Code is coming round again, very soon now (mentoring
organization applications are due next week). We need project ideas for
prospective Summer of Code interns. There's a page on the BioPerl wiki,
please have a look and add your ideas for intern projects.

For more on Google Summer of Code, what it is and how it works, see
their FAQ at
http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs

One of the summer intern ideas I have on the page so far is to help with
the tough grunt work of breaking BioPerl into smaller, more easily
managed distributions. I'm sure you all can think of plenty more!

Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From paolo.pavan at gmail.com Tue Mar 2 09:37:59 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 15:37:59 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To:
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>

Hi Chris,
Thank you for your reply.
So I have to understand that, since the get_aln method returns the HSP
alignment, there is no way to retrieve the whole alignment as in the
example pasted, is there?
Basically I'm trying to use megablast as a kind of multiple local alignment
engine, and actually I'm not quite sure this is a good idea, but in my
particular case it could be suitable.
I mean that the pasted example reports only the portions of the sequences
that align, losing the portions that do not; I'm not sure I got the idea
across. What do you think about it? Can you give me your opinion?
If there isn't any module written yet, I can try to write a parser; could
it be of any interest?

Thank you,
Paolo

2010/3/2 Chris Fields :
> Paolo,
>
> You can get a Bio::SimpleAlign from the HSP object. The first code example
> in this section in the HOWTO demonstrates this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
> chris

From Zhang_tao at uestc.edu.cn Mon Mar 1 00:02:12 2010
From: Zhang_tao at uestc.edu.cn (Zhang_tao)
Date: Mon, 01 Mar 2010 13:02:12 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <467416916.06375@eyou.net>

Hi all,

When I use threads to get Genbank format file, show some error. It is
shown as:

"Can't call method "get_taxon" on unblessed reference at
/opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;

my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");

my $seq_out = Bio::SeqIO->new(-format => "genbank",
                              -file => ">dhn_all.gb");
my @seq;
my $number = @id;
my $max_threads = 6;

for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;
    if ($number - $thread_number > $max_threads){
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }else{
        my $else_number = $number % $max_threads;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }
    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}

foreach (@seq){
    $seq_out->write_seq($_);
}

How can I fix this error?
Thanks.

Zhang Tao

From lpritc at scri.ac.uk Mon Mar 1 06:32:10 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 01 Mar 2010 11:32:10 +0000
Subject: [Bioperl-l] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
Message-ID:

Hi,

I've tried going back through the mailing list, Googling the answer, and
reading the documentation and wiki to find a solution for this. I've either
missed it, or it's not there yet. Hopefully there's a simple solution, or an
option that I'm just not seeing. I'm sure other people must be using CHADO
for bacterial genomes, and I would be interested in hearing about best
practice for using CHADO/GBROWSE with these sequences (I've seen
http://gmod.org/wiki/Chado_for_prokaryotes - but there's not much in
there...).

I have a working CHADO(GMOD-1.0)/GBROWSE2/BioPerl 1.6.1 setup on CentOS 5.4,
and I'm trying to load some bacterial data. Specifically for this example,
I'm trying to get the GenBank sequences for E.coli S88: NC_011742 and
NC_011747 into CHADO. I've been following instructions from a number of
locations, including http://gmod.org/wiki/Artemis-Chado_Integration_Tutorial
and http://gmod.org/wiki/Chado_Tutorial, but there's an issue with these two
files, in that the NC_011742 (chromosome) and NC_011747 (plasmid) sequences
contain genes that have the same names (and several genes with the same name
in the same sequence!), and this appears to be a problem.

Here's what's going wrong:

I start off with the two GenBank files:

"""
[lpritc at localhost ~]$ ls -1 *.gbk
NC_011742.gbk
NC_011747.gbk
"""

And convert these to .gff3 using the BioPerl script (it doesn't seem to
matter whether I pass them with the wildcard, or convert separately, though
passing multiple sequences for conversion might be a good place to check for
unique IDs):

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s *.gbk
# Input: NC_011742.gbk
# working on region:NC_011742, Escherichia coli S88, 19-DEC-2008, Escherichia coli S88, complete genome.
# GFF3 saved to ./NC_011742.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# mRNA                  4696
# gene                  4898
# region                1
# pseudogene            151
# CDS                   4696
# RESIDUES(tr)          1442813
# RESIDUES              5032268
# processed_transcript  89
# rRNA                  22
# pseudogenic_region    151
# exon                  4899
# tRNA                  91
#
# Input: NC_011747.gbk
# working on region:NC_011747, Escherichia coli S88, 18-AUG-2009, Escherichia coli S88 plasmid pECOS88, complete sequence.
# GFF3 saved to ./NC_011747.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# mRNA                  4832
# gene                  5037
# region                2
# pseudogene            159
# CDS                   4832
# RESIDUES(tr)          1477756
# RESIDUES              5166121
# processed_transcript  92
# rRNA                  22
# pseudogenic_region    159
# exon                  5038
# tRNA                  91
#
"""

I can then use the gmod_bulk_load_gff3.pl script to load either file, but
only singly.
This appears to work, and the result is visible and seemingly correctly
navigable in GBROWSE (using NC_011747 as the first sequence here, but the
order is unimportant):

"""
[lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011747.gbk.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Dropping cds temp tables...
Creating cds temp tables...
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler"
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
Loading data into feature table ...
Loading data into featureloc table ...
Loading data into feature_relationship table ...
Loading data into featureprop table ...
Skipping feature_cvterm table since the load file is empty...
Skipping synonym table since the load file is empty...
Skipping feature_synonym table since the load file is empty...
Skipping dbxref table since the load file is empty...
Loading data into feature_dbxref table ...
Skipping analysisfeature table since the load file is empty...
Skipping cvterm table since the load file is empty...
Skipping db table since the load file is empty...
Skipping cv table since the load file is empty...
Skipping analysis table since the load file is empty...
Skipping organism table since the load file is empty...
Adding cvtermprop=MapReferenceType for 'region' ...
Loading sequences (if any) ...
Optimizing database (this may take a while) ...
(feature featureloc feature_relationship featureprop feature_cvterm synonym feature_synonym dbxref feature_dbxref analysisfeature cvterm db cv analysis organism ) Done.

While this script has made an effort to optimize the database, you
should probably also run VACUUM FULL ANALYZE on the database as well
"""

"""
chado=> SELECT feature_id, organism_id, name, uniquename FROM feature WHERE name='NC_011747';
 feature_id | organism_id |   name    | uniquename
------------+-------------+-----------+------------
     146917 |          99 | NC_011747 | NC_011747
"""

However, attempting to load in the second sequence throws an error (though
this might also be a good point to check for ID uniqueness with a database
check, and appropriate modification to the ID, if necessary - problems could
arise if we were trying to add genuine duplicates, though...):

"""
[lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011742.gbk.gff
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Dropping cds temp tables...
Creating cds temp tables...
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler" NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship" no parent yacC; you probably need to rerun the loader with the --recreate_cache option Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=chado;port=5432;host=localhost. """ This, of course, prevents the upload of the sequence and its annotations, as a whole. The script recommends that the --recreate_cache option should be used, but I am already using it. If the same process is run, reversing the order of the input files, the same error is reported, but for the gene with name 'int'. Both sequences contain genes with the names 'int' and 'yacC' (NC_011742 appears to contain four genes with the name 'int'): """ [lpritc at localhost ~]$ grep 'ID=yacC;' *.gbk.gff NC_011742.gbk.gff:NC_011742 GenBank gene 142755 143273 . - . ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131 NC_011747.gbk.gff:NC_011747 GenBank gene 85083 85931 . + . ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103 [lpritc at localhost ~]$ grep 'ID=int;' *.gbk.gff NC_011742.gbk.gff:NC_011742 GenBank gene 1182443 1183585 . - . ID=int;Dbxref=GeneID:7131611;gene=int;locus_tag=ECS88_1152 NC_011742.gbk.gff:NC_011742 GenBank pseudogene 1998684 1999646 . + . ID=int;Dbxref=GeneID:7128964;gene=int;locus_tag=ECS88_2031;pseudo=_no_value NC_011742.gbk.gff:NC_011742 GenBank gene 2829972 2830991 . + . ID=int;Dbxref=GeneID:7131911;gene=int;locus_tag=ECS88_2851 NC_011742.gbk.gff:NC_011742 GenBank gene 3220074 3221336 . + . 
ID=int;Dbxref=GeneID:7129893;gene=int;locus_tag=ECS88_3250 NC_011747.gbk.gff:NC_011747 GenBank gene 132 872 . + . ID=int;Dbxref=GeneID:7119360;gene=int;locus_tag=pECS88_0001 """ Commenting out either of these genes, and their child features, defers the error to another gene that has the same name in both sequences in each case. It seems that the problem might derive from attempting to associate each gene uniquely with its 'gene' tag in the GenBank file. There are several points in the process where it would be sensible to check for name collisions, so that the feature:uniquename column can be modified to reflect them, so I looked for command-line options to each script, but didn't see one that could help. Examining the manual for gmod_bulk_load_gff3.pl suggests that this might be the problem (though I might be misunderstanding it): """ Column 9 (group) Here is where the magic happens. Assigning feature.name, feature.uniquename The values of feature.name and feature.uniquename are assigned according to these simple rules: If there is an ID tag, that is used as feature.uniquename otherwise, it is assigned a uniquename that is equal to 'auto' concatenated with the feature_id. (Note that this is a potential problem as there is no check to make sure that it is appropriately unique.) If there is a Name tag, it's value is set to feature.name; otherwise it is null. Note that these rules are much more simple than that those that Bio::DB::GFF uses, and may need to be revisited. """ I suspect that, as the bp_genbank2gff3.pl script converts gene names (which are not guaranteed to be unique) to ID tags, the problem recognised in the manual is cropping up at this point. Luckily, the GenBank files come with locus_tag tags, which should be unique for each gene (see http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html#locus_tag).
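As a minimal sketch of the kind of uniqueness check discussed above (an illustration only: it is not part of bp_genbank2gff3.pl or gmod_bulk_load_gff3.pl, and the subroutine name count_attribute_values and the script name check_gff_ids.pl are made up for this example), the following Perl collects every value of one chosen column-9 attribute on gene and pseudogene lines, assuming ordinary tab-separated GFF3 input, and reports the values that occur more than once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: record where each value of one column-9 attribute
# (e.g. ID or locus_tag) occurs on gene/pseudogene lines of GFF3 files.
sub count_attribute_values {
    my ($attribute, @files) = @_;
    my %seen;    # attribute value => list of "file:seqid:start-end"
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            last if $line =~ /^##FASTA/;    # sequence section follows
            next if $line =~ /^#/ or $line !~ /\S/;
            chomp $line;
            my @col = split /\t/, $line;
            next unless @col == 9;
            next unless $col[2] eq 'gene' or $col[2] eq 'pseudogene';
            for my $pair (split /;/, $col[8]) {
                my ($key, $value) = split /=/, $pair, 2;
                next unless defined $value and $key eq $attribute;
                push @{ $seen{$value} }, "$file:$col[0]:$col[3]-$col[4]";
            }
        }
        close $fh;
    }
    return \%seen;
}

# Assumed usage:
#   perl check_gff_ids.pl ID NC_011742.gbk.gff NC_011747.gbk.gff
if (@ARGV >= 2) {
    my ($attribute, @files) = @ARGV;
    my $seen = count_attribute_values($attribute, @files);
    for my $value (sort keys %$seen) {
        my @where = @{ $seen->{$value} };
        next if @where < 2;    # only report collisions
        print "$attribute=$value occurs ", scalar(@where), " times:\n";
        print "  $_\n" for @where;
    }
}
```

Run with ID as the first argument, this would be expected to list collisions such as 'int' and 'yacC' for the two files above; run with locus_tag, it should list nothing if the locus tags really are unique.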
For bacteria, at least, using the locus_tag values might be a more robust option for bp_genbank2gff3.pl; this already appears to have been recognised in the script comments: """ #?? should gene_name from /locus_tag,/gene,/product,/transposon=xxx # be converted to or added as Name=xxx (if not ID= or as well) ## problematic: convert_to_name ($feature); # drops /locus_tag,/gene, tags """ I can get round the upload problem somewhat suckily by changing the priority given to 'locus_tag' and 'gene' tags for generating the .gff ID tag in the bp_genbank2gff3.pl script: """ [lpritc at localhost ~]$ diff bp_genbank2gff3.pl /usr/bin/bp_genbank2gff3.pl 976,977c976,977 < if ($g->has_tag('locus_tag')) { < ($gene_id) = $g->get_tag_values('locus_tag'); --- > if ($g->has_tag('gene')) { > ($gene_id) = $g->get_tag_values('gene'); 979,980c979,980 < elsif ($g->has_tag('gene')) { < ($gene_id) = $g->get_tag_values('gene'); --- > elsif ($g->has_tag('locus_tag')) { > ($gene_id) = $g->get_tag_values('locus_tag'); """ But this isn't a complete solution, as GBROWSE searches by gene name don't work after making this change, and presumably some further configuration or hacking about is required to sort that out (advice welcome). So, what are other people doing to overcome this issue (if you've seen it), and would a change to the bp_genbank2gff3.pl script along the lines I mention be useful to others? Cheers, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.
DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From janine.arloth at googlemail.com Mon Mar 1 11:25:09 2010 From: janine.arloth at googlemail.com (Janine Arloth) Date: Mon, 1 Mar 2010 17:25:09 +0100 Subject: [Bioperl-l] StandAloneBlastPlus Message-ID: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com> Hello, I am running blast+ and want to create a blastdb depending on a checkbox. That is, when mydb is too old I want to rebuild the blastdb files and create a ''new'' db. When the latest version of my files is OK, blast should run with the existing db. With this code, a new db is never built: it is created once, and after that it is never re-created. if($checkbox eq 'yes'){ $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -prog_dir => "/usr/local/ncbi/blast/bin", -db_name => 'mydb', -db_data => 'xxx.fa', -create => 1); } else{ $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'mydb'); } Thanks for helping From jensen at fortinbras.us Mon Mar 1 22:58:09 2010 From: jensen at fortinbras.us (Mark A.
Jensen) Date: Mon, 1 Mar 2010 22:58:09 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com> References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com> Message-ID: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife> Hi Janine-- You'll need to get the latest version of Bio/Tools/Run/StandAloneBlastPlus.pm (rev. 16878). Then the -overwrite parameter will actually work, and you can write if($checkbox eq 'yes'){ $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -prog_dir => "/usr/local/ncbi/blast/bin", -db_name => 'mydb', -db_data => 'xxx.fa', -overwrite => 1); } else{ $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'mydb'); } MAJ From szy0931 at gmail.com Tue Mar 2 01:08:10 2010 From: szy0931 at gmail.com (Zhenyu Shen) Date: Mon, 1 Mar 2010 22:08:10 -0800 (PST) Subject: [Bioperl-l] how to convert a txt file to a bed file? Message-ID: I want to convert a txt file to a bed file and then load the bed file to USCS genome browser. But how to convert the txt file to a bed file with perl?
thanks From joaofadista at gmail.com Tue Mar 2 04:10:03 2010 From: joaofadista at gmail.com (fadista) Date: Tue, 2 Mar 2010 01:10:03 -0800 (PST) Subject: [Bioperl-l] Next-gen modules Message-ID: Hi, I would like to know if there are any next-gen sequencing modules in BioPerl. Specifically, I would like to know if there is a Perl script to trim poor-quality sequence reads from the Illumina/Solexa platform. Best regards, Fadista From maj at fortinbras.us Tue Mar 2 09:51:12 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 2 Mar 2010 09:51:12 -0500 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com><56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com><56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> Message-ID: <18C0182252934619AD12E49243BE3C14@NewLife> This might be a good method to have for Bio::Search::Tiling-- you want to stitch together all the hsps and have the concatenated alignment returned as a Bio::SimpleAlign, correct? Tiling would create the right set of hsps from which to generate the composite alignment. I can try to get something working, but it may take a while- MAJ ----- Original Message ----- From: "Paolo Pavan" To: "Chris Fields" Cc: Sent: Tuesday, March 02, 2010 9:37 AM Subject: Re: [Bioperl-l] Alignment from blast report Hi Chris, Thank you for your reply. So I have to understand that since the get_aln method returns the HSP alignment, there is no way to retrieve the whole alignment as in the example pasted, isn't it? Basically I'm trying to use megablast as kind of multiple local alignment engine and actually I'm not pretty sure this is a good idea but in my particular case could be suitable.
I mean that the example below reports only the portions of the sequences that align loosing the portions that does not, I'm not sure I gave the idea. What do you think about? Can you give me your opinion? If there isn't any module written yet, I can try to write a parser, it could be of any interest? Thank you, Paolo 2010/3/2 Chris Fields : > Paolo, > > You can get a Bio::SimpleAlign from the HSP object. The first code example in > this section in the HOWTO demonstrates this: > > http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods > > chris > > On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: > >> Dear all, >> Sorry for pushing up my post but, please does anyone have an hint for me? >> Maybe have I to send attached the report to the mailing list? I don't >> know attachment policies of the list, if it is allowed and is needed I >> can do that. >> >> Thank you, >> Paolo >> >> 2010/2/26 Paolo Pavan : >>> Sorry, >>> Maybe I forgot to add this is the megablast -m 5 output. >>> >>> Thank you again, >>> Paolo >>> >>> 2010/2/26 Paolo Pavan : >>>> Hi all, >>>> I have just a brief question: I've got some megablast reports such the >>>> one I've pasted below. >>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>> entire alignment represented as a Bio::SimpleAlign object or >>>> Bio::Align::AlignI implementing one? >>>> >>>> Thank you all, >>>> Paolo >>>> >>>> >>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>> >>>> >>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>> (2000), >>>> "A greedy algorithm for aligning DNA sequences", >>>> J Comput Biol 2000; 7(1-2):203-14. 
>>>> >>>> Database: 00038-00053.fasta >>>> 2 sequences; 2001 total letters >>>> >>>> Searching..................................................done >>>> >>>> Query= 00038-00053 >>>> (802 letters) >>>> >>>> >>>> >>>> Score E >>>> Sequences producing significant alignments: (bits) Value >>>> >>>> ______00038 >>>> 226 1e-62 >>>> ______00053 >>>> 115 3e-29 >>>> >>>> 1_0 472 >>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>> ______00038 883 >>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 532 >>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>> ______00038 943 >>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 591 >>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 651 >>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 711 >>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg >>>> 34 >>>> >>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>> ______00038 1000 -------------------------------- 1001 >>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>> Database: 00038-00053.fasta >>>> Posted date: Feb 25, 2010 4:47 PM >>>> Number of letters in database: 2001 >>>> Number of sequences in database: 2 >>>> >>>> Lambda 
K H >>>> 1.37 0.711 1.31 >>>> >>>> Gapped >>>> Lambda K H >>>> 1.37 0.711 1.31 >>>> >>>> >>>> Matrix: blastn matrix:1 -3 >>>> Gap Penalties: Existence: 0, Extension: 0 >>>> Number of Sequences: 2 >>>> Number of Hits to DB: 17 >>>> Number of extensions: 3 >>>> Number of successful extensions: 3 >>>> Number of sequences better than 10.0: 2 >>>> Number of HSP's gapped: 2 >>>> Number of HSP's successfully gapped: 2 >>>> Length of query: 802 >>>> Length of database: 2001 >>>> Length adjustment: 10 >>>> Effective length of query: 792 >>>> Effective length of database: 1981 >>>> Effective search space: 1568952 >>>> Effective search space used: 1568952 >>>> X1: 9 (17.8 bits) >>>> X2: 20 (39.6 bits) >>>> X3: 51 (101.1 bits) >>>> S1: 9 (18.3 bits) >>>> S2: 9 (18.3 bits) >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Mar 2 10:12:02 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 2 Mar 2010 10:12:02 -0500 Subject: [Bioperl-l] Installing bioperl on windows In-Reply-To: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com> References: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com> Message-ID: The steps on the wiki are in fact quite detailed. What we need then is details from you--the commands you ran and your error messages. Thanks. 
----- Original Message ----- From: "disha" To: Sent: Friday, February 26, 2010 8:43 AM Subject: [Bioperl-l] Installing bioperl on windows > Please tell me the procedure (detailed ) for installing bioperl on > windows vista.I tried the steps mentioned on the site but failed at > the initial steps From scott at scottcain.net Tue Mar 2 11:11:13 2010 From: scott at scottcain.net (Scott Cain) Date: Tue, 2 Mar 2010 11:11:13 -0500 Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts In-Reply-To: References: Message-ID: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com> Hi Leighton, Wow, that is a lot of text; I really appreciate your thoroughness in describing the problem. I have a few suggestions to get the ball rolling. First, I am working on the 1.1 release of gmod/chado, and it may fix some of the problems you are describing. Certainly, ID collisions between GFF files should not be a problem (I didn't think they were in the 1.0 release, but that was a long time ago). Please try a checkout of the schema trunk in the gmod svn: http://gmod.org/wiki/SVN Another thing you may want to look at is that just last week, a developer at Texas A&M, Nathan Liles, contributed code to the bioperl-live trunk for the genbank2gff3.pl script that will do a much better job of converting bacterial genbank files to GFF3; perhaps that will help too. Working with a svn checkout of bioperl-live shouldn't be too scary either; the pieces you are interested in (that work with Chado and GBrowse) are quite stable. Let us know how it goes, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From sdavis2 at mail.nih.gov Tue Mar 2 11:33:38 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 2 Mar 2010 11:33:38 -0500 Subject: [Bioperl-l] how to convert a txt file to a bed file? In-Reply-To: References: Message-ID: <264855a01003020833v3e15dcb7vcdd876ce80468740@mail.gmail.com> On Tue, Mar 2, 2010 at 1:08 AM, Zhenyu Shen wrote: > I want to convert a txt file to a bed file and then load the bed file > to USCS genome browser. But how to convert the txt file to a bed file > with perl? Hi, Zhenyu. A bed file IS a text file, with the format described here: http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED You just need to make your text file conform to that format and you are set to go. Sean From paolo.pavan at gmail.com Tue Mar 2 10:17:35 2010 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Tue, 2 Mar 2010 16:17:35 +0100 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <18C0182252934619AD12E49243BE3C14@NewLife> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> <18C0182252934619AD12E49243BE3C14@NewLife> Message-ID: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> I think you got the sense, thank you.
Of course hsps from different hits will be reflected in different elements aligned. I've attached the example pasted (unix text) because is more readable, hoping will not be held by the mailing server :-) Thank you, Paolo 2010/3/2 Mark A. Jensen : > This might a good method to have for Bio::Search::Tiling-- > you want to stitch together all the hsps and have the > concatenated alignment returned as a Bio::SimpleAlign, > correct? Tiling would create the right set of hsps from > which to generate the composite alignment. I can > try to get something working, but it may take a while- > MAJ > ----- Original Message ----- From: "Paolo Pavan" > To: "Chris Fields" > Cc: > Sent: Tuesday, March 02, 2010 9:37 AM > Subject: Re: [Bioperl-l] Alignment from blast report > > > Hi Chris, > Thank you for your reply. So I have to understand that since the > get_aln method returns the HSP alignment, there is no way to retrieve > the whole alignment as in the example pasted, isn't it? > Basically I'm trying to use megablast as kind of multiple local > alignment engine and actually I'm not pretty sure this is a good idea > but in my particular case could be suitable. I mean that the example > below reports only the portions of the sequences that align loosing > the portions that does not, I'm not sure I gave the idea. What do you > think about? Can you give me your opinion? > If there isn't any module written yet, I can try to write a parser, it > could be of any interest? > > Thank you, > Paolo > > 2010/3/2 Chris Fields : >> >> Paolo, >> >> You can get a Bio::SimpleAlign from the HSP object. The first code example >> in this section in the HOWTO demonstrates this: >> >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >> >> chris >> >> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >> >>> Dear all, >>> Sorry for pushing up my post but, please does anyone have an hint for me? >>> Maybe have I to send attached the report to the mailing list? 
I don't >>> know attachment policies of the list, if it is allowed and is needed I >>> can do that. >>> >>> Thank you, >>> Paolo >>> >>> 2010/2/26 Paolo Pavan : >>>> >>>> Sorry, >>>> Maybe I forgot to add this is the megablast -m 5 output. >>>> >>>> Thank you again, >>>> Paolo >>>> >>>> 2010/2/26 Paolo Pavan : >>>>> >>>>> Hi all, >>>>> I have just a brief question: I've got some megablast reports such the >>>>> one I've pasted below. >>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>> Bio::Align::AlignI implementing one? >>>>> >>>>> Thank you all, >>>>> Paolo >>>>> >>>>> >>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>> >>>>> >>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>> (2000), >>>>> "A greedy algorithm for aligning DNA sequences", >>>>> J Comput Biol 2000; 7(1-2):203-14. >>>>> >>>>> Database: 00038-00053.fasta >>>>> 2 sequences; 2001 total letters >>>>> >>>>> Searching..................................................done >>>>> >>>>> Query= 00038-00053 >>>>> (802 letters) >>>>> >>>>> >>>>> >>>>> Score E >>>>> Sequences producing significant alignments: (bits) Value >>>>> >>>>> ______00038 >>>>> 226 1e-62 >>>>> ______00053 >>>>> 115 3e-29 >>>>> >>>>> 1_0 472 >>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>> ______00038 883 >>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 532 >>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>> ______00038 943 >>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 591 >>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>> 
______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 651 >>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>> ______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 711 >>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>> ______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 1 >>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>> >>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>> ______00038 1000 -------------------------------- 1001 >>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>>> Database: 00038-00053.fasta >>>>> Posted date: Feb 25, 2010 4:47 PM >>>>> Number of letters in database: 2001 >>>>> Number of sequences in database: 2 >>>>> >>>>> Lambda K H >>>>> 1.37 0.711 1.31 >>>>> >>>>> Gapped >>>>> Lambda K H >>>>> 1.37 0.711 1.31 >>>>> >>>>> >>>>> Matrix: blastn matrix:1 -3 >>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>> Number of Sequences: 2 >>>>> Number of Hits to DB: 17 >>>>> Number of extensions: 3 >>>>> Number of successful extensions: 3 >>>>> Number of sequences better than 10.0: 2 >>>>> Number of HSP's gapped: 2 >>>>> Number of HSP's successfully gapped: 2 >>>>> Length of query: 802 >>>>> Length of database: 2001 >>>>> Length adjustment: 10 >>>>> Effective length of query: 792 >>>>> Effective length of database: 1981 >>>>> Effective search space: 1568952 >>>>> Effective search space used: 1568952 >>>>> X1: 9 (17.8 bits) >>>>> X2: 20 (39.6 bits) >>>>> X3: 51 (101.1 bits) >>>>> S1: 9 (18.3 bits) >>>>> S2: 9 (18.3 bits) >>>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at 
lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.megaout
Type: application/octet-stream
Size: 2918 bytes
Desc: not available
URL: 

From Russell.Smithies at agresearch.co.nz Tue Mar 2 14:35:19 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 08:35:19 +1300
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com> <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E660@exchsth.agresearch.co.nz>

If you want to continue using your current version, you could try deleting your old blast db first.

if($checkbox eq 'yes'){
    # unlink does not expand wildcards by itself, so glob the db files first
    unlink glob("mydb.*"); # or maybe `rm -f mydb.*`
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name  => 'mydb',
        -db_data  => 'xxx.fa',
        -create   => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Tuesday, 2 March 2010 4:58 p.m.
> To: Janine Arloth
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
> Hi Janine--
> You'll need to get the latest version of
> Bio/Tools/Run/StandAloneBlastPlus.pm
> (rev. 16878).
> Then the -overwrite parameter will actually work, and you can write
>
> if($checkbox eq 'yes'){
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -prog_dir  => "/usr/local/ncbi/blast/bin",
>         -db_name   => 'mydb',
>         -db_data   => 'xxx.fa',
>         -overwrite => 1);
> }
> else{
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -db_name => 'mydb');
> }
>
> MAJ
>
> ----- Original Message -----
> From: "Janine Arloth" 
> To: 
> Cc: 
> Sent: Monday, March 01, 2010 11:25 AM
> Subject: StandAloneBlastPlus
>
>
> Hello,
>
> I am running blast+ and want to create a blastdb, depending on a checkbox.
> That means when mydb is too old, I want to rebuild the blastdb files and
> create a ''new'' db.
> When the latest version of my files is ok, then blast should run with the
> existing db.
> Using this code, a new db is never built: it is created once, and then it
> does not create a new one.
>
>
> if($checkbox eq 'yes'){
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -prog_dir => "/usr/local/ncbi/blast/bin",
>         -db_name  => 'mydb',
>         -db_data  => 'xxx.fa',
>         -create   => 1);
> }
> else{
>     $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>         -db_name => 'mydb');
> }
>
> Thanks for helping
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From armendarez77 at hotmail.com Tue Mar 2 16:06:17 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Tue, 2 Mar 2010 13:06:17 -0800
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
Message-ID: 

Hello,

I am writing a script to remotely access annotation files and parse
information using Bio::DB::RefSeq and Bio::DB::GenBank.  I was testing it
with random RefSeq accession numbers (NC_######) when something odd
happened.  When I used the accession number 'NC_007092', the script seemed
to freeze.  After some time, 'Out of Memory' was printed to the terminal.

When I investigated the annotation file associated with NC_007092, a
MapViewer page opened.  It turns out that NC_007092 is a genome shotgun
sequence, but it does not start with 'NZ' as I thought all shotgun
sequences did.

Is this a random event that I don't have to worry much about or is there a
way to pre-screen accession numbers to ensure they are associated with
complete genome RefSeq files?

I've included my script in case there is something I missed that could have prevented this.
Thank you,

Veronica


_________________

use strict;
use Bio::Perl;
use Bio::DB::RefSeq;
use Bio::DB::GenBank;
use Getopt::Long;
use IO::Handle;

my $accessionNumber;

GetOptions("accessionNumber=s"=>\$accessionNumber);
unless($accessionNumber){
    print<<"OPTIONS";
options for $0
accessionNumber    -a     accession number
OPTIONS
    die;
}

my $description = annotation_info($accessionNumber);

print "$description\n";



sub annotation_info{

    my $seqObj;

    my $accNum = shift(@_);

    my $rs = Bio::DB::RefSeq->new();
    my $gb = Bio::DB::GenBank->new();


    if($accNum =~ /\w\w_\d{6}/){  #RefSeq annotations include an underscore in their accession number
        $seqObj = $rs->get_Seq_by_id($accNum);
    }
    elsif($accNum !~ /_/){ #GenBank annotation
        $seqObj = $gb->get_Seq_by_id($accNum);
    }

    # neither pattern matched, or the lookup returned nothing
    die "No sequence retrieved for $accNum\n" unless $seqObj;

    return $seqObj->desc();
}


From maj at fortinbras.us Tue Mar 2 15:58:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 15:58:59 -0500
Subject: [Bioperl-l] bioperl job
Message-ID: 

Hi All,

I have a contact looking for an individual with Bioperl experience who could
do contractual on-site work in the Cambridge MA area.
**I have no business interest in this whatever, just doing a friend a favor.**
Let me know directly (not to the list) if you have interest.
thanks -- MAJ

From Russell.Smithies at agresearch.co.nz Tue Mar 2 18:08:51 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 12:08:51 +1300
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
In-Reply-To: 
References: 
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz>

NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
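
[Editor's sketch] One cheap pre-screen is to look at the two-letter RefSeq prefix before fetching anything. The snippet below is a rough illustration only: `refseq_class` is a made-up helper, and the prefix table is an assumed, incomplete subset that should be checked against the NCBI RefSeq accession key. The prefix says what kind of molecule to expect, not how big the record is, so it complements a size check rather than replacing it.

```perl
use strict;
use warnings;

# Assumed, partial prefix table; verify against the NCBI RefSeq key
# before relying on it.
my %refseq_prefix = (
    NC => 'complete genomic molecule',
    NG => 'genomic region',
    NT => 'genomic contig or scaffold',
    NW => 'genomic contig or scaffold (WGS)',
    NZ => 'unfinished / whole genome shotgun genomic',
    NM => 'mRNA',
    NR => 'non-coding RNA',
    NP => 'protein',
);

# Hypothetical helper: returns the molecule class for a RefSeq-style
# accession, or undef if the string has no prefix we know about.
sub refseq_class {
    my ($acc) = @_;
    return unless defined $acc
        && $acc =~ /^([A-Z]{2})_[A-Z0-9]+(?:\.\d+)?$/;
    return $refseq_prefix{$1};
}

# NC_007092 and AK287649 come from this list; NM_000546.5 is just an example.
for my $acc (qw(NC_007092 NM_000546.5 AK287649)) {
    my $class = refseq_class($acc);
    printf "%-12s %s\n", $acc,
        defined $class ? $class : 'no known RefSeq prefix';
}
```

A script could refuse, or ask for confirmation, when the class is a genomic one and no size information has been looked up yet.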
Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html

Also, it may help to check the docsum first to see how big the file is going to be?
(the full Genbank file for this example is only 6MB in size)

===================
use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' );

my ($id) = $factory->get_ids;

# get a summary
$factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id);
my $ds = $factory->next_DocSum;
print "ID: $id\n";
# flattened mode
while (my $item = $ds->next_Item('flattened')) {
    # not all Items have content, so need to check...
    printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content;
}
print "\n";


# download the full genbank file
$factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                    -db      => 'nucleotide',
                                    -id      => $id,
                                    -rettype => 'gbwithparts');
$factory->get_Response(-file => "$id.gb");

================

Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> Sent: Wednesday, 3 March 2010 10:06 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
>
>
> Hello,
>
> I am writing a script to remotely access annotation files and parse
> information using Bio::DB::RefSeq and Bio::DB::Genbank.  I was testing it
> with random RefSeq accession numbers (NC_######) when something odd
> happened.  When I used the accession number 'NC_007092', the script seemed
> to freeze.  After some time, 'Out of Memory' was printed to the terminal.

From armendarez77 at hotmail.com Tue Mar 2 18:16:03 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Tue, 2 Mar 2010 15:16:03 -0800
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz>
References: , <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz>
Message-ID: 

I see.  I work mostly in the bacteria world so mammalian chromosomes shouldn't be an issue.  I just randomly picked it to test my script when it came up after I did a simple search for Bacillus in the Genome database.

I'll look into docSum to help prevent unexpected large files from interrupting my script.

Thank you.

Veronica

> From: Russell.Smithies at agresearch.co.nz
> To: armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org
> Date: Wed, 3 Mar 2010 12:08:51 +1300
> Subject: Re: [Bioperl-l] Bio::DB::RefSeq and NC_007092
>
> NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
> Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html
>
> Also, it may help to check the docsum first to see how big the file is going to be?
> (the full Genbank file for this example is only 6MB in size) > > =================== > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' ); > > my ($id) = $factory->get_ids; > > # get a summary > $factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id); > my $ds = $factory->next_DocSum; > print "ID: $id\n"; > # flattened mode > while (my $item = $ds->next_Item('flattened')) { > # not all Items have content, so need to check... > printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content; > } > print "\n"; > > > # download the full genbank file > $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'nucleotide', > -id => $id, > -rettype => 'gbwithparts'); > $factory->get_Response(-file => "$id.gb"); > > ================ > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com > > Sent: Wednesday, 3 March 2010 10:06 a.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > > > > > Hello, > > > > I am writing a script to remotely access annotation files and parse > > information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it > > with random RefSeq accession numbers (NC_######) when something odd > > happened. When I used the accession number 'NC_007092', the script seemed > > to freeze. After some time, 'Out of Memory' was printed to the terminal. > > > > When I investigated the annotation file associated with NC_007092, a > > MapViewer page opened. 
It turns out that NC_007092 is a genome shotgun > > sequence, but it does not start with 'NZ' as I though all shotgun > > sequences did. > > > > Is this a random event that I don't have to worry much about or is there a > > way to pre-screen accession numbers to ensure they are associated with > > complete genome RefSeq files? > > > > I've included my script in case there is something I missed that could > > have prevented this. > > > > Thank you, > > > > Veronica > > > > > > _________________ > > > > use strict; > > use Bio::Perl; > > use Getopt::Long; > > use IO::Handle; > > > > my $accessionNumber; > > > > GetOptions("accessionNumber=s"=>\$accessionNumber); > > unless($accessionNumber){ > > print<<"OPTIONS"; > > options for $0 > > accessionNumber -a accession number > > OPTIONS > > die; > > } > > > > my $description = annotation_info($accessionNumber); > > > > print "$description\n"; > > > > > > > > sub annotation_info{ > > > > my $seqObj; > > > > my $accNum = shift(@_); > > > > my $rs = Bio::DB::RefSeq->new(); > > my $gb = Bio::DB::GenBank->new(); > > > > > > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore > > in their accession number > > > > $seqObj = $rs->get_Seq_by_id($accNum); > > } > > elsif($accNum !~ /_/){ #GenBank annotation > > $seqObj = $gb->get_Seq_by_id($accNum); > > } > > > > return $seqObj->desc(); > > } > > > > > > _________________________________________________________________ > > Hotmail: Trusted email with Microsoft's powerful SPAM protection. 

From csaba.ortutay at uta.fi Thu Mar 4 04:57:00 2010
From: csaba.ortutay at uta.fi (Csaba Ortutay)
Date: Thu, 4 Mar 2010 11:57:00 +0200
Subject: [Bioperl-l] Bio::DB::CUTG problem
Message-ID: <201003041157.01013.csaba.ortutay@uta.fi>

Hello,

We would like to use the Bio::DB::CUTG module to get codon usage data for a
large number of genomes.

We have noticed that the module cannot find certain organisms which are
otherwise in the database. It happens when the name contains some
non-alphabetic characters.

A few examples:

Streptococcus agalactiae 2603V/R
Shigella flexneri 5 str.
8401

I have located the corresponding part in the CUTG.pm code, and I would suggest
a change:

222c222
<     my $nameparts =  join "+", $self->sp =~ /(\w+)/g;
---
>     my $nameparts =  join "+", $self->sp =~ /(\S+)/g;


With this I can now access the wanted tables.

Best regards,
Csaba

--
Csaba Ortutay PhD
Docent of Bioinformatics
IMT Bioinformatics
University of Tampere
Finland

From maj at fortinbras.us Thu Mar 4 08:10:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 4 Mar 2010 08:10:06 -0500
Subject: [Bioperl-l] Bio::DB::CUTG problem
In-Reply-To: <201003041157.01013.csaba.ortutay@uta.fi>
References: <201003041157.01013.csaba.ortutay@uta.fi>
Message-ID: 

Thanks, Csaba - change made and committed at r16898
MAJA
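
[Editor's sketch] For anyone wondering why the one-character change in the diff above matters, here is a minimal illustration in plain Perl with no BioPerl dependency. The helper names `name_query_word` and `name_query_nonspace` are invented for this example and are not part of Bio::DB::CUTG; they only reproduce the two `join` expressions from the diff.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Bio::DB::CUTG): each applies one of the
# two regexes from the diff to a species name and joins the pieces with "+".
sub name_query_word {
    my ($sp) = @_;
    return join "+", $sp =~ /(\w+)/g;    # runs of word characters only
}

sub name_query_nonspace {
    my ($sp) = @_;
    return join "+", $sp =~ /(\S+)/g;    # runs of anything except whitespace
}

# The two organism names reported in this thread.
for my $sp ('Streptococcus agalactiae 2603V/R',
            'Shigella flexneri 5 str. 8401') {
    printf "%-34s \\w+ => %s\n", $sp, name_query_word($sp);
    printf "%-34s \\S+ => %s\n", $sp, name_query_nonspace($sp);
}
```

With `\w+`, "2603V/R" is split at the slash and the dot after "str" is dropped, so the joined string no longer matches the organism name as written; with `\S+` the name is only split on whitespace. That is presumably why the lookup failed for these organisms.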

From jason at bioperl.org Thu Mar 4 09:40:18 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 04 Mar 2010 14:40:18 +0000
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com>
Message-ID: <4B8FC652.2010607@bioperl.org>

Palani -
This should be directed to the mailing list.

-------- Original Message --------
From: PalaniKannan K
Subject: Enquiry about Remoteblast.pm
Date: Thu, 4 Mar 2010 10:23:45 +0530

I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM.
I am accessing them through the Remoteblast.pm script available through CPAN.

When I submit my query, it keeps waiting for a long time.
Ex. (waiting .....................)

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html

This is the reference script I am using through the Remoteblast perl module.

It worked up to 02/03/2010. Now it is not working.

We have developed 3 applications using this module. The same error occurs in
all 3 applications, so I am confident that our scripts are not the problem.

Kindly help me in this regard.

--
With Regards,
palani kannan. k

From maj at fortinbras.us Thu Mar 4 09:50:54 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Thu, 4 Mar 2010 09:50:54 -0500
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com><56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com><56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com><56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com><18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>
Message-ID: <2FB5C317605B48269256ABFABBED2239@NewLife>

Paolo -- Ok, there's now (r16900) an *experimental* method in
Bio::Search::Tiling::MapTiling called get_tiled_alns(). POD is below.
Try it out and let me know--
cheers, MAJ

=head1 TILED ALIGNMENTS

The experimental method C<get_tiled_alns()> will use a tiling
to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
objects:

 @alns = $tiling->get_tiled_alns($type, $context);

Each alignment contains two sequences with ids 'query' and 'subject',
and consists of a concatenation of tiling HSPs which overlap or are
directly adjacent. The alignments are returned in C<$type> sequence
order. When HSPs overlap, the alignment sequence is taken from the HSP
which comes first in the coverage map array.

The sequences in each alignment contain features (even though they are
L<Bio::LocatableSeq> objects) which map the original query/subject
coordinates to the new alignment sequence coordinates.
You can determine the original BLAST fragments this way:

 $aln = ($tiling->get_tiled_alns)[0];
 $qseq = $aln->get_seq_by_id('query');
 $hseq = $aln->get_seq_by_id('subject');
 foreach my $feat ($qseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('query_start'))[0];
    $org_end = ($feat->get_tag_values('query_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }
 foreach my $feat ($hseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('subject_start'))[0];
    $org_end = ($feat->get_tag_values('subject_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }


----- Original Message -----
From: "Paolo Pavan" 
To: "Mark A. Jensen" 
Cc: "Chris Fields" ; 
Sent: Tuesday, March 02, 2010 10:17 AM
Subject: Re: [Bioperl-l] Alignment from blast report


>I think you got the sense, thank you. Of course hsps from different
> hits will be reflected in different elements aligned. I've attached
> the example pasted (unix text) because is more readable, hoping will
> not be held by the mailing server :-)
>
> Thank you,
> Paolo
>
> 2010/3/2 Mark A. Jensen :
>> This might a good method to have for Bio::Search::Tiling--
>> you want to stitch together all the hsps and have the
>> concatenated alignment returned as a Bio::SimpleAlign,
>> correct? Tiling would create the right set of hsps from
>> which to generate the composite alignment. I can
>> try to get something working, but it may take a while-
>> MAJ
>> ----- Original Message ----- From: "Paolo Pavan" 
>> To: "Chris Fields" 
>> Cc: 
>> Sent: Tuesday, March 02, 2010 9:37 AM
>> Subject: Re: [Bioperl-l] Alignment from blast report
>>
>>
>> Hi Chris,
>> Thank you for your reply. So I have to understand that since the
>> get_aln method returns the HSP alignment, there is no way to retrieve
>> the whole alignment as in the example pasted, isn't it?

From janine.arloth at googlemail.com Wed Mar 3 04:44:18 2010 From: janine.arloth at
googlemail.com (Janine Arloth)
Date: Wed, 3 Mar 2010 10:44:18 +0100
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com>

Hello,

Which attributes or results can I get from hits?

$hit = $result->next_hit;
print $hit->name;

Is there more than the name? Is there a description somewhere where I can look this up?

Regards

From David.Messina at sbc.su.se Thu Mar 4 10:27:46 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 4 Mar 2010 16:27:46 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <4B8FC652.2010607@bioperl.org>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org>
Message-ID: <31C89CCE-25B8-492A-924D-A7401D415584@sbc.su.se>

Hi Palani,

You're using a very old version of BioPerl, 1.4:

> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html

The current release version is 1.6.1.

Also, NCBI is changing (or may have already changed) their remote access system to require an email address. The very latest builds of BioPerl should now be compatible with this change. Get it here:

http://www.bioperl.org/DIST/nightly_builds/

or directly via Subversion; instructions here:

http://www.bioperl.org/wiki/Getting_BioPerl

Dave

From cjfields at illinois.edu Thu Mar 4 10:30:54 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 04 Mar 2010 09:30:54 -0600
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <4B8FC652.2010607@bioperl.org>
References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org>
Message-ID: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu>

Palani,

We have a few regression tests that should have caught this but aren't quite set up correctly (they silently pass if no report is returned).
This may be something on NCBI's end though; remote databases and analyses are notoriously brittle, hence the need to skip these tests by default when installing.

One final note: hopefully you aren't using BioPerl 1.4 (as indicated by the docs). We're now on the 1.6 release series, currently at v. 1.6.1; 1.4 isn't supported anymore.

chris

On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote:
> Palani -
> This should be directed to the mailing list.
>
> -------- Original Message --------
> From: PalaniKannan K
> Subject: Enquiry about Remoteblast.pm
> Date: Thu, 4 Mar 2010 10:23:45 +0530
>
>
> I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing through
> Remoteblast.pm script available through CPAN. When i am submitting my
> query... it shows waiting for much time. Ex. (waiting .....................)
>
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html
>
> This is the reference script i am using through Remoteblast perl module.
>
> It worked upto last 02/03/2010. Now it is not working
>
> We had developed 3 applications using this module. The same error comes in 3
> applications we developed. So, i confim that our script dont have problem.
> Kindly help me in this regard.
>
> --
> With Regards,
> palani kannan. k

From maj at fortinbras.us Thu Mar 4 10:27:16 2010
From: maj at fortinbras.us (Mark A.
Jensen) Date: Thu, 4 Mar 2010 10:27:16 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Message-ID: Check out http://www.bioperl.org/wiki/HOWTO:SearchIO MAJ ----- Original Message ----- From: "Janine Arloth" To: Sent: Wednesday, March 03, 2010 4:44 AM Subject: [Bioperl-l] StandAloneBlastPlus > Hello, > > which arguments or result can I get from hits? > > hit = $result->next_hit; > print $hit->name; > > Are there more than the name? Exists a description, where I can look up this? > > Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Thu Mar 4 10:25:45 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 04 Mar 2010 10:25:45 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Message-ID: <90B9BFFC-73DA-469F-900C-70448A9B1C03@verizon.net> http://www.bioperl.org/wiki/HOWTO:SearchIO On Mar 3, 2010, at 4:44 AM, Janine Arloth wrote: > Hello, > > which arguments or result can I get from hits? > > hit = $result->next_hit; > print $hit->name; > > Are there more than the name? Exists a description, where I can look up this? 
> > Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 4 11:49:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 04 Mar 2010 10:49:01 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org> <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> Message-ID: <1267721341.23329.26.camel@pyrimidine.igb.uiuc.edu> Okay, I'm able to replicate this (and the tests now correctly attempt to catch it). It appears that this may be a general RemoteBlast issue, as regular RemoteBlast tests are also taking forever. This shouldn't be related to the email issue (this isn't in RemoteBlast.pm yet). At least, I would hope NCBI would pass back another status besides 'WAITING' in cases where the email isn't provided. chris On Thu, 2010-03-04 at 09:30 -0600, Chris Fields wrote: > Palani, > > We have a few regression tests that should have caught this but aren't > quite set up correctly (they silently pass if no report is returned). > This may be smoething on NCBI's end though; any remote database or > analyses are notoriously brittle, hence the need to skip these by > default when installing tests. > > Final note, but hopefully you aren't using bioperl 1.4 (as indicated by > the docs). We're now on the 1.6 release series and are now on v. 1.6.1; > 1.4 isn't supported anymore. > > chris > > On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote: > > Palani - > > This should be directed to the mailing list. > > > > -------- Original Message -------- > > From: PalaniKannan K > > Subject: Enquiry about Remoteblast.pm > > Date: Thu, 4 Mar 2010 10:23:45 +0530 > > > > > > > > > > > > I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. 
I am accessing through > > Remoteblast.pm script available through CPAN. When i am submitting my > > query... it shows waiting for much time. Ex. (waiting .....................) > > > > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html > > > > This is the reference script i am using through Remoteblast perl module. > > > > It worked upto last 02/03/2010. Now it is not working > > > > We had developed 3 applications using this module. The same error comes in 3 > > applications we developed. So, i confim that our script dont have problem. > > Kindly help me in this regard. > > > > -- > > With Regards, > > palani kannan. k > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 4 14:06:33 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 04 Mar 2010 11:06:33 -0800 Subject: [Bioperl-l] call for project ideas - Google Summer of Code In-Reply-To: References: <4B8CAE6B.4010807@cornell.edu> Message-ID: <4B9004B9.8090107@cornell.edu> Hello Luis, These are interesting ideas. Have a look at http://sswap.info and http://sadiframework.org, perhaps you might want to work with one of those technologies? Be warned, these are both in early-stage development, you are on the cutting edge here! It seems like your desire to work with semantic technologies as a GSoC student could fit under a number of different mentoring organizations, possibly OBF or NEScent, or maybe another organization entirely. I'll make some inquiries. In the mean time, please add a project idea for this on the bioperl GSoC page, to give the idea somewhere to coalesce. If you can, try to come up with a more concrete idea for what you want to do. 
http://www.bioperl.org/wiki/Google_Summer_of_Code

What do you think?

Rob

Luis M Rodriguez-R wrote:
> Hello Robert,
>
> I would like to know how to apply and when GSoC 2010 is planned to take place. I think there are great development opportunities in information discovery using the semantic web (I'm familiar with RDF in bio2rdf and uniprot, but it could also be useful to integrate OWL). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, especially with a view to implementing SPARQL. The people at bio2rdf already have some parsers, but their incompleteness is evident when working with their RDF as a primary source of data.
>
> Best regards,
> Luis.
>
> On 2/03/2010, at 1:21, Robert Buels wrote:
>
>> Hi all,
>>
>> Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due next week). We need project ideas for prospective Summer of Code interns.
>>
>> There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects.
>>
>> For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs
>>
>> One of the summer intern ideas I have on the page so far is to help with the tough grunt work of breaking BioPerl into smaller, more easily managed distributions. I'm sure you all can think of plenty more!
>>
>> Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code
>>
>> Rob
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>
> Luis M. Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
>
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
>
>

From joa2006 at med.cornell.edu Thu Mar 4 15:11:58 2010
From: joa2006 at med.cornell.edu (Josef Anrather)
Date: Thu, 04 Mar 2010 15:11:58 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
Message-ID: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>

Hi there,

same problems here. Bioperl 1.6.1 installed; RemoteBlast version 1.006001.
Could someone point me in the right direction? What is the put parameter for the email address?

Does the supplied email address end up in an FBI database if you blast the B.anthracis genome?

Josef

Cornell Medical College

From maj at fortinbras.us Thu Mar 4 16:18:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 4 Mar 2010 16:18:48 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

we're not at liberty to say

----- Original Message -----
From: "Josef Anrather"
To:
Sent: Thursday, March 04, 2010 3:11 PM
Subject: Re: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]

> Hi there,
>
> same problems here. Bioperl 1.6.1 installed; RemoteBlast version
> 1.006001.
> Could someone point me in the right direction. What is the put
> parameter for the email address?
>
> Does the supplied email address end up in an FBI data base if you
> blast the B.anthracis genome?
>
> Josef
>
> Cornell Medical College
>

From David.Messina at sbc.su.se Fri Mar 5 05:05:43 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 5 Mar 2010 11:05:43 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

My apologies for jumping the gun on the email thing; that won't take effect until June 1.

See full details here:

http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d

Looks like the problem with RemoteBlast (as Chris reported elsewhere in this thread) is at NCBI's servers (and is probably temporary).

Dave

From robert.bradbury at gmail.com Fri Mar 5 08:20:36 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Fri, 5 Mar 2010 08:20:36 -0500
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To:
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID:

On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote:

> My apologies for jumping the gun on the email thing; that won't take
> effect until June 1.
>
> See full details here:
>
> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d
>
>
> Looks like the problems with RemoteBlast (as Chris reported elsewhere in
> this thread) is at NCBI's servers (and is probably temporary).
>
>

I would not be at all surprised if any problems involving RemoteBlast were related to the recent changeovers to a Javascript requirement for all interfaces to NCBI databases (this took place around mid-February and I complained about this in a previous email to the BioPerl list).

I received a response back from Dr. Eric Sayers at NCBI on Feb.
26 that indicated that they were aware of the problem (involving a Javascript requirement) and that NCBI developers were "investigating" ways to mitigate the problem.

I've looked briefly at the new Javascript code that one is required to run when using PubMed, etc. and it looks like they may have completely changed the external interfaces to NCBI databases -- so I'm not surprised if that broke some or all other external interfaces used by BioPerl (RemoteBlast, Eutils, etc.). I'd suggest that you try to document the problems as best you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). It may be worth noting that it took ~3 weeks for me to receive any response to my reports.

Also note that (a) to the best of my knowledge there has been no public discussion regarding these recent changes at NCBI; and (b) under the Jan. 21, 2009 Memorandum on Transparency and Open Government, and under the Dec 8, 2009 Open Government Directive, NCBI *should* be doing a better job working with its end users (and the taxpayers) -- and at least thus far, while NIH seems to be making an effort, that doesn't seem to have filtered down to NCBI.

(For example, no open/public discussion regarding the email requirement for remote blasts...).

It is also worth noting that it should be possible to file FOI requests with NIH/NCBI to find out exactly what they are doing and why they are doing it. I haven't taken such steps yet but I have given consideration to doing so.
Robert

From biopython at maubp.freeserve.co.uk Fri Mar 5 08:31:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Mar 2010 13:31:57 +0000
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To:
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu>
Message-ID: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com>

On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote:
>
> (For example, no open/public discussion regarding the email
> requirement for remote blasts...).
>

Hi all,

What email requirement for remote blasts are you talking about?

Note that the email referred to earlier talks about two unrelated issues: (1) changes to the BLAST output with the introduction of BLAST+, and (2) the upcoming email requirement for Entrez (aka E-utilities; they have been very clear about that, with plenty of warning).

http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html
http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html

Is there a misunderstanding here?

Peter

From David.Messina at sbc.su.se Fri Mar 5 08:44:08 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 5 Mar 2010 14:44:08 +0100
Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm]
In-Reply-To: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com>
References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com>
Message-ID: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se>

> Is there a misunderstanding here?

Whoops, yes there is; that's my fault, too. I did not read carefully and conflated EUtilities and RemoteBLAST.

Just to be clear, the upcoming email requirement will be for EUtilities, NOT for RemoteBLAST.

Thanks for clearing that up, Peter.

Dave

On Mar 5, 2010, at 14:31, Peter wrote:

> On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote:
>>
>> (For example, no open/public discussion regarding the email
>> requirement for remote blasts...).
>> > > Hi all, > > What email requirement for remote blasts are you talking about? > > Note that the email referred to earlier talks about to unrelated > issues, (1) changes to the BLAST output with the introduction > of BLAST+, and (2) the upcoming email requirement for Entrez > (aka E-utilities, they have been very clear about that with > plenty of warning). > > http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html > http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html > > > Peter From biopython at maubp.freeserve.co.uk Fri Mar 5 08:48:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Mar 2010 13:48:27 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> Message-ID: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: > >> Is there a misunderstanding here? > > Whoops, yes there is ? that's my fault, too. I did not > read carefully and conflated EUtilities and RemoteBLAST. > > Just to be clear, the upcoming email requirement will > be for EUtilities, NOT for RemoteBLAST. > > Thanks for clearing that up, Peter. > Dave No problem - you guys had me worried there for a minute ;) Peter From cjfields at illinois.edu Fri Mar 5 08:50:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 07:50:51 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> On Mar 5, 2010, at 7:20 AM, Robert Bradbury wrote: > On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote: > >> My apologies for jumping the gun on the email thing ? 
that won't take >> effect until June 1. >> >> See full details here: >> >> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d >> >> >> Looks like the problems with RemoteBlast (as Chris reported elsewhere in >> this thread) is at NCBI's servers (and is probably temporary). >> >> > I would not be at all surprised if any problems involving RemoteBlast were > related to the recent changeovers to a Javascript requirement for all > interfaces to NCBI databases (this took place around mid-February and I > complained about this in a previous email to the BioPerl list). Robert, according to Palani's recent response NCBI provided a perl script that worked, so I don't think it a Javascript issue. My guess is a change in the returned page information that isn't caught by the current regex, a problem that has happened in the past. I'll be looking into it today. > I received a response back from Dr. Eric Sayers at NCBI on Feb. 26 that > indicated that they were aware of the problem (involving a Javascript > requirement) and indicated that NCBI developers were "investigating" ways to > mitigate the problem. > > I've looked briefly at the new Javascript code that one is required to run > when using PubMed, etc. and it looks like they may have completely changed > the external interfaces to NCBI databases -- so I'm not surprised if that > broke some or all other external interfaces used by BioPerl (RemoteBlast, > Eutils, etc.). I'd suggest that you try to document the problems as best > you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). > It may be worth noting that it took ~3 weeks for me to receive any response > to my reports. EUtilities works fine (both regular and SOAP); all regression tests are passing, so it's not affecting everything. > Also note, that (a) to the best of my knowledge there has been no public > discussion regarding these recent changes at NCBI; and (b) under the Jan. 
> 21, 2009 Memorandum on Transparency and Open Government, and under the Dec > 8, 2009 Open Government Directive, NCBI *should* be doing a better job > working with its end users (and the taxpayers) -- and at least thus far, > while NIH seems to be making an effort that doesn't seem to have filtered > down to NCBI. > > (For example, no open/public discussion regarding the email requirement for > remote blasts...). > > It is also worth noting that it should be possible to file FOI requests with > NIH/NCBI to find out exactly what they are doing and why they are doing it. > I haven't taken such steps yet but I have given consideration to doing so. > > Robert The email requirement has always been indicated, it was just never enforced. B/c of increased spamming issues on the NCBI server they took up the initiative to require users provide an email address (and enforce it starting in June). I just made a change to the BioPerl install that requests an email and bypasses Bio::DB::EUtilities tests if one is not provided, other tools will be following suit. I don't think there is anything insidious about this. My guess is they will be using them merely to track server usage per user and IP, and take necessary measures (i.e. contact or block) if needed. Finally, I'm not sure where the hostility is coming from. NCBI has provided a great service to the community for many years, even through many funding cuts, and they have had quite a few. Frankly, if one doesn't like their service requirements, there are other databases that one can use. 
chris From cjfields at illinois.edu Fri Mar 5 10:07:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 09:07:11 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> Message-ID: On Mar 5, 2010, at 7:48 AM, Peter wrote: > On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >> >>> Is there a misunderstanding here? >> >> Whoops, yes there is ? that's my fault, too. I did not >> read carefully and conflated EUtilities and RemoteBLAST. >> >> Just to be clear, the upcoming email requirement will >> be for EUtilities, NOT for RemoteBLAST. >> >> Thanks for clearing that up, Peter. >> Dave > > No problem - you guys had me worried there for a minute ;) > > Peter Just as an update, I can confirm it is a change with retrieve_blast() not catching the report (no Javascript, no email ;). Will try fixing this later today. chris From robert.bradbury at gmail.com Fri Mar 5 10:08:42 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 5 Mar 2010 10:08:42 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: Sorry, yes I too was reading quickly and not separating RemoteBlast from Eutilities requirements. With respect to "hostility", I do agree Chris that NCBI has provided a great service over the years (I've used it for over 15 as I'm sure many here have). However, the recent Javascript requirement (without any apparent discussion within the user community) has me very annoyed [1]. 
One could back it up a level and ask why NCBI doesn't have a "user community forum" (at least that I'm aware of) or even a bug database (it isn't like putting up a bugzilla bug database requires all that much work). Heck, even the phone companies (whom I consider to be the epitome of bureaucracy) issue me a trouble ticket # when I have a problem (something that, to the best of my knowledge, NCBI does not do).

There is also the fact that several months ago, when I requested an explanation of what code/utilities were being used to generate the Homologene "homology" graphics (so I could consider extending it to other species, potentially in BioPerl), I was told in unspecific terms that a variety of utilities were used (and my impression was perhaps an underlying suggestion that it might be too complicated for me to understand -- but that could just be a subjective impression on my part). (Of course, such a response doesn't fit well with my perspective of "open government".)

Robert

1. There is a long list of reasons why Javascript is bad: it increases memory and CPU requirements on the end user (one cannot run hundreds of open PubMed tabs, as I often may when doing research, on an "average" machine if all the tabs are running Javascript); downloading and running lots of Javascripts can hardly be considered "green"; Javascript doesn't work in the lightest-weight browsers such as Dillo; Javascript decreases the reliability and security of the browser; and excessive reliance on Javascript may decrease web access for individuals with disabilities (potentially in violation of current laws, I suspect), etc.
From roy.chaudhuri at gmail.com Fri Mar 5 10:52:12 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 05 Mar 2010 15:52:12 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: <4B9128AC.1000405@gmail.com> Hi Robert, Just a suggestion, maybe you could use HubMed (www.hubmed.org) as a PubMed alternative? It seems to work ok with JavaScript disabled. Roy. On 05/03/2010 15:08, Robert Bradbury wrote: > Sorry, yes I too was reading quickly and not separating RemoteBlast from > Eutilities requirements. > > With respect to "hostility", I do agree Chris that NCBI has provided a great > service over the years (I've used it for over 15 as I'm sure many here > have). However, the recent Javascript requirement (without any apparent > discussion within the user community) has me very annoyed [1]. One could > back it up a level and ask why NCBI doesn't have a "user community forum" > (at least that I'm aware of) or even a bug database (it isn't like putting > up a bugzilla bug database requires all that much work). Heck, even the > phone companies (whom I consider to be the epitome of bureaucracy) issue me > a trouble ticket # when I have a problem (something to the best of my > knowledge NCBI does not do). > > There is also the fact that several months ago when I requested an > explanation for what code/utilities were being used to generate the > Homologene "homology" graphics (so I could consider extending it to other > species, potentially in BioPerl) I was told in unspecific terms that a > variety of utilities were used (and my impression was perhaps an underlying > suggestion that it might be too complicated for me to understand -- but that > could just be subjective impression on my part). [Of course such a response > doesn't fit well my perspective of "open government".) > > Robert > > 1. 
There are a long list of reasons why Javascript is bad ranging from > increasing memory and CPU requirements on the end user (one cannot run > hundreds of open PubMed tabs, as I often may when doing research, on an > "average" machine if all the tabs are running Javascript, downloading and > running lots of Javascripts can hardly be considered "green", Javascript > doesn't work in the lightest weight browsers such as Dillo, Javascript > decreases the reliability and security of the browser, excessive reliance on > Javascript may decrease web access for individuals with disabilities > (potentially in violation of current laws I suspect), etc.) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From paolo.pavan at gmail.com Fri Mar 5 13:51:55 2010 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 5 Mar 2010 19:51:55 +0100 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <2FB5C317605B48269256ABFABBED2239@NewLife> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> <18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> <2FB5C317605B48269256ABFABBED2239@NewLife> Message-ID: <56be91b61003051051v6b06b872q9f59380b05492071@mail.gmail.com> Dear Mark, Thank you again for your efforts spent on this theme, I have read and tested carefully enough I hope, your new ads. I found they work perfectly but either I miss some feature of the Tiling API (and this is possible) or it could be that they don't entirely match what was the initial problem; for sure my fault, I can explain better. Let me start saying that what is needed is the merge of the alignments returned by the get_tiled_alns method. 
I have two sequences, h1 and h2 (00038 and 00053 in the given example), and both can be aligned against the same sequence q (named 1_0). They cannot be aligned with common multiple sequence aligners like clustalw, since in this case a local alignment algorithm is preferable to a global one. This specific case cannot be handled by programs like cap3 either. I found that megablast -m 5 can output a tiling of all the hits found against the query, reporting the query in its entirety. I hope I got the idea across; if needed, I can provide the input sequences for the megablast run.

Thank you again and have a nice weekend,
Paolo

2010/3/4 Mark A. Jensen :
> Paolo -- Ok, there's now (r16900) an *experimental* method in
> Bio::Search::Tiling::MapTiling called get_tiled_alns().
> POD is below. Try it out and let me know--
> cheers,
> MAJ
>
>
> =head1 TILED ALIGNMENTS
>
> The experimental method L</get_tiled_alns> will use a tiling
> to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
> objects:
>
>  @alns = $tiling->get_tiled_alns($type, $context);
>
> Each alignment contains two sequences with ids 'query' and 'subject',
> and consists of a concatenation of tiling HSPs which overlap or are
> directly adjacent. The alignments are returned in C<$type> sequence
> order. When HSPs overlap, the alignment sequence is taken from the HSP
> which comes first in the coverage map array.
>
> The sequences in each alignment contain features (even though they are
> L<Bio::LocatableSeq> objects) which map the original query/subject
> coordinates to the new alignment sequence coordinates. You can
> determine the original BLAST fragments this way:
>
>  $aln = ($tiling->get_tiled_alns)[0];
>  $qseq = $aln->get_seq_by_id('query');
>  $hseq = $aln->get_seq_by_id('subject');
>  foreach my $feat ($qseq->get_SeqFeatures) {
>    $org_start = ($feat->get_tag_values('query_start'))[0];
>    $org_end = ($feat->get_tag_values('query_end'))[0];
>    # original fragment as represented in the tiled alignment:
>
$org_fragment = $feat->seq; > } > foreach my $feat ($hseq->get_SeqFeatures) { > ? $org_start = ($feat->get_tag_values('subject_start'))[0]; > ? $org_end = ($feat->get_tag_values('subject_end'))[0]; > ? # original fragment as represented in the tiled alignment: > ? $org_fragment = $feat->seq; > } > > > ----- Original Message ----- From: "Paolo Pavan" > To: "Mark A. Jensen" > Cc: "Chris Fields" ; > Sent: Tuesday, March 02, 2010 10:17 AM > Subject: Re: [Bioperl-l] Alignment from blast report > > >> I think you got the sense, thank you. Of course hsps from different >> hits will be reflected in different elements aligned. I've attached >> the example pasted (unix text) because is more readable, hoping will >> not be held by the mailing server :-) >> >> Thank you, >> Paolo >> >> 2010/3/2 Mark A. Jensen : >>> >>> This might a good method to have for Bio::Search::Tiling-- >>> you want to stitch together all the hsps and have the >>> concatenated alignment returned as a Bio::SimpleAlign, >>> correct? Tiling would create the right set of hsps from >>> which to generate the composite alignment. I can >>> try to get something working, but it may take a while- >>> MAJ >>> ----- Original Message ----- From: "Paolo Pavan" >>> To: "Chris Fields" >>> Cc: >>> Sent: Tuesday, March 02, 2010 9:37 AM >>> Subject: Re: [Bioperl-l] Alignment from blast report >>> >>> >>> Hi Chris, >>> Thank you for your reply. So I have to understand that since the >>> get_aln method returns the HSP alignment, there is no way to retrieve >>> the whole alignment as in the example pasted, isn't it? >>> Basically I'm trying to use megablast as kind of multiple local >>> alignment engine and actually I'm not pretty sure this is a good idea >>> but in my particular case could be suitable. I mean that the example >>> below reports only the portions of the sequences that align loosing >>> the portions that does not, I'm not sure I gave the idea. What do you >>> think about? Can you give me your opinion? 
>>> If there isn't any module written yet, I can try to write a parser, it >>> could be of any interest? >>> >>> Thank you, >>> Paolo >>> >>> 2010/3/2 Chris Fields : >>>> >>>> Paolo, >>>> >>>> You can get a Bio::SimpleAlign from the HSP object. The first code >>>> example >>>> in this section in the HOWTO demonstrates this: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>>> >>>> chris >>>> >>>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >>>> >>>>> Dear all, >>>>> Sorry for pushing up my post but, please does anyone have an hint for >>>>> me? >>>>> Maybe have I to send attached the report to the mailing list? I don't >>>>> know attachment policies of the list, if it is allowed and is needed I >>>>> can do that. >>>>> >>>>> Thank you, >>>>> Paolo >>>>> >>>>> 2010/2/26 Paolo Pavan : >>>>>> >>>>>> Sorry, >>>>>> Maybe I forgot to add this is the megablast -m 5 output. >>>>>> >>>>>> Thank you again, >>>>>> Paolo >>>>>> >>>>>> 2010/2/26 Paolo Pavan : >>>>>>> >>>>>>> Hi all, >>>>>>> I have just a brief question: I've got some megablast reports such >>>>>>> the >>>>>>> one I've pasted below. >>>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>>>> Bio::Align::AlignI implementing one? >>>>>>> >>>>>>> Thank you all, >>>>>>> Paolo >>>>>>> >>>>>>> >>>>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>>>> >>>>>>> >>>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>>>> (2000), >>>>>>> "A greedy algorithm for aligning DNA sequences", >>>>>>> J Comput Biol 2000; 7(1-2):203-14. 
>>>>>>> >>>>>>> Database: 00038-00053.fasta >>>>>>> 2 sequences; 2001 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> Query= 00038-00053 >>>>>>> (802 letters) >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score E >>>>>>> Sequences producing significant alignments: (bits) Value >>>>>>> >>>>>>> ______00038 >>>>>>> 226 1e-62 >>>>>>> ______00053 >>>>>>> 115 3e-29 >>>>>>> >>>>>>> 1_0 472 >>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>>>> ______00038 883 >>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 532 >>>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>>>> ______00038 943 >>>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 591 >>>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 651 >>>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> 1_0 711 >>>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>>>> ______00038 1000 >>>>>>> ------------------------------------------------------------ 1001 >>>>>>> ______00053 1 >>>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>>>> >>>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>>>> ______00038 1000 -------------------------------- 1001 >>>>>>> ______00053 35 
ccttaagtatt-actttaacttgtcttgatca 65 >>>>>>> Database: 00038-00053.fasta >>>>>>> Posted date: Feb 25, 2010 4:47 PM >>>>>>> Number of letters in database: 2001 >>>>>>> Number of sequences in database: 2 >>>>>>> >>>>>>> Lambda K H >>>>>>> 1.37 0.711 1.31 >>>>>>> >>>>>>> Gapped >>>>>>> Lambda K H >>>>>>> 1.37 0.711 1.31 >>>>>>> >>>>>>> >>>>>>> Matrix: blastn matrix:1 -3 >>>>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>>>> Number of Sequences: 2 >>>>>>> Number of Hits to DB: 17 >>>>>>> Number of extensions: 3 >>>>>>> Number of successful extensions: 3 >>>>>>> Number of sequences better than 10.0: 2 >>>>>>> Number of HSP's gapped: 2 >>>>>>> Number of HSP's successfully gapped: 2 >>>>>>> Length of query: 802 >>>>>>> Length of database: 2001 >>>>>>> Length adjustment: 10 >>>>>>> Effective length of query: 792 >>>>>>> Effective length of database: 1981 >>>>>>> Effective search space: 1568952 >>>>>>> Effective search space used: 1568952 >>>>>>> X1: 9 (17.8 bits) >>>>>>> X2: 20 (39.6 bits) >>>>>>> X3: 51 (101.1 bits) >>>>>>> S1: 9 (18.3 bits) >>>>>>> S2: 9 (18.3 bits) >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> > > > -------------------------------------------------------------------------------- > > >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From shalabh.sharma7 at gmail.com Fri Mar 5 15:06:30 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 15:06:30 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) Message-ID: 
<9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> Hi All, I have a set of accession numbers. Is it possible to get "isolation_source" from the GenBank records for all the Accession numbers. I would really appreciate if anyone can help me out. Thanks Shalabh From shalabh.sharma7 at gmail.com Fri Mar 5 15:29:17 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 15:29:17 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> Message-ID: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> HI Brian, Thanks for your quick reply. I was reading the document and it think it talks about parsing a GenBank record. What i exactly want is to submit a batch of accession numbers and get "isolation_source" directly without downloading all the Genbank files. I am still reading the document may be i missed something. Thanks a lot shalabh On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > Shalabh, > > You can start by reading about how Bioperl processes Genbank files and > their annotations: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > > > Brian O. > > On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > > > Hi All, > > I have a set of accession numbers. Is it possible to get > > "isolation_source" from the GenBank records for all the Accession > numbers. > > > > I would really appreciate if anyone can help me out. 
> > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Fri Mar 5 15:43:33 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 05 Mar 2010 15:43:33 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> Message-ID: Shalabh, I see. I think you could use EUtils then. Take a look at these: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service I'm not an expert on these, and I do not know if one can ask for just a tag value ("isolation_source"). Getting a tag value from the downloaded Genbank entry is not difficult though, that Feature-Annotation HOWTO shows you how. Brian O. On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: > HI Brian, > Thanks for your quick reply. > I was reading the document and it think it talks about parsing a GenBank > record. What i exactly want is to submit a batch of accession numbers and > get "isolation_source" directly without downloading all the Genbank files. > I am still reading the document may be i missed something. > > Thanks a lot > shalabh > > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > >> Shalabh, >> >> You can start by reading about how Bioperl processes Genbank files and >> their annotations: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation >> >> >> >> Brian O. >> >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: >> >>> Hi All, >>> I have a set of accession numbers. Is it possible to get >>> "isolation_source" from the GenBank records for all the Accession >> numbers. 
>>> >>> I would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Mar 5 15:13:45 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 05 Mar 2010 15:13:45 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> Message-ID: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> Shalabh, You can start by reading about how Bioperl processes Genbank files and their annotations: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Brian O. On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > Hi All, > I have a set of accession numbers. Is it possible to get > "isolation_source" from the GenBank records for all the Accession numbers. > > I would really appreciate if anyone can help me out. 
>
> Thanks
> Shalabh

From cjfields at illinois.edu Fri Mar 5 16:22:47 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 05 Mar 2010 15:22:47 -0600
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
In-Reply-To:
References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com>
Message-ID: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu>

Regardless of what you try, it will only limit which records are returned (i.e., you will still get full records unless you take steps to limit those somehow, e.g. by adding sequence start/stop).

Anyway, this worked to retrieve those with that tag:

"src isolation source"[Properties]

That gets a lot of hits.

If you are only interested in that one line, you could just parse it out w/o resorting to bioperl (believe it or not, it's not always the best answer).

chris

On Fri, 2010-03-05 at 15:43 -0500, Brian Osborne wrote:
> Shalabh,
>
> I see. I think you could use EUtils then. Take a look at these:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>
> I'm not an expert on these, and I do not know if one can ask for just a tag value ("isolation_source"). Getting a tag value from the downloaded Genbank entry is not difficult though, that Feature-Annotation HOWTO shows you how.
>
> Brian O.
>
>
> On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote:
>
> > HI Brian,
> > Thanks for your quick reply.
> > I was reading the document and it think it talks about parsing a GenBank
> > record. What i exactly want is to submit a batch of accession numbers and
> > get "isolation_source" directly without downloading all the Genbank files.
> > I am still reading the document may be i missed something. > > > > Thanks a lot > > shalabh > > > > > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > > > >> Shalabh, > >> > >> You can start by reading about how Bioperl processes Genbank files and > >> their annotations: > >> > >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > >> > >> > >> > >> Brian O. > >> > >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > >> > >>> Hi All, > >>> I have a set of accession numbers. Is it possible to get > >>> "isolation_source" from the GenBank records for all the Accession > >> numbers. > >>> > >>> I would really appreciate if anyone can help me out. > >>> > >>> Thanks > >>> Shalabh > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Fri Mar 5 17:06:41 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 17:06:41 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> Message-ID: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> Thanks Bran and Chris, I followed the example given here : http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook to retrieve raw data records from genbank. 
For example i used the id : 157091572 to get the genbank record, but the downloaded file does not contain "isolation_source" which is there when you look for the record online: http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=T2S9N0PJ01N&log%24=nuclalign&blast_rank=1&list_uids=157091572 Thanks Shalabh On Fri, Mar 5, 2010 at 4:22 PM, Chris Fields wrote: > Regardless on what you try, it will only limit records returned (e.g. > you will still get full records, unless you take steps to limit those > somehow, by adding sequence start/stop, etc). > > Anyway, this worked to retrieve those with that tag: > "src isolation source"[Properties] > > That get a lot of hits. > > If you are only interested in that one line you could just parse it out > w/o resorting to bioperl (beleiev it or not, it's not always the best > answer). > > chris > > On Fri, 2010-03-05 at 15:43 -0500, Brian Osborne wrote: > > Shalabh, > > > > I see. I think you could use EUtils then. Take a look at these: > > > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service > > > > I'm not an expert on these, and I do not know if one can ask for just a > tag value ("isolation_source"). Getting a tag value from the downloaded > Genbank entry is not difficult though, that Feature-Annotation HOWTO shows > you how. > > > > Brian O. > > > > > > > On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: > > > > > HI Brian, > > > Thanks for your quick reply. > > > I was reading the document and it think it talks about parsing a > GenBank > > > record. What i exactly want is to submit a batch of accession numbers > and > > > get "isolation_source" directly without downloading all the Genbank > files. > > > I am still reading the document may be i missed something. 
> > > > > > Thanks a lot > > > shalabh > > > > > > > > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne >wrote: > > > > > >> Shalabh, > > >> > > >> You can start by reading about how Bioperl processes Genbank files and > > >> their annotations: > > >> > > >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > >> > > >> > > >> > > >> Brian O. > > >> > > >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > > >> > > >>> Hi All, > > >>> I have a set of accession numbers. Is it possible to get > > >>> "isolation_source" from the GenBank records for all the Accession > > >> numbers. > > >>> > > >>> I would really appreciate if anyone can help me out. > > >>> > > >>> Thanks > > >>> Shalabh > > >>> _______________________________________________ > > >>> Bioperl-l mailing list > > >>> Bioperl-l at lists.open-bio.org > > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > >> > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From shalabh.sharma7 at gmail.com Fri Mar 5 17:57:00 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 17:57:00 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> Message-ID: <9fcc48c71003051457x7186e3e0y1c9b8ee5ea81e153@mail.gmail.com> Thanks everyone, i got it what i was looking for. 
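A minimal sketch of the "getting a tag value from the downloaded GenBank entry" step mentioned earlier in the thread, assuming the record has already been fetched and parsed into a sequence object. The helper name extract_isolation_sources is hypothetical; get_SeqFeatures, primary_tag, has_tag and get_tag_values are the usual BioPerl feature accessors. It is only an illustration, not necessarily what was used here.

```perl
use strict;
use warnings;

# Collect every /isolation_source value found on the 'source' features of a
# parsed sequence record (any object that provides get_SeqFeatures, such as
# a Bio::Seq). extract_isolation_sources is a hypothetical helper name.
sub extract_isolation_sources {
    my ($seq) = @_;
    my @values;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';       # only source features
        next unless $feat->has_tag('isolation_source');   # tag may be absent
        push @values, $feat->get_tag_values('isolation_source');
    }
    return @values;
}

# Usage (needs BioPerl and network access, so it is left commented out;
# $accession stands for one of your own accession numbers):
#   use Bio::DB::GenBank;
#   my $gb  = Bio::DB::GenBank->new;
#   my $seq = $gb->get_Seq_by_acc($accession);
#   print join("; ", extract_isolation_sources($seq)), "\n";
```

Records without the tag simply yield an empty list, so the same loop can be run over a whole batch of accessions.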
EUtlities helped me a lot. Thanks Shalabh On Fri, Mar 5, 2010 at 5:06 PM, shalabh sharma wrote: > Thanks Bran and Chris, > I followed the example given here : > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > to retrieve raw data records from genbank. > For example i used the id : 157091572 to get the genbank record, but the > downloaded file does not contain "isolation_source" which is there when you > look for the record online: > > http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=T2S9N0PJ01N&log%24=nuclalign&blast_rank=1&list_uids=157091572 > > Thanks > Shalabh > > > On Fri, Mar 5, 2010 at 4:22 PM, Chris Fields wrote: > >> Regardless on what you try, it will only limit records returned (e.g. >> you will still get full records, unless you take steps to limit those >> somehow, by adding sequence start/stop, etc). >> >> Anyway, this worked to retrieve those with that tag: >> "src isolation source"[Properties] >> >> That get a lot of hits. >> >> If you are only interested in that one line you could just parse it out >> w/o resorting to bioperl (beleiev it or not, it's not always the best >> answer). >> >> chris >> >> On Fri, 2010-03-05 at 15:43 -0500, Brian Osborne wrote: >> > Shalabh, >> > >> > I see. I think you could use EUtils then. Take a look at these: >> > >> > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> > >> > http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service >> > >> > I'm not an expert on these, and I do not know if one can ask for just a >> tag value ("isolation_source"). Getting a tag value from the downloaded >> Genbank entry is not difficult though, that Feature-Annotation HOWTO shows >> you how. >> > >> > Brian O. >> > >> > >> >> > On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: >> > >> > > HI Brian, >> > > Thanks for your quick reply. >> > > I was reading the document and it think it talks about parsing a >> GenBank >> > > record. 
What i exactly want is to submit a batch of accession numbers >> and >> > > get "isolation_source" directly without downloading all the Genbank >> files. >> > > I am still reading the document may be i missed something. >> > > >> > > Thanks a lot >> > > shalabh >> > > >> > > >> > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne > >wrote: >> > > >> > >> Shalabh, >> > >> >> > >> You can start by reading about how Bioperl processes Genbank files >> and >> > >> their annotations: >> > >> >> > >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation >> > >> >> > >> >> > >> >> > >> Brian O. >> > >> >> > >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: >> > >> >> > >>> Hi All, >> > >>> I have a set of accession numbers. Is it possible to get >> > >>> "isolation_source" from the GenBank records for all the Accession >> > >> numbers. >> > >>> >> > >>> I would really appreciate if anyone can help me out. >> > >>> >> > >>> Thanks >> > >>> Shalabh >> > >>> _______________________________________________ >> > >>> Bioperl-l mailing list >> > >>> Bioperl-l at lists.open-bio.org >> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> > >> >> > > _______________________________________________ >> > > Bioperl-l mailing list >> > > Bioperl-l at lists.open-bio.org >> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > From cjfields at illinois.edu Fri Mar 5 23:14:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 22:14:01 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> 
<320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> Message-ID: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu> On Mar 5, 2010, at 7:48 AM, Peter wrote: > On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >> >>> Is there a misunderstanding here? >> >> Whoops, yes there is ? that's my fault, too. I did not >> read carefully and conflated EUtilities and RemoteBLAST. >> >> Just to be clear, the upcoming email requirement will >> be for EUtilities, NOT for RemoteBLAST. >> >> Thanks for clearing that up, Peter. >> Dave > > No problem - you guys had me worried there for a minute ;) > > Peter Just to bring this thread full circle, I have committed a fix which (ironically) reduced the code down a bit. I also added an attribute (get_rtoe) that returns the approximate time until the report is returned. chris From joa2006 at med.cornell.edu Sat Mar 6 17:13:45 2010 From: joa2006 at med.cornell.edu (Josef Anrather) Date: Sat, 06 Mar 2010 17:13:45 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu> Message-ID: Chris, the fix works flawlessly on my system. Thanks for the fast response. Cheers, Josef On Mar 5, 2010, at 11:14 PM, Chris Fields wrote: > > On Mar 5, 2010, at 7:48 AM, Peter wrote: > >> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >>> >>>> Is there a misunderstanding here? >>> >>> Whoops, yes there is ? that's my fault, too. I did not >>> read carefully and conflated EUtilities and RemoteBLAST. >>> >>> Just to be clear, the upcoming email requirement will >>> be for EUtilities, NOT for RemoteBLAST. >>> >>> Thanks for clearing that up, Peter. 
>>> Dave
>>
>> No problem - you guys had me worried there for a minute ;)
>>
>> Peter
>
> Just to bring this thread full circle, I have committed a fix which
> (ironically) reduced the code down a bit. I also added an attribute
> (get_rtoe) that returns the approximate time until the report is
> returned.
>
> chris

From jarodpardon at yahoo.com.cn Sun Mar 7 04:13:40 2010
From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=)
Date: Sun, 7 Mar 2010 17:13:40 +0800 (CST)
Subject: [Bioperl-l] insertion code in pdb parser
Message-ID: <643595.96038.qm@web15003.mail.cnb.yahoo.com>

Hi all,

Insertion codes on residue numbers are common in many cases, especially in the numbering schemes for antibody sequences, such as 82A and 82B. When Bio::Structure::IO::pdb parses a PDB file containing residues with insertion codes, it assigns such a residue an id like 'PRO-52.A', where 'A' is the insertion code. However, the opposite operation (setting the id of the residue) does not work. For example, if the original residue number is 51, $res->id('PRO-52.A') will not append the insertion code after the residue number correctly, though it does change the residue number from 51 to 52.

In the end, the only way I found to set the insertion code for a residue is to assign it to all atoms of the residue with $atom->icode('A'). I think this is inconvenient and misleading, since the insertion code should not be a property of an atom; a residue never has atoms with different insertion codes. I strongly recommend a change: add the icode method to the residue object rather than the atom. Likewise, the segment id should also belong to the residue.
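The workaround described above might look roughly like the following sketch. It assumes the entry object offers get_atoms($residue) and the atom object offers icode(), as in Bio::Structure::Entry and Bio::Structure::Atom; the helper name set_residue_icode and the file name in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Set the same insertion code on every atom of one residue.
#   $entry   : structure entry providing get_atoms($residue)
#   $residue : residue object belonging to $entry
#   $icode   : one-character insertion code, e.g. 'A'
# Returns the number of atoms that were updated.
# set_residue_icode is a hypothetical helper name, not part of BioPerl.
sub set_residue_icode {
    my ($entry, $residue, $icode) = @_;
    my $count = 0;
    for my $atom ($entry->get_atoms($residue)) {
        $atom->icode($icode);   # the insertion code is stored per atom
        $count++;
    }
    return $count;
}

# Usage (needs BioPerl and a PDB file, so it is left commented out):
#   use Bio::Structure::IO;
#   my $io    = Bio::Structure::IO->new(-file => 'antibody.pdb', -format => 'pdb');
#   my $entry = $io->next_structure;
#   for my $chain ($entry->get_chains) {
#       for my $res ($entry->get_residues($chain)) {
#           set_residue_icode($entry, $res, 'A') if $res->id eq 'PRO-52';
#       }
#   }
```

Wrapping the loop in one helper at least keeps the per-atom bookkeeping in a single place until the residue object gains an accessor of its own.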
Jarod

From rtbio.2009 at gmail.com Sun Mar 7 08:11:54 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sun, 7 Mar 2010 14:11:54 +0100
Subject: [Bioperl-l] remoteblast
Message-ID:

Hello Mark and everybody,

I have been trying to connect to remote BLAST to retrieve sequences similar to a given sequence, but my program is unable to retrieve the sequences from BLAST: it runs as far as obtaining the remote BLAST request IDs, but it does not enter the else branch after collecting the RID. Please check this problem and help me in this regard.

I think the problem is in getting the sequence and going to the 'else' part, i.e.,

    else {
        open(OUTFILE,'>',$blastdebugfile);
        # I think the problem is in else part, i.e., it is not taking the next result.#
        print OUTFILE "else entered";
        close(OUTFILE);
        my $result = $rc->next_result();
        #save the output

Please give me your reply.

Thanks and regards,
Roopa.

My code is as follows.

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use rnai_blast;
use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "\n";
print "RNAi Result";
print " \n";
print "\n";
print "\n";
print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$blastdebugfile); print OUTFILE @params; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); #submits the input data to BLAST# my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); #Retrieving the result ids# my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.# print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); #saving the output in blastdata.time.out file# # $random=rand(); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); # print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE 
$warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); #returning the sequences obtained on BLAST# } From cjfields at illinois.edu Sun Mar 7 09:57:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 7 Mar 2010 08:57:43 -0600 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use rnai_blast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds."; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes
> This page will automatically reload in 30 seconds
> \n > \n"; > > close(OUTFILE); > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$blastdebugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > "$organ\[ORGN]"); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => "$organ\[ORGN]"); > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > #submits the input data to BLAST# > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > #Retrieving the result ids# > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > #saving the output in blastdata.time.out file# > > # $random=rand(); > > my $filename = $serverpath."/blastdata_".time()."\.out"; > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > # print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a 
string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dna; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > close(OUTFILE); > > > return(@seqs); #returning the sequences obtained on BLAST# > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jdetras at gmail.com Fri Mar 5 01:17:40 2010 From: jdetras at gmail.com (Jeffrey Detras) Date: Fri, 5 Mar 2010 14:17:40 +0800 Subject: [Bioperl-l] distances between leaf nodes Message-ID: Hi, I am new at using the Bio::TreeIO module specifically using the newick format for a phylogenetic analysis. The sample_tree attached is Newick-formatted tree. My objective is to get all the distances between all the leaf nodes. I copied examples of the code from http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my knowledge) so that I understand how to assign the right array value for the nodes/leaves. The message would say must provide 2 root nodes. Here is what I have right now: #!/usr/bin/perl -w use strict; my $treefile = 'sample_tree'; use Bio::TreeIO; my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $treefile); while (my $tree = $treeio->next_tree) { my @leaves = $tree->get_leaf_nodes; for (my $dist = $tree->distance(-nodes => \@leaves)){ print "Distance between trees is $dist\n"; } } Thanks, Jeff -------------- next part -------------- A non-text attachment was scrubbed... 
Name: sample_tree
Type: application/octet-stream
Size: 418 bytes
Desc: not available
URL:

From janine.arloth at googlemail.com Fri Mar 5 04:43:57 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Fri, 5 Mar 2010 10:43:57 +0100
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To:
References:
Message-ID:

Hello,
using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment:

                 1                                                   50
test/1-85        ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
lcl|3013/20-104  ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA

                 51                                                 100
test/1-85        TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
lcl|3013/20-104  TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA

But I prefer this format:

Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79

Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
           |||||||||||||||||||||||||
Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104

How can I get this?

Best Regards

From elujan at stanford.edu Sun Mar 7 19:49:34 2010
From: elujan at stanford.edu (Ernesto George Lujan)
Date: Sun, 7 Mar 2010 16:49:34 -0800 (PST)
Subject: [Bioperl-l] Installing BioPerl
In-Reply-To: <1189627897.1477411268008644137.JavaMail.root@zm09.stanford.edu>
Message-ID: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>

Hi everyone,

I'm running MacOSX 10.5.8 with Perl 5.8.8 and I'm having trouble installing the BioPerl module. I've downloaded and installed the BioPerl 1.5.1-2 binary through FinkCommander, but when I type

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

into the Terminal, it tells me that I'm using BioPerl Version 1.006. How do I get this module to install correctly? Once again, my specs:

Perl Version: 5.8.8
BioPerl Version: 1.006
Operating System: Mac OSX 10.5.8

Thanks!
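For readers puzzled by the same output: Perl's decimal version numbers pack each dotted component into three digits after the decimal point, so a reported 1.006 corresponds to the dotted form 1.6.0. A minimal sketch using the core version module illustrates the conversion; dotted_version is a hypothetical helper name used only for this example.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: convert a decimal Perl version string such as
# '1.006' into its normalized dotted form using the core version module.
sub dotted_version {
    my ($decimal) = @_;
    return version->parse($decimal)->normal;
}

# '1.006' is the value reported in the message above;
# '1.006002' is how a 1.6.2 release would appear in decimal form.
for my $v ('1.006', '1.006002') {
    print "$v => ", dotted_version($v), "\n";
}
```

Under this reading the installed module is not older than the 1.5.x package that was requested.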
-BioPerl Beginner From bimber at wisc.edu Sun Mar 7 22:57:12 2010 From: bimber at wisc.edu (Ben Bimber) Date: Sun, 7 Mar 2010 21:57:12 -0600 Subject: [Bioperl-l] Bioperl-run malformed svndiff Message-ID: <9f985cdc1003071957h6c82d4b8t1a6b9a3af7752bde@mail.gmail.com> I recently tried to check out a complete version of bioperl-run and received an error saying 'malformed svndiff'. I've tried this on two different machines, so unless I've doing something wrong, it should be reproducible. I cannot say where updating an existing repository would throw the same error or not. Below is the log: *** Check Out svn checkout "svn://code.open-bio.org/bioperl/bioperl-run/trunk/lib/Bio at HEAD" -r HEAD --depth infinity "C:\Projects\Bio" A C:/Projects/Bio/Tools A C:/Projects/Bio/Tools/Run A C:/Projects/Bio/Tools/Run/Genewise.pm A C:/Projects/Bio/Tools/Run/Analysis A C:/Projects/Bio/Tools/Run/Analysis/soap.pm A C:/Projects/Bio/Tools/Run/AssemblerBase.pm A C:/Projects/Bio/Tools/Run/BWA.pm A C:/Projects/Bio/Tools/Run/Phrap.pm A C:/Projects/Bio/Tools/Run/FootPrinter.pm A C:/Projects/Bio/Tools/Run/AnalysisFactory.pm A C:/Projects/Bio/Tools/Run/BEDTools.pm A C:/Projects/Bio/Tools/Run/EMBOSSApplication.pm A C:/Projects/Bio/Tools/Run/Genscan.pm A C:/Projects/Bio/Tools/Run/RNAMotif.pm A C:/Projects/Bio/Tools/Run/Phylo A C:/Projects/Bio/Tools/Run/Phylo/Phast A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhyloFit.pm A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhastCons.pm A C:/Projects/Bio/Tools/Run/Phylo/Semphy.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/FEL.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Base.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Modeltest.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/REL.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/SLAC.pm A C:/Projects/Bio/Tools/Run/Phylo/PhyloBase.pm A C:/Projects/Bio/Tools/Run/Phylo/Phyml.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawGram.pm A 
C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Base.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtPars.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Consense.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawTree.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm A C:/Projects/Bio/Tools/Run/Phylo/Njtree A C:/Projects/Bio/Tools/Run/Phylo/Njtree/Best.pm A C:/Projects/Bio/Tools/Run/Phylo/QuickTree.pm A C:/Projects/Bio/Tools/Run/Phylo/Gerp.pm A C:/Projects/Bio/Tools/Run/Phylo/Molphy A C:/Projects/Bio/Tools/Run/Phylo/Molphy/ProtML.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML A C:/Projects/Bio/Tools/Run/Phylo/PAML/Yn00.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML/Evolver.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML/Baseml.pm A C:/Projects/Bio/Tools/Run/Phylo/PAML/Codeml.pm A C:/Projects/Bio/Tools/Run/Phylo/SLR.pm A C:/Projects/Bio/Tools/Run/Phylo/Gumby.pm A C:/Projects/Bio/Tools/Run/Phylo/LVB.pm A C:/Projects/Bio/Tools/Run/Primer3.pm A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus.pm A C:/Projects/Bio/Tools/Run/Meme.pm A C:/Projects/Bio/Tools/Run/RepeatMasker.pm A C:/Projects/Bio/Tools/Run/Analysis.pm A C:/Projects/Bio/Tools/Run/Cap3.pm A C:/Projects/Bio/Tools/Run/Vista.pm A C:/Projects/Bio/Tools/Run/Pseudowise.pm A C:/Projects/Bio/Tools/Run/Minimo.pm A C:/Projects/Bio/Tools/Run/Match.pm A C:/Projects/Bio/Tools/Run/Mdust.pm A C:/Projects/Bio/Tools/Run/Eponine.pm A C:/Projects/Bio/Tools/Run/Infernal.pm A C:/Projects/Bio/Tools/Run/BlastPlus A C:/Projects/Bio/Tools/Run/BlastPlus/Config.pm A C:/Projects/Bio/Tools/Run/EMBOSSacd.pm A C:/Projects/Bio/Tools/Run/Alignment A C:/Projects/Bio/Tools/Run/Alignment/Proda.pm A C:/Projects/Bio/Tools/Run/Alignment/Kalign.pm A C:/Projects/Bio/Tools/Run/Alignment/StandAloneFasta.pm A C:/Projects/Bio/Tools/Run/Alignment/TCoffee.pm A C:/Projects/Bio/Tools/Run/Alignment/Sim4.pm A 
C:/Projects/Bio/Tools/Run/Alignment/Probalign.pm A C:/Projects/Bio/Tools/Run/Alignment/Amap.pm A C:/Projects/Bio/Tools/Run/Alignment/Lagan.pm A C:/Projects/Bio/Tools/Run/Alignment/Blat.pm A C:/Projects/Bio/Tools/Run/Alignment/Gmap.pm A C:/Projects/Bio/Tools/Run/Alignment/Probcons.pm A C:/Projects/Bio/Tools/Run/Alignment/DBA.pm A C:/Projects/Bio/Tools/Run/Alignment/Muscle.pm A C:/Projects/Bio/Tools/Run/Alignment/Pal2Nal.pm A C:/Projects/Bio/Tools/Run/Alignment/Exonerate.pm A C:/Projects/Bio/Tools/Run/Alignment/MAFFT.pm A C:/Projects/Bio/Tools/Run/Alignment/Clustalw.pm A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm A C:/Projects/Bio/Tools/Run/Hmmer.pm A C:/Projects/Bio/Tools/Run/BlastPlus.pm A C:/Projects/Bio/Tools/Run/ERPIN.pm A C:/Projects/Bio/Tools/Run/Maq.pm A C:/Projects/Bio/Tools/Run/Bowtie A C:/Projects/Bio/Tools/Run/Bowtie/Config.pm A C:/Projects/Bio/Tools/Run/Seg.pm A C:/Projects/Bio/Tools/Run/Prints.pm A C:/Projects/Bio/Tools/Run/MCS.pm A C:/Projects/Bio/Tools/Run/Tmhmm.pm A C:/Projects/Bio/Tools/Run/Ensembl.pm A C:/Projects/Bio/Tools/Run/Coil.pm A C:/Projects/Bio/Tools/Run/Samtools A C:/Projects/Bio/Tools/Run/Samtools/Config.pm A C:/Projects/Bio/Tools/Run/Genemark.pm A C:/Projects/Bio/Tools/Run/Bowtie.pm A C:/Projects/Bio/Tools/Run/Glimmer.pm A C:/Projects/Bio/Tools/Run/Signalp.pm A C:/Projects/Bio/Tools/Run/Simprot.pm A C:/Projects/Bio/Tools/Run/BWA A C:/Projects/Bio/Tools/Run/BWA/Config.pm A C:/Projects/Bio/Tools/Run/Newbler.pm svn: Malformed svndiff data in representation *** Error (took 00:07.184) From David.Messina at sbc.su.se Mon Mar 8 02:01:13 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 8 Mar 2010 08:01:13 +0100 Subject: [Bioperl-l] Installing BioPerl In-Reply-To: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu> References: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu> Message-ID: <0483C203-3E81-4112-877B-BC7A439CB916@sbc.su.se> 
Hey Ernesto,

I'm pretty sure you've got BioPerl version 1.6.0, which is actually more current than 1.5.2 that you were looking for. Due to oddities of Perl version numbers, 1.006 = 1.6.0 (or something like that). So I think you're probably good to go.

I should also mention that direct installation (i.e. not via fink) works pretty well these days, and through that you can get the current BioPerl release, which is 1.6.2 (or 1.006002000000000).

Dave

From alex at bioinf.uni-leipzig.de Mon Mar 8 10:45:14 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Mon, 8 Mar 2010 16:45:14 +0100 (CET)
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
Message-ID:

Hi,

I do have a problem with the PAML/Codeml wrapper. I want to calculate all pairwise K_a,K_s values from a given alignment, using the example procedure of http://www.bioperl.org/wiki/HOWTO:PAML

my $dna_aln = aa_to_dna_aln($aln, \%seqs);
my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new(
    -params => { 'runmode' => -2,
                 'seqtype' => 1,} );
$kaks_factory->alignment($dna_aln);
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result();

But I receive an error:

-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output did not see seqtype
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:359
STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:441
STACK: Bio::Tools::Phylo::PAML::next_result /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:257

I use PAML4.4. Could this be the reason?

Best,
Alex

---
By the time you've read this, you've already read it!

From David.Messina at sbc.su.se Mon Mar 8 11:29:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 8 Mar 2010 17:29:00 +0100
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
In-Reply-To:
References:
Message-ID: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>

Hi Alexander,

Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet.

Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem?

http://bugzilla.open-bio.org/

Thanks,
Dave

From alex at bioinf.uni-leipzig.de Mon Mar 8 12:11:42 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Mon, 8 Mar 2010 18:11:42 +0100 (CET)
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
In-Reply-To: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>
References: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>
Message-ID:

sure. thanks!

alex

On Mon, 8 Mar 2010, Dave Messina wrote:

> Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem?

---
By the time you've read this, you've already read it!

From jovel_juan at hotmail.com Mon Mar 8 23:08:20 2010
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 9 Mar 2010 04:08:20 +0000
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To:
References: ,
Message-ID:

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor, but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From jovel_juan at hotmail.com Mon Mar 8 23:50:45 2010
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 9 Mar 2010 04:50:45 +0000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
References: , , ,
Message-ID:

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From florent.angly at gmail.com Tue Mar 9 01:41:33 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Mar 2010 16:41:33 +1000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
References: , , ,
Message-ID: <4B95ED9D.6080307@gmail.com>

Hi Juan,

How about you throw away sequences that have a mismatch in the adapter? After all, if there is a mismatch in the first few bases, it does not bode well for the rest of the sequence, and there are so many sequences that it is not a big loss.

Florent

On 09/03/10 14:50, Juan Jovel wrote:
> Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline?

From michael.watson at bbsrc.ac.uk Tue Mar 9 01:38:26 2010
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 9 Mar 2010 06:38:26 +0000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
References: , , , ,
Message-ID: <8D08960C647E64438CE5740657CBBDC501F910621D@iahcexch1.iah.bbsrc.ac.uk>

Use fastx toolkit or something within emboss. Failing that, just write something in pure perl :)

From acn at stowers.org Tue Mar 9 01:31:49 2010
From: acn at stowers.org (Noll, Aaron)
Date: Tue, 9 Mar 2010 00:31:49 -0600
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To:
Message-ID:

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

Try out the clipper tool:

FASTA/Q Clipper

$ fastx_clipper -h
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
   [-l N]       = discard sequences shorter than N nucleotides. default is 5.
   [-d N]       = Keep the adapter and N bases after it.
                  (using '-d 0' is the same as not using '-d' at all. which is the default).
   [-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
   [-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
   [-k]         = Report Adapter-Only sequences.
   [-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
   [-v]         = Verbose - report number of sequences.
                  If [-o] is specified, report will be printed to STDOUT.
                  If [-o] is not specified (and output goes to STDOUT),
                  report will be printed to STDERR.
   [-z]         = Compress output with GZIP.
   [-D]         = DEBUG output.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.

This is a suite of nice utilities that can be downloaded and that by the way are also used by galaxy.
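In the spirit of the "pure perl" suggestion in this thread, here is a minimal sketch of 3' adapter trimming that tolerates a fixed number of mismatches. It is an illustration only, not how fastx_clipper or ShortRead work internally: the function name trim_3p_adapter, its parameters, and the adapter and read strings are all made up for the example, and base qualities are ignored. At each position the read's remaining suffix is compared with the adapter prefix of the same length, and the read is cut at the leftmost position where the mismatch count is within the allowed limit.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a 3' adapter from a read, allowing up to
# $max_mismatch mismatches and requiring at least $min_overlap bases of
# overlap between the read's 3' end and the start of the adapter.
sub trim_3p_adapter {
    my ($read, $adapter, $max_mismatch, $min_overlap) = @_;
    my $rlen = length $read;
    my $alen = length $adapter;

    for my $i (0 .. $rlen - $min_overlap) {
        # Compare as much of the adapter as fits in the rest of the read.
        my $overlap = $rlen - $i < $alen ? $rlen - $i : $alen;
        my $window  = substr($read, $i, $overlap);
        my $prefix  = substr($adapter, 0, $overlap);

        # String XOR yields "\0" wherever the two strings agree, so
        # counting the non-NUL bytes gives the number of mismatches.
        my $mismatches = (($window ^ $prefix) =~ tr/\0//c);

        return substr($read, 0, $i) if $mismatches <= $max_mismatch;
    }
    return $read;    # no acceptable adapter match found
}

my $adapter = 'TCGTATGCCG';               # illustrative adapter string
my $read    = 'AAAAACCCCCTCGAATGCCG';     # insert + adapter with one mismatch
print trim_3p_adapter($read, $adapter, 1, 5), "\n";
```

With short overlaps a fixed mismatch limit matches by chance fairly often, so a real tool would likely scale the limit with the overlap length or take base qualities into account; this sketch keeps the limit constant for clarity.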

-Aaron

From alex at bioinf.uni-leipzig.de Tue Mar 9 13:00:01 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Tue, 9 Mar 2010 19:00:01 +0100 (CET)
Subject: [Bioperl-l] bootstrap values in cladogram
Message-ID:

Hi,

using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and try to plot the tree as cladogram. But somehow I cannot print the bootstrap values.

Short example:

test.nwk
((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);

[..]
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;
[..]
my $trees = Bio::TreeIO->new( -file => "test.nwk",
                              -format => 'newick');
my $tree = $trees->next_tree();
[..]
my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1,
                                           -tree => $tree,
                                           -compact => 0);
$out->print(-file => "test.eps");

I already tried it by copying the bootstrap values into the ids of the internal nodes - nothing.

Any suggestions?

Thanks,
Alex

---
By the time you've read this, you've already read it!

From jason at bioperl.org Tue Mar 9 15:49:06 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 09 Mar 2010 12:49:06 -0800
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To:
References:
Message-ID: <4B96B442.8070003@bioperl.org>

SearchIO writer -> BLAST format.

presumably something like Bio::SearchIO::Writer::TextResultWriter

Janine Arloth wrote, On 3/5/10 1:43 AM:
> How can I get this?

From bhakti.dwivedi at gmail.com Tue Mar 9 15:58:34 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Tue, 9 Mar 2010 15:58:34 -0500
Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report?
Message-ID:

Hi,

I have a blastn and blastx report (both in blast table m-8 format) against the ncbi nr database. Based on the Hits Start and End positions, how can I retrieve the gene name/acc/id? The blast table does show the hit organism accession number, but what I want is specifically the gene to which it is hitting to. Is there a way to do this in bioperl?
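As a reference point for this question, here is a minimal pure-Perl sketch that reads one line of -m 8 tabular output and looks up which genes overlap the hit's subject coordinates. The twelve column names follow the standard -m 8 layout; everything else is an assumption made for illustration: the helper names parse_m8_line and genes_overlapping_hit, the shape of the gene table (seq_id, name, start, end), and the sample hit line and gene coordinates, which are made up. A real gene table would have to come from an annotation source covering the subject sequences.

```perl
use strict;
use warnings;

# Column order of BLAST -m 8 (tabular) output.
my @M8_FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                   q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: turn one tab-separated -m 8 line into a hash ref.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    my %hit;
    @hit{@M8_FIELDS} = split /\t/, $line;
    return \%hit;
}

# Hypothetical helper: names of the genes on the hit's subject sequence
# whose [start, end] interval overlaps the hit's subject coordinates.
sub genes_overlapping_hit {
    my ($hit, $genes) = @_;
    # Coordinates may be listed in reverse order, so sort them first.
    my ($lo, $hi) = sort { $a <=> $b } @{$hit}{qw(s_start s_end)};
    return map  { $_->{name} }
           grep { $_->{seq_id} eq $hit->{subject}
                  && $_->{start} <= $hi
                  && $_->{end}   >= $lo }
           @$genes;
}

# Made-up example data: one hit line and a tiny gene table.
my $line  = join "\t", qw(read1 NC_000001 98.50 200 3 0 1 200 1500 1301 1e-100 380);
my @genes = (
    { seq_id => 'NC_000001', name => 'geneA', start => 1000, end => 1400 },
    { seq_id => 'NC_000001', name => 'geneB', start => 1600, end => 2000 },
    { seq_id => 'NC_000002', name => 'geneC', start => 1300, end => 1500 },
);

my $hit = parse_m8_line($line);
print "$hit->{query} on $hit->{subject}: ",
      join(',', genes_overlapping_hit($hit, \@genes)), "\n";
```

The lookup here is a linear scan, which is fine for small tables; for genome-scale annotation an interval index would be the more usual choice.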
Thanks

From David.Messina at sbc.su.se Tue Mar 9 16:39:08 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 9 Mar 2010 22:39:08 +0100
Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report?
In-Reply-To:
References:
Message-ID:

Hi Bhakti,

Forgive me if the below shows that I've totally misunderstood; it's late here.

> The blast table does show the hit organism
> accession number,

As you say, in BLAST -m 8 reports, the hit's accession number is the second column. I'm not sure when this would be different from the gene's accession number, at least for the entries in nr for which a gene name has been assigned (some are known only by their accession number).

> Based on the Hits Start and End positions, how can I
> retrieve the gene name/acc/id?

The short answer is 'you can't'.

But this makes me think that you're not going against the nr database, but instead whole genome or chromosome sequence records. In which case some of them will have genes annotated in the feature table, which you can get out using BioPerl:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

But many (most?) won't be annotated in this way, in which case you will need to find some file or database that has all the genes' start and stop positions in the sequence that you're searching.

Perhaps you could provide a couple of your hits as examples so the problem is clearer?

Dave

From till.bayer at kaust.edu.sa Wed Mar 10 03:20:15 2010
From: till.bayer at kaust.edu.sa (Till Bayer)
Date: Wed, 10 Mar 2010 11:20:15 +0300
Subject: [Bioperl-l] Bio::Index::Blast bug
Message-ID: <4B97563F.3020901@kaust.edu.sa>

Hi all!

I tried to use Bio::Index::Blast, but always got the first hit back, no matter what ID I used. The reason is that the Blast indexer seems to use 'BLAST' as a record separator in all cases, except for RPS-BLAST. I think however that for the current versions of blastall and blast+ 'Query=' should be used.
Thus, changing line 222 in Blast.pm from

$indexpoint = tell($BLAST) - length $_ if ( $prefix eq 'RPS-' );

to

$indexpoint = tell($BLAST) - length $_;

makes it work for me. However I have no idea what RPS-BLAST may be, or what different versions of blast output are used, so maybe someone who knows should have a look at that before changing things, and write a cleaner version than the above hack.

Cheers,
Till

--
Till Bayer
4700 King Abdullah University for Science and Technology
Building 2, Room 4231-W16
Thuwal 23955-6900
Saudi Arabia
Phone: +96628082373

From avilella at gmail.com Wed Mar 10 03:55:09 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 10 Mar 2010 08:55:09 +0000
Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores
Message-ID: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com>

Hi all,

I would like to know if anyone knows of a script or method in bioperl to do an unambiguous assembly of fastq sequences, combining the q-scores to give assembled fastq sequences as the output.

By unambiguous I mean something like what abyss would produce with these options:

ABYSS -k$k -b0 -t0 -e0 -c0

but giving assembled fastq sequences with combined q-scores as output instead of simple fasta assembled sequences.
Thanks in advance From sdavis2 at mail.nih.gov Wed Mar 10 05:31:50 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 10 Mar 2010 05:31:50 -0500 Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores In-Reply-To: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> Message-ID: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote: > Hi all, > > I would like to know if anyone knows of a script or method in bioperl > to do an unambiguous assembly of fastq sequences, combining the q-scores to > give assembled fastq sequences as the output. > > By unambiguous I mean something like what abyss would produce with this options: > > ABYSS -k$k -b0 -t0 -e0 -c0 > > but giving assembled fastq sequences with combined q-scores as output > instead of simple > fasta assembled sequences. Hi, Albert. I'm not sure exactly what you want here, but have you looked at the Mosaik aligner? Also, look at samtools pileup; you can probably produce something similar to what you want from it as well. I certainly might have misunderstood the problem, though. Sean From biopython at maubp.freeserve.co.uk Wed Mar 10 05:35:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Mar 2010 10:35:56 +0000 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: <4B97563F.3020901@kaust.edu.sa> References: <4B97563F.3020901@kaust.edu.sa> Message-ID: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: > Hi all! > > I tried to use Bio::Index::Blast, but always got the first hit back, no > matter what ID I used. The reason is that the Blast indexer seems to use > 'BLAST' as a record separator in all cases, except for RPS-BLAST. > I think however that for the current versions of blastall and blast+ > 'Query=' should be used. 
That fits with changes I had to make in Biopython for breaking up the plain text BLAST output into each query. For a while only the RPS-BLAST report omitted the "header" (the BLAST line and the journal references users should cite) between records, but now all the NCBI BLAST tools do this - forcing us to look for the Query= line. i.e. I can't comment on the BioPerl change itself, but your reasoning about the BLAST output makes sense. Peter From avilella at gmail.com Wed Mar 10 05:47:01 2010 From: avilella at gmail.com (Albert Vilella) Date: Wed, 10 Mar 2010 10:47:01 +0000 Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores In-Reply-To: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com> Message-ID: <358f4d651003100247k789344a2m2decd7283e658de9@mail.gmail.com> Hi Sean, By unambiguous assembly of reads I mean that one would not squash bubbles or trim branches, but simply collapse fully overlapping (embedded) reads by combining the q-scores, or raising the q-scores if you want, and keeping branching graphs separate. This unambiguous denovo assembly would discard depth information, which is important if you are doing digital gene expression analysis, but would produce a collapsed fastq set of sequences that would be leaner for downstream processing. I'll have a look at Mosaik. I tried samtools pileup, but it seems a bit overcomplicated to have to map back the reads if what you want to do is just have the assembled reads with fastq scores coming out of the assembler in the first place. That's why I was thinking it would be good to have this unambiguous or "dummy" fastq assembly output could fit into a bioperl script or method. 
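The collapsing step described here (merging fully embedded reads and keeping the better quality at each position) can be sketched in plain Perl. This is an illustration, not an existing BioPerl method: the function name collapse_embedded, the [sequence, quality] pair layout and the "keep the higher score" rule are all assumptions. Qualities are assumed to be Sanger-style ASCII strings, and neither reverse complements nor reads that fit in more than one place are handled.

```perl
use strict;
use warnings;

# Collapse reads that are fully contained in a longer (or identical) read,
# keeping the higher quality character at each overlapping position.
# Reads are [sequence, quality] pairs; with Sanger-style ASCII qualities,
# comparing characters is the same as comparing Phred scores.
sub collapse_embedded {
    # Longest reads first, so containers are seen before what they contain.
    my @reads = sort { length($b->[0]) <=> length($a->[0]) } @_;
    my @kept;
    READ: for my $read (@reads) {
        my ($seq, $qual) = @$read;
        for my $target (@kept) {
            my $offset = index($target->[0], $seq);   # first placement only
            next if $offset < 0;
            # Merge qualities position by position (maximum of the two).
            for my $i (0 .. length($seq) - 1) {
                my $q = substr($qual, $i, 1);
                substr($target->[1], $offset + $i, 1, $q)
                    if $q gt substr($target->[1], $offset + $i, 1);
            }
            next READ;                                # read absorbed
        }
        push @kept, [$seq, $qual];                    # not embedded anywhere
    }
    return @kept;
}

my @collapsed = collapse_embedded(
    ['ACGTACGT', 'IIII####'],
    ['TACG',     'IIII'],
    ['GGGG',     '!!!!'],
);

# Write the survivors as FASTQ records with made-up names read1, read2, ...
printf "\@read%d\n%s\n+\n%s\n", $_ + 1, @{ $collapsed[$_] } for 0 .. $#collapsed;
```

In this toy input TACG is absorbed into ACGTACGT and lifts four of its low-quality positions, while GGGG stays separate. Real data would also need a rule for partially overlapping reads, which is exactly where the branching-graph question comes in.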
Cheers

On Wed, Mar 10, 2010 at 10:31 AM, Sean Davis wrote:
> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote:
>> Hi all,
>>
>> I would like to know if anyone knows of a script or method in bioperl
>> to do an unambiguous assembly of fastq sequences, combining the q-scores to
>> give assembled fastq sequences as the output.
>>
>> By unambiguous I mean something like what abyss would produce with this options:
>>
>> ABYSS -k$k -b0 -t0 -e0 -c0
>>
>> but giving assembled fastq sequences with combined q-scores as output
>> instead of simple
>> fasta assembled sequences.
>
> Hi, Albert.
>
> I'm not sure exactly what you want here, but have you looked at the
> Mosaik aligner? Also, look at samtools pileup; you can probably
> produce something similar to what you want from it as well.
>
> I certainly might have misunderstood the problem, though.
>
> Sean
>

From adsj at novozymes.com Wed Mar 10 08:46:02 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 10 Mar 2010 14:46:02 +0100
Subject: [Bioperl-l] [PATCH] Fix infinite loop in EMBL writer.
Message-ID: <87k4tke1d1.fsf@topper.koldfront.dk>

This fix is an exact duplicate of the fix for bug #2915 - of the Genbank writer, which was fixed in revision 16275.
---
 Bio/SeqIO/embl.pm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Bio/SeqIO/embl.pm b/Bio/SeqIO/embl.pm
index cfea1b6..de1bf11 100644
--- a/Bio/SeqIO/embl.pm
+++ b/Bio/SeqIO/embl.pm
@@ -1432,7 +1432,7 @@ sub _write_line_EMBL_regex {
     CHUNK: while($line) {
         foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
-            if ($line =~ m/^(.{1,$subl})($pat)(.*)/ ) {
+            if ($line =~ m/^(.{0,$subl})($pat)(.*)/ ) {
                 my $l = $1.$2;
                 $l =~ s/#/ /g  # remove word wrap protection char '#'
                     if $pre1 eq "RA ";
@@ -1441,6 +1441,7 @@ sub _write_line_EMBL_regex {
                 # be strict about not padding spaces according to
                 # genbank format
                 $l =~ s/\s+$//;
+                next CHUNK if ($l eq '');
                 push(@lines, $l);
                 next CHUNK;
             }
--
1.6.3.3

--
 Adam Sjøgren
 adsj at novozymes.com

From cjfields at illinois.edu Wed Mar 10 09:27:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 08:27:59 -0600
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
Message-ID:

On Mar 10, 2010, at 4:35 AM, Peter wrote:

> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>> Hi all!
>>
>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>> matter what ID I used. The reason is that the Blast indexer seems to use
>> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
>> I think however that for the current versions of blastall and blast+
>> 'Query=' should be used.
>
> That fits with changes I had to make in Biopython for breaking
> up the plain text BLAST output into each query. For a while only
> the RPS-BLAST report omitted the "header" (the BLAST line
> and the journal references users should cite) between records,
> but now all the NCBI BLAST tools do this - forcing us to look
> for the Query= line.
>
> i.e.
I can't comment on the BioPerl change itself, but your > reasoning about the BLAST output makes sense. > > Peter One side-effect of this is we will be missing the search algorithm and a few small odds and ends from all but the first report; this trickles down into how we properly deal with HSP coordinates, but we can probably wrangle some magic there to get things working for the most part. This is similar to how XML format is currently dealt with (and another reason this format is the easiest to support, as it doesn't change based on NCBI's whims). Do we have example reports with multiple queries from BLAST+ available? It would be invaluable for the projects; if not I can probably generate a few locally. chris From biopython at maubp.freeserve.co.uk Wed Mar 10 09:40:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Mar 2010 14:40:16 +0000 Subject: [Bioperl-l] Bio::Index::Blast bug In-Reply-To: References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> Message-ID: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com> On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote: > On Mar 10, 2010, at 4:35 AM, Peter wrote: > >> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote: >>> Hi all! >>> >>> I tried to use Bio::Index::Blast, but always got the first hit back, no >>> matter what ID I used. The reason is that the Blast indexer seems to use >>> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >>> I think however that for the current versions of blastall and blast+ >>> 'Query=' should be used. >> >> That fits with changes I had to make in Biopython for breaking >> up the plain text BLAST output into each query. For a while only >> the RPS-BLAST report omitted the "header" (the BLAST line >> and the journal references users should cite) between records, >> but now all the NCBI BLAST tools do this - forcing us to look >> for the Query= line. >> >> i.e. 
I can't comment on the BioPerl change itself, but your
>> reasoning about the BLAST output makes sense.
>>
>> Peter
>
> One side-effect of this is we will be missing the search
> algorithm and a few small odds and ends from all but
> the first report; this trickles down into how we properly
> deal with HSP coordinates, but we can probably wrangle
> some magic there to get things working for the most part.
> ...

Yeah - I had similar issues with the Biopython plain text BLAST parser. The hack/magic I used was to cache the header text from the first record and then re-insert it on subsequent records. Nasty, but works.

> This is similar to how XML format is currently dealt with
> (and another reason this format is the easiest to support,
> as it doesn't change based on NCBI's whims).

They may have changed a few things here too - watch out.

> Do we have example reports with multiple queries from
> BLAST+ available? It would be invaluable for the projects;
> if not I can probably generate a few locally.

I've got one example in Biopython's unit tests,
http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt

Peter

From cjfields at illinois.edu Wed Mar 10 10:19:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 09:19:42 -0600
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>
Message-ID: <27C91884-E910-4BDF-B777-B90E7B4F9103@illinois.edu>

On Mar 10, 2010, at 8:40 AM, Peter wrote:

> On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote:
>> On Mar 10, 2010, at 4:35 AM, Peter wrote:
>>
>>> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>>>> Hi all!
>>>>
>>>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>>>> matter what ID I used.
The reason is that the Blast indexer seems to use >>>> 'BLAST' as a record separator in all cases, except for RPS-BLAST. >>>> I think however that for the current versions of blastall and blast+ >>>> 'Query=' should be used. >>> >>> That fits with changes I had to make in Biopython for breaking >>> up the plain text BLAST output into each query. For a while only >>> the RPS-BLAST report omitted the "header" (the BLAST line >>> and the journal references users should cite) between records, >>> but now all the NCBI BLAST tools do this - forcing us to look >>> for the Query= line. >>> >>> i.e. I can't comment on the BioPerl change itself, but your >>> reasoning about the BLAST output makes sense. >>> >>> Peter >> >> One side-effect of this is we will be missing the search >> algorithm and a few small odds and ends from all but >> the first report; this trickles down into how we properly >> deal with HSP coordinates, but we can probably wrangle >> some magic there to get things working for the most part. >> ... > > Yeah - I had similar issues with the Biopython plain > text BLAST parser. The hack/magic I used was to > cache the header text from the first record and then > re-insert it on subsequence records. Nasty, but works. Right, but here's the side-effect: unless that data is somehow stored when indexing, it will not be caught if one starts an IO stream at any point past the BLAST header (in other words, all but the first report). We could, in effect, store that as meta information somehow (I think Index may have some meta storage), or just parse it prior to initiating the stream and pass the information into the IO object. >> This is similar to how XML format is currently dealt with >> (and another reason this format is the easiest to support, >> as it doesn't change based on NCBI's whims). > > They may have changed a few things here too - watch out. Ugh. >> Do we have example reports with multiple queries from >> BLAST+ available? 
It would be invaluable for the projects; >> if not I can probably generate a few locally. > > I've got one example in Biopython's unit tests, > http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt > > Peter Okay, will start up some work to work out tests, etc. chris From thomas.sharpton at gmail.com Wed Mar 10 10:30:37 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 10 Mar 2010 07:30:37 -0800 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser Message-ID: Hey everyone, Since HMMER version 3 went live in the middle of last month, I thought it a good time to update the SearchIO parser I've been working on for some time and submit the tool to the community (finally....). At the moment, the module seems capable of parsing hmmsearch and hmmscan outputs, both with and without the alignment option. Some aspects of functionality have yet to be flushed out, but this one should be capable of doing most of your day to day procedures (at least it appears to on my end). I'd love to have people play with it and I'm happy to hear feedback, criticism, development requests and bug reports. That said, this is the first code I've contributed to BioPerl, so please be gentle ;). You can find the bioperl-hmmer3 package in bioperl-dev. I've included a test script as well as sample hmmscan/hmmsearch report files and test data in the bioperl-hmmer3 root directory. As an aside, BioPerl has been a wonderful resource for me and I'm glad to be giving back, even if only a little. I hope this helps out at least a few of you. All the best, Tom From cjfields at illinois.edu Wed Mar 10 10:53:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Mar 2010 09:53:41 -0600 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser In-Reply-To: References: Message-ID: <1268236421.20872.21.camel@pyrimidine.igb.uiuc.edu> Wonderful! Tom, thanks for your hard work! 
chris

On Wed, 2010-03-10 at 07:30 -0800, Thomas Sharpton wrote:
> Hey everyone,
>
> Since HMMER version 3 went live in the middle of last month, I thought
> it a good time to update the SearchIO parser I've been working on for
> some time and submit the tool to the community (finally....). At the
> moment, the module seems capable of parsing hmmsearch and hmmscan
> outputs, both with and without the alignment option. Some aspects of
> functionality have yet to be flushed out, but this one should be
> capable of doing most of your day to day procedures (at least it
> appears to on my end).
>
> I'd love to have people play with it and I'm happy to hear feedback,
> criticism, development requests and bug reports. That said, this is
> the first code I've contributed to BioPerl, so please be gentle ;).
> You can find the bioperl-hmmer3 package in bioperl-dev. I've included
> a test script as well as sample hmmscan/hmmsearch report files and
> test data in the bioperl-hmmer3 root directory.
>
> As an aside, BioPerl has been a wonderful resource for me and I'm glad
> to be giving back, even if only a little. I hope this helps out at
> least a few of you.
>
> All the best,
> Tom
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From asjo at koldfront.dk Wed Mar 10 12:04:00 2010
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Wed, 10 Mar 2010 18:04:00 +0100
Subject: [Bioperl-l] Fix infinite loop in EMBL writer.
In-Reply-To: <87k4tke1d1.fsf@topper.koldfront.dk> ("Adam Sjøgren"'s message of "Wed, 10 Mar 2010 14:46:02 +0100")
References: <87k4tke1d1.fsf@topper.koldfront.dk>
Message-ID: <87wrxkw1kv.fsf@topper.koldfront.dk>

On Wed, 10 Mar 2010 14:46:02 +0100, Adam wrote:

> This fix is an exact duplicate of the fix for bug #2915 - of
> the Genbank writer, which was fixed in revision 16275.
I have created bug #3025 in bugzilla with the patch (I couldn't remember whether here or there is most appropriate).

  Best regards,

    Adam

--
 "It isn't modern just because it's electric. Country      Adam Sjøgren
  music was electric too."                            asjo at koldfront.dk

From David.Messina at sbc.su.se Wed Mar 10 12:35:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 10 Mar 2010 18:35:52 +0100
Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser
In-Reply-To:
References:
Message-ID:

Thanks so much, Thomas!

I expect to be using Hmmer 3 for my own work fairly soon, so I'm looking forward to taking advantage of this.

Dave

From rmb32 at cornell.edu Wed Mar 10 15:13:57 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 10 Mar 2010 12:13:57 -0800
Subject: [Bioperl-l] call for help - BioPerl GSoC wiki page
Message-ID: <4B97FD85.50402@cornell.edu>

Hi all,

BioPerl's Google Summer of Code page in support of the Open Bioinformatics Foundation's application to Google Summer of Code is shaping up, but still needs some polishing. We're coming up on the application deadline, and we need to make a good, polished show of it.

Please put in a little time to look at, edit, polish, and flesh out the BioPerl and OBF wiki pages in support of our application:

BioPerl: http://bioperl.org/wiki/Google_Summer_of_Code
OBF: http://open-bio.org/wiki/Google_Summer_of_Code

One specific thing for the BioPerl page: the Bio::Assembly project on that page needs to either be fleshed out or removed.

Thanks for all the hard work from everyone so far (especially Chris!).

It would be *very* good to have some more project ideas and mentor volunteers. So if you haven't already, please consider volunteering to mentor a student. Also, we all know many things that BioPerl needs help with, so if you can think of a good intern project, add it to the page and maybe we can get a GSoC student to work on it.
Rob From nml5566 at gmail.com Wed Mar 10 17:52:19 2010 From: nml5566 at gmail.com (Nathan Liles) Date: Wed, 10 Mar 2010 16:52:19 -0600 Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks? Message-ID: <4B9822A3.2050202@gmail.com> I'm trying to patch Gbrowse to properly display circular segments. Currently, I'm working on getting the protein glyphs to display properly beyond the end of the track. I noticed when I turn on the protein track, it can sometimes affect another track. Specifically, turning on the protein track can either cause the gene glyphs to disappear or be duplicated. This only happens for features with two subfeatures that appear on the panel at opposite ends. This seems strange since I can't imagine how one track could affect another. Has anyone noticed this behavior before? Can anybody think of a way that the protein glyph module can affect other glyphs? Thanks, Nathan Liles From me at miguel.weapps.com Thu Mar 11 00:48:17 2010 From: me at miguel.weapps.com (Luis M Rodriguez-R) Date: Thu, 11 Mar 2010 00:48:17 -0500 Subject: [Bioperl-l] PSI-BLAST uncommon result Message-ID: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> Hello all, I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception: MSG: no data for midline ***** No hits found ****** STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792 My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail. Best regards, Luis M. 
Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com


BLASTP 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.


Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.


Reference for composition-based statistics starting in round 2:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.

Query= eff254
         (67 letters)

Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
           10,383,435 sequences; 3,542,477,638 total letters

Searching..................................................done


Results from round 1


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...
127 5e-28 >ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin hrpA gb|ABA39805.1| HrpA [Erwinia pyrifoliae] emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM 12163] Length = 67 Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust. Identities = 67/67 (100%), Positives = 67/67 (100%) Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 Query: 61 AAKAIQF 67 AAKAIQF Sbjct: 61 AAKAIQF 67 Searching..................................................done ***** No hits found ****** Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 863,709,833 Number of sequences in database: 2,562,282 Database: /storage1/databases/ncbi-blast/nr.01 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 936,189,781 Number of sequences in database: 2,674,439 Database: /storage1/databases/ncbi-blast/nr.02 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 974,890,473 Number of sequences in database: 2,826,395 Database: /storage1/databases/ncbi-blast/nr.03 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 767,687,551 Number of sequences in database: 2,320,319 Lambda K H 0.297 0.107 0.256 Lambda K H 0.267 0.0344 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 480,706,425 Number of Sequences: 10383435 Number of extensions: 8598061 Number of successful extensions: 47335 Number of sequences 
better than 1.0e-25: 1 Number of HSP's better than 0.0 without gapping: 2 Number of HSP's successfully gapped in prelim test: 0 Number of HSP's that attempted gapping in prelim test: 47333 Number of HSP's gapped (non-prelim): 2 length of query: 67 length of database: 3,542,477,638 effective HSP length: 39 effective length of query: 28 effective length of database: 3,137,523,673 effective search space: 87850662844 effective search space used: 87850662844 T: 11 A: 40 X1: 16 ( 6.9 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 43 (21.7 bits) S2: 298 (119.7 bits) From jason at bioperl.org Thu Mar 11 03:13:24 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 00:13:24 -0800 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: References: Message-ID: <4B98A624.7020102@bioperl.org> not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function. See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to boostrap slots basically by using -internal_node_id => 'bootstrap' in the TreeIO initialization. You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too. -jason Alexander Donath wrote, On 3/9/10 10:00 AM: > Hi, > > using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and > bootstrap values and try to plot the tree as cladogram. But somehow I > cannot print the bootstrap values. > > Short example: > > test.nwk > ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); > > > > [..] > use Bio::TreeIO; > use Bio::Tree::Draw::Cladogram; > [..] > my $trees = Bio::TreeIO->new( -file => "test.nwk", > -format => 'newick'); > my $tree = $trees->next_tree(); > [..] 
> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, > -tree => $tree, > -compact => 0); > > $out->print(-file => "test.eps"); > > > I already tried it by copying the bootstrap values into the ids of the > internal nodes - nothing. Any suggestions? > > > Thanks, > Alex > > --- > By the time you've read this, you've already read it! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 11 09:27:33 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 08:27:33 -0600 Subject: [Bioperl-l] PSI-BLAST uncommon result In-Reply-To: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> References: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> Message-ID: <70AF1FA5-FD88-48E3-A672-F72B9D3E1B3B@illinois.edu> Luis, The best way to handle this is to attach the problematic report (not append it) to a bug report on bugzilla. This ensures we aren't running into artifacts generated via the email client, etc. chris On Mar 10, 2010, at 11:48 PM, Luis M Rodriguez-R wrote: > Hello all, > > I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception: > MSG: no data for midline ***** No hits found ****** > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792 > My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail. > > Best regards, > > Luis M. 
Rodriguez-R
> [http://bioinf.uniandes.edu.co/~miguel/]
> ---------------------------------
> Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
> Universidad de Los Andes, Colombia
> [http://bioinf.uniandes.edu.co]
>
> + 57 1 3394949 ext 2619
> luisrodr at uniandes.edu.co
> me at miguel.weapps.com
>
>
> BLASTP 2.2.18 [Mar-02-2008]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
>
> Reference for compositional score matrix adjustment: Altschul, Stephen F.,
> John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
> Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
> using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
>
>
> Reference for composition-based statistics starting in round 2:
> Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
> Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
> Eugene V. Koonin, and Stephen F. Altschul (2001),
> "Improving the accuracy of PSI-BLAST protein database searches with
> composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
>
> Query= eff254
>          (67 letters)
>
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> from WGS projects
>            10,383,435 sequences; 3,542,477,638 total letters
>
> Searching..................................................done
>
>
> Results from round 1
>
>
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value
>
> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...
127 5e-28 > >> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation > pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] > sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin > hrpA > gb|ABA39805.1| HrpA [Erwinia pyrifoliae] > emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation > pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] > emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM > 12163] > Length = 67 > > Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust. > Identities = 67/67 (100%), Positives = 67/67 (100%) > > Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 > MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN > Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 > > Query: 61 AAKAIQF 67 > AAKAIQF > Sbjct: 61 AAKAIQF 67 > > > Searching..................................................done > > > > ***** No hits found ****** > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 863,709,833 > Number of sequences in database: 2,562,282 > > Database: /storage1/databases/ncbi-blast/nr.01 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 936,189,781 > Number of sequences in database: 2,674,439 > > Database: /storage1/databases/ncbi-blast/nr.02 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 974,890,473 > Number of sequences in database: 2,826,395 > > Database: /storage1/databases/ncbi-blast/nr.03 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 767,687,551 > Number of sequences in database: 2,320,319 > > Lambda K H > 0.297 0.107 0.256 > > Lambda K H > 0.267 0.0344 0.140 > > > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Hits to DB: 
480,706,425 > Number of Sequences: 10383435 > Number of extensions: 8598061 > Number of successful extensions: 47335 > Number of sequences better than 1.0e-25: 1 > Number of HSP's better than 0.0 without gapping: 2 > Number of HSP's successfully gapped in prelim test: 0 > Number of HSP's that attempted gapping in prelim test: 47333 > Number of HSP's gapped (non-prelim): 2 > length of query: 67 > length of database: 3,542,477,638 > effective HSP length: 39 > effective length of query: 28 > effective length of database: 3,137,523,673 > effective search space: 87850662844 > effective search space used: 87850662844 > T: 11 > A: 40 > X1: 16 ( 6.9 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 43 (21.7 bits) > S2: 298 (119.7 bits) > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Mar 11 10:38:50 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 07:38:50 -0800 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: References: <4B98A624.7020102@bioperl.org> Message-ID: <4B990E8A.5060704@bioperl.org> Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. -jason Alexander Donath wrote, On 3/11/10 1:05 AM: > I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. > > Nevertheless, iterating through the tree and printing $node->bootstrap > worked in both cases and gave me the correct bootstrap values of the > inner nodes. > > I also called move_id_to_bootstrap on the tree. But this resulted in > an error: > > Can't locate object method "move_id_to_bootstrap" via package > "Bio::Tree::Tree". > Even though it's inherited from the interface, as far as I can tell. 
>
>
> alex
>
>
> On Thu, 11 Mar 2010, Jason Stajich wrote:
>
>> not sure if the cladogram is printing bootstraps from the internal id
>> or the bootstrap function.
>>
>> See the example code here http://bioperl.org/wiki/HOWTO:Trees that
>> shows how to automatically convert internal IDs to bootstrap slots
>> basically by using
>> -internal_node_id => 'bootstrap'
>> in the TreeIO initialization.
>>
>> You may want to iterate through the tree and print $node->bootstrap
>> where you think it should be so you can verify that it is working too.
>>
>> -jason
>>
>> Alexander Donath wrote, On 3/9/10 10:00 AM:
>>> Hi,
>>>
>>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths
>>> and bootstrap values and try to plot the tree as cladogram. But
>>> somehow I cannot print the bootstrap values.
>>>
>>> Short example:
>>>
>>> test.nwk
>>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);
>>>
>>>
>>>
>>>
>>> [..]
>>> use Bio::TreeIO;
>>> use Bio::Tree::Draw::Cladogram;
>>> [..]
>>> my $trees = Bio::TreeIO->new( -file   => "test.nwk",
>>>                               -format => 'newick');
>>> my $tree = $trees->next_tree();
>>> [..]
>>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1,
>>>                                            -tree      => $tree,
>>>                                            -compact   => 0);
>>>
>>> $out->print(-file => "test.eps");
>>>
>>>
>>> I already tried it by copying the bootstrap values into the ids of the
>>> internal nodes - nothing. Any suggestions?
>>>
>>>
>>> Thanks,
>>> Alex
>>>
>>> ---
>>> By the time you've read this, you've already read it!
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> ---
> Alexander Donath
> Professur für Bioinformatik
> Institut für Informatik
> Universität Leipzig
> Härtelstr. 16-18
> D-04107 Leipzig, Germany
>
> phone: +49 (0)341 97-16702
> fax: +49 (0)341 97-16679
>
> By the time you've read this, you've already read it!
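
For anyone following this thread who wants to confirm what support values the Newick file itself carries, independently of BioPerl and of the drawing code, here is a minimal sketch in plain Perl. It assumes the support value is written in square brackets after the branch length of an internal node, as in the test.nwk line quoted above. The sub name bracketed_support_values is made up for illustration and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): collect the values written
# in square brackets directly after an internal node, i.e. after a closing
# parenthesis, an optional node label and an optional ":branch_length".
sub bracketed_support_values {
    my ($newick) = @_;
    my @values;
    while ($newick =~ /\)[^,()\[\];:]*(?::[-+0-9.eE]+)?\[([^\]]*)\]/g) {
        push @values, $1;
    }
    return @values;
}

# The tree from the thread: one internal node carrying the value 879.
my $newick = '((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);';

print "$_\n" for bracketed_support_values($newick);   # prints: 879
```

If this prints the expected values and $node->bootstrap does too, the input and the parser are probably fine, which would point at the drawing step, as already suspected in the thread.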
From jason at bioperl.org Thu Mar 11 10:40:59 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 11 Mar 2010 07:40:59 -0800
Subject: [Bioperl-l] distances between leaf nodes
In-Reply-To:
References:
Message-ID: <4B990F0B.8010100@bioperl.org>

You should pass only TWO nodes in the array, not all the leaves. To get
every pairwise distance, loop over the leaves and call distance() once
per pair.

=head2 distance

 Title   : distance
 Usage   : distance(-nodes => \@nodes )
 Function: returns the distance between TWO given nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test
           or ($node1, $node2)

=cut

Jeffrey Detras wrote, On 3/4/10 10:17 PM:
> Hi,
>
> I am new at using the Bio::TreeIO module specifically using the newick
> format for a phylogenetic analysis. The sample_tree attached is
> Newick-formatted tree. My objective is to get all the distances between all
> the leaf nodes. I copied examples of the code from
> http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my
> knowledge) so that I understand how to assign the right array value for the
> nodes/leaves. The message would say must provide 2 root nodes.
>
> Here is what I have right now:
>
> #!/usr/bin/perl -w
> use strict;
>
> my $treefile = 'sample_tree';
> use Bio::TreeIO;
> my $treeio = Bio::TreeIO->new(-format => 'newick',
>                               -file => $treefile);
>
> while (my $tree = $treeio->next_tree) {
>     my @leaves = $tree->get_leaf_nodes;
>     for (my $dist = $tree->distance(-nodes => \@leaves)){
>         print "Distance between trees is $dist\n";
>     }
> }
>
> Thanks,
> Jeff
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Thu Mar 11 11:11:04 2010
From: scott at scottcain.net (Scott Cain)
Date: Thu, 11 Mar 2010 11:11:04 -0500
Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks?
In-Reply-To: <4B9822A3.2050202@gmail.com> References: <4B9822A3.2050202@gmail.com> Message-ID: <4536f7701003110811s79c30638x100ae521bce1084a@mail.gmail.com> Hi Nathan, Well, it certainly shouldn't! The tracks are supposed to be calculated independently without reusing anything. Debugging should be fun though. Does it matter if you change the adaptor (for instance, if you are using the memory adaptor for Bio::DB::SeqFeature::Store, try putting it in a mysql database (or vice versa) to help narrow down where the bug is. Scott On Wed, Mar 10, 2010 at 5:52 PM, Nathan Liles wrote: > I'm trying to patch Gbrowse to properly display circular segments. > Currently, I'm working on getting the protein glyphs to display properly > beyond the end of the track. > > I noticed when I turn on the protein track, it can sometimes affect another > track. Specifically, turning on the protein track can either cause the gene > glyphs to disappear or be duplicated. > This only happens for features with two subfeatures that appear on the panel > at opposite ends. > > This seems strange since I can't imagine how one track could affect another. > Has anyone noticed this behavior before? > Can anybody think of a way that the protein glyph module can affect other > glyphs? > > Thanks, > Nathan Liles > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu Mar 11 11:21:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 10:21:02 -0600 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B990E8A.5060704@bioperl.org> References: <4B98A624.7020102@bioperl.org> <4B990E8A.5060704@bioperl.org> Message-ID: <2BBC0220-4233-4EB7-81A8-FA8342ED9714@illinois.edu> Alex, The best thing to do is to file this as a bug so we don't lose track of it, including demonstration code. chris On Mar 11, 2010, at 9:38 AM, Jason Stajich wrote: > Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. > > I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. > -jason > > Alexander Donath wrote, On 3/11/10 1:05 AM: >> I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. >> >> Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. >> >> I also called move_id_to_bootstrap on the tree. But this resulted in an error: >> >> Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". >> Even though it's inherited from the interface, as far as I can tell. >> >> >> alex >> >> >> On Thu, 11 Mar 2010, Jason Stajich wrote: >> >>> not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function. >>> >>> See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to boostrap slots basically by using >>> -internal_node_id => 'bootstrap' >>> in the TreeIO initialization. 
>>>
>>> You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too.
>>>
>>> -jason
>>>
>>> Alexander Donath wrote, On 3/9/10 10:00 AM:
>>>> Hi,
>>>>
>>>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and try to plot the tree as cladogram. But somehow I cannot print the bootstrap values.
>>>>
>>>> Short example:
>>>>
>>>> test.nwk
>>>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);
>>>>
>>>>
>>>>
>>>> [..]
>>>> use Bio::TreeIO;
>>>> use Bio::Tree::Draw::Cladogram;
>>>> [..]
>>>> my $trees = Bio::TreeIO->new( -file   => "test.nwk",
>>>>                               -format => 'newick');
>>>> my $tree = $trees->next_tree();
>>>> [..]
>>>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1,
>>>>                                            -tree      => $tree,
>>>>                                            -compact   => 0);
>>>>
>>>> $out->print(-file => "test.eps");
>>>>
>>>>
>>>> I already tried it by copying the bootstrap values into the ids of the
>>>> internal nodes - nothing. Any suggestions?
>>>>
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> ---
>>>> By the time you've read this, you've already read it!
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> ---
>> Alexander Donath
>> Professur für Bioinformatik
>> Institut für Informatik
>> Universität Leipzig
>> Härtelstr. 16-18
>> D-04107 Leipzig, Germany
>>
>> phone: +49 (0)341 97-16702
>> fax: +49 (0)341 97-16679
>>
>> By the time you've read this, you've already read it!
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Mon Mar 8 16:06:11 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 08 Mar 2010 16:06:11 -0500 Subject: [Bioperl-l] Next Gen Formats Message-ID: <4B9566C3.6000007@umdnj.edu> Does Bioperl support color-space sequences, or FASTA formatted quality value files? ABI's Solid platform generates a number of files, two of which are fairly important (at the moment): 1) .csfasta Color-space sequences in FASTA format 2) .qual Quality values of each color call, also in FASTA format. I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction? Ryan -------------- next part -------------- A non-text attachment was scrubbed... Name: golharam.vcf Type: text/x-vcard Size: 379 bytes Desc: not available URL: From alex at bioinf.uni-leipzig.de Thu Mar 11 04:05:13 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Thu, 11 Mar 2010 10:05:13 +0100 (CET) Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B98A624.7020102@bioperl.org> References: <4B98A624.7020102@bioperl.org> Message-ID: I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. I also called move_id_to_bootstrap on the tree. But this resulted in an error: Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". Even though it's inherited from the interface, as far as I can tell. alex On Thu, 11 Mar 2010, Jason Stajich wrote: > not sure if the cladogram is printing bootstraps from the internal id or the > bootstrap function. 
>
> See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how
> to automatically convert internal IDs to bootstrap slots basically by using
> -internal_node_id => 'bootstrap'
> in the TreeIO initialization.
>
> You may want to iterate through the tree and print $node->bootstrap where you
> think it should be so you can verify that it is working too.
>
> -jason
>
> Alexander Donath wrote, On 3/9/10 10:00 AM:
>> Hi,
>>
>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and
>> bootstrap values and try to plot the tree as cladogram. But somehow I
>> cannot print the bootstrap values.
>>
>> Short example:
>>
>> test.nwk
>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);
>>
>>
>>
>> [..]
>> use Bio::TreeIO;
>> use Bio::Tree::Draw::Cladogram;
>> [..]
>> my $trees = Bio::TreeIO->new( -file   => "test.nwk",
>>                               -format => 'newick');
>> my $tree = $trees->next_tree();
>> [..]
>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1,
>>                                            -tree      => $tree,
>>                                            -compact   => 0);
>>
>> $out->print(-file => "test.eps");
>>
>>
>> I already tried it by copying the bootstrap values into the ids of the
>> internal nodes - nothing. Any suggestions?
>>
>>
>> Thanks,
>> Alex
>>
>> ---
>> By the time you've read this, you've already read it!
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

---
Alexander Donath
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig, Germany

phone: +49 (0)341 97-16702
fax: +49 (0)341 97-16679

By the time you've read this, you've already read it!
From Alexander.Kanapin at oicr.on.ca Thu Mar 11 10:56:41 2010
From: Alexander.Kanapin at oicr.on.ca (Alexander Kanapin)
Date: Thu, 11 Mar 2010 10:56:41 -0500
Subject: [Bioperl-l] GFF to GTF converter
Message-ID:

Hi BioPerl gurus,

Does anybody know a reliable GFF to GTF converter that can generate files accepted by cufflinks?

We attempted to convert Drosophila and worm genome GFFs (taken from the FlyBase and WormBase FTP sites) to GTF with Bio::FeatureIO:

#read from a file
my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF');

#write out features
my $out = Bio::FeatureIO->new(-file    => ">$outfile" ,
                              -format  => 'GFF' ,
                              -version => 2.5);

However, we discovered that the resulting file does not comply with the GTF format specification as described here: http://mblab.wustl.edu/GTF22.html

Although this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations.
Also, we think it misinterprets annotations, so that one sees UTR entries annotated as CDSs or exons.

Many thanks for ideas/notes.

Alex

--
Alexander Kanapin, PhD
Scientific Associate

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
Tel: 647-260-7993
Toll-free: 1-866-678-6427
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
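
One part of the GFF/GTF mismatch raised above is the attribute column: GTF expects gene_id and transcript_id as quoted values, each terminated by a semicolon, whereas GFF3 carries ID= and Parent= tags. Below is a minimal sketch in plain Perl of rewriting that column. It assumes exon and CDS lines whose Parent is a transcript ID, and a transcript-to-gene lookup built beforehand (for example in a first pass over the mRNA lines). The sub name gff3_line_to_gtf, the lookup hash, and the IDs tr1/g1 are hypothetical; the sketch does not emit start_codon/stop_codon lines and has not been checked against cufflinks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite one GFF3 exon/CDS line as a GTF-style line.
# $gene_of is a hash ref mapping transcript ID -> gene ID (an assumption:
# it must be filled from the mRNA/transcript lines before calling this).
# Returns undef for lines that are skipped.
sub gff3_line_to_gtf {
    my ($line, $gene_of) = @_;
    chomp $line;
    my @col = split /\t/, $line;
    return unless @col == 9;
    return unless $col[2] eq 'exon' || $col[2] eq 'CDS';

    # Parse the GFF3 attribute column (tag=value pairs separated by ';').
    my %attr;
    for my $pair (split /;/, $col[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $value;
    }
    return unless defined $attr{Parent};

    # A feature may list several parents; this sketch keeps only the first.
    my ($transcript) = split /,/, $attr{Parent};
    my $gene = $gene_of->{$transcript};
    return unless defined $gene;

    $col[8] = qq{gene_id "$gene"; transcript_id "$transcript";};
    return join "\t", @col;
}

# Made-up example line; the IDs are placeholders, not real annotations.
my %gene_of = ( tr1 => 'g1' );
my $gff3    = "2L\tFlyBase\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tr1\n";
my $gtf     = gff3_line_to_gtf($gff3, \%gene_of);
print "$gtf\n" if defined $gtf;
```

Keeping only the first parent is a simplification: an exon shared by several transcripts would need one output line per transcript.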
From cjfields at illinois.edu Thu Mar 11 12:27:35 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 11:27:35 -0600
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9566C3.6000007@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu>
Message-ID: <7D743CA2-80A1-42E3-81D2-03B7CD01FC69@illinois.edu>

Not that I know of, though we are certainly receptive to anyone wanting to work this into the current code.

chris

On Mar 8, 2010, at 3:06 PM, Ryan Golhar wrote:

> Does Bioperl support color-space sequences, or FASTA formatted quality value files?
>
> ABI's Solid platform generates a number of files, two of which are fairly important (at the moment):
>
> 1) .csfasta
>
> Color-space sequences in FASTA format
>
> 2) .qual
>
> Quality values of each color call, also in FASTA format.
>
> I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction?
>
> Ryan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Thu Mar 11 12:35:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Mar 2010 17:35:32 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9566C3.6000007@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu>
Message-ID: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com>

On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote:
> Does Bioperl support color-space sequences, or FASTA formatted quality value
> files?
>
> ABI's Solid platform generates a number of files, two of which are fairly
> important (at the moment):
>
> 1) .csfasta
>
> Color-space sequences in FASTA format
>
> 2) .qual
>
> Quality values of each color call, also in FASTA format.

You mean the QUAL format which was originally introduced by PHRED.
Try "qual" as the format name in SeqIO, http://bioperl.org/wiki/HOWTO:SeqIO#Formats > I didn't see (at quick glance) support for this in Bioperl, but maybe > someone can point me in the right direction? I expect that (like in Biopython) you can treat color space FASTA + QUAL just like sequence space files, provided you are happy to interpret the color space strings yourself. Are you hoping to get BioPerl to convert the color space data into sequence space data for you? Peter From cjfields at illinois.edu Thu Mar 11 13:02:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 12:02:43 -0600 Subject: [Bioperl-l] GFF to GTF converter In-Reply-To: References: Message-ID: <8CB58FD4-633F-4711-A2F4-23D00AEB6FB8@illinois.edu> On Mar 11, 2010, at 9:56 AM, Alexander Kanapin wrote: > Hi BioPerl gurus, > > Does anybody knows a reliable GFF to GTF converter which can generate files acceptable by cufflinks ? > > We attempted to convert a drosophila and worm genome GFFs (taken from Flybase and Wormbase ftp) to GTF with Bio::FeatureIO > > #read from a file > my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF'); > > #write out features > my $out = Bio::FeatureIO->new(-file => ">$outfile" , > -format => 'GFF' , > -version => 2.5); > > However, we discovered that the resulting file is not compliant with GTF format specifications as they are described here: http://mblab.wustl.edu/GTF22.html Just so this is clear, even though the FeatureIO docs currently state (and I quote): "[Bio::FeatureIO] is the officially sanctioned way of getting at the format objects, which most people should use." it is nowhere near complete, so I have removed said quote from main trunk and replaced with it a very explicit caveat about it's current state, i.e. highly experimental and not currently suggested for production use. 
It's basically half-baked right now; I am in the midst of refactoring Bio::FeatureIO to try getting it up to speed and to add in flexibility when parsing this data (I'm actually working on it right now), but it's early days on that and may take a bit. Do realize that, even with a refactored FeatureIO, this is one of the more significant problems with GTF, e.g. there are too many definitions of what constitutes GTF or GFF2, so no clear path on how to go about this. At this point most users end up writing up their own parsers, unfortunately. > Although, this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations. > Also, we think it misinterprets annotations, so that one do see UTR entries annotated as CDS' or exons. The start/stop codons can normally be inferred from the CDS/UTRs and exons if they are provided, but again this is one of those issues where there isn't a lot of consistency with the data across various data sources (something addressed at the recent GMOD meeting). What is the source of your GFF? > Many thanks for ideas/notes. > > Alex > > -- > Alexander Kanapin, PhD > Scientific Associate > > Ontario Institute for Cancer Research > MaRS Centre, South Tower > 101 College Street, Suite 800 > Toronto, Ontario, Canada M5G 0A3 > Tel: 647-260-7993 > Toll-free: 1-866-678-6427 > www.oicr.on.ca > This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. 
chris From jessica.sun at gmail.com Thu Mar 11 14:38:21 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Thu, 11 Mar 2010 14:38:21 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation Message-ID: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> *I downloaded module *>* > Bio-SCF from CPAN. *>* > And I am trying to install it when I got the following error. Can *>* someone help? Thanks much in advance Note (probably harmless): No library found for -lstaden-read Writing Makefile for Bio::SCF how to obtain the missing library * -- Jessica Jingping Sun From cjfields at illinois.edu Thu Mar 11 14:49:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 13:49:51 -0600 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <62CF899F-7C31-49F0-8F5E-C99B2179F3A5@illinois.edu> Did you read the documentation for Bio-SCF? http://cpansearch.perl.org/src/LDS/Bio-SCF-1.03/INSTALL chris On Mar 11, 2010, at 1:38 PM, Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? 
Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Mar 11 15:00:58 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 15:00:58 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4536f7701003111200y7d194b3cp2aabb558dcbea5ca@mail.gmail.com> Hello Jessica, You need the Staden io-lib: http://staden.sourceforge.net/ It looks like 1.12.2 is the most recent release. Scott On Thu, Mar 11, 2010 at 2:38 PM, Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From rmb32 at cornell.edu Thu Mar 11 15:02:28 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 11 Mar 2010 12:02:28 -0800 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4B994C54.50501@cornell.edu> Hello Jessica, For Bio-SCF, you have to have the staden package installed. See the INSTALL notes included in the Bio-SCF distribution. The easiest way to view the INSTALL notes for a perl module's distribution: - go to http://search.cpan.org/ - search for 'Bio::SCF' - click the link to the Bio-SCF-1.03 distribution you see in the search results - the page linked here describes the installation package that Bio::SCF comes in. - On that page, you will see a link to the INSTALL notes for it. This is a good thing to know how to do when you have problems with other perl modules as well. But yes, as Chris said, those installation notes direct you to install the staden io-lib libraries from staden.sourceforge.net. Rob Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > From jessica.sun at gmail.com Thu Mar 11 15:49:49 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Thu, 11 Mar 2010 15:49:49 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <4B994C54.50501@cornell.edu> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> Message-ID: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Thanks, I got it resolve. 
Do any one knows how to add a scale of the blast hit image through Bio:Graphics, I mean the rectangle should be difference width rather than the same at the example. shown here http://www.bioperl.org/wiki/HOWTO:Graphics Thanks, On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote: > Hello Jessica, > > For Bio-SCF, you have to have the staden package installed. See the > INSTALL notes included in the Bio-SCF distribution. > > The easiest way to view the INSTALL notes for a perl module's distribution: > - go to http://search.cpan.org/ > - search for 'Bio::SCF' > - click the link to the Bio-SCF-1.03 distribution you see in the search > results > - the page linked here describes the installation package that Bio::SCF > comes in. > - On that page, you will see a link to the INSTALL notes for it. > > This is a good thing to know how to do when you have problems with other > perl modules as well. > > > But yes, as Chris said, those installation notes direct you to install the > staden io-lib libraries from staden.sourceforge.net. > > Rob > > Jessica Sun wrote: > >> *I downloaded module >> >> *>* > Bio-SCF from CPAN. >> *>* > And I am trying to install it when I got the following error. Can >> *>* someone help? 
Thanks much in advance >> Note (probably harmless): No library found for -lstaden-read >> Writing Makefile for Bio::SCF >> >> how to obtain the missing library >> >> >> * >> >> >> >> > -- Jessica Jingping Sun From scott at scottcain.net Thu Mar 11 16:33:47 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 16:33:47 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Message-ID: <4536f7701003111333q2105c71ftdab0c0b71372ba9f@mail.gmail.com> Hello Jessica, A few things: * It would be better to start a new thread to ask an unrelated question, since people may see the subject of this thread and ignore it if they don't know the answer to the original question. * Can you please try to ask your question again, with more details? Like what have you done already, what was the result, and what would you like for it to look like. If you want it to look like something that is on the wiki, link to that something. The Howto page you linked to has lots of pictures on it. Scott On Thu, Mar 11, 2010 at 3:49 PM, Jessica Sun wrote: > Thanks, I got it resolve. > > Do any one knows how to add a scale of the blast hit image through > Bio:Graphics, I mean the rectangle should be difference width rather than > the same at the example. shown here > > http://www.bioperl.org/wiki/HOWTO:Graphics > > > > Thanks, > > > > On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote: > >> Hello Jessica, >> >> For Bio-SCF, you have to have the staden package installed. ?See the >> INSTALL notes included in the Bio-SCF distribution. 
>> >> The easiest way to view the INSTALL notes for a perl module's distribution: >> ?- go to http://search.cpan.org/ >> ?- search for 'Bio::SCF' >> ?- click the link to the Bio-SCF-1.03 distribution you see in the search >> results >> ?- the page linked here describes the installation package that Bio::SCF >> comes in. >> ?- On that page, you will see a link to the INSTALL notes for it. >> >> This is a good thing to know how to do when you have problems with other >> perl modules as well. >> >> >> But yes, as Chris said, those installation notes direct you to install the >> staden io-lib libraries from staden.sourceforge.net. >> >> Rob >> >> Jessica Sun wrote: >> >>> *I downloaded module >>> >>> *>* > Bio-SCF from CPAN. >>> *>* > And I am trying to install it when I got the following error. Can >>> *>* someone help? Thanks much in advance >>> Note (probably harmless): No library found for -lstaden-read >>> Writing Makefile for Bio::SCF >>> >>> how to obtain the missing library >>> >>> >>> * >>> >>> >>> >>> >> > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From golharam at umdnj.edu Thu Mar 11 21:19:37 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 11 Mar 2010 21:19:37 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> Message-ID: <4B99A4B9.1070901@umdnj.edu> Not convert the sequences, just read the sequence file and allow me to process each one individually, sort of like: $seqio = new Bio::Seq(...) 
while ($seq = $seqio->next_seq) { ... } Peter wrote: > On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >> Does Bioperl support color-space sequences, or FASTA formatted quality value >> files? >> >> ABI's Solid platform generates a number of files, two of which are fairly >> important (at the moment): >> >> 1) .csfasta >> >> Color-space sequences in FASTA format >> >> 2) .qual >> >> Quality values of each color call, also in FASTA format. > > You mean the QUAL format which was originally introduced by PHRED. > Try "qual" as the format name in SeqIO, > http://bioperl.org/wiki/HOWTO:SeqIO#Formats > >> I didn't see (at quick glance) support for this in Bioperl, but maybe >> someone can point me in the right direction? > > I expect that (like in Biopython) you can treat color space FASTA + QUAL > just like sequence space files, provided you are happy to interpret the > color space strings yourself. > > Are you hoping to get BioPerl to convert the color space data into > sequence space data for you? > > Peter > From cjfields at illinois.edu Thu Mar 11 22:35:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 21:35:50 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B99A4B9.1070901@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> Message-ID: Ryan, We would have to see example files to get an idea of how feasible it is. You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual stream, and interleave the two somehow. However, BioPerl qual scores are PHRED-based by default, and I'm not sure how color-space data would work within that schematic. chris On Mar 11, 2010, at 8:19 PM, Ryan Golhar wrote: > Not convert the sequences, just read the sequence file and allow me to > process each one individually, sort of like: > > $seqio = new Bio::Seq(...) > while ($seq = $seqio->next_seq) { > ... 
> } > > Peter wrote: >> On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >>> Does Bioperl support color-space sequences, or FASTA formatted quality value >>> files? >>> >>> ABI's Solid platform generates a number of files, two of which are fairly >>> important (at the moment): >>> >>> 1) .csfasta >>> >>> Color-space sequences in FASTA format >>> >>> 2) .qual >>> >>> Quality values of each color call, also in FASTA format. >> You mean the QUAL format which was originally introduced by PHRED. >> Try "qual" as the format name in SeqIO, >> http://bioperl.org/wiki/HOWTO:SeqIO#Formats >>> I didn't see (at quick glance) support for this in Bioperl, but maybe >>> someone can point me in the right direction? >> I expect that (like in Biopython) you can treat color space FASTA + QUAL >> just like sequence space files, provided you are happy to interpret the >> color space strings yourself. >> Are you hoping to get BioPerl to convert the color space data into >> sequence space data for you? >> Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From avilella at gmail.com Fri Mar 12 02:28:20 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 12 Mar 2010 07:28:20 +0000 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A3969F1.8080002@sendu.me.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <4A3969F1.8080002@sendu.me.uk> Message-ID: <358f4d651003112328g2864ef1as7b8c44ce7bb77c82@mail.gmail.com> > I think not. Well, at least SeqFeature::Store doesn't scale. Try storing > millions of features in a database and watch it crawl to complete > unusability. I can't imagine a db scaling to holding hundreds of TB of data > either. I'm also not sure what the benefit is. 
There are already high-speed > ways of indexing your fastq or bam files. Hi Sendu, What are the available options to have a quick indexing of fastq files that can be integrated into bioperl? Bio::Index::fastq can be painfully slow for the latest Illumina runs... Cheers, Albert. From biopython at maubp.freeserve.co.uk Fri Mar 12 05:06:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Mar 2010 10:06:46 +0000 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> Message-ID: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: > Ryan, > > We would have to see example files to get an idea of how feasible it is. > You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual > stream, and interleave the two somehow. However, BioPerl qual > scores are PHRED-based by default, and I'm not sure how color-space > data would work within that schematic. > > chris Chris, I am under the (possibly mistaken) assumption that PHRED scores are used for SOLiD color space QUAL files - the key issue is each score corresponds to the color call in the color sequence. Ignoring color-space for a moment, are there BioPerl examples of iterating over a pair of sequence-space FASTA and QUAL files? i.e. What you'd get if you had a FASTQ file to iterate over.
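A minimal sketch of such paired iteration, written in plain Perl rather than against any BioPerl API. It assumes both files list the same records in the same order and that blank lines and '#' comment lines can be skipped; next_record and paired_records are hypothetical helper names used only for illustration.

```perl
use strict;
use warnings;

# Read the next FASTA-style record (one '>' header plus its body lines).
# $state remembers the header of the following record between calls;
# $joiner is '' for sequence files and ' ' for QUAL files.
# Returns (id, body) or an empty list at end of file.
sub next_record {
    my ($fh, $state, $joiner) = @_;
    my $header = delete $state->{pending};
    my @body;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line eq '' || $line =~ /^#/;   # assumption: safe to skip
        if ($line =~ /^>(\S+)/) {
            if (defined $header) {              # start of the next record
                $state->{pending} = $1;
                last;
            }
            $header = $1;
            next;
        }
        push @body, $line;
    }
    return unless defined $header;
    return ($header, join($joiner, @body));
}

# Return an iterator that yields one hash per record, pairing the
# sequence with its list of quality values.
sub paired_records {
    my ($seq_fh, $qual_fh) = @_;
    my (%seq_state, %qual_state);
    return sub {
        my ($id,  $seq)   = next_record($seq_fh,  \%seq_state,  '');
        my ($qid, $quals) = next_record($qual_fh, \%qual_state, ' ');
        die "files hold different numbers of records\n"
            if (defined $id xor defined $qid);
        return unless defined $id;
        die "record order mismatch: $id vs $qid\n" unless $id eq $qid;
        return { id => $id, seq => $seq, qual => [ split ' ', $quals ] };
    };
}

# Usage with small in-memory files (made-up data, not real reads).
my $fasta = ">r1\nT2103\n>r2\nT32.2\n";
my $qual  = ">r1\n4 27\n17 31\n>r2\n18 4 -1 9\n";
open my $sfh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
open my $qfh, '<', \$qual  or die "cannot open in-memory QUAL: $!";

my $next = paired_records($sfh, $qfh);
while (my $rec = $next->()) {
    printf "%s\t%s\t%s\n", $rec->{id}, $rec->{seq}, join(',', @{ $rec->{qual} });
}
```

Nothing here interprets the scores, so negative values such as -1 pass through unchanged. For a color-space read the sequence string still carries its leading primer base, so it is one character longer than the list of scores.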
[I guess Ryan could just merge the color-space FASTA and QUAL into a color-space FASTQ file and iterate over that] Peter From cjfields at illinois.edu Fri Mar 12 08:04:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 07:04:53 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> Message-ID: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> On Mar 12, 2010, at 4:06 AM, Peter wrote: > On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >> Ryan, >> >> We would have to see example files to get an idea of how feasible it is. >> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >> stream, and interleave the two somehow. However, BioPerl qual >> scores are PHRED-based by default, and I'm not sure how color-space >> data would work within that schematic. >> >> chris > > Chris, > > I am under the (possibly mistaken) assumption that PHRED scores > are used for SOLiD color space QUAL files - the key issue is each > score corresponds to the color call in the color sequence. > > Ignoring color-space for a moment, are there BioPerl examples > of iterating over a pair of sequence-space FASTA and QUAL files? > i.e. What you'd get if you had a FASTQ file to iterate over. > > [I guess Ryan could just merge the color-space FASTA and > QUAL into a color-space FASTQ file and iterate over that] > > Peter If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things. Iterating over pairs is something that has popped up before. 
For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:
--------------------------------
my $in = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');
while (my $seq = $in->next_seq) {
    $out1->write_fasta($seq);
    $out2->write_qual($seq);
}
--------------------------------
Note that all use the 'fastq' format, instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise). I'm assuming for input it would be something like:
--------------------------------
my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual');
my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq');
# 'qual' parser joins the two streams
while (my $seq = $in2->next_seq($in1)) {
    $out->write_seq($seq);
}
--------------------------------
chris From biopython at maubp.freeserve.co.uk Fri Mar 12 08:26:39 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Mar 2010 13:26:39 +0000 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B9A3D14.3010208@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> <4B9A3D14.3010208@umdnj.edu> Message-ID: <320fb6e01003120526x7c0c3dddjb4e1422a41968894@mail.gmail.com> On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: > > Here is an example of a color-space sequence: > > In one file (something.csfasta): > >>1_30_226_F3 > T210320010.200.03.0110320320220212200122200.2220200 >>1_30_252_F3 > T322220212.133.00.2202322132022202221002011.0011020 > > The '.'
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter From golharam at umdnj.edu Fri Mar 12 08:43:01 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 13:43:01 +0000 Subject: [Bioperl-l] Next Gen Formats Message-ID: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Direct from sequencing machine ------Original Message------ From: Peter Sender: p.j.a.cock at googlemail.com To: golharam at umdnj.edu Cc: Chris Fields Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Next Gen Formats Sent: Mar 12, 2010 8:26 AM On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: > > Here is an example of a color-space sequence: > > In one file (something.csfasta): > >>1_30_226_F3 > T210320010.200.03.0110320320220212200122200.2220200 >>1_30_252_F3 > T322220212.133.00.2202322132022202221002011.0011020 > > The '.' 
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter Sent from my Verizon Wireless BlackBerry From cjfields at illinois.edu Fri Mar 12 09:06:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 08:06:51 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... 
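A rough sketch of what converting color calls to sequence calls could look like, in plain Perl. It assumes the standard SOLiD di-base encoding, in which each color is the XOR of the two-bit codes of adjacent bases (A=0, C=1, G=2, T=3); decode_colorspace is a hypothetical helper for illustration, not an existing BioPerl method.

```perl
use strict;
use warnings;

my @BASES = qw(A C G T);
my %CODE  = (A => 0, C => 1, G => 2, T => 3);

# Decode a color-space read (primer base followed by colors 0-3) into
# bases. The primer base itself is not part of the returned sequence.
# Decoding stops at the first '.', because nothing after an uncalled
# color can be decoded reliably.
sub decode_colorspace {
    my ($read) = @_;
    my ($primer, @colors) = split //, uc $read;
    die "missing or unexpected primer base\n"
        unless defined $primer && exists $CODE{$primer};
    my $prev  = $CODE{$primer};
    my $bases = '';
    for my $color (@colors) {
        last if $color eq '.';
        die "unexpected color '$color'\n" unless $color =~ /^[0-3]$/;
        $prev = $prev ^ (0 + $color);   # next base = previous base XOR color
        $bases .= $BASES[$prev];
    }
    return $bases;
}

print decode_colorspace('T2103'), "\n";   # prints CAAT
print decode_colorspace('T2003'), "\n";   # prints CCCG
```

The second call differs from the first in a single color, yet every base from that position onward changes, so one miscalled color corrupts the rest of a decoded read.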
chris On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > Direct from sequencing machine > > ------Original Message------ > From: Peter > Sender: p.j.a.cock at googlemail.com > To: golharam at umdnj.edu > Cc: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Next Gen Formats > Sent: Mar 12, 2010 8:26 AM > > On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >> Here is an example of a color-space sequence: >> >> In one file (something.csfasta): >> >>> 1_30_226_F3 >> T210320010.200.03.0110320320220212200122200.2220200 >>> 1_30_252_F3 >> T322220212.133.00.2202322132022202221002011.0011020 >> >> The '.' means the color could not be called >> >> In another file (something.qual): >> >>> 1_30_226_F3 >> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> 1_30_252_F3 >> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >> >> The -1 represents those colors that could not be called. > > Now that is funny (using -1). True PHRED scores are defined with a > logarithm and can't be negative. A score of zero is normally used in > this situation since that maps to a probability of error of 1 (i.e. the > read is 100% wrong, or 0% true). > > Where did these files come from? Direct from a sequencing > machine or via some third party script? 
> > Peter > > > Sent from my Verizon Wireless BlackBerry > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Fri Mar 12 08:09:40 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 08:09:40 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> Message-ID: <4B9A3D14.3010208@umdnj.edu> Here is an example of a color-space sequence: In one file (something.csfasta): >1_30_226_F3 T210320010.200.03.0110320320220212200122200.2220200 >1_30_252_F3 T322220212.133.00.2202322132022202221002011.0011020 The '.' means the color could not be called In another file (something.qual): >1_30_226_F3 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >1_30_252_F3 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 The -1 represents those colors that could not be called. Chris Fields wrote: > On Mar 12, 2010, at 4:06 AM, Peter wrote: > >> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >>> Ryan, >>> >>> We would have to see example files to get an idea of how feasible it is. >>> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >>> stream, and interleave the two somehow. However, BioPerl qual >>> scores are PHRED-based by default, and I'm not sure how color-space >>> data would work within that schematic. 
>>> >>> chris >> Chris, >> >> I am under the (possibly mistaken) assumption that PHRED scores >> are used for SOLiD color space QUAL files - the key issue is each >> score corresponds to the color call in the color sequence. >> >> Ignoring color-space for a moment, are there BioPerl examples >> of iterating over a pair of sequence-space FASTA and QUAL files? >> i.e. What you'd get if you had a FASTQ file to iterate over. >> >> [I guess Ryan could just merge the color-space FASTA and >> QUAL into a color-space FASTQ file and iterate over that] >> >> Peter > > If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things. > > Iterating over pairs is something that has popped up before. For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like: > > -------------------------------- > my $in = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq'); > my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta'); > my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual'); > > while (my $seq = $in->next_seq) { > $out1->write_fasta($seq); > $out2->write_fasta($seq); > } > -------------------------------- > > Note that all use the 'fastq' formatm instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise). 
> > I'm assuming for input it would be something like: > > -------------------------------- > my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta'); > my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual'); > my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq'); > > # 'qual' parser joins the two streams > while (my $seq = $in2->next_seq($in1)) { > $out->write_seq($seq); > } > -------------------------------- > > chris > > From pmiguel at purdue.edu Fri Mar 12 09:56:33 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 12 Mar 2010 09:56:33 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: <4B9A5621.2020006@purdue.edu> Hi Chris, Converting back and forth from color space is something that would be needed. However, a warning for anyone working with color space data: It is a really bad idea to convert raw color space reads into sequence. This is because conversion propagates from the key base on the left to the right. A sequence error *anywhere* in the sequence will ensure all bases farther down will be converted on the wrong track. Analogous to a "frame shift" -- except there are 4 "frames", not 3. Meanwhile, the converse is not true--sequence space bases can be converted into color space without error propagation. So you want to do all your work in color space and convert to real sequence only at the end, when your consensus certain. A little more detail here: http://seqanswers.com/forums/showthread.php?t=3367 For people wanting to use a non-color space aware program for analysis of color space data, it is possible to use a process called "double encoding", where 0,1,2,3 bases of color space are just replaced with A, C, G, T of a "fake" base space. This is nearly the same as working in color space and does not incur the propagation error issues. 
However it is fraught with the obvious problems: you might later confuse the double encoded sequence with true sequence space with likely maddening results. Also, to get the opposite strand of color space reads you reverse without complementing. So top and bottom strands will look different. Finally, Kevin McKernan said that the dual base encoding error-detection scheme was technically using "Perforated Convolutional Codes" and said these were used on 3G networks. I only mention this in case there are some engineering types who might be interested. Phillip Chris Fields wrote: > For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): > > http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf > > Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... > > chris > > On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > > >> Direct from sequencing machine >> >> ------Original Message------ >> From: Peter >> Sender: p.j.a.cock at googlemail.com >> To: golharam at umdnj.edu >> Cc: Chris Fields >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Next Gen Formats >> Sent: Mar 12, 2010 8:26 AM >> >> On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >>> Here is an example of a color-space sequence: >>> >>> In one file (something.csfasta): >>> >>> >>>> 1_30_226_F3 >>>> >>> T210320010.200.03.0110320320220212200122200.2220200 >>> >>>> 1_30_252_F3 >>>> >>> T322220212.133.00.2202322132022202221002011.0011020 >>> >>> The '.' 
means the color could not be called >>> >>> In another file (something.qual): >>> >>> >>>> 1_30_226_F3 >>>> >>> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >>> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> >>>> 1_30_252_F3 >>>> >>> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >>> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >>> >>> The -1 represents those colors that could not be called. >>> >> Now that is funny (using -1). True PHRED scores are defined with a >> logarithm and can't be negative. A score of zero is normally used in >> this situation since that maps to a probability of error of 1 (i.e. the >> read is 100% wrong, or 0% true). >> >> Where did these files come from? Direct from a sequencing >> machine or via some third party script? >> >> Peter >> >> >> Sent from my Verizon Wireless BlackBerry >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Fri Mar 12 10:44:35 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 12 Mar 2010 07:44:35 -0800 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com> References: <4B96B442.8070003@bioperl.org> <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com> Message-ID: <4B9A6163.9060407@bioperl.org> I'm sure it does; that's what it is supposed to do. I don't know that there is any way to get what you want directly from the code, since the format that you want is not a standard multiple-alignment output format. You might consider clustalw format, which shows the identical columns with '*', and you can keep the start/stop of the alignment embedded in the sequence names.
Or you can extract the code that does the writing from the writer module and dig out what you need. You're asking for a customized view that is not standard; the tools for it are in the existing code, so you need to roll your own view from it. This would just mean another ResultWriter module that looks a lot like the existing one but doesn't write out the header, footer, and hit table, so those methods would just not do anything... -jason Janine Arloth wrote, On 3/12/10 12:40 AM: > Hi, > thanks... > but > > use Bio::SearchIO; > use Bio::SearchIO::Writer::TextResultWriter; > > my $in = Bio::SearchIO->new(-format => 'blast', > -file => shift @ARGV); > > my $writer = Bio::SearchIO::Writer::TextResultWriter->new(); > my $out = Bio::SearchIO->new(-writer => $writer); > $out->write_result($in->next_result); > > gives me the whole result, but I only need the alignment ;( > On 09.03.2010 at 21:49, Jason Stajich wrote: > > >> SearchIO writer -> BLAST format. presumably something like Bio::SearchIO::Writer::TextResultWriter >> >> Janine Arloth wrote, On 3/5/10 1:43 AM: >> >>> Hello, >>> using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment: >>> >>> 1 50 >>> test/1-85 ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA >>> lcl|3013/20-104 ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA >>> >>> >>> 51 100 >>> test/1-85 TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA >>> lcl|3013/20-104 TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA >>> >>> >>> >>> But I prefer this format: >>> >>> >>> >>> Query 1 ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA 60 >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct 20 ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA 79 >>> >>> Query 61 TGGCCTTTTGTCTTTCTCCAAAGCA 85 >>> ||||||||||||||||||||||||| >>> Sbjct 80 TGGCCTTTTGTCTTTCTCCAAAGCA 104 >>> >>> >>> How can I get this?
>>> >>> Best Regards >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > From maj at fortinbras.us Fri Mar 12 10:45:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 10:45:15 -0500 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: References: Message-ID: <31AA49FD0FDD466CB349ABAE75591B26@NewLife> along with Jason's comment then you'll need to loop through the node pairs by hand: my @leaves = $tree->get_leaf_nodes; my @dists; while (my $l = shift @leaves) { foreach my $m (@leaves) { push @dists, $tree->distance( -nodes => [$l, $m] ); } } should give you all n(n-1)/2 pairwise distances. ----- Original Message ----- From: "Jeffrey Detras" To: Sent: Friday, March 05, 2010 1:17 AM Subject: [Bioperl-l] distances between leaf nodes > Hi, > > I am new at using the Bio::TreeIO module specifically using the newick > format for a phylogenetic analysis. The sample_tree attached is > Newick-formatted tree. My objective is to get all the distances between all > the leaf nodes. I copied examples of the code from > http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my > knowledge) so that I understand how to assign the right array value for the > nodes/leaves. The message would say must provide 2 root nodes. 
> > Here is what I have right now: > > #!/usr/bin/perl -w > use strict; > > my $treefile = 'sample_tree'; > use Bio::TreeIO; > my $treeio = Bio::TreeIO->new(-format => 'newick', > -file => $treefile); > > while (my $tree = $treeio->next_tree) { > my @leaves = $tree->get_leaf_nodes; > for (my $dist = $tree->distance(-nodes => \@leaves)){ > print "Distance between trees is $dist\n"; > } > } > > Thanks, > Jeff > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rtbio.2009 at gmail.com Fri Mar 12 12:36:44 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 12 Mar 2010 18:36:44 +0100 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa. On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: > Hi Chris, > > Thank you very much for the information. Could you please tell me how to > update it from SVN? > > Thanks and regards, > Roopa > > > On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: > >> Roopa, >> >> I committed a fix for this a few days ago; if you update from SVN it >> should work. The problem stemmed from server-side changes at NCBI. >> >> chris >> >> On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: >> >> > Hello Mark and everybody, >> > >> > I have been trying to connect to remote blast to retrieve similar >> sequences >> > to a given sequence. But my program is unable to retrieve the sequences >> from >> > BLAST, i.e., it is getting executed till the remote blast ids, but it is >> not >> > entering the else loop after collecting the rid. Please check this >> problem >> > and help me in this regard. 
I think the problem is in getting the >> sequence >> > and going to the 'else' part. i.e., >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > >> > Please give me your reply. >> > >> > Thanks and regards, >> > Roopa. >> > >> > My code is as follows. >> > >> > #!/usr/bin/perl >> > >> > #path for extra camel module >> > use lib "/srv/www/htdocs/rain/RNAi/"; >> > use rnai_blast; >> > >> > >> > use Bio::SearchIO; >> > use Bio::Search::Result::BlastResult; >> > use Bio::Perl; >> > use Bio::Tools::Run::RemoteBlast; >> > use Bio::Seq; >> > use Bio::SeqIO; >> > use Bio::DB::GenBank; >> > >> > $serverpath = "/srv/www/htdocs/rain/RNAi"; >> > $serverurl = "http://141.84.66.66/rain/RNAi"; >> > $outfile = $serverpath."/rnairesult_".time().".html"; >> > $nuc = $serverpath."/nuc".time().".txt"; >> > $debugfile = $serverpath."/debug_".time().".txt"; >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > my $outstring =""; >> > >> > &parse_form; >> > >> > print "Content-type: text/html\n\n"; >> > print "\n"; >> > print "RNAi Result"; >> > print "> > URL=$serverurl/rnairesult_".time().".html\"> \n"; >> > print "\n"; >> > print "\n"; >> > print " Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
"; >> > print " Please be patient, runtime can be up to 5 minutes
"; >> > print " This page will automatically reload in 30 seconds."; >> > print "\n"; >> > print "\n"; >> > >> > defined(my $pid = fork) or die "Can't fork: $!"; >> > exit if $pid; >> > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >> > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >> > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >> > >> > >> > >> > open(OUTFILE, '>',$outfile); >> > >> > print OUTFILE "\n >> > RNAi Result >> > > > URL=$serverurl//rnairesult_".time().".html\"> \n >> > >> > \n >> > \n >> > Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
>> > Please be patient, runtime can be up to 5 minutes
>> > This page will automatically reload in 30 seconds
>> > \n >> > \n"; >> > >> > close(OUTFILE); >> > >> > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >> > >> > $in{'Inputseq'} =~ s/>.*$//m; >> > $in{'Inputseq'} =~ s/[^TAGC]//gim; >> > $in{'Inputseq'} =~ tr/actg/ACTG/; >> > >> > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >> > $in{'Threshold'}); >> > >> > >> > sub blastcode >> > { >> > >> > $inpu1= $_[0]; >> > >> > $organ= $_[1]; >> > >> > open(NUC,'>',$nuc); >> > print NUC $inpu1,"\n"; >> > close(NUC); >> > >> > my $prog = 'blastn'; >> > my $db = 'refseq_rna'; >> > my $e_val= '1e-10'; >> > my $organism= $organ; >> > >> > $gb = new Bio::DB::GenBank; >> > >> > my @params = ( '-prog' => $prog, >> > '-data' => $db, >> > '-expect' => $e_val, >> > '-readmethod' => 'SearchIO', >> > '-Organism' => $organism ); >> > >> > open(OUTFILE,'>',$blastdebugfile); >> > print OUTFILE @params; >> > close(OUTFILE); >> > >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >> => >> > "$organ\[ORGN]"); >> > >> > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > #change a paramter >> > >> > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> > Brucei[ORGN]'; >> > >> > #change a paramter >> > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >> '$input2[ORGN]'; >> > >> > my $v = 1; >> > #$v is just to turn on and off the messages >> > >> > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >> > '-organism' => "$organ\[ORGN]"); >> > >> > while (my $input = $str->next_seq()) >> > { >> > #Blast a sequence against a database: >> > #Alternatively, you could pass in a file with many >> > #sequences rather than loop through sequence one at a time >> > #Remove the loop starting 'while (my $input = $str->next_seq())' >> > #and swap the two lines below for an example of that. 
>> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $input; >> > close(OUTFILE); >> > >> > #submits the input data to BLAST# >> > >> > my $r = $factory->submit_blast($input); >> > >> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $r; >> > close(OUTFILE); >> > >> > >> > print STDERR "waiting...." if($v>0); >> > >> > while ( my @rids = $factory->each_rid ) { >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "while entered"; >> > close(OUTFILE); >> > foreach my $rid ( @rids ) { >> > >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "foreach entered"; >> > close(OUTFILE); >> > #Retrieving the result ids# >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) >> > { >> > if( $rc < 0 ) >> > { >> > $factory->remove_rid($rid); >> > } >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "if entered"; >> > close(OUTFILE); >> > print STDERR "." if ( $v > 0 ); >> > sleep 5; >> > } >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > print BLASTDEBUGFILE $result->next_hit(); >> > close(BLASTDEBUGFILE); >> > #saving the output in blastdata.time.out file# >> > >> > # $random=rand(); >> > >> > my $filename = $serverpath."/blastdata_".time()."\.out"; >> > # open(DEBUGFILE,'>',$debugfile); >> > # open(new,'>',$filename); >> > # @arra=; >> > # print DEBUGFILE @arra; >> > # close(DEBUGFILE); >> > # close(new); >> > >> > $factory->save_output($filename); >> > >> > # open(BLASTDEBUGFILE,'>',$debugfile); >> > # print BLASTDEBUGFILE "Hello $rid"; >> > # close(BLASTDEBUGFILE); >> > >> > $factory->remove_rid($rid); >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > # print BLASTDEBUGFILE $organism; >> > 
close(BLASTDEBUGFILE); >> > >> > # open(OUTFILE,'>',$outfile); >> > # print OUTFILE "Test2 $result->database_name()"; >> > # close(OUTFILE); >> > >> > #$hit = $result->next_hit; >> > #open(new,'>',$debugfile); >> > #print $hit; >> > #close(new); >> > $dummy=0; >> > while ( my $hit = $result->next_hit ) { >> > >> > next unless ( $v >= 0); >> > >> > # open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "$hit in while hits"; >> > # close(OUTFILE); >> > >> > my $sequ = $gb->get_Seq_by_version($hit->name); >> > my $dna = $sequ->seq(); # get the sequence as a string >> > $dummy++; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $dna; >> > close(OUTFILE); >> > push(@seqs,$dna); >> > } >> > } >> > } >> > } >> > } >> > >> > $warum=@seqs; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $warum; >> > print OUTFILE @seqs; >> > close(OUTFILE); >> > >> > >> > return(@seqs); #returning the sequences obtained on BLAST# >> > } >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Fri Mar 12 12:46:52 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 12 Mar 2010 12:46:52 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Please google "svn update bioperl". On Mar 12, 2010, at 12:36 PM, Roopa Raghuveer wrote: > Hello all, > > I am trying remote blast program and connecting to NCBI Blast, but I am > unable to retrieve the sequences. Chris had suggested me to update from SVN. > Could you please tell me how to update it from SVN? > > Regards, > Roopa. > > On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: > >> Hi Chris, >> >> Thank you very much for the information. Could you please tell me how to >> update it from SVN? 
>> >> Thanks and regards, >> Roopa >> >> >> On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: >> >>> Roopa, >>> >>> I committed a fix for this a few days ago; if you update from SVN it >>> should work. The problem stemmed from server-side changes at NCBI. >>> >>> chris >>> >>> On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: >>> >>>> Hello Mark and everybody, >>>> >>>> I have been trying to connect to remote blast to retrieve similar >>> sequences >>>> to a given sequence. But my program is unable to retrieve the sequences >>> from >>>> BLAST, i.e., it is getting executed till the remote blast ids, but it is >>> not >>>> entering the else loop after collecting the rid. Please check this >>> problem >>>> and help me in this regard. I think the problem is in getting the >>> sequence >>>> and going to the 'else' part. i.e., >>>> >>>> else { >>>> >>>> open(OUTFILE,'>',$blastdebugfile); # I think the problem >>> is >>>> in else part, i.e., it is not taking the next result.# >>>> print OUTFILE "else entered"; >>>> close(OUTFILE); >>>> >>>> my $result = $rc->next_result(); >>>> >>>> #save the output >>>> >>>> Please give me your reply. >>>> >>>> Thanks and regards, >>>> Roopa. >>>> >>>> My code is as follows. 
>>>> [...] >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Mar 12 12:41:23 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 12:41:23 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Look at http://www.bioperl.org/wiki/Using_Subversion ----- Original Message ----- From: Roopa Raghuveer To: Chris Fields ; Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Friday, March 12, 2010 12:36 PM Subject: Re: [Bioperl-l] remoteblast Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa.
On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: Hi Chris, Thank you very much for the information. Could you please tell me how to update it from SVN? Thanks and regards, Roopa On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
> [...] > my $dna = $sequ->seq(); # get the sequence as a
string > $dummy++; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $dna; > close(OUTFILE); > push(@seqs,$dna); > } > } > } > } > } > > $warum=@seqs; > open(OUTFILE,'>',$debugfile); > # print OUTFILE $warum; > print OUTFILE @seqs; > close(OUTFILE); > > > return(@seqs); #returning the sequences obtained on BLAST# > } > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jessica.sun at gmail.com Fri Mar 12 16:28:11 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Fri, 12 Mar 2010 16:28:11 -0500 Subject: [Bioperl-l] RefSeq Message-ID: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> I have a question: I have a refseq with NM_ number(mRNA), how can I get the genomic sequences(NT_number) with Bioperl, if it can be done? Thanks -- Jessica Jingping Sun From sidd.basu at gmail.com Sat Mar 13 15:29:52 2010 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Sat, 13 Mar 2010 14:29:52 -0600 Subject: [Bioperl-l] Re: RefSeq In-Reply-To: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> Message-ID: <20100313202949.GA5621@Macintosh-74.local> The following code works with 1.6.1 of bioperl. It uses eutils and the workflow esearch -> elink -> esummary.
#!/usr/bin/perl -w

use strict;
use Bio::DB::EUtilities;

my $id = $ARGV[0] || 'NM_001618';

my $eutils = Bio::DB::EUtilities->new(
    -eutil      => 'esearch',
    -db         => 'nucleotide',
    -term       => $id,
    -usehistory => 'y'
);

my $hist = $eutils->next_History || die "no history\n";

$eutils->reset_parameters(
    -eutil   => 'elink',
    -db      => 'gene',
    -dbfrom  => 'nuccore',
    -history => $hist
);

my ($gene_id) = $eutils->next_LinkSet->get_ids;

$eutils->reset_parameters(
    -eutil => 'esummary',
    -db    => 'gene',
    -id    => $gene_id,
);

my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType');
print $item->get_contents_by_name('ChrAccVer'), "\n";

-siddhartha

On Fri, 12 Mar 2010, Jessica Sun wrote:

> I have a question: I have a refseq with NM_ number(mRNA), how can I get
> the genomic sequences(NT_number) with Bioperl, if it can be done?
>
> Thanks
>
>
> --
> Jessica Jingping Sun
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From robby.hones at gmail.com Sat Mar 13 18:57:43 2010
From: robby.hones at gmail.com (robby jhones)
Date: Sat, 13 Mar 2010 15:57:43 -0800
Subject: [Bioperl-l] comparing fasta sequences in multiple files
Message-ID: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>

Dear Group,

Can anyone offer advice on comparing multiple fasta sequences in many
files. We have 1000's of fasta sequences in individual files of which I
would like to fish out and print to a new file (the sequence and ID), ONLY
the sequences which appear in at least a few of the files: 3 out of 4 runs,
perhaps all 4 runs ( as some are replicates).

Is there something out there which would do this?
Thanks for your help >>Robby From sdavis2 at mail.nih.gov Sat Mar 13 19:49:46 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat, 13 Mar 2010 19:49:46 -0500 Subject: [Bioperl-l] comparing fasta sequences in multiple files In-Reply-To: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> Message-ID: <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones wrote: > Dear Group, > > Can anyone offer advice on comparing multiple fasta sequences in many > files. We have 1000's of fasta sequences in individual files of which I > would like to fish out and print to a new file (the sequence and ID), ONLY > the sequences which appear in at least a few of the files: 3 out of 4 runs, > perhaps all 4 runs ( as some are replicates). > > Is there something out there which would do this? Hi, Robby. It sounds like making a hash of IDs and then incrementing a count for each as you loop over files would give you what you want? Sean From jessica.sun at gmail.com Sat Mar 13 20:29:08 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Sat, 13 Mar 2010 20:29:08 -0500 Subject: [Bioperl-l] RefSeq In-Reply-To: <20100313202949.GA5621@Macintosh-74.local> References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> <20100313202949.GA5621@Macintosh-74.local> Message-ID: <9adc0e9b1003131729p4f78aa50kc1500cbbe01cd815@mail.gmail.com> Great. Thanks. On Sat, Mar 13, 2010 at 3:29 PM, Siddhartha Basu wrote: > The following code works with 1.6.1 of bioperl. It uses eutils and the > workflow esearch -> elink -> esummary.
> > #!/usr/bin/perl -w > > use strict; > use Bio::DB::EUtilities; > > my $id = $ARGV[0] || 'NM_001618'; > > my $eutils = Bio::DB::EUtilities->new( > -eutil => 'esearch', > -db => 'nucleotide', > -term => $id, > -usehistory => 'y' > ); > > my $hist = $eutils->next_History || die "no history\n"; > > $eutils->reset_parameters( > -eutil => 'elink', > -db => 'gene', > -dbfrom => 'nuccore', > -history => $hist > ); > > my ($gene_id) = $eutils->next_LinkSet->get_ids; > > $eutils->reset_parameters( > -eutil => 'esummary', > -db => 'gene', > -id => $gene_id, > ); > > my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType'); > print $item->get_contents_by_name('ChrAccVer'), "\n"; > > -siddhartha > > On Fri, 12 Mar 2010, Jessica Sun wrote: > > > I have a question: I have a refseq with NM_ number(mRNA), how can I get > > the genomic sequences(NT_number) with Bioperl, if it can be done? > > > > Thanks > > > > > > -- > > Jessica Jingping Sun > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jessica Jingping Sun From sdavis2 at mail.nih.gov Sun Mar 14 08:38:15 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sun, 14 Mar 2010 07:38:15 -0500 Subject: [Bioperl-l] comparing fasta sequences in multiple files In-Reply-To: <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com> References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com> <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com> Message-ID: <264855a01003140538m6cee0c27s823e45d02002d200@mail.gmail.com> On Sun, Mar 14, 2010 at 2:12 AM, robby jhones wrote: > I think that I'll need to write a hash of the IDs and sequences, then > 
iterate over the sequences to see if they are identical and if so push them > and the ID into an output file. I was hoping there was something out there > like this, but I suppose not. Look in the mailing list archives for the last week or so. There was some discussion about generating hashes of sequences; you could use that to generate your hash of unique sequences. Sean > On Sat, Mar 13, 2010 at 4:49 PM, Sean Davis wrote: >> >> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones >> wrote: >> > Dear Group, >> > >> > Can anyone offer advice on comparing multiple fasta sequences in many >> > files. We have 1000's of fasta sequences in individual files of which I >> > would like to fish out and print to a new file (the sequence and ID), >> > ONLY >> > the sequences which appear in at least a few of the files: 3 out of 4 >> > runs, >> > perhaps all 4 runs ( as some are replicates). >> > >> > Is there something out there which would do this? >> >> Hi, Robby. >> >> It sounds like making a hash of IDs and then incrementing a count for >> each as you loop over files would give you what you want? >> >> Sean > > From lpritc at scri.ac.uk Mon Mar 15 07:55:52 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 15 Mar 2010 11:55:52 +0000 Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts In-Reply-To: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com> Message-ID: Hi Scott, Thanks for the reply. I tried your suggestions on a clean VM of CentOS 5.4 and the equally wordy outcome is below... On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain" wrote: > First, I am working on the 1.1 release of gmod/chado, and it > may fix some of the problems you are describing. Certainly, ID > collisions between GFF files should not be a problem (I didn't think > they were in the 1.0 release, but that was a long time ago).
Please
> try a checkout of the schema trunk in the gmod svn:
>
> http://gmod.org/wiki/SVN

As a note for anyone following this, when I downloaded the trunk/chado files
only, my build failed with

"""
$make
[...]
Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm
Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm
Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm
Manifying ../blib/man3/Bio::Chaos::Root.3pm
make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml'
make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by `blib/script/gmod_gff2biomart5.pl'. Stop.
"""

I had to download the whole trunk for the installation to work. I came
across this thread:
http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html while I was
looking for a solution; someone else has had a similar problem.

> Another thing you may want to look at is that just last week, a
> developer at Texas A&M, Nathan Liles, contributed code to the
> bioperl-live trunk for the genbank2gff3.pl script that will do a much
> better job of converting bacterial genbank files to GFF3; perhaps that
> will help too. Working with a svn checkout of bioperl-live shouldn't
> be too scary either; the pieces you are interested in (that work with
> Chado and GBrowse) are quite stable.

I also checked out BioPerl-live. The svn server at code.open-bio.org was
unresponsive for a couple of days, but Peter pointed me to GitHub at
http://github.com/bioperl/bioperl-live so I went from there.

The process isn't quite as clean as using the latest stable version of
BioPerl, however. When I attempt to use the bp_genbank2gff3.pl script, I
get the following error message:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
Can't locate object method "FT_SO_map" via package "Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line 374.
"""

This appears to be associated with the following code (l207 onwards...) in
TypeMapper:

"""
=head2 map_types_to_SO
[...]
hardcodes the genbank to SO mapping
[...]
dgg: separated out FT_SO_map for caller changes. Update with:

open(FTSO,"curl -s http://sequenceontology.org/resources/mapping/FT_SO.txt|");
while(<FTSO>){
  chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t";
  print " '$ft' => '$so',\n" if($ft && $so && $ftdef);
}

=cut

sub ft_so_map {
# $self= shift;
"""

The upper/lower case function declaration seems to be important, as changing
it back to "sub FT_SO_map" lets the script work:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
# Input: NC_004547.gbk
# working on region:NC_004547, Erwinia carotovora subsp. atroseptica SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, complete genome.
# GFF3 saved to ./NC_004547.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# repeat_region         19
# sequence_variant      2
# repeat_unit           2
# gene                  4614
# region                17387
# exon                  4597
# RESIDUES              5064019
#
"""

Obviously, this is another unsatisfactory sucky ad hoc post-install hack; I
hope I'm doing the right sort of thing, there. I'm not familiar with
BioPerl so I'm not clear on why this change was made to the interface (it's
part of the recent changes by Nathan Liles you referred to in your post:
http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37ddcd8a77a03),
but it also seems to break bp_genbank2gff3.pl.

Also, the --noCDS flag appears to have no effect at all when using the new
version of bp_genbank2gff3.pl.

The old version of bp_genbank2gff3.pl appears to recognise more feature
types in the summary:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
# Input: NC_004547.gbk
# working on region:NC_004547, Erwinia carotovora subsp. atroseptica SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043, complete genome.
# GFF3 saved to ./NC_004547.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# mRNA                  4472
# sequence_variant      2
# gene                  4594
# region                8275
# pseudogene            20
# CDS                   4472
# RESIDUES(tr)          1433791
# RESIDUES              5064019
# rRNA                  22
# processed_transcript  24
# repeat_region         19
# pseudogenic_region    46
# repeat_unit           2
# exon                  4597
# tRNA                  76
#
"""

and this is reflected in the substantial difference in GFF3 output, for
issuing exactly the same command when moving from BioPerl 1.6.1 to
bioperl-live: we get different GFF3 output that represents a different gene
model. I wasn't expecting so radical a change, but at least the IDs are
based on the locus_tag with the new script, and this appears to solve my
problem with clashing feature IDs on the files I was using.

Many thanks for your help,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________
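
The FT_SO_map / ft_so_map mismatch reported in the message above comes down to Perl method names being case-sensitive. A minimal sketch of one way a renamed method can keep its old spelling as an alias, so that existing callers do not break. The package name My::TypeMapper and the two map entries are made up for illustration; this is not the actual Bio::SeqFeature::Tools::TypeMapper code, and whether an alias is the fix the maintainers want is an open question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a type-mapper class; only the naming issue
# is modelled here, not the real GenBank-to-SO mapping table.
package My::TypeMapper;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# New-style lower-case method name.
sub ft_so_map {
    my ($self) = @_;
    # Illustrative entries only.
    return { gene => 'gene', repeat_region => 'repeat_region' };
}

# Backward-compatible alias: point the old upper-case name at the same
# code, so callers written against FT_SO_map keep working.
{
    no warnings 'once';
    *FT_SO_map = \&ft_so_map;
}

package main;

my $mapper = My::TypeMapper->new;

# can() looks the method name up exactly as spelled, case included.
for my $name (qw(ft_so_map FT_SO_map Ft_So_Map)) {
    printf "%-10s %s\n", $name, ($mapper->can($name) ? 'found' : 'missing');
}

print "gene maps to ", $mapper->FT_SO_map->{gene}, "\n";
```

A script can also guard the call itself with can() and fall back to whichever spelling the installed module provides, which avoids editing installed files by hand.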
From scott at scottcain.net  Mon Mar 15 10:55:17 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 15 Mar 2010 10:55:17 -0400
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To:
References: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>
Message-ID: <4536f7701003150755w2c2875fbob004bc03cf3387ab@mail.gmail.com>

Hi Leighton,

Thanks for the feedback both on getting chado installed from svn and on the genbank2gff3 converter.

About installing Chado from svn, I thought I'd modified the Makefile.PL script to gracefully survive not having the GMODtools directory present; I guess I'll have to revisit that. Since I probably won't get to it today, I created a bug report for it:

https://sourceforge.net/tracker/?func=detail&aid=2970687&group_id=27707&atid=391291

About the genbank2gff3 script, I'm cc'ing Nathan to make sure he sees your comments.

Thanks,
Scott

On Mon, Mar 15, 2010 at 7:55 AM, Leighton Pritchard wrote:
> Hi Scott,
>
> Thanks for the reply. I tried your suggestions on a clean VM of CentOS 5.4
> and the equally wordy outcome is below...
>
> On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain"
> wrote:
>
>> First, I am working on the 1.1 release of gmod/chado, and it
>> may fix some of the problems you are describing. Certainly, ID
>> collisions between GFF files should not be a problem (I didn't think
>> they were in the 1.0 release, but that was a long time ago). Please
>> try a checkout of the schema trunk in the gmod svn:
>>
>>   http://gmod.org/wiki/SVN
>
> As a note for anyone following this, when I downloaded the trunk/chado files
> only, my build failed with
>
> """
> $make
> [...]
> Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm
> Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm
> Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm
> Manifying ../blib/man3/Bio::Chaos::Root.3pm
> make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml'
> make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by
> `blib/script/gmod_gff2biomart5.pl'.  Stop.
> """
>
> I had to download the whole trunk for the installation to work. I came
> across this thread:
> http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html
>
> while I was looking for a solution; someone else has had a similar problem.
>
>> Another thing you may want to look at is that just last week, a
>> developer at Texas A&M, Nathan Liles, contributed code to the
>> bioperl-live trunk for the genbank2gff3.pl script that will do a much
>> better job of converting bacterial genbank files to GFF3; perhaps that
>> will help too. Working with a svn checkout of bioperl-live shouldn't
>> be too scary either; the pieces you are interested in (that work with
>> Chado and GBrowse) are quite stable.
>
> I also checked out BioPerl-live. The svn server at code.open-bio.org was
> unresponsive for a couple of days, but Peter pointed me to GitHub at
> http://github.com/bioperl/bioperl-live so I went from there. The process
> isn't quite as clean as using the latest stable version of BioPerl, however.
>
> When I attempt to use the bp_genbank2gff3.pl script, I get the following
> error message:
>
> """
> [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
> Can't locate object method "FT_SO_map" via package
> "Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line
> 374.
> """
>
> This appears to be associated with the following code (l207 onwards...) in
> TypeMapper:
>
> """
> =head2 map_types_to_SO
>
> [...]
>
> hardcodes the genbank to SO mapping
>
> [...]
> dgg: separated out FT_SO_map for caller changes. Update with:
>
>  open(FTSO,"curl -s
> http://sequenceontology.org/resources/mapping/FT_SO.txt|");
>  while(<FTSO>){
>    chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t";
>    print "     '$ft' => '$so',\n" if($ft && $so && $ftdef);
>  }
>
> =cut
>
> sub ft_so_map  {
>  # $self= shift;
> """
>
> The upper/lower case function declaration seems to be important, as changing
> it back to "sub FT_SO_map" lets the script work:
>
> """
> [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
> # Input: NC_004547.gbk
> # working on region:NC_004547, Erwinia carotovora subsp. atroseptica
> SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043,
> complete genome.
> # GFF3 saved to ./NC_004547.gbk.gff
> # Summary:
> # Feature       Count
> # -------       -----
> # repeat_region  19
> # sequence_variant  2
> # repeat_unit  2
> # gene  4614
> # region  17387
> # exon  4597
> # RESIDUES  5064019
> #
> """
>
> Obviously, this is another unsatisfactory sucky ad hoc post-install hack; I
> hope I'm doing the right sort of thing, there. I'm not familiar with
> BioPerl so I'm not clear on why this change was made to the interface (it's
> part of the recent changes by Nathan Liles you referred to in your post:
> http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37ddcd8a77a03),
> but it also seems to break bp_genbank2gff3.pl. Also, the
> --noCDS flag appears to have no effect at all when using the new version of
> bp_genbank2gff3.pl.
>
> The old version of bp_genbank2gff3.pl appears to recognise more feature
> types in the summary:
>
> """
> [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
> # Input: NC_004547.gbk
> # working on region:NC_004547, Erwinia carotovora subsp. atroseptica
> SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043,
> complete genome.
> # GFF3 saved to ./NC_004547.gbk.gff
> # Summary:
> # Feature       Count
> # -------       -----
> # mRNA  4472
> # sequence_variant  2
> # gene  4594
> # region  8275
> # pseudogene  20
> # CDS  4472
> # RESIDUES(tr)  1433791
> # RESIDUES  5064019
> # rRNA  22
> # processed_transcript  24
> # repeat_region  19
> # pseudogenic_region  46
> # repeat_unit  2
> # exon  4597
> # tRNA  76
> #
> """
>
> and this is reflected in the substantial difference in GFF3 output, for
> issuing exactly the same command when moving from BioPerl 1.6.1 to
> bioperl-live: we get different GFF3 output that represents a different gene
> model. I wasn't expecting so radical a change, but at least the IDs are
> based on the locus_tag with the new script, and this appears to solve my
> problem with clashing feature IDs on the files I was using.
>
> Many thanks for your help,
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                        scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)       216-392-3087
Ontario Institute for Cancer Research

From kiekyon.huang at gmail.com  Mon Mar 15 11:44:13 2010
From: kiekyon.huang at gmail.com (kiekyon.huang at gmail.com)
Date: Mon, 15 Mar 2010 15:44:13 +0000
Subject: [Bioperl-l] Taxonomy report
Message-ID: <0016e64be064b8211f0481d8c02d@google.com>

Hi,

just like to know if there is any way to generate the taxonomy report from the standalone blast output?

thanks

From cjfields at illinois.edu  Mon Mar 15 11:57:29 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 15 Mar 2010 10:57:29 -0500
Subject: [Bioperl-l] Taxonomy report
In-Reply-To: <0016e64be064b8211f0481d8c02d@google.com>
References: <0016e64be064b8211f0481d8c02d@google.com>
Message-ID: <53CE22BE-38F4-4EC6-80A9-37228A9CF602@illinois.edu>

Not that I know of, at least not w/o doing some mapping (the tax report is generated on NCBI's servers last I recall).

chris

On Mar 15, 2010, at 10:44 AM, kiekyon.huang at gmail.com wrote:

> Hi,
>
> just like to know if there is any way to generate the taxonomy report from the standalone blast output?
>
> thanks
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org  Mon Mar 15 13:11:27 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 15 Mar 2010 10:11:27 -0700
Subject: [Bioperl-l] getting strand from Bio::Align::AlignI ??
In-Reply-To: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov>
References: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov>
Message-ID: <4B9E6A3F.6080104@bioperl.org>

Did you start with a Bio::SearchIO object and call get_aln on the HSP object? Strand is available from $hsp->query->strand and $hsp->hit->strand, and Bio::SearchIO is the preferred way of parsing pairwise alignment reports.

Either way, the sequences themselves have strands, not the alignment. Each sequence should have a strand ($seq->strand) since they are Bio::LocatableSeq objects.

for my $seq ( $aln->each_seq ) {
    print $seq->id, " ", $seq->strand, "\n";
}

-jason

Joan Pontius wrote, On 3/15/10 8:49 AM:
> I am looking into using Bio::Align::AlignI for an application that
> uses blast2seq
> and can't figure out how to get the strand of an alignment?
>
> Thanks in advance
>
>
>
> Joan Pontius-Contractor SAIC
> Laboratory of Genomic Diversity
> Bldg 560-NCI
> Frederick Maryland 21702
> phone (301)846-1761
> fax (301) 846-1686

From cjfields1 at gmail.com  Mon Mar 15 14:57:08 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Mon, 15 Mar 2010 13:57:08 -0500
Subject: [Bioperl-l] Bioperl SVNconnection problem
In-Reply-To: <6C998BD2392E4BF594F041368D9456E4@BlackJack>
References: <6C998BD2392E4BF594F041368D9456E4@BlackJack>
Message-ID: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com>

Francisco,

In general, please address any questions directly to the bioperl mail list, in case I can't respond.

The anon. svn on code.open-bio.org is down at the moment. OBF support knows about this problem and it's being addressed.
There is a github mirror of the repos in case this happens:

http://github.com/bioperl

chris

On Mar 15, 2010, at 10:38 AM, Francisco J. Ossandón wrote:

> Hello Chris Fields,
> I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).
>
> There has been some change in the SVN address or configuration that I should update? I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it.
>
> I hope that you can help me with this.
>
> Regards,
>
> --
> Francisco J. Ossandon
> Bioinformatician.
> Ph.D. Student, University Andres Bello.
> Center for Bioinformatics and Genome Biology,
> Fundacion Ciencia para la Vida.
> Santiago, Chile.
> www.cienciavida.cl/CBGB.htm

From hlapp at drycafe.net  Tue Mar 16 16:03:50 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 16 Mar 2010 16:03:50 -0400
Subject: [Bioperl-l] [OT] Job opportunity: Training coordinator and Bioinformatics Project Manager
Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net>

Hi all - first off, sorry for the cross-posting, we're trying to advertise this as widely as possible. Second, apologies if this is committing an offense and considered spam. I thought though that there might be some people around here who may be interested and suitable.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

A unique position is available for a training coordinator and bioinformatics project manager at the U.S. National Evolutionary Synthesis Center in Durham, North Carolina (NESCent, http://nescent.org). NESCent is a National Science Foundation funded research center managed by Duke University, the University of North Carolina at Chapel Hill and North Carolina State University on behalf of the international evolutionary biology community. NESCent facilitates synthetic research by bringing together diverse expertise, data, tools and concepts (Sidlauskas et al. 2009). In addition to a resident population of 20-30 scientists, the Center hosts over 800 visitors a year.

An informatics staff is on-site to support resident and visiting scientists' needs in high-performance computing, electronic collaboration, scientific software and databases; this includes custom software development for a limited number of high-impact projects. NESCent's informatics training program includes a rotating series of open-application summer courses, ad-hoc short courses for resident scientists, and remote internships (including past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will provide oversight to the Center's training activities. The incumbent will also serve as the interface between scientists and software developers at NESCent. The position provides extensive opportunities for collaboration and intellectual engagement with both NESCent-sponsored scientists and informatics staff; however, this is not an independent research position. The incumbent will report to the Director, while overseeing the work of a small informatics team and coordinating activities among the Center's science, education and informatics programs.

Responsibilities:

• 50% - Consult with sponsored scientists (including scientists in residence and working group participants) about informatics resources and needs. Manage software product development by gathering requirements from scientists, participating in conceptual design, monitoring implementation progress and product quality, facilitating communication between software developers and scientists, and researching software solutions.
• 25% - Oversee NESCent's course curriculum by identifying opportunities for onsite or online informatics courses that satisfy demand for advanced training of resident and visiting scientists, recruiting instructors, providing guidance to instructors in developing course syllabi, coordinating logistical and technical support requirements, conducting assessments, and serving as a liaison to course organizers at other institutions.
• 25% - Assisting in the management of NESCent's summer informatics intern program, by coordinating the recruitment, application & review process for students, communicating expectations to students and mentors, monitoring student progress, documenting student outcomes, and performing assessments.

Education:
Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary biology, or an equivalent combination of relevant education and/or experience.

Experience:
Required: Excellent communication, interpersonal, and organizational skills. Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open source software. Organization, coordination, development and delivery of courses and workshops appropriate for graduate-level participants.

Terms of Employment:
Salary will be competitive and commensurate with experience. As a full-time employee, the incumbent will receive Duke University's benefits package (http://hr.duke.edu/benefits/main.html). The position is available immediately and will remain open until filled. The position is currently funded through November 2014, contingent on annual renewal of the Center by the NSF.

How to Apply:
Please send a C.V., including contact information for three references, and a brief statement of interest to Allen Rodrigo, Director, NESCent, at a.rodrigo at nescent.org. Inquiries about suitability for the position are welcome. Duke University is an Equal Opportunity/Affirmative Action employer.

Additional information about NESCent: http://www.nescent.org

References:
Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The Continuing Promise of Evolutionary Synthesis. Evolution. http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x

From hartzell at alerce.com  Tue Mar 16 19:35:13 2010
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 16 Mar 2010 16:35:13 -0700
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
Message-ID: <19360.5553.985550.996751@gargle.gargle.HOWL>

Apologies if this is as silly of a question as it seems, I think that I must just be decaffeinated this morning....

I'm cleaning up some modules and would like to express a dependency on BioPerl-run version 1.6.1.

For the main bioperl I use Bio::Root::Version and 1.006001. That works, although in the course of investigating below I found that Bio::Root::RootI (which uses BR::Version) doesn't.

A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) use Bio::Root::Version and thereby acquire a reasonable version number but:

a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency when I want bioperl-run
b) it's funny that PiseWorkflow uses Bio::Root::Version (which imports a $VERSION into its package) then goes on to set one itself.
c) there's something hinky going on, when I do 'perl Build.PL' on my Task it doesn't think that PiseWorkflow is up to date (it thinks I have version (0) if I understand correctly), but when I './Build installdeps' everything appears up to date.

   It looks like the trickiness of assigning $Bio::Root::Version::VERSION to $VERSION confuses Module::Build::ModuleInfo::_evaluate_version_line and the result is that VERSION appears to be 0.

What's The Right Thing to do?

Thanks,

g.

From maj at fortinbras.us  Wed Mar 17 10:41:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Mar 2010 10:41:00 -0400
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
In-Reply-To: <19360.5553.985550.996751@gargle.gargle.HOWL>
References: <19360.5553.985550.996751@gargle.gargle.HOWL>
Message-ID:

I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy to me--(esp since the Pise stuff is deprecated, IIRC)
cheers MAJ

----- Original Message -----
From: "George Hartzell"
To: "bioperl-l List"
Sent: Tuesday, March 16, 2010 7:35 PM
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check

>
> Apologies if this is as silly of a question as it seems, I think that
> I must just be decaffeinated this morning....
>
> I'm cleaning up some modules and would like to express a dependency on
> BioPerl-run version 1.6.1.
>
> For the main bioperl I use Bio::Root::Version and 1.006001. That
> works, although in the course of investigating below I found that
> Bio::Root::RootI (which uses BR::Version) doesn't.
>
> A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow)
> use Bio::Root::Version and thereby acquire a reasonable version number
> but:
>
> a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency
> when I want bioperl-run
> b) it's funny that PiseWorkflow uses Bio::Root::Version (which
> imports a $VERSION into its package) then goes on to set one
> itself.
> c) there's something hinky going on, when I do 'perl Build.PL' on my
> Task it doesn't think that PiseWorkflow is up to date (it thinks
> I have version (0) if I understand correctly), but when I
> './Build installdeps' everything appears up to date.
>
> It looks like the trickiness of assigning
> $Bio::Root::Version::VERSION to $VERSION confuses
> Module::Build::ModuleInfo::_evaluate_version_line and the result
> is that VERSION appears to be 0.
>
> What's The Right Thing to do?
>
> Thanks,
>
> g.

From janine.arloth at googlemail.com  Mon Mar 15 04:15:50 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Mon, 15 Mar 2010 09:15:50 +0100
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID:

Hello,

is there a way to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->hit_string.)

Best regards

From cjfields at illinois.edu  Wed Mar 17 11:13:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Mar 2010 10:13:20 -0500
Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
In-Reply-To:
References: <19360.5553.985550.996751@gargle.gargle.HOWL>
Message-ID: <32C28662-BD24-4270-A0B6-71CEB459172C@illinois.edu>

What is probably the best thing to do is set up a stub module for each of the subdistributions that contains a proper version to match against. So, for BioPerl-Run, use Bio::Run or Bio::Tools::Run, BioPerl-DB use Bio::DB, etc. Distribution-specific general documentation would go in those stub modules. I sort of started this with the first alphas, but didn't get around to finishing it up.

Just as a footnote, the universal $VERSION thingy was set up quite a while ago, prior to perl 5.8 I believe, and doesn't play very well with $VERSION (and version.pm) on newer perl versions. Once we move beyond 1.6.x towards breaking things up we'll have to assign new VERSIONs to anything released independently on CPAN, anyway, so this may eventually be a moot point.
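
To make the version problem in this thread concrete, here is a rough sketch of why a literal version in a stub module is easier for static tools to read than a $VERSION copied from another package at run time. The helper below evaluates a single version line in an otherwise empty package, which only approximates what George describes for Module::Build::ModuleInfo::_evaluate_version_line; the real code is more involved. The names version_from_line and Sandbox::Version are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Evaluate one "$VERSION = ..." line in isolation, without loading any
# other module. This is an approximation of a static version scanner,
# not the actual Module::Build implementation.
sub version_from_line {
    my ($line) = @_;
    local $Sandbox::Version::VERSION;    # start from undef on every call
    my $ok = eval "package Sandbox::Version; no strict; no warnings; $line; 1";
    die "could not evaluate version line: $@" unless $ok;
    return defined $Sandbox::Version::VERSION ? $Sandbox::Version::VERSION : 0;
}

# A stub module with a literal version can be read statically.
print version_from_line(q{our $VERSION = '1.006001'}), "\n";

# A version copied from another package is undefined unless that
# package has been loaded, so the scanner ends up reporting 0.
print version_from_line(q{$VERSION = $Bio::Root::Version::VERSION}), "\n";
```

Under these assumptions the first call reports 1.006001 and the second reports 0, which matches the "thinks I have version (0)" symptom described earlier in the thread.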

chris

The inherited $VERSION thingy was set up a while back, basically as a way of assigning a common version across BioPerl.

On Mar 17, 2010, at 9:41 AM, Mark A. Jensen wrote:

> I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy
> to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ
> ----- Original Message ----- From: "George Hartzell"
> To: "bioperl-l List"
> Sent: Tuesday, March 16, 2010 7:35 PM
> Subject: [Bioperl-l] What's to depend on for BioPerl-run version check
>
>
>> Apologies if this is as silly of a question as it seems, I think that
>> I must just be decaffeinated this morning....
>> I'm cleaning up some modules and would like to express a dependency on
>> BioPerl-run version 1.6.1.
>> For the main bioperl I use Bio::Root::Version and 1.006001. That
>> works, although in the course of investigating below I found that
>> Bio::Root::RootI (which uses BR::Version) doesn't.
>> A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow)
>> use Bio::Root::Version and thereby acquire a reasonable version number
>> but:
>> a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency
>> when I want bioperl-run
>> b) it's funny that PiseWorkflow uses Bio::Root::Version (which
>> imports a $VERSION into its package) then goes on to set one
>> itself.
>> c) there's something hinky going on, when I do 'perl Build.PL' on my
>> Task it doesn't think that PiseWorkflow is up to date (it thinks
>> I have version (0) if I understand correctly), but when I
>> './Build installdeps' everything appears up to date.
>> It looks like the trickiness of assigning
>> $Bio::Root::Version::VERSION to $VERSION confuses
>> Module::Build::ModuleInfo::_evaluate_version_line and the result
>> is that VERSION appears to be 0.
>> What's The Right Thing to do?
>> Thanks,
>> g.

From robfsouza at gmail.com  Wed Mar 17 11:20:21 2010
From: robfsouza at gmail.com (robfsouza)
Date: Wed, 17 Mar 2010 08:20:21 -0700 (PDT)
Subject: [Bioperl-l] Bioperl SVNconnection problem
In-Reply-To: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com>
References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com>
Message-ID: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com>

Hi Chris,

Any idea when the SVN is going to be fixed? I could not find tar.gz or other download methods in github...

Robson

On Mar 15, 2:57 pm, Christopher Fields wrote:
> Francisco,
>
> In general, please address any questions directly to the bioperl mail list, in case I can't respond.
>
> The anon. svn on code.open-bio.org is down at the moment. OBF support knows about this problem and it's being addressed. There is a github mirror of the repos in case this happens:
>
> http://github.com/bioperl
>
> chris
>
> On Mar 15, 2010, at 10:38 AM, Francisco J. Ossandón wrote:
>
> > Hello Chris Fields,
> > I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).
> >
> > There has been some change in the SVN address or configuration that I should update? I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it.
> >
> > I hope that you can help me with this.
> >
> > Regards,
> >
> > --
> > Francisco J. Ossandon
> > Bioinformatician.
> > Ph.D. Student, University Andres Bello.
> > Center for Bioinformatics and Genome Biology,
> > Fundacion Ciencia para la Vida.
> > Santiago, Chile.
> > www.cienciavida.cl/CBGB.htm

From adsj at novozymes.com  Wed Mar 17 12:00:34 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 17 Mar 2010 17:00:34 +0100
Subject: [Bioperl-l] Bioperl SVNconnection problem
In-Reply-To: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> (robfsouza@gmail.com's message of "Wed, 17 Mar 2010 08:20:21 -0700 (PDT)")
References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com>
Message-ID: <874okfsztp.fsf@topper.koldfront.dk>

On Wed, 17 Mar 2010 08:20:21 -0700 (PDT), robfsouza wrote:

> Any idea when the SVN is going to be fixed? I could not find tar.gz or
> other download methods in github...

If you don't want to "git clone http://github.com/bioperl/bioperl-live.git", you can click on the "Download source" link in the upper right corner of http://github.com/bioperl/bioperl-live and you'll get to choose between downloading tar or zip.

Best regards,

Adam

--
Adam Sjøgren
adsj at novozymes.com

From cjfields at illinois.edu  Wed Mar 17 12:12:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Mar 2010 11:12:42 -0500
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <53EECF69-E9CE-4619-BE0A-97BE55754D8E@illinois.edu>

Janine,

How would you go about doing that from the BLAST report alone (which doesn't store the whole sequence)? Unless you know something I don't, you'll need to pull the unique identifier for the sequence from the hit object while parsing the report and grab the seq from a local or remote database (or use fastacmd or its equivalent in blast+).

chris

On Mar 15, 2010, at 3:15 AM, Janine Arloth wrote:

> Hello,
>
> is there a way to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->hit_string.)
>
> Best regards

From Russell.Smithies at agresearch.co.nz  Wed Mar 17 15:48:27 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 18 Mar 2010 08:48:27 +1300
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz>

If you're running blast locally, use fastacmd to extract the sequences from the blast database.
Eg
fastacmd -d nr -s AC147927

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Janine Arloth
> Sent: Monday, 15 March 2010 9:16 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
>
> Hello,
>
> is there a way to get/extract the whole hit sequences? (Not only the
> hit string from the alignment with $hsp->hit_string.)
>
> Best regards

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From michael.watson at bbsrc.ac.uk  Wed Mar 17 16:47:57 2010
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Mar 2010 20:47:57 +0000
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz>
References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz>
Message-ID: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk>

I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb....
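
The two steps discussed here (index the database with "-o T", then fetch hits by identifier) can be sketched as a pair of small helpers that only build the command lines. This is a sketch under assumptions: the option letters are recalled from the legacy NCBI toolkit usage text (formatdb -i/-p/-o, fastacmd -d/-s) and should be checked against the installed tools; the helper names formatdb_args and fastacmd_args and the file name mydb.fa are made up. Nothing is executed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Argument list for building a BLAST database whose sequences can later
# be retrieved by identifier ("-o T" asks formatdb to parse and index
# the sequence IDs). Flags are assumptions recalled from legacy BLAST.
sub formatdb_args {
    my ($fasta, $is_protein) = @_;
    return ('formatdb', '-i', $fasta, '-p', ($is_protein ? 'T' : 'F'), '-o', 'T');
}

# Argument list for fetching one or more full sequences by accession.
# Several identifiers are passed as one comma-separated search string.
sub fastacmd_args {
    my ($db, @ids) = @_;
    die "need at least one identifier\n" unless @ids;
    return ('fastacmd', '-d', $db, '-s', join(',', @ids));
}

print join(' ', formatdb_args('mydb.fa', 0)), "\n";
print join(' ', fastacmd_args('mydb.fa', 'AC147927')), "\n";
```

The identifiers would come from the hit objects while parsing the report, as Chris suggests; the resulting list could then be handed to system() in list form, which avoids shell quoting problems. For BLAST+ the corresponding tools are makeblastdb (with -parse_seqids) and blastdbcmd, as far as I recall.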
From Russell.Smithies at agresearch.co.nz Wed Mar 17 17:07:29 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 18 Mar 2010 10:07:29 +1300
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk>
References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz>

Precompiled databases from NCBI are built with "-o T", but when building them yourself the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our setup to retrieve sequences for all your blast hits.

Here's an example of our sequence retrieval:
https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw

--Russell

From Russell.Smithies at agresearch.co.nz Wed Mar 17 17:53:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 18 Mar 2010 10:53:38 +1300
Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus
In-Reply-To: <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com>
References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A72BD@exchsth.agresearch.co.nz>

It's all a bit complicated, as this page is on a public site but our blast server is internal and restricted, so there's no direct communication between them.

The public site takes the data from the blast request, writes it to a template file, then puts it in a folder that the internal blast server checks every 10 seconds. When a new request is found, it runs the blast, creates the image and map with Bio::Graphics, then transfers the result to a folder on the public server. As a sneaky bodge so I don't have to transfer the image separately, it's base64-encoded in the HTML and stripped out later. The blast result page keeps refreshing until it sees that the required result has returned, then displays the page.
It sounds a bit odd, but as blast runs on one of our main servers, we don't want anyone to be able to "accidentally" run commands on it - no one has hacked our servers yet :)

There's some good stuff in the BioPerl howtos http://www.bioperl.org/wiki/HOWTO:Graphics and http://www.bioperl.org/wiki/HOWTO:SearchIO

Bio::SearchIO::Writer::HTMLResultWriter can be quite useful, though ours is html-ized 'manually' as it's streamed through a post-processing script.

--Russell

From: Janine Arloth [mailto:janine.arloth at googlemail.com]
Sent: Thursday, 18 March 2010 10:33 a.m.
To: Smithies, Russell
Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus

Thank you very much.
Can I ask how you get the figure in the blast output (blastmap)? I have "use Bio::Graphics;" in my script, but I did not see how to create this figure.

Best Regards

From armendarez77 at hotmail.com Thu Mar 18 12:27:20 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Thu, 18 Mar 2010 09:27:20 -0700
Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter
Message-ID: 

Hello,

I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (which include Bio::DB::RefSeq and Bio::DB::GenBank) via crontab; however, the web filter requires me to log in every 8 hours. The administrator removed the filter, but my scripts still failed. I then logged into iPrism and the scripts worked.

The system administrators say it's the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast.

I would include the script, but it's a bit involved and passes arguments to other scripts.

Thank you,

Veronica

From cjfields at illinois.edu Thu Mar 18 13:21:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Mar 2010 12:21:22 -0500
Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter
In-Reply-To: 
References: 
Message-ID: 

Veronica,

No caching occurs that I know of. If you have an environment proxy set somehow it will use that, using LWP::UserAgent and env_proxy() (your logging in via iPrism makes me think it is something along those lines). Otherwise the proxy has to be explicitly set for each object, so no caching is apparent. Could you have a local environment proxy set that you're unaware of?
See here for examples:

http://search.cpan.org/~gaas/libwww-perl-5.834/lib/LWP/UserAgent.pm#Proxy_attributes

You could try something like this after you create the instances, which accesses the LWP::UserAgent instance cached in the relevant class and shuts off proxies:

    $db->ua->no_proxy();

Otherwise, you can try coming up with a minimal test case indicating what happens (including any output) and file a bug report, just in case.

chris

From rmb32 at cornell.edu Thu Mar 18 17:11:34 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 18 Mar 2010 14:11:34 -0700
Subject: [Bioperl-l] Google Summer of Code is *ON* for OBF projects!
Message-ID: <4BA29706.8040606@cornell.edu>

Hi all,

Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code!
GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see the GSoC 2010 FAQ at http://tinyurl.com/yzemdfo

Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas and who to contact about applying.

For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on the OBF wiki and on the relevant project's GSoC wiki page.

Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code!

Rob Buels
OBF GSoC 2010 Administrator

From me at miguel.weapps.com Thu Mar 18 19:33:16 2010
From: me at miguel.weapps.com (Luis M Rodriguez-R)
Date: Thu, 18 Mar 2010 18:33:16 -0500
Subject: [Bioperl-l] GSoC-2010 & the semantic web
Message-ID: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>

Hello all,

I would like to know how to apply to GSoC-2010, and when it is planned to take place.

I think there are great development opportunities in information discovery using the semantic web (I'm familiar with RDF in bio2rdf, uniprot and some ontologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, especially with a view to implementing SPARQL for a discoverable "bio-cloud".

The people of bio2rdf already have some parsers, but there are still a lot of things to do.

Best regards,
Luis.

Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com

From rhythmbox-devel at maubp.freeserve.co.uk Thu Mar 18 20:25:05 2010
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Mar 2010 00:25:05 +0000
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <320fb6e01003181725j2aa1268am80ae7649bd873b46@mail.gmail.com>

On Thu, Mar 18, 2010 at 11:33 PM, Luis M Rodriguez-R wrote:
>
> I think there are great development opportunities in information
> discovery using semantic web (I'm familiar with RDF in bio2rdf,
> uniprot and some ontologies, ...

Have a read of the wiki pages from this recent hackathon - it should be of interest to you: http://hackathon3.dbcls.jp/

Peter

From cjfields at illinois.edu Thu Mar 18 20:29:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Mar 2010 19:29:19 -0500
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <0FADD2C6-9458-4E0C-ADB5-E4C0F18A79D8@illinois.edu>

Luis,

See this page for the specifics:

http://www.open-bio.org/wiki/Google_Summer_of_Code

There are several proposed projects already listed; feel free to add yours to the page. I'm assuming these will be OBF-focused, so tying your proposal to one of the OBF projects is probably a good idea.

chris

From ross at cuhk.edu.hk Sat Mar 20 19:55:35 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sun, 21 Mar 2010 07:55:35 +0800
Subject: [Bioperl-l] automation of translation based on alignment
Message-ID: <002c01cac888$d570fe20$8052fa60$@edu.hk>

Dear bioperl users,

I am working on virus sequences, and one of the GenBank files is here:

http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum

With 1000 such nucleotide sequences, I'd like to translate the corresponding protein-coding sequences. The difficulties lie in:

1) The genome sequence is circular
2) The genes are overlapping

I don't have all 1000 GenBank files, but I plan to use the one above as a guide to direct the automation process. Has bioperl implemented specialized functions to handle this kind of problem?
Thanks a lot for your advice,
Ross

From florent.angly at gmail.com Sun Mar 21 20:44:11 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 22 Mar 2010 10:44:11 +1000
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <002c01cac888$d570fe20$8052fa60$@edu.hk>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk>
Message-ID: <4BA6BD5B.9010509@gmail.com>

Hi Ross,

It seems like your answer is in the link you posted. On that page, all the coding sequences are already identified and their amino acid sequences provided. You simply need to parse all the GenBank entries to extract this information. You may use EUtilities to achieve this online:
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

Florent

From florent.angly at gmail.com Sun Mar 21 21:14:27 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 22 Mar 2010 11:14:27 +1000
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <004d01cac95c$15c95250$415bf6f0$@edu.hk>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk>
Message-ID: <4BA6C473.4090404@gmail.com>

Hi Ross,

Please keep replies on the BioPerl mailing list so that everyone benefits.
You should give a detailed explanation of what you are trying to achieve, e.g.:
* What type of input file do you have?
* Do you already know the location of the ORFs?
* What are the multiple alignments you are talking about?
...

Florent

On 22/03/10 11:07, Ross KK Leung wrote:
> Dear Florent,
>
> Thanks for your response. While the one with Genbank file can be extracted,
> those without have to rely on alignment. Scripts certainly can be written to
> move forward and backward on the multiple alignment but it is an error-prone
> process and that's why I raised this question.
>
> Rgds, Ross

From ross at cuhk.edu.hk Sun Mar 21 21:22:47 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Mon, 22 Mar 2010 09:22:47 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <4BA6C473.4090404@gmail.com>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com>
Message-ID: <004e01cac95e$2e375f10$8aa61d30$@edu.hk>

Dear Florent,

Sorry for mis-clicking "reply" instead of "reply-all". Here are the details of my problem:

Input:

1000 multiply aligned DNA sequences.
One of them has a GenBank file:
http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1

The remaining 999 only have genomic sequences.

Objective: to derive the cognate aligned protein sequences.
(There are 4 sets here, as there are 4 overlapping genes.)

Difficulties:
1) circular genome
2) there may be in-dels

Hope now the problem has been clarified,
Ross

From cjfields at illinois.edu Sun Mar 21 23:40:34 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 21 Mar 2010 22:40:34 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <004e01cac95e$2e375f10$8aa61d30$@edu.hk>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk>
Message-ID: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu>

On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote:

> Dear Florent,
>
> Sorry for mis-clicking "reply" instead of "reply-all".
> Here are my problem details:
>
> Input:
>
> 1000 multiple aligned DNA sequences
> One of them has Genbank file
> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1
>
> the remaining 999 ones only have genomic sequences.
>
> Objective: to derive the cognate protein aligned sequences. (here have 4
> sets as there are 4 overlapping genes)
>
> Difficulties:
> 1) circular genome
> 2) there may be in-dels

To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pull sub-alignments out from there, then translate each sequence. There would be no need to re-retrieve sequences which are already present in the alignment, unless there is something not mentioned above that I'm missing.

Re: circular genomes: recent commits to bioperl should allow handling circular genomes with features and subsequence extraction. If not I would consider that a serious bug that needs to be reported.

If you need to grab remote sequences from a larger set of sequences (either locally or remotely) and translate them, you can use Bio::DB::GenBank, which will directly return a Bio::Seq object. Note you would obviously have to reset these per ID based on the start/end/strand:

    my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                                   -seq_start => 100,
                                   -seq_stop  => 200,
                                   -strand    => 1);
    my $seqobj = $gb->get_Seq_by_id($id); # or get_Seq_by_acc($acc)
    # do any preprocessing here...
    my $protein_seqobj = $seqobj->translate;

If you want you could also download the sequences and use one of the various flatfile database classes to work with them (I believe Bio::DB::Fasta extracts subsequences very rapidly). It might be faster. For those regions that cross the origin you may need to pull two sequences and join them somehow, as the sequences likely won't run a join automatically.
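To make the coordinate-conversion idea above concrete, here is a minimal pure-Perl sketch. It is not BioPerl code: the helper names residue_to_column and ungapped_slice and the toy alignment are made up for illustration. It maps a reference residue position to an alignment column, cuts the same columns out of every row, and drops the gaps. (I believe Bio::SimpleAlign offers column_from_residue_number() and slice() for the same purpose, but check the docs for your version.)

```perl
use strict;
use warnings;

# Map a 1-based residue position in the ungapped reference to the
# 1-based alignment column that holds it. Gaps are '-' or '.'.
sub residue_to_column {
    my ($gapped_ref, $residue) = @_;
    my $seen = 0;
    for my $col (1 .. length($gapped_ref)) {
        my $char = substr($gapped_ref, $col - 1, 1);
        next if $char eq '-' || $char eq '.';
        return $col if ++$seen == $residue;
    }
    die "residue $residue is beyond the end of the reference\n";
}

# Cut columns $from..$to out of one aligned row and drop its gaps.
sub ungapped_slice {
    my ($gapped_seq, $from, $to) = @_;
    my $piece = substr($gapped_seq, $from - 1, $to - $from + 1);
    $piece =~ tr/-.//d;
    return $piece;
}

# Toy alignment (hypothetical data): the reference has a CDS at 4..9.
my %aln = (
    ref   => 'AAA-ATGGCC--TAA',
    other => 'AAACATGG-CGGTAA',
);
my $start_col = residue_to_column($aln{ref}, 4);
my $end_col   = residue_to_column($aln{ref}, 9);
for my $id (sort keys %aln) {
    print "$id\t", ungapped_slice($aln{$id}, $start_col, $end_col), "\n";
}
```

Note that the second row comes out one base shorter than the reference here because of a gap inside the region; with in-dels like that, the extracted piece may no longer be a multiple of three, so it is worth checking the length before translating.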
> Hope now the problem has been clarified, Ross

Hope this helps.

chris

From ross at cuhk.edu.hk Mon Mar 22 01:30:06 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Mon, 22 Mar 2010 13:30:06 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu>
Message-ID: <006901cac980$bb60f190$3222d4b0$@edu.hk>

Dear Chris,

It seems that Bioperl is "clever" enough to "rectify" my start and stop by reversing the order.

e.g.
start = 2300
stop = 1600

It will reverse back to 1600 and then 2300.
What else to tell that I'm now working on a circular genome?

From cjfields at illinois.edu Mon Mar 22 08:58:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 07:58:00 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <006901cac980$bb60f190$3222d4b0$@edu.hk>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk>
Message-ID: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>

On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:

> Dear Chris,
>
> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
> reversing the order.
>
> e.g.
> start = 2300
> stop = 1600
>
> It will reverse back to 1600 and then 2300.
> What else to tell that I'm now working on a circular genome?

Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted:

...
     gene            join(2307..3215,1..1623)
                     /gene="P"
...

This would be represented as a Bio::Location::SplitLocation in the feature; it would be joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence.
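To make the join-then-translate step concrete, here is a small sketch in plain Perl of what splicing a join(...) location and translating the result amounts to. It is not BioPerl's implementation: splice_segments() and translate_dna() are hypothetical helpers, the 12-bp "genome" is made-up data, only the plus strand is handled, and the standard genetic code is assumed.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in TCAG order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $i (0 .. 3) {
    for my $j (0 .. 3) {
        for my $k (0 .. 3) {
            $codon_table{ $bases[$i] . $bases[$j] . $bases[$k] }
                = substr($aas, 16 * $i + 4 * $j + $k, 1);
        }
    }
}

# Concatenate 1-based, inclusive [start, end] segments in the order given,
# the way join(2307..3215,1..1623) lists them.
sub splice_segments {
    my ($genome, @segments) = @_;
    return join '',
        map { substr($genome, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @segments;
}

# Translate in frame 1. Unknown codons become 'X'; a trailing partial
# codon is ignored.
sub translate_dna {
    my ($dna) = @_;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $dna; $pos += 3) {
        my $aa = $codon_table{ uc substr($dna, $pos, 3) };
        $protein .= defined $aa ? $aa : 'X';
    }
    return $protein;
}

# Toy circular genome with a gene at join(10..12,1..6).
my $genome = 'GCCTAAGGGATG';
my $cds    = splice_segments($genome, [10, 12], [1, 6]);
print "$cds\n";                   # prints ATGGCCTAA
print translate_dna($cds), "\n";  # prints MA*
```

The segment that runs up to the end of the sequence comes first and the segment starting at position 1 second, which is the order the feature table gives for a gene crossing the origin.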
chris

From ross at cuhk.edu.hk Mon Mar 22 09:17:05 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Mon, 22 Mar 2010 21:17:05 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>
Message-ID: <011701cac9c1$f7b89260$e729b720$@edu.hk>

Chris,

The following code is what I use to retrieve sequences from GenBank. I know that I can use something like:

for my $feature ($seqobj->get_SeqFeatures) {
    if ($feature->primary_tag eq "CDS") {
    ...

to get features, but how should Bio::Location::SplitLocation be used? Do you mean something like:

if ($seq->is_circular()) {
    spliced_seq();
}

? But the genome has several such spliced sequences, so how can I specify which one to retrieve? Thanks for your advice again~

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank;

# Usage: script.pl ACCESSION START STOP
my ($acc, $start, $stop) = @ARGV;

# Fetch only the requested region, on the plus strand, as FASTA
my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                               -seq_start => $start,
                               -seq_stop  => $stop,
                               -strand    => 1);

my $seq = $gb->get_Seq_by_acc($acc);
print "seq is ", $seq->seq, "\n";

# Write the region to ACCESSION.fa
my $seqio_obj = Bio::SeqIO->new(-file => ">$acc.fa", -format => 'fasta');
$seqio_obj->write_seq($seq);
exit;

From jessica.sun at gmail.com Mon Mar 22 14:48:38 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Mon, 22 Mar 2010 14:48:38 -0400
Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener
Message-ID: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com>

Does anyone know how to get the CDS of the corresponding mRNA accession (NM_) using this function?
Bio::SeqFeature::Tools::Unflattener

Many thanks in advance.

--
Jessica Jingping Sun

From cjfields at illinois.edu Mon Mar 22 14:56:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 13:56:30 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq()
Message-ID: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>

I have just noticed that spliced_seq() is borked with Bio::DB::SeqFeature and am thinking about implementing it. Or is similar functionality already implemented elsewhere?

Currently, it is calling entire_seq(), which I plan on avoiding simply to prevent sucking in the entire sequence into memory.
This is currently what happens:

---------------------------

my $it = $store->get_seq_stream(-type => 'mRNA');

my $ct = 0;
while (my $sf = $it->next_seq) {
    my $seq = $sf->spliced_seq; # dies with exception
}

---------------------------

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented by package Bio::DB::SeqFeature.
This is not your fault - author of Bio::DB::SeqFeature should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
STACK: beestore.pl:17
----------------------------------------------------------------

chris

From csembry at ualr.edu Mon Mar 22 15:48:56 2010
From: csembry at ualr.edu (Charles Embry)
Date: Mon, 22 Mar 2010 14:48:56 -0500
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
Message-ID: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>

I want to create a GUI that will use current BioPerl modules (along with some I am writing). It will run on a Windows machine with XP and maybe a laptop with Vista (this is a project I am working on in graduate school for a professor). It will identify promoter types in eukaryotic organisms and also do multiple alignments.

What do you recommend I use to develop this? A Java application? If so, how hard is it to get Java to use Perl and BioPerl modules? Another language? Is there a tool to directly develop a GUI for BioPerl modules that does not use another language?

I will need to tag certain sequences with user-specified colors and such.
Thanks for the help From cjfields at illinois.edu Mon Mar 22 16:20:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 15:20:24 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <011701cac9c1$f7b89260$e729b720$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> <011701cac9c1$f7b89260$e729b720$@edu.hk> Message-ID: On Mar 22, 2010, at 8:17 AM, Ross KK Leung wrote: > Chris, > > The following codes are what I use to retrieve sequences from GenBank. I > know that I can use something like: > > for my $feature ($seqobj->get_SeqFeatures){ > > if ($feature->primary_tag eq "CDS") { > ... > > To get features, but how should > > Bio::Location::SplitLocation > > be used? Do you mean something like: > > If ($seq->is_circular()) { > spliced_seq(); > } You probably won't directly see the SplitLocation itself unless you explicitly request it (it is contained in the sequence feature). Okay, so if you are trying to retrieve the sequence for a specific feature, you can use $sf->seq() (simple subsequence from start to end corrected for strand of feature). However, in the case where the feature crosses the origin it will contain a split location. In this case, you should call $sf->spliced_seq() to retrieve spliced sequence. For convenience, you could call spliced_seq on all sequence features; for simple locations it will just return the ordinary subseq(). So, if one had a generic sequence feature, one could call: $sf->spliced_seq->translate; to get the Bio::Seq object that is the translation of the seq feature region. > ? But the genome indeed has several such spliced sequences then how can I > specify which is to retrieve? 
Thanks for your advice again~ Do you mean alternatively spliced variants? These would be designated as separate features in a GenBank file, so you would check for those. Otherwise you'll have to clarify. If you haven't read them yet I suggest looking over the HOWTOs, specifically ones covering Seq/SeqIO and Feature/Annotation to get an idea of what is possible. chris > #!/usr/bin/perl > > use Bio::SeqIO::genbank; use Bio::DB::GenBank; > > use Bio::DB::RefSeq; > > > > $gb = new Bio::DB::GenBank; > > > > my ($acc, $start, $stop) = @ARGV; > > > > my $gb = Bio::DB::GenBank->new(-format => 'Fasta', > > -seq_start => "$start", > > -seq_stop => "$stop", > > -strand => 1); > > > > $gbout = $acc; > > > > $seq = $gb->get_Seq_by_acc($acc); > > print "seq is ", $seq->seq, "\n"; > > > > $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' ); > > $seqio_obj->write_seq($seq); > > exit; > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Monday, March 22, 2010 8:58 PM > To: Ross KK Leung > Cc: 'Florent Angly'; 'bioperl-l List' > Subject: Re: [Bioperl-l] automation of translation based on alignment > > On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote: > >> Dear Chris, >> >> It seems that Bioperl is "clever" enough to "rectify" my start and stop by >> reversing the order. >> >> e.g. >> start = 2300 >> stop = 1600 >> >> It will reverse back to 1600 and then 2300. >> What else to tell that I'm now working on a circular genome? > > Reverse it where, the alignment or the feature? The svn version of BioPerl, > for alignments, retains strand information (this was a bug that was fixed). > For features, start is always less than end, with directionality determined > by strand. For a circular genome, the feature is split across the origin, > as you have seen in the original sequence you posted: > > ... > gene join(2307..3215,1..1623) > /gene="P" > ... 
> > > This would be represented as a Bio::Location::SplitLocation in the feature; > it would joined based on that order if $seq->is_circular() is true (or at > least it should). In cases like this, the safe bet is to call spliced_seq() > to get the joined sequence in all cases, then call translate() to get the > protein sequence. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Mon Mar 22 16:23:50 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 09:23:50 +1300 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E8829C2@exchsth.agresearch.co.nz> I guess it depends on how complex you need your GUI. If you only need a few a few menus, input fields, buttons, and are getting text or images as output then I'd stick to a simple web interface. You could tart it up a bit with Dojo or YUI libraries so it didn't look like every other webpage. If you need something more complex, you could give TK a go but I'm not sure how good it is and it will look a bit dated. If you're going to write the GUI in Swing, try Inline::Java and Java::Swing - take a look here: http://www.perlmonks.org/?node_id=372197 It may be easier to call Perl from Java so take a look at PLJava http://search.cpan.org/~gmpassos/PLJava-0.04/README.pod I haven't tried a Java GUI for Perl yet - we tend to use web interfaces for scripts that are going to get used by the "public" (i.e. scientists, not developers). We've found Mobyle http://bioweb2.pasteur.fr/projects/mobyle/ to be a nice way to get something up fairly quickly and it keep a consistent look to all our scripts. 
Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809 F +64 3 489 9174
www.agresearch.co.nz

From jason at bioperl.org Mon Mar 22 16:26:15 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 22 Mar 2010 13:26:15 -0700
Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq()
In-Reply-To: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <4BA7D267.6050704@bioperl.org>

Yes it needs a special case I guess - since spliced_seq should work, however ... The only problem is that if both exons and CDS are sub-features you have to be smart enough to not grab both... So I have just relied on specialized dumping scripts for gff3_to_cds for my own needs (i.e. http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl ). But you might also see what the Gbrowse plugin dumpers do.

-jason

Chris Fields wrote, On 3/22/10 11:56 AM:
> I have just noticed that spliced_seq() is borked with
> Bio::DB::SeqFeature and am thinking about implementing it. Or is
> similar functionality already implemented elsewhere?
>
> Currently, it is calling entire_seq(), which I plan on avoiding simply
> to prevent sucking in the entire sequence into memory. This is
> currently what happens:
>
>
> ---------------------------
>
> my $it = $store->get_seq_stream(-type => 'mRNA');
>
> my $ct = 0;
> while (my $sf = $it->next_seq) {
> my $seq = $sf->spliced_seq; # dies with exception
> }
>
> ---------------------------
>
> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented
> by package Bio::DB::SeqFeature.
> This is not your fault - author of Bio::DB::SeqFeature should be blamed!
> > STACK: Error::throw > STACK: > Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368 > STACK: > Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739 > STACK: > Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325 > STACK: > Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458 > STACK: beestore.pl:17 > ---------------------------------------------------------------- > > > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Mon Mar 22 16:33:48 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 22 Mar 2010 13:33:48 -0700 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> Message-ID: <4BA7D42C.5050602@cornell.edu> If I were doing a GUI for BioPerl, I would certainly not try to use Java. You could have a look at how Padre, the Perl IDE (written in Perl is implemented): http://search.cpan.org/~plaven/Padre-0.58/ They use wx, I think. But, a simple web or command-line application would be far easier to write, in any language, if you can find somewhere to host it. Rob Charles Embry wrote: > I want to create a Gui that will use current bioperl modules(along with some > I am writing). It will be on a windows machine that runs XP and maybe a > laptop with Vista.(this is a project i am working on in Graduate school for > a professor). It will be id'ing promoter types in eukaryote organisms and > also do multiple alignments. > > What recommendations do yo suggest to use t develop this? A java > application? If so how hard is it to get Java to use perl and bioperl > modules? Another language? 
Is there a tool to directly develop a GUI for > bioperl modules that does no use another language? > > I will need to tag certain sequences with user specified colors and such. > > > Thanks for the help > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Mon Mar 22 16:33:51 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 22 Mar 2010 13:33:51 -0700 Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener In-Reply-To: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> References: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> Message-ID: <4BA7D42F.2060807@bioperl.org> you can try this but it is a bit of an involved script because it is setup for dealing with multiple genomes in multiple folders so you might want to simplify it. http://github.com/hyphaltip/genome-scripts/blob/master/data_format/genbank_gbk2gff3_unflatten.pl But I thought the perldoc was a good starting point - have you tried it Generally I do: GENBANK -> GFF3 --> genbank_gbk2gff3_unflatten.pl GFF3 -> {CDS,PEP,GENE} --> http://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/gff3_to_cdspep.pl (or equivalent) -jason Jessica Sun wrote, On 3/22/10 11:48 AM: > Does any know how to get CDS of the corresponding mRNA accession(NM_) using > this function? > *Bio::SeqFeature::Tools::Unflattener > > many thanks in advance. 
>
> *
>

From Russell.Smithies at agresearch.co.nz Mon Mar 22 17:10:36 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 23 Mar 2010 10:10:36 +1300
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA7D42C.5050602@cornell.edu>
References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> <4BA7D42C.5050602@cornell.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882A5B@exchsth.agresearch.co.nz>

wx (www.wxwidgets.org) looks very interesting - I didn't realize Cn3D used it. wxPerl (http://wxperl.sourceforge.net) might be worth a look.

--Russell

From clarsen at vecna.com Mon Mar 22 16:51:08 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 22 Mar 2010 16:51:08 -0400
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID:

Ross, Chris F,

I'd like to just comment on this since we are working in parallel on a similar problem. See also the prior thread in the archives on Peter's work in BioPython that I instigated: "Polyproteins, robo slippage, viral mat_peptides"

The dialog below is just to clarify the science that will guide the pseudocode and logic flow that would need to be built out into a BioPerl module. There are plenty of comments on the string mashing required, and it's a harrowing morass, but here are some other thoughts. Three line-item comments first, and then some open general ideas for moving this block of concepts forward:

1.
>> Ross said:
>> I am working on virus sequences and one of the Genbank file is here:
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1

If you are transferring protein annotation, why not use the RefSeq one instead of a GenBank one? In our experience at Virusbrc.org we find that protein annotation transfer is only a valid idea if you have reference sequences for each serotype, or your annotations will have propagation errors from the reference. They just don't align more than 80% of the time, for instance in Dengue, and I assume you want better than that? Yes, this HepB is a decent sequence, but the problem is that HepB has four main serotypes, and yet there is only one RefSeq: NC_003977. My guess is that you will have to define reference peptide seqs for all four serotypes first, and then grab the Taxon_ID from the input unknown file so you align right, i.e. you need to do virus annotation below the species level or it isn't accurate. The number of reference sequences that you use is related to the conservation of your virus family. The script needs to know which one to align to, so we have pulled that from the taxon_ID field of the *.gbk file. You could also use blast and pull the high scorer. Your choice.

2.
>> Ross said:
>>
>> Thanks for your response. While the one with Genbank file can be
>> extracted,
>> those without have to rely on alignment. Scripts certainly can be
>> written to
>> move forward and backward on the multiple alignment but it is an
>> error-prone

We also find that viruses don't have the proteins annotated most of the time. It's just a genome file. Part of the problem is that /host/ proteases sometimes cleave the /viral/ polyproteins in a species-specific way, and since there is only one database entry but many hosts, you can /only/ give the genome code and still be right for everything it /might/ infect.
You can't define the peptides in the file, because they might be different depending on the host. Sick, isn't it? The proteins produced in different animals, based on their proteases' cleavage specificity, help determine whether the virus affects that animal or not. This is my hunch based on experience; no, I cannot give an example.

3. Chris F said:
> To preface this, any reason you're not translating the alignment
> sequences using the above sequence's features as a reference?

A logical place to start. But they are usually not given. In addition to the above reason, data for viral sequences are scarcer, since fewer grad students want to sequence things that maim you or make you hurl if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So the GenBank file isn't really very good as a reference 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? It's not standardized right now. Nobody has this nailed at NCBI or UniProt.

Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go-round, is best. The GenBank coords, when provided, are accurate most of the time. After that, you end up comparing everything and making your choice.

4. Last thoughts:

* We tried BL2Seq to align query to target one at a time, with good reference sequences. It works for exactly what you ask for. But! Only in a few virus families. And it's 1200 lines long, doing error checking; as you say, it's just not easy. Pulling an HSP from a blast report leaves one with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits.
It's not really appropriate to look at the ends of the HSP and say they are right. It requires that extra code. Still, we may open that code to the public after the April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp).

* I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you don't have to assign a reference sequence from scratch every time you do this. Is there one?

* Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named differently in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together.

* If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code.

All ideas. Nothing set in stone. Dialog welcome. Good luck all.

Chris

--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
clarsen at vecna.com

From janine.arloth at googlemail.com Sun Mar 21 10:02:32 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Sun, 21 Mar 2010 15:02:32 +0100
Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs
In-Reply-To:
References:
Message-ID:

Hello all,

while running blast(n) I want to extend the method_args like this:

..
$result = $fac->$blastprogramm_input(
    -query       => $seq,
    -outfile     => "blast.txt",
    -method_args => [ "-num_alignments" => $num_alignments_input,
                      "-evalue"         => $evalue_input,
                      "-word_size"      => $word_size_input,
                      "-?"              => $match_score_input,
                      "-?"              => $gapcosts_input
                      .....
                    ]
);
...

In Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln| gapopen and bln| gapextend

So when I have the input value = "4 4", then Existence: 4 = gapopen and Extension: 4 = gapextend ??
Is there a similar usage for Match/Mismatch scores like value="1,-2" -> match=1 and mismatch=-2?? (I can't find it)

Thanks for help.

From nils.mueller0 at googlemail.com Sun Mar 21 11:17:06 2010
From: nils.mueller0 at googlemail.com (Nils Müller)
Date: Sun, 21 Mar 2010 16:17:06 +0100
Subject: [Bioperl-l] BlastPlus Masker
Message-ID: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>

Dear all,

I am confused about handling maskers in blastplus: I have FASTA sequences and want to run blast with a low-complexity masker like dustmasker:

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name   => 'my_masked_db',
    -db_data   => 'myseqs.fas',
    -masker    => 'dustmasker',
    -mask_data => 'maskseqs.fas',
    -create    => 1);

Is myseqs.fas the same as maskseqs.fas??? I don't want to create a mask file, I only want to run blast with a masked file??

From razi.khaja at gmail.com Mon Mar 22 20:55:42 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Mon, 22 Mar 2010 20:55:42 -0400
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To: <201003191525.o2JFPIr3019479@portal.open-bio.org>
References: <201003191525.o2JFPIr3019479@portal.open-bio.org>
Message-ID:

Hello All,

I've submitted a patch (blast.pm.diff) to bugzilla to enhance Bio/SearchIO/blast.pm to be able to parse the algorithm_reference from BLAST reports. I've also submitted a patch (blast.t.diff) of 26 additional tests to parse the algorithm_reference from many of the BLAST reports in the t/data dir in bioperl-live.

I'd like to get the patch into bioperl-live and would like someone to review the patch and tests.

If the architecture for BLAST report parsing is changing, can someone let me know and I can contribute my efforts there.

Below are links to bugzilla.
Thanks, Razi Khaja ---------- Forwarded message ---------- From: Date: Fri, Mar 19, 2010 at 11:25 AM Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO To: bioperl-guts-l at bioperl.org http://bugzilla.open-bio.org/show_bug.cgi?id=3031 ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST ------- Created an attachment (id=1462) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) patch for t/SearchIO/blast.t to perform 26 additional tests to parse algorithm_reference from many BLAST report files -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l From Russell.Smithies at agresearch.co.nz Mon Mar 22 21:26:30 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 14:26:30 +1300 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO In-Reply-To: References: <201003191525.o2JFPIr3019479@portal.open-bio.org> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz> It's not really a bug if it was never implemented and it probably wasn't implemented because it wasn't needed. Is there actually a use case where you'd programmatically need to access the algorithm reference from Blast results?? I'm sure I can't think of one. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Razi Khaja > Sent: Tuesday, 23 March 2010 1:56 p.m. 
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse > algorithm_reference from BLAST reports using Bio::SearchIO > > Hello All, > > I've submitted a patch (blast.pm.diff) to bugzilla to enhance > Bio/SearchIO/ > blast.pm to be able to parse the algorithm_reference from BLAST reports. > I've also submitted a patch (blast.t.diff) of 26 additional tests to parse > the algorithm_reference from many of the BLAST reports in the t/data dir > in > bioperl-live. > > I'd like to get the patch into bioperl-live and would like someone to > review > the patch and tests. > > If the architecture for BLAST report parsing is changing, can someone let > me > know and I can contribute my efforts there. > > Below are links to bugzilla. > > Thanks, > > Razi Khaja > > ---------- Forwarded message ---------- > From: > Date: Fri, Mar 19, 2010 at 11:25 AM > Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference > from BLAST reports using Bio::SearchIO > To: bioperl-guts-l at bioperl.org > > > http://bugzilla.open-bio.org/show_bug.cgi?id=3031 > > > > > > ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST ------- > Created an attachment (id=1462) > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) > patch for t/SearchIO/blast.t to perform 26 additional tests to parse > algorithm_reference from many BLAST report files > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From ross at cuhk.edu.hk Mon Mar 22 21:32:06 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 23 Mar 2010 09:32:06 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID: <001201caca28$a5e325b0$f1a97110$@edu.hk>

Chris L,

Your comment is insightful; as a non-virologist, I had not known that
before. My strategy is just to extract the genomic fragments encoding
proteins and derive the putative translated sequences. I'll do another round
of MSA for the protein sequences in order to discover any outliers. There
may be truncations, but as long as the protease acts post-translationally,
it's acceptable.

Chris F,

What frustrates me is the very similar data structures and naming of Bio
objects in BioPerl.
If I want to retrieve a genbank file over the internet by: $gb = new Bio::DB::GenBank; $seq = $gb->get_Seq_by_acc('J00522'); And from: http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq so I can't do something like: my $seqobj = $seq->next_seq; for my $feat_object ($seqobj->get_SeqFeatures) { if ($feat_object->primary_tag eq "CDS") { print $feat_object->spliced_seq->seq,"\n"; if ($feat_object->has_tag('gene')) { for my $val ($feat_object->get_tag_values('gene')){ print "gene: ",$val,"\n"; } } } } >From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the methods there mention nothing about how to get the features or inter-convert among the object types. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Larsen Sent: Tuesday, March 23, 2010 4:51 AM To: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] automation of translation based on alignment Ross, Chris F, I'd like to just comment on this since we are working in parallel on a similar problem. See also the prior thread in archives for Peters work in BioPython that I instigated: "Polyproteins, robo slippage, viral mat_peptides" This dialog below is just to clarify the science that will guide the pseudocode and logic flow would be needed to be built out into a BioPerl module. There are plenty of comments on the string mashing required, and its a harrowing morass, but heres some other thoughts. Three line item comments first, and then some open general ideas for moving this block of concepts forward: 1. >> Ross Said: >> I am working on virus sequences and one of the Genbank file is here: >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> > tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> If you are transferring protein annotation, why not use the RefSeq one instead of a GenBank one? 
In our experience at Virusbrc.org we find that protein annotation transfer is only a valid idea if you have reference sequences for each serotype, or your annotations will have propagation errors from the reference. They just dont align more than 80% of the time for instance in Dengue, and I assume you want better then that? Yes this HepB is a decent sequence, but the problem is that HepB has four main serotypes, and yet there is only one RefSeq: NC_003977. My guess is that you will have to define reference peptide seqs for all four serotypes first, and then grab the Taxon_ID from the input unknown file so you align right i.e. you need to do virus annotation below the species level or it isnt accurate. The number of reference sequences that you use is related to the conservation of your virus family. The script needs to know which one to align to, so we have pulled that from the taxon_ID field of the *.gbk file. You could also use blast and pull the high scorer. Your choice. >> Ross said: >> >> Thanks for your response. While the one with Genbank file can be >> extracted, >> those without have to rely on alignment. Scripts certainly can be >> written to >> move forward and backward on the multiple alignment but it is an >> error-prone We find also that viruses dont have the proteins annotated most of the time. It's just genome file. Part of the problem is that /host/ proteases sometimes cleave the /viral/ polyproteins, in a species- specific way, and since there is only one database entry, but many hosts, you can /only/ give the genome code and still be right for everything it /might/ infect. You cant define the peptides in the file, because they might be different, depending on the host. Sick, isnt it? The proteins produced in different animals based on their proteases cleavage specificity help determine whether the virus effects that animal or not. This is my hunch based on experience, no, I cannot give an example. 3. 
Chris F said: > To preface this, any reason you're not translating the alignment > sequences using the above sequence's features as a reference? A logical place to start. But-they are usually not given. In addition to the above reason, the amount of data for viral sequences is rarer since fewer grad students want to sequence things that mame you or make you hurl, if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isnt really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? Its not standardized right now. Nobody has this nailed at NCBI or UniProt. Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords when provided, are accurate most of the time. AFter that, you end up comparing everything and making your choice. 4. Last thoughts: * We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, its 1200 lines long, doing error checking; as you say its just not easy. Pulling an HSP from a blast report leaves one with with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits. Its not really appropriate to look at the ends of the hsp and say they are right. It requires that extra code. Still, we may open that code to the public after April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp). 
* I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you dont have to assign a reference sequence from scratch, every time you do this. Is there one? *Sometimes the features on viruses are named differently: / mat_peptide, /sig_peptide; sometimes they are named different in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together. * If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code. All ideas. Nothing set in stone. Dialog welcome. Good luck all. Chris -- Christopher Larsen, Ph.D. Sr. Scientist / Grants Manager Vecna Technologies 6404 Ivy Lane #500 Greenbelt, MD 20770 Phone: (240) 965-4525 Fax: (240) 547-6133 clarsen at vecna.com _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From razi.khaja at gmail.com Mon Mar 22 22:08:45 2010 From: razi.khaja at gmail.com (Razi Khaja) Date: Mon, 22 Mar 2010 22:08:45 -0400 Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz> References: <201003191525.o2JFPIr3019479@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz> Message-ID: Nope, not a bug, It's an enhancement though ;) I implemented it so that I could do a loss less transformation from BLAST report format to other formats. You could consider that a use case. I also have additional patches that parse other details from BLAST reports that aren't currently implemented in Bio::SearchIO, and I'd like to contribute those as well, however, I thought I'd start with this one. 
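[Illustration] As a rough picture of what "parsing the algorithm reference" involves, here is a minimal sketch in plain Perl. It is not the submitted patch and not how Bio::SearchIO works internally; it assumes only that a text report carries a "Reference:" paragraph terminated by a blank line. The sample report text and the helper name algorithm_reference() are made up for this example.

```perl
use strict;
use warnings;

# Made-up report header for illustration; not taken from a real BLAST run.
my $report = <<'END';
BLASTN x.y.z

Reference: A. Author and B. Author (2000), "Placeholder
title", Placeholder Journal 1:1-2.

Database: example_db
END

# Hypothetical helper: return the "Reference:" paragraph (everything from
# that keyword up to the next blank line) with line wraps collapsed to
# single spaces, or nothing if the report has no such paragraph.
sub algorithm_reference {
    my ($text) = @_;
    return unless $text =~ /^Reference:\s*(.*?)\n\s*\n/ms;
    (my $ref = $1) =~ s/\s*\n\s*/ /g;
    return $ref;
}

print algorithm_reference($report), "\n";
```

Keeping the wrapped paragraph as one string is what would make a round trip to another format possible without losing the citation text.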
Razi On Mon, Mar 22, 2010 at 9:26 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > It's not really a bug if it was never implemented and it probably wasn't > implemented because it wasn't needed. > Is there actually a use case where you'd programmatically need to access > the algorithm reference from Blast results?? > I'm sure I can't think of one. > > > --Russell > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Razi Khaja > > Sent: Tuesday, 23 March 2010 1:56 p.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse > > algorithm_reference from BLAST reports using Bio::SearchIO > > > > Hello All, > > > > I've submitted a patch (blast.pm.diff) to bugzilla to enhance > > Bio/SearchIO/ > > blast.pm to be able to parse the algorithm_reference from BLAST reports. > > I've also submitted a patch (blast.t.diff) of 26 additional tests to > parse > > the algorithm_reference from many of the BLAST reports in the t/data dir > > in > > bioperl-live. > > > > I'd like to get the patch into bioperl-live and would like someone to > > review > > the patch and tests. > > > > If the architecture for BLAST report parsing is changing, can someone let > > me > > know and I can contribute my efforts there. > > > > Below are links to bugzilla. 
> > > > Thanks, > > > > Razi Khaja > > > > ---------- Forwarded message ---------- > > From: > > Date: Fri, Mar 19, 2010 at 11:25 AM > > Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference > > from BLAST reports using Bio::SearchIO > > To: bioperl-guts-l at bioperl.org > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=3031 > > > > > > > > > > > > ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST > ------- > > Created an attachment (id=1462) > > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) > > patch for t/SearchIO/blast.t to perform 26 additional tests to parse > > algorithm_reference from many BLAST report files > > > > > > -- > > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > > ------- You are receiving this mail because: ------- > > You are the assignee for the bug, or are watching the assignee. > > _______________________________________________ > > Bioperl-guts-l mailing list > > Bioperl-guts-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From maj at fortinbras.us Mon Mar 22 22:51:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 22 Mar 2010 22:51:24 -0400 Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs In-Reply-To: References: Message-ID: Hi Janine-- The options you need are "reward" (for the match score) and "penalty" (for the mismatch score). Add them to -method_args. cheers MAJ ----- Original Message ----- From: "Janine Arloth" To: Sent: Sunday, March 21, 2010 10:02 AM Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs > Hello all, > > while running blast(n) I want to extend to method_arg like: > .. > $result = $fac->$blastprogramm_input( > -query => $seq, > -outfile => "blast.txt", > -method_args => [ > "-num_alignments" => $num_alignments_input, > "-evalue" => $evalue_input, > "-word_size" => $word_size_input, > "-?" => $match_score_input, > "-?" => $gapcosts_input > ..... > ] > ); > ... > > in Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln| gapopen and bln| > gapextend > so when I have the input value = "4 4" , then Existence: 4 = gapaopen and > Extension: 4 = gapextend ?? > > Is there a similar usage for Match/Mismatch scores like value="1,-2" -> > match=1 and mismatch=-2?? > (I can't find it) > > Thanks for help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Mon Mar 22 22:59:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Mon, 22 Mar 2010 22:59:56 -0400
Subject: [Bioperl-l] BlastPlus Masker
In-Reply-To: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>
References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>
Message-ID:

Hi Nils,

You don't have to specify a mask_data file; the factory should make it for
you. Try simply

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'my_masked_db',
    -db_data => 'myseqs.fas',
    -masker  => 'dustmasker',
    -create  => 1);

-mask_data is there so that pre-made masks can be applied separately, or so
you can name the file that is produced and preserve it; this is an "advanced
feature", I suppose.

MAJ

----- Original Message -----
From: "Nils Müller"
To:
Sent: Sunday, March 21, 2010 11:17 AM
Subject: [Bioperl-l] BlastPlus Masker

> Dear all,
>
> I am confused in handeling with maskers in blastplus:
> I have fasta seq. and want to run blast with a low complexity masker like
> dustmasker:
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
> -db_name => 'my_masked_db',
> -db_data => 'myseqs.fas',
> -masker => 'dustmasker',
> -mask_data => 'maskseqs.fas',
> -create => 1);
>
> Is myseqs.fas the same as maskseqs.fas??? I don't want to create a
> maskfile , I only will run blast with a masked file??
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Tue Mar 23 00:43:03 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 23:43:03 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <001201caca28$a5e325b0$f1a97110$@edu.hk>
References: <001201caca28$a5e325b0$f1a97110$@edu.hk>
Message-ID: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu>

On Mar 22, 2010, at 8:32 PM, Ross KK Leung wrote:

> Chris L,
>
> Your comment is insightful and as a non-virologist, I have never known that
> before.
> My strategy is just to extract the genomic fragments encoding
> proteins and derive the putative translated sequences. I'll do another round
> of MSA for the protein sequences in order to discover any outliners. There
> may be truncations, but as long as the protease acts post-translationally,
> it's acceptable.
>
> Chris F,
>
> What makes me feel frustrated is the verisimilar data structures and naming
> of Bio objects in Bioperl. If I want to retrieve a genbank file over the
> internet by:
>
> $gb = new Bio::DB::GenBank;
>
> $seq = $gb->get_Seq_by_acc('J00522');
>
> And from:
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html
>
> it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq
> so I can't do something like:

A Bio::Seq::RichSeq is-a Bio::Seq (it inherits from Bio::Seq and augments
it). I believe 'Bio::Seq' in the documents refers to the fact that one can
retrieve FASTA sequence data (which returns a simple Bio::Seq) or richer
records, such as a GenBank record (which returns a Bio::Seq::RichSeq). In
this case, it should probably read 'Bio::SeqI' to be more accurate
(implements the Bio::SeqI interface). Beyond the addition of a few accessor
methods they are essentially the same, in that they both have annotation,
features, etc.

> my $seqobj = $seq->next_seq;

You're either not reading the demos or the relevant documentation correctly,
or there is a spot in the docs that needs to be fixed (if the latter, please
let us know). Bio::Seq does not implement a next_seq() method, but sequence
*streams* (à la Bio::SeqIO) do. You are probably thinking of something like
this:

my $streamobj = $gb->get_Stream_by_acc(@ids);

while (my $seqobj = $streamobj->next_seq) {
    # do stuff here
}

The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO
stream). '$streamobj->next_seq()' iterates through them one at a time.
Unless you call a stream in some way, that code will not work.
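[Illustration] The stream-versus-sequence distinction can be shown without BioPerl or a network connection. The sketch below is only a toy: ToySeq and ToyStream are hypothetical stand-ins invented for this example, not BioPerl classes. Only the method name next_seq mirrors the stream interface described above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT BioPerl classes: ToySeq plays the role of a
# sequence object, ToyStream the role of a stream that hands sequences out.
package ToySeq;
sub new { my ($class, %args) = @_; return bless {%args}, $class; }
sub display_id { my ($self) = @_; return $self->{id}; }

package ToyStream;
sub new {
    my ($class, @seqs) = @_;
    return bless { queue => [@seqs] }, $class;
}
# Return the next sequence, or undef once the stream is exhausted.
sub next_seq { my ($self) = @_; return shift @{ $self->{queue} }; }

package main;

my $stream = ToyStream->new(
    ToySeq->new(id => 'J00522'),
    ToySeq->new(id => 'DQ089804'),
);

my @seen;
while (my $seqobj = $stream->next_seq) {
    # $seqobj is a sequence: it has display_id() but no next_seq().
    die "sequence objects should not iterate\n" if $seqobj->can('next_seq');
    push @seen, $seqobj->display_id;
}
print join(',', @seen), "\n";
```

Calling next_seq on one of the ToySeq objects would fail in the same way as the quoted `$seq->next_seq` line, because only the stream knows how to iterate.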
If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > for my $feat_object ($seqobj->get_SeqFeatures) { > > if ($feat_object->primary_tag eq "CDS") { > > print $feat_object->spliced_seq->seq,"\n"; > > if ($feat_object->has_tag('gene')) { > > for my $val ($feat_object->get_tag_values('gene')){ > > print "gene: ",$val,"\n"; > > } > > } > > } > > } > >> From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the > methods there mention nothing about how to get the features or inter-convert > among the object types. Just a note, but make sure to read up-to-date documentation, particularly if you are using the latest code. Here is the pdoc for the latest release: http://doc.bioperl.org/releases/bioperl-1.6.1/Bio/Seq/RichSeqI.html This is definitely worth pointing out, and is a good example where we can improve our documentation; I've added some links to classes that would explain more. In the meantime, the best thing to do in this case is to point you to the online documentation (which I think I did already, but just in case): http://www.bioperl.org/wiki/HOWTO:Beginners http://www.bioperl.org/wiki/HOWTO:Feature-Annotation chris From cjfields at illinois.edu Tue Mar 23 00:53:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 23:53:48 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <42E3E2EC-2226-44CE-995E-01B425B161F1@illinois.edu> On Mar 22, 2010, at 3:51 PM, Chris Larsen wrote: > ... > 3. > Chris F said: > >> To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? > > > A logical place to start. But-they are usually not given. 
In addition to the above reason, the amount of data for viral sequences is rarer since fewer grad students want to sequence things that mame you or make you hurl, if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isnt really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? Its not standardized right now. Nobody has this nailed at NCBI or UniProt. > > Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords when provided, are accurate most of the time. AFter that, you end up comparing everything and making your choice. Yes, in this case nothing will be a immediate, perfect solution. It will take some additional work. > 4. > Last thoughts: > > * We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, its 1200 lines long, doing error checking; as you say its just not easy. Pulling an HSP from a blast report leaves one with with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits. Its not really appropriate to look at the ends of the hsp and say they are right. It requires that extra code. Still, we may open that code to the public after April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp). Might be nice to see what you've done, whenever that is ready. 
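[Illustration] The end-trimming concern quoted above can be made concrete with a small check. This is only a sketch under stated assumptions: uncovered_ends() is a hypothetical helper, coordinates are taken to be 1-based and inclusive on the reference peptide, and the numbers are invented. It is not the code described in the quoted message.

```perl
use strict;
use warnings;

# Hypothetical helper: given the length of a reference peptide and the
# 1-based, inclusive coordinates of an HSP on that reference, return how
# many reference residues the HSP leaves uncovered at each end.
sub uncovered_ends {
    my ($ref_len, $hsp_start, $hsp_end) = @_;
    die "bad coordinates\n"
        if $hsp_start < 1 || $hsp_end > $ref_len || $hsp_start > $hsp_end;
    return ($hsp_start - 1, $ref_len - $hsp_end);
}

# Invented example: a 250-residue reference whose HSP spans 4..247.
my ($n_missing, $c_missing) = uncovered_ends(250, 4, 247);
print "uncovered at N-terminus: $n_missing, at C-terminus: $c_missing\n";
```

Non-zero values at either end flag the cases where the HSP boundary cannot simply be taken as the cleavage site and the flanking residues still need to be compared by other means.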
> * I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you dont have to assign a reference sequence from scratch, every time you do this. Is there one? If you mean pulling out sets of sequences from a larger alignment or slices of alignments, there should be methods within Bio::SimpleAlign to do this, yes. > *Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named different in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together. > > * If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code. > > All ideas. Nothing set in stone. Dialog welcome. > > Good luck all. > > Chris > > > -- > > Christopher Larsen, Ph.D. > Sr. Scientist / Grants Manager > Vecna Technologies > 6404 Ivy Lane #500 > Greenbelt, MD 20770 > Phone: (240) 965-4525 > Fax: (240) 547-6133 > > clarsen at vecna.com Very nice summary of the problems in the field. thanks! chris From ross at cuhk.edu.hk Tue Mar 23 01:20:56 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 23 Mar 2010 13:20:56 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> Message-ID: <001501caca48$9db03f70$d910be50$@edu.hk> my $streamobj = $gb->get_Stream_by_acc(@ids); while (my $seqobj = $stream->next_seq) { # do stuff here } The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$stream->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work. 
If you call the methods below directly on the *sequence* object ($seqobj,
retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it
should work.

> for my $feat_object ($seqobj->get_SeqFeatures) {
>
> if ($feat_object->primary_tag eq "CDS") {
>
> print $feat_object->spliced_seq->seq,"\n";
>
> if ($feat_object->has_tag('gene')) {
>
> for my $val ($feat_object->get_tag_values('gene')){
>
> print "gene: ",$val,"\n";
>
> }
>
> }
>
> }
>
> }

Chris, in fact I did have this code before, but it leads back to the old
problem that the spliced sequence is incorrect. Please try the following
code with "DQ089804" as the argument. If you check the printed result
against:

http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum

you'll discover, for example, that the sequence of gene P is derived from
splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...),
rather than 2307-3215 and 1-1623.

use strict;
use warnings;
use Bio::SeqIO::genbank;
use Bio::DB::GenBank;
use Bio::SeqIO;

# accession to fetch, e.g. DQ089804
my ($acc) = @ARGV;

my $gb        = Bio::DB::GenBank->new;
my $streamobj = $gb->get_Stream_by_acc($acc);
my $seqobj    = $streamobj->next_seq;

# print the spliced sequence and gene name(s) of every CDS feature
for my $feat_object ($seqobj->get_SeqFeatures) {
    if ($feat_object->primary_tag eq "CDS") {
        print $feat_object->spliced_seq->seq, "\n";
        if ($feat_object->has_tag('gene')) {
            for my $val ($feat_object->get_tag_values('gene')) {
                print "gene: ", $val, "\n";
            }
        }
    }
}
exit;

From e.osimo at gmail.com Tue Mar 23 05:42:25 2010
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Tue, 23 Mar 2010 10:42:25 +0100
Subject: [Bioperl-l] Xyplot and multiple lines plots
Message-ID: <2ac05d0f1003230242o31779c30sffa42d8e99539b09@mail.gmail.com>

Hello everyone,

I would like to plot two data sets in Bio::Graphics using Xyplot, one
superimposed on the other. I need to compare the differential expression of
an Affy expression probeset in different subjects.
I successfully managed to plot one at a time with: $panel->add_track( $feat, -graph_type=>'linepoints', -glyph =>'xyplot', -fgcolor=>'gray', -max_score => 1, -min_score => 0, ); But I cannot understand how to plot two lines independently in the same track. Thank you in advance, Emanuele From biopython at maubp.freeserve.co.uk Tue Mar 23 06:58:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Mar 2010 10:58:58 +0000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <320fb6e01003230358w11ae8e5fxef140652c5cc9f1b@mail.gmail.com> On Mon, Mar 22, 2010 at 8:51 PM, Chris Larsen wrote: > Ross, Chris F, > > I'd like to just comment on this since we are working in parallel on a > similar problem. See also the prior thread in archives for Peters work in > BioPython that I instigated: "Polyproteins, robo slippage, viral > mat_peptides" Minor typo - the old thread title was about ribo (ribosomal) slippage: http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031479.html http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031484.html etc Triggered in part by my discussion with Chris Larsen (off list) about the biological problem of getting the mature peptide sequences from GenBank files, Biopython 1.53 ended up with a new method for extracting the sequence region described by a (complex) location, e.g. from parsing in an EMBL/GenBank file. There were several threads about this, this is perhaps the best summary if anyone is interested: http://lists.open-bio.org/pipermail/biopython/2009-November/005813.html http://lists.open-bio.org/pipermail/biopython/2009-December/005889.html > This dialog below is just to clarify the science that will guide the > pseudocode and logic flow would be needed to be built out into a BioPerl > module. There are plenty of comments on the string mashing required, and its > a harrowing morass, but heres some other thoughts. 
Three line item comments > first, and then some open general ideas for moving this block of concepts > forward: Thanks for the update - it sounds like you've got a better understanding of the complexities now, any some of the reasons why representing things like mature peptides is tricky (the issue of different cleavage patterns in different hosts is interesting). Peter From cjfields at illinois.edu Tue Mar 23 08:46:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 23 Mar 2010 07:46:37 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <001501caca48$9db03f70$d910be50$@edu.hk> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> <001501caca48$9db03f70$d910be50$@edu.hk> Message-ID: <3A94734B-CD43-4674-8DB6-82EA1C6530E4@illinois.edu> On Mar 23, 2010, at 12:20 AM, Ross KK Leung wrote: > my $streamobj = $gb->get_Stream_by_acc(@ids); > > while (my $seqobj = $stream->next_seq) { > # do stuff here > } > > The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO > stream). '$stream->next_seq()' iterates through them one at a time. Unless > you call a stream in some way, that code will not work. If you call the > methods below directly on the *sequence* object ($seqobj, retrieved from > get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > >> for my $feat_object ($seqobj->get_SeqFeatures) { >> >> if ($feat_object->primary_tag eq "CDS") { >> >> print $feat_object->spliced_seq->seq,"\n"; >> >> if ($feat_object->has_tag('gene')) { >> >> for my $val ($feat_object->get_tag_values('gene')){ >> >> print "gene: ",$val,"\n"; >> >> } >> >> } >> >> } >> >> } > > Chris, in fact I did have this code before, but then it goes back to the old > problem that the spliced sequence is incorrect. Please try using the > following codes with "DQ089804" as the argument. 
If you check the printed > result with: > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSyst > em2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum > > you'll discover, for example, the sequence of gene P, is derived from > splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), > rather than 2307-3215 and 1-1623. Okay, as I mentioned before, then that would be a bug. The best way to handle this is to file it in Bugzilla: http://bugzilla.open-bio.org/ I can likely look at it today, whether it's filed or not, just need to make some time. Please file the bug report, though, just in case I can't get to it right away. BTW, we had some discussion about circular genome support recently at the GMOD conference, and some code was added that was supposed to address the issues raised. I'm guessing we'll need to add more tests just to be sure. chris ... From Jean-Marc.Frigerio at pierroton.inra.fr Tue Mar 23 12:29:11 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Tue, 23 Mar 2010 17:29:11 +0100 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: References: Message-ID: <4BA8EC57.7070802@pierroton.inra.fr> > I want to create a Gui that will use current bioperl modules(along with some > I am writing). It will be on a windows machine that runs XP and maybe a > laptop with Vista.(this is a project i am working on in Graduate school for > a professor). It will be id'ing promoter types in eukaryote organisms and > also do multiple alignments. > > What recommendations do yo suggest to use t develop this? A java > application? If so how hard is it to get Java to use perl and bioperl > modules? Another language? Is there a tool to directly develop a GUI for > bioperl modules that does no use another language? > > I will need to tag certain sequences with user specified colors and such. 
> > > Thanks for the help Hi, Have also a look to Gtk-perl and perl-qt Best From Leighton.Pritchard at scri.ac.uk Tue Mar 23 12:35:42 2010 From: Leighton.Pritchard at scri.ac.uk (Leighton Pritchard) Date: Tue, 23 Mar 2010 16:35:42 -0000 Subject: [Bioperl-l] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? Message-ID: Hi, I can't seem to find any discussion of this on the mailing list archives (if anyone has a link, I'll happily follow it), so I was wondering what the rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live mapping CDS features to gene_component_region. For example, if I use the script on the E.coli sequence/annotation NC_000913.gbk, the gene: gene 190..255 /gene="thrL" /locus_tag="b0001" /note="synonyms: ECK0001, JW4367" /db_xref="EcoGene:EG11277" /db_xref="ECOCYC:EG11277" /db_xref="GeneID:944742" CDS 190..255 /gene="thrL" /locus_tag="b0001" /function="leader; Amino acid biosynthesis: Threonine" /function="1.5.1.8 metabolism; building block biosynthesis; amino acids; threonine" /note="GO_process: threonine biosynthetic process [goid 0009088]" /codon_start=1 /transl_table=11 /product="thr operon leader peptide" /protein_id="NP_414542.1" /db_xref="ASAP:ABE-0000006" /db_xref="UniProtKB/Swiss-Prot:P0AD86" /db_xref="GI:16127995" /db_xref="EcoGene:EG11277" /db_xref="ECOCYC:EG11277" /db_xref="GeneID:944742" /translation="MKRISTTITTTITITTGNGAG" Is mapped to NC_000913 GenBank region 190 255 . + . ID=GenBank:region:NC_000913:190:255 NC_000913 GenBank exon 190 255 . + . ID=GenBank:exon:NC_000913:190:255 NC_000913 GenBank gene 190 255 . + . ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank gene_component_region 190 255 . + . 
Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995 ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG NGAG I understand the region-exon-gene part of the model, but not the gene_component_region, which appears to be a catch-all. I would have assumed that the CDS is better mapped to a polypeptide, as described in the CHADO documentation: http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model There is no difference in script output whether --CDS or --noCDS is used. Cheers, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. 
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________

From djibrilo at yahoo.fr Tue Mar 23 13:38:25 2010
From: djibrilo at yahoo.fr (djibrilo)
Date: Tue, 23 Mar 2010 10:38:25 -0700 (PDT)
Subject: [Bioperl-l] Re : G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA8EC57.7070802@pierroton.inra.fr>
References: <4BA8EC57.7070802@pierroton.inra.fr>
Message-ID: <344176.4737.qm@web23001.mail.ird.yahoo.com>

Hi,

Also have a look at Perl/Tk.

Best regards

________________________________
From: Jean-Marc Frigerio INRA
To: bioperl-l at lists.open-bio.org
Sent: Tue, 23 March 2010, 17:29:11
Subject: Re: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista

> I want to create a Gui that will use current bioperl modules(along with some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista.(this is a project i am working on in Graduate school for
> a professor). It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do yo suggest to use t develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does no use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help

Hi,
Have also a look to Gtk-perl and perl-qt
Best

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Tue Mar 23 14:18:46 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 23 Mar 2010 14:18:46 -0400
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To:
References:
Message-ID: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>

Hi Leighton,

I wonder if this is a change stemming from Nathan's work on this script. Nathan?

Scott

On Tue, Mar 23, 2010 at 12:35 PM, Leighton Pritchard wrote:
> Hi,
>
> I can't seem to find any discussion of this on the mailing list archives (if
> anyone has a link, I'll happily follow it), so I was wondering what the
> rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live
> mapping CDS features to gene_component_region.
>
> For example, if I use the script on the E.coli sequence/annotation
> NC_000913.gbk, the gene:
>
>      gene            190..255
>                      /gene="thrL"
>                      /locus_tag="b0001"
>                      /note="synonyms: ECK0001, JW4367"
>                      /db_xref="EcoGene:EG11277"
>                      /db_xref="ECOCYC:EG11277"
>                      /db_xref="GeneID:944742"
>      CDS             190..255
>                      /gene="thrL"
>                      /locus_tag="b0001"
>                      /function="leader; Amino acid biosynthesis: Threonine"
>                      /function="1.5.1.8 metabolism; building block
>                      biosynthesis; amino acids; threonine"
>                      /note="GO_process: threonine biosynthetic process [goid
>                      0009088]"
>                      /codon_start=1
>                      /transl_table=11
>                      /product="thr operon leader peptide"
>                      /protein_id="NP_414542.1"
>                      /db_xref="ASAP:ABE-0000006"
>                      /db_xref="UniProtKB/Swiss-Prot:P0AD86"
>                      /db_xref="GI:16127995"
>                      /db_xref="EcoGene:EG11277"
>                      /db_xref="ECOCYC:EG11277"
>                      /db_xref="GeneID:944742"
>                      /translation="MKRISTTITTTITITTGNGAG"
>
> Is mapped to
>
> NC_000913       GenBank region  190     255     .       +       .
> ID=GenBank:region:NC_000913:190:255
> NC_000913       GenBank exon    190     255     .       +       .
> ID=GenBank:exon:NC_000913:190:255
> NC_000913       GenBank gene    190     255     .       +       .
> ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms:
> ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
> NC_000913       GenBank gene_component_region   190     255     .       +
> .
> Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995
> ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine
> biosynthetic process [goid
> 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino
> acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block
> biosynthesis%3B amino acids%3B
> threonine;gene=thrL;locus_tag=b0001;product=thr operon leader
> peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG
> NGAG
>
> I understand the region-exon-gene part of the model, but not the
> gene_component_region, which appears to be a catch-all. I would have
> assumed that the CDS is better mapped to a polypeptide, as described in the
> CHADO documentation:
>
> http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model
>
> There is no difference in script output whether --CDS or --noCDS is used.
>
> Cheers,
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk
> http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Gmod-schema mailing list > Gmod-schema at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-schema > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Tue Mar 23 14:15:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 23 Mar 2010 14:15:38 -0400 Subject: [Bioperl-l] BlastPlus Masker In-Reply-To: <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com> References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com> Message-ID: Specifying 'dustmasker' for a nucleotide database is roughly the same as "filter : low complexity regions" and "mask : lookup table only", I believe. (There is also a facility for creating masks based on lowercase residues in a mask data fasta file; the blast+ utility is 'convert2blastmask'. You can run this with the SABlastPlus factory. I'm not very familiar with it, but you should be able to take the output file from this utility and feed it in to a new factory as the '-mask_data' to get what you want. (If anyone has done this, a brief step-by-step would be appreciated.)) cheers MAJ ----- Original Message ----- From: Nils M?ller To: Mark A. Jensen Sent: Tuesday, March 23, 2010 12:42 PM Subject: Re: [Bioperl-l] BlastPlus Masker Many thanks, is it the same as showed on the ncbi blast page (Filtering and Masking- filter: Low complexity regions and mask:Mask for lookup table only or Mask lower case letters)? 2010/3/23 Mark A. 
Jensen Hi Nils, You don't have to specify a mask_data file; the factory should make it for you; try simply $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -create => 1); -mask_data is there so that pre-made masks can be applied separately, or so you can name the file that is produced and preserve it; this is an "advanced feature", I suppose-- MAJ ----- Original Message ----- From: "Nils M?ller" To: Sent: Sunday, March 21, 2010 11:17 AM Subject: [Bioperl-l] BlastPlus Masker Dear all, I am confused in handeling with maskers in blastplus: I have fasta seq. and want to run blast with a low complexity masker like dustmasker: $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -mask_data => 'maskseqs.fas', -create => 1); Is myseqs.fas the same as maskseqs.fas??? I don't want to create a maskfile , I only will run blast with a masked file?? _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From lpritc at scri.ac.uk Wed Mar 24 08:05:08 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 24 Mar 2010 12:05:08 +0000 Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com> Message-ID: Hi, I'm surprised that this issue hasn't come up already, as the change to the gene model is quite significant. For comparison, this is what the old bp_genbank2gff3.pl script would produce with --CDS: NC_000913 GenBank gene 190 255 . + . ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank mRNA 190 255 . + . ID=thrL.t01;Parent=thrL NC_000913 GenBank CDS 190 255 . + . 
ID=thrL.p01;Parent=thrL.t01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0A D86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process : threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21 NC_000913 GenBank exon 190 255 . + . Parent=thrL.t01 and with --noCDS: NC_000913 GenBank gene 190 255 . + . ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank mRNA 190 255 . + . ID=thrL.t01;Parent=thrL NC_000913 GenBank polypeptide 190 255 . + . ID=thrL.p01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995, EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Derives_from=thrL.t01;Note=GO_p rocess: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21 NC_000913 GenBank exon 190 255 . + . Parent=thrL.t01 The new script produces this identical output with both --CDS and --noCDS: NC_000913 GenBank region 190 255 . + . ID=GenBank:region:NC_000913:190:255 NC_000913 GenBank exon 190 255 . + . ID=GenBank:exon:NC_000913:190:255 NC_000913 GenBank gene 190 255 . + . ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank gene_component_region 190 255 . + . 
Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995 ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG NGAG So, although the new script improves the parent-child relationships by identifying parents on the locus_tag field (guaranteed to be unique), rather than gene name (not guaranteed to be unique), the GFF3 gene model has apparently changed from canonical: gene <- mRNA <- {polypeptide/CDS, exon} to this: region ; exon ; gene <- gene_component_region So I guess I don't understand the region-exon-gene part of the new model, after all. This new model doesn't appear to be Sequence Ontology-compatible any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon is no longer considered part_of the transcript. In fact, there's not a transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get SO-compliant GFF3 (http://www.sequenceontology.org/resources/faq.html#convert), this might be an issue requiring a prompt fix or reversion. For now, due to the downstream problems this model causes with GBROWSE and ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the script to use the locus_tag field rather than the gene field for the feature ID. Cheers, L. On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote: > Hi Leighton, > > I wonder if this is a change stemming from Nathan's work on this > script. Nathan? 
>
> Scott

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From cjfields at illinois.edu Wed Mar 24 09:06:01 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 08:06:01 -0500
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: References: Message-ID: <3A556027-C8DB-4683-8376-A42AC8796156@illinois.edu> On Mar 24, 2010, at 7:05 AM, Leighton Pritchard wrote: > Hi, > > I'm surprised that this issue hasn't come up already, as the change to the > gene model is quite significant. For comparison, this is what the old > bp_genbank2gff3.pl script would produce with --CDS: > ... > So, although the new script improves the parent-child relationships by > identifying parents on the locus_tag field (guaranteed to be unique), rather > than gene name (not guaranteed to be unique), the GFF3 gene model has > apparently changed from canonical: > > gene <- mRNA <- {polypeptide/CDS, exon} > > to this: > > region ; exon ; gene <- gene_component_region > > So I guess I don't understand the region-exon-gene part of the new model, > after all. This new model doesn't appear to be Sequence Ontology-compatible > any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon > is no longer considered part_of the transcript. In fact, there's not a > transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get > SO-compliant GFF3 > (http://www.sequenceontology.org/resources/faq.html#convert), this might be > an issue requiring a prompt fix or reversion. I agree. I think this commit needs more code review to understand the reasoning behind it, though it will be a little trickier than a simple reversion (I think there have been additional unrelated commits since then). Nathan, was this the intent, or is this a bug? I would agree with Leighton that it's the latter. chris > For now, due to the downstream problems this model causes with GBROWSE and > ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the > script to use the locus_tag field rather than the gene field for the feature > ID. > > Cheers, > > L. 
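For anyone wanting to test converter output against the canonical gene <- mRNA <- {polypeptide/CDS, exon} model discussed in this thread, here is a minimal sketch in plain Perl (no BioPerl required). The subroutine name check_gene_model and the trimmed-down attribute columns are illustrative assumptions and are not part of bp_genbank2gff3.pl; only the first Parent (or Derives_from) value of each feature is examined, so this is a rough sanity check rather than a full GFF3 validator.

```perl
use strict;
use warnings;

# Return a list of complaints for features that break the
# gene <- mRNA <- {CDS/polypeptide, exon} hierarchy.
sub check_gene_model {
    my @lines = @_;
    my (%type_of, %has_mrna, @features);

    for my $line (@lines) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;   # comments, blank lines
        chomp $line;
        my @col = split /\t/, $line;
        next unless @col == 9;                        # not a feature line

        # Column 9 holds key=value pairs separated by semicolons.
        my %attr = map { my ($k, $v) = split /=/, $_, 2; ($k, $v) }
                   grep { /=/ } split /;/, $col[8];

        # Parent for part_of links, Derives_from for polypeptides.
        my $link = defined $attr{Parent} ? $attr{Parent} : $attr{Derives_from};
        ($link) = split /,/, $link if defined $link;  # first parent only

        $type_of{ $attr{ID} } = $col[2] if defined $attr{ID};
        $has_mrna{$link} = 1 if $col[2] eq 'mRNA' && defined $link;
        push @features, {
            type => $col[2],
            id   => $attr{ID},
            link => $link,
            span => "$col[3]..$col[4]",
        };
    }

    my @problems;
    for my $f (@features) {
        if ($f->{type} eq 'gene') {
            my $id = defined $f->{id} ? $f->{id} : '(no ID)';
            push @problems, "gene $id has no mRNA child"
                unless defined $f->{id} && $has_mrna{ $f->{id} };
        }
        elsif ($f->{type} =~ /^(?:exon|CDS|polypeptide)$/) {
            my $what = "$f->{type} $f->{span}";
            if (!defined $f->{link}) {
                push @problems, "$what has no Parent or Derives_from";
            }
            else {
                my $ptype = $type_of{ $f->{link} };
                push @problems, "$what is attached to '$f->{link}', which is not an mRNA"
                    unless defined $ptype && $ptype eq 'mRNA';
            }
        }
    }
    return @problems;
}

# Records modelled on the thrL example, attributes trimmed for brevity.
my @old_style = map { join "\t", @$_ } (
    [qw(NC_000913 GenBank gene 190 255 . + .), 'ID=thrL;locus_tag=b0001'],
    [qw(NC_000913 GenBank mRNA 190 255 . + .), 'ID=thrL.t01;Parent=thrL'],
    [qw(NC_000913 GenBank CDS 190 255 . + .),  'ID=thrL.p01;Parent=thrL.t01'],
    [qw(NC_000913 GenBank exon 190 255 . + .), 'Parent=thrL.t01'],
);
my @new_style = map { join "\t", @$_ } (
    [qw(NC_000913 GenBank region 190 255 . + .), 'ID=GenBank:region:NC_000913:190:255'],
    [qw(NC_000913 GenBank exon 190 255 . + .),   'ID=GenBank:exon:NC_000913:190:255'],
    [qw(NC_000913 GenBank gene 190 255 . + .),   'ID=b0001;locus_tag=b0001'],
    [qw(NC_000913 GenBank gene_component_region 190 255 . + .), 'Parent=b0001'],
);

my @old_problems = check_gene_model(@old_style);
my @new_problems = check_gene_model(@new_style);
print "old output: ", scalar(@old_problems), " problem(s)\n";
print "new output: $_\n" for @new_problems;
```

On the trimmed old-style records the check finds nothing to report, while the new-style records are flagged twice: the exon has no parent, and gene b0001 has no mRNA child.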
From pmiguel at purdue.edu Wed Mar 24 09:49:55 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Wed, 24 Mar 2010 09:49:55 -0400
Subject: [Bioperl-l] How to set "complexity" param using EUtilities
Message-ID: <4BAA1883.3010203@purdue.edu>

Just a little FYI that might help someone using GenBank efetch (here with BioPerl EUtilities) who, contrary to expectation, gets back a bunch of accessions (or GIs) when only a single accession is wanted. The trick is to change the "complexity" parameter from its apparent default of "0" to "1".

Actually, this parameter might be worth adding to the HOWTO because it makes the EUtilities efetch behave like a normal Entrez search, which, to me, would be the expected behavior.

Details below.

Some accessions/GIs appear to be embedded in bundles of related sequences. Here is an example:

gi|158819346|gb|EU011641.1|

If I search Entrez Nucleotide

http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar

with either "158819346" (the GI) or "EU011641.1", I get a single record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal RNA gene, partial sequence". This is what I want.

If I use the following code derived from the EUtilities HOWTO:

use Bio::DB::EUtilities;
use Bio::SeqIO;

my @ids;
my $id = 'gb|EU011641.1|';
push @ids, $id;
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);
my $file = "test.gb";
$factory->get_Response(-file => $file);

I get a bundle of accessions: EU011584-EU011663. Same result using the GI number instead.

From reading:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam

it looks like I would get what I want were I to set the efetch "complexity" parameter to "1". But how do I set that parameter?

Below is how I did it. Not the most efficient path, but it did not take that long to traverse...

The HOWTO does not mention it.
I usually look to the Deobfuscator:

http://bioperl.org/cgi-bin/deob_interface.cgi

to help me when I want some documentation for a method. But this is a parameter, not a class. What class sets this parameter? Not sure. So I googled:

complexity eutil site:bioperl.org

The top-ranked hit is actually to the deprecated 1.5.2 version of EUtilities. But the 2nd hit is to the (auto-generated?) email posted to the bioperl-guts email list by Chris Fields upon his commit of the new EUtilities overhaul:

http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html

Judging by that message, the obvious way of setting the parameter ought to work. And indeed:

use Bio::DB::EUtilities;
use Bio::SeqIO;

my @ids;
my $id = 'gb|EU011641.1|';
push @ids, $id;
my $factory = Bio::DB::EUtilities->new(
    -eutil      => 'efetch',
    -db         => 'nucleotide',
    -rettype    => 'genbank',
    -complexity => 1,
    -id         => \@ids);
my $file = "test.gb";
$factory->get_Response(-file => $file);

works!

It is also a good idea to add the -email parameter so that GenBank might chastise me via email, rather than banning my IP, if I try to send more than 100 requests in a series outside of the acceptable 9PM-5AM Eastern Time hours.

Phillip

From peter at maubp.freeserve.co.uk Wed Mar 24 10:08:26 2010
From: peter at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:08:26 +0000
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To:
References:
Message-ID: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>

Hi,

This is probably of interest to all the Bio* projects offering access to the NCBI Entrez utilities. See forwarded message below.

I *think* the new guidelines basically say that the email & tool parameters are optional BUT if your IP address ever gets banned for excessive use you then have to register an email & tool combination.

Regarding the email address, the NCBI say to use the email of the developer (not the end user).
However, they do not distinguish between the developers of a library (like us), and the developers of an application or script using a library (who may also be the end user). Currently we (Biopython) and I think BioPerl ask developers using our libraries to populate the email address themselves. I *think* this is still the right action. Peter ---------- Forwarded message ---------- From: Date: Wed, Mar 24, 2010 at 1:53 PM Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy To: NLM/NCBI List utilities-announce New E-utility documentation now on the NCBI Bookshelf The Entrez Programming Utilities (E-Utilities) Help documentation has been added to the NCBI Bookshelf, and so?is now fully integrated with the Entrez search and retrieval system as a part of the Bookshelf database. This help document has been divided into chapters for better organization and includes several new sample Perl scripts. At present this book covers the standard URL interface for the E-utilties; material about the SOAP interface will be added soon and is still available at the same URL: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html. Revised E-utility usage policy In December, 2009 NCBI announced a change to the usage policy for the E-utilities that would require all requests to contain non-null values for both the?&email and &tool parameters. After several consultations with our users and developers, we have decided to revise this policy change, and the revised?policy is described in detail at the following link: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen Please let us know if you have any questions or concerns about this policy change. Thank you, The E-Utilities Team NIH/NLM/NCBI eutilities at ncbi.nlm.nih.gov. 
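
For readers who want to see what these parameters look like on the wire, here is a minimal sketch in plain Perl that assembles an efetch URL carrying the email and tool parameters from the announcement together with the complexity setting from Phillip's message. The helper names efetch_url and uri_escape_simple, the example address and the tool name are illustrative assumptions; the base URL is assumed to be the standard efetch endpoint on the eutils host mentioned above, and the rettype value is simply copied from Phillip's code.

```perl
use strict;
use warnings;

# Assumed efetch endpoint (not quoted anywhere in this thread).
my $BASE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Percent-encode everything except unreserved URI characters.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build the query string from name/value pairs, sorted by name
# so that the resulting URL is reproducible.
sub efetch_url {
    my (%param) = @_;
    my $query = join '&',
        map { $_ . '=' . uri_escape_simple($param{$_}) }
        sort keys %param;
    return "$BASE?$query";
}

my $url = efetch_url(
    db         => 'nucleotide',
    id         => 'EU011641.1',
    rettype    => 'genbank',
    complexity => 1,                  # single record rather than the whole bundle
    email      => 'you@example.org',  # hypothetical developer address
    tool       => 'my_script',        # hypothetical tool name
);
print "$url\n";
```

With BioPerl the same values would normally be passed as -email, -tool and -complexity arguments to Bio::DB::EUtilities->new, as in Phillip's example, rather than by building the URL by hand.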
_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From joseguillin at hotmail.com Tue Mar 23 13:30:44 2010
From: joseguillin at hotmail.com (Jose .)
Date: Tue, 23 Mar 2010 17:30:44 +0000
Subject: [Bioperl-l] Phylo/Phylip/Consense
Message-ID:

Hello,

I'm trying to use Phylo/Phylip/Consense, but I get the following message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: SeqBoot did not create files correctly (/var/folders/+s/+srMEKriEiWM+Q7Qleiti++++TI/-Tmp-/v3no1dYNqE/outfile)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::_run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:389
STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:339
STACK: INDELVOLUTION_5.1consensus.pl:492
-----------------------------------------------------------

My code is a modification of the code I found at
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Phylo/Phylip/Consense.pm

use Bio::AlignIO;
use Bio::Tools::Run::Phylo::Phylip::Consense;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;
use Bio::Tools::Run::Phylo::Phylip::ProtDist;
use Bio::Tools::Run::Phylo::Phylip::Neighbor;
use Bio::Tools::Run::Phylo::Phylip::DrawTree;

my $aio = Bio::AlignIO->new(-file => 'yeah.clustalw', -format => 'clustalw');
my $aln = $aio->next_aln;
my ($aln_safe, $ref_name) = $aln->set_displayname_safe();

# next use seqboot to generate multiple alignments
my @params = ('datatype' => 'SEQUENCE', 'replicates' => 10);
my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params);
my $aln_ref=
$seqboot_factory->run($aln); #my $aln_ref= $seqboot_factory->run($aln_safe); #next build distance matrices and construct trees my $pd_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(); my $ne_factory = Bio::Tools::Run::Phylo::Phylip::Neighbor->new(); my @tree; foreach my $a (@{$aln_ref}){ my $mat = $pd_factory->create_distance_matrix($a); push @tree, $ne_factory->create_tree($mat); } #now use consense to get a final tree my $con_factory = Bio::Tools::Run::Phylo::Phylip::Consense->new(); #you may set outgroup either by the number representing the order in #which species are entered or by the name of the species $con_factory->outgroup(1); my $tree = $con_factory->run(\@tree); # Restore original sequence names, after ALL phylip runs: my @nodes = $tree->get_nodes(); foreach my $nd (@nodes){ $nd->id($ref_name->{$nd->id_output}) if $nd->is_Leaf; } #now draw the tree my $draw_factory = Bio::Tools::Run::Phylo::Phylip::DrawTree->new(); my $image_filename = $draw_factory->draw_tree($tree); And my yeah.clustalw file is OK: CLUSTAL W(1.81) multiple sequence alignment A/1-474 G---CGGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C B/1-452 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C C/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C D/1-476 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGA-------------TC-CTA----C E/1-439 G---CCGTGGGAGA------TGAGGAACCTGAGGTAGTCC-----TATATCTCTAGCGGC F/1-434 G---CCGTGGGAGA------TGAGGAACCCGAGG---TCC-----TATATCTCTAGCGGC G/1-462 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTA---------------TCTAGCGGC H/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTAGTCC--------ATCTCTAGCGGC I/1-462 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C J/1-447 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C K/1-448 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCGA----C L/1-431 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCTA----C M/1-432 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCGGTATTATATCTCTA----C N/1-422 
G---CC------GAGCAACATGAGGAAC---AGGTTGTC---TATTATATCTCTA----C O/1-441 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C P/1-446 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C * * ** ******** *** * * * A/1-474 GCATCGCGGCCCTTGTC-GCTCCCACCCA--CCATC---GACGGC-ACA--TTTGCTTGT B/1-452 GCAT----------GTC-GCTC---------CCATCGCTGACGGC-ACATCTTTG---GT C/1-466 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTGT D/1-476 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTG- E/1-439 GCA-CGCGGCCCT--TC-GCTT---CCCATCCCATCGCTGACGGC-ACATCT----TTGT F/1-434 GCA-CGCGGCCCT--TCCGCTT---CCCATCCCATCGCTTACGGC-ACATCTTTGCTTGT G/1-462 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGTC-ACATCTTTG-TTGT H/1-466 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGGC-ACATCTTTGCTTGT I/1-462 GCAT-CCGGCCCTTGTC-GCTCCCA------CCATCGCTGACGGC-ACAT--TTGCTTGT J/1-447 GC------GCCCTTGTC-GCTCCCA---------TCGCTGACGGC-ACATCTTTGCTTGT K/1-448 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT L/1-431 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT M/1-432 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGAC-GC-ACATC-TTGCTTGT N/1-422 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGACAGCAACATCTTTGCTTGT O/1-441 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT P/1-446 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT ** ** *** ** ** * * A/1-474 ACGAGATTGCTTTCACACTA-TCTATTGTTCGGGTACCGAGAGTCGGCGGTGAATACATC B/1-452 ACGAGATTGCGTTCACACTA-TCCATTGTTCGGGTACCGAGAGTC-GCGGTGAATACATC C/1-466 ACGTG--TGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATG D/1-476 -CGTGATTGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATC E/1-439 ACGTGATTGCG----CA--AATCCATTGT---GGTACCGAGAGTCGGCGGTGAACT---C F/1-434 ACGTGATTGCG----CA--AATCCATTGTTCGGGTACCGAGAGTCG-----GAACT---C G/1-462 ACGT----GCGTTCCCA--AATCCATTGTTCGGGTACCGAGAGTCGGCGGTGAACT---C H/1-466 ACGT-------TTCCCA--AATCCAT---TCGGGTACCGAGAGTCGGCGGTGAACT---C I/1-462 ACGTGATTGC--TCCCACCAATCCAT-GTTCGGGTACCGAGAGTCGGCGGTGAACTCATC J/1-447 
ACGTGATTGC--TCCCACTAATCCAT-GTTCGGGTACCGA-----------GAACTCATC K/1-448 ACGTGATTGC--TCCCACTAATCCACTG--------CCGAGAGTCGGCGGTG---CCATC L/1-431 ACGTGATTGC--TC------ATC--TTGTTCGGGTACCGA-----GGCGGTGAACTCATC M/1-432 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC N/1-422 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC O/1-441 ACGTGATTGC--TCCCACTAATCCAT--TTCGGGTACCGAGAGTCGGCGGTGAACTCATC P/1-446 ACGTGATTGC--TCCCACTAATCCATTG--CGGGTACCGAGAGTCGGCGGTGAACTCATC ** ** * * * A/1-474 TCCGGAG--AAGTGTGCTAACCACAGTG--GAACGTATAATGCTGATCCCGCTTGTTT-- B/1-452 TCCGGAG--AA--GTGCTAACCACAGTG--GAACGTATAATGCTGAT-CCGCTT-TTT-- C/1-466 TCCGGAG--AAGTGTGCTAACCACAGTG--GAAAGTATAATGCT-----------TTT-- D/1-476 TCCGGAG--AAGTGT---AACCACAGTG--GAAAGTATAATGCTGATCCCGCTTGTTT-- E/1-439 TCCGG-----AGTGTGG-AACCACAGTG--GAACGTATAATGC--ATCTCGCGTGTTT-- F/1-434 TCCGG-----AGTGTGGTAACCACAGTG--GAACGTATAATGC--ATCCCGCGTGTTT-- G/1-462 TCCGGAG--AAGTGTGGTAACCACAGTG--GAACGTATAATGC--ATC--GCGTGTTT-- H/1-466 TCCGGAG--AAGTGTGGTAACCACAGT----AACGTAT-ATGC--ATCCCGCGTGTTT-- I/1-462 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC J/1-447 TCGGGAG--AAGTGTGCTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC K/1-448 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC L/1-431 TCCGGAG--AAGTGTG----CCACAGTGCCGAAC--ATAATGC--ATC--GCGTGTTTGC M/1-432 TCCGGAGGAAAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC N/1-422 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC O/1-441 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC P/1-446 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC ** ** * ** ******* ** ** A/1-474 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT B/1-452 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----AGGTAC-CCGAGGACAACTAACAGT C/1-466 --CTGTACCTAAAGTTCACCGGGTAGAGCCTCGTCATGTAC-CCG-----AACTAACAGT D/1-476 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT E/1-439 --CCGTACCTAAAGTT------GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT F/1-434 
--CCGTACCTAAAGTT-----GGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT G/1-462 --CCGTACCTAAAGTTCTCC--GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT H/1-466 --CCGTACCTAAAGTTCACCGGGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT I/1-462 GATCGTACCTAAAGTTCACC--------CC-----A-------CGAG----ACTAACAG- J/1-447 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAG- K/1-448 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT L/1-431 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT M/1-432 GACCGTACCT-----T-ACCG-GTAGCGCC-----ATGTACACCGAGC---ACTA----T N/1-422 GACCGTACCT-----TCACCG-GTAGTGCC-----ATGTACACCGAGC---ACTAACAGT O/1-441 GACCGTACCT-----TCACCG-GTAGCGCC-----ATGTACACCGAGC---ACTAACAGT P/1-446 GACCGTACCT-----TCACCG-GTAGCGCC-----ATG---ACCGAGC---ACTAACAGT ****** * ** * ** **** A/1-474 GATCCTCA----TCTAAGCGCCGCTTCAGGAC----ATTGCCACGTCTACATCG------ B/1-452 GATCCTCA----TTTAAGCGCCGCTTCAGGCC----ATTGCCACGTCTACATCG------ C/1-466 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCAT D/1-476 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCCT E/1-439 GATCCTCA----TTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT F/1-434 GATCCTCA----TTTAAGCGCCGC---AGGACTTTTATTGCCACGTCTACATCGTTTCAT G/1-462 GATCCTCACAATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT H/1-466 GATCCTC-CCATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT I/1-462 ---CCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATC---TCAT J/1-447 ---CCTCA----T-TAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATCGTTTCAT K/1-448 GATCCTCA----TTTAAGCGCCGCTGCAGG-------TTGCCACGTCTACATCGTTTCAT L/1-431 GATCCTCA----TTTAAGCGCCGCTGC----------TTGCCACGTCTACATCGTTTCAT M/1-432 GATC--CA----TTTAAGCGCCGCTGCAGG--------TGCCACGTCTACATCGTTTCAT N/1-422 GATC--CA----TTTAAGCGCCGCTGCAGGAA----ATTGCCACGTCTACATCGTTTCAT O/1-441 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTA---- P/1-446 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTTTCA- * * * ********** * ** ********* A/1-474 -CATCTACTCTT--AGGCAGCAACAATTTGTCTCGTTCGACGTACAG--CGAAC--ATGT B/1-452 
-CATCTACTCTT--AGGCAGCAACAATT-GTCTCGTTCGATGTACAG--CGAAC--ATGT C/1-466 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- D/1-476 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- E/1-439 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAACATATGT F/1-434 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAA------T G/1-462 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT H/1-466 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT I/1-462 TCACCTACTTTT--AGGGAGCAACAATCTGTATCC---G--GTACAGACCGAACATAGGA J/1-447 TC----AC-TTT--AGGGAGCAACAATCTGTATCC---G--GTAC---CCGAACATAGGT K/1-448 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT L/1-431 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT M/1-432 TCATTTACT-----AGGCAGCAACAATCTGTATC--------TATAGACCGAGCATATGT N/1-422 TCATCTACT-----AGGCAGCAACAATCTGTATCC---G--GTATAGACCAAGCATATGT O/1-441 ------ACTTTT--AGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT P/1-446 ------ACTTTTTGAGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT ** ** ***** ** ** * * A/1-474 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- B/1-452 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- C/1-466 TGGGCGTAAGACCAAAGTTGAT--CGTTGG---TATTCGACCCAATCAAGTCGCG----- D/1-476 TGGGCGTAAGACCAAAGTTGAT--CGTGGGCCTTATTCGACCCAATCAATTCGCG---A- E/1-439 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- F/1-434 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- G/1-462 T--GCGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- H/1-466 T--GAGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- I/1-462 TGTGCTTAAGACCAAAGTT--TATCGTT------ATATGACCCAAGCAATTCGCGGATA- J/1-447 -GTGCTTAAGACCAAAGTT--TATCGTT------ACATGACCCAAGCAATTCGCGGATA- K/1-448 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGCGGATAC L/1-431 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGC-GATA- M/1-432 TGGGCGTAAGACCAAAGTT--TATCGTTGGCTTT----GACCCAAGCAAT--GC------ N/1-422 
TGGGGGTAAGACCAA-------------GGCTTT----GACCCAAGCAAT--GC------ O/1-441 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA- P/1-446 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA- ******** **** *** ** A/1-474 -A--AT-------TTATTCATTATTACCACTGATCAC--CCTG-CACCTATGCGGTTT-- B/1-452 -A--ATCCCGTCTTTATTC------ACCACTGATCAC--CCTG-CAC--ATGCGGTTT-- C/1-466 -----TCCCGTCTTTATTCATTATAACCACTGATCAC--CCTGGCAC--ATGCGCTTT-- D/1-476 -A--ATCCCGTCTTTATTCATTATAACCACTGATCACGACCTGGCAC--ATGCGCTAT-- E/1-439 -A---TCCCGTCTTTATT--TTTTTAGC-CTGATCTC--CCTGGCAC--AT--------- F/1-434 -A---TCCCGTCTTTATTCATTTTTACC-CTGATCTC--C---------AT--------- G/1-462 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA-- H/1-466 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA-- I/1-462 -AGGATCCTGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAT--ATGCGGTTTGC J/1-447 -AGGATCCCGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC K/1-448 AAGGATCCCGT-----GTCATTATAACC-CTGATCAC--ACTGGCAC--ATGCGGTTTGC L/1-431 -AGGATCCCGT-----TTCATTAT--CC-CTG-TCAC--CCTGGCAC--ATGCGGTTTGC M/1-432 --GGATCCCGT--TTATTCATTAAAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC N/1-422 --GGATCCCGT--TTATTCATTATAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC O/1-441 -ATGATCCCGT--TTATTCATTATAACC-CT---CAC--CCTGGCAC--ATGCGGTTTGC P/1-446 -AGGATCCCGT--TTATTCATTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC * * * ** * ** A/1-474 ACTTCGATGCC B/1-452 ACTTCGATGCC C/1-466 ACTTCGATG-- D/1-476 ACTTCGATGCC E/1-439 -CTTCGATGCC F/1-434 -CTTCGATGCC G/1-462 ACTTCGATG-- H/1-466 ACTTCGATGCC I/1-462 --TTCGATGCC J/1-447 ACTTCGATGCC K/1-448 ACTTCGATG-- L/1-431 ACTTCGATG-- M/1-432 ACTTCGATGCC N/1-422 ACTTCGATGCC O/1-441 ACTTCG-TGCC P/1-446 ACTTCG-TGCC **** ** I have tried different things, but I don't really know why do I have this problem... Does anyone knows? Thank you very much in advance, Jose G. _________________________________________________________________ ?Quieres saber qu? PC eres? ?Desc?brelo aqu?! 
http://www.quepceres.com/

From cjfields at illinois.edu Wed Mar 24 10:37:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Mar 2010 09:37:13 -0500
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com>
Message-ID: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>

On Mar 24, 2010, at 9:08 AM, Peter wrote:

> Hi,
>
> This is probably of interest to all the Bio* projects offering access
> to the NCBI Entrez utilities. See forwarded message below.
>
> I *think* the new guidelines basically say that the email & tool parameters are
> optional BUT if your IP address ever gets banned for excessive use you then
> have to register an email & tool combination.
>
> Regarding the email address, the NCBI say to use the email of the developer
> (not the end user). However, they do not distinguish between the developers
> of a library (like us), and the developers of an application or script using a
> library (who may also be the end user).
>
> Currently we (Biopython) and I think BioPerl ask developers using our libraries
> to populate the email address themselves. I *think* this is still the
> right action.
>
> Peter

Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I think with the SOAP-based ones as well). We're providing a specific set of tools for users to write their own end applications. I can try contacting them regarding this to get an official response to clarify this somewhat.

Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a default, but always leave the email blank and issue a warning if it isn't set. We could just as easily leave both blank and issue warnings for both.
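The default-plus-warning behaviour described in this message could look roughly like the following. This is a minimal sketch in plain Perl, not the actual Bio::DB::EUtilities code; the helper name `eutil_identity` and the warning text are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): decide which 'tool' and 'email'
# values to send along with an E-utilities request.
sub eutil_identity {
    my (%args) = @_;

    # Default the tool name, but let the caller override it.
    my %id = (tool => 'BioPerl');
    $id{tool} = $args{tool} if defined $args{tool} && length $args{tool};

    # Never invent an email address; warn when none was supplied.
    if (defined $args{email} && length $args{email}) {
        $id{email} = $args{email};
    }
    else {
        warn "No email set; NCBI asks for one so they can contact you about problems\n";
    }
    return %id;
}

my %id = eutil_identity(email => 'you@example.org');
print join(', ', map { "$_=$id{$_}" } sort keys %id), "\n";
```

Leaving the email unset rather than filling in a toolkit-wide address keeps the script author, not the library, as the point of contact, which is the position taken in this thread.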
chris > ---------- Forwarded message ---------- > From: > Date: Wed, Mar 24, 2010 at 1:53 PM > Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy > To: NLM/NCBI List utilities-announce > > > New E-utility documentation now on the NCBI Bookshelf > > The Entrez Programming Utilities (E-Utilities) Help documentation has > been added to the NCBI Bookshelf, and so is now fully integrated with > the Entrez search and retrieval system as a part of the Bookshelf > database. This help document has been divided into chapters for better > organization and includes several new sample Perl scripts. At present > this book covers the standard URL interface for the E-utilties; > material about the SOAP interface will be added soon and is still > available at the same URL: > http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html. > > > > Revised E-utility usage policy > > In December, 2009 NCBI announced a change to the usage policy for the > E-utilities that would require all requests to contain non-null values > for both the &email and &tool parameters. After several consultations > with our users and developers, we have decided to revise this policy > change, and the revised policy is described in detail at the following > link: > > http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen > > Please let us know if you have any questions or concerns about this > policy change. > > > > Thank you, > > The E-Utilities Team > > NIH/NLM/NCBI > > eutilities at ncbi.nlm.nih.gov. 
>
>
>
> _______________________________________________
> Utilities-announce mailing list
> http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From biopython at maubp.freeserve.co.uk Wed Mar 24 10:51:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Mar 2010 14:51:46 +0000
Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy
In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu>
Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com>

On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote:
>
> On Mar 24, 2010, at 9:08 AM, Peter wrote:
>
>> Hi,
>>
>> This is probably of interest to all the Bio* projects offering access
>> to the NCBI Entrez utilities. See forwarded message below.
>>
>> I *think* the new guidelines basically say that the email & tool parameters are
>> optional BUT if your IP address ever gets banned for excessive use you then
>> have to register an email & tool combination.
>>
>> Regarding the email address, the NCBI say to use the email of the developer
>> (not the end user). However, they do not distinguish between the developers
>> of a library (like us), and the developers of an application or script using a
>> library (who may also be the end user).
>>
>> Currently we (Biopython) and I think BioPerl ask developers using our libraries
>> to populate the email address themselves. I *think* this is still the
>> right action.
>>
>> Peter
>
>
> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I
> think with the SOAP-based ones as well). We're providing a specific set of
> tools for user to write up their own applications end applications. I can try
> contacting them regarding this to get an official response to clarify this
> somewhat.

Please give the NCBI an email - you can CC me too if you like.

> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a
> default, but always leave the email blank and issue a warning if it isn't
> set. We could just as easily leave both blank and issue warnings for both.

We currently leave out the email and set the tool parameter to "Biopython" by default but this can be overridden. Currently leaving out the email does cause Biopython to give a warning.

Peter

From pmiguel at purdue.edu Wed Mar 24 10:59:50 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Wed, 24 Mar 2010 10:59:50 -0400
Subject: [Bioperl-l] How to set "complexity" param using EUtilities
In-Reply-To: <4BAA1883.3010203@purdue.edu>
References: <4BAA1883.3010203@purdue.edu>
Message-ID: <4BAA28E6.4090907@purdue.edu>

Sorry, I got that backwards. The default is "0", apparently. But to get entrez-like performance you want "complexity" to be set to "1".

Phillip

Phillip San Miguel wrote:
> Just a little FYI that might help someone using GenBank efetch (here
> with bioperl EUtilities) and, contrary to expectation, retrieving a
> bunch of accessions (or GIs) when that single accession is what is
> wanted. The trick is to change the "complexity" parameter from its
> apparent default of "1" to "0".
>
> Actually, this parameter might be worth adding to the HOWTO because it
> causes the EUtilities efetch to perform similar to a normal Entrez
> search. Which, to me, would be the expected behavior.
>
> Details below.
>
> Some accessions/GIs appear to be embedded in bundles of related
> sequences.
Here is an example: > > gi|158819346|gb|EU011641.1| > > > If I search Entrez Nucleotide > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar > > with the either "158819346" (the GI) or "EU011641.1", I get a single > record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal > RNA gene, partial sequence". This what I want. > > If I use the following code derived from the Eutils HOWTO: > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $id ='gb|EU011641.1|'; > push @ids ,$id; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -id => \@ids); > > my $file = "test.gb"; > $factory->get_Response(-file => $file); > > I get a bundle of accessions: EU011584-EU011663. > Same result using the GI number instead. > > From reading: > > http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam > > > it looks like I would get what I want were I to set the efetch > "complexity" parameter to "1". > > But how do I set that parameter? Below is how I did it. Not the most > efficient path, but did not take that long to traverse... > > The HowTo does not mention it. I usually look to the the Deobfuscator: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > to help me when I want some documentation for a method. But this is a > parameter not a class. What class sets this parameter? Not sure. So I > googled: > > complexity eutil site:bioperl.org > > The top ranked hit is actually to the deprecated 1.5.2 version of > EUtilities. But the 2nd hit is to the (auto generatated?) email posted > to the bioperl-guts email list by Chris Fields upon his commit of the > new EUtilities overhaul: > > http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html > > > From here it looks like the obvious way to set the parameter would be > possible. 
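Stripped of the BioPerl wrapper, the setting discussed in this message is just one more key=value pair on the efetch request. The following is a minimal sketch under two assumptions: that the base URL below (the E-utilities address in use around the time of this thread) is the right endpoint, and that Bio::DB::EUtilities passes -complexity through under the same name. The helpers `uri_escape_min` and `efetch_url` are made up for illustration and are not BioPerl functions.

```perl
use strict;
use warnings;

# Assumed efetch endpoint; adjust to whatever NCBI currently documents.
my $EFETCH = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: percent-encode a string for use in a query string.
sub uri_escape_min {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: assemble an efetch URL from key/value pairs,
# with keys sorted so the result is reproducible.
sub efetch_url {
    my (%param) = @_;
    my $query = join '&',
        map { uri_escape_min($_) . '=' . uri_escape_min($param{$_}) }
        sort keys %param;
    return "$EFETCH?$query";
}

# Same values as the BioPerl call in this message, plus complexity.
print efetch_url(
    db         => 'nucleotide',
    id         => 'EU011641.1',
    rettype    => 'genbank',
    complexity => 1,
), "\n";
```

Per the correction at the top of this message, 1 is the value that gives the Entrez-like single record, while the apparent default of 0 returns the whole bundle.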
And indeed: > > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $id ='gb|EU011641.1|'; > push @ids ,$id; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -complexity =>1, > -id => \@ids); > > my $file = "test.gb"; > $factory->get_Response(-file => $file); > > works! > > Also a good idea to add -email parameter so that Genbank might > chastise me via email, rather than banning my IP, if I try to send > more than 100 requests in a series outside of the acceptable 9PM-5AM > Eastern Time hours. > > Phillip > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at drycafe.net Wed Mar 24 11:27:37 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 24 Mar 2010 11:27:37 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> On Mar 24, 2010, at 10:51 AM, Peter wrote: > Please give the NCBI an email - you can CC me too if you like. Can't this be the developers' mailing list (or lists, the appropriate one for each toolkit)? We can even whitelist all NCBI sender addresses so they can easily email us if there are issues. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed Mar 24 11:44:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 10:44:21 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <338BDDD8-2A66-4086-BFB7-35EC8F8F0D66@illinois.edu> On Mar 24, 2010, at 9:51 AM, Peter wrote: > On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: >> >> On Mar 24, 2010, at 9:08 AM, Peter wrote: >> >>> Hi, >>> >>> This is probably of interest to all the Bio* projects offering access >>> to the NCBI Entrez utilities. See forwarded message below. >>> >>> I *think* the new guidelines basically say that the email & tool parameters are >>> optional BUT if your IP address ever gets banned for excessive use you then >>> have to register an email & tool combination. >>> >>> Regarding the email address, the NCBI say to use the email of the developer >>> (not the end user). However, they do not distinguish between the developers >>> of a library (like us), and the developers of an application or script using a >>> library (who may also be the end user). >>> >>> Currently we (Biopython) and I think BioPerl ask developers using our libraries >>> to populate the email address themselves. I *think* this is still the >>> right action. >>> >>> Peter >> >> >> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I >> think with the SOAP-based ones as well). We're providing a specific set of >> tools for user to write up their own applications end applications. 
I can try >> contacting them regarding this to get an official response to clarify this >> somewhat. > > Please give the NCBI an email - you can CC me too if you like. Sent, have cc'd the open-bio list. Don't want to cross-post this too much, so I think we should move the discussion there. >> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a >> default, but always leave the email blank and issue a warning if it isn't >> set. We could just as easily leave both blank and issue warnings for both. > > We currently leave out the email and set the tool parameter to "Biopython" > by default but this can be overridden. Currently leaving out the email does > cause Biopython to give a warning. > > Peter We follow the same, then (down to the warning). This is mentioned in my post to them, I'll wait to see what they say. My concern is the wording of the new rules. Each tool and email must be registered with them if an IP is blocked. Does this mean each tool is assigned one specific email? And an IP that is blocked can register it to be allowed back into the fold? With that in mind, should we register each of our toolkits with them? Probably not a bad thing (it might help us as devs to get an idea of use), but then if one user abuses the rules will their actions affect all toolkit users? Is this all done on a per-IP basis, per-toolkit basis, etc? Unfortunately, at least to me, none of this is made very clear, so I'm hoping there is some clarification from their end. chris From maj at fortinbras.us Wed Mar 24 12:37:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 24 Mar 2010 12:37:56 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Wed Mar 24 13:43:48 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 10:43:48 -0700 Subject: [Bioperl-l] Codeml runtime error Message-ID: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Hi Bioperl gurus, I'm trying to run PAML v4.3b on a series of orthologs, specifically by implementing codeml to detect signatures of positive selection between all orthologous pairs. In some of my files, I notice that I'm getting an EOF error that causes codeml to break. 
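The spacing problem tracked down in the rest of this message (PAML wanting at least two spaces between the sequence identifier and the sequence) can be patched on the alignment file itself. The following is a minimal sketch of such a workaround, not a fix to the BioPerl PHYLIP writer or the Codeml wrapper; it assumes identifiers contain no spaces and that the first N non-blank lines after the header carry the names. The helper name `pad_phylip_names` is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a single space after each sequence name
# to two spaces in PHYLIP-formatted text.
sub pad_phylip_names {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;    # -1 keeps a trailing newline intact

    # The header line holds the number of sequences and the alignment length.
    my ($ntax) = $lines[0] =~ /^\s*(\d+)\s+\d+/
        or die "First line does not look like a PHYLIP header\n";

    my $seen = 0;
    for my $i (1 .. $#lines) {
        last if $seen >= $ntax;            # later interleaved blocks have no names
        next unless $lines[$i] =~ /\S/;    # skip blank lines
        $lines[$i] =~ s/^(\S+) (?=\S)/$1  /;   # exactly one space becomes two
        $seen++;
    }
    return join "\n", @lines;
}

# Toy input, made up for illustration.
my $phylip = " 2 8\nseqA ACGTACGT\nseqB  ACGTACGA\n";
print pad_phylip_names($phylip);
```

Lines that already have two or more spaces after the name are left untouched, so running the helper twice gives the same result.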
The weirdness is that I only get the EOF error under one hypothesis model (the null) and never on the alternative hypothesis model - even when run on the same initial data. I've managed to track the problem down to the way BioPerl formats the temporary phylip alignment file that is fed into codeml. Apparently, PAML requires there to be at least two spaces between the sequence identifier and the start of the sequence. However, for some files - and I don't know if this is random or not - the temporary alignment file only contains one space after the sequence identifier. If I edit the phylip file accordingly and rerun codeml, the software compiles and processes the data correctly. Has anyone run into this problem before and has someone figured a work around using the kaks_factory in Bio::Tools::Run::Phylo::PAML::Codeml.pm? If this is something others have not seen, I'll submit a full bug report. Best regards, Tom From Russell.Smithies at agresearch.co.nz Wed Mar 24 15:53:45 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 25 Mar 2010 08:53:45 +1300 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. 
After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Thursday, 25 March 2010 5:38 a.m. > To: Hilmar Lapp; Peter > Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- > Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris > Fields > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > I think this is a great idea--- MAJ > ----- Original Message ----- > From: "Hilmar Lapp" > To: "Peter" > Cc: ; "Biopython-Dev Mailing List" > ; ; "bioperl- > l > list" ; "Chris Fields" > ; > > Sent: Wednesday, March 24, 2010 11:27 AM > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > > > > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > > > >> Please give the NCBI an email - you can CC me too if you like. > > > > > > Can't this be the developers' mailing list (or lists, the appropriate > one for > > each toolkit)? We can even whitelist all NCBI sender addresses so they > can > > easily email us if there are issues. 
> > > > -hilmar > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > > =========================================================== > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Wed Mar 24 16:01:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 15:01:50 -0500 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> Message-ID: Russell, The problem we're possibly running into now is that (acc. 
to the documents) we will likely have to define both the tool and email (or neither), as the tool and email are registered together. There are advantages and disadvantages to both scenarios, one that you point out. ATM I'm awaiting back word from NCBI for clarification (I popped 'em an email about this earlier) and will hopefully post their response here if they send one, then we'll hash out what needs to be done. And agreed about Scott, he's always been helpful. chris On Mar 24, 2010, at 2:53 PM, Smithies, Russell wrote: > The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. > I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. > > After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Thursday, 25 March 2010 5:38 a.m. 
>> To: Hilmar Lapp; Peter >> Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- >> Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris >> Fields >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> I think this is a great idea--- MAJ >> ----- Original Message ----- >> From: "Hilmar Lapp" >> To: "Peter" >> Cc: ; "Biopython-Dev Mailing List" >> ; ; "bioperl- >> l >> list" ; "Chris Fields" >> ; >> >> Sent: Wednesday, March 24, 2010 11:27 AM >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> >>> >>> On Mar 24, 2010, at 10:51 AM, Peter wrote: >>> >>>> Please give the NCBI an email - you can CC me too if you like. >>> >>> >>> Can't this be the developers' mailing list (or lists, the appropriate >> one for >>> each toolkit)? We can even whitelist all NCBI sender addresses so they >> can >>> easily email us if there are issues. >>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. 
Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed Mar 24 15:53:48 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 24 Mar 2010 12:53:48 -0700 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com><5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <1A4207F8295607498283FE9E93B775B406A418BB@EX02.asurite.ad.asu.edu> Well, the problem with NCBI using the address to email about problem users is that the lists can't really identify the user since it isn't a specific program, but someone's specific implementation utilizing the toolkit that is causing problems. So, not sure how this would help with the problem of dealing with trouble users. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. 
Jensen Sent: Wednesday, March 24, 2010 9:38 AM To: Hilmar Lapp; Peter Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython-Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris Fields Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Mar 24 16:38:31 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:38:31 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Message-ID: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Hi Tom, Thanks for your note. From your description, it sounds like a bug report is in order. 
If you could include a little test case so we can reproduce it, that would be great. Dave From thomas.sharpton at gmail.com Wed Mar 24 16:40:55 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 13:40:55 -0700 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Message-ID: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Hi Dave, Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. If there's any way I can help, do let me know. Best, Tom On Mar 24, 2010, at 1:38 PM, Dave Messina wrote: > Hi Tom, > > Thanks for your note. From your description, it sounds like a bug > report is in order. If you could include a little test case so we > can reproduce it, that would be great. > > > Dave > From David.Messina at sbc.su.se Wed Mar 24 16:52:59 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:52:59 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Message-ID: <4BEA53ED-87B6-4EE0-B5E6-AE304A335AA8@sbc.su.se> > Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. Terrific, thanks! > If there's any way I can help, do let me know. Oh don't worry...I will.
:) D From cjfields at illinois.edu Thu Mar 25 00:50:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 23:50:11 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] Bio::DB::SeqFeature spliced_seq() In-Reply-To: <4BA7D267.6050704@bioperl.org> References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> <4BA7D267.6050704@bioperl.org> Message-ID: <46D94C25-4E2D-4E64-A696-1C9D3F785EEB@illinois.edu> Yes, that's essentially what I have working now. I suppose the best way to do this is have an optional type supplied and splice only those, checking the subfeatures to ensure that type exists. I'll check against SeqFeatureI's spliced_seq() to see if there are any API issues. chris On Mar 22, 2010, at 3:26 PM, Jason Stajich wrote: > Yes it needs a special case I guess - since spliced_seq should work, > however ... The only problem is that if both exons and CDS are > sub-features you have to be smart enough to not grab both... > > So I have just relied on specialized dumping scripts for gff3_to_cds for > my own needs (i.e. > http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl > ). > But you might also see what the Gbrowse plugin dumpers do. > > -jason > Chris Fields wrote, On 3/22/10 11:56 AM: >> I have just noticed that spliced_seq() is borked with >> Bio::DB::SeqFeature and am thinking about implementing it. Or is >> similar functionality already implemented elsewhere? >> >> Currently, it is calling entire_seq(), which I plan on avoiding simply >> to prevent sucking in the entire sequence into memory. 
This is
>> currently what happens:
>>
>>
>> ---------------------------
>>
>> my $it = $store->get_seq_stream(-type => 'mRNA');
>>
>> my $ct = 0;
>> while (my $sf = $it->next_seq) {
>>     my $seq = $sf->spliced_seq; # dies with exception
>> }
>>
>> ---------------------------
>>
>> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
>> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented by package Bio::DB::SeqFeature.
>> This is not your fault - author of Bio::DB::SeqFeature should be blamed!
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
>> STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
>> STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
>> STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
>> STACK: beestore.pl:17
>> ----------------------------------------------------------------
>>
>> chris
>>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

From lpritc at scri.ac.uk Thu Mar 25 07:20:01 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 25 Mar 2010 11:20:01 +0000
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>
Message-ID:

Hi,

Nathan's been in touch to ask exactly what the command-line was that I was using, and this was missing from the thread so, for info:

bp_genbank2gff3.pl --noCDS NC_000913.gbk

And

bp_genbank2gff3.pl --CDS NC_000913.gbk

With occasional absolute paths to the input sequence.

L.

On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote:

> Hi Leighton,
>
> I wonder if this is a change stemming from Nathan's work on this
> script. Nathan?
>
> Scott
>

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
______________________________________________________ From aradwen at gmail.com Fri Mar 26 07:29:16 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 12:29:16 +0100 Subject: [Bioperl-l] aacomp.pl problem Message-ID: Hello, I'm facing a little problem with aacomp.pl in scripts examples that comes with Bioperl Here is the error message Can't locate object method "valid_aa" via package "Bio::Tools::CodonTable" at aacomp.pl line 16. Any Idea ? Thx Radwen From David.Messina at sbc.su.se Fri Mar 26 08:51:11 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 13:51:11 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: Message-ID: Hi Radwen, The latest version of aacomp (from subversion) worked fine for me. That version has this line near the top of the script: # $Id: aacomp.PLS 15088 2008-12-04 02:49:09Z bosborne $ If yours is different, you might try upgrading to the latest version. In fact, I'm almost certain that is the problem, since the valid_aa method is in the Bio::SeqUtils class, not Bio::Tools::CodonTable. Dave From David.Messina at sbc.su.se Fri Mar 26 10:24:25 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:24:25 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: Message-ID: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Hi, Yes, the subversion site is temporarily down. However, there are nightly builds http://www.bioperl.org/DIST/nightly_builds/ and the Github mirror http://github.com/bioperl Dave On Mar 26, 2010, at 15:20, Radwen Aniba wrote: > The subversion site is down?!!! 
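A quick way to check the diagnosis given earlier in this thread (that valid_aa lives in Bio::SeqUtils rather than Bio::Tools::CodonTable) is to ask the installed modules directly. The following is only a minimal sketch and was not posted by anyone on the list: the helper name provides_method is made up for illustration, and the result reflects whichever BioPerl copy your perl finds first in @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): try to load $class and
# report whether it can respond to $method.
sub provides_method {
    my ($class, $method) = @_;

    # Turn a package name such as Bio::SeqUtils into Bio/SeqUtils.pm
    (my $file = "$class.pm") =~ s{::}{/}g;

    # require dies if the module cannot be found; trap that and say so.
    return 'not installed' unless eval { require $file; 1 };

    return $class->can($method) ? 'yes' : 'no';
}

# The two classes named in the error message and in the reply.
for my $class ('Bio::SeqUtils', 'Bio::Tools::CodonTable') {
    printf "%-24s valid_aa: %s\n", $class, provides_method($class, 'valid_aa');
}
```

If Bio::SeqUtils reports "yes" and Bio::Tools::CodonTable reports "no", that would be consistent with an older aacomp.pl being run against a newer library, in which case fetching the current script as suggested above is probably the simplest fix.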
From David.Messina at sbc.su.se Fri Mar 26 10:35:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:35:29 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Message-ID: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Radwen, Please be sure to 'reply all' so that everyone on the list can follow this discussion. > Sorry to ask beginners questions but how to configure these mirrors to upgrade ? > > I'm using ubuntu Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Dave From cjfields at illinois.edu Fri Mar 26 10:40:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 09:40:20 -0500 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Message-ID: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > Radwen, > > Please be sure to 'reply all' so that everyone on the list can follow this discussion. > > >> Sorry to ask beginners questions but how to configure these mirrors to upgrade ? >> >> I'm using ubuntu > > > > > Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > Dave You can also get tarballs of bioperl-live from the github mirror (via the 'Download Source' link): http://github.com/bioperl/bioperl-live These are updated every 15 minutes. 
chris From aradwen at gmail.com Fri Mar 26 10:41:51 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 15:41:51 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> Message-ID: Thank you 2010/3/26 Chris Fields > > On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > > > Radwen, > > > > Please be sure to 'reply all' so that everyone on the list can follow > this discussion. > > > > > >> Sorry to ask beginners questions but how to configure these mirrors to > upgrade ? > >> > >> I'm using ubuntu > > > > > > > > > > Step 1: download the bioperl-live tarball from, for example, > http://www.bioperl.org/DIST/nightly_builds/ > > > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > > > > > > Dave > > > You can also get tarballs of bioperl-live from the github mirror (via the > 'Download Source' link): > > http://github.com/bioperl/bioperl-live > > These are updated every 15 minutes. > > chris From maj at fortinbras.us Fri Mar 26 10:34:49 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 10:34:49 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <4BABB825.6010803@cse.msu.edu> References: <4BABB825.6010803@cse.msu.edu> Message-ID: <249674A825C14BB3801C6184DEEA7A82@NewLife> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with you on developing your application. I'm including the bioperl mailing list on this post; we'll continue to have this conversation on the list so that the helpful, friendly, knowledgeable, compassionate membership can participate. 
WrapperMaker code is currently available in svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run for an example of how Bio::Tools::Run::WrapperBase and CommandExts are used (er, by me...). cheers MAJ ----- Original Message ----- From: "Alok" To: Sent: Thursday, March 25, 2010 3:23 PM Subject: BioPerl Google SOC project > Hello Mark, > > My name is Alok Watve and I am currently pursuing PhD in Computer > Science at Michigan State University. I was going through the BioPerl > Wiki for Google SOC projects. I have good experience with Perl and was > wondering if I could work on the project "Perl Run Wrappers". > > Prior to joining MSU, I was working with D E Shaw India Software Pvt. > Ltd. My work was involved in writing Java programs and their perl > wrappers. We used perl scripts to fire java programs with all the > correct parameters. So I think I have some idea about what wrappers are. > However, I have not used BioPerl and may take some time to get familiar > with the structure. I am fairly confident that I will be able to do this. > > During my work here at MSU. I use perl a lot for doing basic text > analysis for my projects. Although I rarely use OO features of perl, I > have used them in past and never had any problems with it. I also > believe in writing well-documented and user/developer friendly code > (With comments, command line options for help/documentation). I have > attached a simple script I wrote for my project as an example. I have > also attached my resume for your consideration. > > Please let me know if you think that I am an appropriate candidate and > whether I should go ahead with submitting an application with BioPerl as > my Mentor Organization. 
>
> Thanks a lot,
> Alok
> www.cse.msu.edu/~watvealo/
> --------------------------------------------------------------------------------
> #!/usr/bin/perl
>
> =pod
>
> =head1 SYNOPSIS
>
> Script to edit existing box query files to enable random box query.
> This script inserts box size on each line corresponding to discrete
> dimension in the existing box query file. The maximum value of "box size"
> depends on the alphabet size.
>
> Example
> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile
>
> Use -perldoc for detailed help on options.
>
> =head1 OPTIONS
>
> =over
>
> =item -infile
>
> Specifies the name of the input box query file.
>
> =item -outfile
>
> Specifies the name of the output file.
>
> =item -uniform_box
>
> Specifies size of the uniform box query.
>
> =item -max_size
>
> Specifies the maximum box size for random sized box query.
>
> =item -help
>
> Displays a brief help message and exits.
>
> =item -perldoc
>
> Displays a detailed help.
>
> =back
>
> =cut
>
> use strict;
> use warnings 'all';
>
> use Getopt::Long;
> use Pod::Usage;
>
> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile,
>     'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox,
>     'help' => \my $help, 'perldoc' => \my $perldoc);
>
> if(defined($perldoc))
> {
>     pod2usage(-verbose => 2);
> }
>
> if(defined($help))
> {
>     pod2usage(-verbose=> 0);
> }
>
> if(! (defined($infile) && defined ($outfile) ))
> {
>     die('Please specify input, output files. Use -perldoc for more help');
> }
>
> # Some basic error checking to ensure script runs ....
> if(!(defined($uniformBox) ||defined($maxSize)))
> {
>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
> }
>
> # Initialize random number generator.
> srand();
>
> # Read Input file and find out lines we are interested in
> # Then prefix the line with correct box size as defined by
> # user choice
> open(IN, "<$infile");
> open(OUT, ">$outfile");
> my $count = 0;
> while(my $line = <IN>)
> {
>     if( ($count%64) < 32 )
>     {
>         if(defined($uniformBox))
>         {
>             $line = sprintf("%d ",$uniformBox) . $line;
>         }
>         elsif(defined($maxSize))
>         {
>             # This line corresponds to the discrete dimension.
>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>         }
>     }
>     $count ++;
>     print OUT $line
> }
>
> close(OUT);
> close(IN);
>

From cjfields at illinois.edu Fri Mar 26 11:06:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:06:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
Message-ID:

Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter:

http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
http://use.perl.org/~cjfields/journal/40275

I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...).

chris

From awitney at sgul.ac.uk Fri Mar 26 11:20:36 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 26 Mar 2010 15:20:36 +0000
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
Message-ID: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>

Is the bioperl-ext package still being developed? I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error

"The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
Please read the install the bioperl-ext package"

even though i did compile and install the Bio::Ext::Align package

If not using the pSW module, what do other people use for this?
thanks adam From cjfields at illinois.edu Fri Mar 26 11:51:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 10:51:41 -0500 Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl In-Reply-To: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk> References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk> Message-ID: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on its own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by its author. I know many users just use calls to outside programs, such as EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes; XS can be a bugbear. chris On Mar 26, 2010, at 10:20 AM, Adam Witney wrote: > Is the bioperl-ext package still being developed? I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error > > "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed. > Please read the install the bioperl-ext package" > > even though i did compile and install the Bio::Ext::Align package > > If not using the pSW module, what do other people use for this? > > thanks > > adam > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pmiguel at purdue.edu Fri Mar 26 11:52:17 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 26 Mar 2010 11:52:17 -0400 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook Message-ID: <4BACD831.20506@purdue.edu> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.

I am trying to use the code provided at:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO

and modified to request gi228534658

The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything.

Nothing is printed when I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::EUtilities;

my @ids;
push @ids, '228534658';
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

my $file = 'myseqs.gb';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

my $seqin = Bio::SeqIO->new(-file   => $file,
                            -format => 'genbank');

while (my $seq = $seqin->next_seq) {
    print "I see a sequence\n";
    print $seq->species();
}

"myseqs.gb" does have content:

Seq-entry ::= seq { id { general { db "gpid:36555" , tag str "contig49313" } , genbank { accession "EZ113652" , version 1 } , gi 228534658 } , descr { title "TSA: Zea mays contig49313, mRNA sequence." , source { genome genomic , org { taxname "Zea mays" , db { { db "taxon" , tag id 4577 } } , orgname { name binomial { genus "Zea" , species "mays" } , lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACCAD clade; Panicoideae; Andropogoneae; Zea" , gcode 1 , mgcode 1 , div "PLN" } } } , molinfo { biomol mRNA , tech tsa } , pub { pub { article { title { name "Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing." } , authors { names std { { name name { last "Vega-Arreguin" , initials "J.C." } } , { name name { last "Ibarra-Laclette" , initials "E." } } , { name name { last "Jimenez-Moraila" , initials "B."
} } , { name name { last "Martinez" , initials "O." } } , { name name { last "Vielle-Calzada" , initials "J.P." } } , { name name { last "Herrera-Estrella" , initials "L." } } , { name name { last "Herrera-Estrella" , initials "A." } } } } , from journal { title { iso-jta "BMC Genomics" , ml-jta "BMC Genomics" , issn "1471-2164" , name "BMC genomics" } , imp { date std { year 2009 , month 7 , day 6 } , volume "10" , issue "1" , pages "299" , language "ENG" , pubstatus aheadofprint , history { { pubstatus received , date std { year 2008 , month 12 , day 2 } } , { pubstatus accepted , date std { year 2009 , month 7 , day 6 } } , { pubstatus aheadofprint , date std { year 2009 , month 7 , day 6 } } , { pubstatus other , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } , { pubstatus pubmed , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } , { pubstatus medline , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } } } } , ids { pii "1471-2164-10-299" , doi "10.1186/1471-2164-10-299" , pubmed 19580677 } } , pmid 19580677 } } , pub { pub { sub { authors { names std { { name name { last "Vega-Arreguin" , first "Julio" , initials "J.C." } } , { name name { last "Ibarra-Laclette" , first "Enrique" , initials "E." } } , { name name { last "Jimenez-Moraila" , first "Beatriz" , initials "B." } } , { name name { last "Martinez" , first "Octavio" , initials "O." } } , { name name { last "Vielle-Calzada" , first "Jean" , initials "J.Philippe." } } , { name name { last "Herrera-Estrella" , first "Luis" , initials "L." } } , { name name { last "Herrera-Estrella" , first "Alfredo" , initials "A." 
} } } , affil std { affil "Laboratorio Nacional de Genomica para la Biodiversidad" , div "Cinvestav Campus Guanajuato" , city "Irapuato" , sub "Guanajuato" , country "Mexico" , street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , postal-code "36821" } } , medium other , date std { year 2009 , month 3 , day 23 } } } } , user { type str "GenomeProjectsDB" , data { { label str "ProjectID" , data int 36555 } , { label str "ParentID" , data int 0 } } } , create-date std { year 2009 , month 5 , day 5 } , update-date std { year 2009 , month 7 , day 14 } } , inst { repr raw , mol rna , length 450 , seq-data ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 760CFF0'H } } Maybe I am using the wrong format? This looks more like ASN than genbank format to me. Phillip From maj at fortinbras.us Fri Mar 26 11:37:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 11:37:56 -0400 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: References: Message-ID: <648F9E90AF07449887FD4C420AA8B00E@NewLife> and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, March 26, 2010 11:06 AM Subject: [Bioperl-l] BioPerl and the Google Summer of Code > Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via > twitter: > > http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html > http://use.perl.org/~cjfields/journal/40275 > > I'll update the BioPerl page with a couple more ideas later today (think: > Moose and/or Perl6...). 
>
> chris
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Fri Mar 26 12:16:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 11:16:22 -0500
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <4BACD831.20506@purdue.edu>
References: <4BACD831.20506@purdue.edu>
Message-ID: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>

Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb' but apparently no longer is; this appears to be something that was changed on NCBI's end.

Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both.

chris

On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote:

> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.
>
> I am trying to use the code provided at:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO
>
> and modified to request gi228534658
>
> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything.
> > Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: > > #!/usr/bin/perl > use strict; > use warnings; > > use Bio::SeqIO; > use Bio::DB::EUtilities; > > my @ids; > push @ids, '228534658'; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -id => \@ids); > > my $file = 'myseqs.gb'; > > # dump HTTP::Response content to a file (not retained in memory) > $factory->get_Response(-file => $file); > > my $seqin = Bio::SeqIO->new(-file => $file, > -format => 'genbank'); > > while (my $seq = $seqin->next_seq) { > print "I see a sequence\n"; > print $seq->species(); > } > > > "myseqs.gb" does have content: > > Seq-entry ::= seq { > id { > general { > db "gpid:36555" , > tag > str "contig49313" } , > genbank { > accession "EZ113652" , > version 1 } , > gi 228534658 } , > descr { > title "TSA: Zea mays contig49313, mRNA sequence." , > source { > genome genomic , > org { > taxname "Zea mays" , > db { > { > db "taxon" , > tag > id 4577 } } , > orgname { > name > binomial { > genus "Zea" , > species "mays" } , > lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; > Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; > PACCAD clade; Panicoideae; Andropogoneae; Zea" , > gcode 1 , > mgcode 1 , > div "PLN" } } } , > molinfo { > biomol mRNA , > tech tsa } , > pub { > pub { > article { > title { > name "Deep sampling of the Palomero maize transcriptome by a high > throughput strategy of pyrosequencing." } , > authors { > names > std { > { > name > name { > last "Vega-Arreguin" , > initials "J.C." } } , > { > name > name { > last "Ibarra-Laclette" , > initials "E." } } , > { > name > name { > last "Jimenez-Moraila" , > initials "B." } } , > { > name > name { > last "Martinez" , > initials "O." } } , > { > name > name { > last "Vielle-Calzada" , > initials "J.P." } } , > { > name > name { > last "Herrera-Estrella" , > initials "L." 
} } , > { > name > name { > last "Herrera-Estrella" , > initials "A." } } } } , > from > journal { > title { > iso-jta "BMC Genomics" , > ml-jta "BMC Genomics" , > issn "1471-2164" , > name "BMC genomics" } , > imp { > date > std { > year 2009 , > month 7 , > day 6 } , > volume "10" , > issue "1" , > pages "299" , > language "ENG" , > pubstatus aheadofprint , > history { > { > pubstatus received , > date > std { > year 2008 , > month 12 , > day 2 } } , > { > pubstatus accepted , > date > std { > year 2009 , > month 7 , > day 6 } } , > { > pubstatus aheadofprint , > date > std { > year 2009 , > month 7 , > day 6 } } , > { > pubstatus other , > date > std { > year 2009 , > month 7 , > day 8 , > hour 9 , > minute 0 } } , > { > pubstatus pubmed , > date > std { > year 2009 , > month 7 , > day 8 , > hour 9 , > minute 0 } } , > { > pubstatus medline , > date > std { > year 2009 , > month 7 , > day 8 , > hour 9 , > minute 0 } } } } } , > ids { > pii "1471-2164-10-299" , > doi "10.1186/1471-2164-10-299" , > pubmed 19580677 } } , > pmid 19580677 } } , > pub { > pub { > sub { > authors { > names > std { > { > name > name { > last "Vega-Arreguin" , > first "Julio" , > initials "J.C." } } , > { > name > name { > last "Ibarra-Laclette" , > first "Enrique" , > initials "E." } } , > { > name > name { > last "Jimenez-Moraila" , > first "Beatriz" , > initials "B." } } , > { > name > name { > last "Martinez" , > first "Octavio" , > initials "O." } } , > { > name > name { > last "Vielle-Calzada" , > first "Jean" , > initials "J.Philippe." } } , > { > name > name { > last "Herrera-Estrella" , > first "Luis" , > initials "L." } } , > { > name > name { > last "Herrera-Estrella" , > first "Alfredo" , > initials "A." 
} } } , > affil > std { > affil "Laboratorio Nacional de Genomica para la Biodiversidad" , > div "Cinvestav Campus Guanajuato" , > city "Irapuato" , > sub "Guanajuato" , > country "Mexico" , > street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , > postal-code "36821" } } , > medium other , > date > std { > year 2009 , > month 3 , > day 23 } } } } , > user { > type > str "GenomeProjectsDB" , > data { > { > label > str "ProjectID" , > data > int 36555 } , > { > label > str "ParentID" , > data > int 0 } } } , > create-date > std { > year 2009 , > month 5 , > day 5 } , > update-date > std { > year 2009 , > month 7 , > day 14 } } , > inst { > repr raw , > mol rna , > length 450 , > seq-data > ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 > 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A > A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 > 760CFF0'H } } > > > Maybe I am using the wrong format? This looks more like ASN than genbank format to me. > > Phillip > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Mar 26 12:38:26 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 11:38:26 -0500 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: <648F9E90AF07449887FD4C420AA8B00E@NewLife> References: <648F9E90AF07449887FD4C420AA8B00E@NewLife> Message-ID: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> BioPerl GSoC page updated with the Moose/Modern Perl/BioPerl 6-based project: http://www.bioperl.org/wiki/Google_Summer_of_Code#BioPerl_2.0_.28and_beyond.29 Feel free to add your name to the list of mentors if you are interested. chris On Mar 26, 2010, at 10:37 AM, Mark A.
Jensen wrote: > and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Sent: Friday, March 26, 2010 11:06 AM > Subject: [Bioperl-l] BioPerl and the Google Summer of Code > > >> Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter: >> >> http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html >> http://use.perl.org/~cjfields/journal/40275 >> >> I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...). >> >> chris >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From pmiguel at purdue.edu Fri Mar 26 13:28:09 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 26 Mar 2010 13:28:09 -0400 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> Message-ID: <4BACEEA9.2060407@purdue.edu> Ah, yes. That does the trick. Actually I have already downloaded a few thousand records in whatever that format that is returned when 'genbank' is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= seq {') Any idea what format that is and how to convert it to something SeqIO can use? If not, I can just pull them all down again by sending about 200 gi's per request. That should not offend the genbank gods... Thanks for your help, Phillip Chris Fields wrote: > Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. 
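A minimal sketch of the re-download route mentioned above, assuming the quoted advice holds: request 'gb' rather than 'genbank' so that Bio::SeqIO can parse the result, supply an e-mail address, and send the GIs in batches of about 200. The helper names batch_ids and fetch_batch, the per-batch file names, and the e-mail address are illustrative assumptions, not something from the thread or from BioPerl itself. The network call only runs when GIs are passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into chunks of at most $size (hypothetical helper).
sub batch_ids {
    my ($ids, $size) = @_;
    die "batch size must be positive\n" unless $size && $size > 0;
    my @copy = @$ids;
    my @batches;
    push @batches, [ splice(@copy, 0, $size) ] while @copy;
    return @batches;
}

# Fetch one batch as GenBank flat file and return the parsed sequences.
# Assumes BioPerl is installed; the modules are loaded lazily so that
# batch_ids() can be used on its own.
sub fetch_batch {
    my ($batch, $file, $email) = @_;
    require Bio::DB::EUtilities;
    require Bio::SeqIO;

    my $factory = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => 'nucleotide',
        -rettype => 'gb',      # per the thread, 'genbank' returned ASN.1
        -email   => $email,    # placeholder address, replace with your own
        -id      => $batch,
    );

    # dump HTTP::Response content to a file, as in the original script
    $factory->get_Response(-file => $file);

    my $seqin = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @seqs;
    while (my $seq = $seqin->next_seq) {
        push @seqs, $seq;
    }
    return @seqs;
}

# Only touch the network when GIs are given on the command line.
if (@ARGV) {
    my $n = 0;
    for my $batch (batch_ids(\@ARGV, 200)) {
        $n++;
        my @seqs = fetch_batch($batch, "myseqs_batch$n.gb", 'you@example.org');
        printf "batch %d: %d record(s)\n", $n, scalar @seqs;
    }
}
```

Run with the GIs as arguments, for example `perl refetch.pl 228534658` (the script name is arbitrary). Writing one file per batch avoids relying on any append behaviour of get_Response.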
From bioperlanand at yahoo.com Fri Mar 26 00:40:23 2010 From: bioperlanand at yahoo.com (Anand Venkatraman) Date: Thu, 25 Mar 2010 21:40:23 -0700 (PDT) Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities Message-ID: <27160.94644.qm@web114211.mail.gq1.yahoo.com> Hi everybody, I have a list of genome project ids & I have a need where I need to gather information from a specific field & store the output in a file. As regards what Info I want For example, for genome project id 30807
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807, I need to grab the text information that reads (this is found at the bottom of the page): Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant. Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis. I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::Eutilities? If yes, what would be the fields/params? I did try out this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F to find out what information is available for genomeprj, but I am unable to get the necessary field/param for my need. Please help. Alternatively, is there a better way to address my need other than Bio::DB::Eutilities? Thanks in advance, Anand From rmb32 at cornell.edu Fri Mar 26 03:44:09 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 00:44:09 -0700 Subject: [Bioperl-l] GSoC mentors mailing list Message-ID: <4BAC65C9.307@cornell.edu> Hi all, If you have volunteered to be a possible GSoC mentor, and have not already been subscribed to the (mentors-only) gsoc-mentors mailing list, send me an email and I'll subscribe you. Rob Buels OBF GSoC 2010 Admin From rmb32 at cornell.edu Fri Mar 26 12:30:30 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 09:30:30 -0700 Subject: [Bioperl-l] Announcing OBF Summer of Code - please forward! Message-ID: <4BACE126.1030500@cornell.edu> Hi all, Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 9! Please spread it widely, we need to reach lots of students with it!
Rob Buels OBF GSoC 2010 Admin ============================================================ *** Please disseminate widely at your local institutions *** *** including posting to message and job boards, so that *** *** we reach as many students as possible. *** ============================================================ OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010 Applications due 19:00 UTC, April 9, 2010. http://www.open-bio.org/wiki/Google_Summer_of_Code The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). Students successfully completing the 3 month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. 
We also welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves. TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 29 through Friday, April 9th, 2010. INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details. 2010 OBF Summer of Code: http://www.open-bio.org/wiki/Google_Summer_of_Code Google Summer of Code FAQ: http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs From cjfields at illinois.edu Fri Mar 26 14:28:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 13:28:46 -0500 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <4BACEEA9.2060407@purdue.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu> Message-ID: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> That format is ASN.1, and there isn't a BioPerl parser for GenBank ASN.1 format (it tends to be too cumbersome). However, there is a pure-Perl one for the EntrezGene ASN.1 format (Bio::ASN1::EntrezGene). chris On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote: > Ah, yes. That does the trick. Actually I have already downloaded a few > thousand records in whatever that format that is returned when 'genbank' > is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= > seq {') Any idea what format that is and how to convert it to something > SeqIO can use? > > If not, I can just pull them all down again by sending about 200 gi's > per request. That should not offend the genbank gods...
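As a rough way to sort files that were already downloaded, the first non-blank line is enough to tell the two formats in this thread apart: the ASN.1 text dump begins with 'Seq-entry ::=', while a GenBank flat file normally begins with 'LOCUS'. The sub name guess_format is hypothetical. This is a heuristic sketch only, not a parser and not a converter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess the record format from its first non-blank line.
# Returns 'asn1', 'genbank', or 'unknown' (hypothetical helper).
sub guess_format {
    my ($first_line) = @_;
    return 'asn1'    if $first_line =~ /^\s*Seq-entry\s*::=/;
    return 'genbank' if $first_line =~ /^LOCUS\s/;
    return 'unknown';
}

# Report the guessed format of each file named on the command line.
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $first = '';
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        $first = $line;
        last;
    }
    close($fh);
    printf "%s\t%s\n", $file, guess_format($first);
}
```

Files reported as 'asn1' are the ones that would need to be fetched again with the 'gb' return type before Bio::SeqIO can read them.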
From wollenbergk at niaid.nih.gov Fri Mar 26 16:47:06 2010 From: wollenbergk at niaid.nih.gov (Wollenberg, Kurt (NIH/NIAID) [C]) Date: Fri, 26 Mar 2010 16:47:06 -0400 Subject: [Bioperl-l] Error during installation of 1.6.1 Message-ID: Hello: I am trying to install BioPerl (after a recent system upgrade) and am getting the following error: "Catching error: "Can't execute q install q: No such file or directory at /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at /Library/Perl/Updates/5.8.8/CPAN.pm line 391". Previous to this I've run the CPAN upgrade, etc. as recommended on the Installation for Unix page. This happens when I try to do the actual install, both vanilla and "force"ed. I'm attempting this on a Mac G5 workstation running 10.5.8. Any clues what I may be missing or doing incorrectly? Cheers, Kurt Wollenberg, Ph.D. Contractor - Lockheed Martin Phylogenetics Specialist Computational Biology Section Bioinformatics and Computational Biosciences Branch (BCBB) OCICB/OSMO/OD/NIAID/NIH 31 Center Drive, Room 3B62 Bethesda, MD 20892-0485 Office 301-402-8628 http://bioinformatics.niaid.nih.gov (Within NIH) http://exon.niaid.nih.gov (Public) Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient.
If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives From rmb32 at cornell.edu Fri Mar 26 18:22:42 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 15:22:42 -0700 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> References: <648F9E90AF07449887FD4C420AA8B00E@NewLife> <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> Message-ID: <4BAD33B2.1060309@cornell.edu> You guys are the best. Hugs all around. R From watvealo at cse.msu.edu Fri Mar 26 19:06:24 2010 From: watvealo at cse.msu.edu (Alok) Date: Fri, 26 Mar 2010 19:06:24 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <249674A825C14BB3801C6184DEEA7A82@NewLife> References: <4BABB825.6010803@cse.msu.edu> <249674A825C14BB3801C6184DEEA7A82@NewLife> Message-ID: <4BAD3DF0.7090006@cse.msu.edu> Hi Mark, Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-( I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion) But they too are non-responsive. Thanks, Alok Mark A. Jensen wrote: > Hi Alok-- Thanks for your interest! You should certainly consider > applying. I can work with > you on developing your application. I'm including the bioperl mailing > list on this > post; we'll continue to have this conversation on the list so that the > helpful, friendly, > knowledgeable, compassionate membership can participate. 
> WrapperMaker code is currently available in > svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker > > Probably you want to have a look at Bio::Tools::Run::Samtools in > bioperl-run > for an example of how Bio::Tools::Run::WrapperBase and CommandExts are > used (er, by me...). > cheers > MAJ > ----- Original Message ----- From: "Alok" > To: > Sent: Thursday, March 25, 2010 3:23 PM > Subject: BioPerl Google SOC project > > >> Hello Mark, >> >> My name is Alok Watve and I am currently pursuing PhD in Computer >> Science at Michigan State University. I was going through the BioPerl >> Wiki for Google SOC projects. I have good experience with Perl and was >> wondering if I could work on the project "Perl Run Wrappers". >> >> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >> Ltd. My work was involved in writing Java programs and their perl >> wrappers. We used perl scripts to fire java programs with all the >> correct parameters. So I think I have some idea about what wrappers are. >> However, I have not used BioPerl and may take some time to get familiar >> with the structure. I am fairly confident that I will be able to do >> this. >> >> During my work here at MSU. I use perl a lot for doing basic text >> analysis for my projects. Although I rarely use OO features of perl, I >> have used them in past and never had any problems with it. I also >> believe in writing well-documented and user/developer friendly code >> (With comments, command line options for help/documentation). I have >> attached a simple script I wrote for my project as an example. I have >> also attached my resume for your consideration. >> >> Please let me know if you think that I am an appropriate candidate and >> whether I should go ahead with submitting an application with BioPerl as >> my Mentor Organization. 
>> >> Thanks a lot, >> Alok >> www.cse.msu.edu/~watvealo/ >> > > > -------------------------------------------------------------------------------- > > > >> #!/usr/bin/perl >> >> =pod >> >> =head1 SYNOPSIS >> >> Script to edit existing box query files to enable random box query. >> This scripts inserts box size on each line corresponding to discrete >> dimension in the existing box query file. The maximum value of "box >> size" >> depends on the alphabet size. >> >> Example >> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >> >> Use -perldoc for detailed help on options. >> >> =head1 OPTIONS >> >> =over >> >> =item -infile >> >> Specifies the name of the input box query file. >> >> =item -outfile >> >> Specifies the name of the output file. >> >> =item -uniform_box >> >> Specifies size of the uniform box query. >> >> =item -max_size >> >> Specifies the maximum box size for random sized box query. >> >> =item -help >> >> Displays a brief help message and exits. >> >> =item -perldoc >> >> Displays a detailed help. >> >> =back >> >> =cut >> >> use strict; >> use warnings 'all'; >> >> use Getopt::Long; >> use Pod::Usage; >> >> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >> 'help' => \my $help, 'perldoc' => \my $perldoc); >> >> if(defined($perldoc)) >> { >> pod2usage(-verbose => 2); >> } >> >> if(defined($help)) >> { >> pod2usage(-verbose=> 0); >> } >> >> if(! (defined($infile) && defined ($outfile) )) >> { >> die('Please specify input, output files. Use -perldoc >> for more help'); >> } >> >> # Some basic error checking to ensure script runs .... >> if(!(defined($uniformBox) ||defined($maxSize))) >> { >> die('Specify either box size for uniform box queries or maximum >> box size for random box queries'); >> } >> >> # Initialize random number generator. 
>> srand();
>>
>> # Read Input file and find out lines we are interested in
>> # Then prefix the line with correct box size as defined by
>> # user choice
>> open(IN, "<$infile");
>> open(OUT, ">$outfile");
>> my $count = 0;
>> while(my $line = <IN>)
>> {
>>     if( ($count%64) < 32 )
>>     {
>>         if(defined($uniformBox))
>>         {
>>             $line = sprintf("%d ",$uniformBox) . $line;
>>         }
>>         elsif(defined($maxSize))
>>         {
>>             # This line corresponds to the discrete dimension.
>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>         }
>>     }
>>     $count ++;
>>     print OUT $line
>> }
>>
>> close(OUT);
>> close(IN);
>>
From maj at fortinbras.us Fri Mar 26 20:08:51 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 20:08:51 -0400
Subject: [Bioperl-l] BioPerl Google SOC project
In-Reply-To: <4BAD3DF0.7090006@cse.msu.edu>
References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu>
Message-ID:

Hi Alok-- There has been trouble with the code node
of late. You can get a tarball of all the latest code at
http://bioperl.org/DIST/nightly_builds/
Download both bioperl-live and bioperl-run
cheers,
MAJ
----- Original Message -----
From: "Alok"
To: "Mark A. Jensen"
Cc: "BioPerl List"
Sent: Friday, March 26, 2010 7:06 PM
Subject: Re: [Bioperl-l] BioPerl Google SOC project

> Hi Mark,
>
> Thanks a lot for the response. I tried to access the SVN but was unable to do
> so. My SVN client just times out :-(
> I even tried SVN links from the BioPerl Wiki
> (http://www.bioperl.org/wiki/Using_Subversion)
> But they too are non-responsive.
>
> Thanks,
> Alok
>
> Mark A. Jensen wrote:
>> Hi Alok-- Thanks for your interest! You should certainly consider applying. I
>> can work with
>> you on developing your application. I'm including the bioperl mailing list on
>> this
>> post; we'll continue to have this conversation on the list so that the
>> helpful, friendly,
>> knowledgeable, compassionate membership can participate.
>> WrapperMaker code is currently available in >> svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker >> Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run >> for an example of how Bio::Tools::Run::WrapperBase and CommandExts are >> used (er, by me...). >> cheers >> MAJ >> ----- Original Message ----- From: "Alok" >> To: >> Sent: Thursday, March 25, 2010 3:23 PM >> Subject: BioPerl Google SOC project >> >> >>> Hello Mark, >>> >>> My name is Alok Watve and I am currently pursuing PhD in Computer >>> Science at Michigan State University. I was going through the BioPerl >>> Wiki for Google SOC projects. I have good experience with Perl and was >>> wondering if I could work on the project "Perl Run Wrappers". >>> >>> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >>> Ltd. My work was involved in writing Java programs and their perl >>> wrappers. We used perl scripts to fire java programs with all the >>> correct parameters. So I think I have some idea about what wrappers are. >>> However, I have not used BioPerl and may take some time to get familiar >>> with the structure. I am fairly confident that I will be able to do this. >>> >>> During my work here at MSU. I use perl a lot for doing basic text >>> analysis for my projects. Although I rarely use OO features of perl, I >>> have used them in past and never had any problems with it. I also >>> believe in writing well-documented and user/developer friendly code >>> (With comments, command line options for help/documentation). I have >>> attached a simple script I wrote for my project as an example. I have >>> also attached my resume for your consideration. >>> >>> Please let me know if you think that I am an appropriate candidate and >>> whether I should go ahead with submitting an application with BioPerl as >>> my Mentor Organization. 
>>> >>> Thanks a lot, >>> Alok >>> www.cse.msu.edu/~watvealo/ >>> >> >> >> -------------------------------------------------------------------------------- >> >> >> >>> #!/usr/bin/perl >>> >>> =pod >>> >>> =head1 SYNOPSIS >>> >>> Script to edit existing box query files to enable random box query. >>> This scripts inserts box size on each line corresponding to discrete >>> dimension in the existing box query file. The maximum value of "box size" >>> depends on the alphabet size. >>> >>> Example >>> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >>> >>> Use -perldoc for detailed help on options. >>> >>> =head1 OPTIONS >>> >>> =over >>> >>> =item -infile >>> >>> Specifies the name of the input box query file. >>> >>> =item -outfile >>> >>> Specifies the name of the output file. >>> >>> =item -uniform_box >>> >>> Specifies size of the uniform box query. >>> >>> =item -max_size >>> >>> Specifies the maximum box size for random sized box query. >>> >>> =item -help >>> >>> Displays a brief help message and exits. >>> >>> =item -perldoc >>> >>> Displays a detailed help. >>> >>> =back >>> >>> =cut >>> >>> use strict; >>> use warnings 'all'; >>> >>> use Getopt::Long; >>> use Pod::Usage; >>> >>> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >>> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >>> 'help' => \my $help, 'perldoc' => \my $perldoc); >>> >>> if(defined($perldoc)) >>> { >>> pod2usage(-verbose => 2); >>> } >>> >>> if(defined($help)) >>> { >>> pod2usage(-verbose=> 0); >>> } >>> >>> if(! (defined($infile) && defined ($outfile) )) >>> { >>> die('Please specify input, output files. Use -perldoc >>> for more help'); >>> } >>> >>> # Some basic error checking to ensure script runs .... >>> if(!(defined($uniformBox) ||defined($maxSize))) >>> { >>> die('Specify either box size for uniform box queries or maximum box size >>> for random box queries'); >>> } >>> >>> # Initialize random number generator. 
>>> srand();
>>>
>>> # Read Input file and find out lines we are interested in
>>> # Then prefix the line with correct box size as defined by
>>> # user choice
>>> open(IN, "<$infile");
>>> open(OUT, ">$outfile");
>>> my $count = 0;
>>> while(my $line = <IN>)
>>> {
>>>     if( ($count%64) < 32 )
>>>     {
>>>         if(defined($uniformBox))
>>>         {
>>>             $line = sprintf("%d ",$uniformBox) . $line;
>>>         }
>>>         elsif(defined($maxSize))
>>>         {
>>>             # This line corresponds to the discrete dimension.
>>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>>         }
>>>     }
>>>     $count ++;
>>>     print OUT $line
>>> }
>>>
>>> close(OUT);
>>> close(IN);
>>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
From bioperlanand at yahoo.com Fri Mar 26 21:40:04 2010
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Fri, 26 Mar 2010 18:40:04 -0700 (PDT)
Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities
Message-ID: <497143.33972.qm@web114218.mail.gq1.yahoo.com>

Hi everybody,

I have a list of genome project IDs, and for each one I need to gather information from a specific field and store the output in a file.

As for what information I want: for example, for genome project ID 30807 (http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807), I need to grab the text that reads as follows (it is found at the bottom of the page):

Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant.

Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis.

I need to grab the same information for a list of genome project IDs. Is this possible using Bio::DB::EUtilities? If yes, what would be the fields/params?
I did try this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F to find out what information is available for genomeprj, but I was unable to find the field/param I need. Please help. Alternatively, is there a better way to address my need than Bio::DB::EUtilities?

Thanks in advance,
Anand

From cjfields at illinois.edu Fri Mar 26 23:05:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 22:05:59 -0500
Subject: [Bioperl-l] BioPerl Google SOC project
In-Reply-To:
References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu>
Message-ID: <73AE1929-9920-4FD1-B36B-1C7244E20102@illinois.edu>

You can also grab the code off the github mirror:

http://github.com/bioperl/bioperl-live

You can either run a checkout, or download the tarball using the 'Download Source' link. We'll have an SVN read-only mirror on Google Code as well very soon, if it isn't done already.

chris

On Mar 26, 2010, at 7:08 PM, Mark A. Jensen wrote:

> Hi Alok-- There has been trouble with the code node
> of late. You can get a tarball of all the latest code at
> http://bioperl.org/DIST/nightly_builds/
> Download both bioperl-live and bioperl-run
> cheers,
> MAJ
> ----- Original Message ----- From: "Alok"
> To: "Mark A. Jensen"
> Cc: "BioPerl List"
> Sent: Friday, March 26, 2010 7:06 PM
> Subject: Re: [Bioperl-l] BioPerl Google SOC project
>
>
>> Hi Mark,
>>
>> Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-(
>> I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion)
>> But they too are non-responsive.
>>
>> Thanks,
>> Alok
>>
>> Mark A. Jensen wrote:
>>> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with
>>> you on developing your application.
I'm including the bioperl mailing list on this >>> post; we'll continue to have this conversation on the list so that the helpful, friendly, >>> knowledgeable, compassionate membership can participate. >>> WrapperMaker code is currently available in >>> svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker >>> Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run >>> for an example of how Bio::Tools::Run::WrapperBase and CommandExts are >>> used (er, by me...). >>> cheers >>> MAJ >>> ----- Original Message ----- From: "Alok" >>> To: >>> Sent: Thursday, March 25, 2010 3:23 PM >>> Subject: BioPerl Google SOC project >>> >>> >>>> Hello Mark, >>>> >>>> My name is Alok Watve and I am currently pursuing PhD in Computer >>>> Science at Michigan State University. I was going through the BioPerl >>>> Wiki for Google SOC projects. I have good experience with Perl and was >>>> wondering if I could work on the project "Perl Run Wrappers". >>>> >>>> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >>>> Ltd. My work was involved in writing Java programs and their perl >>>> wrappers. We used perl scripts to fire java programs with all the >>>> correct parameters. So I think I have some idea about what wrappers are. >>>> However, I have not used BioPerl and may take some time to get familiar >>>> with the structure. I am fairly confident that I will be able to do this. >>>> >>>> During my work here at MSU. I use perl a lot for doing basic text >>>> analysis for my projects. Although I rarely use OO features of perl, I >>>> have used them in past and never had any problems with it. I also >>>> believe in writing well-documented and user/developer friendly code >>>> (With comments, command line options for help/documentation). I have >>>> attached a simple script I wrote for my project as an example. I have >>>> also attached my resume for your consideration. 
>>>> >>>> Please let me know if you think that I am an appropriate candidate and >>>> whether I should go ahead with submitting an application with BioPerl as >>>> my Mentor Organization. >>>> >>>> Thanks a lot, >>>> Alok >>>> www.cse.msu.edu/~watvealo/ >>>> >>> >>> >>> -------------------------------------------------------------------------------- >>> >>> >>> >>>> #!/usr/bin/perl >>>> >>>> =pod >>>> >>>> =head1 SYNOPSIS >>>> >>>> Script to edit existing box query files to enable random box query. >>>> This scripts inserts box size on each line corresponding to discrete >>>> dimension in the existing box query file. The maximum value of "box size" >>>> depends on the alphabet size. >>>> >>>> Example >>>> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >>>> >>>> Use -perldoc for detailed help on options. >>>> >>>> =head1 OPTIONS >>>> >>>> =over >>>> >>>> =item -infile >>>> >>>> Specifies the name of the input box query file. >>>> >>>> =item -outfile >>>> >>>> Specifies the name of the output file. >>>> >>>> =item -uniform_box >>>> >>>> Specifies size of the uniform box query. >>>> >>>> =item -max_size >>>> >>>> Specifies the maximum box size for random sized box query. >>>> >>>> =item -help >>>> >>>> Displays a brief help message and exits. >>>> >>>> =item -perldoc >>>> >>>> Displays a detailed help. >>>> >>>> =back >>>> >>>> =cut >>>> >>>> use strict; >>>> use warnings 'all'; >>>> >>>> use Getopt::Long; >>>> use Pod::Usage; >>>> >>>> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >>>> 'help' => \my $help, 'perldoc' => \my $perldoc); >>>> >>>> if(defined($perldoc)) >>>> { >>>> pod2usage(-verbose => 2); >>>> } >>>> >>>> if(defined($help)) >>>> { >>>> pod2usage(-verbose=> 0); >>>> } >>>> >>>> if(! (defined($infile) && defined ($outfile) )) >>>> { >>>> die('Please specify input, output files. 
Use -perldoc
>>>> for more help');
>>>> }
>>>>
>>>> # Some basic error checking to ensure script runs ....
>>>> if(!(defined($uniformBox) ||defined($maxSize)))
>>>> {
>>>>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
>>>> }
>>>>
>>>> # Initialize random number generator.
>>>> srand();
>>>>
>>>> # Read Input file and find out lines we are interested in
>>>> # Then prefix the line with correct box size as defined by
>>>> # user choice
>>>> open(IN, "<$infile");
>>>> open(OUT, ">$outfile");
>>>> my $count = 0;
>>>> while(my $line = <IN>)
>>>> {
>>>>     if( ($count%64) < 32 )
>>>>     {
>>>>         if(defined($uniformBox))
>>>>         {
>>>>             $line = sprintf("%d ",$uniformBox) . $line;
>>>>         }
>>>>         elsif(defined($maxSize))
>>>>         {
>>>>             # This line corresponds to the discrete dimension.
>>>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>>>         }
>>>>     }
>>>>     $count ++;
>>>>     print OUT $line
>>>> }
>>>>
>>>> close(OUT);
>>>> close(IN);
>>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Fri Mar 26 23:15:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 23:15:30 -0400
Subject: [Bioperl-l] Error during installation of 1.6.1
In-Reply-To:
References:
Message-ID:

Is it really "q install q"? Then you probably need to do some cpan configuring. It's possible your original CPAN/Config.pm file is lost or not where cpan expects it to be after your upgrade. Try this:

$ cpan
cpan> o conf make /usr/bin/make
cpan> o conf make_install_make_command /usr/bin/make
cpan> o conf commit

and rerun the install.
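
As a rough cross-check of settings like the ones above, one could verify from Perl that each configured helper path actually points at an executable. The sketch below is only an illustration: `broken_tool_settings` is a hypothetical helper (not part of CPAN.pm), and it assumes each value is a bare path with no arguments, so a value such as `sudo make` would be reported as broken.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of CPAN.pm: given a hash ref of
# setting name => path, return the sorted names whose path is
# undefined, empty, or not an executable file.
sub broken_tool_settings {
    my ($conf) = @_;
    my @bad = grep {
        my $path = $conf->{$_};
        !defined($path) || $path eq '' || !(-x $path);
    } keys %$conf;
    my @sorted = sort @bad;
    return @sorted;
}

# The two setting names come from the cpan session above; the paths
# are the ones suggested there and may differ on your system.
my %conf = (
    make                      => '/usr/bin/make',
    make_install_make_command => '/usr/bin/make',
);

my @bad = broken_tool_settings(\%conf);
print @bad
    ? "Check these settings: @bad\n"
    : "All configured tools are executable\n";
```

If a name is reported, listing the live values with `o conf` in the cpan shell would be the next step.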
If you get other strangeness, I would check the values of all the config variables by listing with

cpan> o conf

BTW, by the message I infer you've got v1.93 of CPAN; maybe upgrading to the current version (v1.9402) would solve some problems.
cheers
MAJ
----- Original Message -----
From: "Wollenberg, Kurt (NIH/NIAID) [C]"
To:
Sent: Friday, March 26, 2010 4:47 PM
Subject: [Bioperl-l] Error during installation of 1.6.1

> Hello:
>
> I am trying to install BioPerl (after a recent system upgrade) and am
> getting the following error:
>
> "Catching error: "Can't execute q install q: No such file or directory at
> /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at
> /Library/Perl/Updates/5.8.8/CPAN.pm line 391".
>
> Previous to this I've run the CPAN upgrade, etc. as recommended on the
> Installation for Unix page. This happens when I try to do the actual
> install, both vanilla and "force"ed. I'm attempting this on a Mac G5
> workstation running 10.5.8. Any clues what I may be missing or doing
> incorrectly?
>
> Cheers,
> Kurt Wollenberg, Ph.D.
> Contractor - Lockheed Martin
> Phylogenetics Specialist
> Computational Biology Section
> Bioinformatics and Computational Biosciences Branch (BCBB)
> OCICB/OSMO/OD/NIAID/NIH
>
> 31 Center Drive, Room 3B62
> Bethesda, MD 20892-0485
> Office 301-402-8628
> http://bioinformatics.niaid.nih.gov (Within NIH)
> http://exon.niaid.nih.gov (Public)
>
> Disclaimer:
> The information in this e-mail and any of its attachments is confidential
> and may contain sensitive information. It should not be used by anyone who
> is not the original intended recipient. If you have received this e-mail in
> error please inform the sender and delete it from your mailbox or any other
> storage devices.
National Institute of Allergy and Infectious Diseases shall
> not accept liability for any statements made that are sender's own and not
> expressly made on behalf of the NIAID by one of its representatives
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
From biopython at maubp.freeserve.co.uk Sat Mar 27 08:42:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 27 Mar 2010 12:42:12 +0000
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
Message-ID: <320fb6e01003270542i1f3cd4d2x61c97bc7ccf1b917@mail.gmail.com>

On Fri, Mar 26, 2010 at 4:16 PM, Chris Fields wrote:
> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the
> latter is if you always want a full nucleotide sequence instead
> of possibly getting contig files). 'genbank' used to be an alias
> for 'gb', but apparently no longer, and appears to be something
> that was changed on NCBI's end.

Yeah, the NCBI changed that almost a year ago (Easter 2009).
It broke one of the Biopython unit tests, and I asked the NCBI
about this and if they could restore the alias "genbank". They
declined, so in Biopython's efetch wrapper we spot anyone
asking for rettype=genbank, issue a warning, and convert it to
rettype=gb or rettype=gp (for the protein database) instead.

The relevant Biopython code is here if anyone is interested:
http://biopython.org/SRC/biopython/Bio/Entrez/__init__.py

Peter

From pmiguel at purdue.edu Sat Mar 27 09:51:14 2010
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 27 Mar 2010 09:51:14 -0400
Subject: [Bioperl-l] SeqIO issue?
EUtilities Cookbook In-Reply-To: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu> <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> Message-ID: <4BAE0D52.60908@purdue.edu> Hi Chris, I also see there is a bunch of NCBI toolkit code that deals with asn.1 conversion. They even have some precompiled code: http://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html Thanks for your help, Phillip Chris Fields wrote: > That format is ASN.1. and there isn't a BioPerl parser for GenBank ASN.1 > format (it tends to be too cumbersome). > > However, there is a pure-perl-based one for the EntrezGene ASN.1 format > (Bio::ASN1::EntrezGene). > > chris > > > On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote: > >> Ah, yes. That does the trick. Actually I have already downloaded a few >> thousand records in whatever that format that is returned when 'genbank' >> is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= >> seq {') Any idea what format that is and how to convert it to something >> SeqIO can use? >> >> If not, I can just pull them all down again by sending about 200 gi's >> per request. That should not offend the genbank gods... >> >> Thanks for your help, >> Phillip >> >> Chris Fields wrote: >> >>> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. >>> >>> Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. >>> >>> chris >>> >>> On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: >>> >>> >>> >>>> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. 
>>>> >>>> I am trying to use the code provided at: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO >>>> >>>> and modified to request gi228534658 >>>> >>>> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. >>>> >>>> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> >>>> use Bio::SeqIO; >>>> use Bio::DB::EUtilities; >>>> >>>> my @ids; >>>> push @ids, '228534658'; >>>> my $factory = Bio::DB::EUtilities->new( >>>> -eutil => 'efetch', >>>> -db => 'nucleotide', >>>> -rettype => 'genbank', >>>> -id => \@ids); >>>> >>>> my $file = 'myseqs.gb'; >>>> >>>> # dump HTTP::Response content to a file (not retained in memory) >>>> $factory->get_Response(-file => $file); >>>> >>>> my $seqin = Bio::SeqIO->new(-file => $file, >>>> -format => 'genbank'); >>>> >>>> while (my $seq = $seqin->next_seq) { >>>> print "I see a sequence\n"; >>>> print $seq->species(); >>>> } >>>> >>>> >>>> "myseqs.gb" does have content: >>>> >>>> Seq-entry ::= seq { >>>> id { >>>> general { >>>> db "gpid:36555" , >>>> tag >>>> str "contig49313" } , >>>> genbank { >>>> accession "EZ113652" , >>>> version 1 } , >>>> gi 228534658 } , >>>> descr { >>>> title "TSA: Zea mays contig49313, mRNA sequence." 
, >>>> source { >>>> genome genomic , >>>> org { >>>> taxname "Zea mays" , >>>> db { >>>> { >>>> db "taxon" , >>>> tag >>>> id 4577 } } , >>>> orgname { >>>> name >>>> binomial { >>>> genus "Zea" , >>>> species "mays" } , >>>> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; >>>> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; >>>> PACCAD clade; Panicoideae; Andropogoneae; Zea" , >>>> gcode 1 , >>>> mgcode 1 , >>>> div "PLN" } } } , >>>> molinfo { >>>> biomol mRNA , >>>> tech tsa } , >>>> pub { >>>> pub { >>>> article { >>>> title { >>>> name "Deep sampling of the Palomero maize transcriptome by a high >>>> throughput strategy of pyrosequencing." } , >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> initials "J.P." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "L." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "A." 
} } } } , >>>> from >>>> journal { >>>> title { >>>> iso-jta "BMC Genomics" , >>>> ml-jta "BMC Genomics" , >>>> issn "1471-2164" , >>>> name "BMC genomics" } , >>>> imp { >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } , >>>> volume "10" , >>>> issue "1" , >>>> pages "299" , >>>> language "ENG" , >>>> pubstatus aheadofprint , >>>> history { >>>> { >>>> pubstatus received , >>>> date >>>> std { >>>> year 2008 , >>>> month 12 , >>>> day 2 } } , >>>> { >>>> pubstatus accepted , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus aheadofprint , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus other , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus pubmed , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus medline , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } } } } , >>>> ids { >>>> pii "1471-2164-10-299" , >>>> doi "10.1186/1471-2164-10-299" , >>>> pubmed 19580677 } } , >>>> pmid 19580677 } } , >>>> pub { >>>> pub { >>>> sub { >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> first "Julio" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> first "Enrique" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> first "Beatriz" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> first "Octavio" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> first "Jean" , >>>> initials "J.Philippe." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Luis" , >>>> initials "L." 
} } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Alfredo" , >>>> initials "A." } } } , >>>> affil >>>> std { >>>> affil "Laboratorio Nacional de Genomica para la Biodiversidad" , >>>> div "Cinvestav Campus Guanajuato" , >>>> city "Irapuato" , >>>> sub "Guanajuato" , >>>> country "Mexico" , >>>> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , >>>> postal-code "36821" } } , >>>> medium other , >>>> date >>>> std { >>>> year 2009 , >>>> month 3 , >>>> day 23 } } } } , >>>> user { >>>> type >>>> str "GenomeProjectsDB" , >>>> data { >>>> { >>>> label >>>> str "ProjectID" , >>>> data >>>> int 36555 } , >>>> { >>>> label >>>> str "ParentID" , >>>> data >>>> int 0 } } } , >>>> create-date >>>> std { >>>> year 2009 , >>>> month 5 , >>>> day 5 } , >>>> update-date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 14 } } , >>>> inst { >>>> repr raw , >>>> mol rna , >>>> length 450 , >>>> seq-data >>>> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 >>>> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A >>>> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 >>>> 760CFF0'H } } >>>> >>>> >>>> Maybe I am using the wrong format? This looks more like ASN than genbank format to me. 
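
One way to catch this kind of mismatch before calling Bio::SeqIO is to peek at the first non-blank line of the saved file: NCBI's text ASN.1 starts with `Seq-entry ::=`, as in the dump quoted here, while a GenBank flat file starts with `LOCUS`. The following is a minimal sketch, assuming those two markers are enough to tell the formats apart; `guess_efetch_format` and its return labels are hypothetical names, not BioPerl API, and the LOCUS sample line is abbreviated for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: guess the format of a saved
# efetch response from its first non-blank line.
# Returns 'asn1', 'genbank', or 'unknown'.
sub guess_efetch_format {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                             # skip blank lines
        return 'asn1'    if $line =~ /^\s*Seq-entry\s*::=/;   # NCBI text ASN.1
        return 'genbank' if $line =~ /^LOCUS\s/;              # GenBank flat file
        return 'unknown';                                     # anything else
    }
    return 'unknown';                                         # empty input
}

# Abbreviated sample inputs (illustrative only).
my $asn = "Seq-entry ::= seq {\n  id {\n";
my $gb  = "LOCUS       EZ113652  450 bp  mRNA\n";

for my $sample ($asn, $gb) {
    print guess_efetch_format($sample), "\n";
}
```

A script could refuse to parse, or re-fetch with a different rettype, whenever the guess is not 'genbank'.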
>>>> >>>> Phillip >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From awitney at sgul.ac.uk Mon Mar 29 13:26:40 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 29 Mar 2010 18:26:40 +0100 Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl In-Reply-To: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk> <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> Message-ID: <6DD3E9BB-27AD-4241-94F9-476AE6525A7D@sgul.ac.uk> thanks Chris for the explanation. It looks like Exonerate may also do something similar thanks adam On 26 Mar 2010, at 15:51, Chris Fields wrote: > It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on it's own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by it's author. > > I know many users just use calls to outside programs, such EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes, XS can be a bugbear. > > chris > > On Mar 26, 2010, at 10:20 AM, Adam Witney wrote: > >> Is the bioperl-ext package still being developed? 
I ask because I am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error
>>
>> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
>> Please read the install the bioperl-ext package"
>>
>> even though I did compile and install the Bio::Ext::Align package
>>
>> If not using the pSW module, what do other people use for this?
>>
>> thanks
>>
>> adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From nicolas.turenne at jouy.inra.fr Mon Mar 29 14:09:53 2010
From: nicolas.turenne at jouy.inra.fr (Nicolas Turenne)
Date: Mon, 29 Mar 2010 20:09:53 +0200
Subject: [Bioperl-l] about biblio
Message-ID: <4BB0ECF1.6050308@jouy.inra.fr>

Hello,
I am using the biblio module from BioPerl to download PubMed abstracts.
If I run the query "actb" on the PubMed site (http://www.ncbi.nlm.nih.gov/sites/entrez)
I get 165 hits.

But using BioPerl, if I do

use Bio::Biblio;
my $biblio = Bio::Biblio->new
    (-access => 'soap',
     -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS',
     -destroy_on_exit => '0');
my @ListID = @{ $biblio->find ("actb")->get_all_ids };

I get 228 hits, so I don't understand the difference.

Thanks for your help,
Nicolas

From sj17m89 at gmail.com Mon Mar 29 13:47:38 2010
From: sj17m89 at gmail.com (Shweta Jha)
Date: Mon, 29 Mar 2010 10:47:38 -0700
Subject: [Bioperl-l] Regarding Google Summer of Code
Message-ID: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com>

Dear Sir/Madam,

I, Shweta Jha, am a third-year B.Tech Bioinformatics student.

I am interested in applying for the Google Summer of Code internship program.

I am keen to work on a project using BioPerl.

Could you please let me know how to apply for the program?

Thanks and Regards
Shweta Jha

From rmb32 at cornell.edu Mon Mar 29 15:26:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 29 Mar 2010 12:26:30 -0700
Subject: [Bioperl-l] Regarding Google Summer of Code
In-Reply-To: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com>
References: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com>
Message-ID: <4BB0FEE6.3080209@cornell.edu>

Hi Shweta,

See http://open-bio.org/wiki/Google_Summer_of_Code, and the GSoC FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs for details on the application process.

Rob

Shweta Jha wrote:
> Dear Sir / Madam ,
>
> I , Shweta Jha , am a Third year B.Tech Bioinformatics student.
>
> I am interested to apply for the Google Summer of Code internship program.
>
> I am keen to work on project using Bioperl.
>
> Could you please let me know how do I apply for the program?
>
>
>
> Thanks and Regards
> Shweta Jha
>

From martin.senger at gmail.com Mon Mar 29 17:02:02 2010
From: martin.senger at gmail.com (Martin Senger)
Date: Mon, 29 Mar 2010 22:02:02 +0100
Subject: [Bioperl-l] about biblio
In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr>
References: <4BB0ECF1.6050308@jouy.inra.fr>
Message-ID: <4d93f07c1003291402j5ab58216o3985157513d1820a@mail.gmail.com>

Hi,

I am actually not sure what the correct answer is, because I no longer maintain the biblio server at EBI (I actually did not know that it was still running :-) but I am very pleased that it is).

Mahmut, can I ask you a favor? Could you please pass the emailed question below to an appropriate person at EBI?

Of course, if the result of this inquiry is that the problem is in the biblio module in BioPerl, I am quite happy and keen to fix it there.

Cheers,
Martin

On Mon, Mar 29, 2010 at 7:09 PM, Nicolas Turenne <nicolas.turenne at jouy.inra.fr> wrote:

> Hello,
> I am using biblio module from bioperl to download pubmed abstract.
> if i do the query "actb" on the pubmed site (
> http://www.ncbi.nlm.nih.gov/sites/entrez)
> i get 165 hits
>
> But using bioperl, if i do
>
> use Bio::Biblio;
> my $biblio = Bio::Biblio->new
> (-access => 'soap',
> -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS',
> -destroy_on_exit => '0');
> my @ListID = @{ $biblio->find ("actb")->get_all_ids };
>
> i get 228 hits, so i dont understand the difference
>
> thank for help
> Nicolas
>

--
Martin Senger
email: martin.senger at gmail.com,martin.senger at kaust.edu.sa
skype: martinsenger

From click.xu at gmail.com Mon Mar 29 23:17:17 2010
From: click.xu at gmail.com (click xu)
Date: Tue, 30 Mar 2010 11:17:17 +0800
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
Message-ID:

Hi,
I ran into a problem when using the Clustalw module.
Here is the error message:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht -output=gcg
-matrix=BLOSUM -ktuple=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | cannot find the file or path
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/lf/data/BioPerl-1.6.1/Bio/Root/Root.pm:368
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:756
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515
STACK: test.txt:45
-----------------------------------------------------------
The test program is as follows:
-----------------------------------------------------------
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
# @seq_array is an array of Bio::Seq objects
$aln = $factory->align(\@seq_array);
-----------------------------------------------------------
The path to clustalw2 has been configured:
export CLUSTALDIR=/usr/local/bin/clustalw2
So, what might be the cause of the error?
Thanks!

From Russell.Smithies at agresearch.co.nz Mon Mar 29 23:25:03 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 30 Mar 2010 16:25:03 +1300
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>

Do you have enough temp space?
Will clustalw run 'manually' with your parameters from the command line?

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of click xu
> Sent: Tuesday, 30 March 2010 4:17 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
>
> Hi,
> I meet a problem when using Clustalw module.
> Here is the error message:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: ClustalW call ( align
-infile=/tmp/AeyAfdxGvH/YpcPbyhYht > -output=gcg?? -matrix=BLOSUM -ktup > le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | > cannot find the file or path > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- > 1.6.1/Bio/Root/Root.pm:368 > STACK: Bio::Tools::Run::Alignment::Clustalw::_run > /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig > nment/Clustalw.pm:756 > STACK: Bio::Tools::Run::Alignment::Clustalw::align > /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali > gnment/Clustalw.pm:515 > STACK: test.txt:45 > ----------------------------------------------------------- > The test program is described as below: > ----------------------------------------------------------- > @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); > $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > # @seq_array is an array of Bio::Seq objects > $aln = $factory->align(\@seq_array); > ----------------------------------------------------------- > The work path of clustalw2 has been configured: > export CLUSTALDIR=/usr/local/bin/clustalw2 > So, what may be reason of the error? > Thanks! > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From click.xu at gmail.com Tue Mar 30 00:03:49 2010
From: click.xu at gmail.com (click xu)
Date: Tue, 30 Mar 2010 12:03:49 +0800
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>
Message-ID:

Russell,
Clustalw2 runs correctly from the command line, and there is enough /tmp space too.

2010/3/30 Smithies, Russell :
> Do you have enough temp space?
> Will clustalw run 'manually' with your parameters from the command line?
>
> --Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of click xu
>> Sent: Tuesday, 30 March 2010 4:17 p.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
>>
>> Hi,
>> I meet a problem when using Clustalw module.
>> Here is the error message:
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht
>> -output=gcg
-matrix=BLOSUM -ktup >> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >> cannot find the file or path >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >> 1.6.1/Bio/Root/Root.pm:368 >> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >> nment/Clustalw.pm:756 >> STACK: Bio::Tools::Run::Alignment::Clustalw::align >> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >> gnment/Clustalw.pm:515 >> STACK: test.txt:45 >> ----------------------------------------------------------- >> The test program is described as below: >> ----------------------------------------------------------- >> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >> # @seq_array is an array of Bio::Seq objects >> $aln = $factory->align(\@seq_array); >> ----------------------------------------------------------- >> The work path of clustalw2 has been configured: >> export CLUSTALDIR=/usr/local/bin/clustalw2 >> So, what may be reason of the error? >> Thanks! >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>

From martin.senger at gmail.com Tue Mar 30 04:18:30 2010
From: martin.senger at gmail.com (Martin Senger)
Date: Tue, 30 Mar 2010 09:18:30 +0100
Subject: [Bioperl-l] about biblio
In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr>
References: <4BB0ECF1.6050308@jouy.inra.fr>
Message-ID: <4d93f07c1003300118q1c7b0551w4aa25a2a97fc35be@mail.gmail.com>

Here is the answer sent by Mr Hamish McWilliam from EBI (where the MEDLINE server is running):

The difference is OpenBQS adds a wildcard when it builds the SRS query:
>
> - [medline-AllText:actb*] gives 228 entries
> - [medline-AllText:actb] gives 150 entries
>
> Performing the same query at PubMed (http://www.ncbi.nlm.nih.gov/pubmed/)
> gives similar answers:
>
> - "actb*" gives 255 entries
> - "actb" gives 165 entries
>
> The remaining differences are probably due to slight differences in the
> PubMed data at NCBI and the exported MEDLINE data.
>

Cheers,
Martin

--
Martin Senger
email: martin.senger at gmail.com,martin.senger at kaust.edu.sa
skype: martinsenger

From cjfields at illinois.edu Tue Mar 30 08:42:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Mar 2010 07:42:24 -0500
Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz>
Message-ID: <863E31F9-072B-4681-94C5-D2C8BEA82021@illinois.edu>

You may need to submit this as a bug. I got clustalw2 working fairly recently, but it's possible some other API change is breaking things.

chris

On Mar 29, 2010, at 11:03 PM, click xu wrote:

> Russell
> Clustalw2 can correctly run in command line, and the /tmp space is enough too.
>
>
> 2010/3/30 Smithies, Russell :
>> Do you have enough temp space?
>> Will clustalw run 'manually' with your parameters from the command line?
>> >> --Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of click xu >>> Sent: Tuesday, 30 March 2010 4:17 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw >>> >>> Hi, >>> I meet a problem when using Clustalw module. >>> Here is the error message: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht >>> -output=gcg -matrix=BLOSUM -ktup >>> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >>> cannot find the file or path >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >>> 1.6.1/Bio/Root/Root.pm:368 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >>> nment/Clustalw.pm:756 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::align >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >>> gnment/Clustalw.pm:515 >>> STACK: test.txt:45 >>> ----------------------------------------------------------- >>> The test program is described as below: >>> ----------------------------------------------------------- >>> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >>> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >>> # @seq_array is an array of Bio::Seq objects >>> $aln = $factory->align(\@seq_array); >>> ----------------------------------------------------------- >>> The work path of clustalw2 has been configured: >>> export CLUSTALDIR=/usr/local/bin/clustalw2 >>> So, what may be reason of the error? >>> Thanks! 
>>>

From bernd.web at gmail.com Tue Mar 30 16:10:09 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 30 Mar 2010 22:10:09 +0200
Subject: [Bioperl-l] AlignIO formats
Message-ID: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>

Hi,

Using GuessSeqFormat and AlignIO, I stumbled on some issues and am now wondering whether the defined formats are actually OK. This seems to concern the pfam, selex and stockholm formats in particular:

pfam here is like selex without any comment lines, but with /start-end after the sequence ID, as in myseq/1-111.
The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam) actually defines Pfam and Stockholm to be the same format. This makes me wonder: is the Pfam format actually defined as Selex or as Stockholm? Within BioPerl it is like Selex.

In addition, Selex (as used in HMMER 2.3.2) contains comment lines like #=AC, #=RF or #=ID. GuessSeqFormat uses these to detect Selex; however, they do not have to be present.
GuessSeqFormat uses:

return (($lineno == 1 && $line =~ /^#=ID /) ||
        ($lineno == 2 && $line =~ /^#=AC /) ||
        ($line =~ /^#=SQ /));

to detect the Selex format.
At the same time, the Selex reader does not seem to pick up the alignment ID or accession:

if( $entry =~ /^\#=GS\s+(\S+)\s+AC\s+(\S+)/ ) {
    $accession{ $1 } = $2;
}

Also, a Selex file like:

seq1 ACGACGACGACG.
seq2 ..GGGAAAGG.GA
seq3 UUU..AAAUUU.A

is guessed to be phylip (whereas the seq1/1-11 format will be guessed as pfam).

I am not sure if the above is the desired behaviour, though all sequences are read into the alignment object correctly. I was wondering whether all Selex variations could be guessed as Selex, rather than as phylip, pfam or selex (though in the selex case we can have more than one alignment in one file).

Regards,
Bernd

From p.j.a.cock at googlemail.com Tue Mar 30 17:12:46 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Mar 2010 22:12:46 +0100
Subject: [Bioperl-l] AlignIO formats
In-Reply-To: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
References: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
Message-ID: <320fb6e01003301412s6c90220el7a95bdc97dee03e6@mail.gmail.com>

On Tue, Mar 30, 2010 at 9:10 PM, Bernd Web wrote:
> Hi,
>
> Using GuessSeqFormat and AlignIO, I stumbled on some issues and
> am now wondering if the defined formats are actually OK. Esp. related to
> pfam, selex, stockholm formats it seems:
>
> pfam here is like selex without any comment lines, but with the
> /start-end after the seq id like myseq/1-111.
> The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam)
> actually defines Pfam and Stockholm to be the same formats. This makes
> me wonder: is the Pfam format actually defined as Selex or Stockholm?
> Within BioPerl it is like Selex.

I (and therefore the Biopython documentation) also think PFAM and Stockholm alignments are basically the same thing. The BioPerl wiki seems to agree with this interpretation too.
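
The distinctions discussed in this thread can be sketched as a small stand-alone classifier. This is only an illustration under stated assumptions, not the logic GuessSeqFormat actually uses: guess_alignment_format is a hypothetical helper, the "# STOCKHOLM" version header and the two-integer PHYLIP first line are assumed to be the distinguishing markers, and the Selex markup test only covers the #=ID, #=AC, #=RF and #=SQ lines mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): classify alignment text using
# only the markers discussed in this thread.
sub guess_alignment_format {
    my ($text) = @_;

    # Work on non-blank lines only.
    my @lines = grep { /\S/ } split /\r?\n/, $text;
    return 'unknown' unless @lines;

    # Assumption: Stockholm files announce themselves on the first line.
    return 'stockholm' if $lines[0] =~ /^#\s*STOCKHOLM\s+\d/;

    # Selex markup lines; unlike the quoted check, the line number is ignored.
    my $markup = grep { /^#=(?:ID|AC|RF|SQ)\s/ } @lines;
    return 'selex' if $markup;

    # Everything else is judged on its data rows (comment lines skipped).
    my @rows = grep { !/^#/ } @lines;
    return 'unknown' unless @rows;

    # Assumption: PHYLIP starts with "<number of sequences> <length>".
    return 'phylip' if $rows[0] =~ /^\s*\d+\s+\d+\s*$/;

    # Rows must look like "name  aligned_sequence".
    my $malformed = grep { !/^\S+\s+\S+\s*$/ } @rows;
    return 'unknown' if $malformed;

    # If every name carries /start-end, report the layout called pfam here;
    # bare names are treated as markup-free Selex.
    my $unranged = grep { !m{^\S+/\d+-\d+\s} } @rows;
    return $unranged ? 'selex' : 'pfam';
}

# The three-line example from the original message.
my $bare = join "\n",
    'seq1 ACGACGACGACG.',
    'seq2 ..GGGAAAGG.GA',
    'seq3 UUU..AAAUUU.A';
print guess_alignment_format($bare), "\n";
```

With the three-line example this reports selex rather than phylip, because PHYLIP is only accepted when the first row is a pair of integers. Whether GuessSeqFormat should behave the same way is the open question in this thread.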

Looking at the HMMER2 examples, Selex is different but the comment style is similar. The obvious thing to check is the presence or absence of the "# STOCKHOLM 1.0" header if trying to tell them apart.

See also:
http://en.wikipedia.org/wiki/Stockholm_format
and
http://www.bioperl.org/wiki/Stockholm_multiple_alignment_format
http://www.bioperl.org/wiki/SELEX_multiple_alignment_format

Peter

From jun.yin at ucd.ie Tue Mar 30 18:37:07 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 30 Mar 2010 23:37:07 +0100
Subject: [Bioperl-l] summer code project on Bioperl
Message-ID: <7160acc75f99.4bb28b23@ucd.ie>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CV_JunYin.doc
Type: application/msword
Size: 27648 bytes
Desc: not available
URL:

From ross at cuhk.edu.hk Wed Mar 31 17:28:59 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Thu, 1 Apr 2010 05:28:59 +0800
Subject: [Bioperl-l] BlastPlus usage inquiry
In-Reply-To:
References:
Message-ID: <014401cad119$2d1467a0$873d36e0$@edu.hk>

Dear all,

I know it is inappropriate to raise this question on the BioPerl list, but as I received no better response from NCBI I have to ask in this group (because in the end I'll use BioPerl to call BLAST+).

I am already using the latest BLAST+ (the command is blastn directly) and have found that it runs slowly and does not run in a parallel/multithreaded manner. Previously I was using the non-BLAST+ version 2.2.22 with the command blastall -p blastn -a 8 etc. With similar arguments to those below, except that the word size was 12, my shell script for the same input and database finished almost instantly. Apart from the word size and the minimum raw gapped score, which I changed, nothing appears to differ from the parameters I used with the previous version. Moreover, when I check my process with top, I find it uses only one CPU instead of 7. What could be the problem with the script that makes the job run for a day without finishing?

blastn -query $1 -db $2 -out $1_$2.xml -num_threads 7 -word_size 4 -gapopen 3 -gapextend 1 -penalty -2 -outfmt 5 -xdrop_ungap 30 -xdrop_gap 30 -xdrop_gap_final 30 -min_raw_gapped_score 10

From anil_m_lal at yahoo.com Tue Mar 30 14:24:34 2010
From: anil_m_lal at yahoo.com (Anil Lal)
Date: Tue, 30 Mar 2010 11:24:34 -0700 (PDT)
Subject: [Bioperl-l] GSoC 2010
Message-ID: <717794.59615.qm@web37507.mail.mud.yahoo.com>

Hello,

I am a mid-career software programmer now transitioning into bioinformatics. I have always had a great interest in bioinformatics and only now am able to make the move and take classes. I am currently enrolled in University of Santa Cruz extension classes.

I am very interested in GSoC 2010 and have identified two potential projects: "Lightweight Sequence objects and Lazy Parsing", mentored by Chris Fields, and "Perl Run Wrappers for External Programs in a Flash", mentored by Mark Jensen.

Please let me know if these projects are still available. If so, I will send in my application with more details.

Thanks a lot for your help. I would be excited to work on BioPerl and contribute.

Anil

From schae234 at gmail.com Tue Mar 30 12:33:42 2010
From: schae234 at gmail.com (Robert Schaefer)
Date: Tue, 30 Mar 2010 10:33:42 -0600
Subject: [Bioperl-l] Google Summer of Code
Message-ID: <60c593881003300933p46c7c295k69a21ee986ef5777@mail.gmail.com>

Hello,

I am looking for more information on your mentorship program for Google's SoC. Whom should I contact for more information and to ask questions?

Thank you,

Rob Schaefer

From maj at fortinbras.us Mon Mar 1 02:33:23 2010
From: maj at fortinbras.us (Mark A.
Jensen)
Date: Sun, 28 Feb 2010 21:33:23 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandaloneBlastPlus and outformat
In-Reply-To: <9f985cdc1002281510o698291cu2c9b1e579536ad6d@mail.gmail.com>
References: <9f985cdc1002281510o698291cu2c9b1e579536ad6d@mail.gmail.com>
Message-ID:

Ben -- Might be a bug; can you send your script and the error you get?
Thanks
MAJ

----- Original Message -----
From: "Ben Bimber"
To: "bioperl-l"
Sent: Sunday, February 28, 2010 6:10 PM
Subject: [Bioperl-l] Bio::Tools::Run::StandaloneBlastPlus and outformat

> Not sure if this is a bug or if I'm missing something:
>
> In the StandaloneBlastPlus wrapper, I can specify an output format
> ('-outfmt' in blastn) using '-outformat'. blastn allows an integer to
> specify the output format (i.e. 6 for tabular). It also allows some number of
> strings to specify additional columns. In this case, the whole block is
> quoted:
>
> -outfmt "6 qgi qacc sseqid sallseqid sgi sacc sallacc qstart qend sstart
> send qseq sseq length pident nident mismatch positive gapopen qframe sframe"
>
> The blastplus wrapper throws an error if you try to pass anything besides an
> integer as -outformat. Is there another way to specify the output format, or is
> this a limitation of the module?
>
> thanks,
> ben
>

From forrest_zhang at 163.com Mon Mar 1 05:10:31 2010
From: forrest_zhang at 163.com (forrest)
Date: Mon, 01 Mar 2010 13:10:31 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <4B8B4C47.108@163.com>

Hi all,

When I use threads to fetch GenBank-format files, I get an error:

"Can't call method "get_taxon" on unblessed reference at /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."

=========================================
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;

my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");

my $seq_out = Bio::SeqIO->new(-format => "genbank",
                              -file   => ">dhn_all.gb");
my @seq;

my $number = @id;

my $max_threads = 6;

# Fetch the records in batches of at most $max_threads threads.
for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;

    if ($number - $thread_number > $max_threads){
        # Full batch.
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }else{
        # Last batch: take whatever is left. (The original used
        # $number % $max_threads, which is 0 when the number of IDs is a
        # multiple of $max_threads, so the outer loop would never end.)
        my $else_number = $number - $thread_number;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }

    # Collect the result of every thread in this batch.
    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}

foreach (@seq){
    $seq_out->write_seq($_);
}
=========================================

How can I fix this error?
Thanks.

Zhang Tao

From cjfields at illinois.edu Mon Mar 1 20:37:18 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 01 Mar 2010 14:37:18 -0600
Subject: [Bioperl-l] use threads to get seq file error.
In-Reply-To: <4B8B4C47.108@163.com>
References: <4B8B4C47.108@163.com>
Message-ID: <1267475838.16248.8.camel@pyrimidine.igb.uiuc.edu>

I get much nastier ones than that; a small taste:

--------------------- WARNING ---------------------
MSG: exception while parsing location line [1..680] in reading EMBL/GenBank/SwissProt, ignoring feature source (seqid=AF031249): Eval-group not allowed at runtime, use re 'eval' in regex m/(.*?)\(((?x-ism: (?> [^()]+ | \( (??{.../ at /home/cjfields/bioperl/live/Bio/Factory/FTLocationFactory.pm line 161, line 36.
--------------------------------------------------- Thread 2 terminated abnormally: Can't call method "primary_tag" on an undefined value at /home/cjfields/bioperl/live/Bio/SeqIO/genbank.pm line 662, line 36. Could you report this as a bug? chris On Mon, 2010-03-01 at 13:10 +0800, forrest wrote: > Hi all, > > When I use threads to get Genbank format file, show some error. It is > shown as: > > "Can't call method "get_taxon" on unblessed reference at > /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671." > > ========================================= > #!/usr/bin/perl -w > use strict; > use Bio::SeqIO; > use Bio::Seq; > use Bio::DB::GenBank; > use threads; > > > my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498"); > > > my $seq_out = Bio::SeqIO->new(-format => "genbank", > -file => ">dhn_all.gb"); > my @seq; > > my $number = @id; > > my $max_threads = 6; > > for (my $thread_number=0;$thread_number<$number;){ > my %threads_seq_hash; > > if ($number - $thread_number > $max_threads){ > for (my $thread=0;$thread<$max_threads;){ > $threads_seq_hash{$thread} = threads->new(sub { > my $gb = Bio::DB::GenBank->new; > my $seq = $gb->get_Seq_by_acc($id[$thread_number]); > }); > $thread_number++; > $thread++; > > } > }else{ > my $else_number = $number % $max_threads; > for (my $thread=0;$thread<$else_number;){ > $threads_seq_hash{$thread} = threads->new(sub { > my $gb = Bio::DB::GenBank->new; > my $seq = $gb->get_Seq_by_acc($id[$thread_number]); > }); > $thread_number++; > $thread++; > > } > > > } > > foreach my $thread (sort keys %threads_seq_hash){ > my ($seq) = $threads_seq_hash{$thread}->join; > push (@seq,$seq); > } > } > > foreach (@seq){ > $seq_out->write_seq($_); > } > ========================================= > > > How can I fix this error? > Thanks. 
>
>
> Zhang Tao
>

From paolo.pavan at gmail.com Mon Mar 1 23:07:33 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 00:07:33 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com>
Message-ID: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>

Dear all,
Sorry for bumping my post, but does anyone have a hint for me?
Should I send the report to the mailing list as an attachment? I don't know the list's attachment policy; if it is allowed and needed, I can do that.

Thank you,
Paolo

2010/2/26 Paolo Pavan :
> Sorry,
> Maybe I forgot to add this is the megablast -m 5 output.
>
> Thank you again,
> Paolo
>
> 2010/2/26 Paolo Pavan :
>> Hi all,
>> I have just a brief question: I've got some megablast reports such the
>> one I've pasted below.
>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>> entire alignment represented as a Bio::SimpleAlign object or
>> Bio::Align::AlignI implementing one?
>>
>> Thank you all,
>> Paolo
>>
>>
>> MEGABLAST 2.2.16 [Mar-25-2007]
>>
>>
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>> "A greedy algorithm for aligning DNA sequences",
>> J Comput Biol 2000; 7(1-2):203-14.
>>
>> Database: 00038-00053.fasta
>>            2 sequences; 2001 total letters
>>
>> Searching..................................................done
>>
>> Query= 00038-00053
>>          (802 letters)
>>
>>
>>
>>                                                                  Score
E
>> Sequences producing significant alignments:                      (bits) Value
>>
>> ______00038
>> 226   1e-62
>> ______00053
>> 115   3e-29
>>
>> 1_0         472
>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>> ______00038 883
>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         532
>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>> ______00038 943
>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         591
>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>> ______00038 1000
>> ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         651
>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>> ______00038 1000
>> ------------------------------------------------------------ 1001
>> ______00053      ------------------------------------------------------------
>>
>> 1_0         711
>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>> ______00038 1000
>> ------------------------------------------------------------ 1001
>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>
>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>> ______00038 1000 -------------------------------- 1001
>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>   Database: 00038-00053.fasta
>>     Posted date:  Feb 25, 2010  4:47 PM
>>   Number of letters in database: 2001
>>   Number of sequences in database:  2
>>
>> Lambda     K      H
>>     1.37    0.711
1.31 >> >> >> Matrix: blastn matrix:1 -3 >> Gap Penalties: Existence: 0, Extension: 0 >> Number of Sequences: 2 >> Number of Hits to DB: 17 >> Number of extensions: 3 >> Number of successful extensions: 3 >> Number of sequences better than 10.0: 2 >> Number of HSP's gapped: 2 >> Number of HSP's successfully gapped: 2 >> Length of query: 802 >> Length of database: 2001 >> Length adjustment: 10 >> Effective length of query: 792 >> Effective length of database: 1981 >> Effective search space:? 1568952 >> Effective search space used:? 1568952 >> X1: 9 (17.8 bits) >> X2: 20 (39.6 bits) >> X3: 51 (101.1 bits) >> S1: 9 (18.3 bits) >> S2: 9 (18.3 bits) >> > From cjfields at illinois.edu Tue Mar 2 00:30:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 1 Mar 2010 18:30:43 -0600 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> Message-ID: Paolo, You can get a Bio::SimpleAlign from the HSP object. The first code example in this section in the HOWTO demonstrates this: http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods chris On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: > Dear all, > Sorry for pushing up my post but, please does anyone have an hint for me? > Maybe have I to send attached the report to the mailing list? I don't > know attachment policies of the list, if it is allowed and is needed I > can do that. > > Thank you, > Paolo > > 2010/2/26 Paolo Pavan : >> Sorry, >> Maybe I forgot to add this is the megablast -m 5 output. >> >> Thank you again, >> Paolo >> >> 2010/2/26 Paolo Pavan : >>> Hi all, >>> I have just a brief question: I've got some megablast reports such the >>> one I've pasted below. 
>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>> entire alignment represented as a Bio::SimpleAlign object or >>> Bio::Align::AlignI implementing one? >>> >>> Thank you all, >>> Paolo >>> >>> >>> MEGABLAST 2.2.16 [Mar-25-2007] >>> >>> >>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), >>> "A greedy algorithm for aligning DNA sequences", >>> J Comput Biol 2000; 7(1-2):203-14. >>> >>> Database: 00038-00053.fasta >>> 2 sequences; 2001 total letters >>> >>> Searching..................................................done >>> >>> Query= 00038-00053 >>> (802 letters) >>> >>> >>> >>> Score E >>> Sequences producing significant alignments: (bits) Value >>> >>> ______00038 >>> 226 1e-62 >>> ______00053 >>> 115 3e-29 >>> >>> 1_0 472 >>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>> ______00038 883 >>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 532 >>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>> ______00038 943 >>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 591 >>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>> ______00038 1000 >>> ------------------------------------------------------------ 1001 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 651 >>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>> ______00038 1000 >>> ------------------------------------------------------------ 1001 >>> ______00053 ------------------------------------------------------------ >>> >>> 1_0 711 >>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>> ______00038 1000 >>> 
------------------------------------------------------------ 1001 >>> ______00053 1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>> >>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>> ______00038 1000 -------------------------------- 1001 >>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>> Database: 00038-00053.fasta >>> Posted date: Feb 25, 2010 4:47 PM >>> Number of letters in database: 2001 >>> Number of sequences in database: 2 >>> >>> Lambda K H >>> 1.37 0.711 1.31 >>> >>> Gapped >>> Lambda K H >>> 1.37 0.711 1.31 >>> >>> >>> Matrix: blastn matrix:1 -3 >>> Gap Penalties: Existence: 0, Extension: 0 >>> Number of Sequences: 2 >>> Number of Hits to DB: 17 >>> Number of extensions: 3 >>> Number of successful extensions: 3 >>> Number of sequences better than 10.0: 2 >>> Number of HSP's gapped: 2 >>> Number of HSP's successfully gapped: 2 >>> Length of query: 802 >>> Length of database: 2001 >>> Length adjustment: 10 >>> Effective length of query: 792 >>> Effective length of database: 1981 >>> Effective search space: 1568952 >>> Effective search space used: 1568952 >>> X1: 9 (17.8 bits) >>> X2: 20 (39.6 bits) >>> X3: 51 (101.1 bits) >>> S1: 9 (18.3 bits) >>> S2: 9 (18.3 bits) >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Tue Mar 2 01:51:02 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 01 Mar 2010 17:51:02 -0800 Subject: [Bioperl-l] Any module for chromosome region analysis ? In-Reply-To: References: <1267131590.4355.2.camel@epistle> <1267131697.4355.3.camel@epistle> Message-ID: <4B8C6F06.5050905@bioperl.org> Like the ensembl perl API? 
Robert Bradbury wrote: > I'm not sure if the species being dealt with are "common", but it would seem > to me that a logical addition to bioperl would be an extension that took a > genome location (or locations) and interfaced one into a browser of those > regions in external databases (e.g. UCSC Genome Browser, Ensemble, etc.). > The only cases where that wouldn't work is if one is dealing with novel > species that aren't in the databases yet. > > Robert > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Tue Mar 2 06:21:31 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 01 Mar 2010 22:21:31 -0800 Subject: [Bioperl-l] call for project ideas - Google Summer of Code Message-ID: <4B8CAE6B.4010807@cornell.edu> Hi all, Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due next week). We need project ideas for prospective Summer of Code interns. There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects. For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs One of the summer intern ideas I have on the page so far is to help with the tough grunt work of breaking BioPerl into smaller, more easily managed distributions. I'm sure you all can think of plenty more! 
Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From paolo.pavan at gmail.com Tue Mar 2 14:37:59 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 15:37:59 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To:
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com>
Message-ID: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>

Hi Chris,

Thank you for your reply. So, as I understand it, since the get_aln method returns the alignment of a single HSP, there is no way to retrieve the whole alignment as in the example I pasted. Is that right?

Basically, I'm trying to use megablast as a kind of multiple local alignment engine. I'm not sure this is a good idea, but it could be suitable in my particular case. What I mean is that the report I pasted shows only the portions of the sequences that align and loses the portions that do not; I'm not sure I have explained the idea clearly.

What do you think? Can you give me your opinion? If no such module has been written yet, I can try to write a parser. Would that be of any interest?

Thank you,
Paolo

2010/3/2 Chris Fields :
> Paolo,
>
> You can get a Bio::SimpleAlign from the HSP object. The first code example in this section in the HOWTO demonstrates this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
> chris

From Zhang_tao at uestc.edu.cn Mon Mar 1 05:02:12 2010
From: Zhang_tao at uestc.edu.cn (Zhang_tao)
Date: Mon, 01 Mar 2010 13:02:12 +0800
Subject: [Bioperl-l] use threads to get seq file error.
Message-ID: <467416916.06375@eyou.net>

Hi all,

When I use threads to fetch GenBank-format records, I get the following error:

"Can't call method "get_taxon" on unblessed reference at /opt/local/lib/perl5/site_perl/5.8.9/Bio/Taxon.pm line 671."

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq;
use Bio::DB::GenBank;
use threads;

my @id = ("AK287649","AF031249","EZ238383","BLYDHN5","AY895908","EF409493","AY895886","AF181455","AY895930","EF409498");
my $seq_out = Bio::SeqIO->new(-format => "genbank", -file => ">dhn_all.gb");
my @seq;
my $number = @id;
my $max_threads = 6;

for (my $thread_number=0;$thread_number<$number;){
    my %threads_seq_hash;
    if ($number - $thread_number > $max_threads){
        for (my $thread=0;$thread<$max_threads;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }else{
        my $else_number = $number % $max_threads;
        for (my $thread=0;$thread<$else_number;){
            $threads_seq_hash{$thread} = threads->new(sub {
                my $gb = Bio::DB::GenBank->new;
                my $seq = $gb->get_Seq_by_acc($id[$thread_number]);
            });
            $thread_number++;
            $thread++;
        }
    }
    foreach my $thread (sort keys %threads_seq_hash){
        my ($seq) = $threads_seq_hash{$thread}->join;
        push (@seq,$seq);
    }
}

foreach (@seq){
    $seq_out->write_seq($_);
}

How can I fix this error? Thanks.

Zhang Tao

From lpritc at scri.ac.uk Mon Mar 1 11:32:10 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 01 Mar 2010 11:32:10 +0000
Subject: [Bioperl-l] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
Message-ID:

Hi,

I've tried going back through the mailing list, Googling the answer, and reading the documentation and wiki to find a solution for this. I've either missed it, or it's not there yet. Hopefully there's a simple solution, or an option that I'm just not seeing. I'm sure other people must be using CHADO for bacterial genomes, and I would be interested in hearing about best practice for using CHADO/GBROWSE with these sequences (I've seen http://gmod.org/wiki/Chado_for_prokaryotes - but there's not much in there...).

I have a working CHADO(GMOD-1.0)/GBROWSE2/BioPerl 1.6.1 setup on CentOS 5.4, and I'm trying to load some bacterial data. Specifically for this example, I'm trying to get the GenBank sequences for E.coli S88: NC_011742 and NC_011747 into CHADO. I've been following instructions from a number of locations, including http://gmod.org/wiki/Artemis-Chado_Integration_Tutorial and http://gmod.org/wiki/Chado_Tutorial, but there's an issue with these two files, in that the NC_011742 (chromosome) and NC_011747 (plasmid) sequences contain genes that have the same names (and several genes with the same name in the same sequence!), and this appears to be a problem. Here's what's going wrong: I start off with the two GenBank files: """ [lpritc at localhost ~]$ ls -1 *.gbk NC_011742.gbk NC_011747.gbk """ And convert these to .gff3 using the BioPerl script (it doesn't seem to matter whether I pass them with the wildcard, or convert separately, though passing multiple sequences for conversion might be a good place to check for unique IDs): """ [lpritc at localhost ~]$ bp_genbank2gff3.pl -s *.gbk # Input: NC_011742.gbk # working on region:NC_011742, Escherichia coli S88, 19-DEC-2008, Escherichia coli S88, complete genome. # GFF3 saved to ./NC_011742.gbk.gff # Summary: # Feature Count # ------- ----- # mRNA 4696 # gene 4898 # region 1 # pseudogene 151 # CDS 4696 # RESIDUES(tr) 1442813 # RESIDUES 5032268 # processed_transcript 89 # rRNA 22 # pseudogenic_region 151 # exon 4899 # tRNA 91 # # Input: NC_011747.gbk # working on region:NC_011747, Escherichia coli S88, 18-AUG-2009, Escherichia coli S88 plasmid pECOS88, complete sequence. # GFF3 saved to ./NC_011747.gbk.gff # Summary: # Feature Count # ------- ----- # mRNA 4832 # gene 5037 # region 2 # pseudogene 159 # CDS 4832 # RESIDUES(tr) 1477756 # RESIDUES 5166121 # processed_transcript 92 # rRNA 22 # pseudogenic_region 159 # exon 5038 # tRNA 91 # """ I can then use the gmod_bulk_load_gff3.pl script to load either file, but only singly. 
This appears to work, and the result is visible and seemingly correctly navigable in GBROWSE (using NC_011747 as the first sequence here, but the order is unimportant): """ [lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011747.gbk.gff (Re)creating the uniquename cache in the database... Creating table... Populating table... Creating indexes...Done. Preparing data for inserting into the chado database (This may take a while ...) Dropping cds temp tables... Creating cds temp tables... NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler" NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship" Loading data into feature table ... Loading data into featureloc table ... Loading data into feature_relationship table ... Loading data into featureprop table ... Skipping feature_cvterm table since the load file is empty... Skipping synonym table since the load file is empty... Skipping feature_synonym table since the load file is empty... Skipping dbxref table since the load file is empty... Loading data into feature_dbxref table ... Skipping analysisfeature table since the load file is empty... Skipping cvterm table since the load file is empty... Skipping db table since the load file is empty... Skipping cv table since the load file is empty... Skipping analysis table since the load file is empty... Skipping organism table since the load file is empty... Adding cvtermprop=MapReferenceType for 'region' ... Loading sequences (if any) ... 
Optimizing database (this may take a while) ... (feature featureloc feature_relationship featureprop feature_cvterm synonym feature_synonym dbxref feature_dbxref analysisfeature cvterm db cv analysis organism ) Done. While this script has made an effort to optimize the database, you should probably also run VACUUM FULL ANALYZE on the database as well """ """ chado=> SELECT feature_id, organism_id, name, uniquename FROM feature WHERE name='NC_011747'; feature_id | organism_id | name | uniquename ------------+-------------+-----------+------------ 146917 | 99 | NC_011747 | NC_011747 """ However, attempting to load in the second sequence throws an error (though this might also be a good point to check for ID uniqueness with a database check, and appropriate modification to the ID, if necessary - problems could arise if we were trying to add genuine duplicates, though...): """ [lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref GeneID --noexon --recreate_cache --gfffile NC_011742.gbk.gff (Re)creating the uniquename cache in the database... Creating table... Populating table... Creating indexes...Done. Preparing data for inserting into the chado database (This may take a while ...) Dropping cds temp tables... Creating cds temp tables... 
NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_cds_row_id_seq" for serial column "tmp_cds_handler.cds_row_id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_pkey" for table "tmp_cds_handler" NOTICE: CREATE TABLE will create implicit sequence "tmp_cds_handler_relationship_rel_row_id_seq" for serial column "tmp_cds_handler_relationship.rel_row_id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship" no parent yacC; you probably need to rerun the loader with the --recreate_cache option Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=chado;port=5432;host=localhost. """ This, of course, prevents the upload of the sequence and its annotations, as a whole. The script recommends that the --recreate_cache option should be used, but I am already using it. If the same process is run, reversing the order of the input files, the same error is reported, but for the gene with name 'int'. Both sequences contain genes with the names 'int' and 'yacC' (NC_011742 appears to contain four genes with the name 'int'): """ [lpritc at localhost ~]$ grep 'ID=yacC;' *.gbk.gff NC_011742.gbk.gff:NC_011742 GenBank gene 142755 143273 . - . ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131 NC_011747.gbk.gff:NC_011747 GenBank gene 85083 85931 . + . ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103 [lpritc at localhost ~]$ grep 'ID=int;' *.gbk.gff NC_011742.gbk.gff:NC_011742 GenBank gene 1182443 1183585 . - . ID=int;Dbxref=GeneID:7131611;gene=int;locus_tag=ECS88_1152 NC_011742.gbk.gff:NC_011742 GenBank pseudogene 1998684 1999646 . + . ID=int;Dbxref=GeneID:7128964;gene=int;locus_tag=ECS88_2031;pseudo=_no_value NC_011742.gbk.gff:NC_011742 GenBank gene 2829972 2830991 . + . ID=int;Dbxref=GeneID:7131911;gene=int;locus_tag=ECS88_2851 NC_011742.gbk.gff:NC_011742 GenBank gene 3220074 3221336 . + . 
ID=int;Dbxref=GeneID:7129893;gene=int;locus_tag=ECS88_3250
NC_011747.gbk.gff:NC_011747 GenBank gene 132 872 . + . ID=int;Dbxref=GeneID:7119360;gene=int;locus_tag=pECS88_0001
"""

Commenting out either of these genes, and its child features, merely defers the error to another gene whose name occurs in both sequences.

The problem seems to derive from attempting to associate each gene uniquely with its 'gene' tag in the GenBank file. There are several points in the process where it would be sensible to check for name collisions, so that the feature:uniquename column can be modified to reflect them, so I looked for command-line options to each script, but didn't see one that could help. The manual for gmod_bulk_load_gff3.pl suggests that this might be the problem (though I might be misunderstanding it):

"""
Column 9 (group)
Here is where the magic happens.

Assigning feature.name, feature.uniquename
The values of feature.name and feature.uniquename are assigned according to these simple rules:
If there is an ID tag, that is used as feature.uniquename otherwise, it is assigned a uniquename that is equal to 'auto' concatenated with the feature_id. (Note that this is a potential problem as there is no check to make sure that it is appropriately unique.)
If there is a Name tag, it's value is set to feature.name; otherwise it is null.
Note that these rules are much more simple than that those that Bio::DB::GFF uses, and may need to be revisited.
"""

I suspect that, as the bp_genbank2gff3.pl script converts gene names (which are not guaranteed to be unique) to ID tags, the problem recognised in the manual is cropping up at this point.

Luckily, the GenBank files come with locus_tag tags, which should be unique for each gene (see http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html#locus_tag).
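
The ID collision shown by the grep output above can be checked for before attempting a load. What follows is only a minimal sketch, assuming tab-separated GFF3 as written by bp_genbank2gff3.pl and assuming that only gene and pseudogene lines need checking; the subroutine name duplicate_gene_ids is hypothetical and is not part of BioPerl or GMOD.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or GMOD): given GFF3 lines,
# return a hash ref mapping every ID= value that occurs on more than one
# gene/pseudogene line to the list of locations where it occurs.
sub duplicate_gene_ids {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        next if $line =~ /^#/;                 # skip comments and directives
        chomp(my $copy = $line);
        my @col = split /\t/, $copy;
        next unless @col == 9;                 # not a nine-column feature line
        my ($seqid, $type, $start, $end, $attr) = @col[0, 2, 3, 4, 8];
        next unless $type eq 'gene' || $type eq 'pseudogene';
        # Pull the ID attribute out of column 9; skip lines without one.
        my ($id) = $attr =~ /(?:^|;)ID=([^;]+)/ or next;
        push @{ $seen{$id} }, "$seqid:$start-$end";
    }
    # Keep only the IDs that were seen more than once.
    my %dup = map { $_ => $seen{$_} } grep { @{ $seen{$_} } > 1 } keys %seen;
    return \%dup;
}

# When GFF3 files are named on the command line, report their collisions.
if (@ARGV) {
    my @lines;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        push @lines, <$fh>;
        close $fh;
    }
    my $dup = duplicate_gene_ids(@lines);
    for my $id (sort keys %$dup) {
        print "$id\t", join(', ', @{ $dup->{$id} }), "\n";
    }
}
```

Saved under a name of your choosing and given NC_011742.gbk.gff and NC_011747.gbk.gff as arguments, this should list yacC and int among the colliding IDs, since the grep output above shows both names in each file. It only reports collisions; deciding how to rename the IDs is left open.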
For bacteria, at least, using the locus_tag values might be a more robust option for the bp_genbank2gff3.pl script; this already appears to have been recognised in the script comments:

"""
#?? should gene_name from /locus_tag,/gene,/product,/transposon=xxx
# be converted to or added as Name=xxx (if not ID= or as well)
## problematic: convert_to_name ($feature); # drops /locus_tag,/gene, tags
"""

I can get round the upload problem somewhat suckily by changing the priority given to 'locus_tag' and 'gene' tags for generating the .gff ID tag in the bp_genbank2gff3.pl script:

"""
[lpritc at localhost ~]$ diff bp_genbank2gff3.pl /usr/bin/bp_genbank2gff3.pl
976,977c976,977
< if ($g->has_tag('locus_tag')) {
< ($gene_id) = $g->get_tag_values('locus_tag');
---
> if ($g->has_tag('gene')) {
> ($gene_id) = $g->get_tag_values('gene');
979,980c979,980
< elsif ($g->has_tag('gene')) {
< ($gene_id) = $g->get_tag_values('gene');
---
> elsif ($g->has_tag('locus_tag')) {
> ($gene_id) = $g->get_tag_values('locus_tag');
"""

But this isn't a complete solution, as GBROWSE searches by gene name don't work after making this change, and presumably some further configuration or hacking about is required to sort that out (advice welcome).

So, what are other people doing to overcome this issue (if you've seen it), and would a change to the bp_genbank2gff3.pl script along the lines I mention be useful to others?

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk
w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C
tel:+44(0)1382 562731 x2405

From janine.arloth at googlemail.com Mon Mar 1 16:25:09 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Mon, 1 Mar 2010 17:25:09 +0100
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>

Hello,

I am running blast+ and want to create a BLAST database depending on a checkbox. That is, when mydb is too old, I want to rebuild the blastdb files and create a "new" db. When the latest version of my files is OK, blast should run with the existing db. With the code below, a new db is never built: the db is created the first time, and after that it is never recreated.

if($checkbox eq 'yes'){
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name => 'mydb',
        -db_data => 'xxx.fa',
        -create => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

Thanks for helping

From jensen at fortinbras.us Tue Mar 2 03:58:09 2010
From: jensen at fortinbras.us (Mark A.
Jensen)
Date: Mon, 1 Mar 2010 22:58:09 -0500
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com>
Message-ID: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>

Hi Janine--

You'll need to get the latest version of Bio/Tools/Run/StandAloneBlastPlus.pm (rev. 16878). Then the -overwrite parameter will actually work, and you can write

if($checkbox eq 'yes'){
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name => 'mydb',
        -db_data => 'xxx.fa',
        -overwrite => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

MAJ

From szy0931 at gmail.com Tue Mar 2 06:08:10 2010
From: szy0931 at gmail.com (Zhenyu Shen)
Date: Mon, 1 Mar 2010 22:08:10 -0800 (PST)
Subject: [Bioperl-l] how to convert a txt file to a bed file?
Message-ID:

I want to convert a txt file to a BED file and then load the BED file into the UCSC Genome Browser. How can I convert the txt file to a BED file with Perl?

Thanks

From joaofadista at gmail.com Tue Mar 2 09:10:03 2010
From: joaofadista at gmail.com (fadista)
Date: Tue, 2 Mar 2010 01:10:03 -0800 (PST)
Subject: [Bioperl-l] Next-gen modules
Message-ID:

Hi,

I would like to know if there are any next-gen sequencing modules in BioPerl. Specifically, I would like to know if there is a Perl script to trim poor-quality sequence reads from the Illumina/Solexa platform.

Best regards,
Fadista

From maj at fortinbras.us Tue Mar 2 14:51:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 09:51:12 -0500
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com>
Message-ID: <18C0182252934619AD12E49243BE3C14@NewLife>

This might be a good method to have for Bio::Search::Tiling: you want to stitch together all the HSPs and have the concatenated alignment returned as a Bio::SimpleAlign, correct? Tiling would create the right set of HSPs from which to generate the composite alignment. I can try to get something working, but it may take a while.

MAJ

----- Original Message -----
From: "Paolo Pavan"
To: "Chris Fields"
Cc:
Sent: Tuesday, March 02, 2010 9:37 AM
Subject: Re: [Bioperl-l] Alignment from blast report

Hi Chris, Thank you for your reply. So I have to understand that since the get_aln method returns the HSP alignment, there is no way to retrieve the whole alignment as in the example pasted, isn't it? Basically I'm trying to use megablast as kind of multiple local alignment engine and actually I'm not pretty sure this is a good idea but in my particular case could be suitable.
I mean that the example below reports only the portions of the sequences that align loosing the portions that does not, I'm not sure I gave the idea. What do you think about? Can you give me your opinion? If there isn't any module written yet, I can try to write a parser, it could be of any interest? Thank you, Paolo 2010/3/2 Chris Fields : > Paolo, > > You can get a Bio::SimpleAlign from the HSP object. The first code example in > this section in the HOWTO demonstrates this: > > http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods > > chris > > On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: > >> Dear all, >> Sorry for pushing up my post but, please does anyone have an hint for me? >> Maybe have I to send attached the report to the mailing list? I don't >> know attachment policies of the list, if it is allowed and is needed I >> can do that. >> >> Thank you, >> Paolo >> >> 2010/2/26 Paolo Pavan : >>> Sorry, >>> Maybe I forgot to add this is the megablast -m 5 output. >>> >>> Thank you again, >>> Paolo >>> >>> 2010/2/26 Paolo Pavan : >>>> Hi all, >>>> I have just a brief question: I've got some megablast reports such the >>>> one I've pasted below. >>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>> entire alignment represented as a Bio::SimpleAlign object or >>>> Bio::Align::AlignI implementing one? >>>> >>>> Thank you all, >>>> Paolo >>>> >>>> >>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>> >>>> >>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>> (2000), >>>> "A greedy algorithm for aligning DNA sequences", >>>> J Comput Biol 2000; 7(1-2):203-14. 
>>>> >>>> Database: 00038-00053.fasta >>>> 2 sequences; 2001 total letters >>>> >>>> Searching..................................................done >>>> >>>> Query= 00038-00053 >>>> (802 letters) >>>> >>>> >>>> >>>> Score E >>>> Sequences producing significant alignments: (bits) Value >>>> >>>> ______00038 >>>> 226 1e-62 >>>> ______00053 >>>> 115 3e-29 >>>> >>>> 1_0 472 >>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>> ______00038 883 >>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 532 >>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>> ______00038 943 >>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 591 >>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 651 >>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 ------------------------------------------------------------ >>>> >>>> 1_0 711 >>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>> ______00038 1000 >>>> ------------------------------------------------------------ 1001 >>>> ______00053 1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg >>>> 34 >>>> >>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>> ______00038 1000 -------------------------------- 1001 >>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>> Database: 00038-00053.fasta >>>> Posted date: Feb 25, 2010 4:47 PM >>>> Number of letters in database: 2001 >>>> Number of sequences in database: 2 >>>> >>>> Lambda 
K H
>>>> 1.37 0.711 1.31
>>>>
>>>> Gapped
>>>> Lambda K H
>>>> 1.37 0.711 1.31
>>>>
>>>>
>>>> Matrix: blastn matrix:1 -3
>>>> Gap Penalties: Existence: 0, Extension: 0
>>>> Number of Sequences: 2
>>>> Number of Hits to DB: 17
>>>> Number of extensions: 3
>>>> Number of successful extensions: 3
>>>> Number of sequences better than 10.0: 2
>>>> Number of HSP's gapped: 2
>>>> Number of HSP's successfully gapped: 2
>>>> Length of query: 802
>>>> Length of database: 2001
>>>> Length adjustment: 10
>>>> Effective length of query: 792
>>>> Effective length of database: 1981
>>>> Effective search space: 1568952
>>>> Effective search space used: 1568952
>>>> X1: 9 (17.8 bits)
>>>> X2: 20 (39.6 bits)
>>>> X3: 51 (101.1 bits)
>>>> S1: 9 (18.3 bits)
>>>> S2: 9 (18.3 bits)
>>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Tue Mar 2 15:12:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 10:12:02 -0500
Subject: [Bioperl-l] Installing bioperl on windows
In-Reply-To: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com>
References: <30b0ffab-3ad6-4b59-8c19-2f203ff6c4f9@f17g2000prh.googlegroups.com>
Message-ID:

The steps on the wiki are in fact quite detailed. What we need then
is details from you--the commands you ran and your error messages.
Thanks.
----- Original Message -----
From: "disha"
To:
Sent: Friday, February 26, 2010 8:43 AM
Subject: [Bioperl-l] Installing bioperl on windows

> Please tell me the procedure (detailed) for installing bioperl on
> Windows Vista. I tried the steps mentioned on the site but failed at
> the initial steps
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From scott at scottcain.net Tue Mar 2 16:11:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 2 Mar 2010 11:11:13 -0500
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To:
References:
Message-ID: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>

Hi Leighton,

Wow, that is a lot of text; I really appreciate your thoroughness in
describing the problem. I have a few suggestions to get the ball rolling.

First, I am working on the 1.1 release of gmod/chado, and it may fix some
of the problems you are describing. Certainly, ID collisions between GFF
files should not be a problem (I didn't think they were in the 1.0 release,
but that was a long time ago). Please try a checkout of the schema trunk in
the gmod svn:

http://gmod.org/wiki/SVN

Another thing you may want to look at is that just last week, a developer
at Texas A&M, Nathan Liles, contributed code to the bioperl-live trunk for
the genbank2gff3.pl script that will do a much better job of converting
bacterial genbank files to GFF3; perhaps that will help too. Working with
an svn checkout of bioperl-live shouldn't be too scary either; the pieces
you are interested in (that work with Chado and GBrowse) are quite stable.

Let us know how it goes,
Scott

On Mon, Mar 1, 2010 at 6:32 AM, Leighton Pritchard wrote:
> Hi,
>
> I've tried going back through the mailing list, Googling the answer, and
> reading the documentation and wiki to find a solution for this.
I've either
> missed it, or it's not there yet.  Hopefully there's a simple solution, or
> an option that I'm just not seeing.  I'm sure other people must be using
> CHADO for bacterial genomes, and I would be interested in hearing about best
> practice for using CHADO/GBROWSE with these sequences (I've seen
> http://gmod.org/wiki/Chado_for_prokaryotes - but there's not much in
> there...).
>
> I have a working CHADO(GMOD-1.0)/GBROWSE2/BioPerl 1.6.1 setup on CentOS 5.4,
> and I'm trying to load some bacterial data.  Specifically for this example,
> I'm trying to get the GenBank sequences for E.coli S88: NC_011742 and
> NC_011747 into CHADO.  I've been following instructions from a number of
> locations, including http://gmod.org/wiki/Artemis-Chado_Integration_Tutorial
> and http://gmod.org/wiki/Chado_Tutorial, but there's an issue with these two
> files, in that the NC_011742 (chromosome) and NC_011747 (plasmid) sequences
> contain genes that have the same names (and several genes with the same name
> in the same sequence!), and this appears to be a problem.  Here's what's
> going wrong:
>
> I start off with the two GenBank files:
>
> """
> [lpritc at localhost ~]$ ls -1 *.gbk
> NC_011742.gbk
> NC_011747.gbk
> """
>
> And convert these to .gff3 using the BioPerl script (it doesn't seem to
> matter whether I pass them with the wildcard, or convert separately, though
> passing multiple sequences for conversion might be a good place to check for
> unique IDs):
>
> """
> [lpritc at localhost ~]$ bp_genbank2gff3.pl -s *.gbk
> # Input: NC_011742.gbk
> # working on region:NC_011742, Escherichia coli S88, 19-DEC-2008,
> Escherichia coli S88, complete genome.
> # GFF3 saved to ./NC_011742.gbk.gff
> # Summary:
> # Feature    Count
> # -------   
-----
> # mRNA  4696
> # gene  4898
> # region  1
> # pseudogene  151
> # CDS  4696
> # RESIDUES(tr)  1442813
> # RESIDUES  5032268
> # processed_transcript  89
> # rRNA  22
> # pseudogenic_region  151
> # exon  4899
> # tRNA  91
> #
> # Input: NC_011747.gbk
> # working on region:NC_011747, Escherichia coli S88, 18-AUG-2009,
> Escherichia coli S88 plasmid pECOS88, complete sequence.
> # GFF3 saved to ./NC_011747.gbk.gff
> # Summary:
> # Feature    Count
> # -------    -----
> # mRNA  4832
> # gene  5037
> # region  2
> # pseudogene  159
> # CDS  4832
> # RESIDUES(tr)  1477756
> # RESIDUES  5166121
> # processed_transcript  92
> # rRNA  22
> # pseudogenic_region  159
> # exon  5038
> # tRNA  91
> #
> """
>
> I can then use the gmod_bulk_load_gff3.pl script to load either file, but
> only singly.  This appears to work, and the result is visible and seemingly
> correctly navigable in GBROWSE (using NC_011747 as the first sequence here,
> but the order is unimportant):
>
> """
> [lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref
> GeneID --noexon --recreate_cache --gfffile NC_011747.gbk.gff
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Dropping cds temp tables...
> Creating cds temp tables...
> NOTICE:  CREATE TABLE will create implicit sequence
> "tmp_cds_handler_cds_row_id_seq" for serial column
> "tmp_cds_handler.cds_row_id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "tmp_cds_handler_pkey" for table "tmp_cds_handler"
> NOTICE:  CREATE TABLE will create implicit sequence
> "tmp_cds_handler_relationship_rel_row_id_seq" for serial column
> "tmp_cds_handler_relationship.rel_row_id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
> Loading data into feature table ...
> Loading data into featureloc table ...
> Loading data into feature_relationship table ...
> Loading data into featureprop table ...
> Skipping feature_cvterm table since the load file is empty...
> Skipping synonym table since the load file is empty...
> Skipping feature_synonym table since the load file is empty...
> Skipping dbxref table since the load file is empty...
> Loading data into feature_dbxref table ...
> Skipping analysisfeature table since the load file is empty...
> Skipping cvterm table since the load file is empty...
> Skipping db table since the load file is empty...
> Skipping cv table since the load file is empty...
> Skipping analysis table since the load file is empty...
> Skipping organism table since the load file is empty...
> Adding cvtermprop=MapReferenceType for 'region' ...
> Loading sequences (if any) ...
> Optimizing database (this may take a while) ...
>  (feature featureloc feature_relationship featureprop feature_cvterm
> synonym feature_synonym dbxref feature_dbxref analysisfeature cvterm db cv
> analysis organism ) Done.
>
> While this script has made an effort to optimize the database, you
> should probably also run VACUUM FULL ANALYZE on the database as well
> """
>
> """
> chado=> SELECT feature_id, organism_id, name, uniquename FROM feature WHERE
> name='NC_011747';
>  feature_id | organism_id |   name    | uniquename
> ------------+-------------+-----------+------------
>      146917 |          99 | NC_011747 | NC_011747
> """
>
> However, attempting to load in the second sequence throws an error (though
> this might also be a good point to check for ID uniqueness with a database
> check, and appropriate modification to the ID, if necessary - problems could
> arise if we were trying to add genuine duplicates, though...):
>
> """
> [lpritc at localhost ~]$ gmod_bulk_load_gff3.pl --organism E.coli --dbxref
> GeneID --noexon --recreate_cache --gfffile NC_011742.gbk.gff
> (Re)creating the uniquename cache in the database...
> Creating table...
> Populating table...
> Creating indexes...Done.
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Dropping cds temp tables...
> Creating cds temp tables...
> NOTICE:  CREATE TABLE will create implicit sequence
> "tmp_cds_handler_cds_row_id_seq" for serial column
> "tmp_cds_handler.cds_row_id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "tmp_cds_handler_pkey" for table "tmp_cds_handler"
> NOTICE:  CREATE TABLE will create implicit sequence
> "tmp_cds_handler_relationship_rel_row_id_seq" for serial column
> "tmp_cds_handler_relationship.rel_row_id"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "tmp_cds_handler_relationship_pkey" for table "tmp_cds_handler_relationship"
>
> no parent yacC;
> you probably need to rerun the loader with the --recreate_cache option
>
> Issuing rollback() due to DESTROY without explicit disconnect() of
> DBD::Pg::db handle dbname=chado;port=5432;host=localhost.
> """
>
> This, of course, prevents the upload of the sequence and its annotations, as
> a whole.
>
> The script recommends that the --recreate_cache option should be used, but I
> am already using it.  If the same process is run, reversing the order of the
> input files, the same error is reported, but for the gene with name 'int'.
> Both sequences contain genes with the names 'int' and 'yacC' (NC_011742
> appears to contain four genes with the name 'int'):
>
> """
> [lpritc at localhost ~]$ grep 'ID=yacC;' *.gbk.gff
> NC_011742.gbk.gff:NC_011742    GenBank    gene    142755    143273    .    -
> .    ID=yacC;Dbxref=GeneID:7130628;gene=yacC;locus_tag=ECS88_0131
> NC_011747.gbk.gff:NC_011747    GenBank    gene    85083    85931    .    +
> .    ID=yacC;Dbxref=GeneID:7119486;gene=yacC;locus_tag=pECS88_0103
>
> [lpritc at localhost ~]$ grep 'ID=int;' *.gbk.gff
> NC_011742.gbk.gff:NC_011742    GenBank    gene    1182443    1183585    .
> -    .    
ID=int;Dbxref=GeneID:7131611;gene=int;locus_tag=ECS88_1152
> NC_011742.gbk.gff:NC_011742    GenBank    pseudogene    1998684    1999646
> .    +    .
> ID=int;Dbxref=GeneID:7128964;gene=int;locus_tag=ECS88_2031;pseudo=_no_value
> NC_011742.gbk.gff:NC_011742    GenBank    gene    2829972    2830991    .
> +    .    ID=int;Dbxref=GeneID:7131911;gene=int;locus_tag=ECS88_2851
> NC_011742.gbk.gff:NC_011742    GenBank    gene    3220074    3221336    .
> +    .    ID=int;Dbxref=GeneID:7129893;gene=int;locus_tag=ECS88_3250
> NC_011747.gbk.gff:NC_011747    GenBank    gene    132    872    .    +    .
> ID=int;Dbxref=GeneID:7119360;gene=int;locus_tag=pECS88_0001
> """
>
> Commenting out either of these genes, and their child features, defers the
> error to another gene that has the same name in both sequences in each case.
> It seems that the problem might derive from attempting to uniquely associate
> each gene uniquely with its 'gene' tag in the GenBank file and, as there are
> several points in the process where it would be sensible to check for name
> collisions, so that the feature:uniquename column can be modified to reflect
> this, I looked for command-line options to each script, but didn't see one
> that could help.  Examining the manual for gmod_bulk_load_gff3.pl suggests
> that this might be the problem (though I might be misunderstanding it):
>
> """
>       Column 9 (group)
>           Here is where the magic happens.
>
>           Assigning feature.name, feature.uniquename
>               The values of feature.name and feature.uniquename are
> assigned according to these simple rules:
>
>               If there is an ID tag, that is used as feature.uniquename
>                   otherwise, it is assigned a uniquename that is equal to
> 'auto' concatenated with the feature_id.
>
>                   (Note that this is a potential problem as there is no
> check to make sure that it is appropriately unique.)
>
>               
If there is a Name tag, it's value is set to feature.name;
>                   otherwise it is null.
>
>                   Note that these rules are much more simple than that
> those that Bio::DB::GFF uses, and may need to be revisited.
> """
>
> I suspect that, as the bp_genbank2gff3.pl script converts gene names (which
> are not guaranteed to be unique) to ID tags, the problem recognised in the
> manual is cropping up at this point.  Luckily, the GenBank files come with
> locus_tag tags, which should be unique for each gene (see
> http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html#locus_tag).  For
> bacteria, at least, using the locus_tag values might be a more robust option
> for the bp_genbank2gff3.pl; this already appears to have been recognised in
> the script comments:
>
> """
>            #?? should gene_name from
> /locus_tag,/gene,/product,/transposon=xxx
>            # be converted to or added as  Name=xxx (if not ID= or as well)
>            ## problematic: convert_to_name ($feature); # drops
> /locus_tag,/gene, tags
> """
>
> I can get round the upload problem somewhat suckily by changing the priority
> given to 'locus_tag' and 'gene' tags for generating the .gff ID tag in the
> bp_genbank2gff3.pl script:
>
> """
> [lpritc at localhost ~]$ diff bp_genbank2gff3.pl /usr/bin/bp_genbank2gff3.pl
> 976,977c976,977
> <     if ($g->has_tag('locus_tag')) {
> <         ($gene_id) = $g->get_tag_values('locus_tag');
> ---
>>     if ($g->has_tag('gene')) {
>>         ($gene_id) = $g->get_tag_values('gene');
> 979,980c979,980
> <     elsif ($g->has_tag('gene')) {
> <         ($gene_id) = $g->get_tag_values('gene');
> ---
>>     elsif ($g->has_tag('locus_tag')) {
>>         ($gene_id) = $g->get_tag_values('locus_tag');
> """
>
> But this isn't a complete solution, as GBROWSE searches by gene name don't
> work after making this change, and presumably some further configuration or
> hacking about is required to sort that out (advice welcome).
>
> So, what are other people doing to overcome this issue (if you've seen it),
> and would a change to the bp_genbank2gff3.pl script along the lines I mention
> be useful to others?
>
> Cheers,
>
> L.
>
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
>
>
> ______________________________________________________
> SCRI, Invergowrie, Dundee, DD2 5DA.
> The Scottish Crop Research Institute is a charitable company limited by guarantee.
> Registered in Scotland No: SC 29367.
> Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.
>
>
> DISCLAIMER:
>
> This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that
> addressee.
> If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
>
> Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
> ______________________________________________________
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself.
Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Gmod-schema mailing list
> Gmod-schema at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From sdavis2 at mail.nih.gov Tue Mar 2 16:33:38 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 2 Mar 2010 11:33:38 -0500
Subject: [Bioperl-l] how to convert a txt file to a bed file?
In-Reply-To:
References:
Message-ID: <264855a01003020833v3e15dcb7vcdd876ce80468740@mail.gmail.com>

On Tue, Mar 2, 2010 at 1:08 AM, Zhenyu Shen wrote:
> I want to convert a txt file to a bed file and then load the bed file
> to UCSC genome browser. But how to convert the txt file to a bed file
> with perl?

Hi, Zhenyu.

A bed file IS a text file, with the format described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED

You just need to make your text file conform to that format and you
are set to go.

Sean

From paolo.pavan at gmail.com Tue Mar 2 15:17:35 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Tue, 2 Mar 2010 16:17:35 +0100
Subject: [Bioperl-l] Alignment from blast report
In-Reply-To: <18C0182252934619AD12E49243BE3C14@NewLife>
References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> <18C0182252934619AD12E49243BE3C14@NewLife>
Message-ID: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com>

I think you got the sense, thank you.
Of course hsps from different hits will be reflected in different elements aligned. I've attached the example pasted (unix text) because is more readable, hoping will not be held by the mailing server :-) Thank you, Paolo 2010/3/2 Mark A. Jensen : > This might a good method to have for Bio::Search::Tiling-- > you want to stitch together all the hsps and have the > concatenated alignment returned as a Bio::SimpleAlign, > correct? Tiling would create the right set of hsps from > which to generate the composite alignment. I can > try to get something working, but it may take a while- > MAJ > ----- Original Message ----- From: "Paolo Pavan" > To: "Chris Fields" > Cc: > Sent: Tuesday, March 02, 2010 9:37 AM > Subject: Re: [Bioperl-l] Alignment from blast report > > > Hi Chris, > Thank you for your reply. So I have to understand that since the > get_aln method returns the HSP alignment, there is no way to retrieve > the whole alignment as in the example pasted, isn't it? > Basically I'm trying to use megablast as kind of multiple local > alignment engine and actually I'm not pretty sure this is a good idea > but in my particular case could be suitable. I mean that the example > below reports only the portions of the sequences that align loosing > the portions that does not, I'm not sure I gave the idea. What do you > think about? Can you give me your opinion? > If there isn't any module written yet, I can try to write a parser, it > could be of any interest? > > Thank you, > Paolo > > 2010/3/2 Chris Fields : >> >> Paolo, >> >> You can get a Bio::SimpleAlign from the HSP object. The first code example >> in this section in the HOWTO demonstrates this: >> >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >> >> chris >> >> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >> >>> Dear all, >>> Sorry for pushing up my post but, please does anyone have an hint for me? >>> Maybe have I to send attached the report to the mailing list? 
I don't >>> know attachment policies of the list, if it is allowed and is needed I >>> can do that. >>> >>> Thank you, >>> Paolo >>> >>> 2010/2/26 Paolo Pavan : >>>> >>>> Sorry, >>>> Maybe I forgot to add this is the megablast -m 5 output. >>>> >>>> Thank you again, >>>> Paolo >>>> >>>> 2010/2/26 Paolo Pavan : >>>>> >>>>> Hi all, >>>>> I have just a brief question: I've got some megablast reports such the >>>>> one I've pasted below. >>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>> Bio::Align::AlignI implementing one? >>>>> >>>>> Thank you all, >>>>> Paolo >>>>> >>>>> >>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>> >>>>> >>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>> (2000), >>>>> "A greedy algorithm for aligning DNA sequences", >>>>> J Comput Biol 2000; 7(1-2):203-14. >>>>> >>>>> Database: 00038-00053.fasta >>>>> 2 sequences; 2001 total letters >>>>> >>>>> Searching..................................................done >>>>> >>>>> Query= 00038-00053 >>>>> (802 letters) >>>>> >>>>> >>>>> >>>>> Score E >>>>> Sequences producing significant alignments: (bits) Value >>>>> >>>>> ______00038 >>>>> 226 1e-62 >>>>> ______00053 >>>>> 115 3e-29 >>>>> >>>>> 1_0 472 >>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>> ______00038 883 >>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 532 >>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>> ______00038 943 >>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 591 >>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>> 
______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 651 >>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>> ______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 >>>>> ------------------------------------------------------------ >>>>> >>>>> 1_0 711 >>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>> ______00038 1000 >>>>> ------------------------------------------------------------ 1001 >>>>> ______00053 1 >>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>> >>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>> ______00038 1000 -------------------------------- 1001 >>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>>> Database: 00038-00053.fasta >>>>> Posted date: Feb 25, 2010 4:47 PM >>>>> Number of letters in database: 2001 >>>>> Number of sequences in database: 2 >>>>> >>>>> Lambda K H >>>>> 1.37 0.711 1.31 >>>>> >>>>> Gapped >>>>> Lambda K H >>>>> 1.37 0.711 1.31 >>>>> >>>>> >>>>> Matrix: blastn matrix:1 -3 >>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>> Number of Sequences: 2 >>>>> Number of Hits to DB: 17 >>>>> Number of extensions: 3 >>>>> Number of successful extensions: 3 >>>>> Number of sequences better than 10.0: 2 >>>>> Number of HSP's gapped: 2 >>>>> Number of HSP's successfully gapped: 2 >>>>> Length of query: 802 >>>>> Length of database: 2001 >>>>> Length adjustment: 10 >>>>> Effective length of query: 792 >>>>> Effective length of database: 1981 >>>>> Effective search space: 1568952 >>>>> Effective search space used: 1568952 >>>>> X1: 9 (17.8 bits) >>>>> X2: 20 (39.6 bits) >>>>> X3: 51 (101.1 bits) >>>>> S1: 9 (18.3 bits) >>>>> S2: 9 (18.3 bits) >>>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at 
lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example.megaout
Type: application/octet-stream
Size: 2918 bytes
Desc: not available
URL:

From Russell.Smithies at agresearch.co.nz Tue Mar 2 19:35:19 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 08:35:19 +1300
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
References: <4AA1F3D6-E7A1-4E84-8433-B94A531C1B1A@gmail.com> <14A8E8E1A97C4E77A21D4E1E2939FEE3@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E660@exchsth.agresearch.co.nz>

If you want to continue using your current version, you could try deleting your old blast db first.

if($checkbox eq 'yes'){
    # unlink does not expand wildcards itself, so glob the pattern first
    unlink glob("mydb.*");   #or maybe `rm -f mydb.*`
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -prog_dir => "/usr/local/ncbi/blast/bin",
        -db_name => 'mydb',
        -db_data => 'xxx.fa',
        -create => 1);
}
else{
    $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb');
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Tuesday, 2 March 2010 4:58 p.m.
> To: Janine Arloth
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
> Hi Janine--
> You'll need to get the latest version of
> Bio/Tools/Run/StandAloneBlastPlus.pm
> (rev. 16878).
> Then the -overwrite parameter will actually work, and you can write
>
> if($checkbox eq 'yes'){
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
> -prog_dir => "/usr/local/ncbi/blast/bin",
> -db_name => 'mydb',
> -db_data => 'xxx.fa',
> -overwrite => 1);
> }
> else{
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
> -db_name => 'mydb');
> }
>
> MAJ
>
> ----- Original Message -----
> From: "Janine Arloth"
> To:
> Cc:
> Sent: Monday, March 01, 2010 11:25 AM
> Subject: StandAloneBlastPlus
>
>
> Hello,
>
> I am running blast+ and want to create a blastdb, depending on a checkbox.
> That
> means when mydb is too old I want to rebuild the blastdb files and
> create a
> ''new'' db.
> When the latest version of my files is ok, then blast should run with
> the
> existing db.
> Using this code, I never get a new db built. It creates the db once and
> then
> it does not create a new one.
>
>
> if($checkbox eq 'yes'){
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
> -prog_dir => "/usr/local/ncbi/blast/bin",
> -db_name => 'mydb',
> -db_data => 'xxx.fa',
> -create => 1);
> }
> else{
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
> -db_name => 'mydb');
> }
>
> Thanks for helping
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From armendarez77 at hotmail.com Tue Mar 2 21:06:17 2010
From: armendarez77 at hotmail.com (armendarez77 at hotmail.com)
Date: Tue, 2 Mar 2010 13:06:17 -0800
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
Message-ID:

Hello,

I am writing a script to remotely access annotation files and parse information using Bio::DB::RefSeq and Bio::DB::GenBank. I was testing it with random RefSeq accession numbers (NC_######) when something odd happened. When I used the accession number 'NC_007092', the script seemed to freeze. After some time, 'Out of Memory' was printed to the terminal.

When I investigated the annotation file associated with NC_007092, a MapViewer page opened. It turns out that NC_007092 is a genome shotgun sequence, but it does not start with 'NZ' as I thought all shotgun sequences did.

Is this a random event that I don't have to worry much about or is there a way to pre-screen accession numbers to ensure they are associated with complete genome RefSeq files?

I've included my script in case there is something I missed that could have prevented this.
Thank you,

Veronica

_________________

use strict;
use Bio::Perl;
use Getopt::Long;
use IO::Handle;

my $accessionNumber;

GetOptions("accessionNumber=s"=>\$accessionNumber);
unless($accessionNumber){
    print<<"OPTIONS";
options for $0
    accessionNumber    -a    accession number
OPTIONS
    die;
}

my $description = annotation_info($accessionNumber);

print "$description\n";

sub annotation_info{

    my $seqObj;

    my $accNum = shift(@_);

    my $rs = Bio::DB::RefSeq->new();
    my $gb = Bio::DB::GenBank->new();

    if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore in their accession number
        $seqObj = $rs->get_Seq_by_id($accNum);
    }
    elsif($accNum !~ /_/){ #GenBank annotation
        $seqObj = $gb->get_Seq_by_id($accNum);
    }

    return $seqObj->desc();
}

_________________________________________________________________
Hotmail: Trusted email with Microsoft's powerful SPAM protection.
http://clk.atdmt.com/GBL/go/201469226/direct/01/

From maj at fortinbras.us Tue Mar 2 20:58:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Mar 2010 15:58:59 -0500
Subject: [Bioperl-l] bioperl job
Message-ID:

Hi All,
I have a contact looking for an individual with Bioperl experience who
could do contractual on-site work in the Cambridge MA area.
**I have no business interest in this whatever, just doing a friend a favor.**
Let me know directly (not to the list) if you have interest.
thanks -- MAJ

From Russell.Smithies at agresearch.co.nz Tue Mar 2 23:08:51 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Mar 2010 12:08:51 +1300
Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz>

NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large.
Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html

Also, it may help to check the docsum first to see how big the file is going to be?
(the full Genbank file for this example is only 6MB in size)

===================

use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                       -db    => 'nucleotide',
                                       -term  => 'NC_007092');
my ($id) = $factory->get_ids;

# get a summary
$factory->reset_parameters(-eutil => 'esummary',
                           -db    => 'nucleotide',
                           -id    => $id);
my $ds = $factory->next_DocSum;
print "ID: $id\n";

# flattened mode
while (my $item = $ds->next_Item('flattened')) {
    # not all Items have content, so need to check...
    printf("%-20s:%s\n", $item->get_name, $item->get_content) if $item->get_content;
}
print "\n";

# download the full genbank file
$factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                    -db      => 'nucleotide',
                                    -id      => $id,
                                    -rettype => 'gbwithparts');
$factory->get_Response(-file => "$id.gb");

================

Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com
> Sent: Wednesday, 3 March 2010 10:06 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092
>
>
> Hello,
>
> I am writing a script to remotely access annotation files and parse
> information using Bio::DB::RefSeq and Bio::DB::GenBank. I was testing it
> with random RefSeq accession numbers (NC_######) when something odd
> happened. When I used the accession number 'NC_007092', the script seemed
> to freeze. After some time, 'Out of Memory' was printed to the terminal.
> > When I investigated the annotation file associated with NC_007092, a > MapViewer page opened. It turns out that NC_007092 is a genome shotgun > sequence, but it does not start with 'NZ' as I though all shotgun > sequences did. > > Is this a random event that I don't have to worry much about or is there a > way to pre-screen accession numbers to ensure they are associated with > complete genome RefSeq files? > > I've included my script in case there is something I missed that could > have prevented this. > > Thank you, > > Veronica > > > _________________ > > use strict; > use Bio::Perl; > use Getopt::Long; > use IO::Handle; > > my $accessionNumber; > > GetOptions("accessionNumber=s"=>\$accessionNumber); > unless($accessionNumber){ > print<<"OPTIONS"; > options for $0 > accessionNumber -a accession number > OPTIONS > die; > } > > my $description = annotation_info($accessionNumber); > > print "$description\n"; > > > > sub annotation_info{ > > my $seqObj; > > my $accNum = shift(@_); > > my $rs = Bio::DB::RefSeq->new(); > my $gb = Bio::DB::GenBank->new(); > > > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore > in their accession number > > $seqObj = $rs->get_Seq_by_id($accNum); > } > elsif($accNum !~ /_/){ #GenBank annotation > $seqObj = $gb->get_Seq_by_id($accNum); > } > > return $seqObj->desc(); > } > > > _________________________________________________________________ > Hotmail: Trusted email with Microsoft's powerful SPAM protection. > http://clk.atdmt.com/GBL/go/201469226/direct/01/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. 
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From armendarez77 at hotmail.com Tue Mar 2 23:16:03 2010 From: armendarez77 at hotmail.com (armendarez77 at hotmail.com) Date: Tue, 2 Mar 2010 15:16:03 -0800 Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz> References: , <18DF7D20DFEC044098A1062202F5FFF32C61E4E824@exchsth.agresearch.co.nz> Message-ID: I see. I work mostly in the bacteria world so mammalian chromosomes shouldn't be an issue. I just randomly picked it to test my script when it came up after I did a simple search for Bacillus in the Genome database. I'll look into docSum to help prevent unexpected large files from interrupting my script. Thank you. Veronica > From: Russell.Smithies at agresearch.co.nz > To: armendarez77 at hotmail.com; bioperl-l at lists.open-bio.org > Date: Wed, 3 Mar 2010 12:08:51 +1300 > Subject: Re: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > NC_ accessions are all chromosomes so if you're unlucky enough to get a mammalian one, there's a fair chance it could be quite large. > Take a look at this for accession number formats: http://www.ncbi.nlm.nih.gov/refseq/key.html > > Also, it may help to check the docsum first to see how big the file is going to be? 
> (the full Genbank file for this example is only 6MB in size) > > =================== > use Bio::DB::EUtilities; > > my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',-db => 'nucleotide',-term => 'NC_007092' ); > > my ($id) = $factory->get_ids; > > # get a summary > $factory->reset_parameters(-eutil => 'esummary',-db => 'nucleotide',-id => $id); > my $ds = $factory->next_DocSum; > print "ID: $id\n"; > # flattened mode > while (my $item = $ds->next_Item('flattened')) { > # not all Items have content, so need to check... > printf("%-20s:%s\n",$item->get_name,$item->get_content) if $item->get_content; > } > print "\n"; > > > # download the full genbank file > $factory = Bio::DB::EUtilities->new(-eutil => 'efetch', > -db => 'nucleotide', > -id => $id, > -rettype => 'gbwithparts'); > $factory->get_Response(-file => "$id.gb"); > > ================ > > Hope this helps, > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of armendarez77 at hotmail.com > > Sent: Wednesday, 3 March 2010 10:06 a.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::DB::RefSeq and NC_007092 > > > > > > Hello, > > > > I am writing a script to remotely access annotation files and parse > > information using Bio::DB::RefSeq and Bio::DB::Genbank. I was testing it > > with random RefSeq accession numbers (NC_######) when something odd > > happened. When I used the accession number 'NC_007092', the script seemed > > to freeze. After some time, 'Out of Memory' was printed to the terminal. > > > > When I investigated the annotation file associated with NC_007092, a > > MapViewer page opened. 
It turns out that NC_007092 is a genome shotgun > > sequence, but it does not start with 'NZ' as I though all shotgun > > sequences did. > > > > Is this a random event that I don't have to worry much about or is there a > > way to pre-screen accession numbers to ensure they are associated with > > complete genome RefSeq files? > > > > I've included my script in case there is something I missed that could > > have prevented this. > > > > Thank you, > > > > Veronica > > > > > > _________________ > > > > use strict; > > use Bio::Perl; > > use Getopt::Long; > > use IO::Handle; > > > > my $accessionNumber; > > > > GetOptions("accessionNumber=s"=>\$accessionNumber); > > unless($accessionNumber){ > > print<<"OPTIONS"; > > options for $0 > > accessionNumber -a accession number > > OPTIONS > > die; > > } > > > > my $description = annotation_info($accessionNumber); > > > > print "$description\n"; > > > > > > > > sub annotation_info{ > > > > my $seqObj; > > > > my $accNum = shift(@_); > > > > my $rs = Bio::DB::RefSeq->new(); > > my $gb = Bio::DB::GenBank->new(); > > > > > > if($accNum =~ /\w\w_\d{6}/){ #RefSeq annotations include an underscore > > in their accession number > > > > $seqObj = $rs->get_Seq_by_id($accNum); > > } > > elsif($accNum !~ /_/){ #GenBank annotation > > $seqObj = $gb->get_Seq_by_id($accNum); > > } > > > > return $seqObj->desc(); > > } > > > > > > _________________________________________________________________ > > Hotmail: Trusted email with Microsoft's powerful SPAM protection. 
> > http://clk.atdmt.com/GBL/go/201469226/direct/01/ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _________________________________________________________________ Your E-mail and More On-the-Go. Get Windows Live Hotmail Free. http://clk.atdmt.com/GBL/go/201469229/direct/01/ From csaba.ortutay at uta.fi Thu Mar 4 09:57:00 2010 From: csaba.ortutay at uta.fi (Csaba Ortutay) Date: Thu, 4 Mar 2010 11:57:00 +0200 Subject: [Bioperl-l] Bio::DB::CUTG problem Message-ID: <201003041157.01013.csaba.ortutay@uta.fi> Hello, We would like to use the Bio::DB::CUTG module to get codon usage data for a large number of genomes. We have noticed that the module cannot find certain organisms which are otherwise in the database. It happens when the name contains some non-alphabetic characters. A few examples: Streptococcus agalactiae 2603V/R Shigella flexneri 5 str.
8401 I have located the corresponding part in the CUTG.pm code, and I would suggest a change:
222c222
< my $nameparts = join "+", $self->sp =~ /(\w+)/g;
---
> my $nameparts = join "+", $self->sp =~ /(\S+)/g;
With this I can now access the wanted tables. Best regards, Csaba -- Csaba Ortutay PhD Docent of Bioinformatics IMT Bioinformatics University of Tampere Finland From maj at fortinbras.us Thu Mar 4 13:10:06 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 4 Mar 2010 08:10:06 -0500 Subject: [Bioperl-l] Bio::DB::CUTG problem In-Reply-To: <201003041157.01013.csaba.ortutay@uta.fi> References: <201003041157.01013.csaba.ortutay@uta.fi> Message-ID: Thanks, Csaba - change made and committed at r16898 MAJA ----- Original Message ----- From: "Csaba Ortutay" To: Sent: Thursday, March 04, 2010 4:57 AM Subject: [Bioperl-l] Bio::DB::CUTG problem > Hello, > > We would use Bio::DB::CUTG module to get codon usage data for a large number > of genomes. > > We have noticed that the module cannot find certain organisms which are > otherwise in the database. It happens when the name contains some non- > alphabetic characters. > > A few examples: > > Streptococcus agalactiae 2603V/R > Shigella flexneri 5 str. 8401 > > I have located the corresponding part in the CUTG.pm code, and I would suggest > a change: > > 222c222 > < my $nameparts = join "+", $self->sp =~ /(\w+)/g; > --- >> my $nameparts = join "+", $self->sp =~ /(\S+)/g; > > > With this I can now access the wanted tables.
> > Best regards, > Csaba > > -- > Csaba Ortutay PhD > Docent of Bioinformatics > IMT Bioinformatics > University of Tampere > Finland > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Thu Mar 4 14:40:18 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 04 Mar 2010 14:40:18 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> Message-ID: <4B8FC652.2010607@bioperl.org> Palani - This should be directed to the mailing list. -------- Original Message -------- From: PalaniKannan K Subject: Enquiry about Remoteblast.pm Date: Thu, 4 Mar 2010 10:23:45 +0530 I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing through Remoteblast.pm script available through CPAN. When i am submitting my query... it shows waiting for much time. Ex. (waiting .....................) http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html This is the reference script i am using through Remoteblast perl module. It worked upto last 02/03/2010. Now it is not working We had developed 3 applications using this module. The same error comes in 3 applications we developed. So, i confim that our script dont have problem. Kindly help me in this regard. -- With Regards, palani kannan. k From maj at fortinbras.us Thu Mar 4 14:50:54 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 4 Mar 2010 09:50:54 -0500 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com><56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com><56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com><56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com><18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> Message-ID: <2FB5C317605B48269256ABFABBED2239@NewLife> Paolo -- Ok, there's now (r16900) an *experimental* method in Bio::Search::Tiling::MapTiling called get_tiled_alns(). POD is below. Try it out and let me know-- cheers, MAJ

=head1 TILED ALIGNMENTS

The experimental method C<get_tiled_alns()> will use a tiling to concatenate tiled hsps into a series of L<Bio::SimpleAlign> objects:

 @alns = $tiling->get_tiled_alns($type, $context);

Each alignment contains two sequences with ids 'query' and 'subject', and consists of a concatenation of tiling HSPs which overlap or are directly adjacent. The alignments are returned in C<$type> sequence order. When HSPs overlap, the alignment sequence is taken from the HSP which comes first in the coverage map array.

The sequences in each alignment contain features (even though they are L<Bio::LocatableSeq> objects) which map the original query/subject coordinates to the new alignment sequence coordinates.
You can determine the original BLAST fragments this way:

 $aln = ($tiling->get_tiled_alns)[0];
 $qseq = $aln->get_seq_by_id('query');
 $hseq = $aln->get_seq_by_id('subject');
 foreach my $feat ($qseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('query_start'))[0];
    $org_end = ($feat->get_tag_values('query_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }
 foreach my $feat ($hseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('subject_start'))[0];
    $org_end = ($feat->get_tag_values('subject_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }

----- Original Message ----- From: "Paolo Pavan" To: "Mark A. Jensen" Cc: "Chris Fields" ; Sent: Tuesday, March 02, 2010 10:17 AM Subject: Re: [Bioperl-l] Alignment from blast report >I think you got the sense, thank you. Of course hsps from different > hits will be reflected in different elements aligned. I've attached > the example pasted (unix text) because is more readable, hoping will > not be held by the mailing server :-) > > Thank you, > Paolo > > 2010/3/2 Mark A. Jensen : >> This might a good method to have for Bio::Search::Tiling-- >> you want to stitch together all the hsps and have the >> concatenated alignment returned as a Bio::SimpleAlign, >> correct? Tiling would create the right set of hsps from >> which to generate the composite alignment. I can >> try to get something working, but it may take a while- >> MAJ >> ----- Original Message ----- From: "Paolo Pavan" >> To: "Chris Fields" >> Cc: >> Sent: Tuesday, March 02, 2010 9:37 AM >> Subject: Re: [Bioperl-l] Alignment from blast report >> >> >> Hi Chris, >> Thank you for your reply. So I have to understand that since the >> get_aln method returns the HSP alignment, there is no way to retrieve >> the whole alignment as in the example pasted, isn't it?
>> Basically I'm trying to use megablast as kind of multiple local >> alignment engine and actually I'm not pretty sure this is a good idea >> but in my particular case could be suitable. I mean that the example >> below reports only the portions of the sequences that align loosing >> the portions that does not, I'm not sure I gave the idea. What do you >> think about? Can you give me your opinion? >> If there isn't any module written yet, I can try to write a parser, it >> could be of any interest? >> >> Thank you, >> Paolo >> >> 2010/3/2 Chris Fields : >>> >>> Paolo, >>> >>> You can get a Bio::SimpleAlign from the HSP object. The first code example >>> in this section in the HOWTO demonstrates this: >>> >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods >>> >>> chris >>> >>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote: >>> >>>> Dear all, >>>> Sorry for pushing up my post but, please does anyone have an hint for me? >>>> Maybe have I to send attached the report to the mailing list? I don't >>>> know attachment policies of the list, if it is allowed and is needed I >>>> can do that. >>>> >>>> Thank you, >>>> Paolo >>>> >>>> 2010/2/26 Paolo Pavan : >>>>> >>>>> Sorry, >>>>> Maybe I forgot to add this is the megablast -m 5 output. >>>>> >>>>> Thank you again, >>>>> Paolo >>>>> >>>>> 2010/2/26 Paolo Pavan : >>>>>> >>>>>> Hi all, >>>>>> I have just a brief question: I've got some megablast reports such the >>>>>> one I've pasted below. >>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the >>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the >>>>>> entire alignment represented as a Bio::SimpleAlign object or >>>>>> Bio::Align::AlignI implementing one? 
>>>>>> >>>>>> Thank you all, >>>>>> Paolo >>>>>> >>>>>> >>>>>> MEGABLAST 2.2.16 [Mar-25-2007] >>>>>> >>>>>> >>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller >>>>>> (2000), >>>>>> "A greedy algorithm for aligning DNA sequences", >>>>>> J Comput Biol 2000; 7(1-2):203-14. >>>>>> >>>>>> Database: 00038-00053.fasta >>>>>> 2 sequences; 2001 total letters >>>>>> >>>>>> Searching..................................................done >>>>>> >>>>>> Query= 00038-00053 >>>>>> (802 letters) >>>>>> >>>>>> >>>>>> >>>>>> Score E >>>>>> Sequences producing significant alignments: (bits) Value >>>>>> >>>>>> ______00038 >>>>>> 226 1e-62 >>>>>> ______00053 >>>>>> 115 3e-29 >>>>>> >>>>>> 1_0 472 >>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531 >>>>>> ______00038 883 >>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 532 >>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590 >>>>>> ______00038 943 >>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 591 >>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 651 >>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> ______00053 >>>>>> ------------------------------------------------------------ >>>>>> >>>>>> 1_0 711 >>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770 >>>>>> ______00038 1000 >>>>>> ------------------------------------------------------------ 1001 >>>>>> 
______00053 1 >>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34 >>>>>> >>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802 >>>>>> ______00038 1000 -------------------------------- 1001 >>>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65 >>>>>> Database: 00038-00053.fasta >>>>>> Posted date: Feb 25, 2010 4:47 PM >>>>>> Number of letters in database: 2001 >>>>>> Number of sequences in database: 2 >>>>>> >>>>>> Lambda K H >>>>>> 1.37 0.711 1.31 >>>>>> >>>>>> Gapped >>>>>> Lambda K H >>>>>> 1.37 0.711 1.31 >>>>>> >>>>>> >>>>>> Matrix: blastn matrix:1 -3 >>>>>> Gap Penalties: Existence: 0, Extension: 0 >>>>>> Number of Sequences: 2 >>>>>> Number of Hits to DB: 17 >>>>>> Number of extensions: 3 >>>>>> Number of successful extensions: 3 >>>>>> Number of sequences better than 10.0: 2 >>>>>> Number of HSP's gapped: 2 >>>>>> Number of HSP's successfully gapped: 2 >>>>>> Length of query: 802 >>>>>> Length of database: 2001 >>>>>> Length adjustment: 10 >>>>>> Effective length of query: 792 >>>>>> Effective length of database: 1981 >>>>>> Effective search space: 1568952 >>>>>> Effective search space used: 1568952 >>>>>> X1: 9 (17.8 bits) >>>>>> X2: 20 (39.6 bits) >>>>>> X3: 51 (101.1 bits) >>>>>> S1: 9 (18.3 bits) >>>>>> S2: 9 (18.3 bits) >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From janine.arloth at googlemail.com Wed Mar 3 09:44:18 2010 From: janine.arloth at 
googlemail.com (Janine Arloth) Date: Wed, 3 Mar 2010 10:44:18 +0100 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: References: Message-ID: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Hello, which attributes or results can I get from hits? my $hit = $result->next_hit; print $hit->name; Is there more than the name? Is there a description somewhere where I can look this up? Regards From David.Messina at sbc.su.se Thu Mar 4 15:27:46 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 4 Mar 2010 16:27:46 +0100 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <4B8FC652.2010607@bioperl.org> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org> Message-ID: <31C89CCE-25B8-492A-924D-A7401D415584@sbc.su.se> Hi Palani, You're using a very old version of BioPerl, 1.4: > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html The current release version is 1.6.1. Also, NCBI is changing (or may have already changed) their remote access system to require an email address. The very latest builds of BioPerl should now be compatible with this change. Get it here: http://www.bioperl.org/DIST/nightly_builds/ or directly via Subversion; instructions here: http://www.bioperl.org/wiki/Getting_BioPerl Dave From cjfields at illinois.edu Thu Mar 4 15:30:54 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 04 Mar 2010 09:30:54 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <4B8FC652.2010607@bioperl.org> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org> Message-ID: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> Palani, We have a few regression tests that should have caught this but aren't quite set up correctly (they silently pass if no report is returned).
This may be something on NCBI's end though; remote databases and analyses are notoriously brittle, hence the need to skip their tests by default when installing. Final note: hopefully you aren't using bioperl 1.4 (as indicated by the docs). We're now on the 1.6 release series (currently v. 1.6.1); 1.4 isn't supported anymore. chris On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote: > Palani - > This should be directed to the mailing list. > > -------- Original Message -------- > From: PalaniKannan K > Subject: Enquiry about Remoteblast.pm > Date: Thu, 4 Mar 2010 10:23:45 +0530 > > > > > > I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. I am accessing through > Remoteblast.pm script available through CPAN. When i am submitting my > query... it shows waiting for much time. Ex. (waiting .....................) > > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html > > This is the reference script i am using through Remoteblast perl module. > > It worked upto last 02/03/2010. Now it is not working > > We had developed 3 applications using this module. The same error comes in 3 > applications we developed. So, i confim that our script dont have problem. > Kindly help me in this regard. > > -- > With Regards, > palani kannan. k > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Mar 4 15:27:16 2010 From: maj at fortinbras.us (Mark A.
Jensen) Date: Thu, 4 Mar 2010 10:27:16 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Message-ID: Check out http://www.bioperl.org/wiki/HOWTO:SearchIO MAJ ----- Original Message ----- From: "Janine Arloth" To: Sent: Wednesday, March 03, 2010 4:44 AM Subject: [Bioperl-l] StandAloneBlastPlus > Hello, > > which arguments or result can I get from hits? > > hit = $result->next_hit; > print $hit->name; > > Are there more than the name? Exists a description, where I can look up this? > > Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Thu Mar 4 15:25:45 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 04 Mar 2010 10:25:45 -0500 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> References: <13EA1FC8-4D1C-4601-9C32-5AD01288ED98@gmail.com> Message-ID: <90B9BFFC-73DA-469F-900C-70448A9B1C03@verizon.net> http://www.bioperl.org/wiki/HOWTO:SearchIO On Mar 3, 2010, at 4:44 AM, Janine Arloth wrote: > Hello, > > which arguments or result can I get from hits? > > hit = $result->next_hit; > print $hit->name; > > Are there more than the name? Exists a description, where I can look up this? 
> > Regards > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 4 16:49:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 04 Mar 2010 10:49:01 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> References: <50e1fe001003032053h5a2cfae9lc7be728d67717566@mail.gmail.com> <4B8FC652.2010607@bioperl.org> <1267716654.23329.19.camel@pyrimidine.igb.uiuc.edu> Message-ID: <1267721341.23329.26.camel@pyrimidine.igb.uiuc.edu> Okay, I'm able to replicate this (and the tests now correctly attempt to catch it). It appears that this may be a general RemoteBlast issue, as regular RemoteBlast tests are also taking forever. This shouldn't be related to the email issue (this isn't in RemoteBlast.pm yet). At least, I would hope NCBI would pass back another status besides 'WAITING' in cases where the email isn't provided. chris On Thu, 2010-03-04 at 09:30 -0600, Chris Fields wrote: > Palani, > > We have a few regression tests that should have caught this but aren't > quite set up correctly (they silently pass if no report is returned). > This may be smoething on NCBI's end though; any remote database or > analyses are notoriously brittle, hence the need to skip these by > default when installing tests. > > Final note, but hopefully you aren't using bioperl 1.4 (as indicated by > the docs). We're now on the 1.6 release series and are now on v. 1.6.1; > 1.4 isn't supported anymore. > > chris > > On Thu, 2010-03-04 at 14:40 +0000, Jason Stajich wrote: > > Palani - > > This should be directed to the mailing list. > > > > -------- Original Message -------- > > From: PalaniKannan K > > Subject: Enquiry about Remoteblast.pm > > Date: Thu, 4 Mar 2010 10:23:45 +0530 > > > > > > > > > > > > I am using nr, CDD/CDSearch KOG, CDD/CDSearch PFAM. 
I am accessing through > > Remoteblast.pm script available through CPAN. When i am submitting my > > query... it shows waiting for much time. Ex. (waiting .....................) > > > > http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html > > > > This is the reference script i am using through Remoteblast perl module. > > > > It worked upto last 02/03/2010. Now it is not working > > > > We had developed 3 applications using this module. The same error comes in 3 > > applications we developed. So, i confim that our script dont have problem. > > Kindly help me in this regard. > > > > -- > > With Regards, > > palani kannan. k > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 4 19:06:33 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 04 Mar 2010 11:06:33 -0800 Subject: [Bioperl-l] call for project ideas - Google Summer of Code In-Reply-To: References: <4B8CAE6B.4010807@cornell.edu> Message-ID: <4B9004B9.8090107@cornell.edu> Hello Luis, These are interesting ideas. Have a look at http://sswap.info and http://sadiframework.org, perhaps you might want to work with one of those technologies? Be warned, these are both in early-stage development, you are on the cutting edge here! It seems like your desire to work with semantic technologies as a GSoC student could fit under a number of different mentoring organizations, possibly OBF or NEScent, or maybe another organization entirely. I'll make some inquiries. In the mean time, please add a project idea for this on the bioperl GSoC page, to give the idea somewhere to coalesce. If you can, try to come up with a more concrete idea for what you want to do. 
http://www.bioperl.org/wiki/Google_Summer_of_Code What do you think? Rob Luis M Rodriguez-R wrote: > Hello Robert, > > I would like to know how to apply and when GSoC 2010 is planned to take place. I think there are great development opportunities in information discovery using the semantic web (I'm familiar with RDF in bio2rdf and uniprot, but it could also be useful to integrate OWL). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, especially with a view to implementing SPARQL. The people of bio2rdf already have some parsers, but their incompleteness is evident when working with their RDF as a primary source of data. > > Best regards, > Luis. > > On 2/03/2010, at 1:21, Robert Buels wrote: > >> Hi all, >> >> Google's Summer of Code is coming round again, very soon now (mentoring organization applications are due next week). We need project ideas for prospective Summer of Code interns. >> >> There's a page on the BioPerl wiki, please have a look and add your ideas for intern projects. >> >> For more on Google Summer of Code, what it is and how it works, see their FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs >> >> One of the summer intern ideas I have on the page so far is to help with the tough grunt work of breaking BioPerl into smaller, more easily managed distributions. I'm sure you all can think of plenty more! >> >> Here's the page: http://www.bioperl.org/wiki/Google_Summer_of_Code >> >> Rob >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Luis M.
Rodriguez-R > [http://bioinf.uniandes.edu.co/~miguel/] > --------------------------------- > Unidad de Bioinformática del Laboratorio de Micología y Fitopatología > Universidad de Los Andes, Colombia > [http://bioinf.uniandes.edu.co] > > + 57 1 3394949 ext 2619 > luisrodr at uniandes.edu.co > me at miguel.weapps.com > > From joa2006 at med.cornell.edu Thu Mar 4 20:11:58 2010 From: joa2006 at med.cornell.edu (Josef Anrather) Date: Thu, 04 Mar 2010 15:11:58 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] Message-ID: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Hi there, same problems here. Bioperl 1.6.1 installed; RemoteBlast version 1.006001. Could someone point me in the right direction. What is the put parameter for the email address? Does the supplied email address end up in an FBI data base if you blast the B.anthracis genome? Josef Cornell Medical College From maj at fortinbras.us Thu Mar 4 21:18:48 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 4 Mar 2010 16:18:48 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: we're not at liberty to say ----- Original Message ----- From: "Josef Anrather" To: Sent: Thursday, March 04, 2010 3:11 PM Subject: Re: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] > Hi there, > > same problems here. Bioperl 1.6.1 installed; RemoteBlast version > 1.006001. > Could someone point me in the right direction. What is the put > parameter for the email address? > > Does the supplied email address end up in an FBI data base if you > blast the B.anthracis genome?
> > Josef > > Cornell Medical College > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From David.Messina at sbc.su.se Fri Mar 5 10:05:43 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 5 Mar 2010 11:05:43 +0100 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: My apologies for jumping the gun on the email thing; that won't take effect until June 1. See full details here: http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d Looks like the problems with RemoteBlast (as Chris reported elsewhere in this thread) are at NCBI's servers (and are probably temporary). Dave From robert.bradbury at gmail.com Fri Mar 5 13:20:36 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 5 Mar 2010 08:20:36 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote: > My apologies for jumping the gun on the email thing; that won't take > effect until June 1. > > See full details here: > > http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d > > > Looks like the problems with RemoteBlast (as Chris reported elsewhere in > this thread) are at NCBI's servers (and are probably temporary). > > I would not be at all surprised if any problems involving RemoteBlast were related to the recent changeovers to a Javascript requirement for all interfaces to NCBI databases (this took place around mid-February and I complained about this in a previous email to the BioPerl list). I received a response back from Dr. Eric Sayers at NCBI on Feb.
26 that indicated that they were aware of the problem (involving a Javascript requirement) and indicated that NCBI developers were "investigating" ways to mitigate the problem. I've looked briefly at the new Javascript code that one is required to run when using PubMed, etc. and it looks like they may have completely changed the external interfaces to NCBI databases -- so I'm not surprised if that broke some or all other external interfaces used by BioPerl (RemoteBlast, Eutils, etc.). I'd suggest that you try to document the problems as best you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). It may be worth noting that it took ~3 weeks for me to receive any response to my reports. Also note, that (a) to the best of my knowledge there has been no public discussion regarding these recent changes at NCBI; and (b) under the Jan. 21, 2009 Memorandum on Transparency and Open Government, and under the Dec 8, 2009 Open Government Directive, NCBI *should* be doing a better job working with its end users (and the taxpayers) -- and at least thus far, while NIH seems to be making an effort that doesn't seem to have filtered down to NCBI. (For example, no open/public discussion regarding the email requirement for remote blasts...). It is also worth noting that it should be possible to file FOI requests with NIH/NCBI to find out exactly what they are doing and why they are doing it. I haven't taken such steps yet but I have given consideration to doing so. 
Robert From biopython at maubp.freeserve.co.uk Fri Mar 5 13:31:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Mar 2010 13:31:57 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote: > > (For example, no open/public discussion regarding the email > requirement for remote blasts...). > Hi all, What email requirement for remote blasts are you talking about? Note that the email referred to earlier talks about two unrelated issues: (1) changes to the BLAST output with the introduction of BLAST+, and (2) the upcoming email requirement for Entrez (aka E-utilities; they have been very clear about that, with plenty of warning). http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html Is there a misunderstanding here? Peter From David.Messina at sbc.su.se Fri Mar 5 13:44:08 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 5 Mar 2010 14:44:08 +0100 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> Message-ID: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> > Is there a misunderstanding here? Whoops, yes there is; that's my fault, too. I did not read carefully and conflated EUtilities and RemoteBLAST. Just to be clear, the upcoming email requirement will be for EUtilities, NOT for RemoteBLAST. Thanks for clearing that up, Peter. Dave On Mar 5, 2010, at 14:31, Peter wrote: > On Fri, Mar 5, 2010 at 1:20 PM, Robert Bradbury wrote: >> >> (For example, no open/public discussion regarding the email >> requirement for remote blasts...).
>> > > Hi all, > > What email requirement for remote blasts are you talking about? > > Note that the email referred to earlier talks about two unrelated > issues: (1) changes to the BLAST output with the introduction > of BLAST+, and (2) the upcoming email requirement for Entrez > (aka E-utilities; they have been very clear about that, with > plenty of warning). > > http://lists.open-bio.org/pipermail/open-bio-l/2010-February/000615.html > http://lists.open-bio.org/pipermail/bioperl-l/2010-February/032159.html > > > Peter From biopython at maubp.freeserve.co.uk Fri Mar 5 13:48:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Mar 2010 13:48:27 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> Message-ID: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: > >> Is there a misunderstanding here? > > Whoops, yes there is; that's my fault, too. I did not > read carefully and conflated EUtilities and RemoteBLAST. > > Just to be clear, the upcoming email requirement will > be for EUtilities, NOT for RemoteBLAST. > > Thanks for clearing that up, Peter. > Dave No problem - you guys had me worried there for a minute ;) Peter From cjfields at illinois.edu Fri Mar 5 13:50:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 07:50:51 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> Message-ID: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> On Mar 5, 2010, at 7:20 AM, Robert Bradbury wrote: > On Fri, Mar 5, 2010 at 5:05 AM, Dave Messina wrote: > >> My apologies for jumping the gun on the email thing;
that won't take >> effect until June 1. >> >> See full details here: >> >> http://groups.google.com/group/bioperl-l/browse_thread/thread/979a35fb9e22e45d/e7c88e7f087ff42d >> >> >> Looks like the problems with RemoteBlast (as Chris reported elsewhere in >> this thread) is at NCBI's servers (and is probably temporary). >> >> > I would not be at all surprised if any problems involving RemoteBlast were > related to the recent changeovers to a Javascript requirement for all > interfaces to NCBI databases (this took place around mid-February and I > complained about this in a previous email to the BioPerl list). Robert, according to Palani's recent response NCBI provided a perl script that worked, so I don't think it a Javascript issue. My guess is a change in the returned page information that isn't caught by the current regex, a problem that has happened in the past. I'll be looking into it today. > I received a response back from Dr. Eric Sayers at NCBI on Feb. 26 that > indicated that they were aware of the problem (involving a Javascript > requirement) and indicated that NCBI developers were "investigating" ways to > mitigate the problem. > > I've looked briefly at the new Javascript code that one is required to run > when using PubMed, etc. and it looks like they may have completely changed > the external interfaces to NCBI databases -- so I'm not surprised if that > broke some or all other external interfaces used by BioPerl (RemoteBlast, > Eutils, etc.). I'd suggest that you try to document the problems as best > you can and submit them to the NCBI help desk (or info at ncbi.nlm.nih.gov). > It may be worth noting that it took ~3 weeks for me to receive any response > to my reports. EUtilities works fine (both regular and SOAP); all regression tests are passing, so it's not affecting everything. > Also note, that (a) to the best of my knowledge there has been no public > discussion regarding these recent changes at NCBI; and (b) under the Jan. 
> 21, 2009 Memorandum on Transparency and Open Government, and under the Dec > 8, 2009 Open Government Directive, NCBI *should* be doing a better job > working with its end users (and the taxpayers) -- and at least thus far, > while NIH seems to be making an effort that doesn't seem to have filtered > down to NCBI. > > (For example, no open/public discussion regarding the email requirement for > remote blasts...). > > It is also worth noting that it should be possible to file FOI requests with > NIH/NCBI to find out exactly what they are doing and why they are doing it. > I haven't taken such steps yet but I have given consideration to doing so. > > Robert The email requirement has always been indicated, it was just never enforced. B/c of increased spamming issues on the NCBI server they took up the initiative to require users provide an email address (and enforce it starting in June). I just made a change to the BioPerl install that requests an email and bypasses Bio::DB::EUtilities tests if one is not provided, other tools will be following suit. I don't think there is anything insidious about this. My guess is they will be using them merely to track server usage per user and IP, and take necessary measures (i.e. contact or block) if needed. Finally, I'm not sure where the hostility is coming from. NCBI has provided a great service to the community for many years, even through many funding cuts, and they have had quite a few. Frankly, if one doesn't like their service requirements, there are other databases that one can use. 
chris From cjfields at illinois.edu Fri Mar 5 15:07:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 09:07:11 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> Message-ID: On Mar 5, 2010, at 7:48 AM, Peter wrote: > On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >> >>> Is there a misunderstanding here? >> >> Whoops, yes there is; that's my fault, too. I did not >> read carefully and conflated EUtilities and RemoteBLAST. >> >> Just to be clear, the upcoming email requirement will >> be for EUtilities, NOT for RemoteBLAST. >> >> Thanks for clearing that up, Peter. >> Dave > > No problem - you guys had me worried there for a minute ;) > > Peter Just as an update, I can confirm it is a change with retrieve_blast() not catching the report (no Javascript, no email ;). Will try fixing this later today. chris From robert.bradbury at gmail.com Fri Mar 5 15:08:42 2010 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Fri, 5 Mar 2010 10:08:42 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: Sorry, yes I too was reading quickly and not separating RemoteBlast from Eutilities requirements. With respect to "hostility", I do agree Chris that NCBI has provided a great service over the years (I've used it for over 15 as I'm sure many here have). However, the recent Javascript requirement (without any apparent discussion within the user community) has me very annoyed [1].
One could back it up a level and ask why NCBI doesn't have a "user community forum" (at least that I'm aware of) or even a bug database (it isn't like putting up a bugzilla bug database requires all that much work). Heck, even the phone companies (whom I consider to be the epitome of bureaucracy) issue me a trouble ticket # when I have a problem (something to the best of my knowledge NCBI does not do). There is also the fact that several months ago when I requested an explanation for what code/utilities were being used to generate the Homologene "homology" graphics (so I could consider extending it to other species, potentially in BioPerl) I was told in unspecific terms that a variety of utilities were used (and my impression was perhaps an underlying suggestion that it might be too complicated for me to understand -- but that could just be subjective impression on my part). [Of course such a response doesn't fit well my perspective of "open government".) Robert 1. There are a long list of reasons why Javascript is bad ranging from increasing memory and CPU requirements on the end user (one cannot run hundreds of open PubMed tabs, as I often may when doing research, on an "average" machine if all the tabs are running Javascript, downloading and running lots of Javascripts can hardly be considered "green", Javascript doesn't work in the lightest weight browsers such as Dillo, Javascript decreases the reliability and security of the browser, excessive reliance on Javascript may decrease web access for individuals with disabilities (potentially in violation of current laws I suspect), etc.) 
From roy.chaudhuri at gmail.com Fri Mar 5 15:52:12 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 05 Mar 2010 15:52:12 +0000 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <9C048672-3D5B-472A-B523-706BCDE03F81@illinois.edu> Message-ID: <4B9128AC.1000405@gmail.com> Hi Robert, Just a suggestion, maybe you could use HubMed (www.hubmed.org) as a PubMed alternative? It seems to work ok with JavaScript disabled. Roy. On 05/03/2010 15:08, Robert Bradbury wrote: > Sorry, yes I too was reading quickly and not separating RemoteBlast from > Eutilities requirements. > > With respect to "hostility", I do agree Chris that NCBI has provided a great > service over the years (I've used it for over 15 as I'm sure many here > have). However, the recent Javascript requirement (without any apparent > discussion within the user community) has me very annoyed [1]. One could > back it up a level and ask why NCBI doesn't have a "user community forum" > (at least that I'm aware of) or even a bug database (it isn't like putting > up a bugzilla bug database requires all that much work). Heck, even the > phone companies (whom I consider to be the epitome of bureaucracy) issue me > a trouble ticket # when I have a problem (something to the best of my > knowledge NCBI does not do). > > There is also the fact that several months ago when I requested an > explanation for what code/utilities were being used to generate the > Homologene "homology" graphics (so I could consider extending it to other > species, potentially in BioPerl) I was told in unspecific terms that a > variety of utilities were used (and my impression was perhaps an underlying > suggestion that it might be too complicated for me to understand -- but that > could just be subjective impression on my part). [Of course such a response > doesn't fit well my perspective of "open government".) > > Robert > > 1. 
There are a long list of reasons why Javascript is bad ranging from > increasing memory and CPU requirements on the end user (one cannot run > hundreds of open PubMed tabs, as I often may when doing research, on an > "average" machine if all the tabs are running Javascript, downloading and > running lots of Javascripts can hardly be considered "green", Javascript > doesn't work in the lightest weight browsers such as Dillo, Javascript > decreases the reliability and security of the browser, excessive reliance on > Javascript may decrease web access for individuals with disabilities > (potentially in violation of current laws I suspect), etc.) > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From paolo.pavan at gmail.com Fri Mar 5 18:51:55 2010 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Fri, 5 Mar 2010 19:51:55 +0100 Subject: [Bioperl-l] Alignment from blast report In-Reply-To: <2FB5C317605B48269256ABFABBED2239@NewLife> References: <56be91b61002260505j6a512587tc2d6623be21ba1b3@mail.gmail.com> <56be91b61002260617k744f12c3u1be774c314b3a4c8@mail.gmail.com> <56be91b61003011507h4e7acce3kcedff9948bf4b010@mail.gmail.com> <56be91b61003020637w6f94341cydcb76931c70a9c1@mail.gmail.com> <18C0182252934619AD12E49243BE3C14@NewLife> <56be91b61003020717l1e296657q4fdbe5ebcde973e@mail.gmail.com> <2FB5C317605B48269256ABFABBED2239@NewLife> Message-ID: <56be91b61003051051v6b06b872q9f59380b05492071@mail.gmail.com> Dear Mark, Thank you again for your efforts spent on this theme, I have read and tested carefully enough I hope, your new ads. I found they work perfectly but either I miss some feature of the Tiling API (and this is possible) or it could be that they don't entirely match what was the initial problem; for sure my fault, I can explain better. Let me start saying that what is needed is the merge of the alignments returned by the get_tiled_alns method. 
I have 2 seqs, h1 and h2 (in the given example, 00038 and 00053), and both can be aligned against the same sequence q (named 1_0). They cannot be aligned with common multiple sequence aligners like clustalw, since in this case a local alignment algorithm is to be preferred over a global one. This specific case cannot be handled by programs like cap3 either. I found that megablast -m 5 can output a tiling of all the hits found versus the query, reporting the query in full. I hope I gave the idea; if needed I can provide the input sequences of the megablast.

Thank you again and have a nice weekend,
Paolo

2010/3/4 Mark A. Jensen :
> Paolo -- Ok, there's now (r16900) an *experimental* method in
> Bio::Search::Tiling::MapTiling called get_tiled_alns().
> POD is below. Try it out and let me know--
> cheers,
> MAJ
>
>
> =head1 TILED ALIGNMENTS
>
> The experimental method C<get_tiled_alns()> will use a tiling
> to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
> objects:
>
>   @alns = $tiling->get_tiled_alns($type, $context);
>
> Each alignment contains two sequences with ids 'query' and 'subject',
> and consists of a concatenation of tiling HSPs which overlap or are
> directly adjacent. The alignments are returned in C<$type> sequence
> order. When HSPs overlap, the alignment sequence is taken from the HSP
> which comes first in the coverage map array.
>
> The sequences in each alignment contain features (even though they are
> L<Bio::LocatableSeq> objects) which map the original query/subject
> coordinates to the new alignment sequence coordinates. You can
> determine the original BLAST fragments this way:
>
>   $aln = ($tiling->get_tiled_alns)[0];
>   $qseq = $aln->get_seq_by_id('query');
>   $hseq = $aln->get_seq_by_id('subject');
>   foreach my $feat ($qseq->get_SeqFeatures) {
>       $org_start = ($feat->get_tag_values('query_start'))[0];
>       $org_end = ($feat->get_tag_values('query_end'))[0];
>       # original fragment as represented in the tiled alignment:
>       $org_fragment = $feat->seq;
>   }
>   foreach my $feat ($hseq->get_SeqFeatures) {
>       $org_start = ($feat->get_tag_values('subject_start'))[0];
>       $org_end = ($feat->get_tag_values('subject_end'))[0];
>       # original fragment as represented in the tiled alignment:
>       $org_fragment = $feat->seq;
>   }
>
>
> ----- Original Message ----- From: "Paolo Pavan"
> To: "Mark A. Jensen"
> Cc: "Chris Fields" ;
> Sent: Tuesday, March 02, 2010 10:17 AM
> Subject: Re: [Bioperl-l] Alignment from blast report
>
>
>> I think you got the sense, thank you. Of course hsps from different
>> hits will be reflected in different elements aligned. I've attached
>> the example pasted (unix text) because it is more readable, hoping it will
>> not be held by the mailing server :-)
>>
>> Thank you,
>> Paolo
>>
>> 2010/3/2 Mark A. Jensen :
>>>
>>> This might be a good method to have for Bio::Search::Tiling--
>>> you want to stitch together all the hsps and have the
>>> concatenated alignment returned as a Bio::SimpleAlign,
>>> correct? Tiling would create the right set of hsps from
>>> which to generate the composite alignment. I can
>>> try to get something working, but it may take a while-
>>> MAJ
>>> ----- Original Message ----- From: "Paolo Pavan"
>>> To: "Chris Fields"
>>> Cc:
>>> Sent: Tuesday, March 02, 2010 9:37 AM
>>> Subject: Re: [Bioperl-l] Alignment from blast report
>>>
>>>
>>> Hi Chris,
>>> Thank you for your reply. So I have to understand that since the
>>> get_aln method returns the HSP alignment, there is no way to retrieve
>>> the whole alignment as in the example pasted, is there?
>>> Basically I'm trying to use megablast as a kind of multiple local
>>> alignment engine, and actually I'm not quite sure this is a good idea,
>>> but in my particular case it could be suitable. I mean that the example
>>> below reports only the portions of the sequences that align, losing
>>> the portions that do not; I'm not sure I gave the idea. What do you
>>> think about? Can you give me your opinion?
>>> If there isn't any module written yet, I can try to write a parser, it
>>> could be of any interest?
>>>
>>> Thank you,
>>> Paolo
>>>
>>> 2010/3/2 Chris Fields :
>>>>
>>>> Paolo,
>>>>
>>>> You can get a Bio::SimpleAlign from the HSP object. The first code example
>>>> in this section in the HOWTO demonstrates this:
>>>>
>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>>>
>>>> chris
>>>>
>>>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>>>>
>>>>> Dear all,
>>>>> Sorry for pushing up my post but, please does anyone have an hint for me?
>>>>> Maybe have I to send attached the report to the mailing list? I don't
>>>>> know attachment policies of the list, if it is allowed and is needed I
>>>>> can do that.
>>>>>
>>>>> Thank you,
>>>>> Paolo
>>>>>
>>>>> 2010/2/26 Paolo Pavan :
>>>>>>
>>>>>> Sorry,
>>>>>> Maybe I forgot to add this is the megablast -m 5 output.
>>>>>>
>>>>>> Thank you again,
>>>>>> Paolo
>>>>>>
>>>>>> 2010/2/26 Paolo Pavan :
>>>>>>>
>>>>>>> Hi all,
>>>>>>> I have just a brief question: I've got some megablast reports such the
>>>>>>> one I've pasted below.
>>>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>>>>>> entire alignment represented as a Bio::SimpleAlign object or
>>>>>>> Bio::Align::AlignI implementing one?
>>>>>>>
>>>>>>> Thank you all,
>>>>>>> Paolo
>>>>>>>
>>>>>>>
>>>>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>>>>
>>>>>>>
>>>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>>>>>>> "A greedy algorithm for aligning DNA sequences",
>>>>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>>>>
>>>>>>> Database: 00038-00053.fasta
>>>>>>>            2 sequences; 2001 total letters
>>>>>>>
>>>>>>> Searching..................................................done
>>>>>>>
>>>>>>> Query= 00038-00053
>>>>>>>          (802 letters)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                                                   Score     E
>>>>>>> Sequences producing significant alignments:      (bits)   Value
>>>>>>>
>>>>>>> ______00038                                         226   1e-62
>>>>>>> ______00053                                         115   3e-29
>>>>>>>
>>>>>>> 1_0          472 ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>>>>> ______00038  883 ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>>>>> ______00053      ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0          532 aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>>>>> ______00038  943 aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>>>>> ______00053      ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0          591 attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>>>>> ______00053      ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0          651 gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>>>>> ______00053      ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0          711 acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>>>>> ______00053    1 -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>>>>
>>>>>>> 1_0          771 ccttaagtatttactttaacttgtcttgatca 802
>>>>>>> ______00038 1000 -------------------------------- 1001
>>>>>>> ______00053   35 ccttaagtatt-actttaacttgtcttgatca 65
>>>>>>>
>>>>>>> Database: 00038-00053.fasta
>>>>>>> Posted date: Feb 25, 2010 4:47 PM
>>>>>>> Number of letters in database: 2001
>>>>>>> Number of sequences in database: 2
>>>>>>>
>>>>>>> Lambda     K       H
>>>>>>> 1.37     0.711    1.31
>>>>>>>
>>>>>>> Gapped
>>>>>>> Lambda     K       H
>>>>>>> 1.37     0.711    1.31
>>>>>>>
>>>>>>>
>>>>>>> Matrix: blastn matrix:1 -3
>>>>>>> Gap Penalties: Existence: 0, Extension: 0
>>>>>>> Number of Sequences: 2
>>>>>>> Number of Hits to DB: 17
>>>>>>> Number of extensions: 3
>>>>>>> Number of successful extensions: 3
>>>>>>> Number of sequences better than 10.0: 2
>>>>>>> Number of HSP's gapped: 2
>>>>>>> Number of HSP's successfully gapped: 2
>>>>>>> Length of query: 802
>>>>>>> Length of database: 2001
>>>>>>> Length adjustment: 10
>>>>>>> Effective length of query: 792
>>>>>>> Effective length of database: 1981
>>>>>>> Effective search space: 1568952
>>>>>>> Effective search space used: 1568952
>>>>>>> X1: 9 (17.8 bits)
>>>>>>> X2: 20 (39.6 bits)
>>>>>>> X3: 51 (101.1 bits)
>>>>>>> S1: 9 (18.3 bits)
>>>>>>> S2: 9 (18.3 bits)
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>
>
>
> --------------------------------------------------------------------------------
>
>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From shalabh.sharma7 at gmail.com Fri Mar 5 20:06:30 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 5 Mar 2010 15:06:30 -0500
Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source)
Message-ID:
<9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> Hi All, I have a set of accession numbers. Is it possible to get "isolation_source" from the GenBank records for all the Accession numbers. I would really appreciate if anyone can help me out. Thanks Shalabh From shalabh.sharma7 at gmail.com Fri Mar 5 20:29:17 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 15:29:17 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> Message-ID: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> HI Brian, Thanks for your quick reply. I was reading the document and it think it talks about parsing a GenBank record. What i exactly want is to submit a batch of accession numbers and get "isolation_source" directly without downloading all the Genbank files. I am still reading the document may be i missed something. Thanks a lot shalabh On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > Shalabh, > > You can start by reading about how Bioperl processes Genbank files and > their annotations: > > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation > > > > Brian O. > > On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > > > Hi All, > > I have a set of accession numbers. Is it possible to get > > "isolation_source" from the GenBank records for all the Accession > numbers. > > > > I would really appreciate if anyone can help me out. 
> > > > Thanks > > Shalabh > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bosborne11 at verizon.net Fri Mar 5 20:43:33 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 05 Mar 2010 15:43:33 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> Message-ID: Shalabh, I see. I think you could use EUtils then. Take a look at these: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service I'm not an expert on these, and I do not know if one can ask for just a tag value ("isolation_source"). Getting a tag value from the downloaded Genbank entry is not difficult though, that Feature-Annotation HOWTO shows you how. Brian O. On Mar 5, 2010, at 3:29 PM, shalabh sharma wrote: > HI Brian, > Thanks for your quick reply. > I was reading the document and it think it talks about parsing a GenBank > record. What i exactly want is to submit a batch of accession numbers and > get "isolation_source" directly without downloading all the Genbank files. > I am still reading the document may be i missed something. > > Thanks a lot > shalabh > > > On Fri, Mar 5, 2010 at 3:13 PM, Brian Osborne wrote: > >> Shalabh, >> >> You can start by reading about how Bioperl processes Genbank files and >> their annotations: >> >> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation >> >> >> >> Brian O. >> >> On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: >> >>> Hi All, >>> I have a set of accession numbers. Is it possible to get >>> "isolation_source" from the GenBank records for all the Accession >> numbers. 
>>> >>> I would really appreciate if anyone can help me out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bosborne11 at verizon.net Fri Mar 5 20:13:45 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 05 Mar 2010 15:13:45 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> Message-ID: <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> Shalabh, You can start by reading about how Bioperl processes Genbank files and their annotations: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Brian O. On Mar 5, 2010, at 3:06 PM, shalabh sharma wrote: > Hi All, > I have a set of accession numbers. Is it possible to get > "isolation_source" from the GenBank records for all the Accession numbers. > > I would really appreciate if anyone can help me out. 
> > Thanks > Shalabh
From cjfields at illinois.edu Fri Mar 5 21:22:47 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 05 Mar 2010 15:22:47 -0600 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> Message-ID: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> Regardless of what you try, it will only limit the records returned (e.g. you will still get full records, unless you take steps to limit those somehow, by adding sequence start/stop, etc.). Anyway, this worked to retrieve those with that tag: "src isolation source"[Properties] That gets a lot of hits. If you are only interested in that one line, you could just parse it out without resorting to BioPerl (believe it or not, it's not always the best answer). chris
From shalabh.sharma7 at gmail.com Fri Mar 5 22:06:41 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 17:06:41 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> Message-ID: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> Thanks Brian and Chris, I followed the example given here: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook to retrieve raw data records from GenBank.
For example, I used the id 157091572 to get the GenBank record, but the downloaded file does not contain "isolation_source", which is there when you look at the record online: http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=nucleotide&dopt=GenBank&RID=T2S9N0PJ01N&log%24=nuclalign&blast_rank=1&list_uids=157091572 Thanks Shalabh
From shalabh.sharma7 at gmail.com Fri Mar 5 22:57:00 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 5 Mar 2010 17:57:00 -0500 Subject: [Bioperl-l] Accession Nuber to Genbank Record (Isolation Source) In-Reply-To: <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> References: <9fcc48c71003051206s1b822059l314e6827d7ba3fba@mail.gmail.com> <224F4102-60C1-4BB0-8685-571ECDFF0FBC@verizon.net> <9fcc48c71003051229o3f352c2w2806c45ecfcb48ec@mail.gmail.com> <1267824167.11339.126.camel@pyrimidine.igb.uiuc.edu> <9fcc48c71003051406n4ea25b1atb66eaee32f8010dc@mail.gmail.com> Message-ID: <9fcc48c71003051457x7186e3e0y1c9b8ee5ea81e153@mail.gmail.com> Thanks everyone, I got what I was looking for.
EUtilities helped me a lot. Thanks Shalabh
From cjfields at illinois.edu Sat Mar 6 04:14:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 5 Mar 2010 22:14:01 -0600 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se>
<320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> Message-ID: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu> On Mar 5, 2010, at 7:48 AM, Peter wrote: > On Fri, Mar 5, 2010 at 1:44 PM, Dave Messina wrote: >> >>> Is there a misunderstanding here? >> >> Whoops, yes there is; that's my fault, too. I did not >> read carefully and conflated EUtilities and RemoteBLAST. >> >> Just to be clear, the upcoming email requirement will >> be for EUtilities, NOT for RemoteBLAST. >> >> Thanks for clearing that up, Peter. >> Dave > > No problem - you guys had me worried there for a minute ;) > > Peter Just to bring this thread full circle, I have committed a fix which (ironically) reduced the code down a bit. I also added an attribute (get_rtoe) that returns the approximate time until the report is returned. chris
From joa2006 at med.cornell.edu Sat Mar 6 22:13:45 2010 From: joa2006 at med.cornell.edu (Josef Anrather) Date: Sat, 06 Mar 2010 17:13:45 -0500 Subject: [Bioperl-l] [Fwd: Enquiry about Remoteblast.pm] In-Reply-To: <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu> References: <194105EF-93BF-4420-8127-CD65D47E320C@med.cornell.edu> <320fb6e01003050531kc4b556xb7223651cd362ff8@mail.gmail.com> <7D5B1C6B-82F3-4318-8C0B-D3DE75C02B26@sbc.su.se> <320fb6e01003050548y17c15ac2r181d9d197dd2ee52@mail.gmail.com> <282EA736-CDE2-4815-9E1F-36DA45111CCA@illinois.edu> Message-ID: Chris, the fix works flawlessly on my system. Thanks for the fast response. Cheers, Josef
From jarodpardon at yahoo.com.cn Sun Mar 7 09:13:40 2010 From: jarodpardon at yahoo.com.cn (=?gb2312?B?1MYgus4=?=) Date: Sun, 7 Mar 2010 17:13:40 +0800 (CST) Subject: [Bioperl-l] insertion code in pdb parser Message-ID: <643595.96038.qm@web15003.mail.cnb.yahoo.com> Hi all, Insertion codes on residue numbers are common, especially in numbering schemes for antibody sequences (e.g. 82A, 82B). When Bio::Structure::IO::pdb parses a PDB file containing residues with insertion codes, it assigns such a residue an id like 'PRO-52.A', where 'A' is the insertion code. However, the opposite operation (setting the id of the residue) does not work. For example, if the original residue number is 51, $res->id('PRO-52.A') does not append the insertion code to the residue number, although it does change the residue number from 51 to 52. In the end, the only way I found to set the insertion code for a residue is to assign it to every atom of that residue with $atom->icode('A'). I think this is inconvenient and misleading: the insertion code should not be a property of an atom, since a residue never has atoms with different insertion codes. I strongly recommend a change: add an icode method to the residue object rather than the atom. Likewise, the segment id should also belong to the residue.
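For readers who run into the same limitation, here is a minimal sketch of the per-atom workaround described above. It assumes that get_atoms() on the structure entry and icode() on each atom behave as reported in this message; the helper name set_residue_icode, the script name, and the command-line arguments are hypothetical and only serve the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy one insertion code onto every atom of a residue.
# $struc is expected to be a Bio::Structure::Entry and $res one of its
# residues; only get_atoms() and icode() are called on them.
sub set_residue_icode {
    my ($struc, $res, $icode) = @_;
    my @atoms = $struc->get_atoms($res);
    $_->icode($icode) for @atoms;
    return scalar @atoms;    # number of atoms that were updated
}

# Assumed usage: perl set_icode.pl in.pdb out.pdb RESIDUE_ID ICODE
# e.g.           perl set_icode.pl in.pdb out.pdb PRO-52 A
# Note: every residue with a matching id is updated, in all chains and models.
if (@ARGV == 4) {
    my ($in, $out, $res_id, $icode) = @ARGV;
    require Bio::Structure::IO;    # loaded only when run as a script
    my $struc = Bio::Structure::IO->new(-file => $in, -format => 'pdb')
                                  ->next_structure;
    for my $model ($struc->get_models) {
        for my $chain ($struc->get_chains($model)) {
            for my $res ($struc->get_residues($chain)) {
                next unless $res->id eq $res_id;
                set_residue_icode($struc, $res, $icode);
            }
        }
    }
    Bio::Structure::IO->new(-file => ">$out", -format => 'pdb')
                      ->write_structure($struc);
}
```

Keeping the loop over atoms in one small helper means that, if a residue-level icode method is ever added as suggested, only that helper would need to change.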
Jarod From rtbio.2009 at gmail.com Sun Mar 7 13:11:54 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Sun, 7 Mar 2010 14:11:54 +0100 Subject: [Bioperl-l] remoteblast Message-ID: Hello Mark and everybody, I have been trying to connect to remote blast to retrieve similar sequences to a given sequence. But my program is unable to retrieve the sequences from BLAST, i.e., it is getting executed till the remote blast ids, but it is not entering the else loop after collecting the rid. Please check this problem and help me in this regard. I think the problem is in getting the sequence and going to the 'else' part. i.e., else { open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.# print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output Please give me your reply. Thanks and regards, Roopa. My code is as follows. #!/usr/bin/perl #path for extra camel module use lib "/srv/www/htdocs/rain/RNAi/"; use rnai_blast; use Bio::SearchIO; use Bio::Search::Result::BlastResult; use Bio::Perl; use Bio::Tools::Run::RemoteBlast; use Bio::Seq; use Bio::SeqIO; use Bio::DB::GenBank; $serverpath = "/srv/www/htdocs/rain/RNAi"; $serverurl = "http://141.84.66.66/rain/RNAi"; $outfile = $serverpath."/rnairesult_".time().".html"; $nuc = $serverpath."/nuc".time().".txt"; $debugfile = $serverpath."/debug_".time().".txt"; $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; my $outstring =""; &parse_form; print "Content-type: text/html\n\n"; print "\n"; print "RNAi Result"; print " \n"; print "\n"; print "\n"; print " Your results will appear here
"; print " Please be patient, runtime can be up to 5 minutes
"; print " This page will automatically reload in 30 seconds."; print "\n"; print "\n"; defined(my $pid = fork) or die "Can't fork: $!"; exit if $pid; open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; open(OUTFILE, '>',$outfile); print OUTFILE "\n RNAi Result \n \n \n Your results will appear here
Please be patient, runtime can be up to 5 minutes
This page will automatically reload in 30 seconds
\n \n"; close(OUTFILE); @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); $in{'Inputseq'} =~ s/>.*$//m; $in{'Inputseq'} =~ s/[^TAGC]//gim; $in{'Inputseq'} =~ tr/actg/ACTG/; @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, $in{'Threshold'}); sub blastcode { $inpu1= $_[0]; $organ= $_[1]; open(NUC,'>',$nuc); print NUC $inpu1,"\n"; close(NUC); my $prog = 'blastn'; my $db = 'refseq_rna'; my $e_val= '1e-10'; my $organism= $organ; $gb = new Bio::DB::GenBank; my @params = ( '-prog' => $prog, '-data' => $db, '-expect' => $e_val, '-readmethod' => 'SearchIO', '-Organism' => $organism ); open(OUTFILE,'>',$blastdebugfile); print OUTFILE @params; close(OUTFILE); my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]"); #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); #change a paramter #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]'; #change a paramter # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; my $v = 1; #$v is just to turn on and off the messages my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]"); while (my $input = $str->next_seq()) { #Blast a sequence against a database: #Alternatively, you could pass in a file with many #sequences rather than loop through sequence one at a time #Remove the loop starting 'while (my $input = $str->next_seq())' #and swap the two lines below for an example of that. open(OUTFILE,'>',$debugfile); print OUTFILE $input; close(OUTFILE); #submits the input data to BLAST# my $r = $factory->submit_blast($input); open(OUTFILE,'>',$debugfile); print OUTFILE $r; close(OUTFILE); print STDERR "waiting...." 
if($v>0); while ( my @rids = $factory->each_rid ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "while entered"; close(OUTFILE); foreach my $rid ( @rids ) { open(OUTFILE,'>',$debugfile); # print OUTFILE "foreach entered"; close(OUTFILE); #Retrieving the result ids# my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } open(OUTFILE,'>',$debugfile); # print OUTFILE "if entered"; close(OUTFILE); print STDERR "." if ( $v > 0 ); sleep 5; } else { open(OUTFILE,'>',$blastdebugfile); # I think the problem is in else part, i.e., it is not taking the next result.# print OUTFILE "else entered"; close(OUTFILE); my $result = $rc->next_result(); #save the output $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; open(BLASTDEBUGFILE,'>',$blastdebugfile); print BLASTDEBUGFILE $result->next_hit(); close(BLASTDEBUGFILE); #saving the output in blastdata.time.out file# # $random=rand(); my $filename = $serverpath."/blastdata_".time()."\.out"; # open(DEBUGFILE,'>',$debugfile); # open(new,'>',$filename); # @arra=; # print DEBUGFILE @arra; # close(DEBUGFILE); # close(new); $factory->save_output($filename); # open(BLASTDEBUGFILE,'>',$debugfile); # print BLASTDEBUGFILE "Hello $rid"; # close(BLASTDEBUGFILE); $factory->remove_rid($rid); open(BLASTDEBUGFILE,'>',$blastdebugfile); # print BLASTDEBUGFILE $organism; close(BLASTDEBUGFILE); # open(OUTFILE,'>',$outfile); # print OUTFILE "Test2 $result->database_name()"; # close(OUTFILE); #$hit = $result->next_hit; #open(new,'>',$debugfile); #print $hit; #close(new); $dummy=0; while ( my $hit = $result->next_hit ) { next unless ( $v >= 0); # open(OUTFILE,'>',$debugfile); # print OUTFILE "$hit in while hits"; # close(OUTFILE); my $sequ = $gb->get_Seq_by_version($hit->name); my $dna = $sequ->seq(); # get the sequence as a string $dummy++; open(OUTFILE,'>',$debugfile); # print OUTFILE $dna; close(OUTFILE); push(@seqs,$dna); } } } } } $warum=@seqs; open(OUTFILE,'>',$debugfile); # print OUTFILE 
$warum; print OUTFILE @seqs; close(OUTFILE); return(@seqs); #returning the sequences obtained on BLAST# } From cjfields at illinois.edu Sun Mar 7 14:57:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 7 Mar 2010 08:57:43 -0600 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
From jdetras at gmail.com Fri Mar 5 06:17:40 2010 From: jdetras at gmail.com (Jeffrey Detras) Date: Fri, 5 Mar 2010 14:17:40 +0800 Subject: [Bioperl-l] distances between leaf nodes Message-ID: Hi, I am new at using the Bio::TreeIO module specifically using the newick format for a phylogenetic analysis. The sample_tree attached is Newick-formatted tree. My objective is to get all the distances between all the leaf nodes. I copied examples of the code from http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my knowledge) so that I understand how to assign the right array value for the nodes/leaves. The message would say must provide 2 root nodes. Here is what I have right now: #!/usr/bin/perl -w use strict; my $treefile = 'sample_tree'; use Bio::TreeIO; my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $treefile); while (my $tree = $treeio->next_tree) { my @leaves = $tree->get_leaf_nodes; for (my $dist = $tree->distance(-nodes => \@leaves)){ print "Distance between trees is $dist\n"; } } Thanks, Jeff -------------- next part -------------- A non-text attachment was scrubbed...
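The error message quoted in the post above suggests that distance() expects exactly two nodes rather than the whole list of leaves. A minimal sketch under that assumption loops over every unordered pair of leaves and asks for one distance per pair. The helper name pairwise_leaf_distances is hypothetical; the Bio::TreeIO calls are the ones already used in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [id1, id2, distance] for every unordered pair
# of leaves. $tree is expected to be a tree object as returned by
# Bio::TreeIO->next_tree; only get_leaf_nodes(), distance() and id() are used.
sub pairwise_leaf_distances {
    my ($tree) = @_;
    my @leaves = $tree->get_leaf_nodes;
    my @rows;
    for my $i (0 .. $#leaves - 1) {
        for my $j ($i + 1 .. $#leaves) {
            # distance() is given exactly two nodes per call
            my $dist = $tree->distance(-nodes => [ $leaves[$i], $leaves[$j] ]);
            push @rows, [ $leaves[$i]->id, $leaves[$j]->id, $dist ];
        }
    }
    return @rows;
}

# Assumed usage: perl leaf_distances.pl sample_tree
if (@ARGV) {
    require Bio::TreeIO;    # loaded only when run as a script
    my $treeio = Bio::TreeIO->new(-format => 'newick', -file => $ARGV[0]);
    while (my $tree = $treeio->next_tree) {
        for my $row (pairwise_leaf_distances($tree)) {
            print join("\t", @$row), "\n";
        }
    }
}
```

For n leaves this makes n(n-1)/2 calls to distance(), which is fine for small trees; for very large trees a single traversal that accumulates branch lengths would scale better.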
Name: sample_tree Type: application/octet-stream Size: 418 bytes Desc: not available URL:
From janine.arloth at googlemail.com Fri Mar 5 09:43:57 2010 From: janine.arloth at googlemail.com (Janine Arloth) Date: Fri, 5 Mar 2010 10:43:57 +0100 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: References: Message-ID:

Hello,

Using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO (section "Format msf"), I only get an alignment like this:

                   1                                                   50
test/1-85          ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
lcl|3013/20-104    ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA

                   51                                                 100
test/1-85          TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
lcl|3013/20-104    TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA

But I would prefer this format:

Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79

Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
           |||||||||||||||||||||||||
Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104

How can I get this?

Best Regards

From elujan at stanford.edu Mon Mar 8 00:49:34 2010 From: elujan at stanford.edu (Ernesto George Lujan) Date: Sun, 7 Mar 2010 16:49:34 -0800 (PST) Subject: [Bioperl-l] Installing BioPerl In-Reply-To: <1189627897.1477411268008644137.JavaMail.root@zm09.stanford.edu> Message-ID: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>

Hi everyone,

I'm running Mac OS X 10.5.8 with Perl 5.8.8 and I'm having trouble installing the BioPerl module. I've downloaded and installed the BioPerl 1.5.1-2 binary through FinkCommander, but when I type

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

into the Terminal, it tells me that I'm using BioPerl version 1.006. How do I get this module to install correctly?

Once again, my specs:
Perl Version: 5.8.8
BioPerl Version: 1.006
Operating System: Mac OS X 10.5.8

Thanks!
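One possible explanation for a version mismatch like the one reported above is that more than one copy of BioPerl is installed and perl picks up a different copy from the one just installed. This is an assumption, not a diagnosis. The sketch below lists every directory in @INC that contains Bio/Root/Version.pm; the helper name find_module_copies is hypothetical, and only core modules are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every path under the given directories that
# holds the module file, in the order the directories were given.
sub find_module_copies {
    my ($rel_path, @dirs) = @_;
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-ref hooks that may sit in @INC
        my $candidate = File::Spec->catfile($dir, split m{/}, $rel_path);
        push @found, $candidate if -f $candidate;
    }
    return @found;
}

# perl searches @INC in order, so the first path printed is the copy
# that "use Bio::Root::Version" would load.
my @copies = find_module_copies('Bio/Root/Version.pm', @INC);
print "No Bio/Root/Version.pm found in \@INC\n" unless @copies;
print "$_\n" for @copies;
```

If two paths are printed, the directory of the first one is the installation that the one-liner in the post is reporting on; adjusting PERL5LIB or removing the unwanted copy would be the usual next step.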
-BioPerl Beginner
From bimber at wisc.edu Mon Mar 8 03:57:12 2010 From: bimber at wisc.edu (Ben Bimber) Date: Sun, 7 Mar 2010 21:57:12 -0600 Subject: [Bioperl-l] Bioperl-run malformed svndiff Message-ID: <9f985cdc1003071957h6c82d4b8t1a6b9a3af7752bde@mail.gmail.com> I recently tried to check out a complete version of bioperl-run and received an error saying 'malformed svndiff'. I've tried this on two different machines, so unless I'm doing something wrong, it should be reproducible. I cannot say whether updating an existing repository would throw the same error or not. Below is the log:
*** Check Out svn checkout "svn://code.open-bio.org/bioperl/bioperl-run/trunk/lib/Bio@HEAD" -r HEAD --depth infinity "C:\Projects\Bio" A C:/Projects/Bio/Tools A C:/Projects/Bio/Tools/Run A C:/Projects/Bio/Tools/Run/Genewise.pm A C:/Projects/Bio/Tools/Run/Analysis A C:/Projects/Bio/Tools/Run/Analysis/soap.pm A C:/Projects/Bio/Tools/Run/AssemblerBase.pm A C:/Projects/Bio/Tools/Run/BWA.pm A C:/Projects/Bio/Tools/Run/Phrap.pm A C:/Projects/Bio/Tools/Run/FootPrinter.pm A C:/Projects/Bio/Tools/Run/AnalysisFactory.pm A C:/Projects/Bio/Tools/Run/BEDTools.pm A C:/Projects/Bio/Tools/Run/EMBOSSApplication.pm A C:/Projects/Bio/Tools/Run/Genscan.pm A C:/Projects/Bio/Tools/Run/RNAMotif.pm A C:/Projects/Bio/Tools/Run/Phylo A C:/Projects/Bio/Tools/Run/Phylo/Phast A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhyloFit.pm A C:/Projects/Bio/Tools/Run/Phylo/Phast/PhastCons.pm A C:/Projects/Bio/Tools/Run/Phylo/Semphy.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/FEL.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Base.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/Modeltest.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/REL.pm A C:/Projects/Bio/Tools/Run/Phylo/Hyphy/SLAC.pm A C:/Projects/Bio/Tools/Run/Phylo/PhyloBase.pm A C:/Projects/Bio/Tools/Run/Phylo/Phyml.pm A C:/Projects/Bio/Tools/Run/Phylo/Phylip A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawGram.pm A
C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtDist.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Base.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/ProtPars.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Consense.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/DrawTree.pm
A C:/Projects/Bio/Tools/Run/Phylo/Phylip/Neighbor.pm
A C:/Projects/Bio/Tools/Run/Phylo/Njtree
A C:/Projects/Bio/Tools/Run/Phylo/Njtree/Best.pm
A C:/Projects/Bio/Tools/Run/Phylo/QuickTree.pm
A C:/Projects/Bio/Tools/Run/Phylo/Gerp.pm
A C:/Projects/Bio/Tools/Run/Phylo/Molphy
A C:/Projects/Bio/Tools/Run/Phylo/Molphy/ProtML.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Yn00.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Evolver.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Baseml.pm
A C:/Projects/Bio/Tools/Run/Phylo/PAML/Codeml.pm
A C:/Projects/Bio/Tools/Run/Phylo/SLR.pm
A C:/Projects/Bio/Tools/Run/Phylo/Gumby.pm
A C:/Projects/Bio/Tools/Run/Phylo/LVB.pm
A C:/Projects/Bio/Tools/Run/Primer3.pm
A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus.pm
A C:/Projects/Bio/Tools/Run/Meme.pm
A C:/Projects/Bio/Tools/Run/RepeatMasker.pm
A C:/Projects/Bio/Tools/Run/Analysis.pm
A C:/Projects/Bio/Tools/Run/Cap3.pm
A C:/Projects/Bio/Tools/Run/Vista.pm
A C:/Projects/Bio/Tools/Run/Pseudowise.pm
A C:/Projects/Bio/Tools/Run/Minimo.pm
A C:/Projects/Bio/Tools/Run/Match.pm
A C:/Projects/Bio/Tools/Run/Mdust.pm
A C:/Projects/Bio/Tools/Run/Eponine.pm
A C:/Projects/Bio/Tools/Run/Infernal.pm
A C:/Projects/Bio/Tools/Run/BlastPlus
A C:/Projects/Bio/Tools/Run/BlastPlus/Config.pm
A C:/Projects/Bio/Tools/Run/EMBOSSacd.pm
A C:/Projects/Bio/Tools/Run/Alignment
A C:/Projects/Bio/Tools/Run/Alignment/Proda.pm
A C:/Projects/Bio/Tools/Run/Alignment/Kalign.pm
A C:/Projects/Bio/Tools/Run/Alignment/StandAloneFasta.pm
A C:/Projects/Bio/Tools/Run/Alignment/TCoffee.pm
A C:/Projects/Bio/Tools/Run/Alignment/Sim4.pm
A 
C:/Projects/Bio/Tools/Run/Alignment/Probalign.pm
A C:/Projects/Bio/Tools/Run/Alignment/Amap.pm
A C:/Projects/Bio/Tools/Run/Alignment/Lagan.pm
A C:/Projects/Bio/Tools/Run/Alignment/Blat.pm
A C:/Projects/Bio/Tools/Run/Alignment/Gmap.pm
A C:/Projects/Bio/Tools/Run/Alignment/Probcons.pm
A C:/Projects/Bio/Tools/Run/Alignment/DBA.pm
A C:/Projects/Bio/Tools/Run/Alignment/Muscle.pm
A C:/Projects/Bio/Tools/Run/Alignment/Pal2Nal.pm
A C:/Projects/Bio/Tools/Run/Alignment/Exonerate.pm
A C:/Projects/Bio/Tools/Run/Alignment/MAFFT.pm
A C:/Projects/Bio/Tools/Run/Alignment/Clustalw.pm
A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus
A C:/Projects/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm
A C:/Projects/Bio/Tools/Run/Hmmer.pm
A C:/Projects/Bio/Tools/Run/BlastPlus.pm
A C:/Projects/Bio/Tools/Run/ERPIN.pm
A C:/Projects/Bio/Tools/Run/Maq.pm
A C:/Projects/Bio/Tools/Run/Bowtie
A C:/Projects/Bio/Tools/Run/Bowtie/Config.pm
A C:/Projects/Bio/Tools/Run/Seg.pm
A C:/Projects/Bio/Tools/Run/Prints.pm
A C:/Projects/Bio/Tools/Run/MCS.pm
A C:/Projects/Bio/Tools/Run/Tmhmm.pm
A C:/Projects/Bio/Tools/Run/Ensembl.pm
A C:/Projects/Bio/Tools/Run/Coil.pm
A C:/Projects/Bio/Tools/Run/Samtools
A C:/Projects/Bio/Tools/Run/Samtools/Config.pm
A C:/Projects/Bio/Tools/Run/Genemark.pm
A C:/Projects/Bio/Tools/Run/Bowtie.pm
A C:/Projects/Bio/Tools/Run/Glimmer.pm
A C:/Projects/Bio/Tools/Run/Signalp.pm
A C:/Projects/Bio/Tools/Run/Simprot.pm
A C:/Projects/Bio/Tools/Run/BWA
A C:/Projects/Bio/Tools/Run/BWA/Config.pm
A C:/Projects/Bio/Tools/Run/Newbler.pm
svn: Malformed svndiff data in representation
*** Error (took 00:07.184)

From David.Messina at sbc.su.se Mon Mar 8 07:01:13 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 8 Mar 2010 08:01:13 +0100
Subject: [Bioperl-l] Installing BioPerl
In-Reply-To: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>
References: <1598310059.1479181268009374330.JavaMail.root@zm09.stanford.edu>
Message-ID: <0483C203-3E81-4112-877B-BC7A439CB916@sbc.su.se>

Hey Ernesto,

I'm pretty sure you've got BioPerl version 1.6.0, which is actually more current than 1.5.2 that you were looking for. Due to oddities of Perl version numbers, 1.006 = 1.6.0 (or something like that). So I think you're probably good to go.

I should also mention that direct installation (i.e. not via fink) works pretty well these days, and through that you can get the current BioPerl release, which is 1.6.2 (or 1.006002000000000).

Dave

From alex at bioinf.uni-leipzig.de Mon Mar 8 15:45:14 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Mon, 8 Mar 2010 16:45:14 +0100 (CET)
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
Message-ID: 

Hi,

I do have a problem with the PAML/Codeml wrapper. I want to calculate all pairwise K_a,K_s values from a given alignment, using the example procedure of http://www.bioperl.org/wiki/HOWTO:PAML

    my $dna_aln = aa_to_dna_aln($aln, \%seqs);
    my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new(
        -params => { 'runmode' => -2, 'seqtype' => 1,} );
    $kaks_factory->alignment($dna_aln);
    my ($rc,$parser) = $kaks_factory->run();
    my $result = $parser->next_result();

But I receive an error:

-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output did not see seqtype
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:359
STACK: Bio::Tools::Phylo::PAML::_parse_summary /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:441
STACK: Bio::Tools::Phylo::PAML::next_result /usr/lib/perl5/vendor_perl/5.10.0/Bio/Tools/Phylo/PAML.pm:257

I use PAML4.4. Could this be the reason?

Best,
Alex

---
By the time you've read this, you've already read it!

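A short aside on the version numbers discussed in the "Installing BioPerl" exchange above: Perl's standard `version` module can show how a decimal version string such as 1.006 maps onto the dotted form. This is only a sketch; it does not query an installed BioPerl, and the strings below are simply the ones quoted in the thread (1.005002 is assumed here as the decimal spelling of 1.5.2).

```perl
use strict;
use warnings;
use version;    # standard module for parsing and comparing version strings

# Decimal version strings are read three digits at a time after the point.
for my $decimal ('1.005002', '1.006', '1.006002') {
    print "$decimal => ", version->parse($decimal)->normal, "\n";
}
# 1.005002 => v1.5.2
# 1.006 => v1.6.0
# 1.006002 => v1.6.2

# So the 1.006 reported by Bio::Root::Version is newer than 1.5.2.
print "1.006 is newer than 1.005002\n"
    if version->parse('1.006') > version->parse('1.005002');
```

Comparing through `version` objects rather than as plain strings avoids surprises when the two sides have a different number of digits.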
From David.Messina at sbc.su.se Mon Mar 8 16:29:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 8 Mar 2010 17:29:00 +0100
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
In-Reply-To: 
References: 
Message-ID: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>

Hi Alexander,

Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet.

Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem?

http://bugzilla.open-bio.org/

Thanks,
Dave

From alex at bioinf.uni-leipzig.de Mon Mar 8 17:11:42 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Mon, 8 Mar 2010 18:11:42 +0100 (CET)
Subject: [Bioperl-l] Problem with PAML/Codeml wrapper
In-Reply-To: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>
References: <9DB11D6C-04A9-4B24-852C-B18F57F90CB9@sbc.su.se>
Message-ID: 

sure. thanks!

alex

On Mon, 8 Mar 2010, Dave Messina wrote:

> Hi Alexander,
>
> Hmm, it *should* work given those parameters (it does for 4.3b), but I haven't tested it with codeml 4.4 yet.
>
> Could you file a bug, including a small test case (code + sequence) so we can try to reproduce and fix the problem?
>
> http://bugzilla.open-bio.org/
>
>
> Thanks,
> Dave
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

---
By the time you've read this, you've already read it!

From jovel_juan at hotmail.com Tue Mar 9 04:08:20 2010
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 9 Mar 2010 04:08:20 +0000
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To: 
References: , 
Message-ID: 

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor, but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From jovel_juan at hotmail.com Tue Mar 9 04:50:45 2010
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 9 Mar 2010 04:50:45 +0000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To: 
References: , , , 
Message-ID: 

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From florent.angly at gmail.com Tue Mar 9 06:41:33 2010
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Mar 2010 16:41:33 +1000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To: 
References: , , , 
Message-ID: <4B95ED9D.6080307@gmail.com>

Hi Juan,

How about you throw away sequences that have a mismatch in the adapter? After all, if there is a mismatch in the first few bases, it does not bode well for the rest of the sequence and there are so many sequences that it is not a big loss.

Florent

On 09/03/10 14:50, Juan Jovel wrote:
>
>
> Hello Guys!
>
> Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.
>
> I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.
>
> Any suggestion will be appreciated. Thanks!
>
> JUAN
>

From michael.watson at bbsrc.ac.uk Tue Mar 9 06:38:26 2010
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 9 Mar 2010 06:38:26 +0000
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To: 
References: , , , , 
Message-ID: <8D08960C647E64438CE5740657CBBDC501F910621D@iahcexch1.iah.bbsrc.ac.uk>

Use fastx toolkit or something within emboss. Failing that, just write something in pure perl :)

________________________________________
From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open-bio.org] On Behalf Of Juan Jovel [jovel_juan at hotmail.com]
Sent: 09 March 2010 04:50
To: bioperl
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From acn at stowers.org Tue Mar 9 06:31:49 2010
From: acn at stowers.org (Noll, Aaron)
Date: Tue, 9 Mar 2010 00:31:49 -0600
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?
In-Reply-To: 
Message-ID: 

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html

try out the clipper tool

FASTA/Q Clipper

$ fastx_clipper -h
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
   [-l N]       = discard sequences shorter than N nucleotides. default is 5.
   [-d N]       = Keep the adapter and N bases after it.
                  (using '-d 0' is the same as not using '-d' at all. which is the default).
   [-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
   [-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
   [-k]         = Report Adapter-Only sequences.
   [-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
   [-v]         = Verbose - report number of sequences.
                  If [-o] is specified, report will be printed to STDOUT.
                  If [-o] is not specified (and output goes to STDOUT),
                  report will be printed to STDERR.
   [-z]         = Compress output with GZIP.
   [-D]         = DEBUG output.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.

This is a suite of nice utilities that can be downloaded and that, by the way, are also used by Galaxy.

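For the pure-Perl route mentioned earlier in this thread, the following is a minimal sketch of 3' adapter clipping that tolerates mismatches. It is not how fastx_clipper or ShortRead work internally; the names `trim_3prime_adapter` and `count_mismatches`, the 10% error rate, the 5-base minimum overlap and the example adapter string are all assumptions made for illustration. Only substitutions are considered (no indels), and base qualities are ignored.

```perl
use strict;
use warnings;

# Count positions at which two equal-length strings differ.
sub count_mismatches {
    my ($x, $y) = @_;
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $n;
}

# Return the read with a 3' adapter removed. The adapter may be cut short by
# the end of the read and may contain a limited number of mismatches.
sub trim_3prime_adapter {
    my ($read, $adapter, $max_error_rate, $min_overlap) = @_;
    $max_error_rate = 0.1 unless defined $max_error_rate;   # assumed default
    $min_overlap    = 5   unless defined $min_overlap;      # assumed default

    my $read_len = length $read;
    for my $start (0 .. $read_len - $min_overlap) {
        # Compare the read from $start onwards with the adapter's prefix.
        my $overlap = $read_len - $start;
        $overlap = length $adapter if $overlap > length $adapter;
        last if $overlap < $min_overlap;

        my $mismatches = count_mismatches(substr($read, $start, $overlap),
                                          substr($adapter, 0, $overlap));
        # Leftmost acceptable match wins; everything from there on is clipped.
        return substr($read, 0, $start)
            if $mismatches <= int($overlap * $max_error_rate);
    }
    return $read;    # no adapter found, read returned unchanged
}

# Hypothetical example values, not taken from the thread.
my $adapter = 'TCGTATGCCGTCTTCTGCTTG';
my $read    = 'ACGTACGTACGT' . 'TCGTATGCCGTCTTCTGCATG';   # adapter with one mismatch
print trim_3prime_adapter($read, $adapter), "\n";
```

Passing 0 as the third argument reduces this to exact matching, which corresponds to the stricter policy of discarding or ignoring reads whose adapter contains errors.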
-Aaron

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Juan Jovel
Sent: Monday, March 08, 2010 10:51 PM
To: bioperl
Subject: [Bioperl-l] How to trim 3' adaptors from solexa reads?

Hello Guys!

Does anybody have a good suggestion on how to trim 3' adapters from reads coming out of the Illumina pipeline? It becomes especially difficult when the quality of the reads is poor at the 3' end.

I have been doing that with BioConductor (ShortRead library), but it is still not good enough to fish out adapters that contain mismatches in the Solexa reads.

Any suggestion will be appreciated. Thanks!

JUAN

From alex at bioinf.uni-leipzig.de Tue Mar 9 18:00:01 2010
From: alex at bioinf.uni-leipzig.de (Alexander Donath)
Date: Tue, 9 Mar 2010 19:00:01 +0100 (CET)
Subject: [Bioperl-l] bootstrap values in cladogram
Message-ID: 

Hi,

using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and trying to plot the tree as a cladogram. But somehow I cannot print the bootstrap values.

Short example:

test.nwk
((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326);

    [..]
    use Bio::TreeIO;
    use Bio::Tree::Draw::Cladogram;
    [..]
    my $trees = Bio::TreeIO->new( -file => "test.nwk", -format => 'newick');
    my $tree = $trees->next_tree();
    [..]
    my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, -tree => $tree, -compact => 0);
    $out->print(-file => "test.eps");

I already tried it by copying the bootstrap values into the ids of the internal nodes - nothing.

Any suggestions?

Thanks,
Alex

---
By the time you've read this, you've already read it!

From jason at bioperl.org Tue Mar 9 20:49:06 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 09 Mar 2010 12:49:06 -0800
Subject: [Bioperl-l] Bio::SearchIO
In-Reply-To: 
References: 
Message-ID: <4B96B442.8070003@bioperl.org>

SearchIO writer -> BLAST format.

presumably something like Bio::SearchIO::Writer::TextResultWriter

Janine Arloth wrote, On 3/5/10 1:43 AM:
> Hello,
> using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment:
>
>                    1                                                   50
> test/1-85          ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
> lcl|3013/20-104    ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA
>
>
>                    51                                                 100
> test/1-85          TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
> lcl|3013/20-104    TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA
>
>
>
> But I prefer this format:
>
>
>
> Query  1   ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  60
>            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  20  ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA  79
>
> Query  61  TGGCCTTTTGTCTTTCTCCAAAGCA  85
>            |||||||||||||||||||||||||
> Sbjct  80  TGGCCTTTTGTCTTTCTCCAAAGCA  104
>
>
> How can I get this?
>
> Best Regards
>

From bhakti.dwivedi at gmail.com Tue Mar 9 20:58:34 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Tue, 9 Mar 2010 15:58:34 -0500
Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report?
Message-ID: 

Hi,

I have a blastn and blastx report (both in blast table m-8 format) against the ncbi nr database. Based on the Hits Start and End positions, how can I retrieve the gene name/acc/id? The blast table does show the hit organism accession number, but what I want is specifically the gene that it is hitting. Is there a way to do this in bioperl?

Thanks

From David.Messina at sbc.su.se Tue Mar 9 21:39:08 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 9 Mar 2010 22:39:08 +0100
Subject: [Bioperl-l] How to retrieve the Gene Info from the hit genomes start and end positions in the blast table report?
In-Reply-To: 
References: 
Message-ID: 

Hi Bhakti,

Forgive me if the below shows that I've totally misunderstood; it's late here.

> The blast table does show the hit organism
> accession number,

As you say, in BLAST -m 8 reports, the hit's accession number is the second column. I'm not sure when this would be different from the gene's accession number, at least for the entries in nr for which a gene name has been assigned (some are known only by their accession number).

> Based on the Hits Start and End positions, how can I
> retrieve the gene name/acc/id?

The short answer is 'you can't'.

But this makes me think that you're not going against the nr database, but instead whole genome or chromosome sequence records. In which case some of them will have genes annotated in the feature table, which you can get out using BioPerl:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

But many (most?) won't be annotated in this way, in which case you will need to find some file or database that has all the genes' start and stop positions in the sequence that you're searching.

Perhaps you could provide a couple of your hits as examples so the problem is clearer?

Dave

From till.bayer at kaust.edu.sa Wed Mar 10 08:20:15 2010
From: till.bayer at kaust.edu.sa (Till Bayer)
Date: Wed, 10 Mar 2010 11:20:15 +0300
Subject: [Bioperl-l] Bio::Index::Blast bug
Message-ID: <4B97563F.3020901@kaust.edu.sa>

Hi all!

I tried to use Bio::Index::Blast, but always got the first hit back, no matter what ID I used. The reason is that the Blast indexer seems to use 'BLAST' as a record separator in all cases, except for RPS-BLAST. I think however that for the current versions of blastall and blast+ 'Query=' should be used.

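To make the record-separator point concrete, here is a small stand-alone sketch of indexing a multi-query text report by the byte offset of each `Query=` line. It is not the Bio::Index::Blast implementation; the helper names `index_by_query` and `fetch_record` and the toy report are hypothetical, and real BLAST output has considerably more structure than this.

```perl
use strict;
use warnings;

# Map each query ID to the byte offset of its "Query=" line.
sub index_by_query {
    my ($fh) = @_;
    my %offset;
    while (1) {
        my $pos  = tell($fh);          # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        $offset{$1} = $pos if $line =~ /^Query=\s*(\S+)/;
    }
    return \%offset;
}

# Return one record: from its "Query=" line up to (not including) the next one.
sub fetch_record {
    my ($fh, $offset, $id) = @_;
    return unless exists $offset->{$id};
    seek($fh, $offset->{$id}, 0);
    my $record = <$fh>;
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^Query=/;
        $record .= $line;
    }
    return $record;
}

# Toy stand-in for a multi-query report with a single header (not real output).
my $report = join '',
    "BLASTN header line\n\n",
    "Query= seqA\n", "  hits for seqA\n\n",
    "Query= seqB\n", "  hits for seqB\n\n";

open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $index = index_by_query($fh);
print fetch_record($fh, $index, 'seqB');
close $fh;
```

Note that in this sketch the text before the first `Query=` line (the program header) belongs to no record, so anything a parser needs from it would have to be kept separately.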
Thus, changing line 222 in Blast.pm from

    $indexpoint = tell($BLAST) - length $_ if ( $prefix eq 'RPS-' );

to

    $indexpoint = tell($BLAST) - length $_;

makes it work for me. However I have no idea what RPS-BLAST may be, or what different versions of blast output are used, so maybe someone who knows should have a look at that before changing things, and write a cleaner version than the above hack.

Cheers,
Till

-- 
Till Bayer
4700 King Abdullah University for Science and Technology
Building 2, Room 4231-W16
Thuwal 23955-6900
Saudi Arabia
Phone: +96628082373

From avilella at gmail.com Wed Mar 10 08:55:09 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 10 Mar 2010 08:55:09 +0000
Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores
Message-ID: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com>

Hi all,

I would like to know if anyone knows of a script or method in bioperl to do an unambiguous assembly of fastq sequences, combining the q-scores to give assembled fastq sequences as the output.

By unambiguous I mean something like what abyss would produce with these options:

    ABYSS -k$k -b0 -t0 -e0 -c0

but giving assembled fastq sequences with combined q-scores as output instead of simple fasta assembled sequences.

Thanks in advance

From sdavis2 at mail.nih.gov Wed Mar 10 10:31:50 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 10 Mar 2010 05:31:50 -0500
Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores
In-Reply-To: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com>
References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com>
Message-ID: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com>

On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote:
> Hi all,
>
> I would like to know if anyone knows of a script or method in bioperl
> to do an unambiguous assembly of fastq sequences, combining the q-scores to
> give assembled fastq sequences as the output.
>
> By unambiguous I mean something like what abyss would produce with these options:
>
> ABYSS -k$k -b0 -t0 -e0 -c0
>
> but giving assembled fastq sequences with combined q-scores as output
> instead of simple
> fasta assembled sequences.

Hi, Albert.

I'm not sure exactly what you want here, but have you looked at the Mosaik aligner? Also, look at samtools pileup; you can probably produce something similar to what you want from it as well.

I certainly might have misunderstood the problem, though.

Sean

From biopython at maubp.freeserve.co.uk Wed Mar 10 10:35:56 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Mar 2010 10:35:56 +0000
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <4B97563F.3020901@kaust.edu.sa>
References: <4B97563F.3020901@kaust.edu.sa>
Message-ID: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>

On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
> Hi all!
>
> I tried to use Bio::Index::Blast, but always got the first hit back, no
> matter what ID I used. The reason is that the Blast indexer seems to use
> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
> I think however that for the current versions of blastall and blast+
> 'Query=' should be used.

That fits with changes I had to make in Biopython for breaking up the plain text BLAST output into each query. For a while only the RPS-BLAST report omitted the "header" (the BLAST line and the journal references users should cite) between records, but now all the NCBI BLAST tools do this - forcing us to look for the Query= line.

i.e. I can't comment on the BioPerl change itself, but your reasoning about the BLAST output makes sense.

Peter

From avilella at gmail.com Wed Mar 10 10:47:01 2010
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 10 Mar 2010 10:47:01 +0000
Subject: [Bioperl-l] unambiguous assembly of fastq reads into fastq sequences combining q-scores
In-Reply-To: <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com>
References: <358f4d651003100055u375c7b61kc7a46a76df8854a0@mail.gmail.com> <264855a01003100231j2e4aeab4t4b84fe01d0005936@mail.gmail.com>
Message-ID: <358f4d651003100247k789344a2m2decd7283e658de9@mail.gmail.com>

Hi Sean,

By unambiguous assembly of reads I mean that one would not squash bubbles or trim branches, but simply collapse fully overlapping (embedded) reads by combining the q-scores, or raising the q-scores if you want, and keeping branching graphs separate. This unambiguous de novo assembly would discard depth information, which is important if you are doing digital gene expression analysis, but would produce a collapsed fastq set of sequences that would be leaner for downstream processing.

I'll have a look at Mosaik. I tried samtools pileup, but it seems a bit overcomplicated to have to map back the reads if what you want to do is just have the assembled reads with fastq scores coming out of the assembler in the first place.

That's why I was thinking it would be good if this unambiguous or "dummy" fastq assembly output could fit into a bioperl script or method.

Cheers

On Wed, Mar 10, 2010 at 10:31 AM, Sean Davis wrote:
> On Wed, Mar 10, 2010 at 3:55 AM, Albert Vilella wrote:
>> Hi all,
>>
>> I would like to know if anyone knows of a script or method in bioperl
>> to do an unambiguous assembly of fastq sequences, combining the q-scores to
>> give assembled fastq sequences as the output.
>>
>> By unambiguous I mean something like what abyss would produce with these options:
>>
>> ABYSS -k$k -b0 -t0 -e0 -c0
>>
>> but giving assembled fastq sequences with combined q-scores as output
>> instead of simple
>> fasta assembled sequences.
>
> Hi, Albert.
>
> I'm not sure exactly what you want here, but have you looked at the
> Mosaik aligner? Also, look at samtools pileup; you can probably
> produce something similar to what you want from it as well.
>
> I certainly might have misunderstood the problem, though.
>
> Sean
>

From adsj at novozymes.com Wed Mar 10 13:46:02 2010
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 10 Mar 2010 14:46:02 +0100
Subject: [Bioperl-l] [PATCH] Fix infinite loop in EMBL writer.
Message-ID: <87k4tke1d1.fsf@topper.koldfront.dk>

This fix is an exact duplicate of the fix for bug #2915 - of the Genbank writer, which was fixed in revision 16275.

---
 Bio/SeqIO/embl.pm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Bio/SeqIO/embl.pm b/Bio/SeqIO/embl.pm
index cfea1b6..de1bf11 100644
--- a/Bio/SeqIO/embl.pm
+++ b/Bio/SeqIO/embl.pm
@@ -1432,7 +1432,7 @@ sub _write_line_EMBL_regex {
 
     CHUNK: while($line) {
         foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
-            if ($line =~ m/^(.{1,$subl})($pat)(.*)/ ) {
+            if ($line =~ m/^(.{0,$subl})($pat)(.*)/ ) {
                 my $l = $1.$2;
                 $l =~ s/#/ /g # remove word wrap protection char '#'
                     if $pre1 eq "RA ";
@@ -1441,6 +1441,7 @@ sub _write_line_EMBL_regex {
                 # be strict about not padding spaces according to
                 # genbank format
                 $l =~ s/\s+$//;
+                next CHUNK if ($l eq '');
                 push(@lines, $l);
                 next CHUNK;
             }
-- 
1.6.3.3

-- 
Adam Sjøgren
adsj at novozymes.com

From cjfields at illinois.edu Wed Mar 10 14:27:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 08:27:59 -0600
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
Message-ID: 

On Mar 10, 2010, at 4:35 AM, Peter wrote:

> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>> Hi all!
>>
>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>> matter what ID I used. The reason is that the Blast indexer seems to use
>> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
>> I think however that for the current versions of blastall and blast+
>> 'Query=' should be used.
>
> That fits with changes I had to make in Biopython for breaking
> up the plain text BLAST output into each query. For a while only
> the RPS-BLAST report omitted the "header" (the BLAST line
> and the journal references users should cite) between records,
> but now all the NCBI BLAST tools do this - forcing us to look
> for the Query= line.
>
> i.e. 
I can't comment on the BioPerl change itself, but your
> reasoning about the BLAST output makes sense.
>
> Peter

One side-effect of this is we will be missing the search algorithm and a few small odds and ends from all but the first report; this trickles down into how we properly deal with HSP coordinates, but we can probably wrangle some magic there to get things working for the most part.

This is similar to how XML format is currently dealt with (and another reason this format is the easiest to support, as it doesn't change based on NCBI's whims).

Do we have example reports with multiple queries from BLAST+ available? It would be invaluable for the projects; if not I can probably generate a few locally.

chris

From biopython at maubp.freeserve.co.uk Wed Mar 10 14:40:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Mar 2010 14:40:16 +0000
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: 
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com>
Message-ID: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>

On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote:
> On Mar 10, 2010, at 4:35 AM, Peter wrote:
>
>> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>>> Hi all!
>>>
>>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>>> matter what ID I used. The reason is that the Blast indexer seems to use
>>> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
>>> I think however that for the current versions of blastall and blast+
>>> 'Query=' should be used.
>>
>> That fits with changes I had to make in Biopython for breaking
>> up the plain text BLAST output into each query. For a while only
>> the RPS-BLAST report omitted the "header" (the BLAST line
>> and the journal references users should cite) between records,
>> but now all the NCBI BLAST tools do this - forcing us to look
>> for the Query= line.
>>
>> i.e. 
I can't comment on the BioPerl change itself, but your
>> reasoning about the BLAST output makes sense.
>>
>> Peter
>
> One side-effect of this is we will be missing the search
> algorithm and a few small odds and ends from all but
> the first report; this trickles down into how we properly
> deal with HSP coordinates, but we can probably wrangle
> some magic there to get things working for the most part.
> ...

Yeah - I had similar issues with the Biopython plain text BLAST parser. The hack/magic I used was to cache the header text from the first record and then re-insert it on subsequent records. Nasty, but works.

> This is similar to how XML format is currently dealt with
> (and another reason this format is the easiest to support,
> as it doesn't change based on NCBI's whims).

They may have changed a few things here too - watch out.

> Do we have example reports with multiple queries from
> BLAST+ available? It would be invaluable for the projects;
> if not I can probably generate a few locally.

I've got one example in Biopython's unit tests,
http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt

Peter

From cjfields at illinois.edu Wed Mar 10 15:19:42 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Mar 2010 09:19:42 -0600
Subject: [Bioperl-l] Bio::Index::Blast bug
In-Reply-To: <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>
References: <4B97563F.3020901@kaust.edu.sa> <320fb6e01003100235i64d5bbfu1b7fcfde006f940b@mail.gmail.com> <320fb6e01003100640p3a9ac966wed41943d95dbfb84@mail.gmail.com>
Message-ID: <27C91884-E910-4BDF-B777-B90E7B4F9103@illinois.edu>

On Mar 10, 2010, at 8:40 AM, Peter wrote:

> On Wed, Mar 10, 2010 at 2:27 PM, Chris Fields wrote:
>> On Mar 10, 2010, at 4:35 AM, Peter wrote:
>>
>>> On Wed, Mar 10, 2010 at 8:20 AM, Till Bayer wrote:
>>>> Hi all!
>>>>
>>>> I tried to use Bio::Index::Blast, but always got the first hit back, no
>>>> matter what ID I used. 
The reason is that the Blast indexer seems to use
>>>> 'BLAST' as a record separator in all cases, except for RPS-BLAST.
>>>> I think however that for the current versions of blastall and blast+
>>>> 'Query=' should be used.
>>>
>>> That fits with changes I had to make in Biopython for breaking
>>> up the plain text BLAST output into each query. For a while only
>>> the RPS-BLAST report omitted the "header" (the BLAST line
>>> and the journal references users should cite) between records,
>>> but now all the NCBI BLAST tools do this - forcing us to look
>>> for the Query= line.
>>>
>>> i.e. I can't comment on the BioPerl change itself, but your
>>> reasoning about the BLAST output makes sense.
>>>
>>> Peter
>>
>> One side-effect of this is we will be missing the search
>> algorithm and a few small odds and ends from all but
>> the first report; this trickles down into how we properly
>> deal with HSP coordinates, but we can probably wrangle
>> some magic there to get things working for the most part.
>> ...
>
> Yeah - I had similar issues with the Biopython plain
> text BLAST parser. The hack/magic I used was to
> cache the header text from the first record and then
> re-insert it on subsequent records. Nasty, but works.

Right, but here's the side-effect: unless that data is somehow stored when indexing, it will not be caught if one starts an IO stream at any point past the BLAST header (in other words, all but the first report). We could, in effect, store that as meta information somehow (I think Index may have some meta storage), or just parse it prior to initiating the stream and pass the information into the IO object.

>> This is similar to how XML format is currently dealt with
>> (and another reason this format is the easiest to support,
>> as it doesn't change based on NCBI's whims).
>
> They may have changed a few things here too - watch out.

Ugh.

>> Do we have example reports with multiple queries from
>> BLAST+ available? 
It would be invaluable for the projects; >> if not I can probably generate a few locally. > > I've got one example in Biopython's unit tests, > http://biopython.org/SRC/biopython/Tests/Blast/bt081.txt > > Peter Okay, will start up some work to work out tests, etc. chris From thomas.sharpton at gmail.com Wed Mar 10 15:30:37 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 10 Mar 2010 07:30:37 -0800 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser Message-ID: Hey everyone, Since HMMER version 3 went live in the middle of last month, I thought it a good time to update the SearchIO parser I've been working on for some time and submit the tool to the community (finally....). At the moment, the module seems capable of parsing hmmsearch and hmmscan outputs, both with and without the alignment option. Some aspects of functionality have yet to be fleshed out, but this one should be capable of doing most of your day-to-day procedures (at least it appears to on my end). I'd love to have people play with it and I'm happy to hear feedback, criticism, development requests and bug reports. That said, this is the first code I've contributed to BioPerl, so please be gentle ;). You can find the bioperl-hmmer3 package in bioperl-dev. I've included a test script as well as sample hmmscan/hmmsearch report files and test data in the bioperl-hmmer3 root directory. As an aside, BioPerl has been a wonderful resource for me and I'm glad to be giving back, even if only a little. I hope this helps out at least a few of you. All the best, Tom From cjfields at illinois.edu Wed Mar 10 15:53:41 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 10 Mar 2010 09:53:41 -0600 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser In-Reply-To: References: Message-ID: <1268236421.20872.21.camel@pyrimidine.igb.uiuc.edu> Wonderful! Tom, thanks for your hard work!
chris On Wed, 2010-03-10 at 07:30 -0800, Thomas Sharpton wrote: > Hey everyone, > > Since HMMER version 3 went live in the middle of last month, I thought > it a good time to update the SearchIO parser I've been working on for > some time and submit the tool to the community (finally....). At the > moment, the module seems capable of parsing hmmsearch and hmmscan > outputs, both with and without the alignment option. Some aspects of > functionality have yet to be fleshed out, but this one should be > capable of doing most of your day-to-day procedures (at least it > appears to on my end). > > I'd love to have people play with it and I'm happy to hear feedback, > criticism, development requests and bug reports. That said, this is > the first code I've contributed to BioPerl, so please be gentle ;). > You can find the bioperl-hmmer3 package in bioperl-dev. I've included > a test script as well as sample hmmscan/hmmsearch report files and > test data in the bioperl-hmmer3 root directory. > > As an aside, BioPerl has been a wonderful resource for me and I'm glad > to be giving back, even if only a little. I hope this helps out at > least a few of you. > > All the best, > Tom > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From asjo at koldfront.dk Wed Mar 10 17:04:00 2010 From: asjo at koldfront.dk (Adam Sjøgren) Date: Wed, 10 Mar 2010 18:04:00 +0100 Subject: [Bioperl-l] Fix infinite loop in EMBL writer. In-Reply-To: <87k4tke1d1.fsf@topper.koldfront.dk> ("Adam Sjøgren"'s message of "Wed, 10 Mar 2010 14:46:02 +0100") References: <87k4tke1d1.fsf@topper.koldfront.dk> Message-ID: <87wrxkw1kv.fsf@topper.koldfront.dk> On Wed, 10 Mar 2010 14:46:02 +0100, Adam wrote: > This fix is an exact duplicate of the fix for bug #2915 - of > the Genbank writer, which was fixed in revision 16275.
I have created bug #3025 in bugzilla with the patch (I couldn't remember whether here or there is most appropriate). Best regards, Adam -- "It isn't modern just because it's electric. Country music was electric too." Adam Sjøgren asjo at koldfront.dk From David.Messina at sbc.su.se Wed Mar 10 17:35:52 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 10 Mar 2010 18:35:52 +0100 Subject: [Bioperl-l] Introducing SearchIOified HMMER v3 parser In-Reply-To: References: Message-ID: Thanks so much, Thomas! I expect to be using Hmmer 3 for my own work fairly soon, so I'm looking forward to taking advantage of this. Dave From rmb32 at cornell.edu Wed Mar 10 20:13:57 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 10 Mar 2010 12:13:57 -0800 Subject: [Bioperl-l] call for help - BioPerl GSoC wiki page Message-ID: <4B97FD85.50402@cornell.edu> Hi all, BioPerl's Google Summer of Code page in support of the Open Bioinformatics Foundation's application to Google Summer of Code is shaping up, but still needs some polishing. We're coming up on the application deadline, and we need to make a good, polished show of it. Please put in a little time to look at, edit, polish, and flesh out the BioPerl and OBF wiki pages in support of our application: BioPerl: http://bioperl.org/wiki/Google_Summer_of_Code OBF: http://open-bio.org/wiki/Google_Summer_of_Code One specific thing for the BioPerl page: the Bio::Assembly project on that page needs to either be fleshed out or removed. Thanks for all the hard work from everyone so far (especially Chris!). It would be *very* good to have some more project ideas and mentor volunteers. So if you haven't already, please consider volunteering to mentor a student. Also, we all know many things that BioPerl needs help with, so if you can think of a good intern project, add it to the page and maybe we can get a GSoC student to work on it.
Rob From nml5566 at gmail.com Wed Mar 10 22:52:19 2010 From: nml5566 at gmail.com (Nathan Liles) Date: Wed, 10 Mar 2010 16:52:19 -0600 Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks? Message-ID: <4B9822A3.2050202@gmail.com> I'm trying to patch Gbrowse to properly display circular segments. Currently, I'm working on getting the protein glyphs to display properly beyond the end of the track. I noticed when I turn on the protein track, it can sometimes affect another track. Specifically, turning on the protein track can either cause the gene glyphs to disappear or be duplicated. This only happens for features with two subfeatures that appear on the panel at opposite ends. This seems strange since I can't imagine how one track could affect another. Has anyone noticed this behavior before? Can anybody think of a way that the protein glyph module can affect other glyphs? Thanks, Nathan Liles From me at miguel.weapps.com Thu Mar 11 05:48:17 2010 From: me at miguel.weapps.com (Luis M Rodriguez-R) Date: Thu, 11 Mar 2010 00:48:17 -0500 Subject: [Bioperl-l] PSI-BLAST uncommon result Message-ID: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> Hello all, I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception: MSG: no data for midline ***** No hits found ****** STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792 My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail. Best regards, Luis M. 
Rodriguez-R [http://bioinf.uniandes.edu.co/~miguel/] --------------------------------- Unidad de Bioinformática del Laboratorio de Micología y Fitopatología Universidad de Los Andes, Colombia [http://bioinf.uniandes.edu.co] + 57 1 3394949 ext 2619 luisrodr at uniandes.edu.co me at miguel.weapps.com BLASTP 2.2.18 [Mar-02-2008] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. Reference for composition-based statistics starting in round 2: Schaffer, Alejandro A., L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. Query= eff254 (67 letters) Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects 10,383,435 sequences; 3,542,477,638 total letters Searching..................................................done Results from round 1 Score E Sequences producing significant alignments: (bits) Value ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...
127 5e-28 >ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin hrpA gb|ABA39805.1| HrpA [Erwinia pyrifoliae] emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM 12163] Length = 67 Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust. Identities = 67/67 (100%), Positives = 67/67 (100%) Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 Query: 61 AAKAIQF 67 AAKAIQF Sbjct: 61 AAKAIQF 67 Searching..................................................done ***** No hits found ****** Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 863,709,833 Number of sequences in database: 2,562,282 Database: /storage1/databases/ncbi-blast/nr.01 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 936,189,781 Number of sequences in database: 2,674,439 Database: /storage1/databases/ncbi-blast/nr.02 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 974,890,473 Number of sequences in database: 2,826,395 Database: /storage1/databases/ncbi-blast/nr.03 Posted date: Jan 24, 2010 4:41 AM Number of letters in database: 767,687,551 Number of sequences in database: 2,320,319 Lambda K H 0.297 0.107 0.256 Lambda K H 0.267 0.0344 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 480,706,425 Number of Sequences: 10383435 Number of extensions: 8598061 Number of successful extensions: 47335 Number of sequences 
better than 1.0e-25: 1 Number of HSP's better than 0.0 without gapping: 2 Number of HSP's successfully gapped in prelim test: 0 Number of HSP's that attempted gapping in prelim test: 47333 Number of HSP's gapped (non-prelim): 2 length of query: 67 length of database: 3,542,477,638 effective HSP length: 39 effective length of query: 28 effective length of database: 3,137,523,673 effective search space: 87850662844 effective search space used: 87850662844 T: 11 A: 40 X1: 16 ( 6.9 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 43 (21.7 bits) S2: 298 (119.7 bits) From jason at bioperl.org Thu Mar 11 08:13:24 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 00:13:24 -0800 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: References: Message-ID: <4B98A624.7020102@bioperl.org> Not sure if the cladogram is printing bootstraps from the internal ID or the bootstrap function. See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to bootstrap slots, basically by using -internal_node_id => 'bootstrap' in the TreeIO initialization. You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too. -jason Alexander Donath wrote, On 3/9/10 10:00 AM: > Hi, > > using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and > bootstrap values and try to plot the tree as cladogram. But somehow I > cannot print the bootstrap values. > > Short example: > > test.nwk > ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); > > > > [..] > use Bio::TreeIO; > use Bio::Tree::Draw::Cladogram; > [..] > my $trees = Bio::TreeIO->new( -file => "test.nwk", > -format => 'newick'); > my $tree = $trees->next_tree(); > [..]
> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, > -tree => $tree, > -compact => 0); > > $out->print(-file => "test.eps"); > > > I already tried it by copying the bootstrap values into the ids of the > internal nodes - nothing. Any suggestions? > > > Thanks, > Alex > > --- > By the time you've read this, you've already read it! > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Mar 11 14:27:33 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 08:27:33 -0600 Subject: [Bioperl-l] PSI-BLAST uncommon result In-Reply-To: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> References: <049170A6-F83E-453A-A7B7-832E75916E9D@miguel.weapps.com> Message-ID: <70AF1FA5-FD88-48E3-A672-F72B9D3E1B3B@illinois.edu> Luis, The best way to handle this is to attach the problematic report (not append it) to a bug report on bugzilla. This ensures we aren't running into artifacts generated via the email client, etc. chris On Mar 10, 2010, at 11:48 PM, Luis M Rodriguez-R wrote: > Hello all, > > I'm having a weird result in PSI-BLAST (weird but possible) that can't be parsed by bioperl: 1 result in the first round (or identical results in the aligned regions) and no hits in the 2nd round. Bioperl thinks '*** No hits found ***' is a part of the alignment and dies with the exception: > MSG: no data for midline ***** No hits found ****** > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.0/Bio/Root/Root.pm:357 > STACK: Bio::SearchIO::blast::next_result /usr/local/share/perl/5.10.0/Bio/SearchIO/blast.pm:1792 > My workaround was to use the XML output, but it's still a bug (I think). I append the example PSI-BLAST output at the end of the mail. > > Best regards, > > Luis M. 
Rodriguez-R > [http://bioinf.uniandes.edu.co/~miguel/] > --------------------------------- > Unidad de Bioinformática del Laboratorio de Micología y Fitopatología > Universidad de Los Andes, Colombia > [http://bioinf.uniandes.edu.co] > > + 57 1 3394949 ext 2619 > luisrodr at uniandes.edu.co > me at miguel.weapps.com > > > BLASTP 2.2.18 [Mar-02-2008] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. > > > Reference for compositional score matrix adjustment: Altschul, Stephen F., > John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, > Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches > using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. > > > Reference for composition-based statistics starting in round 2: > Schaffer, Alejandro A., L. Aravind, Thomas L. Madden, > Sergei Shavirin, John L. Spouge, Yuri I. Wolf, > Eugene V. Koonin, and Stephen F. Altschul (2001), > "Improving the accuracy of PSI-BLAST protein database searches with > composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > Query= eff254 > (67 letters) > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 10,383,435 sequences; 3,542,477,638 total letters > > Searching..................................................done > > > Results from round 1 > > > Score E > Sequences producing significant alignments: (bits) Value > > ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc se...
127 5e-28 > >> ref|YP_002650062.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation > pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] > sp|Q3HY20.1|HRPA_ERWPY RecName: Full=Hrp pili protein hrpA; AltName: Full=TTSS pilin > hrpA > gb|ABA39805.1| HrpA [Erwinia pyrifoliae] > emb|CAX56860.1| hrp/hrc Type III secretion system-Hrp/hrc secretion/translocation > pathway-hrp pilin [Erwinia pyrifoliae Ep1/96] > emb|CAY75708.1| Hrp pili protein HrpA (TTSS pilin HrpA) [Erwinia pyrifoliae DSM > 12163] > Length = 67 > > Score = 127 bits (318), Expect = 5e-28, Method: Compositional matrix adjust. > Identities = 67/67 (100%), Positives = 67/67 (100%) > > Query: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 > MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN > Sbjct: 1 MSGLLTSASSSASKTLESAMGQSLTESANAQASKMKMDTQNSILDGKMDSASKSLNSGHN 60 > > Query: 61 AAKAIQF 67 > AAKAIQF > Sbjct: 61 AAKAIQF 67 > > > Searching..................................................done > > > > ***** No hits found ****** > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 863,709,833 > Number of sequences in database: 2,562,282 > > Database: /storage1/databases/ncbi-blast/nr.01 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 936,189,781 > Number of sequences in database: 2,674,439 > > Database: /storage1/databases/ncbi-blast/nr.02 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 974,890,473 > Number of sequences in database: 2,826,395 > > Database: /storage1/databases/ncbi-blast/nr.03 > Posted date: Jan 24, 2010 4:41 AM > Number of letters in database: 767,687,551 > Number of sequences in database: 2,320,319 > > Lambda K H > 0.297 0.107 0.256 > > Lambda K H > 0.267 0.0344 0.140 > > > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Hits to DB: 
480,706,425 > Number of Sequences: 10383435 > Number of extensions: 8598061 > Number of successful extensions: 47335 > Number of sequences better than 1.0e-25: 1 > Number of HSP's better than 0.0 without gapping: 2 > Number of HSP's successfully gapped in prelim test: 0 > Number of HSP's that attempted gapping in prelim test: 47333 > Number of HSP's gapped (non-prelim): 2 > length of query: 67 > length of database: 3,542,477,638 > effective HSP length: 39 > effective length of query: 28 > effective length of database: 3,137,523,673 > effective search space: 87850662844 > effective search space used: 87850662844 > T: 11 > A: 40 > X1: 16 ( 6.9 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 43 (21.7 bits) > S2: 298 (119.7 bits) > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Mar 11 15:38:50 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 07:38:50 -0800 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: References: <4B98A624.7020102@bioperl.org> Message-ID: <4B990E8A.5060704@bioperl.org> Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. -jason Alexander Donath wrote, On 3/11/10 1:05 AM: > I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. > > Nevertheless, iterating through the tree and printing $node->bootstrap > worked in both cases and gave me the correct bootstrap values of the > inner nodes. > > I also called move_id_to_bootstrap on the tree. But this resulted in > an error: > > Can't locate object method "move_id_to_bootstrap" via package > "Bio::Tree::Tree". > Even though it's inherited from the interface, as far as I can tell. 
> > > alex > > > On Thu, 11 Mar 2010, Jason Stajich wrote: > >> not sure if the cladogram is printing bootstraps from the internal id >> or the bootstrap function. >> >> See the example code here http://bioperl.org/wiki/HOWTO:Trees that >> shows how to automatically convert internal IDs to bootstrap slots >> basically by using >> -internal_node_id => 'bootstrap' >> in the TreeIO initialization. >> >> You may want to iterate through the tree and print $node->bootstrap >> where you think it should be so you can verify that it is working too. >> >> -jason >> >> Alexander Donath wrote, On 3/9/10 10:00 AM: >>> Hi, >>> >>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths >>> and bootstrap values and try to plot the tree as cladogram. But >>> somehow I cannot print the bootstrap values. >>> >>> Short example: >>> >>> test.nwk >>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >>> >>> >>> >>> >>> [..] >>> use Bio::TreeIO; >>> use Bio::Tree::Draw::Cladogram; >>> [..] >>> my $trees = Bio::TreeIO->new( -file => "test.nwk", >>> -format => 'newick'); >>> my $tree = $trees->next_tree(); >>> [..] >>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >>> -tree => $tree, >>> -compact => 0); >>> >>> $out->print(-file => "test.eps"); >>> >>> >>> I already tried it by copying the bootstrap values into the ids of the >>> internal nodes - nothing. Any suggestions? >>> >>> >>> Thanks, >>> Alex >>> >>> --- >>> By the time you've read this, you've already read it! >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > --- > Alexander Donath > Professur für Bioinformatik > Institut für Informatik > Universität Leipzig > Härtelstr. 16-18 > D-04107 Leipzig, Germany > > phone: +49 (0)341 97-16702 > fax: +49 (0)341 97-16679 > > By the time you've read this, you've already read it!
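The message that follows points out that distance() compares exactly TWO nodes per call, while the quoted script hands it every leaf at once. A minimal sketch of one way to get all leaf-to-leaf distances, assuming a Newick file passed as the first command-line argument: call distance() once per unordered pair of leaves. The helper names all_pairs and print_leaf_distances are made up for illustration and are not part of BioPerl; the BioPerl calls used (Bio::TreeIO->new, next_tree, get_leaf_nodes, distance with -nodes, id) are the ones already named in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return every unordered
# pair from a list as array references, e.g. (a, b, c) gives
# [a, b], [a, c], [b, c].
sub all_pairs {
    my @items = @_;
    my @pairs;
    for my $i (0 .. $#items - 1) {
        for my $j ($i + 1 .. $#items) {
            push @pairs, [ $items[$i], $items[$j] ];
        }
    }
    return @pairs;
}

# Hypothetical helper: print one tab-separated line per pair of leaves.
# BioPerl is loaded lazily so all_pairs() can be tried on its own.
sub print_leaf_distances {
    my ($treefile) = @_;
    require Bio::TreeIO;
    my $treeio = Bio::TreeIO->new(-format => 'newick',
                                  -file   => $treefile);
    while (my $tree = $treeio->next_tree) {
        my @leaves = $tree->get_leaf_nodes;
        for my $pair (all_pairs(@leaves)) {
            my ($node1, $node2) = @$pair;
            # distance() is given exactly two nodes per call
            my $dist = $tree->distance(-nodes => [ $node1, $node2 ]);
            printf "%s\t%s\t%s\n", $node1->id, $node2->id, $dist;
        }
    }
}

# Only touch a tree file when one is named on the command line.
print_leaf_distances($ARGV[0]) if @ARGV;
```

Saved under any script name and run with the tree file as its only argument, this should print one line per leaf pair. For n leaves that is n*(n-1)/2 calls, which may be slow on large trees.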
From jason at bioperl.org Thu Mar 11 15:40:59 2010 From: jason at bioperl.org (Jason Stajich) Date: Thu, 11 Mar 2010 07:40:59 -0800 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: References: Message-ID: <4B990F0B.8010100@bioperl.org> You should only have TWO nodes in the array not all the leaves. =head2 distance Title : distance Usage : distance(-nodes => \@nodes ) Function: returns the distance between TWO given nodes Returns : numerical distance Args : -nodes => arrayref of nodes to test or ($node1, $node2) =cut Jeffrey Detras wrote, On 3/4/10 10:17 PM: > Hi, > > I am new at using the Bio::TreeIO module specifically using the newick > format for a phylogenetic analysis. The sample_tree attached is > Newick-formatted tree. My objective is to get all the distances between all > the leaf nodes. I copied examples of the code from > http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my > knowledge) so that I understand how to assign the right array value for the > nodes/leaves. The message would say must provide 2 root nodes. > > Here is what I have right now: > > #!/usr/bin/perl -w > use strict; > > my $treefile = 'sample_tree'; > use Bio::TreeIO; > my $treeio = Bio::TreeIO->new(-format => 'newick', > -file => $treefile); > > while (my $tree = $treeio->next_tree) { > my @leaves = $tree->get_leaf_nodes; > for (my $dist = $tree->distance(-nodes => \@leaves)){ > print "Distance between trees is $dist\n"; > } > } > > Thanks, > Jeff > > > ------------------------------------------------------------------------ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Mar 11 16:11:04 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 11:11:04 -0500 Subject: [Bioperl-l] Can protein glyph tracks interfere with other tracks? 
In-Reply-To: <4B9822A3.2050202@gmail.com> References: <4B9822A3.2050202@gmail.com> Message-ID: <4536f7701003110811s79c30638x100ae521bce1084a@mail.gmail.com> Hi Nathan, Well, it certainly shouldn't! The tracks are supposed to be calculated independently without reusing anything. Debugging should be fun though. Does it matter if you change the adaptor (for instance, if you are using the memory adaptor for Bio::DB::SeqFeature::Store, try putting it in a mysql database (or vice versa) to help narrow down where the bug is. Scott On Wed, Mar 10, 2010 at 5:52 PM, Nathan Liles wrote: > I'm trying to patch Gbrowse to properly display circular segments. > Currently, I'm working on getting the protein glyphs to display properly > beyond the end of the track. > > I noticed when I turn on the protein track, it can sometimes affect another > track. Specifically, turning on the protein track can either cause the gene > glyphs to disappear or be duplicated. > This only happens for features with two subfeatures that appear on the panel > at opposite ends. > > This seems strange since I can't imagine how one track could affect another. > Has anyone noticed this behavior before? > Can anybody think of a way that the protein glyph module can affect other > glyphs? > > Thanks, > Nathan Liles > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Thu Mar 11 16:21:02 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 10:21:02 -0600 Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B990E8A.5060704@bioperl.org> References: <4B98A624.7020102@bioperl.org> <4B990E8A.5060704@bioperl.org> Message-ID: <2BBC0220-4233-4EB7-81A8-FA8342ED9714@illinois.edu> Alex, The best thing to do is to file this as a bug so we don't lose track of it, including demonstration code. chris On Mar 11, 2010, at 9:38 AM, Jason Stajich wrote: > Yeah sorry then I don't know what the problem is. The usual - are you using the latest version question applies, but sounds like something else is wrong with this module. > > I don't have any time to try out any code sorry but maybe someone else can step in to give a hand. > -jason > > Alexander Donath wrote, On 3/11/10 1:05 AM: >> I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. >> >> Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. >> >> I also called move_id_to_bootstrap on the tree. But this resulted in an error: >> >> Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". >> Even though it's inherited from the interface, as far as I can tell. >> >> >> alex >> >> >> On Thu, 11 Mar 2010, Jason Stajich wrote: >> >>> not sure if the cladogram is printing bootstraps from the internal id or the bootstrap function. >>> >>> See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how to automatically convert internal IDs to boostrap slots basically by using >>> -internal_node_id => 'bootstrap' >>> in the TreeIO initialization. 
>>> >>> You may want to iterate through the tree and print $node->bootstrap where you think it should be so you can verify that it is working too. >>> >>> -jason >>> >>> Alexander Donath wrote, On 3/9/10 10:00 AM: >>>> Hi, >>>> >>>> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and bootstrap values and try to plot the tree as cladogram. But somehow I cannot print the bootstrap values. >>>> >>>> Short example: >>>> >>>> test.nwk >>>> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >>>> >>>> >>>> >>>> [..] >>>> use Bio::TreeIO; >>>> use Bio::Tree::Draw::Cladogram; >>>> [..] >>>> my $trees = Bio::TreeIO->new( -file => "test.nwk", >>>> -format => 'newick'); >>>> my $tree = $trees->next_tree(); >>>> [..] >>>> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >>>> -tree => $tree, >>>> -compact => 0); >>>> >>>> $out->print(-file => "test.eps"); >>>> >>>> >>>> I already tried it by copying the bootstrap values into the ids of the >>>> internal nodes - nothing. Any suggestions? >>>> >>>> >>>> Thanks, >>>> Alex >>>> >>>> --- >>>> By the time you've read this, you've already read it! >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> --- >> Alexander Donath >> Professur für Bioinformatik >> Institut für Informatik >> Universität Leipzig >> Härtelstr. 16-18 >> D-04107 Leipzig, Germany >> >> phone: +49 (0)341 97-16702 >> fax: +49 (0)341 97-16679 >> >> By the time you've read this, you've already read it!
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Mon Mar 8 21:06:11 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Mon, 08 Mar 2010 16:06:11 -0500 Subject: [Bioperl-l] Next Gen Formats Message-ID: <4B9566C3.6000007@umdnj.edu> Does Bioperl support color-space sequences, or FASTA formatted quality value files? ABI's Solid platform generates a number of files, two of which are fairly important (at the moment): 1) .csfasta Color-space sequences in FASTA format 2) .qual Quality values of each color call, also in FASTA format. I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction? Ryan From alex at bioinf.uni-leipzig.de Thu Mar 11 09:05:13 2010 From: alex at bioinf.uni-leipzig.de (Alexander Donath) Date: Thu, 11 Mar 2010 10:05:13 +0100 (CET) Subject: [Bioperl-l] bootstrap values in cladogram In-Reply-To: <4B98A624.7020102@bioperl.org> References: <4B98A624.7020102@bioperl.org> Message-ID: I tried both, with -internal_node_id => 'bootstrap' and without. Nothing. Nevertheless, iterating through the tree and printing $node->bootstrap worked in both cases and gave me the correct bootstrap values of the inner nodes. I also called move_id_to_bootstrap on the tree. But this resulted in an error: Can't locate object method "move_id_to_bootstrap" via package "Bio::Tree::Tree". Even though it's inherited from the interface, as far as I can tell. alex On Thu, 11 Mar 2010, Jason Stajich wrote: > not sure if the cladogram is printing bootstraps from the internal id or the > bootstrap function.
> > See the example code here http://bioperl.org/wiki/HOWTO:Trees that shows how > to automatically convert internal IDs to bootstrap slots basically by using > -internal_node_id => 'bootstrap' > in the TreeIO initialization. > > You may want to iterate through the tree and print $node->bootstrap where you > think it should be so you can verify that it is working too. > > -jason > > Alexander Donath wrote, On 3/9/10 10:00 AM: >> Hi, >> >> using Bioperl 1.6.1, I'm reading a newick tree with branch lengths and >> bootstrap values and try to plot the tree as cladogram. But somehow I >> cannot print the bootstrap values. >> >> Short example: >> >> test.nwk >> ((seq_1:0.18484,seq_3:0.23183):0.17826[879],seq_2:0.36341,seq_4:0.30326); >> >> >> >> [..] >> use Bio::TreeIO; >> use Bio::Tree::Draw::Cladogram; >> [..] >> my $trees = Bio::TreeIO->new( -file => "test.nwk", >> -format => 'newick'); >> my $tree = $trees->next_tree(); >> [..] >> my $out = Bio::Tree::Draw::Cladogram->new( -bootstrap => 1, >> -tree => $tree, >> -compact => 0); >> >> $out->print(-file => "test.eps"); >> >> >> I already tried it by copying the bootstrap values into the ids of the >> internal nodes - nothing. Any suggestions? >> >> >> Thanks, >> Alex >> >> --- >> By the time you've read this, you've already read it! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l --- Alexander Donath Professur für Bioinformatik Institut für Informatik Universität Leipzig Härtelstr. 16-18 D-04107 Leipzig, Germany phone: +49 (0)341 97-16702 fax: +49 (0)341 97-16679 By the time you've read this, you've already read it!
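The check described in this message (read the tree with -internal_node_id => 'bootstrap', then print $node->bootstrap for the inner nodes) can be sketched roughly as below. This is only an illustration of the verification step, not a fix for the Cladogram output. The helper names format_bootstrap and print_internal_bootstraps are made up for this sketch; the BioPerl calls (Bio::TreeIO->new, next_tree, get_nodes, is_Leaf, get_all_Descendents, id, bootstrap) are existing methods, and the file name is assumed to be given as the first command-line argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: make a missing bootstrap value visible in the output.
sub format_bootstrap {
    my ($value) = @_;
    return defined $value ? "$value" : 'n/a';
}

# Hypothetical helper: list each internal node by the leaves below it,
# followed by the bootstrap value stored on that node.
# BioPerl is loaded lazily so format_bootstrap() can be tried on its own.
sub print_internal_bootstraps {
    my ($treefile) = @_;
    require Bio::TreeIO;
    my $treeio = Bio::TreeIO->new(-file             => $treefile,
                                  -format           => 'newick',
                                  -internal_node_id => 'bootstrap');
    while (my $tree = $treeio->next_tree) {
        for my $node ($tree->get_nodes) {
            next if $node->is_Leaf;    # only inner nodes carry bootstraps
            my @leaf_ids = map  { $_->id }
                           grep { $_->is_Leaf } $node->get_all_Descendents;
            printf "(%s)\tbootstrap=%s\n",
                join(',', @leaf_ids), format_bootstrap($node->bootstrap);
        }
    }
}

# Only touch a tree file when one is named on the command line.
print_internal_bootstraps($ARGV[0]) if @ARGV;
```

If the values show up here but not in the EPS file, that would point at the drawing module rather than at the parser, which matches what is reported above.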
From Alexander.Kanapin at oicr.on.ca Thu Mar 11 15:56:41 2010 From: Alexander.Kanapin at oicr.on.ca (Alexander Kanapin) Date: Thu, 11 Mar 2010 10:56:41 -0500 Subject: [Bioperl-l] GFF to GTF converter Message-ID: Hi BioPerl gurus, Does anybody knows a reliable GFF to GTF converter which can generate files acceptable by cufflinks ? We attempted to convert a drosophila and worm genome GFFs (taken from Flybase and Wormbase ftp) to GTF with Bio::FeatureIO #read from a file my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF'); #write out features my $out = Bio::FeatureIO->new(-file => ">$outfile" , -format => 'GFF' , -version => 2.5); However, we discovered that the resulting file is not compliant with GTF format specifications as they are described here: http://mblab.wustl.edu/GTF22.html Although, this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations. Also, we think it misinterprets annotations, so that one do see UTR entries annotated as CDS' or exons. Many thanks for ideas/notes. Alex -- Alexander Kanapin, PhD Scientific Associate Ontario Institute for Cancer Research MaRS Centre, South Tower 101 College Street, Suite 800 Toronto, Ontario, Canada M5G 0A3 Tel: 647-260-7993 Toll-free: 1-866-678-6427 www.oicr.on.ca This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. 
From cjfields at illinois.edu Thu Mar 11 17:27:35 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 11:27:35 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B9566C3.6000007@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> Message-ID: <7D743CA2-80A1-42E3-81D2-03B7CD01FC69@illinois.edu> Not that I know of, though we are certainly receptive to anyone wanting to work this into the current code. chris On Mar 8, 2010, at 3:06 PM, Ryan Golhar wrote: > Does Bioperl support color-space sequences, or FASTA formatted quality value files? > > ABI's Solid platform generates a number of files, two of which are fairly important (at the moment): > > 1) .csfasta > > Color-space sequences in FASTA format > > 2) .qual > > Quality values of each color call, also in FASTA format. > > I didn't see (at quick glance) support for this in Bioperl, but maybe someone can point me in the right direction? > > Ryan > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Thu Mar 11 17:35:32 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Mar 2010 17:35:32 +0000 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B9566C3.6000007@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> Message-ID: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: > Does Bioperl support color-space sequences, or FASTA formatted quality value > files? > > ABI's Solid platform generates a number of files, two of which are fairly > important (at the moment): > > 1) ?.csfasta > > Color-space sequences in FASTA format > > 2) .qual > > Quality values of each color call, also in FASTA format. You mean the QUAL format which was originally introduced by PHRED. 
Try "qual" as the format name in SeqIO, http://bioperl.org/wiki/HOWTO:SeqIO#Formats > I didn't see (at quick glance) support for this in Bioperl, but maybe > someone can point me in the right direction? I expect that (like in Biopython) you can treat color space FASTA + QUAL just like sequence space files, provided you are happy to interpret the color space strings yourself. Are you hoping to get BioPerl to convert the color space data into sequence space data for you? Peter From cjfields at illinois.edu Thu Mar 11 18:02:43 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 12:02:43 -0600 Subject: [Bioperl-l] GFF to GTF converter In-Reply-To: References: Message-ID: <8CB58FD4-633F-4711-A2F4-23D00AEB6FB8@illinois.edu> On Mar 11, 2010, at 9:56 AM, Alexander Kanapin wrote: > Hi BioPerl gurus, > > Does anybody knows a reliable GFF to GTF converter which can generate files acceptable by cufflinks ? > > We attempted to convert a drosophila and worm genome GFFs (taken from Flybase and Wormbase ftp) to GTF with Bio::FeatureIO > > #read from a file > my $in = Bio::FeatureIO->new(-file => $infile , -format => 'GFF'); > > #write out features > my $out = Bio::FeatureIO->new(-file => ">$outfile" , > -format => 'GFF' , > -version => 2.5); > > However, we discovered that the resulting file is not compliant with GTF format specifications as they are described here: http://mblab.wustl.edu/GTF22.html Just so this is clear, even though the FeatureIO docs currently state (and I quote): "[Bio::FeatureIO] is the officially sanctioned way of getting at the format objects, which most people should use." it is nowhere near complete, so I have removed said quote from main trunk and replaced with it a very explicit caveat about it's current state, i.e. highly experimental and not currently suggested for production use. 
It's basically half-baked right now; I am in the midst of refactoring Bio::FeatureIO to try getting it up to speed and to add in flexibility when parsing this data (I'm actually working on it right now), but it's early days on that and may take a bit. Do realize that, even with a refactored FeatureIO, this is one of the more significant problems with GTF, e.g. there are too many definitions of what constitutes GTF or GFF2, so no clear path on how to go about this. At this point most users end up writing up their own parsers, unfortunately. > Although, this chunk of code produces CDS and exon entries in the output file, it does not output start codon/stop codon annotations. > Also, we think it misinterprets annotations, so that one do see UTR entries annotated as CDS' or exons. The start/stop codons can normally be inferred from the CDS/UTRs and exons if they are provided, but again this is one of those issues where there isn't a lot of consistency with the data across various data sources (something addressed at the recent GMOD meeting). What is the source of your GFF? > Many thanks for ideas/notes. > > Alex > > -- > Alexander Kanapin, PhD > Scientific Associate > > Ontario Institute for Cancer Research > MaRS Centre, South Tower > 101 College Street, Suite 800 > Toronto, Ontario, Canada M5G 0A3 > Tel: 647-260-7993 > Toll-free: 1-866-678-6427 > www.oicr.on.ca > This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. 
chris

From jessica.sun at gmail.com Thu Mar 11 19:38:21 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Thu, 11 Mar 2010 14:38:21 -0500
Subject: [Bioperl-l] Bio-SCF from CPAN == error installation
Message-ID: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com>

I downloaded the Bio-SCF module from CPAN, and when I tried to install it I got the following message. Can someone help? Thanks much in advance.

Note (probably harmless): No library found for -lstaden-read
Writing Makefile for Bio::SCF

How do I obtain the missing library?

--
Jessica Jingping Sun

From cjfields at illinois.edu Thu Mar 11 19:49:51 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Mar 2010 13:49:51 -0600
Subject: [Bioperl-l] Bio-SCF from CPAN == error installation
In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com>
References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com>
Message-ID: <62CF899F-7C31-49F0-8F5E-C99B2179F3A5@illinois.edu>

Did you read the documentation for Bio-SCF?

http://cpansearch.perl.org/src/LDS/Bio-SCF-1.03/INSTALL

chris

On Mar 11, 2010, at 1:38 PM, Jessica Sun wrote:

> I downloaded module Bio-SCF from CPAN.
> And I am trying to install it when I got the following error. Can
> someone help?
Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From scott at scottcain.net Thu Mar 11 20:00:58 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 15:00:58 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4536f7701003111200y7d194b3cp2aabb558dcbea5ca@mail.gmail.com> Hello Jessica, You need the Staden io-lib: http://staden.sourceforge.net/ It looks like 1.12.2 is the most recent release. Scott On Thu, Mar 11, 2010 at 2:38 PM, Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From rmb32 at cornell.edu Thu Mar 11 20:02:28 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 11 Mar 2010 12:02:28 -0800 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> Message-ID: <4B994C54.50501@cornell.edu> Hello Jessica, For Bio-SCF, you have to have the staden package installed. See the INSTALL notes included in the Bio-SCF distribution. The easiest way to view the INSTALL notes for a perl module's distribution: - go to http://search.cpan.org/ - search for 'Bio::SCF' - click the link to the Bio-SCF-1.03 distribution you see in the search results - the page linked here describes the installation package that Bio::SCF comes in. - On that page, you will see a link to the INSTALL notes for it. This is a good thing to know how to do when you have problems with other perl modules as well. But yes, as Chris said, those installation notes direct you to install the staden io-lib libraries from staden.sourceforge.net. Rob Jessica Sun wrote: > *I downloaded module > *>* > Bio-SCF from CPAN. > *>* > And I am trying to install it when I got the following error. Can > *>* someone help? Thanks much in advance > Note (probably harmless): No library found for -lstaden-read > Writing Makefile for Bio::SCF > > how to obtain the missing library > > > * > > > From jessica.sun at gmail.com Thu Mar 11 20:49:49 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Thu, 11 Mar 2010 15:49:49 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <4B994C54.50501@cornell.edu> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> Message-ID: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Thanks, I got it resolve. 

Does anyone know how to add a scale to the BLAST hit image drawn with Bio::Graphics? I mean that the rectangles should have different widths, rather than all being the same width as in the example shown here:

http://www.bioperl.org/wiki/HOWTO:Graphics

Thanks,

On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote:

> Hello Jessica,
>
> For Bio-SCF, you have to have the staden package installed. See the
> INSTALL notes included in the Bio-SCF distribution.
>
> The easiest way to view the INSTALL notes for a perl module's distribution:
> - go to http://search.cpan.org/
> - search for 'Bio::SCF'
> - click the link to the Bio-SCF-1.03 distribution you see in the search
> results
> - the page linked here describes the installation package that Bio::SCF
> comes in.
> - On that page, you will see a link to the INSTALL notes for it.
>
> This is a good thing to know how to do when you have problems with other
> perl modules as well.
>
>
> But yes, as Chris said, those installation notes direct you to install the
> staden io-lib libraries from staden.sourceforge.net.
>
> Rob
>
> Jessica Sun wrote:
>
>> I downloaded module Bio-SCF from CPAN.
>> And I am trying to install it when I got the following error. Can
>> someone help?
Thanks much in advance >> Note (probably harmless): No library found for -lstaden-read >> Writing Makefile for Bio::SCF >> >> how to obtain the missing library >> >> >> * >> >> >> >> > -- Jessica Jingping Sun From scott at scottcain.net Thu Mar 11 21:33:47 2010 From: scott at scottcain.net (Scott Cain) Date: Thu, 11 Mar 2010 16:33:47 -0500 Subject: [Bioperl-l] Bio-SCF from CPAN == error installation In-Reply-To: <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> References: <9adc0e9b1003111138m4197ffb2x4031c107240a0cf9@mail.gmail.com> <4B994C54.50501@cornell.edu> <9adc0e9b1003111249n70dcd666nb88bd745ab87164c@mail.gmail.com> Message-ID: <4536f7701003111333q2105c71ftdab0c0b71372ba9f@mail.gmail.com> Hello Jessica, A few things: * It would be better to start a new thread to ask an unrelated question, since people may see the subject of this thread and ignore it if they don't know the answer to the original question. * Can you please try to ask your question again, with more details? Like what have you done already, what was the result, and what would you like for it to look like. If you want it to look like something that is on the wiki, link to that something. The Howto page you linked to has lots of pictures on it. Scott On Thu, Mar 11, 2010 at 3:49 PM, Jessica Sun wrote: > Thanks, I got it resolve. > > Do any one knows how to add a scale of the blast hit image through > Bio:Graphics, I mean the rectangle should be difference width rather than > the same at the example. shown here > > http://www.bioperl.org/wiki/HOWTO:Graphics > > > > Thanks, > > > > On Thu, Mar 11, 2010 at 3:02 PM, Robert Buels wrote: > >> Hello Jessica, >> >> For Bio-SCF, you have to have the staden package installed. ?See the >> INSTALL notes included in the Bio-SCF distribution. 
>> >> The easiest way to view the INSTALL notes for a perl module's distribution: >> ?- go to http://search.cpan.org/ >> ?- search for 'Bio::SCF' >> ?- click the link to the Bio-SCF-1.03 distribution you see in the search >> results >> ?- the page linked here describes the installation package that Bio::SCF >> comes in. >> ?- On that page, you will see a link to the INSTALL notes for it. >> >> This is a good thing to know how to do when you have problems with other >> perl modules as well. >> >> >> But yes, as Chris said, those installation notes direct you to install the >> staden io-lib libraries from staden.sourceforge.net. >> >> Rob >> >> Jessica Sun wrote: >> >>> *I downloaded module >>> >>> *>* > Bio-SCF from CPAN. >>> *>* > And I am trying to install it when I got the following error. Can >>> *>* someone help? Thanks much in advance >>> Note (probably harmless): No library found for -lstaden-read >>> Writing Makefile for Bio::SCF >>> >>> how to obtain the missing library >>> >>> >>> * >>> >>> >>> >>> >> > > > -- > Jessica Jingping Sun > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From golharam at umdnj.edu Fri Mar 12 02:19:37 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 11 Mar 2010 21:19:37 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> Message-ID: <4B99A4B9.1070901@umdnj.edu> Not convert the sequences, just read the sequence file and allow me to process each one individually, sort of like: $seqio = new Bio::Seq(...) 
while ($seq = $seqio->next_seq) { ... } Peter wrote: > On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >> Does Bioperl support color-space sequences, or FASTA formatted quality value >> files? >> >> ABI's Solid platform generates a number of files, two of which are fairly >> important (at the moment): >> >> 1) .csfasta >> >> Color-space sequences in FASTA format >> >> 2) .qual >> >> Quality values of each color call, also in FASTA format. > > You mean the QUAL format which was originally introduced by PHRED. > Try "qual" as the format name in SeqIO, > http://bioperl.org/wiki/HOWTO:SeqIO#Formats > >> I didn't see (at quick glance) support for this in Bioperl, but maybe >> someone can point me in the right direction? > > I expect that (like in Biopython) you can treat color space FASTA + QUAL > just like sequence space files, provided you are happy to interpret the > color space strings yourself. > > Are you hoping to get BioPerl to convert the color space data into > sequence space data for you? > > Peter > From cjfields at illinois.edu Fri Mar 12 03:35:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 11 Mar 2010 21:35:50 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4B99A4B9.1070901@umdnj.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> Message-ID: Ryan, We would have to see example files to get an idea of how feasible it is. You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual stream, and interleave the two somehow. However, BioPerl qual scores are PHRED-based by default, and I'm not sure how color-space data would work within that schematic. chris On Mar 11, 2010, at 8:19 PM, Ryan Golhar wrote: > Not convert the sequences, just read the sequence file and allow me to > process each one individually, sort of like: > > $seqio = new Bio::Seq(...) > while ($seq = $seqio->next_seq) { > ... 
> } > > Peter wrote: >> On Mon, Mar 8, 2010 at 9:06 PM, Ryan Golhar wrote: >>> Does Bioperl support color-space sequences, or FASTA formatted quality value >>> files? >>> >>> ABI's Solid platform generates a number of files, two of which are fairly >>> important (at the moment): >>> >>> 1) .csfasta >>> >>> Color-space sequences in FASTA format >>> >>> 2) .qual >>> >>> Quality values of each color call, also in FASTA format. >> You mean the QUAL format which was originally introduced by PHRED. >> Try "qual" as the format name in SeqIO, >> http://bioperl.org/wiki/HOWTO:SeqIO#Formats >>> I didn't see (at quick glance) support for this in Bioperl, but maybe >>> someone can point me in the right direction? >> I expect that (like in Biopython) you can treat color space FASTA + QUAL >> just like sequence space files, provided you are happy to interpret the >> color space strings yourself. >> Are you hoping to get BioPerl to convert the color space data into >> sequence space data for you? >> Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From avilella at gmail.com Fri Mar 12 07:28:20 2010 From: avilella at gmail.com (Albert Vilella) Date: Fri, 12 Mar 2010 07:28:20 +0000 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A3969F1.8080002@sendu.me.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <4A3933D0.4040808@sendu.me.uk> <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu> <4A3969F1.8080002@sendu.me.uk> Message-ID: <358f4d651003112328g2864ef1as7b8c44ce7bb77c82@mail.gmail.com> > I think not. Well, at least SeqFeature::Store doesn't scale. Try storing > millions of features in a database and watch it crawl to complete > unusability. I can't imagine a db scaling to holding hundreds of TB of data > either. I'm also not sure what the benefit is. 
There are already high-speed
> ways of indexing your fastq or bam files.

Hi Sendu,

What are the available options to have a quick indexing of fastq files that can be integrated into bioperl? Bio::Index::fastq can be painfully slow for the latest Illumina runs...

Cheers,

Albert.

From biopython at maubp.freeserve.co.uk Fri Mar 12 10:06:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Mar 2010 10:06:46 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To:
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu>
Message-ID: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com>

On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote:
> Ryan,
>
> We would have to see example files to get an idea of how feasible it is.
> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual
> stream, and interleave the two somehow. However, BioPerl qual
> scores are PHRED-based by default, and I'm not sure how color-space
> data would work within that schematic.
>
> chris

Chris,

I am under the (possibly mistaken) assumption that PHRED scores are used for SOLiD color space QUAL files - the key issue is each score corresponds to the color call in the color sequence.

Ignoring color-space for a moment, are there BioPerl examples of iterating over a pair of sequence-space FASTA and QUAL files? i.e. What you'd get if you had a FASTQ file to iterate over.

[I guess Ryan could just merge the color-space FASTA and QUAL into a color-space FASTQ file and iterate over that] Peter From cjfields at illinois.edu Fri Mar 12 13:04:53 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 07:04:53 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> Message-ID: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> On Mar 12, 2010, at 4:06 AM, Peter wrote: > On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >> Ryan, >> >> We would have to see example files to get an idea of how feasible it is. >> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >> stream, and interleave the two somehow. However, BioPerl qual >> scores are PHRED-based by default, and I'm not sure how color-space >> data would work within that schematic. >> >> chris > > Chris, > > I am under the (possibly mistaken) assumption that PHRED scores > are used for SOLiD color space QUAL files - the key issue is each > score corresponds to the color call in the color sequence. > > Ignoring color-space for a moment, are there BioPerl examples > of iterating over a pair of sequence-space FASTA and QUAL files? > i.e. What you'd get if you had a FASTQ file to iterate over. > > [I guess Ryan could just merge the color-space FASTA and > QUAL into a color-space FASTQ file and iterate over that] > > Peter If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things. Iterating over pairs is something that has popped up before. 
For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like:

--------------------------------
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq');
my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta');
my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual');

while (my $seq = $in->next_seq) {
    $out1->write_fasta($seq);   # sequence only, FASTA-formatted
    $out2->write_qual($seq);    # quality scores only, QUAL-formatted
}
--------------------------------

Note that all use the 'fastq' format instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise).

I'm assuming for input it would be something like:

--------------------------------
use Bio::SeqIO;

my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta');
my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual');
my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq');

# 'qual' parser joins the two streams
while (my $seq = $in2->next_seq($in1)) {
    $out->write_seq($seq);
}
--------------------------------

chris

From biopython at maubp.freeserve.co.uk Fri Mar 12 13:26:39 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Mar 2010 13:26:39 +0000
Subject: [Bioperl-l] Next Gen Formats
In-Reply-To: <4B9A3D14.3010208@umdnj.edu>
References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> <4B9A3D14.3010208@umdnj.edu>
Message-ID: <320fb6e01003120526x7c0c3dddjb4e1422a41968894@mail.gmail.com>

On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote:
>
> Here is an example of a color-space sequence:
>
> In one file (something.csfasta):
>
> >1_30_226_F3
> T210320010.200.03.0110320320220212200122200.2220200
> >1_30_252_F3
> T322220212.133.00.2202322132022202221002011.0011020
>
> The '.'
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter From golharam at umdnj.edu Fri Mar 12 13:43:01 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 13:43:01 +0000 Subject: [Bioperl-l] Next Gen Formats Message-ID: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Direct from sequencing machine ------Original Message------ From: Peter Sender: p.j.a.cock at googlemail.com To: golharam at umdnj.edu Cc: Chris Fields Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Next Gen Formats Sent: Mar 12, 2010 8:26 AM On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: > > Here is an example of a color-space sequence: > > In one file (something.csfasta): > >>1_30_226_F3 > T210320010.200.03.0110320320220212200122200.2220200 >>1_30_252_F3 > T322220212.133.00.2202322132022202221002011.0011020 > > The '.' 
means the color could not be called > > In another file (something.qual): > >>1_30_226_F3 > 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 > 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>1_30_252_F3 > 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 > 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 > > The -1 represents those colors that could not be called. Now that is funny (using -1). True PHRED scores are defined with a logarithm and can't be negative. A score of zero is normally used in this situation since that maps to a probability of error of 1 (i.e. the read is 100% wrong, or 0% true). Where did these files come from? Direct from a sequencing machine or via some third party script? Peter Sent from my Verizon Wireless BlackBerry From cjfields at illinois.edu Fri Mar 12 14:06:51 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 12 Mar 2010 08:06:51 -0600 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... 
chris On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > Direct from sequencing machine > > ------Original Message------ > From: Peter > Sender: p.j.a.cock at googlemail.com > To: golharam at umdnj.edu > Cc: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Next Gen Formats > Sent: Mar 12, 2010 8:26 AM > > On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >> Here is an example of a color-space sequence: >> >> In one file (something.csfasta): >> >>> 1_30_226_F3 >> T210320010.200.03.0110320320220212200122200.2220200 >>> 1_30_252_F3 >> T322220212.133.00.2202322132022202221002011.0011020 >> >> The '.' means the color could not be called >> >> In another file (something.qual): >> >>> 1_30_226_F3 >> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> 1_30_252_F3 >> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >> >> The -1 represents those colors that could not be called. > > Now that is funny (using -1). True PHRED scores are defined with a > logarithm and can't be negative. A score of zero is normally used in > this situation since that maps to a probability of error of 1 (i.e. the > read is 100% wrong, or 0% true). > > Where did these files come from? Direct from a sequencing > machine or via some third party script? 
> > Peter > > > Sent from my Verizon Wireless BlackBerry > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Fri Mar 12 13:09:40 2010 From: golharam at umdnj.edu (Ryan Golhar) Date: Fri, 12 Mar 2010 08:09:40 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> References: <4B9566C3.6000007@umdnj.edu> <320fb6e01003110935t31f7c00an3f33078cfe7c7a1f@mail.gmail.com> <4B99A4B9.1070901@umdnj.edu> <320fb6e01003120206i90a3762if47d0ddd427b9d31@mail.gmail.com> <4F965F47-43DD-4527-8E61-FDCDD4E2AFA8@illinois.edu> Message-ID: <4B9A3D14.3010208@umdnj.edu> Here is an example of a color-space sequence: In one file (something.csfasta): >1_30_226_F3 T210320010.200.03.0110320320220212200122200.2220200 >1_30_252_F3 T322220212.133.00.2202322132022202221002011.0011020 The '.' means the color could not be called In another file (something.qual): >1_30_226_F3 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >1_30_252_F3 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 The -1 represents those colors that could not be called. Chris Fields wrote: > On Mar 12, 2010, at 4:06 AM, Peter wrote: > >> On Fri, Mar 12, 2010 at 3:35 AM, Chris Fields wrote: >>> Ryan, >>> >>> We would have to see example files to get an idea of how feasible it is. >>> You could possibly use a Bio::SeqIO::fasta and a Bio::SeqIO::qual >>> stream, and interleave the two somehow. However, BioPerl qual >>> scores are PHRED-based by default, and I'm not sure how color-space >>> data would work within that schematic. 
>>> >>> chris >> Chris, >> >> I am under the (possibly mistaken) assumption that PHRED scores >> are used for SOLiD color space QUAL files - the key issue is each >> score corresponds to the color call in the color sequence. >> >> Ignoring color-space for a moment, are there BioPerl examples >> of iterating over a pair of sequence-space FASTA and QUAL files? >> i.e. What you'd get if you had a FASTQ file to iterate over. >> >> [I guess Ryan could just merge the color-space FASTA and >> QUAL into a color-space FASTQ file and iterate over that] >> >> Peter > > If they're PHRED scores then it should be fine, though we may need to work in a few color-space specific things. > > Iterating over pairs is something that has popped up before. For output, in the Bio::SeqIO::fastq module there is code for writing fasta/qual (to two separate streams), where I'm assuming one could do something like: > > -------------------------------- > my $in = Bio::SeqIO->new(-format => 'fastq', -file => 'foo.fastq'); > my $out1 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fasta'); > my $out2 = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.qual'); > > while (my $seq = $in->next_seq) { > $out1->write_fasta($seq); > $out2->write_qual($seq); > } > -------------------------------- > > Note that all use the 'fastq' format instead of 'fasta' or 'qual'. This should work for those as well, just haven't tried it myself (it's a bug otherwise).
> > I'm assuming for input it would be something like: > > -------------------------------- > my $in1 = Bio::SeqIO->new(-format => 'fasta', -file => 'foo.fasta'); > my $in2 = Bio::SeqIO->new(-format => 'qual', -file => 'foo.qual'); > my $out = Bio::SeqIO->new(-format => 'fastq', -file => '>foo.fastq'); > > # 'qual' parser joins the two streams > while (my $seq = $in2->next_seq($in1)) { > $out->write_seq($seq); > } > -------------------------------- > > chris > > From pmiguel at purdue.edu Fri Mar 12 14:56:33 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 12 Mar 2010 09:56:33 -0500 Subject: [Bioperl-l] Next Gen Formats In-Reply-To: References: <1094748451-1268401286-cardhu_decombobulator_blackberry.rim.net-348598184-@bda413.bisx.prod.on.blackberry> Message-ID: <4B9A5621.2020006@purdue.edu> Hi Chris, Converting back and forth from color space is something that would be needed. However, a warning for anyone working with color space data: It is a really bad idea to convert raw color space reads into sequence. This is because conversion propagates from the key base on the left to the right. A sequence error *anywhere* in the sequence will ensure all bases farther down will be converted on the wrong track. Analogous to a "frame shift" -- except there are 4 "frames", not 3. Meanwhile, the converse is not true--sequence space bases can be converted into color space without error propagation. So you want to do all your work in color space and convert to real sequence only at the end, when your consensus is certain. A little more detail here: http://seqanswers.com/forums/showthread.php?t=3367 For people wanting to use a non-color space aware program for analysis of color space data, it is possible to use a process called "double encoding", where 0,1,2,3 bases of color space are just replaced with A, C, G, T of a "fake" base space. This is nearly the same as working in color space and does not incur the propagation error issues.
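To make the propagation point concrete, here is a rough pure-Perl sketch (not BioPerl API; the sub names decode_colorspace and double_encode are made up for this illustration). It assumes the usual SOLiD di-base scheme, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3), and it treats the leading letter as the key base that is not itself part of the read.

```perl
use strict;
use warnings;

# Assumed 2-bit base codes for the di-base color scheme.
my %code = (A => 0, C => 1, G => 2, T => 3);
my @base = qw(A C G T);

# Hypothetical helper: decode a color-space read such as "T2103200".
# Each base depends on the previous one, so decoding runs left to right.
sub decode_colorspace {
    my ($read) = @_;
    my ($key, @colors) = split //, $read;
    my $prev = $code{uc $key};
    my $seq  = '';
    for my $c (@colors) {
        # After an uncalled color ('.') every later base is unknown.
        if (!defined($prev) || $c eq '.') {
            undef $prev;
            $seq .= 'N';
            next;
        }
        $prev ^= $c + 0;          # next base = previous base XOR color
        $seq  .= $base[$prev];
    }
    return $seq;
}

# Hypothetical helper: "double encoding", i.e. colors 0123 rewritten as
# ACGT of a fake base space (the key base is dropped, '.' becomes N).
sub double_encode {
    my ($read) = @_;
    (my $colors = substr($read, 1)) =~ tr/0123./ACGTN/;
    return $colors;
}

print decode_colorspace('T2103200'), "\n";   # CAATCCC
print decode_colorspace('T2113200'), "\n";   # CACGAAA (one wrong color)
print double_encode('T2103200'),     "\n";   # GCATGAA
```

A single miscalled color (the third one, 0 read as 1) changes every decoded base from that position onward, whereas the double-encoded string would differ at that one position only.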
However it is fraught with the obvious problems: you might later confuse the double encoded sequence with true sequence space with likely maddening results. Also, to get the opposite strand of color space reads you reverse without complementing. So top and bottom strands will look different. Finally, Kevin McKernan said that the dual base encoding error-detection scheme was technically using "Perforated Convolutional Codes" and said these were used on 3G networks. I only mention this in case there are some engineering types who might be interested. Phillip Chris Fields wrote: > For the colorspace fasta we could derive a parser just for that based on the current fasta parser. They could retain their original color space designation (maybe via a meta designation), and possibly convert to sequence calls based on their mapping (if the following link is current): > > http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf > > Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual? Seems strange to not provide it... > > chris > > On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote: > > >> Direct from sequencing machine >> >> ------Original Message------ >> From: Peter >> Sender: p.j.a.cock at googlemail.com >> To: golharam at umdnj.edu >> Cc: Chris Fields >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Next Gen Formats >> Sent: Mar 12, 2010 8:26 AM >> >> On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar wrote: >> >>> Here is an example of a color-space sequence: >>> >>> In one file (something.csfasta): >>> >>> >>>> 1_30_226_F3 >>>> >>> T210320010.200.03.0110320320220212200122200.2220200 >>> >>>> 1_30_252_F3 >>>> >>> T322220212.133.00.2202322132022202221002011.0011020 >>> >>> The '.' 
means the color could not be called >>> >>> In another file (something.qual): >>> >>> >>>> 1_30_226_F3 >>>> >>> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20 >>> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4 >>> >>>> 1_30_252_F3 >>>> >>> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5 >>> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12 >>> >>> The -1 represents those colors that could not be called. >>> >> Now that is funny (using -1). True PHRED scores are defined with a >> logarithm and can't be negative. A score of zero is normally used in >> this situation since that maps to a probability of error of 1 (i.e. the >> read is 100% wrong, or 0% true). >> >> Where did these files come from? Direct from a sequencing >> machine or via some third party script? >> >> Peter >> >> >> Sent from my Verizon Wireless BlackBerry >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Fri Mar 12 15:44:35 2010 From: jason at bioperl.org (Jason Stajich) Date: Fri, 12 Mar 2010 07:44:35 -0800 Subject: [Bioperl-l] Bio::SearchIO In-Reply-To: <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com> References: <4B96B442.8070003@bioperl.org> <30E5CA8A-56DE-4764-9A50-DF2E95015216@gmail.com> Message-ID: <4B9A6163.9060407@bioperl.org> I'm sure it does, that what it is supposed to do. I don't know that there is any way to directly get what you want but the code since the format that you want is not a standard multiple-alignment output format. You might consider clustalw format which shows the identical columns with '*' and you can keep the start/stop of the alignment embedded in the sequence names. 
Or you can extract the code you need that does the writing out of the writer module so you can try and dig out what you need. You're asking for something that is a customized view that is not standard and the tools for it are in the existing code, so it means you need to roll your own view from it. This would just mean another ResultWriter module that looks a lot like the existing one, but doesn't write the header and footer and hit table out - so those methods would just not do anything... -jason Janine Arloth wrote, On 3/12/10 12:40 AM: > Hi, > thanks... > but > > use Bio::SearchIO; > use Bio::SearchIO::Writer::TextResultWriter; > > my $in = Bio::SearchIO->new(-format => 'blast', > -file => shift @ARGV); > > my $writer = Bio::SearchIO::Writer::TextResultWriter->new(); > my $out = Bio::SearchIO->new(-writer => $writer); > $out->write_result($in->next_result); > > gives me the whole result, but I only need the alignment ;( > On 09.03.2010 at 21:49, Jason Stajich wrote: > > >> SearchIO writer -> BLAST format. presumably something like Bio::SearchIO::Writer::TextResultWriter >> >> Janine Arloth wrote, On 3/5/10 1:43 AM: >> >>> Hello, >>> using the example from http://www.bioperl.org/wiki/HOWTO:SearchIO -> Format msf I only got such an alignment: >>> >>> 1 50 >>> test/1-85 ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA >>> lcl|3013/20-104 ATGTGTGCAT ACATGTGTAA TCATCCTTGC TCCCCAGCAT CAGAGAATGA >>> >>> >>> 51 100 >>> test/1-85 TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA >>> lcl|3013/20-104 TCTCTCCTTA TGGCCTTTTG TCTTTCTCCA AAGCA >>> >>> >>> >>> But I prefer this format: >>> >>> >>> >>> Query 1 ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA 60 >>> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct 20 ATGTGTGCATACATGTGTAATCATCCTTGCTCCCCAGCATCAGAGAATGATCTCTCCTTA 79 >>> >>> Query 61 TGGCCTTTTGTCTTTCTCCAAAGCA 85 >>> ||||||||||||||||||||||||| >>> Sbjct 80 TGGCCTTTTGTCTTTCTCCAAAGCA 104 >>> >>> >>> How can I get this?
>>> >>> Best Regards >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> > > From maj at fortinbras.us Fri Mar 12 15:45:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 10:45:15 -0500 Subject: [Bioperl-l] distances between leaf nodes In-Reply-To: References: Message-ID: <31AA49FD0FDD466CB349ABAE75591B26@NewLife> along with Jason's comment then you'll need to loop through the node pairs by hand: my @leaves = $tree->get_leaf_nodes; my @dists; while (my $l = shift @leaves) { foreach my $m (@leaves) { push @dists, $tree->distance( -nodes => [$l, $m] ); } } should give you all n(n-1)/2 pairwise distances. ----- Original Message ----- From: "Jeffrey Detras" To: Sent: Friday, March 05, 2010 1:17 AM Subject: [Bioperl-l] distances between leaf nodes > Hi, > > I am new at using the Bio::TreeIO module specifically using the newick > format for a phylogenetic analysis. The sample_tree attached is > Newick-formatted tree. My objective is to get all the distances between all > the leaf nodes. I copied examples of the code from > http://www.bioperl.org/wiki/HOWTO:Trees but it does not tell me much (to my > knowledge) so that I understand how to assign the right array value for the > nodes/leaves. The message would say must provide 2 root nodes. 
> > Here is what I have right now: > > #!/usr/bin/perl -w > use strict; > > my $treefile = 'sample_tree'; > use Bio::TreeIO; > my $treeio = Bio::TreeIO->new(-format => 'newick', > -file => $treefile); > > while (my $tree = $treeio->next_tree) { > my @leaves = $tree->get_leaf_nodes; > for (my $dist = $tree->distance(-nodes => \@leaves)){ > print "Distance between trees is $dist\n"; > } > } > > Thanks, > Jeff > -------------------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rtbio.2009 at gmail.com Fri Mar 12 17:36:44 2010 From: rtbio.2009 at gmail.com (Roopa Raghuveer) Date: Fri, 12 Mar 2010 18:36:44 +0100 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa. On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: > Hi Chris, > > Thank you very much for the information. Could you please tell me how to > update it from SVN? > > Thanks and regards, > Roopa > > > On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: > >> Roopa, >> >> I committed a fix for this a few days ago; if you update from SVN it >> should work. The problem stemmed from server-side changes at NCBI. >> >> chris >> >> On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: >> >> > Hello Mark and everybody, >> > >> > I have been trying to connect to remote blast to retrieve similar >> sequences >> > to a given sequence. But my program is unable to retrieve the sequences >> from >> > BLAST, i.e., it is getting executed till the remote blast ids, but it is >> not >> > entering the else loop after collecting the rid. Please check this >> problem >> > and help me in this regard. 
I think the problem is in getting the >> sequence >> > and going to the 'else' part. i.e., >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > >> > Please give me your reply. >> > >> > Thanks and regards, >> > Roopa. >> > >> > My code is as follows. >> > >> > #!/usr/bin/perl >> > >> > #path for extra camel module >> > use lib "/srv/www/htdocs/rain/RNAi/"; >> > use rnai_blast; >> > >> > >> > use Bio::SearchIO; >> > use Bio::Search::Result::BlastResult; >> > use Bio::Perl; >> > use Bio::Tools::Run::RemoteBlast; >> > use Bio::Seq; >> > use Bio::SeqIO; >> > use Bio::DB::GenBank; >> > >> > $serverpath = "/srv/www/htdocs/rain/RNAi"; >> > $serverurl = "http://141.84.66.66/rain/RNAi"; >> > $outfile = $serverpath."/rnairesult_".time().".html"; >> > $nuc = $serverpath."/nuc".time().".txt"; >> > $debugfile = $serverpath."/debug_".time().".txt"; >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > my $outstring =""; >> > >> > &parse_form; >> > >> > print "Content-type: text/html\n\n"; >> > print "\n"; >> > print "RNAi Result"; >> > print "> > URL=$serverurl/rnairesult_".time().".html\"> \n"; >> > print "\n"; >> > print "\n"; >> > print " Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
"; >> > print " Please be patient, runtime can be up to 5 minutes
"; >> > print " This page will automatically reload in 30 seconds."; >> > print "\n"; >> > print "\n"; >> > >> > defined(my $pid = fork) or die "Can't fork: $!"; >> > exit if $pid; >> > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; >> > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; >> > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; >> > >> > >> > >> > open(OUTFILE, '>',$outfile); >> > >> > print OUTFILE "\n >> > RNAi Result >> > > > URL=$serverurl//rnairesult_".time().".html\"> \n >> > >> > \n >> > \n >> > Your results will appear > > href=$serverurl/rnairesult_".time().".html>here
>> > Please be patient, runtime can be up to 5 minutes
>> > This page will automatically reload in 30 seconds
>> > \n >> > \n"; >> > >> > close(OUTFILE); >> > >> > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); >> > >> > $in{'Inputseq'} =~ s/>.*$//m; >> > $in{'Inputseq'} =~ s/[^TAGC]//gim; >> > $in{'Inputseq'} =~ tr/actg/ACTG/; >> > >> > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, >> > $in{'Threshold'}); >> > >> > >> > sub blastcode >> > { >> > >> > $inpu1= $_[0]; >> > >> > $organ= $_[1]; >> > >> > open(NUC,'>',$nuc); >> > print NUC $inpu1,"\n"; >> > close(NUC); >> > >> > my $prog = 'blastn'; >> > my $db = 'refseq_rna'; >> > my $e_val= '1e-10'; >> > my $organism= $organ; >> > >> > $gb = new Bio::DB::GenBank; >> > >> > my @params = ( '-prog' => $prog, >> > '-data' => $db, >> > '-expect' => $e_val, >> > '-readmethod' => 'SearchIO', >> > '-Organism' => $organism ); >> > >> > open(OUTFILE,'>',$blastdebugfile); >> > print OUTFILE @params; >> > close(OUTFILE); >> > >> > >> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY >> => >> > "$organ\[ORGN]"); >> > >> > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >> > >> > #change a paramter >> > >> > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma >> > Brucei[ORGN]'; >> > >> > #change a paramter >> > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = >> '$input2[ORGN]'; >> > >> > my $v = 1; >> > #$v is just to turn on and off the messages >> > >> > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , >> > '-organism' => "$organ\[ORGN]"); >> > >> > while (my $input = $str->next_seq()) >> > { >> > #Blast a sequence against a database: >> > #Alternatively, you could pass in a file with many >> > #sequences rather than loop through sequence one at a time >> > #Remove the loop starting 'while (my $input = $str->next_seq())' >> > #and swap the two lines below for an example of that. 
>> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $input; >> > close(OUTFILE); >> > >> > #submits the input data to BLAST# >> > >> > my $r = $factory->submit_blast($input); >> > >> > open(OUTFILE,'>',$debugfile); >> > print OUTFILE $r; >> > close(OUTFILE); >> > >> > >> > print STDERR "waiting...." if($v>0); >> > >> > while ( my @rids = $factory->each_rid ) { >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "while entered"; >> > close(OUTFILE); >> > foreach my $rid ( @rids ) { >> > >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "foreach entered"; >> > close(OUTFILE); >> > #Retrieving the result ids# >> > >> > my $rc = $factory->retrieve_blast($rid); >> > >> > if( !ref($rc) ) >> > { >> > if( $rc < 0 ) >> > { >> > $factory->remove_rid($rid); >> > } >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "if entered"; >> > close(OUTFILE); >> > print STDERR "." if ( $v > 0 ); >> > sleep 5; >> > } >> > >> > else { >> > >> > open(OUTFILE,'>',$blastdebugfile); # I think the problem >> is >> > in else part, i.e., it is not taking the next result.# >> > print OUTFILE "else entered"; >> > close(OUTFILE); >> > >> > my $result = $rc->next_result(); >> > >> > #save the output >> > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > print BLASTDEBUGFILE $result->next_hit(); >> > close(BLASTDEBUGFILE); >> > #saving the output in blastdata.time.out file# >> > >> > # $random=rand(); >> > >> > my $filename = $serverpath."/blastdata_".time()."\.out"; >> > # open(DEBUGFILE,'>',$debugfile); >> > # open(new,'>',$filename); >> > # @arra=; >> > # print DEBUGFILE @arra; >> > # close(DEBUGFILE); >> > # close(new); >> > >> > $factory->save_output($filename); >> > >> > # open(BLASTDEBUGFILE,'>',$debugfile); >> > # print BLASTDEBUGFILE "Hello $rid"; >> > # close(BLASTDEBUGFILE); >> > >> > $factory->remove_rid($rid); >> > >> > open(BLASTDEBUGFILE,'>',$blastdebugfile); >> > # print BLASTDEBUGFILE $organism; >> > 
close(BLASTDEBUGFILE); >> > >> > # open(OUTFILE,'>',$outfile); >> > # print OUTFILE "Test2 $result->database_name()"; >> > # close(OUTFILE); >> > >> > #$hit = $result->next_hit; >> > #open(new,'>',$debugfile); >> > #print $hit; >> > #close(new); >> > $dummy=0; >> > while ( my $hit = $result->next_hit ) { >> > >> > next unless ( $v >= 0); >> > >> > # open(OUTFILE,'>',$debugfile); >> > # print OUTFILE "$hit in while hits"; >> > # close(OUTFILE); >> > >> > my $sequ = $gb->get_Seq_by_version($hit->name); >> > my $dna = $sequ->seq(); # get the sequence as a string >> > $dummy++; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $dna; >> > close(OUTFILE); >> > push(@seqs,$dna); >> > } >> > } >> > } >> > } >> > } >> > >> > $warum=@seqs; >> > open(OUTFILE,'>',$debugfile); >> > # print OUTFILE $warum; >> > print OUTFILE @seqs; >> > close(OUTFILE); >> > >> > >> > return(@seqs); #returning the sequences obtained on BLAST# >> > } >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From bosborne11 at verizon.net Fri Mar 12 17:46:52 2010 From: bosborne11 at verizon.net (Brian Osborne) Date: Fri, 12 Mar 2010 12:46:52 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Please google "svn update bioperl". On Mar 12, 2010, at 12:36 PM, Roopa Raghuveer wrote: > Hello all, > > I am trying remote blast program and connecting to NCBI Blast, but I am > unable to retrieve the sequences. Chris had suggested me to update from SVN. > Could you please tell me how to update it from SVN? > > Regards, > Roopa. > > On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: > >> Hi Chris, >> >> Thank you very much for the information. Could you please tell me how to >> update it from SVN? 
>> >> Thanks and regards, >> Roopa >> >> >> On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: >> >>> Roopa, >>> >>> I committed a fix for this a few days ago; if you update from SVN it >>> should work. The problem stemmed from server-side changes at NCBI. >>> >>> chris >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri Mar 12 17:41:23 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 12 Mar 2010 12:41:23 -0500 Subject: [Bioperl-l] remoteblast In-Reply-To: References: Message-ID: Look at http://www.bioperl.org/wiki/Using_Subversion ----- Original Message ----- From: Roopa Raghuveer To: Chris Fields ; Mark A. Jensen ; bioperl-l at lists.open-bio.org Sent: Friday, March 12, 2010 12:36 PM Subject: Re: [Bioperl-l] remoteblast Hello all, I am trying remote blast program and connecting to NCBI Blast, but I am unable to retrieve the sequences. Chris had suggested me to update from SVN. Could you please tell me how to update it from SVN? Regards, Roopa.
On Sun, Mar 7, 2010 at 6:48 PM, Roopa Raghuveer wrote: Hi Chris, Thank you very much for the information. Could you please tell me how to update it from SVN? Thanks and regards, Roopa On Sun, Mar 7, 2010 at 3:57 PM, Chris Fields wrote: Roopa, I committed a fix for this a few days ago; if you update from SVN it should work. The problem stemmed from server-side changes at NCBI. chris On Mar 7, 2010, at 7:11 AM, Roopa Raghuveer wrote: > Hello Mark and everybody, > > I have been trying to connect to remote blast to retrieve similar sequences > to a given sequence. But my program is unable to retrieve the sequences from > BLAST, i.e., it is getting executed till the remote blast ids, but it is not > entering the else loop after collecting the rid. Please check this problem > and help me in this regard. I think the problem is in getting the sequence > and going to the 'else' part. i.e., > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > > Please give me your reply. > > Thanks and regards, > Roopa. > > My code is as follows. 
> > #!/usr/bin/perl > > #path for extra camel module > use lib "/srv/www/htdocs/rain/RNAi/"; > use rnai_blast; > > > use Bio::SearchIO; > use Bio::Search::Result::BlastResult; > use Bio::Perl; > use Bio::Tools::Run::RemoteBlast; > use Bio::Seq; > use Bio::SeqIO; > use Bio::DB::GenBank; > > $serverpath = "/srv/www/htdocs/rain/RNAi"; > $serverurl = "http://141.84.66.66/rain/RNAi"; > $outfile = $serverpath."/rnairesult_".time().".html"; > $nuc = $serverpath."/nuc".time().".txt"; > $debugfile = $serverpath."/debug_".time().".txt"; > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > my $outstring =""; > > &parse_form; > > print "Content-type: text/html\n\n"; > print "\n"; > print "RNAi Result"; > print " URL=$serverurl/rnairesult_".time().".html\"> \n"; > print "\n"; > print "\n"; > print " Your results will appear href=$serverurl/rnairesult_".time().".html>here
"; > print " Please be patient, runtime can be up to 5 minutes
"; > print " This page will automatically reload in 30 seconds."; > print "\n"; > print "\n"; > > defined(my $pid = fork) or die "Can't fork: $!"; > exit if $pid; > open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; > open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!"; > open STDERR, '>&STDOUT' or die "Can't dup stdout: $!"; > > > > open(OUTFILE, '>',$outfile); > > print OUTFILE "\n > RNAi Result > URL=$serverurl//rnairesult_".time().".html\"> \n > > \n > \n > Your results will appear href=$serverurl/rnairesult_".time().".html>here
> Please be patient, runtime can be up to 5 minutes
> This page will automatically reload in 30 seconds
> \n > \n"; > > close(OUTFILE); > > @compseqs = blastcode($in{'Inputseq'},$in{'Organism'}); > > $in{'Inputseq'} =~ s/>.*$//m; > $in{'Inputseq'} =~ s/[^TAGC]//gim; > $in{'Inputseq'} =~ tr/actg/ACTG/; > > @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'}, > $in{'Threshold'}); > > > sub blastcode > { > > $inpu1= $_[0]; > > $organ= $_[1]; > > open(NUC,'>',$nuc); > print NUC $inpu1,"\n"; > close(NUC); > > my $prog = 'blastn'; > my $db = 'refseq_rna'; > my $e_val= '1e-10'; > my $organism= $organ; > > $gb = new Bio::DB::GenBank; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO', > '-Organism' => $organism ); > > open(OUTFILE,'>',$blastdebugfile); > print OUTFILE @params; > close(OUTFILE); > > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => > "$organ\[ORGN]"); > > #my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > > #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma > Brucei[ORGN]'; > > #change a paramter > # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]'; > > my $v = 1; > #$v is just to turn on and off the messages > > my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , > '-organism' => "$organ\[ORGN]"); > > while (my $input = $str->next_seq()) > { > #Blast a sequence against a database: > #Alternatively, you could pass in a file with many > #sequences rather than loop through sequence one at a time > #Remove the loop starting 'while (my $input = $str->next_seq())' > #and swap the two lines below for an example of that. > open(OUTFILE,'>',$debugfile); > print OUTFILE $input; > close(OUTFILE); > > #submits the input data to BLAST# > > my $r = $factory->submit_blast($input); > > open(OUTFILE,'>',$debugfile); > print OUTFILE $r; > close(OUTFILE); > > > print STDERR "waiting...." 
if($v>0); > > while ( my @rids = $factory->each_rid ) { > open(OUTFILE,'>',$debugfile); > # print OUTFILE "while entered"; > close(OUTFILE); > foreach my $rid ( @rids ) { > > open(OUTFILE,'>',$debugfile); > # print OUTFILE "foreach entered"; > close(OUTFILE); > #Retrieving the result ids# > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > open(OUTFILE,'>',$debugfile); > # print OUTFILE "if entered"; > close(OUTFILE); > print STDERR "." if ( $v > 0 ); > sleep 5; > } > > else { > > open(OUTFILE,'>',$blastdebugfile); # I think the problem is > in else part, i.e., it is not taking the next result.# > print OUTFILE "else entered"; > close(OUTFILE); > > my $result = $rc->next_result(); > > #save the output > $blastdebugfile = $serverpath."/blastdebug_".time().".txt"; > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > print BLASTDEBUGFILE $result->next_hit(); > close(BLASTDEBUGFILE); > #saving the output in blastdata.time.out file# > > # $random=rand(); > > my $filename = $serverpath."/blastdata_".time()."\.out"; > # open(DEBUGFILE,'>',$debugfile); > # open(new,'>',$filename); > # @arra=; > # print DEBUGFILE @arra; > # close(DEBUGFILE); > # close(new); > > $factory->save_output($filename); > > # open(BLASTDEBUGFILE,'>',$debugfile); > # print BLASTDEBUGFILE "Hello $rid"; > # close(BLASTDEBUGFILE); > > $factory->remove_rid($rid); > > open(BLASTDEBUGFILE,'>',$blastdebugfile); > # print BLASTDEBUGFILE $organism; > close(BLASTDEBUGFILE); > > # open(OUTFILE,'>',$outfile); > # print OUTFILE "Test2 $result->database_name()"; > # close(OUTFILE); > > #$hit = $result->next_hit; > #open(new,'>',$debugfile); > #print $hit; > #close(new); > $dummy=0; > while ( my $hit = $result->next_hit ) { > > next unless ( $v >= 0); > > # open(OUTFILE,'>',$debugfile); > # print OUTFILE "$hit in while hits"; > # close(OUTFILE); > > my $sequ = $gb->get_Seq_by_version($hit->name); > my $dna = $sequ->seq(); # get the sequence as a 
string
> $dummy++;
> open(OUTFILE,'>',$debugfile);
> # print OUTFILE $dna;
> close(OUTFILE);
> push(@seqs,$dna);
> }
> }
> }
> }
> }
>
> $warum=@seqs;
> open(OUTFILE,'>',$debugfile);
> # print OUTFILE $warum;
> print OUTFILE @seqs;
> close(OUTFILE);
>
>
> return(@seqs); #returning the sequences obtained on BLAST#
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jessica.sun at gmail.com Fri Mar 12 21:28:11 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Fri, 12 Mar 2010 16:28:11 -0500
Subject: [Bioperl-l] RefSeq
Message-ID: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com>

I have a question: I have a RefSeq with an NM_ number (mRNA); how can I get
the genomic sequences (NT_ number) with BioPerl, if it can be done?

Thanks

--
Jessica Jingping Sun

From sidd.basu at gmail.com Sat Mar 13 20:29:52 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Sat, 13 Mar 2010 14:29:52 -0600
Subject: [Bioperl-l] Re: RefSeq
In-Reply-To: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com>
References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com>
Message-ID: <20100313202949.GA5621@Macintosh-74.local>

The following code works with 1.6.1 of bioperl. It uses eutils and the
workflow esearch -> elink -> esummary.

#!/usr/bin/perl -w

use strict;
use Bio::DB::EUtilities;

my $id = $ARGV[0] || 'NM_001618';

my $eutils = Bio::DB::EUtilities->new(
    -eutil      => 'esearch',
    -db         => 'nucleotide',
    -term       => $id,
    -usehistory => 'y'
);

my $hist = $eutils->next_History || die "no history\n";

$eutils->reset_parameters(
    -eutil   => 'elink',
    -db      => 'gene',
    -dbfrom  => 'nuccore',
    -history => $hist
);

my ($gene_id) = $eutils->next_LinkSet->get_ids;

$eutils->reset_parameters(
    -eutil => 'esummary',
    -db    => 'gene',
    -id    => $gene_id,
);

my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType');
print $item->get_contents_by_name('ChrAccVer'), "\n";

-siddhartha

On Fri, 12 Mar 2010, Jessica Sun wrote:

> I have a question: I have a refseq with NM_ number(mRNA), how can I get
> the genomic sequences(NT_number) with Bioperl, if it can be done?
>
> Thanks
>
>
> --
> Jessica Jingping Sun
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From robby.hones at gmail.com Sat Mar 13 23:57:43 2010
From: robby.hones at gmail.com (robby jhones)
Date: Sat, 13 Mar 2010 15:57:43 -0800
Subject: [Bioperl-l] comparing fasta sequences in multiple files
Message-ID: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>

Dear Group,

Can anyone offer advice on comparing multiple fasta sequences in many
files. We have 1000's of fasta sequences in individual files of which I
would like to fish out and print to a new file (the sequence and ID), ONLY
the sequences which appear in at least a few of the files: 3 out of 4 runs,
perhaps all 4 runs ( as some are replicates).

Is there something out there which would do this?
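
A minimal sketch of one way to approach this, under assumptions not stated in the thread: sequences count as "the same" only when their residues match exactly (ignoring case), a sequence counts once per file however often it occurs there, and the inputs are plain FASTA. The sub names read_fasta and shared_sequences and the found_in= tag in the output header are made up for illustration; Bio::SeqIO could replace the hand-rolled parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one FASTA file and return a hash ref: upper-cased sequence => first ID seen.
# Plain-Perl parser for illustration only; it assumes well-formed FASTA.
sub read_fasta {
    my ($file) = @_;
    my (%id_for, $id, $seq);
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                       # tolerate Windows line endings
        if ($line =~ /^>(\S+)/) {
            # A new header closes the previous record, if there was one.
            $id_for{uc $seq} //= $id if defined $id && length $seq;
            ($id, $seq) = ($1, '');
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $id_for{uc $seq} //= $id if defined $id && length $seq;   # last record
    close $fh;
    return \%id_for;
}

# Return [id, sequence, number_of_files] for every distinct sequence that
# occurs in at least $min_files of the given files.
sub shared_sequences {
    my ($min_files, @files) = @_;
    my (%file_count, %first_id);
    for my $file (@files) {
        my $id_for = read_fasta($file);
        for my $seq (keys %$id_for) {
            $file_count{$seq}++;                 # each file counts once per sequence
            $first_id{$seq} //= $id_for->{$seq};
        }
    }
    return map  { [ $first_id{$_}, $_, $file_count{$_} ] }
           grep { $file_count{$_} >= $min_files }
           sort keys %file_count;
}

# Usage: perl shared_fasta.pl 3 run1.fa run2.fa run3.fa run4.fa > shared.fa
if (@ARGV) {
    my ($min_files, @files) = @ARGV;
    for my $hit (shared_sequences($min_files, @files)) {
        my ($id, $seq, $n) = @$hit;
        print ">$id found_in=$n\n$seq\n";
    }
}
```

Keying the hash on the sequence rather than on the ID is a deliberate choice here, since replicate runs may well assign different IDs to the same read; if the IDs are known to be stable across files, keying on the ID instead is simpler.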

Thanks for your help
>>Robby

From sdavis2 at mail.nih.gov Sun Mar 14 00:49:46 2010
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 Mar 2010 19:49:46 -0500
Subject: [Bioperl-l] comparing fasta sequences in multiple files
In-Reply-To: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>
References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com>
Message-ID: <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com>

On Sat, Mar 13, 2010 at 6:57 PM, robby jhones wrote:
> Dear Group,
>
> Can anyone offer advice on comparing multiple fasta sequences in many
> files. We have 1000's of fasta sequences in individual files of which I
> would like to fish out and print to a new file (the sequence and ID), ONLY
> the sequences which appear in at least a few of the files: 3 out of 4 runs,
> perhaps all 4 runs ( as some are replicates).
>
> Is there something out there which would do this?

Hi, Robby.

It sounds like making a hash of IDs and then incrementing a count for
each as you loop over files would give you what you want?

Sean

From jessica.sun at gmail.com Sun Mar 14 01:29:08 2010
From: jessica.sun at gmail.com (Jessica Sun)
Date: Sat, 13 Mar 2010 20:29:08 -0500
Subject: [Bioperl-l] RefSeq
In-Reply-To: <20100313202949.GA5621@Macintosh-74.local>
References: <9adc0e9b1003121328j271c0d03ufe2843001ea98de6@mail.gmail.com> <20100313202949.GA5621@Macintosh-74.local>
Message-ID: <9adc0e9b1003131729p4f78aa50kc1500cbbe01cd815@mail.gmail.com>

Great. Thanks.

On Sat, Mar 13, 2010 at 3:29 PM, Siddhartha Basu wrote:

> The following code works with 1.6.1 of bioperl. It uses eutils and the
> workflow esearch -> elink -> esummary.
> > #!/usr/bin/perl -w > > use strict; > use Bio::DB::EUtilities; > > my $id = $ARGV[0] || 'NM_001618'; > > my $eutils = Bio::DB::EUtilities->new( > -eutil => 'esearch', > -db => 'nucleotide', > -term => $id, > -usehistory => 'y' > ); > > my $hist = $eutils->next_History || die "no history\n"; > > $eutils->reset_parameters( > -eutil => 'elink', > -db => 'gene', > -dbfrom => 'nuccore', > -history => $hist > ); > > my ($gene_id) = $eutils->next_LinkSet->get_ids; > > $eutils->reset_parameters( > -eutil => 'esummary', > -db => 'gene', > -id => $gene_id, > ); > > my ($item) = $eutils->next_DocSum->get_Items_by_name('GenomicInfoType'); > print $item->get_contents_by_name('ChrAccVer'), "\n"; > > -siddhartha > > On Fri, 12 Mar 2010, Jessica Sun wrote: > > > I have a question: I have a refseq with NM_ number(mRNA), how can I get > > the genomic sequences(NT_number) with Bioperl, if it can be done? > > > > Thanks > > > > > > -- > > Jessica Jingping Sun > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Jessica Jingping Sun From sdavis2 at mail.nih.gov Sun Mar 14 12:38:15 2010 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sun, 14 Mar 2010 07:38:15 -0500 Subject: [Bioperl-l] comparing fasta sequences in multiple files In-Reply-To: <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com> References: <407ea9d41003131557g49d06ae2j4cd6d3fb2de16d7a@mail.gmail.com> <264855a01003131649o725cf151i2fe51e948ebfc86d@mail.gmail.com> <407ea9d41003132312l755b2d9bm5a9d2ba83017fd02@mail.gmail.com> Message-ID: <264855a01003140538m6cee0c27s823e45d02002d200@mail.gmail.com> On Sun, Mar 14, 2010 at 2:12 AM, robby jhones wrote: > I think that I'll need to write a hash of the IDs and sequences, then > 
iterate over the sequences to see if they are identical and if so push them
> and the ID into an output file. I was hoping there was something out there
> like this, but I suppose not.

Look in the mailing list archives for the last week or so. There was
some discussion about generating hashes of sequences; you could use
that to generate your hash of unique sequences.

Sean

> On Sat, Mar 13, 2010 at 4:49 PM, Sean Davis wrote:
>>
>> On Sat, Mar 13, 2010 at 6:57 PM, robby jhones
>> wrote:
>> > Dear Group,
>> >
>> > Can anyone offer advice on comparing multiple fasta sequences in many
>> > files. We have 1000's of fasta sequences in individual files of which I
>> > would like to fish out and print to a new file (the sequence and ID),
>> > ONLY
>> > the sequences which appear in at least a few of the files: 3 out of 4
>> > runs,
>> > perhaps all 4 runs ( as some are replicates).
>> >
>> > Is there something out there which would do this?
>>
>> Hi, Robby.
>>
>> It sounds like making a hash of IDs and then incrementing a count for
>> each as you loop over files would give you what you want?
>>
>> Sean
>
>

From lpritc at scri.ac.uk Mon Mar 15 11:55:52 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Mon, 15 Mar 2010 11:55:52 +0000
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>
Message-ID:

Hi Scott,

Thanks for the reply. I tried your suggestions on a clean VM of CentOS 5.4
and the equally wordy outcome is below...

On 02/03/2010 Tuesday, March 2, 16:11, "Scott Cain" wrote:

> First, I am working on the 1.1 release of gmod/chado, and it
> may fix some of the problems you are describing. Certainly, ID
> collisions between GFF files should not be a problem (I didn't think
> they were in the 1.0 release, but that was a long time ago).
Please > try a checkout of the schema trunk in the gmod svn: > > http://gmod.org/wiki/SVN As a note for anyone following this, when I downloaded the trunk/chado files only, my build failed with """ $make [...] Manifying ../blib/man3/Bio::Chaos::ChaosGraph.3pm Manifying ../blib/man3/Bio::Chaos::FeatureUtil.3pm Manifying ../blib/man3/Bio::Chaos::XSLTHelper.3pm Manifying ../blib/man3/Bio::Chaos::Root.3pm make[1]: Leaving directory `/home/lpritc/Desktop/chado/chaos-xml' make: *** No rule to make target `bin/gmod_gff2biomart5.pl', needed by `blib/script/gmod_gff2biomart5.pl'. Stop. """ I had to download the whole trunk for the installation to work. I came across this thread: http://old.nabble.com/Minor-Makefile.PL-changes-td26272744.html while I was looking for a solution; someone else has had a similar problem. > Another thing you may want to look at is that just last week, a > developer at Texas A&M, Nathan Liles, contributed code to the > bioperl-live trunk for the genbank2gff3.pl script that will do a much > better job of converting bacterial genbank files to GFF3; perhaps that > will help too. Working with a svn checkout of bioperl-live shouldn't > be too scary either; the pieces you are interested in (that work with > Chado and GBrowse) are quite stable. I also checked out BioPerl-live. The svn server at code.open-bio.org was unresponsive for a couple of days, but Peter pointed me to GitHub at http://github.com/bioperl/bioperl-live so I went from there. The process isn't quite as clean as using the latest stable version of BioPerl, however. When I attempt to use the bp_genbank2gff3.pl script, I get the following error message: """ [lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk Can't locate object method "FT_SO_map" via package "Bio::SeqFeature::Tools::TypeMapper" at /usr/bin/bp_genbank2gff3.pl line 374. """ This appears to be associated with the following code (l207 onwards...) in TypeMapper: """ =head2 map_types_to_SO [...] 

hardcodes the genbank to SO mapping

[...]
dgg: separated out FT_SO_map for caller changes. Update with:

 open(FTSO,"curl -s http://sequenceontology.org/resources/mapping/FT_SO.txt|");
 while(<FTSO>){
   chomp; ($ft,$so,$sid,$ftdef,$sodef)= split"\t";
   print " '$ft' => '$so',\n" if($ft && $so && $ftdef);
 }

=cut

sub ft_so_map {
 # $self= shift;
"""

The upper/lower case function declaration seems to be important, as changing
it back to "sub FT_SO_map" lets the script work:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
# Input: NC_004547.gbk
# working on region:NC_004547, Erwinia carotovora subsp. atroseptica
SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043,
complete genome.
# GFF3 saved to ./NC_004547.gbk.gff
# Summary:
# Feature           Count
# -------           -----
# repeat_region     19
# sequence_variant  2
# repeat_unit       2
# gene              4614
# region            17387
# exon              4597
# RESIDUES          5064019
#
"""

Obviously, this is another unsatisfactory sucky ad hoc post-install hack; I
hope I'm doing the right sort of thing, there. I'm not familiar with
BioPerl so I'm not clear on why this change was made to the interface (it's
part of the recent changes by Nathan Liles you referred to in your post:
http://github.com/bioperl/bioperl-live/commit/18dae5436130c7c77e31120af1a37ddcd8a77a03),
but it also seems to break bp_genbank2gff3.pl. Also, the
--noCDS flag appears to have no effect at all when using the new version of
bp_genbank2gff3.pl.

The old version of bp_genbank2gff3.pl appears to recognise more feature
types in the summary:

"""
[lpritc at localhost ~]$ bp_genbank2gff3.pl -s NC_004547.gbk
# Input: NC_004547.gbk
# working on region:NC_004547, Erwinia carotovora subsp. atroseptica
SCRI1043, 03-DEC-2007, Erwinia carotovora subsp. atroseptica SCRI1043,
complete genome.
# GFF3 saved to ./NC_004547.gbk.gff
# Summary:
# Feature               Count
# -------               -----
# mRNA                  4472
# sequence_variant      2
# gene                  4594
# region                8275
# pseudogene            20
# CDS                   4472
# RESIDUES(tr)          1433791
# RESIDUES              5064019
# rRNA                  22
# processed_transcript  24
# repeat_region         19
# pseudogenic_region    46
# repeat_unit           2
# exon                  4597
# tRNA                  76
#
"""

and this is reflected in the substantial difference in GFF3 output, for
issuing exactly the same command when moving from BioPerl 1.6.1 to
bioperl-live: we get different GFF3 output that represents a different gene
model. I wasn't expecting so radical a change, but at least the IDs are
based on the locus_tag with the new script, and this appears to solve my
problem with clashing feature IDs on the files I was using.

Many thanks for your help,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________
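
The FT_SO_map / ft_so_map mismatch reported in the message above comes down to Perl method names being case-sensitive. A minimal sketch of how a caller could probe for either spelling with `can` rather than editing the installed module; My::TypeMapper is a hypothetical stand-in (it is not Bio::SeqFeature::Tools::TypeMapper), and the sub find_method and the mapping data are likewise made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module whose method was renamed to lower case.
package My::TypeMapper;
sub new       { return bless {}, shift }
sub ft_so_map { return { CDS => 'CDS', gene => 'gene' } }   # illustrative data only

package main;

# Return the first method name the object actually supports, or undef.
sub find_method {
    my ($obj, @candidates) = @_;
    for my $name (@candidates) {
        return $name if $obj->can($name);
    }
    return;
}

my $mapper = My::TypeMapper->new;

# Try the old spelling first, then the new one.
my $method = find_method($mapper, 'FT_SO_map', 'ft_so_map')
    or die "no FT/SO mapping method available\n";

my $map = $mapper->$method();
print "using $method, ", scalar(keys %$map), " entries\n";
```

This only hides the rename on the calling side; whether the script or the module should change is for the maintainers to decide.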

From scott at scottcain.net Mon Mar 15 14:55:17 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 15 Mar 2010 10:55:17 -0400
Subject: [Bioperl-l] [Gmod-schema] Loading NCBI/GenBank bacteria into CHADO: Chromosome/Plasmid gene name conflicts
In-Reply-To:
References: <4536f7701003020811n1bf68c7bvdfea47fc9bad9f44@mail.gmail.com>
Message-ID: <4536f7701003150755w2c2875fbob004bc03cf3387ab@mail.gmail.com>

Hi Leighton,

Thanks for the feedback both on getting chado installed from svn and
on the genbank2gff3 converter.

About installing Chado from svn, I thought I'd modified the
Makefile.PL script to gracefully survive not having the GMODtools
directory present; I guess I'll have to revisit that. Since I
probably won't get to it today, I created a bug report for it:

https://sourceforge.net/tracker/?func=detail&aid=2970687&group_id=27707&atid=391291

About the genbank2gff3 script, I'm cc'ing Nathan to make sure he sees
your comments.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From kiekyon.huang at gmail.com Mon Mar 15 15:44:13 2010
From: kiekyon.huang at gmail.com (kiekyon.huang at gmail.com)
Date: Mon, 15 Mar 2010 15:44:13 +0000
Subject: [Bioperl-l] Taxonomy report
Message-ID: <0016e64be064b8211f0481d8c02d@google.com>

Hi,

just like to know if there is there any way to generate the taxonomy report from the standalone blast output?

thanks

From cjfields at illinois.edu Mon Mar 15 15:57:29 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 15 Mar 2010 10:57:29 -0500
Subject: [Bioperl-l] Taxonomy report
In-Reply-To: <0016e64be064b8211f0481d8c02d@google.com>
References: <0016e64be064b8211f0481d8c02d@google.com>
Message-ID: <53CE22BE-38F4-4EC6-80A9-37228A9CF602@illinois.edu>

Not that I know of, at least not w/o doing some mapping (the tax report is generated on NCBI's servers last I recall).

chris

On Mar 15, 2010, at 10:44 AM, kiekyon.huang at gmail.com wrote:

> Hi,
>
> just like to know if there is there any way to generate the taxonomy report from the standalone blast output?
> > thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Mar 15 17:11:27 2010 From: jason at bioperl.org (Jason Stajich) Date: Mon, 15 Mar 2010 10:11:27 -0700 Subject: [Bioperl-l] getting strand from Bio::Align::AlignI ?? In-Reply-To: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov> References: <8425A547-149B-41F5-B4DB-A58C9E92B373@mail.nih.gov> Message-ID: <4B9E6A3F.6080104@bioperl.org> Did you start with Bio::SearchIO object and call get_aln on the HSP object? Strand is available from the $hsp->query->strand and $hsp->hit->strand and Bio::SearchIO is the preferred way of parsing pairwise alignment reports. Either way the sequences themselves have strands not the alignment. Each sequence should have a strand $seq->strand since they are Bio::LocatableSeq objects. for my $seq ( $aln->each_seq ) { print $seq->id, " ", $seq->strand, "\n"; } -jason Joan Pontius wrote, On 3/15/10 8:49 AM: > I am looking into using Bio::Align::AlignI for an application that > uses blast2seq > and can't figure out how to get the strand of an alignment? > > Thanks in advance > > > > Joan Pontius-Contractor SAIC > Laboratory of Genomic Diversity > Bldg 560-NCI > Frederick Maryland 21702 > phone (301)846-1761 > fax (301) 846-1686 From cjfields1 at gmail.com Mon Mar 15 18:57:08 2010 From: cjfields1 at gmail.com (Christopher Fields) Date: Mon, 15 Mar 2010 13:57:08 -0500 Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <6C998BD2392E4BF594F041368D9456E4@BlackJack> References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> Message-ID: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> Francisco, In general, please address any questions directly to the bioperl mail list, in case I can't respond. The anon. svn on code.open-bio.org is down at the moment. OBF support knows about this problem and it's being addressed. 
There is a github mirror of the repos in case this happens:

http://github.com/bioperl

chris

On Mar 15, 2010, at 10:38 AM, Francisco J. Ossandón wrote:

> Hello Chris Fields,
> I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live).
>
> There has been some change in the SVN address or configuration that I should update? I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it.
>
> I hope that you can help me with this.
>
> Regards,
>
> --
> Francisco J. Ossandon
> Bioinformatician.
> Ph.D. Student, University Andres Bello.
> Center for Bioinformatics and Genome Biology,
> Fundacion Ciencia para la Vida.
> Santiago, Chile.
> www.cienciavida.cl/CBGB.htm

From hlapp at drycafe.net Tue Mar 16 20:03:50 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 16 Mar 2010 16:03:50 -0400
Subject: [Bioperl-l] [OT] Job opportunity: Training coordinator and Bioinformatics Project Manager
Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net>

Hi all - first off, sorry for the cross-posting, we're trying to advertise this as widely as possible. Second, apologies if this is committing an offense and considered spam. I thought though that there might be some people around here who may be interested and suitable.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

A unique position is available for a training coordinator and bioinformatics project manager at the U.S.
National Evolutionary Synthesis Center in Durham, North Carolina (NESCent, http://nescent.org). NESCent is a National Science Foundation funded research center managed by Duke University, the University of North Carolina at Chapel Hill and North Carolina State University on behalf of the international evolutionary biology community. NESCent facilitates synthetic research by bringing together diverse expertise, data, tools and concepts (Sidlauskas et al. 2009). In addition to a resident population of 20-30 scientists, the Center hosts over 800 visitors a year. An informatics staff is on-site to support resident and visiting scientists' needs in high-performance computing, electronic collaboration, scientific software and databases; this includes custom software development for a limited number of high-impact projects. NESCent's informatics training program includes a rotating series of open-application summer courses, ad-hoc short courses for resident scientists, and remote internships (including past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will provide oversight to the Center's training activities. The incumbent will also serve as the interface between scientists and software developers at NESCent. The position provides extensive opportunities for collaboration and intellectual engagement with both NESCent-sponsored scientists and informatics staff; however, this is not an independent research position. The incumbent will report to the Director, while overseeing the work of a small informatics team and coordinating activities among the Center's science, education and informatics programs.

Responsibilities:

* 50% - Consult with sponsored scientists (including scientists in residence and working group participants) about informatics resources and needs.
Manage software product development by gathering requirements from scientists, participating in conceptual design, monitoring implementation progress and product quality, facilitating communication between software developers and scientists, and researching software solutions.

* 25% - Oversee NESCent's course curriculum by identifying opportunities for onsite or online informatics courses that satisfy demand for advanced training of resident and visiting scientists, recruiting instructors, providing guidance to instructors in developing course syllabi, coordinating logistical and technical support requirements, conducting assessments, and serving as a liaison to course organizers at other institutions.

* 25% - Assisting in the management of NESCent's summer informatics intern program, by coordinating the recruitment, application & review process for students, communicating expectations to students and mentors, monitoring student progress, documenting student outcomes, and performing assessments.

Education:
Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary biology, or an equivalent combination of relevant education and/or experience.

Experience:
Required: Excellent communication, interpersonal, and organizational skills. Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open source software. Organization, coordination, development and delivery of courses and workshops appropriate for graduate-level participants.

Terms of Employment: Salary will be competitive and commensurate with experience. As a full-time employee, the incumbent will receive Duke University's benefits package (http://hr.duke.edu/benefits/main.html). The position is available immediately and will remain open until filled. The position is currently funded through November 2014, contingent on annual renewal of the Center by the NSF.
How to Apply: Please send a C.V., including contact information for three references, and a brief statement of interest to Allen Rodrigo, Director, NESCent, at a.rodrigo at nescent.org. Inquiries about suitability for the position are welcome. Duke University is an Equal Opportunity/Affirmative Action employer. Additional information about NESCent: http://www.nescent.org References: Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The Continuing Promise of Evolutionary Synthesis. Evolution. http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x From hartzell at alerce.com Tue Mar 16 23:35:13 2010 From: hartzell at alerce.com (George Hartzell) Date: Tue, 16 Mar 2010 16:35:13 -0700 Subject: [Bioperl-l] What's to depend on for BioPerl-run version check Message-ID: <19360.5553.985550.996751@gargle.gargle.HOWL> Apologies if this is as silly of a question as it seems, I think that I must just be decaffeinated this morning.... I'm cleaning up some modules and would like to express a dependency on BioPerl-run version 1.6.1. For the main bioperl I use Bio::Root::Version and 1.006001. That works, although the course of investigating below I found that Bio::Root::RootI (which uses BR::Version) doesn't. A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) use Bio::Root::Version and thereby acquire a reasonable version number but: a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency when I want bioperl-run c) it's funny that PiseWorkflow uses Bio::Root::Version (which imports a $VERSION into it's package) then goes on to set one itself. b) there's something hinky going on, when I do 'perl Build.PL' on my Task it doesn't think that PiseWorkflow is up to date (it thinks I have version (0) if I understand correctly), but when I './Build installdeps' everything appears up to date. 
It looks like the trickiness of assigning $Bio::Root::Version::VERSION to $VERSION confuses Module::Build::ModuleInfo::_evaluate_version_line and the result is that VERSION appears to be 0. What's The Right Thing to do? Thanks, g. From maj at fortinbras.us Wed Mar 17 14:41:00 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 17 Mar 2010 10:41:00 -0400 Subject: [Bioperl-l] What's to depend on for BioPerl-run version check In-Reply-To: <19360.5553.985550.996751@gargle.gargle.HOWL> References: <19360.5553.985550.996751@gargle.gargle.HOWL> Message-ID: I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ ----- Original Message ----- From: "George Hartzell" To: "bioperl-l List" Sent: Tuesday, March 16, 2010 7:35 PM Subject: [Bioperl-l] What's to depend on for BioPerl-run version check > > Apologies if this is as silly of a question as it seems, I think that > I must just be decaffeinated this morning.... > > I'm cleaning up some modules and would like to express a dependency on > BioPerl-run version 1.6.1. > > For the main bioperl I use Bio::Root::Version and 1.006001. That > works, although the course of investigating below I found that > Bio::Root::RootI (which uses BR::Version) doesn't. > > A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) > use Bio::Root::Version and thereby acquire a reasonable version number > but: > > a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency > when I want bioperl-run > c) it's funny that PiseWorkflow uses Bio::Root::Version (which > imports a $VERSION into it's package) then goes on to set one > itself. > b) there's something hinky going on, when I do 'perl Build.PL' on my > Task it doesn't think that PiseWorkflow is up to date (it thinks > I have version (0) if I understand correctly), but when I > './Build installdeps' everything appears up to date. 
> > It looks like the trickiness of assigning > $Bio::Root::Version::VERSION to $VERSION confuses > Module::Build::ModuleInfo::_evaluate_version_line and the result > is that VERSION appears to be 0. > > What's The Right Thing to do? > > Thanks, > > g. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From janine.arloth at googlemail.com Mon Mar 15 08:15:50 2010 From: janine.arloth at googlemail.com (Janine Arloth) Date: Mon, 15 Mar 2010 09:15:50 +0100 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: References: Message-ID: Hello, exists a possibility to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->$hit_string;) Best regards From cjfields at illinois.edu Wed Mar 17 15:13:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Mar 2010 10:13:20 -0500 Subject: [Bioperl-l] What's to depend on for BioPerl-run version check In-Reply-To: References: <19360.5553.985550.996751@gargle.gargle.HOWL> Message-ID: <32C28662-BD24-4270-A0B6-71CEB459172C@illinois.edu> What is probably the best thing to do is set up a stub module for each of the subdistributions that contains a proper version to match against. So, for BioPerl-Run, use Bio::Run or Bio::Tools::Run, BioPerl-DB use Bio::DB, etc. Distribution-specific general documentation would go in those stub modules. I sort of started this, with the first alphas but didn't get around to finishing it up. Just as a footnote, the universal $VERSION thingy was set up quite a while ago, prior to perl 5.8 I believe, and doesn't play very well with $VERSION (and version.pm) on newer perl versions. Once we move beyond 1.6.x towards breaking things up we'll have to assign new VERSIONs to anything released independently on CPAN, anyway, so this may eventually be a moot point. 
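A minimal sketch of the stub-module idea described above. The package name My::BioPerl::Run::Stub is hypothetical (it is not an existing BioPerl module) and the version number is only an example. The stub carries a literal $VERSION on a single line, which a static version scanner should be able to evaluate without loading any other module, and a dependent checks it through the standard UNIVERSAL::VERSION method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stub module for one subdistribution. The name is for
# illustration only; it is not a real BioPerl package.
package My::BioPerl::Run::Stub;

# A literal version assigned on one line, with no reference to a
# variable in another package.
our $VERSION = '1.006001';

package main;

# Return 1 if $package is at least version $wanted, 0 otherwise.
# UNIVERSAL::VERSION dies when the installed version is too low,
# so the call is wrapped in eval.
sub meets_requirement {
    my ($package, $wanted) = @_;
    return eval { $package->VERSION($wanted); 1 } ? 1 : 0;
}

for my $wanted ('1.006001', '1.007') {
    my $status = meets_requirement('My::BioPerl::Run::Stub', $wanted)
        ? 'satisfied'
        : 'not satisfied';
    print "requirement $wanted: $status\n";
}
```

In a Build.PL the same requirement would then be expressed against the stub's name rather than against an arbitrary module from the distribution.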
chris The inherited $VERSION thingy was set up a while back, basically as a way of assigning a common version across BioPerl. On Mar 17, 2010, at 9:41 AM, Mark A. Jensen wrote: > I'd say the RTTD would be to submit a bugzilla report; this sounds pretty fishy > to me--(esp since the Pise stuff is deprecated, IIRC) cheers MAJ > ----- Original Message ----- From: "George Hartzell" > To: "bioperl-l List" > Sent: Tuesday, March 16, 2010 7:35 PM > Subject: [Bioperl-l] What's to depend on for BioPerl-run version check > > >> Apologies if this is as silly of a question as it seems, I think that >> I must just be decaffeinated this morning.... >> I'm cleaning up some modules and would like to express a dependency on >> BioPerl-run version 1.6.1. >> For the main bioperl I use Bio::Root::Version and 1.006001. That >> works, although the course of investigating below I found that >> Bio::Root::RootI (which uses BR::Version) doesn't. >> A couple of the modules in -run (e.g. Bio::Tools::Run::PiseWorkflow) >> use Bio::Root::Version and thereby acquire a reasonable version number >> but: >> a) it's funny to list Bio::Tools::Run::PiseWorkflow as a dependency >> when I want bioperl-run >> c) it's funny that PiseWorkflow uses Bio::Root::Version (which >> imports a $VERSION into it's package) then goes on to set one >> itself. >> b) there's something hinky going on, when I do 'perl Build.PL' on my >> Task it doesn't think that PiseWorkflow is up to date (it thinks >> I have version (0) if I understand correctly), but when I >> './Build installdeps' everything appears up to date. >> It looks like the trickiness of assigning >> $Bio::Root::Version::VERSION to $VERSION confuses >> Module::Build::ModuleInfo::_evaluate_version_line and the result >> is that VERSION appears to be 0. >> What's The Right Thing to do? >> Thanks, >> g. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From robfsouza at gmail.com Wed Mar 17 15:20:21 2010 From: robfsouza at gmail.com (robfsouza) Date: Wed, 17 Mar 2010 08:20:21 -0700 (PDT) Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> Message-ID: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> Hi Chris, Any idea when the SVN is going to be fixed? I could not find tar.gz or other download methods in github... Robson On Mar 15, 2:57?pm, Christopher Fields wrote: > Francisco, > > In general, please address any questions directly to the bioperl mail list, in case I can't respond. ? > > The anon. svn on code.open-bio.org is down at the moment. ?OBF support knows about this problem and it's being addressed. ?There is a github mirror of the repos in case this happens: > > http://github.com/bioperl > > chris > > On Mar 15, 2010, at 10:38 AM, Francisco J. Ossand?n wrote: > > > > > Hello Chris Fields, > > I have posted before in the Bugzilla about Bioperl bugs, but this time is about the Bioperl SVN. It has been several days since I could connect to the SVN for the last time (tried from different locations). I can't connect directly (svn://code.open-bio.org/bioperl/bioperl-live/trunk) nor using the http link provided in the wiki (http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live). > > > There has been some change in the SVN address or configuration that I should update? 
I have seen devs posting in the Bugzilla about submitted revisions to the SVN, so I guess that it is working, but I still can't connect to it. > > > I hope that you can help me with this. > > > Regards, > > > -- > > Francisco J. Ossandon > > Bioinformatician. > > Ph.D. Student, University Andres Bello. > > Center for Bioinformatics and Genome Biology, > > Fundacion Ciencia para la Vida. > > Santiago, Chile. > > www.cienciavida.cl/CBGB.htm > > _______________________________________________ > Bioperl-l mailing list > Bioper... at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From adsj at novozymes.com Wed Mar 17 16:00:34 2010 From: adsj at novozymes.com (Adam Sjøgren) Date: Wed, 17 Mar 2010 17:00:34 +0100 Subject: [Bioperl-l] Bioperl SVNconnection problem In-Reply-To: <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> (robfsouza@gmail.com's message of "Wed, 17 Mar 2010 08:20:21 -0700 (PDT)") References: <6C998BD2392E4BF594F041368D9456E4@BlackJack> <313A477B-0A50-4C4E-86C5-FCD62264A09C@gmail.com> <91e8aa2d-376f-4499-9831-350f7c9ea9c9@g11g2000yqe.googlegroups.com> Message-ID: <874okfsztp.fsf@topper.koldfront.dk> On Wed, 17 Mar 2010 08:20:21 -0700 (PDT), robfsouza wrote: > Any idea when the SVN is going to be fixed? I could not find tar.gz or > other download methods in github... If you don't want to "git clone http://github.com/bioperl/bioperl-live.git", you can click on the "Download source" link in the upper right corner of http://github.com/bioperl/bioperl-live and you'll get to choose between downloading tar or zip.
Best regards, Adam -- Adam Sjøgren adsj at novozymes.com From cjfields at illinois.edu Wed Mar 17 16:12:42 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 17 Mar 2010 11:12:42 -0500 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: References: Message-ID: <53EECF69-E9CE-4619-BE0A-97BE55754D8E@illinois.edu> Janine, How would you go about doing that from the BLAST report alone (which doesn't store the whole sequence)? Unless you know something I don't, you'll need to pull the unique identifier for the sequence from the hit object while parsing the report and grab the seq from a local or remote database (or use fastacmd or its equivalent in blast+). chris On Mar 15, 2010, at 3:15 AM, Janine Arloth wrote: > Hello, > > exists a possibility to get/extract the whole hit sequences? (Not only the hit string from the alignment with $hsp->$hit_string;) > > Best regards > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Wed Mar 17 19:48:27 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 18 Mar 2010 08:48:27 +1300 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> If you're running blast locally, use fastacmd to extract the sequences from the blast database. Eg fastacmd -d nr -s AC147927 Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Janine Arloth > Sent: Monday, 15 March 2010 9:16 p.m.
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus > > Hello, > > exists a possibility to get/extract the whole hit sequences? (Not only the > hit string from the alignment with $hsp->$hit_string;) > > Best regards > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From michael.watson at bbsrc.ac.uk Wed Mar 17 20:47:57 2010 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 17 Mar 2010 20:47:57 +0000 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> Message-ID: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb.... 
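A sketch of the two legacy NCBI commands under discussion, written as Perl argument lists so they can be handed to system(). The file name mydb.fa is a placeholder. The lookup uses lowercase -s, which is the fastacmd option for a search string as far as its usage text goes (uppercase -S selects a strand), and -o T is the formatdb switch that builds the identifier index fastacmd relies on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Argument list for indexing a FASTA file with legacy formatdb.
# -p says whether the input is protein (T) or nucleotide (F);
# -o T asks formatdb to parse sequence identifiers and index them.
sub formatdb_args {
    my ($fasta, $is_protein) = @_;
    return ('formatdb', '-i', $fasta, '-p', ($is_protein ? 'T' : 'F'), '-o', 'T');
}

# Argument list for fetching one entry by accession or GI.
sub fastacmd_args {
    my ($db, $id) = @_;
    return ('fastacmd', '-d', $db, '-s', $id);
}

my @index = formatdb_args('mydb.fa', 0);
my @fetch = fastacmd_args('mydb.fa', 'AC147927');

# Show the commands; running them needs the legacy BLAST tools on PATH.
print join(' ', @index), "\n";
print join(' ', @fetch), "\n";

# To actually run them:
# system(@index) == 0 or die "formatdb failed: $?";
# system(@fetch) == 0 or die "fastacmd failed: $?";
```

With blast+ the corresponding tools are makeblastdb (with -parse_seqids) and blastdbcmd (with -db and -entry).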
======================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Wed Mar 17 21:07:29 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 18 Mar 2010 10:07:29 +1300 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> Precompiled databases from NCBI are built with "-o T" but when building them yourself, the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our to retrieve sequences for all your blast hits. Here's an example of our sequence retrieval: https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw --Russell > -----Original Message----- > From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] > Sent: Thursday, 18 March 2010 9:48 a.m. > To: Smithies, Russell; 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' > Subject: RE: [Bioperl-l] SearchIO, StandAloneBlastPlus > > I think that relies on the blast database being built with the "-o T" > option, which is not the default for formatdb.... 
> ________________________________________ > From: bioperl-l-bounces at lists.open-bio.org [bioperl-l-bounces at lists.open- > bio.org] On Behalf Of Smithies, Russell > [Russell.Smithies at agresearch.co.nz] > Sent: 17 March 2010 19:48 > To: 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' > Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus > > If you're running blast locally, use fastacmd to extract the sequences > from the blast database. > Eg fastacmd -d nr -S AC147927 > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Janine Arloth > > Sent: Monday, 15 March 2010 9:16 p.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus > > > > Hello, > > > > exists a possibility to get/extract the whole hit sequences? (Not only > the > > hit string from the alignment with $hsp->$hit_string;) > > > > Best regards > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Wed Mar 17 21:53:38 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 18 Mar 2010 10:53:38 +1300 Subject: [Bioperl-l] SearchIO, StandAloneBlastPlus In-Reply-To: <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com> References: , <18DF7D20DFEC044098A1062202F5FFF32C6E2A71A3@exchsth.agresearch.co.nz> <8D08960C647E64438CE5740657CBBDC5020F05DD35@iahcexch1.iah.bbsrc.ac.uk> <18DF7D20DFEC044098A1062202F5FFF32C6E2A725D@exchsth.agresearch.co.nz> <99D9C34C-655F-4BBC-AD01-83E2EC837317@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E2A72BD@exchsth.agresearch.co.nz> It's all a bit complicated as this page is on a public site but our blast server is internal and restricted so there's no direct communication between them. The public site takes the data from the blast request and writes it to a template file then puts it in a folder that the internal blast server checks every 10 seconds. When a new request is found, it does the blast, creates the image and map with Bio::Graphics, then transfers it to a folder on the public server. As a sneaky bodge so I don't have to transfer the image, it's base64 encoded in the html then stripped out later. The blast result page keeps refreshing until it sees the required result has returned then displays the page.
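The base64 step above could look roughly like the following. This is only a sketch under assumptions: the message does not say how the encoded image is delimited inside the HTML, so the comment markers and both function names here are hypothetical. It uses the core MIME::Base64 module to pack image bytes into the page and to pull them back out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical markers; the real delimiter format is not described.
my $OPEN  = '<!--IMG64:';
my $CLOSE = '-->';

# Append the image to the HTML as a single-line base64 payload
# wrapped in an HTML comment, so one file carries both.
sub embed_image {
    my ($html, $image_bytes) = @_;
    return $html . $OPEN . encode_base64($image_bytes, '') . $CLOSE;
}

# Remove the payload again, returning the clean HTML and the
# decoded image bytes (undef if no payload was present).
sub extract_image {
    my ($packed) = @_;
    my $bytes;
    if ($packed =~ s/\Q$OPEN\E([A-Za-z0-9+\/=]*)\Q$CLOSE\E//) {
        $bytes = decode_base64($1);
    }
    return ($packed, $bytes);
}

my $page   = '<html><body>BLAST result</body></html>';
my $packed = embed_image($page, "\x89PNG placeholder bytes");
my ($html, $bytes) = extract_image($packed);

print "html restored\n"  if $html eq $page;
print "image restored\n" if defined $bytes && $bytes eq "\x89PNG placeholder bytes";
```

The second argument to encode_base64 is the line ending; passing an empty string keeps the payload on one line so the regular expression can match it in one piece.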
It sounds a bit odd but as blast runs on one of our main servers, we don't want anyone to be able to "accidentally" run commands on it - no one has hacked our servers yet :) There's some good stuff in the BioPerl howtos http://www.bioperl.org/wiki/HOWTO:Graphics and http://www.bioperl.org/wiki/HOWTO:SearchIO Bio::SearchIO::Writer::HTMLResultWriter can be quite useful though ours is html-ized 'manually' as it's streamed through a post-processing script. --Russell From: Janine Arloth [mailto:janine.arloth at googlemail.com] Sent: Thursday, 18 March 2010 10:33 a.m. To: Smithies, Russell Subject: Re: [Bioperl-l] SearchIO, StandAloneBlastPlus Thank you very much. Can I ask how you get the figure in the blast output (blastmap)? I use "use Bio::Graphics;" but I did not see how to create this figure. Best Regards On 17.03.2010 at 22:07, Smithies, Russell wrote: Precompiled databases from NCBI are built with "-o T" but when building them yourself, the default is "-o F". We build all ours with "-o T" as we have some extra stuff built into our to retrieve sequences for all your blast hits. Here's an example of our sequence retrieval: https://isgcdata.agresearch.co.nz/cgi-bin/blast_results.py?filename=xCW3ez7FU46qvpKNTGNu9ZXnw&submit_time=1268859815.54&database=isgcdata_raw --Russell -----Original Message----- From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk] Sent: Thursday, 18 March 2010 9:48 a.m. To: Smithies, Russell; 'Janine Arloth'; 'bioperl-l at lists.open-bio.org' Subject: RE: [Bioperl-l] SearchIO, StandAloneBlastPlus I think that relies on the blast database being built with the "-o T" option, which is not the default for formatdb....
======================================================================= _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From armendarez77 at hotmail.com Thu Mar 18 16:27:20 2010 From: armendarez77 at hotmail.com (armendarez77 at hotmail.com) Date: Thu, 18 Mar 2010 09:27:20 -0700 Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter Message-ID: Hello, I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (which include Bio::DB::RefSeq, Bio::DB::GenBank) via crontab; however, the web filter requires me to log in every 8 hours. The administrator removed the filter; however, my scripts still failed. I then logged into iPrism and the scripts worked. The system administrators say it's the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast. I would include the script, but it's a bit involved and passes arguments to other scripts. Thank you, Veronica From cjfields at illinois.edu Thu Mar 18 17:21:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 18 Mar 2010 12:21:22 -0500 Subject: [Bioperl-l] Bio::DB::RefSeq and iPrism Web Filter In-Reply-To: References: Message-ID: Veronica, No caching occurs that I know of. If you have an environment proxy set somehow it will use that, using LWP::UserAgent and env_proxy() (your logging in via iPrism makes me think it is something along those lines). Otherwise the proxy has to be explicitly set for each object, so no caching is apparent. Could you have a local environment proxy set that you're unaware of?
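One way to answer that question is to print the proxy-style environment variables from inside the same environment the script runs in, since a cron job and a login shell often see different environments. The sketch below uses only core Perl; the function name proxy_vars is made up for illustration. LWP::UserAgent's env_proxy() reads variables of the *_proxy form (for example http_proxy and no_proxy), so those are what it looks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sorted names of proxy-style variables found in the
# given environment hash (http_proxy, HTTPS_PROXY, no_proxy, ...).
sub proxy_vars {
    my ($env) = @_;
    my @names = sort grep { /_proxy\z/i } keys %{$env};
    return @names;
}

my @found = proxy_vars(\%ENV);
if (@found) {
    print "$_=$ENV{$_}\n" for @found;
}
else {
    print "no *_proxy variables are set in this environment\n";
}
```

Running this once from the shell and once from crontab (redirecting the output to a file) shows whether the two environments differ.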
See here for examples: http://search.cpan.org/~gaas/libwww-perl-5.834/lib/LWP/UserAgent.pm#Proxy_attributes You could try something like this after you create the instances, which accesses the LWP::UserAgent instance cached in the relevant class and shuts off proxies: $db->ua->no_proxy(); Otherwise, you can try coming up with a minimal test case indicating what happens (including any output) and file a bug report, just in case. chris On Mar 18, 2010, at 11:27 AM, wrote: > > Hello, > > I'm having a problem involving my company's StBernard iPrism Web Filter. I would like to be able to run my scripts (include Bio::DB::RefSeq, Bio::DB::GenBank) via crontab, however the web filter requires me to log in every 8 hours. The administrator removed the filter however, my scripts still failed. I then logged into iPrism and the scripts worked. > > The system administrators say its the script; that it is somehow caching information and preventing itself from accessing the internet. I'm using the following modules: strict, DBI, Bio::Perl, Bio::SeqIO, Getopt::Long and Bio::Tools::Run::StandAloneBlast. > > I would include the script, but it's a bit involved and passes arguments to other scripts. > > Thank you, > > Veronica > > > > _________________________________________________________________ > Hotmail: Trusted email with powerful SPAM protection. > http://clk.atdmt.com/GBL/go/210850553/direct/01/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Mar 18 21:11:34 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 18 Mar 2010 14:11:34 -0700 Subject: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! Message-ID: <4BA29706.8040606@cornell.edu> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! 
GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2010 FAQ at http://tinyurl.com/yzemdfo Student applications are due April 9, 2010 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and who to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! Rob Buels OBF GSoC 2010 Administrator From me at miguel.weapps.com Thu Mar 18 23:33:16 2010 From: me at miguel.weapps.com (Luis M Rodriguez-R) Date: Thu, 18 Mar 2010 18:33:16 -0500 Subject: [Bioperl-l] GSoC-2010 & the semantic web Message-ID: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com> Hello all, I would like to know how to apply to the GSoC-2010, and when it is planned to be performed. I think there are great development opportunities in information discovery using semantic web (I'm familiar with RDF in bio2rdf, uniprot and some onthologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, specially thinking in the implementation of SPARQL for a discoverable "bio-cloud". The people of bio2rdf already have some parsers, but there are still a lot of things to do. Best regards, Luis. Luis M. 
Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
luisrodr at uniandes.edu.co
me at miguel.weapps.com

From rhythmbox-devel at maubp.freeserve.co.uk Fri Mar 19 00:25:05 2010
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 19 Mar 2010 00:25:05 +0000
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <320fb6e01003181725j2aa1268am80ae7649bd873b46@mail.gmail.com>

On Thu, Mar 18, 2010 at 11:33 PM, Luis M Rodriguez-R wrote:
>
> I think there are great development opportunities in information
> discovery using semantic web (I'm familiar with RDF in bio2rdf,
> uniprot and some onthologies, ...

Have a read of the wiki pages from this recent hackathon - it should be of interest to you:

http://hackathon3.dbcls.jp/

Peter

From cjfields at illinois.edu Fri Mar 19 00:29:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Mar 2010 19:29:19 -0500
Subject: [Bioperl-l] GSoC-2010 & the semantic web
In-Reply-To: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
References: <32B198C6-EA53-4629-A5CC-0B22580628C9@miguel.weapps.com>
Message-ID: <0FADD2C6-9458-4E0C-ADB5-E4C0F18A79D8@illinois.edu>

Luis,

See this page for the specifics:

http://www.open-bio.org/wiki/Google_Summer_of_Code

There are several proposed projects already listed; feel free to add yours to the page. I'm assuming these will be OBF-focused, so tying your proposal to one of the OBF projects is probably a good idea.

chris

On Mar 18, 2010, at 6:33 PM, Luis M Rodriguez-R wrote:

> Hello all,
>
> I would like to know how to apply to the GSoC-2010, and when it is planned to be performed.
> > I think there are great development opportunities in information discovery using semantic web (I'm familiar with RDF in bio2rdf, uniprot and some onthologies, but it could also be useful to integrate OWL, for example). I've been playing with this, and I think parsers from, for example, GenBank and EMBL to RDF, and parsers of RDF from bio2rdf and uniprot would be very useful, specially thinking in the implementation of SPARQL for a discoverable "bio-cloud". > > The people of bio2rdf already have some parsers, but there are still a lot of things to do. > > Best regards, > Luis. > > Luis M. Rodriguez-R > [http://bioinf.uniandes.edu.co/~miguel/] > --------------------------------- > Unidad de Bioinform?tica del Laboratorio de Micolog?a y Fitopatolog?a > Universidad de Los Andes, Colombia > [http://bioinf.uniandes.edu.co] > > + 57 1 3394949 ext 2619 > luisrodr at uniandes.edu.co > me at miguel.weapps.com > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ross at cuhk.edu.hk Sat Mar 20 23:55:35 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sun, 21 Mar 2010 07:55:35 +0800 Subject: [Bioperl-l] automation of translation based on alignment Message-ID: <002c01cac888$d570fe20$8052fa60$@edu.hk> Dear bioperl users, I am working on virus sequences and one of the Genbank file is here: http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu m with 1000 such nucleotide sequences, I'd like to translate the corresponding protein coding sequences. The difficulties lie in: 1) The genome sequence is circular 2) The genes are overlapping I don't have all the 1000 Genbank files but I plan to use the above guide one to direct the automation process. Has bioperl implemented specialized functions to handle this kind of problem? 
Thanks a lot for your advice, Ross From florent.angly at gmail.com Mon Mar 22 00:44:11 2010 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 22 Mar 2010 10:44:11 +1000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <002c01cac888$d570fe20$8052fa60$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> Message-ID: <4BA6BD5B.9010509@gmail.com> Hi Ross, It seems like your answer is in the link you put. On this link, all the coding sequences are already identified and their aminoacid sequence provided. You simply need to parse all the GenBank entries to extract this information. You may use EUtilities to achieve this online: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook Florent On 21/03/10 09:55, Ross KK Leung wrote: > Dear bioperl users, > > > > I am working on virus sequences and one of the Genbank file is here: > > > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 > tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > m > > > > with 1000 such nucleotide sequences, I'd like to translate the corresponding > protein coding sequences. The difficulties lie in: > > > > 1) The genome sequence is circular > > 2) The genes are overlapping > > > > I don't have all the 1000 Genbank files but I plan to use the above guide > one to direct the automation process. Has bioperl implemented specialized > functions to handle this kind of problem? 
> > > > Thanks a lot for your advice, Ross > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Mon Mar 22 01:14:27 2010 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 22 Mar 2010 11:14:27 +1000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <004d01cac95c$15c95250$415bf6f0$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> Message-ID: <4BA6C473.4090404@gmail.com> Hi Ross, Please keep relies on the BioPerl mailing list so that everyone benefits. You should give detailed explanations of what you are tying to achieve., e.g.: * What type of input file do you have? * Do you already know the location of the ORFs? * what is the multiple alignments you are talking about ... Florent On 22/03/10 11:07, Ross KK Leung wrote: > Dear Florent, > > Thanks for your response. While the one with Genbank file can be extracted, > those without have to rely on alignment. Scripts certainly can be written to > move forward and backward on the multiple alignment but it is an error-prone > process and that's why I raised this question. > > Rgds, Ross > > > > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Monday, March 22, 2010 8:44 AM > To: Ross KK Leung > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] automation of translation based on alignment > > Hi Ross, > It seems like your answer is in the link you put. On this link, all the > coding sequences are already identified and their aminoacid sequence > provided. You simply need to parse all the GenBank entries to extract > this information. 
You may use EUtilities to achieve this online: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > Florent > > On 21/03/10 09:55, Ross KK Leung wrote: > >> Dear bioperl users, >> >> >> >> I am working on virus sequences and one of the Genbank file is here: >> >> >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> >> > >> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> >> >> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > >> m >> >> >> >> with 1000 such nucleotide sequences, I'd like to translate the >> > corresponding > >> protein coding sequences. The difficulties lie in: >> >> >> >> 1) The genome sequence is circular >> >> 2) The genes are overlapping >> >> >> >> I don't have all the 1000 Genbank files but I plan to use the above guide >> one to direct the automation process. Has bioperl implemented specialized >> functions to handle this kind of problem? >> >> >> >> Thanks a lot for your advice, Ross >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From ross at cuhk.edu.hk Mon Mar 22 01:22:47 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 09:22:47 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <4BA6C473.4090404@gmail.com> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> Message-ID: <004e01cac95e$2e375f10$8aa61d30$@edu.hk> Dear Florent, Sorry for mis-clicking "reply" instead of "reply-all". Here are my problem details: Input: 1000 multiple aligned DNA sequences One of them has Genbank file http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 the remaining 999 ones only have genomic sequences. Objective: to derive the cognate protein aligned sequences. 
(here have 4 sets as there are 4 overlapping genes) Difficulties: 1) circular genome 2) there may be in-dels Hope now the problem has been clarified, Ross -----Original Message----- From: Florent Angly [mailto:florent.angly at gmail.com] Sent: Monday, March 22, 2010 9:14 AM To: Ross KK Leung; bioperl-l List Subject: Re: [Bioperl-l] automation of translation based on alignment Hi Ross, Please keep relies on the BioPerl mailing list so that everyone benefits. You should give detailed explanations of what you are tying to achieve., e.g.: * What type of input file do you have? * Do you already know the location of the ORFs? * what is the multiple alignments you are talking about ... Florent On 22/03/10 11:07, Ross KK Leung wrote: > Dear Florent, > > Thanks for your response. While the one with Genbank file can be extracted, > those without have to rely on alignment. Scripts certainly can be written to > move forward and backward on the multiple alignment but it is an error-prone > process and that's why I raised this question. > > Rgds, Ross > > > > -----Original Message----- > From: Florent Angly [mailto:florent.angly at gmail.com] > Sent: Monday, March 22, 2010 8:44 AM > To: Ross KK Leung > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] automation of translation based on alignment > > Hi Ross, > It seems like your answer is in the link you put. On this link, all the > coding sequences are already identified and their aminoacid sequence > provided. You simply need to parse all the GenBank entries to extract > this information. 
You may use EUtilities to achieve this online: > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > Florent > > On 21/03/10 09:55, Ross KK Leung wrote: > >> Dear bioperl users, >> >> >> >> I am working on virus sequences and one of the Genbank file is here: >> >> >> >> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 >> >> > >> tem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum> >> >> > &itool=EntrezSystem2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSu > >> m >> >> >> >> with 1000 such nucleotide sequences, I'd like to translate the >> > corresponding > >> protein coding sequences. The difficulties lie in: >> >> >> >> 1) The genome sequence is circular >> >> 2) The genes are overlapping >> >> >> >> I don't have all the 1000 Genbank files but I plan to use the above guide >> one to direct the automation process. Has bioperl implemented specialized >> functions to handle this kind of problem? >> >> >> >> Thanks a lot for your advice, Ross >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > From cjfields at illinois.edu Mon Mar 22 03:40:34 2010 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 21 Mar 2010 22:40:34 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <004e01cac95e$2e375f10$8aa61d30$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> Message-ID: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote: > Dear Florent, > > Sorry for mis-clicking "reply" instead of "reply-all". 
Here are my problem
> details:
>
> Input:
>
> 1000 multiple aligned DNA sequences
> One of them has Genbank file
> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1
>
> the remaining 999 ones only have genomic sequences.
>
> Objective: to derive the cognate protein aligned sequences. (here have 4
> sets as there are 4 overlapping genes)
>
> Difficulties:
> 1) circular genome
> 2) there may be in-dels

To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pull sub-alignments out from there, then translate each sequence. There would be no need to re-retrieve sequences which are already present in the alignment, unless there is something not mentioned above that I'm missing.

Re: circular genomes: recent commits to bioperl should allow handling circular genomes with features and subsequence extraction. If not, I would consider that a serious bug that needs to be reported.

If you need to grab remote sequences from a larger set of sequences (either locally or remotely) and translate them, you can use Bio::DB::GenBank, which will directly return a Bio::Seq object. Note you would obviously have to reset these per ID based on the start/end/strand:

    use Bio::DB::GenBank;

    my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                                   -seq_start => 100,
                                   -seq_stop  => 200,
                                   -strand    => 1);
    my $seqobj = $gb->get_Seq_by_id($id); # or get_Seq_by_acc($acc)

    # do any preprocessing here...

    # translate the retrieved sub-sequence into a protein Bio::Seq object
    my $protein_seqobj = $seqobj->translate;

If you want, you could also download the sequences and use one of the various flatfile database classes to work with them (I believe Bio::DB::Fasta extracts subsequences very rapidly). It might be faster. For those regions that cross the origin you may need to pull two sequences and join them yourself, as the two pieces likely won't be joined automatically.
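
The coordinate conversion suggested at the start of this reply (reference feature coordinates to alignment columns, then a sub-alignment per gene) can be sketched in plain Perl. This is a toy illustration only: the three aligned rows, the CDS coordinates 1..9 and the helper names column_for_residue and ungapped_slice are all invented here. Within BioPerl, Bio::SimpleAlign provides column_from_residue_number() and slice() for the same two steps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based alignment column that holds residue number $resnum
# (1-based, gaps not counted) of a gapped sequence string.
sub column_for_residue {
    my ($gapped, $resnum) = @_;
    my $seen = 0;
    for my $col (1 .. length $gapped) {
        next if substr($gapped, $col - 1, 1) eq '-';
        return $col if ++$seen == $resnum;
    }
    die "residue $resnum is beyond the end of the sequence\n";
}

# Cut columns $from..$to out of a gapped row and drop the gap characters.
sub ungapped_slice {
    my ($gapped, $from, $to) = @_;
    (my $piece = substr($gapped, $from - 1, $to - $from + 1)) =~ tr/-//d;
    return $piece;
}

# Made-up alignment: the reference carries the annotation, the isolates do not.
my %alignment = (
    reference => 'ATG---AAATAA',
    isolate_1 => 'ATGCCCAAATAA',
    isolate_2 => 'ATG---AAGTAA',
);

# Hypothetical CDS at reference positions 1..9 (ungapped coordinates).
my ($cds_start, $cds_end) = (1, 9);
my $first_col = column_for_residue($alignment{reference}, $cds_start);
my $last_col  = column_for_residue($alignment{reference}, $cds_end);

for my $id (sort keys %alignment) {
    my $cds = ungapped_slice($alignment{$id}, $first_col, $last_col);
    printf "%-10s %s\n", $id, $cds;
}
```

For a gene that crosses the origin, the same two helpers would be applied to each segment of the join() separately and the pieces concatenated in the join order before translating. An in-del whose length is not a multiple of three will shift the reading frame of that isolate, so the extracted lengths are worth checking first.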
> Hope now the problem has been clarified, Ross Hope this helps. chris From ross at cuhk.edu.hk Mon Mar 22 05:30:06 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 13:30:06 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> Message-ID: <006901cac980$bb60f190$3222d4b0$@edu.hk> Dear Chris, It seems that Bioperl is "clever" enough to "rectify" my start and stop by reversing the order. e.g. start = 2300 stop = 1600 It will reverse back to 1600 and then 2300. What else to tell that I'm now working on a circular genome? -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Monday, March 22, 2010 11:41 AM To: Ross KK Leung Cc: 'Florent Angly'; 'bioperl-l List' Subject: Re: [Bioperl-l] automation of translation based on alignment On Mar 21, 2010, at 8:22 PM, Ross KK Leung wrote: > Dear Florent, > > Sorry for mis-clicking "reply" instead of "reply-all". Here are my problem > details: > > Input: > > 1000 multiple aligned DNA sequences > One of them has Genbank file > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1 > > the remaining 999 ones only have genomic sequences. > > Objective: to derive the cognate protein aligned sequences. (here have 4 > sets as there are 4 overlapping genes) > > Difficulties: > 1) circular genome > 2) there may be in-dels To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? One could try converting the reference sequence's feature coordinates to alignment column-based positions, pull sub-alignments out from there, then translate each sequence. 
There would be no need to re-retrieve sequences which are already present in the alignment, unless there is something not mentioned above that I'm missing.

Re: circular genomes: recent commits to bioperl should allow handling circular genomes with features and subsequence extraction. If not, I would consider that a serious bug that needs to be reported.

If you need to grab remote sequences from a larger set of sequences (either locally or remotely) and translate them, you can use Bio::DB::GenBank, which will directly return a Bio::Seq object. Note you would obviously have to reset these per ID based on the start/end/strand:

    use Bio::DB::GenBank;

    my $gb = Bio::DB::GenBank->new(-format    => 'Fasta',
                                   -seq_start => 100,
                                   -seq_stop  => 200,
                                   -strand    => 1);
    my $seqobj = $gb->get_Seq_by_id($id); # or get_Seq_by_acc($acc)

    # do any preprocessing here...

    # translate the retrieved sub-sequence into a protein Bio::Seq object
    my $protein_seqobj = $seqobj->translate;

If you want, you could also download the sequences and use one of the various flatfile database classes to work with them (I believe Bio::DB::Fasta extracts subsequences very rapidly). It might be faster. For those regions that cross the origin you may need to pull two sequences and join them yourself, as the two pieces likely won't be joined automatically.

> Hope now the problem has been clarified, Ross

Hope this helps.
chris

From cjfields at illinois.edu Mon Mar 22 12:58:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Mar 2010 07:58:00 -0500
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To: <006901cac980$bb60f190$3222d4b0$@edu.hk>
References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk>
Message-ID: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu>

On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote:

> Dear Chris,
>
> It seems that Bioperl is "clever" enough to "rectify" my start and stop by
> reversing the order.
>
> e.g.
> start = 2300
> stop = 1600
>
> It will reverse back to 1600 and then 2300.
> What else to tell that I'm now working on a circular genome?

Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted:

    ...
    gene            join(2307..3215,1..1623)
                    /gene="P"
    ...

This would be represented as a Bio::Location::Split (a split location) in the feature; it would be joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence.
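
As a rough sketch of the spliced_seq() then translate() route for a feature that crosses the origin: the 12-nt circular "genome" and the join(10..12,1..6) CDS below are invented for illustration, as is the helper name translate_cds_features. The -nosort option is passed on the assumption that the installed BioPerl would otherwise sort the sub-locations by start coordinate, which would put the 1..6 segment first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Toy 12-nt circular "genome", made up for this example.
my $genome = Bio::Seq->new(
    -id          => 'toy_circular',
    -seq         => 'AAATAACCCATG',
    -alphabet    => 'dna',
    -is_circular => 1,
);

# A CDS that crosses the origin, i.e. join(10..12,1..6) in GenBank notation:
# positions 10..12 are ATG, positions 1..6 are AAATAA.
my $location = Bio::Location::Split->new();
$location->add_sub_Location(
    Bio::Location::Simple->new(-start => 10, -end => 12, -strand => 1));
$location->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 6, -strand => 1));

my $cds = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS');
$cds->location($location);
$genome->add_SeqFeature($cds);

# Hypothetical helper: return [nucleotide, protein] string pairs for every
# CDS feature attached to a sequence object.
sub translate_cds_features {
    my ($seqobj) = @_;
    my @pairs;
    for my $feat (grep { $_->primary_tag eq 'CDS' } $seqobj->get_SeqFeatures) {
        # -nosort keeps the sub-locations in the order given in the join()
        # instead of sorting them by start coordinate.
        my $nt = $feat->spliced_seq(-nosort => 1);
        push @pairs, [ $nt->seq, $nt->translate->seq ];
    }
    return @pairs;
}

for my $pair (translate_cds_features($genome)) {
    print "CDS:     $pair->[0]\n";
    print "Protein: $pair->[1]\n";
}
```

With a real record, the same loop would run over the features of a sequence read through Bio::SeqIO or fetched with Bio::DB::GenBank; features with a simple location go through the same call unchanged.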
chris From ross at cuhk.edu.hk Mon Mar 22 13:17:05 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Mon, 22 Mar 2010 21:17:05 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> Message-ID: <011701cac9c1$f7b89260$e729b720$@edu.hk> Chris, The following codes are what I use to retrieve sequences from GenBank. I know that I can use something like: for my $feature ($seqobj->get_SeqFeatures){ if ($feature->primary_tag eq "CDS") { ... To get features, but how should Bio::Location::SplitLocation be used? Do you mean something like: If ($seq->is_circular()) { spliced_seq(); } ? But the genome indeed has several such spliced sequences then how can I specify which is to retrieve? Thanks for your advice again~ #!/usr/bin/perl use Bio::SeqIO::genbank; use Bio::DB::GenBank; use Bio::DB::RefSeq; $gb = new Bio::DB::GenBank; my ($acc, $start, $stop) = @ARGV; my $gb = Bio::DB::GenBank->new(-format => 'Fasta', -seq_start => "$start", -seq_stop => "$stop", -strand => 1); $gbout = $acc; $seq = $gb->get_Seq_by_acc($acc); print "seq is ", $seq->seq, "\n"; $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' ); $seqio_obj->write_seq($seq); exit; -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Monday, March 22, 2010 8:58 PM To: Ross KK Leung Cc: 'Florent Angly'; 'bioperl-l List' Subject: Re: [Bioperl-l] automation of translation based on alignment On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote: > Dear Chris, > > It seems that Bioperl is "clever" enough to "rectify" my start and stop by > reversing the order. 
> > e.g. > start = 2300 > stop = 1600 > > It will reverse back to 1600 and then 2300. > What else to tell that I'm now working on a circular genome? Reverse it where, the alignment or the feature? The svn version of BioPerl, for alignments, retains strand information (this was a bug that was fixed). For features, start is always less than end, with directionality determined by strand. For a circular genome, the feature is split across the origin, as you have seen in the original sequence you posted: ... gene join(2307..3215,1..1623) /gene="P" ... This would be represented as a Bio::Location::SplitLocation in the feature; it would joined based on that order if $seq->is_circular() is true (or at least it should). In cases like this, the safe bet is to call spliced_seq() to get the joined sequence in all cases, then call translate() to get the protein sequence. chris From jessica.sun at gmail.com Mon Mar 22 18:48:38 2010 From: jessica.sun at gmail.com (Jessica Sun) Date: Mon, 22 Mar 2010 14:48:38 -0400 Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener Message-ID: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com> Does any know how to get CDS of the corresponding mRNA accession(NM_) using this function? *Bio::SeqFeature::Tools::Unflattener many thanks in advance. * -- Jessica Jingping Sun From cjfields at illinois.edu Mon Mar 22 18:56:30 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 13:56:30 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq() Message-ID: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> I have just noticed that spliced_seq() is borked with Bio::DB::SeqFeature and am thinking about implementing it. Or is similar functionality already implemented elsewhere? Currently, it is calling entire_seq(), which I plan on avoiding simply to prevent sucking in the entire sequence into memory. 
This is currently what happens:

---------------------------

my $it = $store->get_seq_stream(-type => 'mRNA');

my $ct = 0;
while (my $sf = $it->next_seq) {
    my $seq = $sf->spliced_seq; # dies with exception
}

---------------------------

------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented by package Bio::DB::SeqFeature.
This is not your fault - author of Bio::DB::SeqFeature should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
STACK: beestore.pl:17
----------------------------------------------------------------

chris

From csembry at ualr.edu Mon Mar 22 19:48:56 2010
From: csembry at ualr.edu (Charles Embry)
Date: Mon, 22 Mar 2010 14:48:56 -0500
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
Message-ID: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>

I want to create a GUI that will use current bioperl modules (along with some I am writing). It will run on a Windows machine with XP and maybe a laptop with Vista (this is a project I am working on in graduate school for a professor). It will identify promoter types in eukaryotic organisms and also do multiple alignments.

What would you recommend I use to develop this? A Java application? If so, how hard is it to get Java to use perl and bioperl modules? Another language? Is there a tool to directly develop a GUI for bioperl modules that does not use another language?

I will need to tag certain sequences with user-specified colors and such.
Thanks for the help From cjfields at illinois.edu Mon Mar 22 20:20:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 15:20:24 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <011701cac9c1$f7b89260$e729b720$@edu.hk> References: <002c01cac888$d570fe20$8052fa60$@edu.hk> <4BA6BD5B.9010509@gmail.com> <004d01cac95c$15c95250$415bf6f0$@edu.hk> <4BA6C473.4090404@gmail.com> <004e01cac95e$2e375f10$8aa61d30$@edu.hk> <181E4756-47D9-40C0-9A18-80241554289B@illinois.edu> <006901cac980$bb60f190$3222d4b0$@edu.hk> <0FACC77A-DBC1-4F41-8A4C-31824D23AD3C@illinois.edu> <011701cac9c1$f7b89260$e729b720$@edu.hk> Message-ID: On Mar 22, 2010, at 8:17 AM, Ross KK Leung wrote: > Chris, > > The following codes are what I use to retrieve sequences from GenBank. I > know that I can use something like: > > for my $feature ($seqobj->get_SeqFeatures){ > > if ($feature->primary_tag eq "CDS") { > ... > > To get features, but how should > > Bio::Location::SplitLocation > > be used? Do you mean something like: > > If ($seq->is_circular()) { > spliced_seq(); > } You probably won't directly see the SplitLocation itself unless you explicitly request it (it is contained in the sequence feature). Okay, so if you are trying to retrieve the sequence for a specific feature, you can use $sf->seq() (simple subsequence from start to end corrected for strand of feature). However, in the case where the feature crosses the origin it will contain a split location. In this case, you should call $sf->spliced_seq() to retrieve spliced sequence. For convenience, you could call spliced_seq on all sequence features; for simple locations it will just return the ordinary subseq(). So, if one had a generic sequence feature, one could call: $sf->spliced_seq->translate; to get the Bio::Seq object that is the translation of the seq feature region. > ? But the genome indeed has several such spliced sequences then how can I > specify which is to retrieve? 
Thanks for your advice again~ Do you mean alternatively spliced variants? These would be designated as separate features in a GenBank file, so you would check for those. Otherwise you'll have to clarify. If you haven't read them yet I suggest looking over the HOWTOs, specifically ones covering Seq/SeqIO and Feature/Annotation to get an idea of what is possible. chris > #!/usr/bin/perl > > use Bio::SeqIO::genbank; use Bio::DB::GenBank; > > use Bio::DB::RefSeq; > > > > $gb = new Bio::DB::GenBank; > > > > my ($acc, $start, $stop) = @ARGV; > > > > my $gb = Bio::DB::GenBank->new(-format => 'Fasta', > > -seq_start => "$start", > > -seq_stop => "$stop", > > -strand => 1); > > > > $gbout = $acc; > > > > $seq = $gb->get_Seq_by_acc($acc); > > print "seq is ", $seq->seq, "\n"; > > > > $seqio_obj = Bio::SeqIO->new(-file => ">$gbout.fa", -format => 'fasta' ); > > $seqio_obj->write_seq($seq); > > exit; > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Monday, March 22, 2010 8:58 PM > To: Ross KK Leung > Cc: 'Florent Angly'; 'bioperl-l List' > Subject: Re: [Bioperl-l] automation of translation based on alignment > > On Mar 22, 2010, at 12:30 AM, Ross KK Leung wrote: > >> Dear Chris, >> >> It seems that Bioperl is "clever" enough to "rectify" my start and stop by >> reversing the order. >> >> e.g. >> start = 2300 >> stop = 1600 >> >> It will reverse back to 1600 and then 2300. >> What else to tell that I'm now working on a circular genome? > > Reverse it where, the alignment or the feature? The svn version of BioPerl, > for alignments, retains strand information (this was a bug that was fixed). > For features, start is always less than end, with directionality determined > by strand. For a circular genome, the feature is split across the origin, > as you have seen in the original sequence you posted: > > ... > gene join(2307..3215,1..1623) > /gene="P" > ... 
> > > This would be represented as a Bio::Location::SplitLocation in the feature; > it would joined based on that order if $seq->is_circular() is true (or at > least it should). In cases like this, the safe bet is to call spliced_seq() > to get the joined sequence in all cases, then call translate() to get the > protein sequence. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Mon Mar 22 20:23:50 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 23 Mar 2010 09:23:50 +1300 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E8829C2@exchsth.agresearch.co.nz> I guess it depends on how complex you need your GUI. If you only need a few a few menus, input fields, buttons, and are getting text or images as output then I'd stick to a simple web interface. You could tart it up a bit with Dojo or YUI libraries so it didn't look like every other webpage. If you need something more complex, you could give TK a go but I'm not sure how good it is and it will look a bit dated. If you're going to write the GUI in Swing, try Inline::Java and Java::Swing - take a look here: http://www.perlmonks.org/?node_id=372197 It may be easier to call Perl from Java so take a look at PLJava http://search.cpan.org/~gmpassos/PLJava-0.04/README.pod I haven't tried a Java GUI for Perl yet - we tend to use web interfaces for scripts that are going to get used by the "public" (i.e. scientists, not developers). We've found Mobyle http://bioweb2.pasteur.fr/projects/mobyle/ to be a nice way to get something up fairly quickly and it keep a consistent look to all our scripts. 

Hope this helps,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Charles Embry
> Sent: Tuesday, 23 March 2010 8:49 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
>
> I want to create a GUI that will use current bioperl modules (along with
> some I am writing). It will be on a Windows machine that runs XP and maybe
> a laptop with Vista (this is a project I am working on in graduate school
> for a professor). It will be id'ing promoter types in eukaryote organisms
> and also do multiple alignments.
>
> What recommendations do you suggest to use to develop this? A Java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does not use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
> Thanks for the help

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited.
If you have received this message in error, please notify the
sender immediately.
=======================================================================

From jason at bioperl.org Mon Mar 22 20:26:15 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 22 Mar 2010 13:26:15 -0700
Subject: [Bioperl-l] Bio::DB::SeqFeature spliced_seq()
In-Reply-To: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu>
Message-ID: <4BA7D267.6050704@bioperl.org>

Yes it needs a special case I guess - since spliced_seq should work,
however ...

The only problem is that if both exons and CDS are sub-features you have
to be smart enough to not grab both...

So I have just relied on specialized dumping scripts for gff3_to_cds for
my own needs (i.e.
http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl
). But you might also see what the Gbrowse plugin dumpers do.

-jason

Chris Fields wrote, On 3/22/10 11:56 AM:
> I have just noticed that spliced_seq() is borked with
> Bio::DB::SeqFeature and am thinking about implementing it. Or is
> similar functionality already implemented elsewhere?
>
> Currently, it is calling entire_seq(), which I plan on avoiding simply
> to prevent sucking in the entire sequence into memory. This is
> currently what happens:
>
> ---------------------------
>
> my $it = $store->get_seq_stream(-type => 'mRNA');
>
> my $ct = 0;
> while (my $sf = $it->next_seq) {
>     my $seq = $sf->spliced_seq; # dies with exception
> }
>
> ---------------------------
>
> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented
> by package Bio::DB::SeqFeature.
> This is not your fault - author of Bio::DB::SeqFeature should be blamed!
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368
> STACK: Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739
> STACK: Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325
> STACK: Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458
> STACK: beestore.pl:17
> ----------------------------------------------------------------
>
> chris

From rmb32 at cornell.edu Mon Mar 22 20:33:48 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 22 Mar 2010 13:33:48 -0700
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>
References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com>
Message-ID: <4BA7D42C.5050602@cornell.edu>

If I were doing a GUI for BioPerl, I would certainly not try to use
Java. You could have a look at how Padre, the Perl IDE (written in
Perl), is implemented: http://search.cpan.org/~plaven/Padre-0.58/ They
use wx, I think.

But a simple web or command-line application would be far easier to
write, in any language, if you can find somewhere to host it.

Rob

From jason at bioperl.org Mon Mar 22 20:33:51 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 22 Mar 2010 13:33:51 -0700
Subject: [Bioperl-l] using Bio::SeqFeature::Tools::Unflattener
In-Reply-To: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com>
References: <9adc0e9b1003221148n60151478y261e36f5341157ff@mail.gmail.com>
Message-ID: <4BA7D42F.2060807@bioperl.org>

You can try this, but it is a bit of an involved script because it is
set up for dealing with multiple genomes in multiple folders, so you
might want to simplify it.
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/genbank_gbk2gff3_unflatten.pl

But I thought the perldoc was a good starting point - have you tried it?

Generally I do:

GENBANK -> GFF3
  --> genbank_gbk2gff3_unflatten.pl
GFF3 -> {CDS,PEP,GENE}
  --> http://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/gff3_to_cdspep.pl
      (or equivalent)

-jason

Jessica Sun wrote, On 3/22/10 11:48 AM:
> Does anyone know how to get the CDS of the corresponding mRNA accession
> (NM_) using this function?
> *Bio::SeqFeature::Tools::Unflattener
>
> many thanks in advance.
>
> *
>

From Russell.Smithies at agresearch.co.nz Mon Mar 22 21:10:36 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 23 Mar 2010 10:10:36 +1300
Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA7D42C.5050602@cornell.edu>
References: <4ebd3a291003221248g66a0cd30qcb14700b593de359@mail.gmail.com> <4BA7D42C.5050602@cornell.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882A5B@exchsth.agresearch.co.nz>

wx www.wxwidgets.org looks very interesting - I didn't realize Cn3D used it.
wxPerl http://wxperl.sourceforge.net might be worth a look.

--Russell

From clarsen at vecna.com Mon Mar 22 20:51:08 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 22 Mar 2010 16:51:08 -0400
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID:

Ross, Chris F,

I'd like to just comment on this since we are working in parallel on a
similar problem. See also the prior thread in the archives, about
Peter's work in BioPython, that I instigated:

"Polyproteins, robo slippage, viral mat_peptides"

The dialog below is just to clarify the science that will guide the
pseudocode and logic flow that would need to be built out into a BioPerl
module. There are plenty of comments on the string mashing required, and
it's a harrowing morass, but here are some other thoughts. Three
line-item comments first, and then some open general ideas for moving
this block of concepts forward:

1.
>> Ross Said:
>> I am working on virus sequences and one of the Genbank file is here:
>>
>> http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=1

If you are transferring protein annotation, why not use the RefSeq one
instead of a GenBank one? In our experience at Virusbrc.org we find that
protein annotation transfer is only a valid idea if you have reference
sequences for each serotype, or your annotations will have propagation
errors from the reference. They just don't align more than 80% of the
time, for instance in Dengue, and I assume you want better than that?
Yes, this HepB is a decent sequence, but the problem is that HepB has
four main serotypes, and yet there is only one RefSeq: NC_003977. My
guess is that you will have to define reference peptide seqs for all
four serotypes first, and then grab the Taxon_ID from the input unknown
file so you align right, i.e. you need to do virus annotation below the
species level or it isn't accurate. The number of reference sequences
that you use is related to the conservation of your virus family. The
script needs to know which one to align to, so we have pulled that from
the taxon_ID field of the *.gbk file. You could also use blast and pull
the high scorer. Your choice.

>> Ross said:
>>
>> Thanks for your response. While the one with Genbank file can be
>> extracted, those without have to rely on alignment. Scripts certainly
>> can be written to move forward and backward on the multiple alignment
>> but it is an error-prone

We find also that viruses don't have the proteins annotated most of the
time. It's just a genome file. Part of the problem is that /host/
proteases sometimes cleave the /viral/ polyproteins in a
species-specific way, and since there is only one database entry, but
many hosts, you can /only/ give the genome code and still be right for
everything it /might/ infect.
You can't define the peptides in the file, because they might be
different, depending on the host. Sick, isn't it? The proteins produced
in different animals, based on their proteases' cleavage specificity,
help determine whether the virus affects that animal or not. This is my
hunch based on experience; no, I cannot give an example.

3. Chris F said:
> To preface this, any reason you're not translating the alignment
> sequences using the above sequence's features as a reference?

A logical place to start. But they are usually not given. In addition to
the above reason, data for viral sequences are scarcer, since fewer grad
students want to sequence things that maim you or make you hurl if you
screw up on the nucleic acid extraction. Also, the locations for protein
processing sites can be variable, like > or < instead of a real location
in the string. So the GenBank file isn't really very good as a
reference, 5% of the time. Last, if there are three child proteins from
a CDS, and one is made by a host protease, one by a viral protease, and
one by a start codon, what do you say is 'mature'? What should be in the
'feature' field? It's not standardized right now. Nobody has this nailed
at NCBI or UniProt.

Still, like Chris says, a script that asks first for the coordinates,
and takes that as the first go-round, is best. The GenBank coords, when
provided, are accurate most of the time. After that, you end up
comparing everything and making your choice.

4. Last thoughts:

* We tried BL2Seq to align query to target one at a time, with good
reference sequences. It works, for exactly what you ask for. But! Only
in a few virus families. And it's 1200 lines long, doing error checking;
as you say, it's just not easy. Pulling an HSP from a blast report
leaves one with a lot of end trimming and comparing to do, since the HSP
ends in an identity, and well, sometimes viruses vary at the point of
cleavage of proteins. Good luck with that task, it gave us fits.
It's not really appropriate to look at the ends of the HSP and say they
are right. It requires that extra code. Still, we may open that code to
the public after the April database release. It only works for
well-conserved viruses. (I know... Jumbo Shrimp).

* I know of no BioPerl module that can parse an MSA and take out the
relevant alignments, so you don't have to assign a reference sequence
from scratch every time you do this. Is there one?

* Sometimes the features on viruses are named differently: /mat_peptide,
/sig_peptide; sometimes they are named differently in /note or /product.
There is no standard for much of this. It needs to be proposed. Maybe we
can do that together.

* If you want to use a synoptic MSA for all Hepatitis B viruses, and
then pull the alignments out of that, I'd love to talk to you. The VBRC
used precomputed MSAs for all their virus families and got forward a
little bit. We are looking into that code.

All ideas. Nothing set in stone. Dialog welcome. Good luck all.

Chris

--

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
clarsen at vecna.com

From janine.arloth at googlemail.com Sun Mar 21 14:02:32 2010
From: janine.arloth at googlemail.com (Janine Arloth)
Date: Sun, 21 Mar 2010 15:02:32 +0100
Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs
In-Reply-To:
References:
Message-ID:

Hello all,

while running blast(n) I want to extend the method_args like:

..
$result = $fac->$blastprogramm_input(
    -query => $seq,
    -outfile => "blast.txt",
    -method_args => [
        "-num_alignments" => $num_alignments_input,
        "-evalue" => $evalue_input,
        "-word_size" => $word_size_input,
        "-?" => $match_score_input,
        "-?" => $gapcosts_input
        .....
    ]
);
...

In Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln|gapopen and
bln|gapextend,
so when I have the input value = "4 4", then Existence: 4 = gapopen and
Extension: 4 = gapextend ??
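
On the gap-cost and match/mismatch question: the BLAST+ blastn program
takes these as four separate options, -reward, -penalty, -gapopen and
-gapextend. A minimal sketch of splitting form-style values into
option/value pairs follows. scoring_args is a hypothetical helper, and
handing the resulting list to -method_args in the same way as the other
options in the snippet above is an assumption, not something checked
against StandAloneBlastPlus.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn form-style values such as "1,-2" (match,
# mismatch) and "4 4" (existence, extension) into blastn option pairs.
sub scoring_args {
    my ($match_mismatch, $gap_costs) = @_;
    my ($reward, $penalty)    = split /\s*,\s*/, $match_mismatch;
    my ($gapopen, $gapextend) = split ' ', $gap_costs;
    return (
        '-reward'    => $reward,
        '-penalty'   => $penalty,
        '-gapopen'   => $gapopen,
        '-gapextend' => $gapextend,
    );
}

# Build the flat option list and show it.
my @args = scoring_args('1,-2', '4 4');
print join(' ', @args), "\n";
```

blastn accepts only certain combinations of reward/penalty and gap
costs, so it is worth checking the chosen values against the BLAST+
documentation before wiring them into a form.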
Is there a similar usage for Match/Mismatch scores, like value="1,-2" ->
match=1 and mismatch=-2?? (I can't find it)

Thanks for help.

From nils.mueller0 at googlemail.com Sun Mar 21 15:17:06 2010
From: nils.mueller0 at googlemail.com (Nils Müller)
Date: Sun, 21 Mar 2010 16:17:06 +0100
Subject: [Bioperl-l] BlastPlus Masker
Message-ID: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com>

Dear all,

I am confused about handling maskers in blastplus:
I have fasta seqs and want to run blast with a low-complexity masker
like dustmasker:

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'my_masked_db',
    -db_data => 'myseqs.fas',
    -masker => 'dustmasker',
    -mask_data => 'maskseqs.fas',
    -create => 1);

Is myseqs.fas the same as maskseqs.fas???
I don't want to create a mask file, I only want to run blast with a
masked file??

From razi.khaja at gmail.com Tue Mar 23 00:55:42 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Mon, 22 Mar 2010 20:55:42 -0400
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To: <201003191525.o2JFPIr3019479@portal.open-bio.org>
References: <201003191525.o2JFPIr3019479@portal.open-bio.org>
Message-ID:

Hello All,

I've submitted a patch (blast.pm.diff) to bugzilla to enhance
Bio/SearchIO/blast.pm to be able to parse the algorithm_reference from
BLAST reports. I've also submitted a patch (blast.t.diff) of 26
additional tests to parse the algorithm_reference from many of the BLAST
reports in the t/data dir in bioperl-live.

I'd like to get the patch into bioperl-live and would like someone to
review the patch and tests.

If the architecture for BLAST report parsing is changing, can someone
let me know and I can contribute my efforts there.

Below are links to bugzilla.

Thanks,

Razi Khaja

---------- Forwarded message ----------
From:
Date: Fri, Mar 19, 2010 at 11:25 AM
Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
To: bioperl-guts-l at bioperl.org

http://bugzilla.open-bio.org/show_bug.cgi?id=3031

------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST -------
Created an attachment (id=1462)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view)
patch for t/SearchIO/blast.t to perform 26 additional tests to parse
algorithm_reference from many BLAST report files

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l

From Russell.Smithies at agresearch.co.nz Tue Mar 23 01:26:30 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 23 Mar 2010 14:26:30 +1300
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To:
References: <201003191525.o2JFPIr3019479@portal.open-bio.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz>

It's not really a bug if it was never implemented, and it probably
wasn't implemented because it wasn't needed.
Is there actually a use case where you'd programmatically need to access
the algorithm reference from Blast results??
I'm sure I can't think of one.

--Russell

From ross at cuhk.edu.hk Tue Mar 23 01:32:06 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 23 Mar 2010 09:32:06 +0800
Subject: [Bioperl-l] automation of translation based on alignment
In-Reply-To:
References:
Message-ID: <001201caca28$a5e325b0$f1a97110$@edu.hk>

Chris L,

Your comment is insightful and, as a non-virologist, I have never known
that before. My strategy is just to extract the genomic fragments
encoding proteins and derive the putative translated sequences. I'll do
another round of MSA for the protein sequences in order to discover any
outliers. There may be truncations, but as long as the protease acts
post-translationally, it's acceptable.

Chris F,

What makes me feel frustrated is the verisimilar data structures and
naming of Bio objects in Bioperl.
If I want to retrieve a genbank file over the internet by:

$gb = new Bio::DB::GenBank;
$seq = $gb->get_Seq_by_acc('J00522');

And from:

http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html

it says it returns a Bio::Seq object, but in fact it's a
Bio::Seq::RichSeq so I can't do something like:

my $seqobj = $seq->next_seq;
for my $feat_object ($seqobj->get_SeqFeatures) {
    if ($feat_object->primary_tag eq "CDS") {
        print $feat_object->spliced_seq->seq,"\n";
        if ($feat_object->has_tag('gene')) {
            for my $val ($feat_object->get_tag_values('gene')){
                print "gene: ",$val,"\n";
            }
        }
    }
}

From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html,
the methods there mention nothing about how to get the features or
inter-convert among the object types.

From razi.khaja at gmail.com Tue Mar 23 02:08:45 2010
From: razi.khaja at gmail.com (Razi Khaja)
Date: Mon, 22 Mar 2010 22:08:45 -0400
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference from BLAST reports using Bio::SearchIO
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz>
References: <201003191525.o2JFPIr3019479@portal.open-bio.org> <18DF7D20DFEC044098A1062202F5FFF32C6E882C24@exchsth.agresearch.co.nz>
Message-ID:

Nope, not a bug. It's an enhancement though ;)

I implemented it so that I could do a lossless transformation from BLAST
report format to other formats. You could consider that a use case.

I also have additional patches that parse other details from BLAST
reports that aren't currently implemented in Bio::SearchIO, and I'd like
to contribute those as well; however, I thought I'd start with this one.
Razi On Mon, Mar 22, 2010 at 9:26 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > It's not really a bug if it was never implemented and it probably wasn't > implemented because it wasn't needed. > Is there actually a use case where you'd programmatically need to access > the algorithm reference from Blast results?? > I'm sure I can't think of one. > > > --Russell > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Razi Khaja > > Sent: Tuesday, 23 March 2010 1:56 p.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 3031] Unable to parse > > algorithm_reference from BLAST reports using Bio::SearchIO > > > > Hello All, > > > > I've submitted a patch (blast.pm.diff) to bugzilla to enhance > > Bio/SearchIO/ > > blast.pm to be able to parse the algorithm_reference from BLAST reports. > > I've also submitted a patch (blast.t.diff) of 26 additional tests to > parse > > the algorithm_reference from many of the BLAST reports in the t/data dir > > in > > bioperl-live. > > > > I'd like to get the patch into bioperl-live and would like someone to > > review > > the patch and tests. > > > > If the architecture for BLAST report parsing is changing, can someone let > > me > > know and I can contribute my efforts there. > > > > Below are links to bugzilla. 
> > > > Thanks, > > > > Razi Khaja > > > > ---------- Forwarded message ---------- > > From: > > Date: Fri, Mar 19, 2010 at 11:25 AM > > Subject: [Bioperl-guts-l] [Bug 3031] Unable to parse algorithm_reference > > from BLAST reports using Bio::SearchIO > > To: bioperl-guts-l at bioperl.org > > > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=3031 > > > > > > > > > > > > ------- Comment #2 from razi.khaja at gmail.com 2010-03-19 11:25 EST > ------- > > Created an attachment (id=1462) > > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1462&action=view) > > patch for t/SearchIO/blast.t to perform 26 additional tests to parse > > algorithm_reference from many BLAST report files > > > > > > -- > > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > > ------- You are receiving this mail because: ------- > > You are the assignee for the bug, or are watching the assignee. > > _______________________________________________ > > Bioperl-guts-l mailing list > > Bioperl-guts-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From maj at fortinbras.us Tue Mar 23 02:51:24 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 22 Mar 2010 22:51:24 -0400 Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs In-Reply-To: References: Message-ID: Hi Janine-- The options you need are "reward" (for the match score) and "penalty" (for the mismatch score). Add them to -method_args. cheers MAJ ----- Original Message ----- From: "Janine Arloth" To: Sent: Sunday, March 21, 2010 10:02 AM Subject: [Bioperl-l] BlastPlus -Match/Mismatch scores + Gap costs > Hello all, > > while running blast(n) I want to extend to method_arg like: > .. > $result = $fac->$blastprogramm_input( > -query => $seq, > -outfile => "blast.txt", > -method_args => [ > "-num_alignments" => $num_alignments_input, > "-evalue" => $evalue_input, > "-word_size" => $word_size_input, > "-?" => $match_score_input, > "-?" => $gapcosts_input > ..... > ] > ); > ... > > in Bio/Tools/BlastPlus/Config.pm I found for gap costs: bln| gapopen and bln| > gapextend > so when I have the input value = "4 4" , then Existence: 4 = gapaopen and > Extension: 4 = gapextend ?? > > Is there a similar usage for Match/Mismatch scores like value="1,-2" -> > match=1 and mismatch=-2?? > (I can't find it) > > Thanks for help. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue Mar 23 02:59:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 22 Mar 2010 22:59:56 -0400 Subject: [Bioperl-l] BlastPlus Masker In-Reply-To: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> Message-ID: Hi Nils, You don't have to specify a mask_data file; the factory should make it for you; try simply $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -create => 1); -mask_data is there so that pre-made masks can be applied separately, or so you can name the file that is produced and preserve it; this is an "advanced feature", I suppose-- MAJ ----- Original Message ----- From: "Nils M?ller" To: Sent: Sunday, March 21, 2010 11:17 AM Subject: [Bioperl-l] BlastPlus Masker > Dear all, > > I am confused in handeling with maskers in blastplus: > I have fasta seq. and want to run blast with a low complexity masker like > dustmasker: > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'my_masked_db', > -db_data => 'myseqs.fas', > -masker => 'dustmasker', > -mask_data => 'maskseqs.fas', > -create => 1); > > Is myseqs.fas the same as maskseqs.fas??? I don't want to create a > maskfile , I only will run blast with a masked file?? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue Mar 23 04:43:03 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 23:43:03 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <001201caca28$a5e325b0$f1a97110$@edu.hk> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> Message-ID: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> On Mar 22, 2010, at 8:32 PM, Ross KK Leung wrote: > Chris L, > > Your comment is insightful and as a non-virologist, I have never known that > before. 
> My strategy is just to extract the genomic fragments encoding
> proteins and derive the putative translated sequences. I'll do another round
> of MSA for the protein sequences in order to discover any outliers. There
> may be truncations, but as long as the protease acts post-translationally,
> it's acceptable.
>
> Chris F,
>
> What makes me feel frustrated is the very similar data structures and naming
> of Bio objects in Bioperl. If I want to retrieve a genbank file over the
> internet by:
>
> $gb = new Bio::DB::GenBank;
>
> $seq = $gb->get_Seq_by_acc('J00522');
>
> And from:
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html
>
> it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq
> so I can't do something like:

A Bio::Seq::RichSeq is-a Bio::Seq (it inherits from Bio::Seq and augments it). I believe 'Bio::Seq' in the documents refers to the fact that one can retrieve FASTA sequence data (which returns a simple Bio::Seq) or richer records, such as a GenBank record (which returns a Bio::Seq::RichSeq). In this case, it should probably read 'Bio::SeqI' to be more accurate (implements the Bio::SeqI interface). Beyond the addition of a few accessor methods they are essentially the same, in that they both have annotation, features, etc.

> my $seqobj = $seq->next_seq;

You're either not reading the demos or the relevant documentation correctly, or there is a spot in the docs that needs to be fixed (if the latter, please let us know). Bio::Seq does not implement a next_seq() method, but sequence *streams* (a la Bio::SeqIO) do. You are probably thinking of something like this:

my $streamobj = $gb->get_Stream_by_acc(@ids);

while (my $seqobj = $streamobj->next_seq) {
    # do stuff here
}

The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$streamobj->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work.
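
A minimal sketch of the stream-versus-sequence distinction, assuming a working BioPerl installation and network access to NCBI. The accessions are simply ones that appear elsewhere in this thread, and passing an array reference to get_Stream_by_acc follows the Bio::DB::GenBank documentation as commonly described; treat both as assumptions rather than a tested recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new;

# One accession: get_Seq_by_acc hands back a sequence object directly,
# so features can be read from it without any next_seq() call.
my $seq = $gb->get_Seq_by_acc('J00522');
for my $feat ($seq->get_SeqFeatures) {
    print $seq->display_id, "\t", $feat->primary_tag, "\n";
}

# Several accessions: get_Stream_by_acc hands back a stream, and
# next_seq() pulls one sequence object out of it per call.
my $stream = $gb->get_Stream_by_acc(['J00522', 'AF031249']);
while (my $seqobj = $stream->next_seq) {
    print $seqobj->display_id, "\t", $seqobj->length, "\n";
}
```

The feature loop belongs on whatever next_seq() or get_Seq_by_acc returns, never on the stream itself.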
If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > for my $feat_object ($seqobj->get_SeqFeatures) { > > if ($feat_object->primary_tag eq "CDS") { > > print $feat_object->spliced_seq->seq,"\n"; > > if ($feat_object->has_tag('gene')) { > > for my $val ($feat_object->get_tag_values('gene')){ > > print "gene: ",$val,"\n"; > > } > > } > > } > > } > >> From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the > methods there mention nothing about how to get the features or inter-convert > among the object types. Just a note, but make sure to read up-to-date documentation, particularly if you are using the latest code. Here is the pdoc for the latest release: http://doc.bioperl.org/releases/bioperl-1.6.1/Bio/Seq/RichSeqI.html This is definitely worth pointing out, and is a good example where we can improve our documentation; I've added some links to classes that would explain more. In the meantime, the best thing to do in this case is to point you to the online documentation (which I think I did already, but just in case): http://www.bioperl.org/wiki/HOWTO:Beginners http://www.bioperl.org/wiki/HOWTO:Feature-Annotation chris From cjfields at illinois.edu Tue Mar 23 04:53:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 22 Mar 2010 23:53:48 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <42E3E2EC-2226-44CE-995E-01B425B161F1@illinois.edu> On Mar 22, 2010, at 3:51 PM, Chris Larsen wrote: > ... > 3. > Chris F said: > >> To preface this, any reason you're not translating the alignment sequences using the above sequence's features as a reference? > > > A logical place to start. But-they are usually not given. 
In addition to the above reason, the amount of data for viral sequences is rarer since fewer grad students want to sequence things that mame you or make you hurl, if you screw up on the nucleic acid extraction. Also, the locations for protein processing sites can be variable, like > or < instead of a real location in the string. So, the GenBank file isnt really very good as a reference, 5% of the time. Last, if there are three child proteins from a CDS, and one is made by a host protease, one by a viral protease, and one by a start codon, what do you say is 'mature'? What should be in the 'feature' field? Its not standardized right now. Nobody has this nailed at NCBI or UniProt. > > Still, like Chris says, a script that asks first for the coordinates, and takes that as the first go round, is best. The GenBank coords when provided, are accurate most of the time. AFter that, you end up comparing everything and making your choice. Yes, in this case nothing will be a immediate, perfect solution. It will take some additional work. > 4. > Last thoughts: > > * We tried BL2Seq to align query to target one at a time, with good reference sequences. It works, for exactly what you ask for. But! Only in a few virus families. And, its 1200 lines long, doing error checking; as you say its just not easy. Pulling an HSP from a blast report leaves one with with a lot of end trimming and comparing to do, since the HSP ends in an identity, and well, sometimes viruses vary at the point of cleavage of proteins. Good luck with that task, it gave us fits. Its not really appropriate to look at the ends of the hsp and say they are right. It requires that extra code. Still, we may open that code to the public after April database release. It only works for well conserved viruses. (I know... Jumbo Shrimp). Might be nice to see what you've done, whenever that is ready. 
> * I know of no BioPerl module that can parse an MSA and take out the relevant alignments, so you dont have to assign a reference sequence from scratch, every time you do this. Is there one? If you mean pulling out sets of sequences from a larger alignment or slices of alignments, there should be methods within Bio::SimpleAlign to do this, yes. > *Sometimes the features on viruses are named differently: /mat_peptide, /sig_peptide; sometimes they are named different in /note or /product. There is no standard for much of this. It needs to be proposed. Maybe we can do that together. > > * If you want to use a synoptic MSA for all Hepatitis B viruses, and then pull the alignments out of that, I'd love to talk to you. The VBRC used precomputed MSAs for all their virus families and got forward a little bit. We are looking into that code. > > All ideas. Nothing set in stone. Dialog welcome. > > Good luck all. > > Chris > > > -- > > Christopher Larsen, Ph.D. > Sr. Scientist / Grants Manager > Vecna Technologies > 6404 Ivy Lane #500 > Greenbelt, MD 20770 > Phone: (240) 965-4525 > Fax: (240) 547-6133 > > clarsen at vecna.com Very nice summary of the problems in the field. thanks! chris From ross at cuhk.edu.hk Tue Mar 23 05:20:56 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 23 Mar 2010 13:20:56 +0800 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> Message-ID: <001501caca48$9db03f70$d910be50$@edu.hk> my $streamobj = $gb->get_Stream_by_acc(@ids); while (my $seqobj = $stream->next_seq) { # do stuff here } The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$stream->next_seq()' iterates through them one at a time. Unless you call a stream in some way, that code will not work. 
If you call the methods below directly on the *sequence* object ($seqobj, retrieved from get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > for my $feat_object ($seqobj->get_SeqFeatures) { > > if ($feat_object->primary_tag eq "CDS") { > > print $feat_object->spliced_seq->seq,"\n"; > > if ($feat_object->has_tag('gene')) { > > for my $val ($feat_object->get_tag_values('gene')){ > > print "gene: ",$val,"\n"; > > } > > } > > } > > } Chris, in fact I did have this code before, but then it goes back to the old problem that the spliced sequence is incorrect. Please try using the following codes with "DQ089804" as the argument. If you check the printed result with: http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSyst em2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum you'll discover, for example, the sequence of gene P, is derived from splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), rather than 2307-3215 and 1-1623. use Bio::SeqIO::genbank; use Bio::DB::GenBank; use Bio::SeqIO; my ($acc) = @ARGV; $gb = new Bio::DB::GenBank; $streamobj = $gb->get_Stream_by_acc($acc); my $seqobj = $streamobj->next_seq; for my $feat_object ($seqobj->get_SeqFeatures) { if ($feat_object->primary_tag eq "CDS") { print $feat_object->spliced_seq->seq,"\n"; if ($feat_object->has_tag('gene')) { for my $val ($feat_object->get_tag_values('gene')){ print "gene: ",$val,"\n"; } } } } exit; From e.osimo at gmail.com Tue Mar 23 09:42:25 2010 From: e.osimo at gmail.com (Emanuele Osimo) Date: Tue, 23 Mar 2010 10:42:25 +0100 Subject: [Bioperl-l] Xyplot and multiple lines plots Message-ID: <2ac05d0f1003230242o31779c30sffa42d8e99539b09@mail.gmail.com> Hello everyone, I would like to plot two data sets in Bio::Graphics using Xyplot, one superimposed on the other. I need to compare the differential expression of an Affy expression probeset in different subjects. 
I successfully managed to plot one at a time with: $panel->add_track( $feat, -graph_type=>'linepoints', -glyph =>'xyplot', -fgcolor=>'gray', -max_score => 1, -min_score => 0, ); But I cannot understand how to plot two lines independently in the same track. Thank you in advance, Emanuele From biopython at maubp.freeserve.co.uk Tue Mar 23 10:58:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Mar 2010 10:58:58 +0000 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: References: Message-ID: <320fb6e01003230358w11ae8e5fxef140652c5cc9f1b@mail.gmail.com> On Mon, Mar 22, 2010 at 8:51 PM, Chris Larsen wrote: > Ross, Chris F, > > I'd like to just comment on this since we are working in parallel on a > similar problem. See also the prior thread in archives for Peters work in > BioPython that I instigated: "Polyproteins, robo slippage, viral > mat_peptides" Minor typo - the old thread title was about ribo (ribosomal) slippage: http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031479.html http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031484.html etc Triggered in part by my discussion with Chris Larsen (off list) about the biological problem of getting the mature peptide sequences from GenBank files, Biopython 1.53 ended up with a new method for extracting the sequence region described by a (complex) location, e.g. from parsing in an EMBL/GenBank file. There were several threads about this, this is perhaps the best summary if anyone is interested: http://lists.open-bio.org/pipermail/biopython/2009-November/005813.html http://lists.open-bio.org/pipermail/biopython/2009-December/005889.html > This dialog below is just to clarify the science that will guide the > pseudocode and logic flow would be needed to be built out into a BioPerl > module. There are plenty of comments on the string mashing required, and its > a harrowing morass, but heres some other thoughts. 
Three line item comments > first, and then some open general ideas for moving this block of concepts > forward: Thanks for the update - it sounds like you've got a better understanding of the complexities now, any some of the reasons why representing things like mature peptides is tricky (the issue of different cleavage patterns in different hosts is interesting). Peter From cjfields at illinois.edu Tue Mar 23 12:46:37 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 23 Mar 2010 07:46:37 -0500 Subject: [Bioperl-l] automation of translation based on alignment In-Reply-To: <001501caca48$9db03f70$d910be50$@edu.hk> References: <001201caca28$a5e325b0$f1a97110$@edu.hk> <678B9B84-B309-4B31-AA37-38B73057C41A@illinois.edu> <001501caca48$9db03f70$d910be50$@edu.hk> Message-ID: <3A94734B-CD43-4674-8DB6-82EA1C6530E4@illinois.edu> On Mar 23, 2010, at 12:20 AM, Ross KK Leung wrote: > my $streamobj = $gb->get_Stream_by_acc(@ids); > > while (my $seqobj = $stream->next_seq) { > # do stuff here > } > > The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO > stream). '$stream->next_seq()' iterates through them one at a time. Unless > you call a stream in some way, that code will not work. If you call the > methods below directly on the *sequence* object ($seqobj, retrieved from > get_Seq_by_*), NOT the *stream* object (get_Stream_by_*), it should work. > >> for my $feat_object ($seqobj->get_SeqFeatures) { >> >> if ($feat_object->primary_tag eq "CDS") { >> >> print $feat_object->spliced_seq->seq,"\n"; >> >> if ($feat_object->has_tag('gene')) { >> >> for my $val ($feat_object->get_tag_values('gene')){ >> >> print "gene: ",$val,"\n"; >> >> } >> >> } >> >> } >> >> } > > Chris, in fact I did have this code before, but then it goes back to the old > problem that the spliced sequence is incorrect. Please try using the > following codes with "DQ089804" as the argument. 
If you check the printed > result with: > > http://www.ncbi.nlm.nih.gov/nuccore/DQ089804.1?ordinalpos=2&itool=EntrezSyst > em2.PEntrez.Sequence.Sequence_ResultsPanel.Sequence_RVDocSum > > you'll discover, for example, the sequence of gene P, is derived from > splicing 1-1623 (starts with CTC...) and 2307-3215 (starts with ATG...), > rather than 2307-3215 and 1-1623. Okay, as I mentioned before, then that would be a bug. The best way to handle this is to file it in Bugzilla: http://bugzilla.open-bio.org/ I can likely look at it today, whether it's filed or not, just need to make some time. Please file the bug report, though, just in case I can't get to it right away. BTW, we had some discussion about circular genome support recently at the GMOD conference, and some code was added that was supposed to address the issues raised. I'm guessing we'll need to add more tests just to be sure. chris ... From Jean-Marc.Frigerio at pierroton.inra.fr Tue Mar 23 16:29:11 2010 From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA) Date: Tue, 23 Mar 2010 17:29:11 +0100 Subject: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista In-Reply-To: References: Message-ID: <4BA8EC57.7070802@pierroton.inra.fr> > I want to create a Gui that will use current bioperl modules(along with some > I am writing). It will be on a windows machine that runs XP and maybe a > laptop with Vista.(this is a project i am working on in Graduate school for > a professor). It will be id'ing promoter types in eukaryote organisms and > also do multiple alignments. > > What recommendations do yo suggest to use t develop this? A java > application? If so how hard is it to get Java to use perl and bioperl > modules? Another language? Is there a tool to directly develop a GUI for > bioperl modules that does no use another language? > > I will need to tag certain sequences with user specified colors and such. 
>
>
> Thanks for the help

Hi,
Have also a look to Gtk-perl and perl-qt

Best

From Leighton.Pritchard at scri.ac.uk Tue Mar 23 16:35:42 2010
From: Leighton.Pritchard at scri.ac.uk (Leighton Pritchard)
Date: Tue, 23 Mar 2010 16:35:42 -0000
Subject: [Bioperl-l] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
Message-ID:

Hi,

I can't seem to find any discussion of this on the mailing list archives (if anyone has a link, I'll happily follow it), so I was wondering what the rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live mapping CDS features to gene_component_region.

For example, if I use the script on the E.coli sequence/annotation NC_000913.gbk, the gene:

     gene            190..255
                     /gene="thrL"
                     /locus_tag="b0001"
                     /note="synonyms: ECK0001, JW4367"
                     /db_xref="EcoGene:EG11277"
                     /db_xref="ECOCYC:EG11277"
                     /db_xref="GeneID:944742"
     CDS             190..255
                     /gene="thrL"
                     /locus_tag="b0001"
                     /function="leader; Amino acid biosynthesis: Threonine"
                     /function="1.5.1.8 metabolism; building block
                     biosynthesis; amino acids; threonine"
                     /note="GO_process: threonine biosynthetic process [goid
                     0009088]"
                     /codon_start=1
                     /transl_table=11
                     /product="thr operon leader peptide"
                     /protein_id="NP_414542.1"
                     /db_xref="ASAP:ABE-0000006"
                     /db_xref="UniProtKB/Swiss-Prot:P0AD86"
                     /db_xref="GI:16127995"
                     /db_xref="EcoGene:EG11277"
                     /db_xref="ECOCYC:EG11277"
                     /db_xref="GeneID:944742"
                     /translation="MKRISTTITTTITITTGNGAG"

Is mapped to

NC_000913       GenBank region  190     255     .       +       .
ID=GenBank:region:NC_000913:190:255
NC_000913       GenBank exon    190     255     .       +       .
ID=GenBank:exon:NC_000913:190:255
NC_000913       GenBank gene    190     255     .       +       .
ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms:
ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
NC_000913       GenBank gene_component_region   190     255     .       +       .
Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995
,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine
biosynthetic process [goid
0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino
acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block
biosynthesis%3B amino acids%3B
threonine;gene=thrL;locus_tag=b0001;product=thr operon leader
peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG
NGAG

I understand the region-exon-gene part of the model, but not the gene_component_region, which appears to be a catch-all. I would have assumed that the CDS is better mapped to a polypeptide, as described in the CHADO documentation:

http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model

There is no difference in script output whether --CDS or --noCDS is used.

Cheers,

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way.
Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________

From djibrilo at yahoo.fr Tue Mar 23 17:38:25 2010
From: djibrilo at yahoo.fr (djibrilo)
Date: Tue, 23 Mar 2010 10:38:25 -0700 (PDT)
Subject: [Bioperl-l] Re : G.U.I for bioperl on XP and possibly Vista
In-Reply-To: <4BA8EC57.7070802@pierroton.inra.fr>
References: <4BA8EC57.7070802@pierroton.inra.fr>
Message-ID: <344176.4737.qm@web23001.mail.ird.yahoo.com>

Hi,

Also have a look at perl/Tk.

Best Regards

________________________________
From: Jean-Marc Frigerio INRA
To: bioperl-l at lists.open-bio.org
Sent: Tue, 23 March 2010, 17:29:11
Subject: Re: [Bioperl-l] G.U.I for bioperl on XP and possibly Vista

> I want to create a Gui that will use current bioperl modules (along with some
> I am writing). It will be on a windows machine that runs XP and maybe a
> laptop with Vista. (This is a project I am working on in Graduate school for
> a professor.) It will be id'ing promoter types in eukaryote organisms and
> also do multiple alignments.
>
> What recommendations do you suggest to use to develop this? A java
> application? If so how hard is it to get Java to use perl and bioperl
> modules? Another language? Is there a tool to directly develop a GUI for
> bioperl modules that does not use another language?
>
> I will need to tag certain sequences with user specified colors and such.
>
>
> Thanks for the help

Hi,
Have also a look to Gtk-perl and perl-qt

Best

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From scott at scottcain.net Tue Mar 23 18:18:46 2010
From: scott at scottcain.net (Scott Cain)
Date: Tue, 23 Mar 2010 14:18:46 -0400
Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
In-Reply-To:
References:
Message-ID: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>

Hi Leighton,

I wonder if this is a change stemming from Nathan's work on this script. Nathan?

Scott

On Tue, Mar 23, 2010 at 12:35 PM, Leighton Pritchard wrote:
> Hi,
>
> I can't seem to find any discussion of this on the mailing list archives (if
> anyone has a link, I'll happily follow it), so I was wondering what the
> rationale was for the bp_genbank2gff3.pl script as modified in bioperl-live
> mapping CDS features to gene_component_region.
>
> For example, if I use the script on the E.coli sequence/annotation
> NC_000913.gbk, the gene:
>
>      gene            190..255
>                      /gene="thrL"
>                      /locus_tag="b0001"
>                      /note="synonyms: ECK0001, JW4367"
>                      /db_xref="EcoGene:EG11277"
>                      /db_xref="ECOCYC:EG11277"
>                      /db_xref="GeneID:944742"
>      CDS             190..255
>                      /gene="thrL"
>                      /locus_tag="b0001"
>                      /function="leader; Amino acid biosynthesis: Threonine"
>                      /function="1.5.1.8 metabolism; building block
>                      biosynthesis; amino acids; threonine"
>                      /note="GO_process: threonine biosynthetic process [goid
>                      0009088]"
>                      /codon_start=1
>                      /transl_table=11
>                      /product="thr operon leader peptide"
>                      /protein_id="NP_414542.1"
>                      /db_xref="ASAP:ABE-0000006"
>                      /db_xref="UniProtKB/Swiss-Prot:P0AD86"
>                      /db_xref="GI:16127995"
>                      /db_xref="EcoGene:EG11277"
>                      /db_xref="ECOCYC:EG11277"
>                      /db_xref="GeneID:944742"
>                      /translation="MKRISTTITTTITITTGNGAG"
>
> Is mapped to
>
> NC_000913       GenBank region  190     255     .       +       .
> ID=GenBank:region:NC_000913:190:255
> NC_000913       GenBank exon    190     255     .       +       .
> ID=GenBank:exon:NC_000913:190:255
> NC_000913       GenBank gene    190     255     .       +       .
> ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms:
> ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
> NC_000913       GenBank gene_component_region   190     255     .       +
> .
> Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995
> ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine
> biosynthetic process [goid
> 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino
> acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block
> biosynthesis%3B amino acids%3B
> threonine;gene=thrL;locus_tag=b0001;product=thr operon leader
> peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG
> NGAG
>
> I understand the region-exon-gene part of the model, but not the
> gene_component_region, which appears to be a catch-all. I would have
> assumed that the CDS is better mapped to a polypeptide, as described in the
> CHADO documentation:
>
> http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model
>
> There is no difference in script output whether --CDS or --noCDS is used.
>
> Cheers,
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
>
>
> ______________________________________________________
> SCRI, Invergowrie, Dundee, DD2 5DA.
> The Scottish Crop Research Institute is a charitable company limited by guarantee.
> Registered in Scotland No: SC 29367.
> Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.
>
>
> DISCLAIMER:
>
> This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that
> addressee.
> If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
>
> Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
> ______________________________________________________
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
-- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From maj at fortinbras.us Tue Mar 23 18:15:38 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 23 Mar 2010 14:15:38 -0400 Subject: [Bioperl-l] BlastPlus Masker In-Reply-To: <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com> References: <464282111003210817g109086f1v1c5a8ccef2180e09@mail.gmail.com> <464282111003230942r231ca93kf56a2def9afa9651@mail.gmail.com> Message-ID: Specifying 'dustmasker' for a nucleotide database is roughly the same as "filter : low complexity regions" and "mask : lookup table only", I believe. (There is also a facility for creating masks based on lowercase residues in a mask data fasta file; the blast+ utility is 'convert2blastmask'. You can run this with the SABlastPlus factory. I'm not very familiar with it, but you should be able to take the output file from this utility and feed it in to a new factory as the '-mask_data' to get what you want. (If anyone has done this, a brief step-by-step would be appreciated.)) cheers MAJ ----- Original Message ----- From: Nils M?ller To: Mark A. Jensen Sent: Tuesday, March 23, 2010 12:42 PM Subject: Re: [Bioperl-l] BlastPlus Masker Many thanks. Is it the same as shown on the NCBI BLAST page (Filtering and Masking: "filter: Low complexity regions" and "mask: Mask for lookup table only" or "Mask lower case letters")? 2010/3/23 Mark A. 
Jensen Hi Nils, You don't have to specify a mask_data file; the factory should make it for you; try simply $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -create => 1); -mask_data is there so that pre-made masks can be applied separately, or so you can name the file that is produced and preserve it; this is an "advanced feature", I suppose-- MAJ ----- Original Message ----- From: "Nils M?ller" To: Sent: Sunday, March 21, 2010 11:17 AM Subject: [Bioperl-l] BlastPlus Masker Dear all, I am confused about handling maskers in BlastPlus: I have FASTA sequences and want to run BLAST with a low-complexity masker like dustmasker: $fac = Bio::Tools::Run::StandAloneBlastPlus->new( -db_name => 'my_masked_db', -db_data => 'myseqs.fas', -masker => 'dustmasker', -mask_data => 'maskseqs.fas', -create => 1); Is myseqs.fas the same as maskseqs.fas? I don't want to create a mask file; I only want to run BLAST with a masked file. _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From lpritc at scri.ac.uk Wed Mar 24 12:05:08 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 24 Mar 2010 12:05:08 +0000 Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com> Message-ID: Hi, I'm surprised that this issue hasn't come up already, as the change to the gene model is quite significant. For comparison, this is what the old bp_genbank2gff3.pl script would produce with --CDS: NC_000913 GenBank gene 190 255 . + . ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank mRNA 190 255 . + . ID=thrL.t01;Parent=thrL NC_000913 GenBank CDS 190 255 . + . 
ID=thrL.p01;Parent=thrL.t01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0A D86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process : threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21 NC_000913 GenBank exon 190 255 . + . Parent=thrL.t01 and with --noCDS: NC_000913 GenBank gene 190 255 . + . ID=thrL;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank mRNA 190 255 . + . ID=thrL.t01;Parent=thrL NC_000913 GenBank polypeptide 190 255 . + . ID=thrL.p01;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995, EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Derives_from=thrL.t01;Note=GO_p rocess: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=length.21 NC_000913 GenBank exon 190 255 . + . Parent=thrL.t01 The new script produces this identical output with both --CDS and --noCDS: NC_000913 GenBank region 190 255 . + . ID=GenBank:region:NC_000913:190:255 NC_000913 GenBank exon 190 255 . + . ID=GenBank:exon:NC_000913:190:255 NC_000913 GenBank gene 190 255 . + . ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001 NC_000913 GenBank gene_component_region 190 255 . + . 
Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995 ,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTG NGAG So, although the new script improves the parent-child relationships by identifying parents on the locus_tag field (guaranteed to be unique), rather than gene name (not guaranteed to be unique), the GFF3 gene model has apparently changed from canonical: gene <- mRNA <- {polypeptide/CDS, exon} to this: region ; exon ; gene <- gene_component_region So I guess I don't understand the region-exon-gene part of the new model, after all. This new model doesn't appear to be Sequence Ontology-compatible any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon is no longer considered part_of the transcript. In fact, there's not a transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get SO-compliant GFF3 (http://www.sequenceontology.org/resources/faq.html#convert), this might be an issue requiring a prompt fix or reversion. For now, due to the downstream problems this model causes with GBROWSE and ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the script to use the locus_tag field rather than the gene field for the feature ID. Cheers, L. On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote: > Hi Leighton, > > I wonder if this is a change stemming from Nathan's work on this > script. Nathan? 
> > Scott -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From cjfields at illinois.edu Wed Mar 24 13:06:01 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 08:06:01 -0500 Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? 
In-Reply-To: References: Message-ID: <3A556027-C8DB-4683-8376-A42AC8796156@illinois.edu> On Mar 24, 2010, at 7:05 AM, Leighton Pritchard wrote: > Hi, > > I'm surprised that this issue hasn't come up already, as the change to the > gene model is quite significant. For comparison, this is what the old > bp_genbank2gff3.pl script would produce with --CDS: > ... > So, although the new script improves the parent-child relationships by > identifying parents on the locus_tag field (guaranteed to be unique), rather > than gene name (not guaranteed to be unique), the GFF3 gene model has > apparently changed from canonical: > > gene <- mRNA <- {polypeptide/CDS, exon} > > to this: > > region ; exon ; gene <- gene_component_region > > So I guess I don't understand the region-exon-gene part of the new model, > after all. This new model doesn't appear to be Sequence Ontology-compatible > any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon > is no longer considered part_of the transcript. In fact, there's not a > transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get > SO-compliant GFF3 > (http://www.sequenceontology.org/resources/faq.html#convert), this might be > an issue requiring a prompt fix or reversion. I agree. I think this commit needs more code review to understand the reasoning behind it, though it will be a little trickier than a simple reversion (I think there have been additional unrelated commits since then). Nathan, was this the intent, or is this a bug? I would agree with Leighton that it's the latter. chris > For now, due to the downstream problems this model causes with GBROWSE and > ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the > script to use the locus_tag field rather than the gene field for the feature > ID. > > Cheers, > > L. 
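The difference between the two gene models discussed in this thread can be checked mechanically. What follows is a minimal sketch in plain Perl (no BioPerl), assuming tab-separated GFF3 lines; parse_attributes and check_gene_model are hypothetical helper names, not part of bp_genbank2gff3.pl. It only checks Parent links for mRNA, CDS and exon features against the canonical gene <- mRNA <- {CDS, exon} layout, and it ignores Derives_from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn GFF3 column 9 ("key=value;key=value") into a hashref.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $value;
    }
    return \%attr;
}

# Hypothetical helper: return one message per mRNA/CDS/exon feature whose
# Parent is missing or is not of the expected type.
sub check_gene_model {
    my (@lines) = @_;
    my (%type_of, @features);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/;            # skip directives and comments
        my @col = split /\t/, $line;
        next unless @col == 9;            # skip anything that is not a feature line
        my $attr = parse_attributes($col[8]);
        $type_of{ $attr->{ID} } = $col[2] if defined $attr->{ID};
        push @features, { type => $col[2], attr => $attr };
    }

    # Expected parent type for each child type in the canonical model.
    my %expected_parent = (mRNA => 'gene', CDS => 'mRNA', exon => 'mRNA');

    my @problems;
    for my $f (@features) {
        my $want = $expected_parent{ $f->{type} } or next;
        my $parents = $f->{attr}{Parent};
        if (!defined $parents) {
            push @problems, "$f->{type} has no Parent (expected $want)";
            next;
        }
        for my $parent (split /,/, $parents) {
            my $got = $type_of{$parent} // 'unknown';
            push @problems, "$f->{type} Parent=$parent is a $got, expected $want"
                unless $got eq $want;
        }
    }
    return @problems;
}

# Cut-down lines in the style of the new script's output quoted in this thread.
my @new_style = (
    join("\t", qw(NC_000913 GenBank exon 190 255 . + .), 'ID=GenBank:exon:NC_000913:190:255'),
    join("\t", qw(NC_000913 GenBank gene 190 255 . + .), 'ID=b0001;locus_tag=b0001'),
);
print "$_\n" for check_gene_model(@new_style);
```

Given lines in the old gene/mRNA/CDS/exon layout the check should report nothing; given the new layout it flags the exon, which carries only an ID and no Parent.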
From pmiguel at purdue.edu Wed Mar 24 13:49:55 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Wed, 24 Mar 2010 09:49:55 -0400 Subject: [Bioperl-l] How to set "complexity" param using EUtilities Message-ID: <4BAA1883.3010203@purdue.edu> Just a little FYI that might help someone using GenBank efetch (here with bioperl EUtilities) and, contrary to expectation, retrieving a bunch of accessions (or GIs) when a single accession is what is wanted. The trick is to change the "complexity" parameter from its apparent default of "0" to "1". Actually, this parameter might be worth adding to the HOWTO because it causes the EUtilities efetch to perform similarly to a normal Entrez search. Which, to me, would be the expected behavior. Details below. Some accessions/GIs appear to be embedded in bundles of related sequences. Here is an example: gi|158819346|gb|EU011641.1| If I search Entrez Nucleotide http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar with either "158819346" (the GI) or "EU011641.1", I get a single record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal RNA gene, partial sequence". This is what I want. If I use the following code derived from the Eutils HOWTO: use Bio::DB::EUtilities; use Bio::SeqIO; my @ids; my $id ='gb|EU011641.1|'; push @ids ,$id; my $factory = Bio::DB::EUtilities->new( -eutil => 'efetch', -db => 'nucleotide', -rettype => 'genbank', -id => \@ids); my $file = "test.gb"; $factory->get_Response(-file => $file); I get a bundle of accessions: EU011584-EU011663. Same result using the GI number instead. From reading: http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam it looks like I would get what I want were I to set the efetch "complexity" parameter to "1". But how do I set that parameter? Below is how I did it. Not the most efficient path, but did not take that long to traverse... The HowTo does not mention it. 
I usually look to the Deobfuscator: http://bioperl.org/cgi-bin/deob_interface.cgi to help me when I want some documentation for a method. But this is a parameter, not a class. What class sets this parameter? Not sure. So I googled: complexity eutil site:bioperl.org The top ranked hit is actually to the deprecated 1.5.2 version of EUtilities. But the 2nd hit is to the (auto-generated?) email posted to the bioperl-guts email list by Chris Fields upon his commit of the new EUtilities overhaul: http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html From here it looks like the obvious way to set the parameter would be possible. And indeed: use Bio::DB::EUtilities; use Bio::SeqIO; my @ids; my $id ='gb|EU011641.1|'; push @ids ,$id; my $factory = Bio::DB::EUtilities->new( -eutil => 'efetch', -db => 'nucleotide', -rettype => 'genbank', -complexity => 1, -id => \@ids); my $file = "test.gb"; $factory->get_Response(-file => $file); works! It is also a good idea to add the -email parameter so that GenBank might chastise me via email, rather than banning my IP, if I try to send more than 100 requests in a series outside of the acceptable 9PM-5AM Eastern Time hours. Phillip From peter at maubp.freeserve.co.uk Wed Mar 24 14:08:26 2010 From: peter at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Mar 2010 14:08:26 +0000 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: References: Message-ID: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> Hi, This is probably of interest to all the Bio* projects offering access to the NCBI Entrez utilities. See forwarded message below. I *think* the new guidelines basically say that the email & tool parameters are optional BUT if your IP address ever gets banned for excessive use you then have to register an email & tool combination. Regarding the email address, the NCBI say to use the email of the developer (not the end user). 
However, they do not distinguish between the developers of a library (like us), and the developers of an application or script using a library (who may also be the end user). Currently we (Biopython) and I think BioPerl ask developers using our libraries to populate the email address themselves. I *think* this is still the right action. Peter ---------- Forwarded message ---------- From: Date: Wed, Mar 24, 2010 at 1:53 PM Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy To: NLM/NCBI List utilities-announce New E-utility documentation now on the NCBI Bookshelf The Entrez Programming Utilities (E-Utilities) Help documentation has been added to the NCBI Bookshelf, and so is now fully integrated with the Entrez search and retrieval system as a part of the Bookshelf database. This help document has been divided into chapters for better organization and includes several new sample Perl scripts. At present this book covers the standard URL interface for the E-utilities; material about the SOAP interface will be added soon and is still available at the same URL: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html. Revised E-utility usage policy In December, 2009 NCBI announced a change to the usage policy for the E-utilities that would require all requests to contain non-null values for both the &email and &tool parameters. After several consultations with our users and developers, we have decided to revise this policy change, and the revised policy is described in detail at the following link: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen Please let us know if you have any questions or concerns about this policy change. Thank you, The E-Utilities Team NIH/NLM/NCBI eutilities at ncbi.nlm.nih.gov. 
_______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From joseguillin at hotmail.com Tue Mar 23 17:30:44 2010 From: joseguillin at hotmail.com (Jose .) Date: Tue, 23 Mar 2010 17:30:44 +0000 Subject: [Bioperl-l] Phylo/Phylip/Consense Message-ID: Hello, I'm trying to use Phylo/Phylip/Consense, but I get the following message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: SeqBoot did not create files correctly (/var/folders/+s/+srMEKriEiWM+Q7Qleiti++++TI/-Tmp-/v3no1dYNqE/outfile) STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357 STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::_run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:389 STACK: Bio::Tools::Run::Phylo::Phylip::SeqBoot::run /usr/local/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/Phylo/Phylip/SeqBoot.pm:339 STACK: INDELVOLUTION_5.1consensus.pl:492 ----------------------------------------------------------- My code is a modification of the code I found at http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Phylo/Phylip/Consense.pm use Bio::AlignIO; use Bio::Tools::Run::Phylo::Phylip::Consense; use Bio::Tools::Run::Phylo::Phylip::SeqBoot; use Bio::Tools::Run::Phylo::Phylip::ProtDist; use Bio::Tools::Run::Phylo::Phylip::Neighbor; use Bio::Tools::Run::Phylo::Phylip::DrawTree; my $aio = Bio::AlignIO->new(-file =>'yeah.clustalw', -format=> 'clustalw'); my $aln = $aio->next_aln; my ($aln_safe, $ref_name)=$aln->set_displayname_safe(); #next use seqboot to generate multiple alignments my @params = ('datatype'=>'SEQUENCE','replicates'=>10); my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params); my $aln_ref= 
$seqboot_factory->run($aln); #my $aln_ref= $seqboot_factory->run($aln_safe); #next build distance matrices and construct trees my $pd_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist->new(); my $ne_factory = Bio::Tools::Run::Phylo::Phylip::Neighbor->new(); my @tree; foreach my $a (@{$aln_ref}){ my $mat = $pd_factory->create_distance_matrix($a); push @tree, $ne_factory->create_tree($mat); } #now use consense to get a final tree my $con_factory = Bio::Tools::Run::Phylo::Phylip::Consense->new(); #you may set outgroup either by the number representing the order in #which species are entered or by the name of the species $con_factory->outgroup(1); my $tree = $con_factory->run(\@tree); # Restore original sequence names, after ALL phylip runs: my @nodes = $tree->get_nodes(); foreach my $nd (@nodes){ $nd->id($ref_name->{$nd->id_output}) if $nd->is_Leaf; } #now draw the tree my $draw_factory = Bio::Tools::Run::Phylo::Phylip::DrawTree->new(); my $image_filename = $draw_factory->draw_tree($tree); And my yeah.clustalw file is OK: CLUSTAL W(1.81) multiple sequence alignment A/1-474 G---CGGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C B/1-452 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C C/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGAGTCC-----TATATC-CTA----C D/1-476 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGGA-------------TC-CTA----C E/1-439 G---CCGTGGGAGA------TGAGGAACCTGAGGTAGTCC-----TATATCTCTAGCGGC F/1-434 G---CCGTGGGAGA------TGAGGAACCCGAGG---TCC-----TATATCTCTAGCGGC G/1-462 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTA---------------TCTAGCGGC H/1-466 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTAGTCC--------ATCTCTAGCGGC I/1-462 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C J/1-447 GCTGCCGTGGGAGAGCAACATGAGGAACCGGAGGTAGTCCGGTATTATATCTCTA----C K/1-448 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCGA----C L/1-431 G---CCGTGGGAGAGCA-CATGAGGAACCCGAGGTAGTCCGGT---ATATCTCTA----C M/1-432 G---CCGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCGGTATTATATCTCTA----C N/1-422 
G---CC------GAGCAACATGAGGAAC---AGGTTGTC---TATTATATCTCTA----C O/1-441 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C P/1-446 G---CAGTGGGAGAGCAACATGAGGAACCCGAGGTTGTCCG--------TCTCTA----C * * ** ******** *** * * * A/1-474 GCATCGCGGCCCTTGTC-GCTCCCACCCA--CCATC---GACGGC-ACA--TTTGCTTGT B/1-452 GCAT----------GTC-GCTC---------CCATCGCTGACGGC-ACATCTTTG---GT C/1-466 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTGT D/1-476 GCATCGCGGCCCTTGTC-GCTCCCACCCATCCCATCGCTGACGGC-ACA-----GCTTG- E/1-439 GCA-CGCGGCCCT--TC-GCTT---CCCATCCCATCGCTGACGGC-ACATCT----TTGT F/1-434 GCA-CGCGGCCCT--TCCGCTT---CCCATCCCATCGCTTACGGC-ACATCTTTGCTTGT G/1-462 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGTC-ACATCTTTG-TTGT H/1-466 GCATCGCGGCCCT--TC-GCTC---CCCATCCCATCGCTGACGGC-ACATCTTTGCTTGT I/1-462 GCAT-CCGGCCCTTGTC-GCTCCCA------CCATCGCTGACGGC-ACAT--TTGCTTGT J/1-447 GC------GCCCTTGTC-GCTCCCA---------TCGCTGACGGC-ACATCTTTGCTTGT K/1-448 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT L/1-431 GCATCC----CCTTGTC-GCTCCCA------CCATCGCTGACGGC----TCTTTGCTTGT M/1-432 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGAC-GC-ACATC-TTGCTTGT N/1-422 GCATC---GCCCTTGTC-GCTCCCA------CCATCGCTGACAGCAACATCTTTGCTTGT O/1-441 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT P/1-446 GCATC---GCCCTTGTC-GCTCCCA------CCATCTCTGACGGC-ACATCTTTGCTTGT ** ** *** ** ** * * A/1-474 ACGAGATTGCTTTCACACTA-TCTATTGTTCGGGTACCGAGAGTCGGCGGTGAATACATC B/1-452 ACGAGATTGCGTTCACACTA-TCCATTGTTCGGGTACCGAGAGTC-GCGGTGAATACATC C/1-466 ACGTG--TGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATG D/1-476 -CGTGATTGCGTTCCCACTAATCCATTGTTCGGGTAACGAGAGTCGGCGGTGAATACATC E/1-439 ACGTGATTGCG----CA--AATCCATTGT---GGTACCGAGAGTCGGCGGTGAACT---C F/1-434 ACGTGATTGCG----CA--AATCCATTGTTCGGGTACCGAGAGTCG-----GAACT---C G/1-462 ACGT----GCGTTCCCA--AATCCATTGTTCGGGTACCGAGAGTCGGCGGTGAACT---C H/1-466 ACGT-------TTCCCA--AATCCAT---TCGGGTACCGAGAGTCGGCGGTGAACT---C I/1-462 ACGTGATTGC--TCCCACCAATCCAT-GTTCGGGTACCGAGAGTCGGCGGTGAACTCATC J/1-447 
ACGTGATTGC--TCCCACTAATCCAT-GTTCGGGTACCGA-----------GAACTCATC K/1-448 ACGTGATTGC--TCCCACTAATCCACTG--------CCGAGAGTCGGCGGTG---CCATC L/1-431 ACGTGATTGC--TC------ATC--TTGTTCGGGTACCGA-----GGCGGTGAACTCATC M/1-432 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC N/1-422 ACGTGATTGC--TCCCACTAATCC----TTCGGGTACCAAGAGTCGGCGGTGAACTCATC O/1-441 ACGTGATTGC--TCCCACTAATCCAT--TTCGGGTACCGAGAGTCGGCGGTGAACTCATC P/1-446 ACGTGATTGC--TCCCACTAATCCATTG--CGGGTACCGAGAGTCGGCGGTGAACTCATC ** ** * * * A/1-474 TCCGGAG--AAGTGTGCTAACCACAGTG--GAACGTATAATGCTGATCCCGCTTGTTT-- B/1-452 TCCGGAG--AA--GTGCTAACCACAGTG--GAACGTATAATGCTGAT-CCGCTT-TTT-- C/1-466 TCCGGAG--AAGTGTGCTAACCACAGTG--GAAAGTATAATGCT-----------TTT-- D/1-476 TCCGGAG--AAGTGT---AACCACAGTG--GAAAGTATAATGCTGATCCCGCTTGTTT-- E/1-439 TCCGG-----AGTGTGG-AACCACAGTG--GAACGTATAATGC--ATCTCGCGTGTTT-- F/1-434 TCCGG-----AGTGTGGTAACCACAGTG--GAACGTATAATGC--ATCCCGCGTGTTT-- G/1-462 TCCGGAG--AAGTGTGGTAACCACAGTG--GAACGTATAATGC--ATC--GCGTGTTT-- H/1-466 TCCGGAG--AAGTGTGGTAACCACAGT----AACGTAT-ATGC--ATCCCGCGTGTTT-- I/1-462 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC J/1-447 TCGGGAG--AAGTGTGCTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC K/1-448 TCCGGAG--AAGTGTGGTAACCACAGTGCCGAAC--ATAATGC--ATCCCGCGTGTTTGC L/1-431 TCCGGAG--AAGTGTG----CCACAGTGCCGAAC--ATAATGC--ATC--GCGTGTTTGC M/1-432 TCCGGAGGAAAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC N/1-422 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGC----TTCC O/1-441 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC P/1-446 TCCGGAG--AAGTGTGGTAACCACAGTG--GAAC---------------CGCGTGTTTCC ** ** * ** ******* ** ** A/1-474 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT B/1-452 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----AGGTAC-CCGAGGACAACTAACAGT C/1-466 --CTGTACCTAAAGTTCACCGGGTAGAGCCTCGTCATGTAC-CCG-----AACTAACAGT D/1-476 --CTGTACCTAAAGTTCACCGGGTAGAGCC-----ATGTAC-CCGAGGACAACTAACAGT E/1-439 --CCGTACCTAAAGTT------GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT F/1-434 
--CCGTACCTAAAGTT-----GGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT G/1-462 --CCGTACCTAAAGTTCTCC--GTAGGGCC-----ATGTACACCGAGGACAACTAACAGT H/1-466 --CCGTACCTAAAGTTCACCGGGTAGGGCC-----ATGTACACCGAGGACAACTAACAGT I/1-462 GATCGTACCTAAAGTTCACC--------CC-----A-------CGAG----ACTAACAG- J/1-447 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAG- K/1-448 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT L/1-431 GATCGTACCTAAAGTTCACCG-GTAGCGCC-----A-------CGAG----ACTAACAGT M/1-432 GACCGTACCT-----T-ACCG-GTAGCGCC-----ATGTACACCGAGC---ACTA----T N/1-422 GACCGTACCT-----TCACCG-GTAGTGCC-----ATGTACACCGAGC---ACTAACAGT O/1-441 GACCGTACCT-----TCACCG-GTAGCGCC-----ATGTACACCGAGC---ACTAACAGT P/1-446 GACCGTACCT-----TCACCG-GTAGCGCC-----ATG---ACCGAGC---ACTAACAGT ****** * ** * ** **** A/1-474 GATCCTCA----TCTAAGCGCCGCTTCAGGAC----ATTGCCACGTCTACATCG------ B/1-452 GATCCTCA----TTTAAGCGCCGCTTCAGGCC----ATTGCCACGTCTACATCG------ C/1-466 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCAT D/1-476 GATCCTCA----TTTAAGCGCCGCTTCAGGAC----ATTACCACGTCTACATCGTTTCCT E/1-439 GATCCTCA----TTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT F/1-434 GATCCTCA----TTTAAGCGCCGC---AGGACTTTTATTGCCACGTCTACATCGTTTCAT G/1-462 GATCCTCACAATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT H/1-466 GATCCTC-CCATTTTAAGCGCCGC---AGGAC----ATTGCCACGTCTACATCGTTTCAT I/1-462 ---CCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATC---TCAT J/1-447 ---CCTCA----T-TAAGCGCCGCTGCAGGAC----ATTGCCACGTCTACATCGTTTCAT K/1-448 GATCCTCA----TTTAAGCGCCGCTGCAGG-------TTGCCACGTCTACATCGTTTCAT L/1-431 GATCCTCA----TTTAAGCGCCGCTGC----------TTGCCACGTCTACATCGTTTCAT M/1-432 GATC--CA----TTTAAGCGCCGCTGCAGG--------TGCCACGTCTACATCGTTTCAT N/1-422 GATC--CA----TTTAAGCGCCGCTGCAGGAA----ATTGCCACGTCTACATCGTTTCAT O/1-441 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTA---- P/1-446 GATCCTCA----TTTAAGCGCCGCTGCAGGAC----ATTGCC--GTCTACATCGTTTCA- * * * ********** * ** ********* A/1-474 -CATCTACTCTT--AGGCAGCAACAATTTGTCTCGTTCGACGTACAG--CGAAC--ATGT B/1-452 
-CATCTACTCTT--AGGCAGCAACAATT-GTCTCGTTCGATGTACAG--CGAAC--ATGT C/1-466 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- D/1-476 TCATCTACTTTT--AGCCAGCAACAATTTGTCTCGTAGGATGTACAG--CGAACATA--- E/1-439 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAACATATGT F/1-434 TCATCTACTTTT--AGGCAGCAACA---TGTATCGTACGATGTACAG--CGAA------T G/1-462 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT H/1-466 TCATCTACTTTT--AGGC-GCAACAATCTGTATCG-ACGATGTAC-G--CGAACATATGT I/1-462 TCACCTACTTTT--AGGGAGCAACAATCTGTATCC---G--GTACAGACCGAACATAGGA J/1-447 TC----AC-TTT--AGGGAGCAACAATCTGTATCC---G--GTAC---CCGAACATAGGT K/1-448 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT L/1-431 TCACCTACTTTT--AGGCAGCAACAATCT--ATCC---G--GTAC-GACCGAACATAGGT M/1-432 TCATTTACT-----AGGCAGCAACAATCTGTATC--------TATAGACCGAGCATATGT N/1-422 TCATCTACT-----AGGCAGCAACAATCTGTATCC---G--GTATAGACCAAGCATATGT O/1-441 ------ACTTTT--AGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT P/1-446 ------ACTTTTTGAGGCAGCAAC--TCTGTATCC---G--GTATAGACCGAACATATGT ** ** ***** ** ** * * A/1-474 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- B/1-452 GGGGCGTAAGACCAAAGTT--TATCGTTGGCCTTATTCGACCCAA-CAATTCGCGGATA- C/1-466 TGGGCGTAAGACCAAAGTTGAT--CGTTGG---TATTCGACCCAATCAAGTCGCG----- D/1-476 TGGGCGTAAGACCAAAGTTGAT--CGTGGGCCTTATTCGACCCAATCAATTCGCG---A- E/1-439 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- F/1-434 T----GTAAGACCAAAGTT--TATCGTTGG---TATTTGACCCAGGCAATTCGCGGATA- G/1-462 T--GCGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- H/1-466 T--GAGTAAGACCAAAGTT--TATCGTTGGCCTTATTTGACC----CAATTCGCGGGTA- I/1-462 TGTGCTTAAGACCAAAGTT--TATCGTT------ATATGACCCAAGCAATTCGCGGATA- J/1-447 -GTGCTTAAGACCAAAGTT--TATCGTT------ACATGACCCAAGCAATTCGCGGATA- K/1-448 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGCGGATAC L/1-431 TGGGCGCAAGACCAAAGTT--TATCGTT------ATTTGACCCAAGCAATTCGC-GATA- M/1-432 TGGGCGTAAGACCAAAGTT--TATCGTTGGCTTT----GACCCAAGCAAT--GC------ N/1-422 
TGGGGGTAAGACCAA-------------GGCTTT----GACCCAAGCAAT--GC------ O/1-441 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA- P/1-446 TGGGCG-AAGACCAAAGTT--TATCGATGGCCTTATTTGACCCAAGCAAT--GCGGATA- ******** **** *** ** A/1-474 -A--AT-------TTATTCATTATTACCACTGATCAC--CCTG-CACCTATGCGGTTT-- B/1-452 -A--ATCCCGTCTTTATTC------ACCACTGATCAC--CCTG-CAC--ATGCGGTTT-- C/1-466 -----TCCCGTCTTTATTCATTATAACCACTGATCAC--CCTGGCAC--ATGCGCTTT-- D/1-476 -A--ATCCCGTCTTTATTCATTATAACCACTGATCACGACCTGGCAC--ATGCGCTAT-- E/1-439 -A---TCCCGTCTTTATT--TTTTTAGC-CTGATCTC--CCTGGCAC--AT--------- F/1-434 -A---TCCCGTCTTTATTCATTTTTACC-CTGATCTC--C---------AT--------- G/1-462 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA-- H/1-466 -A--ATCCCGTCTTTATTCATTATAACC-CTGATCTC--CCTGGCAC--ATGCGGTTA-- I/1-462 -AGGATCCTGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAT--ATGCGGTTTGC J/1-447 -AGGATCCCGT--TTATTCTTTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC K/1-448 AAGGATCCCGT-----GTCATTATAACC-CTGATCAC--ACTGGCAC--ATGCGGTTTGC L/1-431 -AGGATCCCGT-----TTCATTAT--CC-CTG-TCAC--CCTGGCAC--ATGCGGTTTGC M/1-432 --GGATCCCGT--TTATTCATTAAAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC N/1-422 --GGATCCCGT--TTATTCATTATAACC-CTGA---C--CCTGGCAC--ATGCGGTTTGC O/1-441 -ATGATCCCGT--TTATTCATTATAACC-CT---CAC--CCTGGCAC--ATGCGGTTTGC P/1-446 -AGGATCCCGT--TTATTCATTATAACC-CTGATCAC--CCTGGCAC--ATGCGGTTTGC * * * ** * ** A/1-474 ACTTCGATGCC B/1-452 ACTTCGATGCC C/1-466 ACTTCGATG-- D/1-476 ACTTCGATGCC E/1-439 -CTTCGATGCC F/1-434 -CTTCGATGCC G/1-462 ACTTCGATG-- H/1-466 ACTTCGATGCC I/1-462 --TTCGATGCC J/1-447 ACTTCGATGCC K/1-448 ACTTCGATG-- L/1-431 ACTTCGATG-- M/1-432 ACTTCGATGCC N/1-422 ACTTCGATGCC O/1-441 ACTTCG-TGCC P/1-446 ACTTCG-TGCC **** ** I have tried different things, but I don't really know why do I have this problem... Does anyone knows? Thank you very much in advance, Jose G. _________________________________________________________________ ?Quieres saber qu? PC eres? ?Desc?brelo aqu?! 
From cjfields at illinois.edu Wed Mar 24 14:37:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 09:37:13 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> Message-ID: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> On Mar 24, 2010, at 9:08 AM, Peter wrote: > Hi, > > This is probably of interest to all the Bio* projects offering access > to the NCBI > Entrez utilities. See forwarded message below. > > I *think* the new guidelines basically say that the email & tool parameters are > optional BUT if your IP address ever gets banned for excessive use you then > have to register an email & tool combination. > > Regarding the email address, the NCBI say to use the email of the developer > (not the end user). However, they do not distinguish between the developers > of a library (like us), and the developers of an application or script using a > library (who may also be the end user). > > Currently we (Biopython) and I think BioPerl ask developers using our libraries > to populate the email address themselves. I *think* this is still the > right action. > > Peter Basically, that's the same tactic I'm going with for Bio::DB::EUtilities (and I think with the SOAP-based ones as well). We're providing a specific set of tools for users to write their own end applications. I can try contacting them regarding this to get an official response to clarify this somewhat. Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a default, but always leave the email blank and issue a warning if it isn't set. We could just as easily leave both blank and issue warnings for both. 
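The defaulting behaviour described above (tool preset to 'BioPerl', email left blank with a warning) can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions, not the Bio::DB::EUtilities code; eutil_identity is a hypothetical name and the warning text is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the Bio::DB::EUtilities implementation:
# apply a default 'tool' value and warn when no 'email' is supplied.
sub eutil_identity {
    my (%args) = @_;

    # Caller-supplied values override the 'BioPerl' default.
    my %param = (tool => 'BioPerl', %args);

    warn "email parameter is not set; consider supplying a contact address\n"
        unless defined $param{email} && length $param{email};

    return %param;
}

# With an email supplied, only the tool default is filled in.
my %with_email = eutil_identity(email => 'me@example.org');
print "tool=$with_email{tool} email=$with_email{email}\n";
```

The alternative mentioned in the message, leaving both values blank and warning for each, would amount to dropping the default from the hash and adding a second check for the tool value.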
chris > ---------- Forwarded message ---------- > From: > Date: Wed, Mar 24, 2010 at 1:53 PM > Subject: [Utilities-announce] NCBI Revised E-utility Usage Policy > To: NLM/NCBI List utilities-announce > > > New E-utility documentation now on the NCBI Bookshelf > > The Entrez Programming Utilities (E-Utilities) Help documentation has > been added to the NCBI Bookshelf, and so is now fully integrated with > the Entrez search and retrieval system as a part of the Bookshelf > database. This help document has been divided into chapters for better > organization and includes several new sample Perl scripts. At present > this book covers the standard URL interface for the E-utilties; > material about the SOAP interface will be added soon and is still > available at the same URL: > http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html. > > > > Revised E-utility usage policy > > In December, 2009 NCBI announced a change to the usage policy for the > E-utilities that would require all requests to contain non-null values > for both the &email and &tool parameters. After several consultations > with our users and developers, we have decided to revise this policy > change, and the revised policy is described in detail at the following > link: > > http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=helpeutils&part=chapter2#chapter2.Usage_Guidelines_and_Requiremen > > Please let us know if you have any questions or concerns about this > policy change. > > > > Thank you, > > The E-Utilities Team > > NIH/NLM/NCBI > > eutilities at ncbi.nlm.nih.gov. 
> > > > _______________________________________________ > Utilities-announce mailing list > http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Wed Mar 24 14:51:46 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Mar 2010 14:51:46 +0000 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> Message-ID: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: > > On Mar 24, 2010, at 9:08 AM, Peter wrote: > >> Hi, >> >> This is probably of interest to all the Bio* projects offering access >> to the NCBI Entrez utilities. See forwarded message below. >> >> I *think* the new guidelines basically say that the email & tool parameters are >> optional BUT if your IP address ever gets banned for excessive use you then >> have to register an email & tool combination. >> >> Regarding the email address, the NCBI say to use the email of the developer >> (not the end user). However, they do not distinguish between the developers >> of a library (like us), and the developers of an application or script using a >> library (who may also be the end user). >> >> Currently we (Biopython) and I think BioPerl ask developers using our libraries >> to populate the email address themselves. I *think* this is still the >> right action. >> >> Peter > > > Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I > think with the SOAP-based ones as well). We're providing a specific set of > tools for user to write up their own applications end applications.
I can try > contacting them regarding this to get an official response to clarify this > somewhat. Please give the NCBI an email - you can CC me too if you like. > Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a > default, but always leave the email blank and issue a warning if it isn't > set. We could just as easily leave both blank and issue warnings for both. We currently leave out the email and set the tool parameter to "Biopython" by default but this can be overridden. Currently leaving out the email does cause Biopython to give a warning. Peter From pmiguel at purdue.edu Wed Mar 24 14:59:50 2010 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Wed, 24 Mar 2010 10:59:50 -0400 Subject: [Bioperl-l] How to set "complexity" param using EUtilities In-Reply-To: <4BAA1883.3010203@purdue.edu> References: <4BAA1883.3010203@purdue.edu> Message-ID: <4BAA28E6.4090907@purdue.edu> Sorry, I got that backwards. The default is "0", apparently. But to get entrez-like performance you want "complexity" to be set to "1". Phillip Phillip San Miguel wrote: > Just a little FYI that might help someone using GenBank efetch (here > with bioperl EUtilities) and, contrary to expectation, retrieving a > bunch of accessions (or GIs) when that single accession is what is > wanted. The trick is to change the "complexity" parameter from its > apparent default of "1" to "0". > > Actually, this parameter might be worth adding to the HOWTO because it > causes the EUtilities efetch to perform similar to a normal Entrez > search. Which, to me, would be the expected behavior. > > Details below. > > Some accessions/GIs appear to be embedded in bundles of related > sequences.
Here is an example: > > gi|158819346|gb|EU011641.1| > > > If I search Entrez Nucleotide > > http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar > > with the either "158819346" (the GI) or "EU011641.1", I get a single > record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal > RNA gene, partial sequence". This what I want. > > If I use the following code derived from the Eutils HOWTO: > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $id ='gb|EU011641.1|'; > push @ids ,$id; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -id => \@ids); > > my $file = "test.gb"; > $factory->get_Response(-file => $file); > > I get a bundle of accessions: EU011584-EU011663. > Same result using the GI number instead. > > From reading: > > http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam > > > it looks like I would get what I want were I to set the efetch > "complexity" parameter to "1". > > But how do I set that parameter? Below is how I did it. Not the most > efficient path, but did not take that long to traverse... > > The HowTo does not mention it. I usually look to the the Deobfuscator: > > http://bioperl.org/cgi-bin/deob_interface.cgi > > to help me when I want some documentation for a method. But this is a > parameter not a class. What class sets this parameter? Not sure. So I > googled: > > complexity eutil site:bioperl.org > > The top ranked hit is actually to the deprecated 1.5.2 version of > EUtilities. But the 2nd hit is to the (auto generatated?) email posted > to the bioperl-guts email list by Chris Fields upon his commit of the > new EUtilities overhaul: > > http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html > > > From here it looks like the obvious way to set the parameter would be > possible. 
And indeed: > > > use Bio::DB::EUtilities; > use Bio::SeqIO; > my @ids; > my $id ='gb|EU011641.1|'; > push @ids ,$id; > my $factory = Bio::DB::EUtilities->new( > -eutil => 'efetch', > -db => 'nucleotide', > -rettype => 'genbank', > -complexity =>1, > -id => \@ids); > > my $file = "test.gb"; > $factory->get_Response(-file => $file); > > works! > > Also a good idea to add -email parameter so that Genbank might > chastise me via email, rather than banning my IP, if I try to send > more than 100 requests in a series outside of the acceptable 9PM-5AM > Eastern Time hours. > > Phillip > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at drycafe.net Wed Mar 24 15:27:37 2010 From: hlapp at drycafe.net (Hilmar Lapp) Date: Wed, 24 Mar 2010 11:27:37 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> On Mar 24, 2010, at 10:51 AM, Peter wrote: > Please give the NCBI an email - you can CC me too if you like. Can't this be the developers' mailing list (or lists, the appropriate one for each toolkit)? We can even whitelist all NCBI sender addresses so they can easily email us if there are issues. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed Mar 24 15:44:21 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 10:44:21 -0500 Subject: [Bioperl-l] Fwd: [Utilities-announce] NCBI Revised E-utility Usage Policy In-Reply-To: <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com> <38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu> <320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> Message-ID: <338BDDD8-2A66-4086-BFB7-35EC8F8F0D66@illinois.edu> On Mar 24, 2010, at 9:51 AM, Peter wrote: > On Wed, Mar 24, 2010 at 2:37 PM, Chris Fields wrote: >> >> On Mar 24, 2010, at 9:08 AM, Peter wrote: >> >>> Hi, >>> >>> This is probably of interest to all the Bio* projects offering access >>> to the NCBI Entrez utilities. See forwarded message below. >>> >>> I *think* the new guidelines basically say that the email & tool parameters are >>> optional BUT if your IP address ever gets banned for excessive use you then >>> have to register an email & tool combination. >>> >>> Regarding the email address, the NCBI say to use the email of the developer >>> (not the end user). However, they do not distinguish between the developers >>> of a library (like us), and the developers of an application or script using a >>> library (who may also be the end user). >>> >>> Currently we (Biopython) and I think BioPerl ask developers using our libraries >>> to populate the email address themselves. I *think* this is still the >>> right action. >>> >>> Peter >> >> >> Basically, that's the same tactic I'm going with with Bio::DB::EUtilities (and I >> think with the SOAP-based ones as well). We're providing a specific set of >> tools for user to write up their own applications end applications. 
I can try >> contacting them regarding this to get an official response to clarify this >> somewhat. > > Please give the NCBI an email - you can CC me too if you like. Sent, have cc'd the open-bio list. Don't want to cross-post this too much, so I think we should move the discussion there. >> Re: the tool parameter, we currently set the tool itself to 'BioPerl' as a >> default, but always leave the email blank and issue a warning if it isn't >> set. We could just as easily leave both blank and issue warnings for both. > > We currently leave out the email and set the tool parameter to "Biopython" > by default but this can be overridden. Currently leaving out the email does > cause Biopython to give a warning. > > Peter We follow the same, then (down to the warning). This is mentioned in my post to them, I'll wait to see what they say. My concern is the wording of the new rules. Each tool and email must be registered with them if an IP is blocked. Does this mean each tool is assigned one specific email? And an IP that is blocked can register it to be allowed back into the fold? With that in mind, should we register each of our toolkits with them? Probably not a bad thing (it might help us as devs to get an idea of use), but then if one user abuses the rules will their actions affect all toolkit users? Is this all done on a per-IP basis, per-toolkit basis, etc? Unfortunately, at least to me, none of this is made very clear, so I'm hoping there is some clarification from their end. chris From maj at fortinbras.us Wed Mar 24 16:37:56 2010 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Wed, 24 Mar 2010 12:37:56 -0400 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From thomas.sharpton at gmail.com Wed Mar 24 17:43:48 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 10:43:48 -0700 Subject: [Bioperl-l] Codeml runtime error Message-ID: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Hi Bioperl gurus, I'm trying to run PAML v4.3b on a series of orthologs, specifically by implementing codeml to detect signatures of positive selection between all orthologous pairs. In some of my files, I notice that I'm getting an EOF error that causes codeml to break. 
The weirdness is that I only get the EOF error under one hypothesis model (the null) and never on the alternative hypothesis model - even when run on the same initial data. I've managed to track the problem down to the way BioPerl formats the temporary phylip alignment file that is fed into codeml. Apparently, PAML requires there to be at least two spaces between the sequence identifier and the start of the sequence. However, for some files - and I don't know if this is random or not - the temporary alignment file only contains one space after the sequence identifier. If I edit the phylip file accordingly and rerun codeml, the software compiles and processes the data correctly. Has anyone run into this problem before and has someone figured a work around using the kaks_factory in Bio::Tools::Run::Phylo::PAML::Codeml.pm? If this is something others have not seen, I'll submit a full bug report. Best regards, Tom From Russell.Smithies at agresearch.co.nz Wed Mar 24 19:53:45 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 25 Mar 2010 08:53:45 +1300 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. 
After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Thursday, 25 March 2010 5:38 a.m. > To: Hilmar Lapp; Peter > Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- > Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris > Fields > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > I think this is a great idea--- MAJ > ----- Original Message ----- > From: "Hilmar Lapp" > To: "Peter" > Cc: ; "Biopython-Dev Mailing List" > ; ; "bioperl- > l > list" ; "Chris Fields" > ; > > Sent: Wednesday, March 24, 2010 11:27 AM > Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI > RevisedE-utility Usage Policy > > > > > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > > > >> Please give the NCBI an email - you can CC me too if you like. > > > > > > Can't this be the developers' mailing list (or lists, the appropriate > one for > > each toolkit)? We can even whitelist all NCBI sender addresses so they > can > > easily email us if there are issues. 
> > > > -hilmar > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > > =========================================================== > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Mar 24 20:01:50 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 15:01:50 -0500 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com> <5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> <18DF7D20DFEC044098A1062202F5FFF32C6E88321B@exchsth.agresearch.co.nz> Message-ID: Russell, The problem we're possibly running into now is that (acc.
to the documents) we will likely have to define both the tool and email (or neither), as the tool and email are registered together. There are advantages and disadvantages to both scenarios, one that you point out. ATM I'm awaiting back word from NCBI for clarification (I popped 'em an email about this earlier) and will hopefully post their response here if they send one, then we'll hash out what needs to be done. And agreed about Scott, he's always been helpful. chris On Mar 24, 2010, at 2:53 PM, Smithies, Russell wrote: > The email thing is mainly to help NCBI contact developers who may be abusing or having trouble with their services. > I've had an email from Scott McGinnis at NCBI before after he noticed one of my scripts could be improved. Generally, I've found their developers to be useful - it's just some of their helpdesk people who could use a lesson in being helpful. > > After all, it's not like they're Google or Microsoft and just collecting addresses so they can spam you later ;-) > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen >> Sent: Thursday, 25 March 2010 5:38 a.m. 
>> To: Hilmar Lapp; Peter >> Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython- >> Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris >> Fields >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> I think this is a great idea--- MAJ >> ----- Original Message ----- >> From: "Hilmar Lapp" >> To: "Peter" >> Cc: ; "Biopython-Dev Mailing List" >> ; ; "bioperl- >> l >> list" ; "Chris Fields" >> ; >> >> Sent: Wednesday, March 24, 2010 11:27 AM >> Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI >> RevisedE-utility Usage Policy >> >> >>> >>> On Mar 24, 2010, at 10:51 AM, Peter wrote: >>> >>>> Please give the NCBI an email - you can CC me too if you like. >>> >>> >>> Can't this be the developers' mailing list (or lists, the appropriate >> one for >>> each toolkit)? We can even whitelist all NCBI sender addresses so they >> can >>> easily email us if there are issues. >>> >>> -hilmar >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : >>> =========================================================== >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed Mar 24 19:53:48 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 24 Mar 2010 12:53:48 -0700 Subject: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy In-Reply-To: References: <320fb6e01003240708o48eeb30eq3b09110dcc2d1873@mail.gmail.com><38D43B03-4A85-48CB-913A-CD564EB5168C@illinois.edu><320fb6e01003240751v2afd5d5bwa39590afa9b13209@mail.gmail.com><5D427F97-706E-4F66-95BA-2B397520C4FA@drycafe.net> Message-ID: <1A4207F8295607498283FE9E93B775B406A418BB@EX02.asurite.ad.asu.edu> Well, the problem with NCBI using the address to email about problem users is that the lists can't really identify the user since it isn't a specific program, but someone's specific implementation utilizing the toolkit that is causing problems. So, not sure how this would help with the problem of dealing with trouble users. -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A.
Jensen Sent: Wednesday, March 24, 2010 9:38 AM To: Hilmar Lapp; Peter Cc: bioruby at lists.open-bio.org; biojava-dev at lists.open-bio.org; Biopython-Dev Mailing List; bioperl-l list; open-bio-l at lists.open-bio.org; Chris Fields Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBIRevisedE-utility Usage Policy I think this is a great idea--- MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Peter" Cc: ; "Biopython-Dev Mailing List" ; ; "bioperl-l list" ; "Chris Fields" ; Sent: Wednesday, March 24, 2010 11:27 AM Subject: Re: [Bioperl-l] [Open-bio-l] Fwd: [Utilities-announce] NCBI RevisedE-utility Usage Policy > > On Mar 24, 2010, at 10:51 AM, Peter wrote: > >> Please give the NCBI an email - you can CC me too if you like. > > > Can't this be the developers' mailing list (or lists, the appropriate one for > each toolkit)? We can even whitelist all NCBI sender addresses so they can > easily email us if there are issues. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed Mar 24 20:38:31 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:38:31 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> Message-ID: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Hi Tom, Thanks for your note. From your description, it sounds like a bug report is in order. 
If you could include a little test case so we can reproduce it, that would be great. Dave From thomas.sharpton at gmail.com Wed Mar 24 20:40:55 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 24 Mar 2010 13:40:55 -0700 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> Message-ID: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Hi Dave, Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. If there's any way I can help, do let me know. Best, Tom On Mar 24, 2010, at 1:38 PM, Dave Messina wrote: > Hi Tom, > > Thanks for your note. From your description, it sounds like a bug > report is in order. If you could include a little test case so we > can reproduce it, that would be great. > > > Dave > From David.Messina at sbc.su.se Wed Mar 24 20:52:59 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 24 Mar 2010 21:52:59 +0100 Subject: [Bioperl-l] Codeml runtime error In-Reply-To: <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> References: <629EF23D-0C79-4F44-9201-E76F78378C07@berkeley.edu> <55E90C9C-2008-4122-8EA4-B5A89149B7E0@sbc.su.se> <433DEFF0-BF0F-481F-BA7F-4D4A2C8BFF0D@gmail.com> Message-ID: <4BEA53ED-87B6-4EE0-B5E6-AE304A335AA8@sbc.su.se> > Thanks for the prompt reply. I'll submit a full bug report along with a code snippet and sample data set that should demonstrate the error. Terrific, thanks! > If there's anyway I can help, do let me know. Oh don't worry...I will.
:) D From cjfields at illinois.edu Thu Mar 25 04:50:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 24 Mar 2010 23:50:11 -0500 Subject: [Bioperl-l] [Gmod-gbrowse] Bio::DB::SeqFeature spliced_seq() In-Reply-To: <4BA7D267.6050704@bioperl.org> References: <1269284190.9834.14.camel@pyrimidine.igb.uiuc.edu> <4BA7D267.6050704@bioperl.org> Message-ID: <46D94C25-4E2D-4E64-A696-1C9D3F785EEB@illinois.edu> Yes, that's essentially what I have working now. I suppose the best way to do this is have an optional type supplied and splice only those, checking the subfeatures to ensure that type exists. I'll check against SeqFeatureI's spliced_seq() to see if there are any API issues. chris On Mar 22, 2010, at 3:26 PM, Jason Stajich wrote: > Yes it needs a special case I guess - since spliced_seq should work, > however ... The only problem is that if both exons and CDS are > sub-features you have to be smart enough to not grab both... > > So I have just relied on specialized dumping scripts for gff3_to_cds for > my own needs (i.e. > http://github.com/hyphaltip/genome-scripts/blob/master/seqfeature/dbgff_to_cdspep.pl > ). > But you might also see what the Gbrowse plugin dumpers do. > > -jason > Chris Fields wrote, On 3/22/10 11:56 AM: >> I have just noticed that spliced_seq() is borked with >> Bio::DB::SeqFeature and am thinking about implementing it. Or is >> similar functionality already implemented elsewhere? >> >> Currently, it is calling entire_seq(), which I plan on avoiding simply >> to prevent sucking in the entire sequence into memory. 
This is >> currently what happens: >> >> >> --------------------------- >> >> my $it = $store->get_seq_stream(-type => 'mRNA'); >> >> my $ct = 0; >> while (my $sf = $it->next_seq) { >> my $seq = $sf->spliced_seq; # dies with exception >> } >> >> --------------------------- >> >> ------------- EXCEPTION: Bio::Root::NotImplemented ------------- >> MSG: Abstract method "Bio::SeqFeatureI::entire_seq" is not implemented >> by package Bio::DB::SeqFeature. >> This is not your fault - author of Bio::DB::SeqFeature should be blamed! >> >> STACK: Error::throw >> STACK: >> Bio::Root::Root::throw /home/cjfields/bioperl/live/Bio/Root/Root.pm:368 >> STACK: >> Bio::Root::RootI::throw_not_implemented /home/cjfields/bioperl/live/Bio/Root/RootI.pm:739 >> STACK: >> Bio::SeqFeatureI::entire_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:325 >> STACK: >> Bio::SeqFeatureI::spliced_seq /home/cjfields/bioperl/live/Bio/SeqFeatureI.pm:458 >> STACK: beestore.pl:17 >> ---------------------------------------------------------------- >> >> >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse From lpritc at scri.ac.uk Thu Mar 25 11:20:01 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 25 Mar 2010 11:20:01 +0000 Subject: [Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region? 
In-Reply-To: <4536f7701003231118s431fb44g42bbaba526c2f1ca@mail.gmail.com>
Message-ID:

Hi,

Nathan's been in touch to ask exactly what the command line was that I was using, and this was missing from the thread so, for info:

bp_genbank2gff3.pl --noCDS NC_000913.gbk

And

bp_genbank2gff3.pl --CDS NC_000913.gbk

With occasional absolute paths to the input sequence.

L.

On 23/03/2010 Tuesday, March 23, 18:18, "Scott Cain" wrote:

> Hi Leighton,
>
> I wonder if this is a change stemming from Nathan's work on this
> script. Nathan?
>
> Scott
>

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk
w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C
tel:+44(0)1382 562731 x2405

From aradwen at gmail.com Fri Mar 26 11:29:16 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Fri, 26 Mar 2010 12:29:16 +0100
Subject: [Bioperl-l] aacomp.pl problem
Message-ID:

Hello,

I'm facing a little problem with aacomp.pl, one of the example scripts that come with BioPerl. Here is the error message:

Can't locate object method "valid_aa" via package "Bio::Tools::CodonTable" at aacomp.pl line 16.

Any idea?

Thx
Radwen

From David.Messina at sbc.su.se Fri Mar 26 12:51:11 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Mar 2010 13:51:11 +0100
Subject: [Bioperl-l] aacomp.pl problem
In-Reply-To:
References:
Message-ID:

Hi Radwen,

The latest version of aacomp (from subversion) worked fine for me. That version has this line near the top of the script:

# $Id: aacomp.PLS 15088 2008-12-04 02:49:09Z bosborne $

If yours is different, you might try upgrading to the latest version. In fact, I'm almost certain that is the problem, since the valid_aa method is in the Bio::SeqUtils class, not Bio::Tools::CodonTable.

Dave

From David.Messina at sbc.su.se Fri Mar 26 14:24:25 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Mar 2010 15:24:25 +0100
Subject: [Bioperl-l] aacomp.pl problem
In-Reply-To:
References:
Message-ID: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se>

Hi,

Yes, the subversion site is temporarily down. However, there are nightly builds

http://www.bioperl.org/DIST/nightly_builds/

and the Github mirror

http://github.com/bioperl

Dave

On Mar 26, 2010, at 15:20, Radwen Aniba wrote:

> The subversion site is down?!!!
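
For anyone hitting the same "valid_aa" error in the aacomp.pl thread above, here is a minimal sketch of calling the method through Bio::SeqUtils, the class Dave names. It assumes a BioPerl install that provides Bio::SeqUtils. The helper names valid_codes and composition and the file name aa_count.pl are made up for illustration and are not taken from aacomp.pl itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the one-letter amino-acid codes known to BioPerl.
# Bio::SeqUtils is loaded lazily so that composition() below can be
# used without BioPerl when an alphabet is supplied by hand.
# (valid_codes is a hypothetical helper name.)
sub valid_codes {
    require Bio::SeqUtils;
    return Bio::SeqUtils->valid_aa(0);    # 0 => list of one-letter codes
}

# Count how often each code in @alphabet occurs in $seq.
# Matching is case-insensitive; residues not in @alphabet are ignored.
# (composition is a hypothetical helper name.)
sub composition {
    my ($seq, @alphabet) = @_;
    my %count = map { $_ => 0 } @alphabet;
    for my $res (split //, uc $seq) {
        $count{$res}++ if exists $count{$res};
    }
    return %count;
}

# Usage (hypothetical file name): perl aa_count.pl MKVLAAGIVGL
if (@ARGV) {
    my @codes = valid_codes();
    my %count = composition($ARGV[0], @codes);
    printf "%s\t%d\n", $_, $count{$_} for @codes;
}
```

The point is only that valid_aa is called as a class method on Bio::SeqUtils rather than on a Bio::Tools::CodonTable object; the counting itself is plain Perl.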
From David.Messina at sbc.su.se Fri Mar 26 14:35:29 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 26 Mar 2010 15:35:29 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> Message-ID: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Radwen, Please be sure to 'reply all' so that everyone on the list can follow this discussion. > Sorry to ask beginners questions but how to configure these mirrors to upgrade ? > > I'm using ubuntu Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix Dave From cjfields at illinois.edu Fri Mar 26 14:40:20 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 09:40:20 -0500 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> Message-ID: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > Radwen, > > Please be sure to 'reply all' so that everyone on the list can follow this discussion. > > >> Sorry to ask beginners questions but how to configure these mirrors to upgrade ? >> >> I'm using ubuntu > > > > > Step 1: download the bioperl-live tarball from, for example, http://www.bioperl.org/DIST/nightly_builds/ > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > Dave You can also get tarballs of bioperl-live from the github mirror (via the 'Download Source' link): http://github.com/bioperl/bioperl-live These are updated every 15 minutes. 
chris From aradwen at gmail.com Fri Mar 26 14:41:51 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Fri, 26 Mar 2010 15:41:51 +0100 Subject: [Bioperl-l] aacomp.pl problem In-Reply-To: <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> References: <8F4A5B98-FA2A-41E6-B1A9-953405203AB6@sbc.su.se> <57ED3418-CEF2-42BE-8318-2C9D0B566826@sbc.su.se> <448C78BA-7AEB-41EF-9121-2DF22B861AC9@illinois.edu> Message-ID: Thank you 2010/3/26 Chris Fields > > On Mar 26, 2010, at 9:35 AM, Dave Messina wrote: > > > Radwen, > > > > Please be sure to 'reply all' so that everyone on the list can follow > this discussion. > > > > > >> Sorry to ask beginners questions but how to configure these mirrors to > upgrade ? > >> > >> I'm using ubuntu > > > > > > > > > > Step 1: download the bioperl-live tarball from, for example, > http://www.bioperl.org/DIST/nightly_builds/ > > > > Step 2: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix > > > > > > > > > > Dave > > > You can also get tarballs of bioperl-live from the github mirror (via the > 'Download Source' link): > > http://github.com/bioperl/bioperl-live > > These are updated every 15 minutes. > > chris From maj at fortinbras.us Fri Mar 26 14:34:49 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 10:34:49 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <4BABB825.6010803@cse.msu.edu> References: <4BABB825.6010803@cse.msu.edu> Message-ID: <249674A825C14BB3801C6184DEEA7A82@NewLife> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with you on developing your application. I'm including the bioperl mailing list on this post; we'll continue to have this conversation on the list so that the helpful, friendly, knowledgeable, compassionate membership can participate. 
WrapperMaker code is currently available in svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run for an example of how Bio::Tools::Run::WrapperBase and CommandExts are used (er, by me...). cheers MAJ ----- Original Message ----- From: "Alok" To: Sent: Thursday, March 25, 2010 3:23 PM Subject: BioPerl Google SOC project > Hello Mark, > > My name is Alok Watve and I am currently pursuing PhD in Computer > Science at Michigan State University. I was going through the BioPerl > Wiki for Google SOC projects. I have good experience with Perl and was > wondering if I could work on the project "Perl Run Wrappers". > > Prior to joining MSU, I was working with D E Shaw India Software Pvt. > Ltd. My work was involved in writing Java programs and their perl > wrappers. We used perl scripts to fire java programs with all the > correct parameters. So I think I have some idea about what wrappers are. > However, I have not used BioPerl and may take some time to get familiar > with the structure. I am fairly confident that I will be able to do this. > > During my work here at MSU. I use perl a lot for doing basic text > analysis for my projects. Although I rarely use OO features of perl, I > have used them in past and never had any problems with it. I also > believe in writing well-documented and user/developer friendly code > (With comments, command line options for help/documentation). I have > attached a simple script I wrote for my project as an example. I have > also attached my resume for your consideration. > > Please let me know if you think that I am an appropriate candidate and > whether I should go ahead with submitting an application with BioPerl as > my Mentor Organization. 
>
> Thanks a lot,
> Alok
> www.cse.msu.edu/~watvealo/
> --------------------------------------------------------------------------------
> #!/usr/bin/perl
>
> =pod
>
> =head1 SYNOPSIS
>
> Script to edit existing box query files to enable random box query.
> This script inserts box size on each line corresponding to discrete
> dimension in the existing box query file. The maximum value of "box size"
> depends on the alphabet size.
>
> Example
>   ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile
>
> Use -perldoc for detailed help on options.
>
> =head1 OPTIONS
>
> =over
>
> =item -infile
>
> Specifies the name of the input box query file.
>
> =item -outfile
>
> Specifies the name of the output file.
>
> =item -uniform_box
>
> Specifies size of the uniform box query.
>
> =item -max_size
>
> Specifies the maximum box size for random sized box query.
>
> =item -help
>
> Displays a brief help message and exits.
>
> =item -perldoc
>
> Displays a detailed help.
>
> =back
>
> =cut
>
> use strict;
> use warnings 'all';
>
> use Getopt::Long;
> use Pod::Usage;
>
> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile,
>            'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox,
>            'help' => \my $help, 'perldoc' => \my $perldoc);
>
> if(defined($perldoc))
> {
>     pod2usage(-verbose => 2);
> }
>
> if(defined($help))
> {
>     pod2usage(-verbose=> 0);
> }
>
> if(! (defined($infile) && defined ($outfile) ))
> {
>     die('Please specify input, output files. Use -perldoc for more help');
> }
>
> # Some basic error checking to ensure script runs ....
> if(!(defined($uniformBox) ||defined($maxSize)))
> {
>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
> }
>
> # Initialize random number generator.
> srand();
>
> # Read Input file and find out lines we are interested in
> # Then prefix the line with correct box size as defined by
> # user choice
> open(IN, "<$infile");
> open(OUT, ">$outfile");
> my $count = 0;
> while(my $line = <IN>)
> {
>     if( ($count%64) < 32 )
>     {
>         if(defined($uniformBox))
>         {
>             $line = sprintf("%d ",$uniformBox) . $line;
>         }
>         elsif(defined($maxSize))
>         {
>             # This line corresponds to the discrete dimension.
>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>         }
>     }
>     $count ++;
>     print OUT $line
> }
>
> close(OUT);
> close(IN);
>

From cjfields at illinois.edu Fri Mar 26 15:06:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:06:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
Message-ID:

Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter:

http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
http://use.perl.org/~cjfields/journal/40275

I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...).

chris

From awitney at sgul.ac.uk Fri Mar 26 15:20:36 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 26 Mar 2010 15:20:36 +0000
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
Message-ID: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>

Is the bioperl-ext package still being developed? I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error

"The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
Please read the install the bioperl-ext package"

even though i did compile and install the Bio::Ext::Align package

If not using the pSW module, what do other people use for this?

thanks

adam

From cjfields at illinois.edu Fri Mar 26 15:51:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 10:51:41 -0500
Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl
In-Reply-To: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>
References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk>
Message-ID: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu>

It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on its own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by its author.

I know many users just use calls to outside programs, such as EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes; XS can be a bugbear.

chris

On Mar 26, 2010, at 10:20 AM, Adam Witney wrote:

> Is the bioperl-ext package still being developed? I ask because i am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error
>
> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
> Please read the install the bioperl-ext package"
>
> even though i did compile and install the Bio::Ext::Align package
>
> If not using the pSW module, what do other people use for this?
>
> thanks
>
> adam

From pmiguel at purdue.edu Fri Mar 26 15:52:17 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 26 Mar 2010 11:52:17 -0400
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
Message-ID: <4BACD831.20506@purdue.edu>

Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.

I am trying to use the code provided at:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO

and modified to request gi228534658

The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything.

Nothing is printed when I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::EUtilities;

my @ids;
push @ids, '228534658';
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

my $file = 'myseqs.gb';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

my $seqin = Bio::SeqIO->new(-file   => $file,
                            -format => 'genbank');

while (my $seq = $seqin->next_seq) {
    print "I see a sequence\n";
    print $seq->species();
}

"myseqs.gb" does have content:

Seq-entry ::= seq { id { general { db "gpid:36555" , tag str "contig49313" } , genbank { accession "EZ113652" , version 1 } , gi 228534658 } , descr { title "TSA: Zea mays contig49313, mRNA sequence." , source { genome genomic , org { taxname "Zea mays" , db { { db "taxon" , tag id 4577 } } , orgname { name binomial { genus "Zea" , species "mays" } , lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACCAD clade; Panicoideae; Andropogoneae; Zea" , gcode 1 , mgcode 1 , div "PLN" } } } , molinfo { biomol mRNA , tech tsa } , pub { pub { article { title { name "Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing." } , authors { names std { { name name { last "Vega-Arreguin" , initials "J.C." } } , { name name { last "Ibarra-Laclette" , initials "E." } } , { name name { last "Jimenez-Moraila" , initials "B."
} } , { name name { last "Martinez" , initials "O." } } , { name name { last "Vielle-Calzada" , initials "J.P." } } , { name name { last "Herrera-Estrella" , initials "L." } } , { name name { last "Herrera-Estrella" , initials "A." } } } } , from journal { title { iso-jta "BMC Genomics" , ml-jta "BMC Genomics" , issn "1471-2164" , name "BMC genomics" } , imp { date std { year 2009 , month 7 , day 6 } , volume "10" , issue "1" , pages "299" , language "ENG" , pubstatus aheadofprint , history { { pubstatus received , date std { year 2008 , month 12 , day 2 } } , { pubstatus accepted , date std { year 2009 , month 7 , day 6 } } , { pubstatus aheadofprint , date std { year 2009 , month 7 , day 6 } } , { pubstatus other , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } , { pubstatus pubmed , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } , { pubstatus medline , date std { year 2009 , month 7 , day 8 , hour 9 , minute 0 } } } } } , ids { pii "1471-2164-10-299" , doi "10.1186/1471-2164-10-299" , pubmed 19580677 } } , pmid 19580677 } } , pub { pub { sub { authors { names std { { name name { last "Vega-Arreguin" , first "Julio" , initials "J.C." } } , { name name { last "Ibarra-Laclette" , first "Enrique" , initials "E." } } , { name name { last "Jimenez-Moraila" , first "Beatriz" , initials "B." } } , { name name { last "Martinez" , first "Octavio" , initials "O." } } , { name name { last "Vielle-Calzada" , first "Jean" , initials "J.Philippe." } } , { name name { last "Herrera-Estrella" , first "Luis" , initials "L." } } , { name name { last "Herrera-Estrella" , first "Alfredo" , initials "A." 
} } } , affil std { affil "Laboratorio Nacional de Genomica para la Biodiversidad" , div "Cinvestav Campus Guanajuato" , city "Irapuato" , sub "Guanajuato" , country "Mexico" , street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , postal-code "36821" } } , medium other , date std { year 2009 , month 3 , day 23 } } } } , user { type str "GenomeProjectsDB" , data { { label str "ProjectID" , data int 36555 } , { label str "ParentID" , data int 0 } } } , create-date std { year 2009 , month 5 , day 5 } , update-date std { year 2009 , month 7 , day 14 } } , inst { repr raw , mol rna , length 450 , seq-data ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 760CFF0'H } } Maybe I am using the wrong format? This looks more like ASN than genbank format to me. Phillip From maj at fortinbras.us Fri Mar 26 15:37:56 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 26 Mar 2010 11:37:56 -0400 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: References: Message-ID: <648F9E90AF07449887FD4C420AA8B00E@NewLife> and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, March 26, 2010 11:06 AM Subject: [Bioperl-l] BioPerl and the Google Summer of Code > Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via > twitter: > > http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html > http://use.perl.org/~cjfields/journal/40275 > > I'll update the BioPerl page with a couple more ideas later today (think: > Moose and/or Perl6...). 
> > chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Fri Mar 26 16:16:22 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 11:16:22 -0500 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <4BACD831.20506@purdue.edu> References: <4BACD831.20506@purdue.edu> Message-ID: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. chris On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: > Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. > > I am trying to use the code provided at: > > http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO > > and modified to request gi228534658 > > The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. 
>
> Maybe I am using the wrong format? This looks more like ASN than genbank format to me.
>
> Phillip

From cjfields at illinois.edu Fri Mar 26 16:38:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 11:38:26 -0500
Subject: [Bioperl-l] BioPerl and the Google Summer of Code
In-Reply-To: <648F9E90AF07449887FD4C420AA8B00E@NewLife>
References: <648F9E90AF07449887FD4C420AA8B00E@NewLife>
Message-ID: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu>

BioPerl GSoC page updated with the Moose/Modern Perl/BioPerl 6-based project:

http://www.bioperl.org/wiki/Google_Summer_of_Code#BioPerl_2.0_.28and_beyond.29

Feel free to add your name to the list of mentors if you are interested.

chris

On Mar 26, 2010, at 10:37 AM, Mark A.
Jensen wrote:

> and discussions are started in LinkedIn in 'Bioinformatics Geeks' and 'Perl Mongers' groups--MAJ
> ----- Original Message ----- From: "Chris Fields"
> To: "BioPerl List"
> Sent: Friday, March 26, 2010 11:06 AM
> Subject: [Bioperl-l] BioPerl and the Google Summer of Code
>
>
>> Just posted a blog re: BioPerl and GSoC to the main Perl blogs and via twitter:
>>
>> http://blogs.perl.org/users/pyrimidine/2010/03/bioperl-and-the-google-summer-of-code.html
>> http://use.perl.org/~cjfields/journal/40275
>>
>> I'll update the BioPerl page with a couple more ideas later today (think: Moose and/or Perl6...).
>>
>> chris
>>
>

From pmiguel at purdue.edu Fri Mar 26 17:28:09 2010
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 26 Mar 2010 13:28:09 -0400
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
Message-ID: <4BACEEA9.2060407@purdue.edu>

Ah, yes. That does the trick.

Actually I have already downloaded a few thousand records in whatever format is returned when 'genbank' is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= seq {')

Any idea what format that is and how to convert it to something SeqIO can use? If not, I can just pull them all down again by sending about 200 gi's per request. That should not offend the genbank gods...

Thanks for your help,
Phillip

Chris Fields wrote:
> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end.
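
A sketch of the re-download Phillip describes, with the rettype change from Chris's reply applied. Assumptions that are not from the thread: the helper names batches and fetch_batches, the output file naming, the one-second pause between requests, and a BioPerl recent enough to accept -email. The batch size of 200 is Phillip's own figure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into chunks of at most $size elements.
# (batches is a hypothetical helper name.)
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" if $size < 1;
    my @chunks;
    push @chunks, [ splice(@ids, 0, $size) ] while @ids;
    return @chunks;
}

# Fetch each chunk with efetch and write it to <prefix>_NNN.gb.
# Returns the number of files written.
# (fetch_batches and the naming scheme are assumptions for illustration.)
sub fetch_batches {
    my ($email, $prefix, @ids) = @_;
    require Bio::DB::EUtilities;    # loaded only when a fetch is requested
    my $n = 0;
    for my $chunk (batches(200, @ids)) {
        my $factory = Bio::DB::EUtilities->new(
            -eutil   => 'efetch',
            -db      => 'nucleotide',
            -rettype => 'gb',       # 'genbank' gave the Seq-entry text shown in the thread
            -email   => $email,     # assumes a BioPerl version that accepts -email
            -id      => $chunk,
        );
        my $file = sprintf '%s_%03d.gb', $prefix, ++$n;
        $factory->get_Response(-file => $file);
        sleep 1;                    # pause between requests (an arbitrary choice)
    }
    return $n;
}

# Usage (hypothetical): perl fetch_gb.pl you@example.org myseqs 228534658 ...
if (@ARGV >= 3) {
    my ($email, $prefix, @ids) = @ARGV;
    my $n = fetch_batches($email, $prefix, @ids);
    print "wrote $n file(s)\n";
}
```

Each output file can then be read with Bio::SeqIO using -format => 'genbank', as in the original script.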
> > Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. > > chris > > On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: > > >> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. >> >> I am trying to use the code provided at: >> >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO >> >> and modified to request gi228534658 >> >> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. >> >> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: >> >> #!/usr/bin/perl >> use strict; >> use warnings; >> >> use Bio::SeqIO; >> use Bio::DB::EUtilities; >> >> my @ids; >> push @ids, '228534658'; >> my $factory = Bio::DB::EUtilities->new( >> -eutil => 'efetch', >> -db => 'nucleotide', >> -rettype => 'genbank', >> -id => \@ids); >> >> my $file = 'myseqs.gb'; >> >> # dump HTTP::Response content to a file (not retained in memory) >> $factory->get_Response(-file => $file); >> >> my $seqin = Bio::SeqIO->new(-file => $file, >> -format => 'genbank'); >> >> while (my $seq = $seqin->next_seq) { >> print "I see a sequence\n"; >> print $seq->species(); >> } >> >> >> "myseqs.gb" does have content: >> >> Seq-entry ::= seq { >> id { >> general { >> db "gpid:36555" , >> tag >> str "contig49313" } , >> genbank { >> accession "EZ113652" , >> version 1 } , >> gi 228534658 } , >> descr { >> title "TSA: Zea mays contig49313, mRNA sequence." 
, >> source { >> genome genomic , >> org { >> taxname "Zea mays" , >> db { >> { >> db "taxon" , >> tag >> id 4577 } } , >> orgname { >> name >> binomial { >> genus "Zea" , >> species "mays" } , >> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; >> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; >> PACCAD clade; Panicoideae; Andropogoneae; Zea" , >> gcode 1 , >> mgcode 1 , >> div "PLN" } } } , >> molinfo { >> biomol mRNA , >> tech tsa } , >> pub { >> pub { >> article { >> title { >> name "Deep sampling of the Palomero maize transcriptome by a high >> throughput strategy of pyrosequencing." } , >> authors { >> names >> std { >> { >> name >> name { >> last "Vega-Arreguin" , >> initials "J.C." } } , >> { >> name >> name { >> last "Ibarra-Laclette" , >> initials "E." } } , >> { >> name >> name { >> last "Jimenez-Moraila" , >> initials "B." } } , >> { >> name >> name { >> last "Martinez" , >> initials "O." } } , >> { >> name >> name { >> last "Vielle-Calzada" , >> initials "J.P." } } , >> { >> name >> name { >> last "Herrera-Estrella" , >> initials "L." } } , >> { >> name >> name { >> last "Herrera-Estrella" , >> initials "A." 
} } } } , >> from >> journal { >> title { >> iso-jta "BMC Genomics" , >> ml-jta "BMC Genomics" , >> issn "1471-2164" , >> name "BMC genomics" } , >> imp { >> date >> std { >> year 2009 , >> month 7 , >> day 6 } , >> volume "10" , >> issue "1" , >> pages "299" , >> language "ENG" , >> pubstatus aheadofprint , >> history { >> { >> pubstatus received , >> date >> std { >> year 2008 , >> month 12 , >> day 2 } } , >> { >> pubstatus accepted , >> date >> std { >> year 2009 , >> month 7 , >> day 6 } } , >> { >> pubstatus aheadofprint , >> date >> std { >> year 2009 , >> month 7 , >> day 6 } } , >> { >> pubstatus other , >> date >> std { >> year 2009 , >> month 7 , >> day 8 , >> hour 9 , >> minute 0 } } , >> { >> pubstatus pubmed , >> date >> std { >> year 2009 , >> month 7 , >> day 8 , >> hour 9 , >> minute 0 } } , >> { >> pubstatus medline , >> date >> std { >> year 2009 , >> month 7 , >> day 8 , >> hour 9 , >> minute 0 } } } } } , >> ids { >> pii "1471-2164-10-299" , >> doi "10.1186/1471-2164-10-299" , >> pubmed 19580677 } } , >> pmid 19580677 } } , >> pub { >> pub { >> sub { >> authors { >> names >> std { >> { >> name >> name { >> last "Vega-Arreguin" , >> first "Julio" , >> initials "J.C." } } , >> { >> name >> name { >> last "Ibarra-Laclette" , >> first "Enrique" , >> initials "E." } } , >> { >> name >> name { >> last "Jimenez-Moraila" , >> first "Beatriz" , >> initials "B." } } , >> { >> name >> name { >> last "Martinez" , >> first "Octavio" , >> initials "O." } } , >> { >> name >> name { >> last "Vielle-Calzada" , >> first "Jean" , >> initials "J.Philippe." } } , >> { >> name >> name { >> last "Herrera-Estrella" , >> first "Luis" , >> initials "L." } } , >> { >> name >> name { >> last "Herrera-Estrella" , >> first "Alfredo" , >> initials "A." 
} } } ,
>> affil
>> std {
>> affil "Laboratorio Nacional de Genomica para la Biodiversidad" ,
>> div "Cinvestav Campus Guanajuato" ,
>> city "Irapuato" ,
>> sub "Guanajuato" ,
>> country "Mexico" ,
>> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" ,
>> postal-code "36821" } } ,
>> medium other ,
>> date
>> std {
>> year 2009 ,
>> month 3 ,
>> day 23 } } } } ,
>> user {
>> type
>> str "GenomeProjectsDB" ,
>> data {
>> {
>> label
>> str "ProjectID" ,
>> data
>> int 36555 } ,
>> {
>> label
>> str "ParentID" ,
>> data
>> int 0 } } } ,
>> create-date
>> std {
>> year 2009 ,
>> month 5 ,
>> day 5 } ,
>> update-date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 14 } } ,
>> inst {
>> repr raw ,
>> mol rna ,
>> length 450 ,
>> seq-data
>> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02
>> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A
>> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63
>> 760CFF0'H } }
>>
>>
>> Maybe I am using the wrong format? This looks more like ASN than genbank format to me.
>>
>> Phillip
>>
>

From bioperlanand at yahoo.com Fri Mar 26 04:40:23 2010
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Thu, 25 Mar 2010 21:40:23 -0700 (PDT)
Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities
Message-ID: <27160.94644.qm@web114211.mail.gq1.yahoo.com>

Hi everybody,

I have a list of genome project ids, and I need to gather information from a specific field and store the output in a file.

As for what information I want: for example, for genome project id 30807
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807, I need to grab the descriptive text found at the bottom of the page, which reads:

Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant.

Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis.

I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::EUtilities? If yes, what would be the fields/params?

I did try out this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F to find out what information is available for genomeprj, but I was unable to find the field/param I need. Please help.

Alternatively, is there a better way to address my need than Bio::DB::EUtilities?

Thanks in advance,
Anand

From rmb32 at cornell.edu Fri Mar 26 07:44:09 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 00:44:09 -0700
Subject: [Bioperl-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not already been subscribed to the (mentors-only) gsoc-mentors mailing list, send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin

From rmb32 at cornell.edu Fri Mar 26 16:30:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 09:30:30 -0700
Subject: [Bioperl-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 9! Please spread it widely; we need to reach lots of students with it!
Rob Buels OBF GSoC 2010 Admin ============================================================ *** Please disseminate widely at your local institutions *** *** including posting to message and job boards, so that *** *** we reach as many students as possible. *** ============================================================ OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010 Applications due 19:00 UTC, April 9, 2010. http://www.open-bio.org/wiki/Google_Summer_of_Code The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). Students successfully completing the 3 month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. 
We also welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves. TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 29 through Friday, April 9th, 2010. INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details. 2010 OBF Summer of Code: http://www.open-bio.org/wiki/Google_Summer_of_Code Google Summer of Code FAQ: http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs From cjfields at illinois.edu Fri Mar 26 18:28:46 2010 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 26 Mar 2010 13:28:46 -0500 Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook In-Reply-To: <4BACEEA9.2060407@purdue.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu> Message-ID: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> That format is ASN.1. and there isn't a BioPerl parser for GenBank ASN.1 format (it tends to be too cumbersome). However, there is a pure-perl-based one for the EntrezGene ASN.1 format (Bio::ASN1::EntrezGene). chris On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote: > Ah, yes. That does the trick. Actually I have already downloaded a few > thousand records in whatever that format that is returned when 'genbank' > is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= > seq {') Any idea what format that is and how to convert it to something > SeqIO can use? > > If not, I can just pull them all down again by sending about 200 gi's > per request. That should not offend the genbank gods... 
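Splitting a long id list into requests of about 200, as mentioned above, can be sketched in plain Perl. This is only an illustration, not code from the thread: `batches` is a hypothetical helper, the ids are placeholders rather than real GIs, and the batch size of 200 is simply the figure quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a list of ids into
# array references holding at most $size ids each.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    # splice removes up to $size ids from the front on every pass
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

my @ids = (1 .. 450);    # placeholder ids for illustration only
for my $batch (batches(200, @ids)) {
    printf "would request %d ids\n", scalar @$batch;
}
```

Each batch is an array reference, so it could be handed to a Bio::DB::EUtilities efetch request through `-id`, in the same way the quoted script passes `\@ids`.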
> > Thanks for your help, > Phillip > > Chris Fields wrote: > > Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. > > > > Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. > > > > chris > > > > On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: > > > > > >> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. > >> > >> I am trying to use the code provided at: > >> > >> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO > >> > >> and modified to request gi228534658 > >> > >> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. 
> >> > >> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: > >> > >> #!/usr/bin/perl > >> use strict; > >> use warnings; > >> > >> use Bio::SeqIO; > >> use Bio::DB::EUtilities; > >> > >> my @ids; > >> push @ids, '228534658'; > >> my $factory = Bio::DB::EUtilities->new( > >> -eutil => 'efetch', > >> -db => 'nucleotide', > >> -rettype => 'genbank', > >> -id => \@ids); > >> > >> my $file = 'myseqs.gb'; > >> > >> # dump HTTP::Response content to a file (not retained in memory) > >> $factory->get_Response(-file => $file); > >> > >> my $seqin = Bio::SeqIO->new(-file => $file, > >> -format => 'genbank'); > >> > >> while (my $seq = $seqin->next_seq) { > >> print "I see a sequence\n"; > >> print $seq->species(); > >> } > >> > >> > >> "myseqs.gb" does have content: > >> > >> Seq-entry ::= seq { > >> id { > >> general { > >> db "gpid:36555" , > >> tag > >> str "contig49313" } , > >> genbank { > >> accession "EZ113652" , > >> version 1 } , > >> gi 228534658 } , > >> descr { > >> title "TSA: Zea mays contig49313, mRNA sequence." , > >> source { > >> genome genomic , > >> org { > >> taxname "Zea mays" , > >> db { > >> { > >> db "taxon" , > >> tag > >> id 4577 } } , > >> orgname { > >> name > >> binomial { > >> genus "Zea" , > >> species "mays" } , > >> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; > >> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; > >> PACCAD clade; Panicoideae; Andropogoneae; Zea" , > >> gcode 1 , > >> mgcode 1 , > >> div "PLN" } } } , > >> molinfo { > >> biomol mRNA , > >> tech tsa } , > >> pub { > >> pub { > >> article { > >> title { > >> name "Deep sampling of the Palomero maize transcriptome by a high > >> throughput strategy of pyrosequencing." } , > >> authors { > >> names > >> std { > >> { > >> name > >> name { > >> last "Vega-Arreguin" , > >> initials "J.C." } } , > >> { > >> name > >> name { > >> last "Ibarra-Laclette" , > >> initials "E." 
} } , > >> { > >> name > >> name { > >> last "Jimenez-Moraila" , > >> initials "B." } } , > >> { > >> name > >> name { > >> last "Martinez" , > >> initials "O." } } , > >> { > >> name > >> name { > >> last "Vielle-Calzada" , > >> initials "J.P." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> initials "L." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> initials "A." } } } } , > >> from > >> journal { > >> title { > >> iso-jta "BMC Genomics" , > >> ml-jta "BMC Genomics" , > >> issn "1471-2164" , > >> name "BMC genomics" } , > >> imp { > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 6 } , > >> volume "10" , > >> issue "1" , > >> pages "299" , > >> language "ENG" , > >> pubstatus aheadofprint , > >> history { > >> { > >> pubstatus received , > >> date > >> std { > >> year 2008 , > >> month 12 , > >> day 2 } } , > >> { > >> pubstatus accepted , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 6 } } , > >> { > >> pubstatus aheadofprint , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 6 } } , > >> { > >> pubstatus other , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 8 , > >> hour 9 , > >> minute 0 } } , > >> { > >> pubstatus pubmed , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 8 , > >> hour 9 , > >> minute 0 } } , > >> { > >> pubstatus medline , > >> date > >> std { > >> year 2009 , > >> month 7 , > >> day 8 , > >> hour 9 , > >> minute 0 } } } } } , > >> ids { > >> pii "1471-2164-10-299" , > >> doi "10.1186/1471-2164-10-299" , > >> pubmed 19580677 } } , > >> pmid 19580677 } } , > >> pub { > >> pub { > >> sub { > >> authors { > >> names > >> std { > >> { > >> name > >> name { > >> last "Vega-Arreguin" , > >> first "Julio" , > >> initials "J.C." } } , > >> { > >> name > >> name { > >> last "Ibarra-Laclette" , > >> first "Enrique" , > >> initials "E." 
} } , > >> { > >> name > >> name { > >> last "Jimenez-Moraila" , > >> first "Beatriz" , > >> initials "B." } } , > >> { > >> name > >> name { > >> last "Martinez" , > >> first "Octavio" , > >> initials "O." } } , > >> { > >> name > >> name { > >> last "Vielle-Calzada" , > >> first "Jean" , > >> initials "J.Philippe." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> first "Luis" , > >> initials "L." } } , > >> { > >> name > >> name { > >> last "Herrera-Estrella" , > >> first "Alfredo" , > >> initials "A." } } } , > >> affil > >> std { > >> affil "Laboratorio Nacional de Genomica para la Biodiversidad" , > >> div "Cinvestav Campus Guanajuato" , > >> city "Irapuato" , > >> sub "Guanajuato" , > >> country "Mexico" , > >> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , > >> postal-code "36821" } } , > >> medium other , > >> date > >> std { > >> year 2009 , > >> month 3 , > >> day 23 } } } } , > >> user { > >> type > >> str "GenomeProjectsDB" , > >> data { > >> { > >> label > >> str "ProjectID" , > >> data > >> int 36555 } , > >> { > >> label > >> str "ParentID" , > >> data > >> int 0 } } } , > >> create-date > >> std { > >> year 2009 , > >> month 5 , > >> day 5 } , > >> update-date > >> std { > >> year 2009 , > >> month 7 , > >> day 14 } } , > >> inst { > >> repr raw , > >> mol rna , > >> length 450 , > >> seq-data > >> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 > >> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A > >> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 > >> 760CFF0'H } } > >> > >> > >> Maybe I am using the wrong format? This looks more like ASN than genbank format to me. 
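A quick look at the first non-blank line of a downloaded file can tell the two formats apart before the file is handed to Bio::SeqIO: the dump above begins with `Seq-entry ::=`, whereas a GenBank flat file begins with `LOCUS`. The following is a minimal sketch under that assumption; `sniff_format` is a hypothetical helper, not a BioPerl function, and the sample text is an in-memory stand-in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): guess the format of a
# downloaded record from its first non-blank line.
sub sniff_format {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                      # skip blank lines
        return 'asn1'    if $line =~ /^\s*Seq-entry\s*::=/;
        return 'genbank' if $line =~ /^LOCUS\s/;
        return 'unknown';
    }
    return 'empty';
}

# In-memory stand-in for the first lines of myseqs.gb shown above.
my $sample = "Seq-entry ::= seq {\n  id {\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print sniff_format($fh), "\n";    # prints "asn1"
close $fh;
```

A result of `asn1` would suggest re-fetching with a different rettype rather than trying to parse the file as GenBank.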
> >> > >> Phillip > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From wollenbergk at niaid.nih.gov Fri Mar 26 20:47:06 2010 From: wollenbergk at niaid.nih.gov (Wollenberg, Kurt (NIH/NIAID) [C]) Date: Fri, 26 Mar 2010 16:47:06 -0400 Subject: [Bioperl-l] Error during installation of 1.6.1 Message-ID: Hello: I am trying to install BioPerl (after a recent system upgrade) and am getting the following error: "Catching error: "Can't execute q install q: No such file or directory at /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at /Library/Perl/Updates/5.8.8/CPAN.pm line 391". Previous to this I've run the CPAN upgrade, etc. as recommended on the Installation for Unix page. This happens when I try to do the actual install, both vanilla and "force"ed. I'm attempting this on a Mac G5 workstation running 10.5.8. Any clues what I may be missing or doing incorrectly? Cheers, Kurt Wollenberg, Ph.D. Contractor - Lockheed Martin Phylogenetics Specialist Computational Biology Section Bioinformatics and Computational Biosciences Branch (BCBB) OCICB/OSMO/OD/NIAID/NIH 31 Center Drive, Room 3B62 Bethesda, MD 20892-0485 Office 301-402-8628 http://bioinformatics.niaid.nih.gov (Within NIH) http://exon.niaid.nih.gov (Public) Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. 
If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives From rmb32 at cornell.edu Fri Mar 26 22:22:42 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 15:22:42 -0700 Subject: [Bioperl-l] BioPerl and the Google Summer of Code In-Reply-To: <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> References: <648F9E90AF07449887FD4C420AA8B00E@NewLife> <4D4CF1CC-3C99-448A-A55D-62D2D0E67066@illinois.edu> Message-ID: <4BAD33B2.1060309@cornell.edu> You guys are the best. Hugs all around. R From watvealo at cse.msu.edu Fri Mar 26 23:06:24 2010 From: watvealo at cse.msu.edu (Alok) Date: Fri, 26 Mar 2010 19:06:24 -0400 Subject: [Bioperl-l] BioPerl Google SOC project In-Reply-To: <249674A825C14BB3801C6184DEEA7A82@NewLife> References: <4BABB825.6010803@cse.msu.edu> <249674A825C14BB3801C6184DEEA7A82@NewLife> Message-ID: <4BAD3DF0.7090006@cse.msu.edu> Hi Mark, Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-( I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion) But they too are non-responsive. Thanks, Alok Mark A. Jensen wrote: > Hi Alok-- Thanks for your interest! You should certainly consider > applying. I can work with > you on developing your application. I'm including the bioperl mailing > list on this > post; we'll continue to have this conversation on the list so that the > helpful, friendly, > knowledgeable, compassionate membership can participate. 
> WrapperMaker code is currently available in > svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker > > Probably you want to have a look at Bio::Tools::Run::Samtools in > bioperl-run > for an example of how Bio::Tools::Run::WrapperBase and CommandExts are > used (er, by me...). > cheers > MAJ > ----- Original Message ----- From: "Alok" > To: > Sent: Thursday, March 25, 2010 3:23 PM > Subject: BioPerl Google SOC project > > >> Hello Mark, >> >> My name is Alok Watve and I am currently pursuing PhD in Computer >> Science at Michigan State University. I was going through the BioPerl >> Wiki for Google SOC projects. I have good experience with Perl and was >> wondering if I could work on the project "Perl Run Wrappers". >> >> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >> Ltd. My work was involved in writing Java programs and their perl >> wrappers. We used perl scripts to fire java programs with all the >> correct parameters. So I think I have some idea about what wrappers are. >> However, I have not used BioPerl and may take some time to get familiar >> with the structure. I am fairly confident that I will be able to do >> this. >> >> During my work here at MSU. I use perl a lot for doing basic text >> analysis for my projects. Although I rarely use OO features of perl, I >> have used them in past and never had any problems with it. I also >> believe in writing well-documented and user/developer friendly code >> (With comments, command line options for help/documentation). I have >> attached a simple script I wrote for my project as an example. I have >> also attached my resume for your consideration. >> >> Please let me know if you think that I am an appropriate candidate and >> whether I should go ahead with submitting an application with BioPerl as >> my Mentor Organization. 
>> >> Thanks a lot, >> Alok >> www.cse.msu.edu/~watvealo/ >> > > > -------------------------------------------------------------------------------- > > > >> #!/usr/bin/perl >> >> =pod >> >> =head1 SYNOPSIS >> >> Script to edit existing box query files to enable random box query. >> This scripts inserts box size on each line corresponding to discrete >> dimension in the existing box query file. The maximum value of "box >> size" >> depends on the alphabet size. >> >> Example >> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >> >> Use -perldoc for detailed help on options. >> >> =head1 OPTIONS >> >> =over >> >> =item -infile >> >> Specifies the name of the input box query file. >> >> =item -outfile >> >> Specifies the name of the output file. >> >> =item -uniform_box >> >> Specifies size of the uniform box query. >> >> =item -max_size >> >> Specifies the maximum box size for random sized box query. >> >> =item -help >> >> Displays a brief help message and exits. >> >> =item -perldoc >> >> Displays a detailed help. >> >> =back >> >> =cut >> >> use strict; >> use warnings 'all'; >> >> use Getopt::Long; >> use Pod::Usage; >> >> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >> 'help' => \my $help, 'perldoc' => \my $perldoc); >> >> if(defined($perldoc)) >> { >> pod2usage(-verbose => 2); >> } >> >> if(defined($help)) >> { >> pod2usage(-verbose=> 0); >> } >> >> if(! (defined($infile) && defined ($outfile) )) >> { >> die('Please specify input, output files. Use -perldoc >> for more help'); >> } >> >> # Some basic error checking to ensure script runs .... >> if(!(defined($uniformBox) ||defined($maxSize))) >> { >> die('Specify either box size for uniform box queries or maximum >> box size for random box queries'); >> } >> >> # Initialize random number generator. 
>> srand();
>>
>> # Read Input file and find out lines we are interested in
>> # Then prefix the line with correct box size as defined by
>> # user choice
>> open(IN, "<$infile");
>> open(OUT, ">$outfile");
>> my $count = 0;
>> while(my $line = <IN>)
>> {
>>     if( ($count%64) < 32 )
>>     {
>>         if(defined($uniformBox))
>>         {
>>             $line = sprintf("%d ",$uniformBox) . $line;
>>         }
>>         elsif(defined($maxSize))
>>         {
>>             # This line corresponds to the discrete dimension.
>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>         }
>>     }
>>     $count ++;
>>     print OUT $line
>> }
>>
>> close(OUT);
>> close(IN);
>>

From maj at fortinbras.us Sat Mar 27 00:08:51 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 20:08:51 -0400
Subject: [Bioperl-l] BioPerl Google SOC project
In-Reply-To: <4BAD3DF0.7090006@cse.msu.edu>
References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu>
Message-ID:

Hi Alok-- There has been trouble with the code node
of late. You can get a tarball of all the latest code at
http://bioperl.org/DIST/nightly_builds/
Download both bioperl-live and bioperl-run
cheers,
MAJ
----- Original Message -----
From: "Alok"
To: "Mark A. Jensen"
Cc: "BioPerl List"
Sent: Friday, March 26, 2010 7:06 PM
Subject: Re: [Bioperl-l] BioPerl Google SOC project

> Hi Mark,
>
> Thanks a lot for the response. I tried to access the SVN but was unable to do
> so. My SVN client just times out :-(
> I even tried SVN links from the BioPerl Wiki
> (http://www.bioperl.org/wiki/Using_Subversion)
> But they too are non-responsive.
>
> Thanks,
> Alok
>
> Mark A. Jensen wrote:
>> Hi Alok-- Thanks for your interest! You should certainly consider applying. I
>> can work with
>> you on developing your application. I'm including the bioperl mailing list on
>> this
>> post; we'll continue to have this conversation on the list so that the
>> helpful, friendly,
>> knowledgeable, compassionate membership can participate.
>> WrapperMaker code is currently available in >> svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker >> Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run >> for an example of how Bio::Tools::Run::WrapperBase and CommandExts are >> used (er, by me...). >> cheers >> MAJ >> ----- Original Message ----- From: "Alok" >> To: >> Sent: Thursday, March 25, 2010 3:23 PM >> Subject: BioPerl Google SOC project >> >> >>> Hello Mark, >>> >>> My name is Alok Watve and I am currently pursuing PhD in Computer >>> Science at Michigan State University. I was going through the BioPerl >>> Wiki for Google SOC projects. I have good experience with Perl and was >>> wondering if I could work on the project "Perl Run Wrappers". >>> >>> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >>> Ltd. My work was involved in writing Java programs and their perl >>> wrappers. We used perl scripts to fire java programs with all the >>> correct parameters. So I think I have some idea about what wrappers are. >>> However, I have not used BioPerl and may take some time to get familiar >>> with the structure. I am fairly confident that I will be able to do this. >>> >>> During my work here at MSU. I use perl a lot for doing basic text >>> analysis for my projects. Although I rarely use OO features of perl, I >>> have used them in past and never had any problems with it. I also >>> believe in writing well-documented and user/developer friendly code >>> (With comments, command line options for help/documentation). I have >>> attached a simple script I wrote for my project as an example. I have >>> also attached my resume for your consideration. >>> >>> Please let me know if you think that I am an appropriate candidate and >>> whether I should go ahead with submitting an application with BioPerl as >>> my Mentor Organization. 
>>> >>> Thanks a lot, >>> Alok >>> www.cse.msu.edu/~watvealo/ >>> >> >> >> -------------------------------------------------------------------------------- >> >> >> >>> #!/usr/bin/perl >>> >>> =pod >>> >>> =head1 SYNOPSIS >>> >>> Script to edit existing box query files to enable random box query. >>> This scripts inserts box size on each line corresponding to discrete >>> dimension in the existing box query file. The maximum value of "box size" >>> depends on the alphabet size. >>> >>> Example >>> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >>> >>> Use -perldoc for detailed help on options. >>> >>> =head1 OPTIONS >>> >>> =over >>> >>> =item -infile >>> >>> Specifies the name of the input box query file. >>> >>> =item -outfile >>> >>> Specifies the name of the output file. >>> >>> =item -uniform_box >>> >>> Specifies size of the uniform box query. >>> >>> =item -max_size >>> >>> Specifies the maximum box size for random sized box query. >>> >>> =item -help >>> >>> Displays a brief help message and exits. >>> >>> =item -perldoc >>> >>> Displays a detailed help. >>> >>> =back >>> >>> =cut >>> >>> use strict; >>> use warnings 'all'; >>> >>> use Getopt::Long; >>> use Pod::Usage; >>> >>> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, >>> 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >>> 'help' => \my $help, 'perldoc' => \my $perldoc); >>> >>> if(defined($perldoc)) >>> { >>> pod2usage(-verbose => 2); >>> } >>> >>> if(defined($help)) >>> { >>> pod2usage(-verbose=> 0); >>> } >>> >>> if(! (defined($infile) && defined ($outfile) )) >>> { >>> die('Please specify input, output files. Use -perldoc >>> for more help'); >>> } >>> >>> # Some basic error checking to ensure script runs .... >>> if(!(defined($uniformBox) ||defined($maxSize))) >>> { >>> die('Specify either box size for uniform box queries or maximum box size >>> for random box queries'); >>> } >>> >>> # Initialize random number generator. 
>>> srand();
>>>
>>> # Read Input file and find out lines we are interested in
>>> # Then prefix the line with correct box size as defined by
>>> # user choice
>>> open(IN, "<$infile");
>>> open(OUT, ">$outfile");
>>> my $count = 0;
>>> while(my $line = <IN>)
>>> {
>>>     if( ($count%64) < 32 )
>>>     {
>>>         if(defined($uniformBox))
>>>         {
>>>             $line = sprintf("%d ",$uniformBox) . $line;
>>>         }
>>>         elsif(defined($maxSize))
>>>         {
>>>             # This line corresponds to the discrete dimension.
>>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>>         }
>>>     }
>>>     $count ++;
>>>     print OUT $line
>>> }
>>>
>>> close(OUT);
>>> close(IN);
>>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From bioperlanand at yahoo.com Sat Mar 27 01:40:04 2010
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Fri, 26 Mar 2010 18:40:04 -0700 (PDT)
Subject: [Bioperl-l] From Anand - a question on querying ncbi's genomeprj with Bio::DB::Eutilities
Message-ID: <497143.33972.qm@web114218.mail.gq1.yahoo.com>

Hi everybody,

I have a list of genome project ids, and I need to gather information from a specific field and store the output in a file. As for what information I want: for example, for genome project id 30807 (http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=30807), I need to grab the descriptive text found at the bottom of the page, which reads:

Anabaena azollae. Anabaena azollae is a cyanobacterial symbiont of the water fern Azolla, commonly known as 'duckweed'. Anabaena azollae is a nitrogen-fixer and provides nitrogen to the host plant.

Nostoc azollae 0708. Nostoc azollae 0708, also called Anabaena azollae strain 0708, will be used for comparative analysis.

I need to grab the same information for a list of genome project ids. Is this possible using Bio::DB::EUtilities? If yes, what would be the fields/params?
I did try out this: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#What_information_is_available_for_database_.27x.27.3F to find out what information is available for genomeprj, but I was unable to find the field/param I need. Please help.

Alternatively, is there a better way to address my need than Bio::DB::EUtilities?

Thanks in advance,
Anand

From cjfields at illinois.edu Sat Mar 27 03:05:59 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Mar 2010 22:05:59 -0500
Subject: [Bioperl-l] BioPerl Google SOC project
In-Reply-To:
References: <4BABB825.6010803@cse.msu.edu><249674A825C14BB3801C6184DEEA7A82@NewLife> <4BAD3DF0.7090006@cse.msu.edu>
Message-ID: <73AE1929-9920-4FD1-B36B-1C7244E20102@illinois.edu>

You can also grab the code off the github mirror:

http://github.com/bioperl/bioperl-live

You can either run a checkout, or download the tarball using the 'Download Source' link. We'll have an SVN read-only mirror on Google Code as well very soon, if it isn't done already.

chris

On Mar 26, 2010, at 7:08 PM, Mark A. Jensen wrote:

> Hi Alok-- There has been trouble with the code node
> of late. You can get a tarball of all the latest code at
> http://bioperl.org/DIST/nightly_builds/
> Download both bioperl-live and bioperl-run
> cheers,
> MAJ
> ----- Original Message ----- From: "Alok"
> To: "Mark A. Jensen"
> Cc: "BioPerl List"
> Sent: Friday, March 26, 2010 7:06 PM
> Subject: Re: [Bioperl-l] BioPerl Google SOC project
>
>
>> Hi Mark,
>>
>> Thanks a lot for the response. I tried to access the SVN but was unable to do so. My SVN client just times out :-(
>> I even tried SVN links from the BioPerl Wiki (http://www.bioperl.org/wiki/Using_Subversion)
>> But they too are non-responsive.
>>
>> Thanks,
>> Alok
>>
>> Mark A. Jensen wrote:
>>> Hi Alok-- Thanks for your interest! You should certainly consider applying. I can work with
>>> you on developing your application.
I'm including the bioperl mailing list on this >>> post; we'll continue to have this conversation on the list so that the helpful, friendly, >>> knowledgeable, compassionate membership can participate. >>> WrapperMaker code is currently available in >>> svn://code.open-bio.org/bioperl/bioperl-dev/trunk/lib/Bio/Tools/WrapperMaker >>> Probably you want to have a look at Bio::Tools::Run::Samtools in bioperl-run >>> for an example of how Bio::Tools::Run::WrapperBase and CommandExts are >>> used (er, by me...). >>> cheers >>> MAJ >>> ----- Original Message ----- From: "Alok" >>> To: >>> Sent: Thursday, March 25, 2010 3:23 PM >>> Subject: BioPerl Google SOC project >>> >>> >>>> Hello Mark, >>>> >>>> My name is Alok Watve and I am currently pursuing PhD in Computer >>>> Science at Michigan State University. I was going through the BioPerl >>>> Wiki for Google SOC projects. I have good experience with Perl and was >>>> wondering if I could work on the project "Perl Run Wrappers". >>>> >>>> Prior to joining MSU, I was working with D E Shaw India Software Pvt. >>>> Ltd. My work was involved in writing Java programs and their perl >>>> wrappers. We used perl scripts to fire java programs with all the >>>> correct parameters. So I think I have some idea about what wrappers are. >>>> However, I have not used BioPerl and may take some time to get familiar >>>> with the structure. I am fairly confident that I will be able to do this. >>>> >>>> During my work here at MSU. I use perl a lot for doing basic text >>>> analysis for my projects. Although I rarely use OO features of perl, I >>>> have used them in past and never had any problems with it. I also >>>> believe in writing well-documented and user/developer friendly code >>>> (With comments, command line options for help/documentation). I have >>>> attached a simple script I wrote for my project as an example. I have >>>> also attached my resume for your consideration. 
>>>> >>>> Please let me know if you think that I am an appropriate candidate and >>>> whether I should go ahead with submitting an application with BioPerl as >>>> my Mentor Organization. >>>> >>>> Thanks a lot, >>>> Alok >>>> www.cse.msu.edu/~watvealo/ >>>> >>> >>> >>> -------------------------------------------------------------------------------- >>> >>> >>> >>>> #!/usr/bin/perl >>>> >>>> =pod >>>> >>>> =head1 SYNOPSIS >>>> >>>> Script to edit existing box query files to enable random box query. >>>> This scripts inserts box size on each line corresponding to discrete >>>> dimension in the existing box query file. The maximum value of "box size" >>>> depends on the alphabet size. >>>> >>>> Example >>>> ./modify_bqfile.pl -alpha 8 -infile bqfile -outfile mod_bqfile >>>> >>>> Use -perldoc for detailed help on options. >>>> >>>> =head1 OPTIONS >>>> >>>> =over >>>> >>>> =item -infile >>>> >>>> Specifies the name of the input box query file. >>>> >>>> =item -outfile >>>> >>>> Specifies the name of the output file. >>>> >>>> =item -uniform_box >>>> >>>> Specifies size of the uniform box query. >>>> >>>> =item -max_size >>>> >>>> Specifies the maximum box size for random sized box query. >>>> >>>> =item -help >>>> >>>> Displays a brief help message and exits. >>>> >>>> =item -perldoc >>>> >>>> Displays a detailed help. >>>> >>>> =back >>>> >>>> =cut >>>> >>>> use strict; >>>> use warnings 'all'; >>>> >>>> use Getopt::Long; >>>> use Pod::Usage; >>>> >>>> GetOptions('infile=s' => \my $infile, 'outfile=s' => \my $outfile, 'max_size=i' => \my $maxSize, 'uniform_box=s' => \my $uniformBox, >>>> 'help' => \my $help, 'perldoc' => \my $perldoc); >>>> >>>> if(defined($perldoc)) >>>> { >>>> pod2usage(-verbose => 2); >>>> } >>>> >>>> if(defined($help)) >>>> { >>>> pod2usage(-verbose=> 0); >>>> } >>>> >>>> if(! (defined($infile) && defined ($outfile) )) >>>> { >>>> die('Please specify input, output files. 
Use -perldoc
>>>> for more help');
>>>> }
>>>>
>>>> # Some basic error checking to ensure script runs ....
>>>> if(!(defined($uniformBox) ||defined($maxSize)))
>>>> {
>>>>     die('Specify either box size for uniform box queries or maximum box size for random box queries');
>>>> }
>>>>
>>>> # Initialize random number generator.
>>>> srand();
>>>>
>>>> # Read Input file and find out lines we are interested in
>>>> # Then prefix the line with correct box size as defined by
>>>> # user choice
>>>> open(IN, "<$infile");
>>>> open(OUT, ">$outfile");
>>>> my $count = 0;
>>>> while(my $line = <IN>)
>>>> {
>>>>     if( ($count%64) < 32 )
>>>>     {
>>>>         if(defined($uniformBox))
>>>>         {
>>>>             $line = sprintf("%d ",$uniformBox) . $line;
>>>>         }
>>>>         elsif(defined($maxSize))
>>>>         {
>>>>             # This line corresponds to the discrete dimension.
>>>>             $line = sprintf("%d ", int(rand($maxSize))+1 ) . $line;
>>>>         }
>>>>     }
>>>>     $count ++;
>>>>     print OUT $line
>>>> }
>>>>
>>>> close(OUT);
>>>> close(IN);
>>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
From maj at fortinbras.us Sat Mar 27 03:15:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Mar 2010 23:15:30 -0400
Subject: [Bioperl-l] Error during installation of 1.6.1
In-Reply-To:
References:
Message-ID:

Is it really "q install q"? Then you probably need to do some cpan
configuring. It's possible your original CPAN/Config.pm file is lost or not
where cpan expects it to be after your upgrade. Try this

$ cpan
cpan> o conf make /usr/bin/make
cpan> o conf make_install_make_command /usr/bin/make
cpan> o conf commit

and rerun the install.
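
[Editor's sketch, not part of the original message.] The two settings named above both have to point at a real, executable program; a value such as "q install q" does not. The check can be illustrated in plain Perl roughly as follows. This is only a sketch under stated assumptions: the helper name suspicious_commands and the sample hash are made up for illustration, and the code deliberately does not load or modify the real CPAN configuration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of CPAN.pm): given a hash of CPAN-style
# config values and a list of keys, return the keys whose value does not
# start with the path of an existing, executable file.
sub suspicious_commands {
    my ($conf, @keys) = @_;
    my @bad;
    for my $key (@keys) {
        my $value = $conf->{$key};
        # The first whitespace-separated word is taken as the program path;
        # an undefined or empty value yields no program at all.
        my ($program) = defined($value) ? split(' ', $value) : ();
        push @bad, $key
            unless defined($program) && -f $program && -x $program;
    }
    return @bad;
}

# Example data only. $^X (the running perl) stands in for a command that
# exists; "q install q" is the odd value quoted in the error message.
my %conf = (
    make                      => $^X,
    make_install_make_command => 'q install q',
);

my @bad = suspicious_commands(\%conf, qw(make make_install_make_command));
print "Needs attention: @bad\n";
```

Any key reported this way would be a candidate for the "o conf ... /usr/bin/make" fix shown above.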
If you get other strangeness, I would check the values of all the config variables by listing with cpan> o conf BTW, by the message I infer you've got v1.93 of CPAN; maybe upgrading to the current version (v1.9402) would solve some problems. cheers MAJ ----- Original Message ----- From: "Wollenberg, Kurt (NIH/NIAID) [C]" To: Sent: Friday, March 26, 2010 4:47 PM Subject: [Bioperl-l] Error during installation of 1.6.1 > Hello: > > I am trying to install BioPerl (after a recent system upgrade) and am > getting the following error: > > "Catching error: "Can't execute q install q: No such file or directory at > /Library/Perl/Updates/5.8.8/CPAN/Shell.pm line 1755\cJ" at > /Library/Perl/Updates/5.8.8/CPAN.pm line 391". > > Previous to this I've run the CPAN upgrade, etc. as recommended on the > Installation for Unix page. This happens when I try to do the actual > install, both vanilla and "force"ed. I'm attempting this on a Mac G5 > workstation running 10.5.8. Any clues what I may be missing or doing > incorrectly? > > Cheers, > Kurt Wollenberg, Ph.D. > Contractor - Lockheed Martin > Phylogenetics Specialist > Computational Biology Section > Bioinformatics and Computational Biosciences Branch (BCBB) > OCICB/OSMO/OD/NIAID/NIH > > 31 Center Drive, Room 3B62 > Bethesda, MD 20892-0485 > Office 301-402-8628 > http://bioinformatics.niaid.nih.gov (Within NIH) > http://exon.niaid.nih.gov (Public) > > Disclaimer: > The information in this e-mail and any of its attachments is confidential > and may contain sensitive information. It should not be used by anyone who > is not the original intended recipient. If you have received this e-mail in > error please inform the sender and delete it from your mailbox or any other > storage devices. 
National Institute of Allergy and Infectious Diseases shall
> not accept liability for any statements made that are sender's own and not
> expressly made on behalf of the NIAID by one of its representatives
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
From biopython at maubp.freeserve.co.uk Sat Mar 27 12:42:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 27 Mar 2010 12:42:12 +0000
Subject: [Bioperl-l] SeqIO issue? EUtilities Cookbook
In-Reply-To: <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu>
Message-ID: <320fb6e01003270542i1f3cd4d2x61c97bc7ccf1b917@mail.gmail.com>

On Fri, Mar 26, 2010 at 4:16 PM, Chris Fields wrote:
> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the
> latter is if you always want a full nucleotide sequence instead
> of possibly getting contig files). 'genbank' used to be an alias
> for 'gb', but apparently no longer, and appears to be something
> that was changed on NCBI's end.

Yeah, the NCBI changed that almost a year ago (Easter 2009). It broke
one of the Biopython unit tests, and I asked the NCBI about this and
whether they could restore the alias "genbank". They declined, so in
Biopython's efetch wrapper we spot anyone asking for rettype=genbank,
issue a warning, and convert it to rettype=gb or rettype=gp (for the
protein database) instead.

The relevant Biopython code is here if anyone is interested:
http://biopython.org/SRC/biopython/Bio/Entrez/__init__.py

Peter
From pmiguel at purdue.edu Sat Mar 27 13:51:14 2010
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sat, 27 Mar 2010 09:51:14 -0400
Subject: [Bioperl-l] SeqIO issue?
EUtilities Cookbook In-Reply-To: <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> References: <4BACD831.20506@purdue.edu> <76509B1C-0856-4052-8C9A-ACBD2FBAF356@illinois.edu> <4BACEEA9.2060407@purdue.edu> <1269628126.24729.57.camel@pyrimidine.igb.uiuc.edu> Message-ID: <4BAE0D52.60908@purdue.edu> Hi Chris, I also see there is a bunch of NCBI toolkit code that deals with asn.1 conversion. They even have some precompiled code: http://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html Thanks for your help, Phillip Chris Fields wrote: > That format is ASN.1. and there isn't a BioPerl parser for GenBank ASN.1 > format (it tends to be too cumbersome). > > However, there is a pure-perl-based one for the EntrezGene ASN.1 format > (Bio::ASN1::EntrezGene). > > chris > > > On Fri, 2010-03-26 at 13:28 -0400, Phillip San Miguel wrote: > >> Ah, yes. That does the trick. Actually I have already downloaded a few >> thousand records in whatever that format that is returned when 'genbank' >> is specified instead of 'gb'. (See below, it begins with 'Seq-entry ::= >> seq {') Any idea what format that is and how to convert it to something >> SeqIO can use? >> >> If not, I can just pull them all down again by sending about 200 gi's >> per request. That should not offend the genbank gods... >> >> Thanks for your help, >> Phillip >> >> Chris Fields wrote: >> >>> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (the latter is if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently no longer, and appears to be something that was changed on NCBI's end. >>> >>> Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both. >>> >>> chris >>> >>> On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote: >>> >>> >>> >>>> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work. 
>>>> >>>> I am trying to use the code provided at: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO >>>> >>>> and modified to request gi228534658 >>>> >>>> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything. >>>> >>>> Nothing is printed with I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1: >>>> >>>> #!/usr/bin/perl >>>> use strict; >>>> use warnings; >>>> >>>> use Bio::SeqIO; >>>> use Bio::DB::EUtilities; >>>> >>>> my @ids; >>>> push @ids, '228534658'; >>>> my $factory = Bio::DB::EUtilities->new( >>>> -eutil => 'efetch', >>>> -db => 'nucleotide', >>>> -rettype => 'genbank', >>>> -id => \@ids); >>>> >>>> my $file = 'myseqs.gb'; >>>> >>>> # dump HTTP::Response content to a file (not retained in memory) >>>> $factory->get_Response(-file => $file); >>>> >>>> my $seqin = Bio::SeqIO->new(-file => $file, >>>> -format => 'genbank'); >>>> >>>> while (my $seq = $seqin->next_seq) { >>>> print "I see a sequence\n"; >>>> print $seq->species(); >>>> } >>>> >>>> >>>> "myseqs.gb" does have content: >>>> >>>> Seq-entry ::= seq { >>>> id { >>>> general { >>>> db "gpid:36555" , >>>> tag >>>> str "contig49313" } , >>>> genbank { >>>> accession "EZ113652" , >>>> version 1 } , >>>> gi 228534658 } , >>>> descr { >>>> title "TSA: Zea mays contig49313, mRNA sequence." 
, >>>> source { >>>> genome genomic , >>>> org { >>>> taxname "Zea mays" , >>>> db { >>>> { >>>> db "taxon" , >>>> tag >>>> id 4577 } } , >>>> orgname { >>>> name >>>> binomial { >>>> genus "Zea" , >>>> species "mays" } , >>>> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; >>>> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; >>>> PACCAD clade; Panicoideae; Andropogoneae; Zea" , >>>> gcode 1 , >>>> mgcode 1 , >>>> div "PLN" } } } , >>>> molinfo { >>>> biomol mRNA , >>>> tech tsa } , >>>> pub { >>>> pub { >>>> article { >>>> title { >>>> name "Deep sampling of the Palomero maize transcriptome by a high >>>> throughput strategy of pyrosequencing." } , >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> initials "J.P." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "L." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> initials "A." 
} } } } , >>>> from >>>> journal { >>>> title { >>>> iso-jta "BMC Genomics" , >>>> ml-jta "BMC Genomics" , >>>> issn "1471-2164" , >>>> name "BMC genomics" } , >>>> imp { >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } , >>>> volume "10" , >>>> issue "1" , >>>> pages "299" , >>>> language "ENG" , >>>> pubstatus aheadofprint , >>>> history { >>>> { >>>> pubstatus received , >>>> date >>>> std { >>>> year 2008 , >>>> month 12 , >>>> day 2 } } , >>>> { >>>> pubstatus accepted , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus aheadofprint , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 6 } } , >>>> { >>>> pubstatus other , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus pubmed , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } , >>>> { >>>> pubstatus medline , >>>> date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 8 , >>>> hour 9 , >>>> minute 0 } } } } } , >>>> ids { >>>> pii "1471-2164-10-299" , >>>> doi "10.1186/1471-2164-10-299" , >>>> pubmed 19580677 } } , >>>> pmid 19580677 } } , >>>> pub { >>>> pub { >>>> sub { >>>> authors { >>>> names >>>> std { >>>> { >>>> name >>>> name { >>>> last "Vega-Arreguin" , >>>> first "Julio" , >>>> initials "J.C." } } , >>>> { >>>> name >>>> name { >>>> last "Ibarra-Laclette" , >>>> first "Enrique" , >>>> initials "E." } } , >>>> { >>>> name >>>> name { >>>> last "Jimenez-Moraila" , >>>> first "Beatriz" , >>>> initials "B." } } , >>>> { >>>> name >>>> name { >>>> last "Martinez" , >>>> first "Octavio" , >>>> initials "O." } } , >>>> { >>>> name >>>> name { >>>> last "Vielle-Calzada" , >>>> first "Jean" , >>>> initials "J.Philippe." } } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Luis" , >>>> initials "L." 
} } , >>>> { >>>> name >>>> name { >>>> last "Herrera-Estrella" , >>>> first "Alfredo" , >>>> initials "A." } } } , >>>> affil >>>> std { >>>> affil "Laboratorio Nacional de Genomica para la Biodiversidad" , >>>> div "Cinvestav Campus Guanajuato" , >>>> city "Irapuato" , >>>> sub "Guanajuato" , >>>> country "Mexico" , >>>> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" , >>>> postal-code "36821" } } , >>>> medium other , >>>> date >>>> std { >>>> year 2009 , >>>> month 3 , >>>> day 23 } } } } , >>>> user { >>>> type >>>> str "GenomeProjectsDB" , >>>> data { >>>> { >>>> label >>>> str "ProjectID" , >>>> data >>>> int 36555 } , >>>> { >>>> label >>>> str "ParentID" , >>>> data >>>> int 0 } } } , >>>> create-date >>>> std { >>>> year 2009 , >>>> month 5 , >>>> day 5 } , >>>> update-date >>>> std { >>>> year 2009 , >>>> month 7 , >>>> day 14 } } , >>>> inst { >>>> repr raw , >>>> mol rna , >>>> length 450 , >>>> seq-data >>>> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02 >>>> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A >>>> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63 >>>> 760CFF0'H } } >>>> >>>> >>>> Maybe I am using the wrong format? This looks more like ASN than genbank format to me. 
>>>> >>>> Phillip >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From awitney at sgul.ac.uk Mon Mar 29 17:26:40 2010 From: awitney at sgul.ac.uk (Adam Witney) Date: Mon, 29 Mar 2010 18:26:40 +0100 Subject: [Bioperl-l] Running Smith Waterman alignments in BioPerl In-Reply-To: <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> References: <97B95E8A-9E93-471F-B7FB-31D5D226D104@sgul.ac.uk> <5CAC472B-FD3A-4905-9B63-1D05DBAFCA36@illinois.edu> Message-ID: <6DD3E9BB-27AD-4241-94F9-476AE6525A7D@sgul.ac.uk> thanks Chris for the explanation. It looks like Exonerate may also do something similar thanks adam On 26 Mar 2010, at 15:51, Chris Fields wrote: > It's not actively developed as far as I know. I've been thinking that we could break it out of bioperl-ext and release it on it's own, with the intent that someone could take it up at some point. We have started down that road with the HMM tools in bioperl-ext, though that one is still maintained by it's author. > > I know many users just use calls to outside programs, such EMBOSS (which has water and needle) or others. From the maintenance standpoint they're easier to update if something changes, XS can be a bugbear. > > chris > > On Mar 26, 2010, at 10:20 AM, Adam Witney wrote: > >> Is the bioperl-ext package still being developed? 
I ask because I am looking at running some SW alignments using the pSW module, but the simple example in the pod gives the error
>>
>> "The C-compiled engine for Smith Waterman alignments (Bio::Ext::Align) has not been installed.
>> Please read the install the bioperl-ext package"
>>
>> even though I did compile and install the Bio::Ext::Align package
>>
>> If not using the pSW module, what do other people use for this?
>>
>> thanks
>>
>> adam
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
From nicolas.turenne at jouy.inra.fr Mon Mar 29 18:09:53 2010
From: nicolas.turenne at jouy.inra.fr (Nicolas Turenne)
Date: Mon, 29 Mar 2010 20:09:53 +0200
Subject: [Bioperl-l] about biblio
Message-ID: <4BB0ECF1.6050308@jouy.inra.fr>

Hello,
I am using the biblio module from BioPerl to download PubMed abstracts.
If I run the query "actb" on the PubMed site (http://www.ncbi.nlm.nih.gov/sites/entrez)
I get 165 hits.

But using BioPerl, if I do

use Bio::Biblio;
my $biblio = Bio::Biblio->new
    (-access => 'soap',
     -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS',
     -destroy_on_exit => '0');
my @ListID = @{ $biblio->find ("actb")->get_all_ids };

I get 228 hits, so I don't understand the difference.

Thanks for your help,
Nicolas
From sj17m89 at gmail.com Mon Mar 29 17:47:38 2010
From: sj17m89 at gmail.com (Shweta Jha)
Date: Mon, 29 Mar 2010 10:47:38 -0700
Subject: [Bioperl-l] Regarding Google Summer of Code
Message-ID: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com>

Dear Sir / Madam,

I, Shweta Jha, am a third-year B.Tech Bioinformatics student.

I am interested in applying for the Google Summer of Code internship program.

I am keen to work on a project using BioPerl.
Could you please let me know how do I apply for the program? Thanks and Regards Shweta Jha From rmb32 at cornell.edu Mon Mar 29 19:26:30 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 29 Mar 2010 12:26:30 -0700 Subject: [Bioperl-l] Regarding Google Summer of Code In-Reply-To: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> References: <7922ad021003291047q36142064nfd91372407bf6f0d@mail.gmail.com> Message-ID: <4BB0FEE6.3080209@cornell.edu> Hi Shweta, See http://open-bio.org/wiki/Google_Summer_of_Code, and the GSoC FAQ at http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs for details on the application process. Rob Shweta Jha wrote: > Dear Sir / Madam , > > I , Shweta Jha , am a Third year B.Tech Bioinformatics student. > > I am interested to apply for the Google Summer of Code internship program. > > I am keen to work on project using Bioperl. > > Could you please let me know how do I apply for the program? > > > > Thanks and Regards > Shweta Jha > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From martin.senger at gmail.com Mon Mar 29 21:02:02 2010 From: martin.senger at gmail.com (Martin Senger) Date: Mon, 29 Mar 2010 22:02:02 +0100 Subject: [Bioperl-l] about biblio In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr> References: <4BB0ECF1.6050308@jouy.inra.fr> Message-ID: <4d93f07c1003291402j5ab58216o3985157513d1820a@mail.gmail.com> Hi, I am actually not sure what is the correct answer - because I am not anymore maintaining the biblio server at EBI (I actually did not know that it was still running :-) - but I am very pleased that it does run). Mahmut, can I ask you a favor? Could you please pass the emailed question below to an appropriate person at EBI? Of course, if the result of this inquiry is that the problem is in the biblio module in bioperl I am quite happy and keen to fix it there. 
Cheers, Martin On Mon, Mar 29, 2010 at 7:09 PM, Nicolas Turenne < nicolas.turenne at jouy.inra.fr> wrote: > Hello, > I am using biblio module from bioperl to download pubmed abstract. > if i do the query "actb" on the pubmed site ( > http://www.ncbi.nlm.nih.gov/sites/entrez) > i get 165 hits > > But using bioperl, if i do > > use Bio::Biblio; > my $biblio = Bio::Biblio->new > (-access => 'soap', > -location => 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS', > -destroy_on_exit => '0'); > my @ListID = @{ $biblio->find ("actb")->get_all_ids }; > > i get 228 hits, so i dont understand the difference > > thank for help > Nicolas > -- Martin Senger email: martin.senger at gmail.com,martin.senger at kaust.edu.sa skype: martinsenger From click.xu at gmail.com Tue Mar 30 03:17:17 2010 From: click.xu at gmail.com (click xu) Date: Tue, 30 Mar 2010 11:17:17 +0800 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw Message-ID: Hi, I meet a problem when using Clustalw module. Here is the error message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: ClustalW call ( align? -infile=/tmp/AeyAfdxGvH/YpcPbyhYht -output=gcg?? 
-matrix=BLOSUM -ktup le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | cannot find the file or path STACK: Error::throw STACK: Bio::Root::Root::throw /home/lf/data/BioPerl-1.6.1/Bio/Root/Root.pm:368 STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig nment/Clustalw.pm:756 STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali gnment/Clustalw.pm:515 STACK: test.txt:45 ----------------------------------------------------------- The test program is described as below: ----------------------------------------------------------- @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); # @seq_array is an array of Bio::Seq objects $aln = $factory->align(\@seq_array); ----------------------------------------------------------- The work path of clustalw2 has been configured: export CLUSTALDIR=/usr/local/bin/clustalw2 So, what may be reason of the error? Thanks! From Russell.Smithies at agresearch.co.nz Tue Mar 30 03:25:03 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 30 Mar 2010 16:25:03 +1300 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> Do you have enough temp space? Will clustalw run 'manually' with your parameters from the command line? --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of click xu > Sent: Tuesday, 30 March 2010 4:17 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw > > Hi, > I meet a problem when using Clustalw module. > Here is the error message: > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: ClustalW call ( align? 
-infile=/tmp/AeyAfdxGvH/YpcPbyhYht > -output=gcg?? -matrix=BLOSUM -ktup > le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | > cannot find the file or path > STACK: Error::throw > STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- > 1.6.1/Bio/Root/Root.pm:368 > STACK: Bio::Tools::Run::Alignment::Clustalw::_run > /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig > nment/Clustalw.pm:756 > STACK: Bio::Tools::Run::Alignment::Clustalw::align > /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali > gnment/Clustalw.pm:515 > STACK: test.txt:45 > ----------------------------------------------------------- > The test program is described as below: > ----------------------------------------------------------- > @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); > $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); > # @seq_array is an array of Bio::Seq objects > $aln = $factory->align(\@seq_array); > ----------------------------------------------------------- > The work path of clustalw2 has been configured: > export CLUSTALDIR=/usr/local/bin/clustalw2 > So, what may be reason of the error? > Thanks! > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From click.xu at gmail.com Tue Mar 30 04:03:49 2010 From: click.xu at gmail.com (click xu) Date: Tue, 30 Mar 2010 12:03:49 +0800 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> Message-ID: Russell Clustalw2 can correctly run in command line, and the /tmp space is enough too. 2010/3/30 Smithies, Russell : > Do you have enough temp space? > Will clustalw run 'manually' with your parameters from the command line? > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of click xu >> Sent: Tuesday, 30 March 2010 4:17 p.m. >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw >> >> Hi, >> I meet a problem when using Clustalw module. >> Here is the error message: >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: ClustalW call ( align? -infile=/tmp/AeyAfdxGvH/YpcPbyhYht >> -output=gcg?? 
-matrix=BLOSUM -ktup >> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >> cannot find the file or path >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >> 1.6.1/Bio/Root/Root.pm:368 >> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >> nment/Clustalw.pm:756 >> STACK: Bio::Tools::Run::Alignment::Clustalw::align >> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >> gnment/Clustalw.pm:515 >> STACK: test.txt:45 >> ----------------------------------------------------------- >> The test program is described as below: >> ----------------------------------------------------------- >> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >> # @seq_array is an array of Bio::Seq objects >> $aln = $factory->align(\@seq_array); >> ----------------------------------------------------------- >> The work path of clustalw2 has been configured: >> export CLUSTALDIR=/usr/local/bin/clustalw2 >> So, what may be reason of the error? >> Thanks! >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> ======================================================================= > From martin.senger at gmail.com Tue Mar 30 08:18:30 2010 From: martin.senger at gmail.com (Martin Senger) Date: Tue, 30 Mar 2010 09:18:30 +0100 Subject: [Bioperl-l] about biblio In-Reply-To: <4BB0ECF1.6050308@jouy.inra.fr> References: <4BB0ECF1.6050308@jouy.inra.fr> Message-ID: <4d93f07c1003300118q1c7b0551w4aa25a2a97fc35be@mail.gmail.com> Here is the answer sent by Mr Hamish McWilliam from EBI (where the MEDLINE server is running): The difference is OpenBQS adds a wildcard when it builds the SRS query: > > - [medline-AllText:actb*] gives 228 entries > - [medline-AllText:actb] gives 150 entries > > Performing the same query at PubMed (http://www.ncbi.nlm.nih.gov/pubmed/) > gives similar answers: > > - "actb*" gives 255 entries > - "actb" gives 165 entries > > The remaining differences are probably due to slight differences in the > PubMed data at NCBI and the exported MEDLINE data. > Cheers, Martin -- Martin Senger email: martin.senger at gmail.com,martin.senger at kaust.edu.sa skype: martinsenger From cjfields at illinois.edu Tue Mar 30 12:42:24 2010 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 30 Mar 2010 07:42:24 -0500 Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32C6EAE66CD@exchsth.agresearch.co.nz> Message-ID: <863E31F9-072B-4681-94C5-D2C8BEA82021@illinois.edu> You may need to submit this as a bug. I got clustalw2 working fairly recently, but it's possible some other API change is breaking things. chris On Mar 29, 2010, at 11:03 PM, click xu wrote: > Russell > Clustalw2 can correctly run in command line, and the /tmp space is enough too. > > > 2010/3/30 Smithies, Russell : >> Do you have enough temp space? >> Will clustalw run 'manually' with your parameters from the command line? 
>> >> --Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of click xu >>> Sent: Tuesday, 30 March 2010 4:17 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Subject: [Bioperl-l] Trouble about Bio::Tools::Run::Alignment::Clustalw >>> >>> Hi, >>> I meet a problem when using Clustalw module. >>> Here is the error message: >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: ClustalW call ( align -infile=/tmp/AeyAfdxGvH/YpcPbyhYht >>> -output=gcg -matrix=BLOSUM -ktup >>> le=2 -outfile=/tmp/AeyAfdxGvH/Z2MbO0ylbF 2>&1) failed to start: 0 | >>> cannot find the file or path >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /home/lf/data/BioPerl- >>> 1.6.1/Bio/Root/Root.pm:368 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Alig >>> nment/Clustalw.pm:756 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::align >>> /usr/local/share/perl/5.10.0/Bio/Tools/Run/Ali >>> gnment/Clustalw.pm:515 >>> STACK: test.txt:45 >>> ----------------------------------------------------------- >>> The test program is described as below: >>> ----------------------------------------------------------- >>> @params = ('ktuple' => 2, 'matrix' => 'BLOSUM'); >>> $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params); >>> # @seq_array is an array of Bio::Seq objects >>> $aln = $factory->align(\@seq_array); >>> ----------------------------------------------------------- >>> The work path of clustalw2 has been configured: >>> export CLUSTALDIR=/usr/local/bin/clustalw2 >>> So, what may be reason of the error? >>> Thanks! 
>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> ======================================================================= >> Attention: The information contained in this message and/or attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or privileged >> material. Any review, retransmission, dissemination or other use of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by AgResearch >> Limited. If you have received this message in error, please notify the >> sender immediately. >> ======================================================================= >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Tue Mar 30 20:10:09 2010 From: bernd.web at gmail.com (Bernd Web) Date: Tue, 30 Mar 2010 22:10:09 +0200 Subject: [Bioperl-l] AlignIO formats Message-ID: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com> Hi, Using GuessSeqFormat and AlignIO, I stumbled on some issues and am now wondering if the defined formats are actually OK. Esp. related to pfam, selex, stockholm formats it seems: pfam here is like selex without any comment lines, but with the /start-end after the seq id like myseq/1-111. The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam) actually defines Pfam and Stockholm to be the same formats. This makes me wonder: is the Pfam format actually defined as Selex or Stockholm? Within BioPerl it is like Selex. In addition, Selex (as used in HMMER 2.3.2) contains comment lines like #=AC, #=RF or #=ID. GuessSeq format uses this to detect Selex, however, they do not have to be present. 
GuessSeqFormat uses:

    return (($lineno == 1 && $line =~ /^#=ID /) ||
            ($lineno == 2 && $line =~ /^#=AC /) ||
            ($line =~ /^#=SQ /));

to detect the Selex format. At the same time, the Selex reader does not seem to get the aln id or accession:

    if( $entry =~ /^\#=GS\s+(\S+)\s+AC\s+(\S+)/ ) {
        $accession{ $1 } = $2;

Also a Selex file like:

seq1 ACGACGACGACG.
seq2 ..GGGAAAGG.GA
seq3 UUU..AAAUUU.A

is guessed to be phylip (whereas the seq1/1-11 format will be guessed as pfam).

I am not sure if the above is desired behaviour, though all sequences are read in the alignment object correctly.
I was wondering whether all Selex variations could be guessed as Selex, not as phylip, pfam or selex (though in the selex case we can have more alignments in one file).

Regards,
Bernd

From p.j.a.cock at googlemail.com Tue Mar 30 21:12:46 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 30 Mar 2010 22:12:46 +0100
Subject: [Bioperl-l] AlignIO formats
In-Reply-To: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
References: <716af09c1003301310n70367415x51c0538f73c6b162@mail.gmail.com>
Message-ID: <320fb6e01003301412s6c90220el7a95bdc97dee03e6@mail.gmail.com>

On Tue, Mar 30, 2010 at 9:10 PM, Bernd Web wrote:
> Hi,
>
> Using GuessSeqFormat and AlignIO, I stumbled on some issues and
> am now wondering if the defined formats are actually OK. Esp. related to
> pfam, selex, stockholm formats it seems:
>
> pfam here is like selex without any comment lines, but with the
> /start-end after the seq id like myseq/1-111.
> The EBI site (http://www.ebi.ac.uk/2can/tutorials/formats.html#pfam)
> actually defines Pfam and Stockholm to be the same formats. This makes
> me wonder: is the Pfam format actually defined as Selex or Stockholm?
> Within BioPerl it is like Selex.

I (and therefore the Biopython documentation) also think PFAM and Stockholm alignments are basically the same thing. The BioPerl wiki seems to agree with this interpretation too.
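For a given file, the Selex-or-Stockholm question can be probed by looking for the Stockholm header line. What follows is only a minimal sketch in plain Perl: the helper name looks_like_stockholm and the two sample strings are made up for illustration and are not part of BioPerl or GuessSeqFormat, and it assumes the header is the first non-blank line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns 1 if the first
# non-blank line of $text is a Stockholm header such as "# STOCKHOLM 1.0",
# and 0 otherwise (including for empty input).
sub looks_like_stockholm {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;    # skip leading blank lines
        return $line =~ /^# STOCKHOLM \d+\.\d+/ ? 1 : 0;
    }
    return 0;
}

# Made-up sample inputs for illustration only.
my $stockholm = "# STOCKHOLM 1.0\nseq1/1-12 ACGACGACGACG\n//\n";
my $selex     = "seq1 ACGACGACGACG.\nseq2 ..GGGAAAGG.GA\n";

for my $sample ($stockholm, $selex) {
    print looks_like_stockholm($sample)
        ? "stockholm header\n"
        : "no stockholm header\n";
}
```

A check like this only separates Stockholm from everything else; a header-less Selex file would still need other heuristics to be told apart from phylip.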
Looking at the HMMER2 examples, Selex is different but the comment style is similar. The obvious thing to check is the presence or absence of the "# STOCKHOLM 1.0" header if trying to tell them apart.

See also:
http://en.wikipedia.org/wiki/Stockholm_format
and
http://www.bioperl.org/wiki/Stockholm_multiple_alignment_format
http://www.bioperl.org/wiki/SELEX_multiple_alignment_format

Peter

From jun.yin at ucd.ie Tue Mar 30 22:37:07 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Tue, 30 Mar 2010 23:37:07 +0100
Subject: [Bioperl-l] summer code project on Bioperl
Message-ID: <7160acc75f99.4bb28b23@ucd.ie>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CV_JunYin.doc
Type: application/msword
Size: 27648 bytes
Desc: not available
URL:

From ross at cuhk.edu.hk Wed Mar 31 21:28:59 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Thu, 1 Apr 2010 05:28:59 +0800
Subject: [Bioperl-l] BlastPlus usage inquiry
In-Reply-To:
References:
Message-ID: <014401cad119$2d1467a0$873d36e0$@edu.hk>

Dear all,

I know it is inappropriate to raise this question on the bioperl list, but I received no better response from NCBI and so have to ask in this group (because in the end I'll use bioperl to call blastplus).

I am already using the latest blastplus (the command is blastn directly) and found that it runs slowly and does not run in a parallel/multithreaded manner. Previously I was using the non-blastplus version 2.2.22 with the command blastall -p blastn -a 8 etc. With arguments similar to those below, except that the word size was 12, my shell script for the same input and database finished almost instantly. Apart from the word size and min raw gapped score, which I changed, nothing appears to differ from the parameters used with the previous version. Moreover, when I check my process with top, I find it uses only one CPU instead of 7. What may be the problem with the script below, such that the job has been running for a day and still hasn't finished?
blastn -query $1 -db $2 -out $1_$2.xml -num_threads 7 -word_size 4 -gapopen 3 -gapextend 1 -penalty -2 -outfmt 5 -xdrop_ungap 30 -xdrop_gap 30 -xdrop_gap_final 30 -min_raw_gapped_score 10

From anil_m_lal at yahoo.com Tue Mar 30 18:24:34 2010
From: anil_m_lal at yahoo.com (Anil Lal)
Date: Tue, 30 Mar 2010 11:24:34 -0700 (PDT)
Subject: [Bioperl-l] GSoC 2010
Message-ID: <717794.59615.qm@web37507.mail.mud.yahoo.com>

Hello,

I am a mid-career software programmer now transitioning into bioinformatics. I have always had a great interest in bioinformatics and only now am able to make a move to take classes. I am currently enrolled in University of santa cruz extension classes.

I am very interested in GSoC 2010 and have identified potentially these two projects: Lightweight Sequence objects and Lazy Parsing, mentored by Chris Fields, and Perl Run Wrappers for External Programs in a Flash, mentored by Mark Jenson.

Please let me know if these projects are still available. If yes, I will send in my application with more details.

Thanks a lot for your help. I would be excited to work on BioPerl and contribute.

Anil

From schae234 at gmail.com Tue Mar 30 16:33:42 2010
From: schae234 at gmail.com (Robert Schaefer)
Date: Tue, 30 Mar 2010 10:33:42 -0600
Subject: [Bioperl-l] Google Summer of Code
Message-ID: <60c593881003300933p46c7c295k69a21ee986ef5777@mail.gmail.com>

Hello,

I am looking for more information on your mentorship program for Google's SoC. Whom should I contact for more information and to ask questions?

Thank you,
Rob Schaefer