From shalabh.sharma7 at gmail.com Wed Sep 1 16:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID:

Hi,
I am trying to parse an hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm

I am not getting anything but this "amoA_10genes_align.fasta.2 [M=247] for
HMM" as the output; I am not even getting any error.
I am attaching the hmmsearch report (just a test report) which I tried to
test against the parser.

I would really appreciate it if anyone can help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  amoA_10genes.hmm
# target sequence database:        test.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       amoA_10genes_align.fasta.2  [M=247]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte


Domain annotation for each sequence (and alignments):
>> gi|63021979|gb|AAY26564.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [.   0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021979|gb|AAY26564.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 Yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021979|gb|AAY26564.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP

>> gi|63021981|gb|AAY26565.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [.   0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021981|gb|AAY26565.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 Yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021981|gb|AAY26565.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (247 nodes)
Target sequences:                          2  (300 residues)
Passed MSV filter:                         2  (1); expected 0.0 (0.02)
Passed bias filter:                        2  (1); expected 0.0 (0.02)
Passed Vit filter:                         2  (1); expected 0.0 (0.001)
Passed Fwd filter:                         2  (1); expected 0.0 (1e-05)
Initial search space (Z):                  2  [actual number of targets]
Domain search space  (domZ):               2  [number of targets reported over threshold]
# CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08
# Mc/sec: 0.93
//

From thomas.sharpton at gmail.com Wed Sep 1 17:29:26 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 1 Sep 2010 14:29:26 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To:
References:
Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>

Hi Shalabh,

We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
use the HMMER3 version, as found here:

http://github.com/bioperl/bioperl-hmmer3

Hope this helps,
T

On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote:

> Hi,
> I am trying to parse hmmsearch report (from HMMER3).
I am using > the > script mentioned here: > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm > > I am not getting anything but this "amoA_10genes_align.fasta.2 > [M=247] for > HMM" as the output, i am not even getting any error. > I am attaching the hmmsearch report (just a test report) which i > tried to > test against the parser. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Thu Sep 2 04:44:58 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 2 Sep 2010 10:44:58 +0200 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> On Wed, 1 Sep 2010 14:29:26 -0700 Thomas Sharpton wrote: Hi, > We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to > use the HMMER3 version, as found here: > > http://github.com/bioperl/bioperl-hmmer3 Actually it's now included in the bioperl-live repository, but the code hasn't made it into a release yet. http://github.com/bioperl/bioperl-live.git Cheers, Kai -- Dipl.-Inform. 
Kai Blin                      kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From e.stupka at ucl.ac.uk Thu Sep 2 08:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia

---
"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
~ Stephen Hawkings

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912

From cjfields at illinois.edu Thu Sep 2 10:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID:

Done! Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
>
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
>
> thanks!
> > Elia > > > --- > '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." > ~ Stephen Hawkings > > Senior Lecturer, Bioinformatics > Scientific Director - Bioinformatics, UCL Genomics > > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Institute of Cell and Molecular Science > Barts and The London School of Medicine and Dentistry > 4 Newark Street > Whitechapel > London > E1 2AT > > Office (UCL): +44 207 679 6493 > Fax: +44 0207 6796817 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 787 6478912 > > > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From J.Christopher.Ellis at duke.edu Thu Sep 2 10:53:34 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 2 Sep 2010 10:53:34 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <53096.1283439214@duke.edu> Chris have you had any luck with this? Thanks, Chris On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent: Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>                                        -db             => 'taxonomy',
>                                        -dbfrom         => 'protein',
>                                        -correspondence => 1,
>                                        -id             => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db    => 'taxonomy',
>                                     -id    => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
> Chris
>
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
>
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
>
> chris
>
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>
> > Hi All,
> >
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Something like...
> >
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> >
> > I am not worried about format, just that I get the information and the
> > associated level of hierarchy. The script found at
> > http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like
> > a good starting point, so I copied it and tried to run it but got an error.
> >
> > My first question is "Is there a known fix for this?" and my second
> > question is how do I get the full hierarchical information (as seen
> > above) with the taxonomy db?
> >
> > Thanks for all your help in advance!
> >
> > Chris
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Thu Sep 2 12:21:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 02 Sep 2010 11:21:48 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <53096.1283439214@duke.edu>
References: <53096.1283439214@duke.edu>
Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu>

Chris,

There are a few things wrong with the original script, so I'll fix them.
Basically, it makes the assumption that every ID in the original list is
found. The problem: eutils only reports back data it finds, silently
discarding IDs that don't match. So, using the original ID list when
building the hashes needs a bit more error checking.

Here's the revised script that works for me.

https://gist.github.com/f5db90a432fed68548d4

I'm also adding a check to ensure all IDs are defined prior to adding
them to the param string, just in case.

chris

On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote:
> Chris have you had any luck with this?
> > >
> > > Chris
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From thomas.sharpton at gmail.com Thu Sep 2 12:34:07 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Thu, 2 Sep 2010 09:34:07 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
Message-ID:

So it is! I'm paying attention, I swear I am....

Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem,
do let us know.

Best,
Tom

On Sep 2, 2010, at 1:44 AM, Kai Blin wrote:

> On Wed, 1 Sep 2010 14:29:26 -0700
> Thomas Sharpton wrote:
>
> Hi,
>
>> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
>> use the HMMER3 version, as found here:
>>
>> http://github.com/bioperl/bioperl-hmmer3
>
> Actually it's now included in the bioperl-live repository, but the
> code hasn't made it into a release yet.
>
> http://github.com/bioperl/bioperl-live.git
>
> Cheers,
> Kai
> --
> Dipl.-Inform.
Kai Blin                      kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax   : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From johnny at mit.edu Sat Sep 4 11:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID:

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent
would it make sense to integrate a client-side version of Scansite with
some statistical analysis features (e.g. enrichment tests) in Bioperl? that
would give users the opportunity to customize their own version of the
Scansite algorithm. i developed an object-oriented client-side version and
am currently writing test cases. maybe it could be integrated with the
server wrapper somehow?

please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************

From David.Messina at sbc.su.se Mon Sep 6 08:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To:
References:
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl!

In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?
If you haven't already, you might want to take a look at some of our developer documentation. For example: http://www.bioperl.org/wiki/Bioperl_Best_Practices http://www.bioperl.org/wiki/Advanced_BioPerl Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway. I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway: http://www.bioperl.org/wiki/Proposed_BioPerl_changes http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208 Dave From ross at cuhk.edu.hk Tue Sep 7 04:28:00 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 16:28:00 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> By the following codes, I wanna index the 4G nr database, however, the index file is > 1T and the job has been running for weeks and still hasn't finished. Could anybody tell me how you accomplish the goal? Thanks in advance. 

use strict;
use Bio::DB::Flat::BinarySearch;

(my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

# use single quotes so you don't have to write
# regular expressions like "gi\\|(\\d+)"
#my $primary_pattern = '^>(\S+)';
#if ($fullHeader == 1) {
my $primary_pattern = '^>(.+)';
#}
my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
#$string =~ s/$primary_pattern/RRR/g;
#print "$string\n";

# one or more patterns stored in a hash:
my $secondary_patterns = {GI => 'gi\|(\d+)'};

my $db = Bio::DB::Flat::BinarySearch->new(
    -directory          => $baseDir,
    -dbname             => $dbName,
    -write_flag         => 1,
    -primary_pattern    => $primary_pattern,
    -primary_namespace  => 'ACC',
    -secondary_patterns => $secondary_patterns,
    -verbose            => 1,
    -format             => 'fasta' );

$db->build_index($seqFile);

From David.Messina at sbc.su.se Tue Sep 7 05:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for?

If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl. They're distributed with Blast+ and available here:

ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.

I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:

ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.

Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself).
Some nice examples there showing off the flexibility of this latest version of the software. Dave From ross at cuhk.edu.hk Tue Sep 7 05:18:16 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 17:18:16 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <4C860148.3030000@fmi.ch> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> The reason is that I have to retrieve the specific information of the matched sequences, e.g. extract the 64th amino acid of the top matched sequence. Is there any way to achieve that? -----Original Message----- From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] Sent: Tuesday, September 07, 2010 5:09 PM To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk Subject: Re: [Bioperl-l] Indexing nr database Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? Thanks in > advance. 
> > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 05:09:28 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:09:28 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> Message-ID: <4C860148.3030000@fmi.ch> Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? 
Thanks in > advance. > > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 05:33:46 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:33:46 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> Message-ID: <4C8606FA.3000509@fmi.ch> On 09/07/2010 11:18 AM, Ross KK Leung wrote: > The reason is that I have to retrieve the specific information of the > matched sequences, e.g. extract the 64th amino acid of the top matched > sequence. Is there any way to achieve that? 
"blastdbcmd" has several options like "-range" and even if "blastdbcmd" does not give you the subset of information you want to fetch, I am still convinced you are quicker by fetching the complete entry with"blastdbcmd" and then parse the required data out of just one entry. Hans > -----Original Message----- > From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] > Sent: Tuesday, September 07, 2010 5:09 PM > To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk > Subject: Re: [Bioperl-l] Indexing nr database > > Hi > > > why don't you use the pre-indexed BLAST files from NCBI: > > ftp://ftp.ncbi.nih.gov/blast/db/ > > you can use them to fetch individual sequences by gi number or accession > with the tool "blastdbcmd" from blast+ binaries: > > ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ > > > regards, Hans > > > > On 09/07/2010 10:28 AM, Ross KK Leung wrote: >> By the following codes, I wanna index the 4G nr database, however, the > index >> file is> 1T and the job has been running for weeks and still hasn't >> finished. Could anybody tell me how you accomplish the goal? Thanks in >> advance. 
>> >> use strict; >> >> use Bio::DB::Flat::BinarySearch; >> >> >> >> (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = > @ARGV; >> >> >> >> # use single quotes so you don't have to write >> >> # regular expressions like "gi\\|(\\d+)" >> >> #my $primary_pattern = '^>(\S+)'; >> >> #if ($fullHeader == 1) { >> >> my $primary_pattern = '^>(.+)'; >> >> #} >> >> my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis >> H37Rv complete genome"; >> #$string =~ s/$primary_pattern/RRR/g; >> >> #print "$string\n"; >> >> >> >> # one or more patterns stored in a hash: >> >> my $secondary_patterns = {GI => 'gi\|(\d+)'}; >> >> >> >> my $db = Bio::DB::Flat::BinarySearch->new( >> >> -directory => $baseDir, >> >> -dbname => $dbName, >> >> -write_flag => 1, >> >> -primary_pattern => $primary_pattern, >> >> -primary_namespace => 'ACC', >> >> -secondary_patterns => $secondary_patterns, >> >> -verbose => 1, >> >> -format => 'fasta' ); >> >> >> >> $db->build_index($seqFile); >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From fs5 at sanger.ac.uk Tue Sep 7 08:09:52 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 07 Sep 2010 13:09:52 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> I am working a lot with feature-rich Bio::Seq objects these days and thought that it would be really nice if I could do something like: my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); instead of having to grep for the feature every time. There could then be 'by_tag' and 'by_region' options as well. 
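
For illustration only, here is a minimal sketch of how such options could be dispatched. filter_features and MockFeature are hypothetical names and not part of BioPerl; the mock only mimics the has_tag, get_tag_values, start and end methods that BioPerl features provide, so the sketch runs without BioPerl installed. The option names follow the ones suggested above, and -by_region is assumed to take a [from, to] pair.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence feature; it only mimics the
# has_tag/get_tag_values/start/end methods of a BioPerl feature.
package MockFeature;
sub new            { my ($class, %args) = @_; return bless {%args}, $class }
sub start          { $_[0]{start} }
sub end            { $_[0]{end} }
sub has_tag        { exists $_[0]{tags}{ $_[1] } }
sub get_tag_values { @{ $_[0]{tags}{ $_[1] } || [] } }

package main;

# True if the feature carries $tag with at least one value equal to $value.
sub has_tag_value {
    my ($feat, $tag, $value) = @_;
    return 0 unless $feat->has_tag($tag);
    return scalar grep { $_ eq $value } $feat->get_tag_values($tag);
}

# Hypothetical helper (not an existing BioPerl method): applies each
# requested filter in turn and returns the features that pass all of them.
sub filter_features {
    my ($features, %opt) = @_;
    my @hits = @$features;
    if (defined $opt{-by_id}) {
        my $id = $opt{-by_id};
        @hits = grep { has_tag_value($_, 'id', $id) } @hits;
    }
    if (my $tag = $opt{-by_tag}) {
        @hits = grep { $_->has_tag($tag) } @hits;
    }
    if (my $region = $opt{-by_region}) {
        my ($from, $to) = @$region;
        # keep features that overlap the closed interval [from, to]
        @hits = grep { $_->start() <= $to && $_->end() >= $from } @hits;
    }
    return @hits;
}

my @features = (
    MockFeature->new(start => 10,  end => 500, tags => { id   => ['my_gene'] }),
    MockFeature->new(start => 900, end => 950, tags => { note => ['promoter'] }),
);
my @genes = filter_features(\@features, -by_id => 'my_gene');
printf "%d feature(s) with id my_gene\n", scalar @genes;
```

Because the helper only calls methods that real features also offer, the same function should accept the list returned by $seq->get_SeqFeatures(), though that has not been tested here.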
According to the Bio::Seq docs, something like this seems to be planned
at some stage. I would be willing to contribute to this feature if I can
and if this isn't already being implemented by somebody else.
Does anybody know the state of this feature?

Frank

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From jason at bioperl.org Tue Sep 7 13:36:07 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 07 Sep 2010 10:36:07 -0700
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <4C867807.2040907@bioperl.org>

And the implementation would just be something like this?

  my @features = grep { $_->has_tag('id')
                        && ($_->get_tag_values('id'))[0] eq 'my_gene' }
                 $seq->get_SeqFeatures();

I think any implementation would be better if we moved from the in-memory
arrays & hash-based system to a SQLite db on the back-end for how
Sequence and Feature objects are stored.
This would be somewhat slower but wouldn't have the performance/memory
problems we get for sequences with many annotations.

-jason

Frank Schwach wrote, On 9/7/10 5:09 AM:
> I am working a lot with feature-rich Bio::Seq objects these days and
> thought that it would be really nice if I could do something like:
>
> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');
>
> instead of having to grep for the feature every time.
> There could then be 'by_tag' and 'by_region' options as well.
>
> According to the Bio::Seq docs, something like this seems to be planned
> at some stage.
> I would be willing to contribute to this feature if I can
> and if this isn't already being implemented by somebody else.
> Does anybody know the state of this feature?
>
> Frank
>

From fs5 at sanger.ac.uk Wed Sep 8 04:42:57 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 08 Sep 2010 09:42:57 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <4C867807.2040907@bioperl.org>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
 <4C867807.2040907@bioperl.org>
Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Jason,

Yes, I guess that would be the simplest way of doing it - basically just
doing it the way the docs suggest for getting at a specific feature but
hiding the grep behind a Bio::Seq method with search parameters. But we
could also build a hash of feature tags as the Bio::Seq is built so that
retrieval is more efficient. This could also be used to implement a bin
indexing scheme for range queries, similar to what Bio::DB::GFF does.
Is a move to an SQLite backend planned for the near future?

Frank

On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> And the implementation would just be something like this?
>
> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0]
> eq 'my_gene' } $seq->get_SeqFeatures();
>
> I think any implementation would be if we moved from the in-memory
> arrays & hash-based system to a sqlite db on the back-end for how
> Sequence and Feature objects are stored.
> This would be a somewhat slower but wouldn't have performance/memory
> problems we get for sequences with many annotations.
> > -jason > Frank Schwach wrote, On 9/7/10 5:09 AM: > > I am working a lot with feature-rich Bio::Seq objects these days and > > thought that it would be really nice if I could do something like: > > > > my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > > > > instead of having to grep for the feature every time. > > There could then be 'by_tag' and 'by_region' options as well. > > > > According to the Bio::Seq docs, something like this seems to be planned > > at some stage. I would be willing to contribute to this feature if I can > > and if this isn't already being implemented by somebody else. > > Does anybody know the state of this feature? > > > > Frank > > > > > > > > > > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From stefan.kirov at bms.com Wed Sep 8 11:09:55 2010 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 08 Sep 2010 11:09:55 -0400 Subject: [Bioperl-l] Another interesting Javascript library Message-ID: <4C87A743.5010109@bms.com> Sorry for off topic, but I believe a lot of people can find this quite useful: "CanvasXpress is a javascript library based on the tag implemented in HTML5. I developed this library as the core visualization component for our BMS systems biology platform which I hope to release soon. The basic idea was to have generic and simple way to display genomics data. CanvasXpress supports bar graphs, line graphs, bar-line combination graphs, boxplots, dotplots, area graphs, stacked graphs, percentage-stacked graphs, correlation plots, Venn diagrams, heatmaps, newick trees, 2D-scatter plots, 2D-scatter bubble plots, 3D-scatter plots, pie charts, networks (or pathways), and a genome browser. 
It also supports a few data transformations like log and exponential
transformation, z-score, percentile transformation and ratio. It also
supports grouping of samples, zooming, events ... yada, yada, yada ... and
more importantly I created an Ext panel for it. Take a look.
http://canvasxpress.org/"

Stefan

From alperyilmaz at gmail.com Wed Sep 8 12:47:42 2010
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Wed, 8 Sep 2010 12:47:42 -0400
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
Message-ID:

Hi,

I have a GFF file listing mRNA and CDS coordinates for every
transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
based on that information. I was wondering whether you're aware of an
existing script for that purpose.

I have already loaded the GFF file into a Bio::DB::SeqFeature database, so
I can use either flat-file or database-based scripts.

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954

From cjfields at illinois.edu Wed Sep 8 19:20:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Sep 2010 18:20:09 -0500
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
 <4C867807.2040907@bioperl.org>
 <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>

Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could possibly use any db or memory adaptor.
This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. >> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? 
>>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Sep 9 01:51:53 2010 From: jason at bioperl.org (Jason Stajich) Date: Wed, 08 Sep 2010 22:51:53 -0700 Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates In-Reply-To: References: Message-ID: <4C8875F9.6020502@bioperl.org> Hi Alper - This script operates on gtf so doesn't quite do what you want but could be modified to be simpler to just look at the CDS and mRNA rather than the exon,start/stop codon info http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl Otherwise I think there make be some easy ways to do this from some tools in MAKER too. -jason Alper Yilmaz wrote, On 9/8/10 9:47 AM: > Hi, > > I have a GFF file listing mRNA and CDS coordinates for every > transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates > based on that information. I was wondering, if there's already made > script for that purpose that you're aware of. > > I already uploaded the GFF file into Bio::DB::SeqFeature database, so > I can utilize both flat file or database based scripts. 
> > thanks, > > Alper Yilmaz > Post-doctoral Researcher > Plant Biotechnology Center > The Ohio State University > 1060 Carmack Rd > Columbus, OH 43210 > (614)688-4954 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Sep 9 04:10:36 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 09 Sep 2010 09:10:36 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk> so something like an abstract Bio::Seq::FeatureContainer that defines the methods for storing and retrieving features and that would then be sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or Bio::Seq::FeatureContainer:Sqlite - is that the plan? Is there any way I can get involved or is it better to wait for other features to be developed first? Cheers, Frank On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote: > Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). 
> > chris > > On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > > > Hi Jason, > > > > Yes, I guess that would be the simplest way of doing it - basically just > > doing it the way the docs suggest for getting at a specific feature but > > hiding the grep behind a Bio::Seq method with search parameters. But we > > could also build a hash of feature tags as the Bio::Seq is built so that > > retrieval is more efficient. This could also be used to implement a bin > > indexing scheme for range queries, similar to what Bio::DB::GFF does. > > Is a move to an sqlite backend planend for the near future? > > > > Frank > > > > > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: > >> And the implementation would just be something like this? > >> > >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] > >> eq 'my_gene' } $seq->get_SeqFeatures(); > >> > >> I think any implementation would be if we moved from the in-memory > >> arrays & hash-based system to a sqlite db on the back-end for how > >> Sequence and Feature objects are stored. > >> This would be a somewhat slower but wouldn't have performance/memory > >> problems we get for sequences with many annotations. > >> > >> -jason > >> Frank Schwach wrote, On 9/7/10 5:09 AM: > >>> I am working a lot with feature-rich Bio::Seq objects these days and > >>> thought that it would be really nice if I could do something like: > >>> > >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > >>> > >>> instead of having to grep for the feature every time. > >>> There could then be 'by_tag' and 'by_region' options as well. > >>> > >>> According to the Bio::Seq docs, something like this seems to be planned > >>> at some stage. I would be willing to contribute to this feature if I can > >>> and if this isn't already being implemented by somebody else. > >>> Does anybody know the state of this feature? 
> >>> > >>> Frank > >>> > >>> > >>> > >>> > >>> > >>> > >>> > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jun.yin at ucd.ie Thu Sep 9 04:20:39 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Thu, 09 Sep 2010 09:20:39 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie> Hi, I would like to give a go on the bin indexing scheme on Bio::Seq(or a similar package to Bio::LocatableSeq). The idea is to save the index of sequences to a local database (AnyDBM) instead of the memory itself. So this will free some memory usage. This idea actually comes from Bio::DB::Fasta, as implemented by Lincoln Stein. Cheers, Jun Yin Ph.D.?student in U.C.D. 
Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Thursday, September 09, 2010 12:20 AM To: Frank Schwach Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Seq, search for specific features Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. 
>> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? >>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. http://www.eset.com __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. 
From s1012635 at student.hsleiden.nl Thu Sep 9 05:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi,

I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format).
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl to get this package to work. So far we have had courses in Python and Java, not in Perl.

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :

  use Bio::Tools::Tmhmm;
  my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
  while( my $tmhmm_feat = $parser->next_result ) {
     #do something
     #eg
     push @tmhmm_feat, $tmhmm_feat;
  }

How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output?

cheers!

Stefan Lelieveld

From fs5 at sanger.ac.uk Thu Sep 9 06:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself, but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it has been run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm

Follow the synopsis to feed the tool with your sequences.
You can learn how to read a FASTA file and access each sequence in a
loop here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

  use Bio::SeqIO;
  my $file = shift; # to get a file path from command line
  my $inseq = Bio::SeqIO->new(-file => "<$file", -format => 'FASTA');
  while (my $seq = $inseq->next_seq) {
      # display_id holds the ID from the FASTA header line
      print $seq->display_id, "\n";
  }

as an example for printing out the sequence IDs from $seq, which is a
Bio::Seq object.
So what you have to do now is to feed each of those Bio::Seq objects
into your TMHMM runner.

Frank

On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi,
>
> I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format).
> I came across the Bio::Tools::TMHMM; package for BioPerl which looked promesing. The problem is I lack the advanced knowlegde of perl to get this package to work. So far we had courses in Python and Java not in Perl.
>
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
> #do something
> #eg
> push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output?
>
> cheers!
>
> Stefan Lelieveld
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
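For readers who want to see the shape of that loop without BioPerl installed, here is a minimal plain-Perl sketch. It is a stand-in for Bio::SeqIO, not a replacement: read_fasta() is a hypothetical helper, the two protein records are made-up test data, and the point where each record would be handed to a TMHMM runner is only marked with a comment.

```perl
use strict;
use warnings;

# Plain-Perl stand-in for the Bio::SeqIO loop; read_fasta() is a hypothetical
# helper, not part of BioPerl. It returns one hash per FASTA record.
sub read_fasta {
    my ($fh) = @_;
    my (@records, $current);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;           # strip the line ending
        next if $line eq '';            # skip blank lines
        if ($line =~ /^>(\S*)\s*(.*)/) {
            # header line: first word is the id, the rest the description
            $current = { id => $1, desc => $2, seq => '' };
            push @records, $current;
        }
        elsif ($current) {
            $current->{seq} .= $line;   # a sequence may span several lines
        }
    }
    return @records;
}

# Demo on an in-memory file so the sketch runs without any input file.
my $fasta = ">prot1 first test protein\nMKTAYIAK\nQRQISFVK\n>prot2\nMSLLTEVE\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;

for my $rec (@records) {
    # this is where each sequence would be handed to the TMHMM runner
    printf "%s\t%d residues\n", $rec->{id}, length $rec->{seq};
}
```

With BioPerl available, Bio::SeqIO remains the better choice, since it returns proper Bio::Seq objects that the run wrappers expect.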
From kai.blin at biotech.uni-tuebingen.de Thu Sep 9 06:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
 <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
> #do something
> #eg
> push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output?

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some FASTA
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = new Bio::Tools::Tmhmm(-file => "input.txt");
"""
will load and parse the TMHMM output for you.

HTH,
Kai

--
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From elanorbust2 at yahoo.com Thu Sep 9 12:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test of StandAloneBlastPlus but am getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list consists of the first four sequences found in the database. The results say it cannot find the first two, and the matches for the last two do not exist! Thanks for any help!

Program

#!/usr/bin/perl
use Bio::Tools::Run::StandAloneBlastPlus;

$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create => 1
);

$result = $fac->blastn( -query => 'sequences.fasta',
                        -outfile => 'ITStest2.bls');

smallDB.fas Data

>302585252|HM807352|Waitea circinata internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>302585252|HM807352|Waitea circinata internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>302585250|HM802273|Fusarium oxysporum
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >302585249|HM802272|Fusarium oxysporum? contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA >302585248|HM802271|Fusarium oxysporum? 
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA >301333053|GU725064|Xiphinema turcicum? internal transcribed spacer 1 GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA >301333052|GU725063|Xiphinema adenohystherum? 
internal transcribed spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC >301333051|GU725062|Xiphinema sphaerocephalum? internal transcribed spacer 1 AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC >301333050|GU725061|Xiphinema hispanum? 
internal transcribed spacer 1 AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG >301333049|GU725060|Xiphinema pyrenaicum? internal transcribed spacer 1 AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG sequences.fasta data >Test1 
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA >Test2 GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA >Test3 CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >Test4 GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA Results BLASTN 2.2.24+ Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14. Database: ITS ?????????? 5 sequences; 1,102 total letters Query=? Test1 Length=204 ***** No hits found ***** Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 202071 Query=? Test2 Length=192 ***** No hits found ***** Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 
0.850 Effective search space used: 189507 Query= Test3 Length=437 Score E Sequences producing significant alignments: (Bits) Value dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 300 2e-085 dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 69.4 6e-016 dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 58.4 1e-012 dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 56.5 4e-012 >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F Length=203 Score = 300 bits (162), Expect = 2e-085 Identities = 176/182 (96%), Gaps = 4/182 (2%) Strand=Plus/Plus Query 10 TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC 66 ||||||||||| | |||||| |||||| |||||||| |||| |||||||||||||||||| Sbjct 23 TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC 81 Query 67 AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT 126 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 82 AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT 141 Query 127 GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 186 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 142 GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 201 Query 187 GG 188 || Sbjct 202 GG 203 >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F Length=217 Score = 69.4 bits (37), Expect = 6e-016 Identities = 39/40 (97%), Gaps = 0/40 (0%) Strand=Plus/Plus Query 149 AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 188 ||||| |||||||||||||||||||||||||||||||||| Sbjct 178 AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 
217 >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F Length=206 Score = 58.4 bits (31), Expect = 1e-012 Identities = 39/42 (92%), Gaps = 3/42 (7%) Strand=Plus/Plus Query 146 ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 186 |||| || ||| |||||||||||||||||||||||||||||| Sbjct 165 ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT 204 >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F Length=256 Score = 56.5 bits (30), Expect = 4e-012 Identities = 30/30 (100%), Gaps = 0/30 (0%) Strand=Plus/Plus Query 157 AAAACTTTCAACAACGGATCTCTTGGTTCT 186 |||||||||||||||||||||||||||||| Sbjct 225 AAAACTTTCAACAACGGATCTCTTGGTTCT 254 Lambda K H 1.33 0.621 1.12 Gapped Lambda K H 1.28 0.460 0.850 Effective search space used: 442850 Query= Test4 Length=521 Score E Sequences producing significant alignments: (Bits) Value dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 309 4e-088 dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 69.4 7e-016 dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 58.4 1e-012 dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 
56.5 5e-012 >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F Length=203 Score = 309 bits (167), Expect = 4e-088 Identities = 177/181 (97%), Gaps = 3/181 (1%) Strand=Plus/Plus Query 7 TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA 63 ||||||||||| | |||||| |||||| |||||||||||||||||||||||||||||||| Sbjct 23 TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA 82 Query 64 GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG 123 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 83 GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG 142 Query 124 TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG 183 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 143 TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG 202 Query 184 G 184 | Sbjct 203 G 203 >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F Length=217 Score = 69.4 bits (37), Expect = 7e-016 Identities = 39/40 (97%), Gaps = 0/40 (0%) Strand=Plus/Plus Query 145 AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 184 ||||| |||||||||||||||||||||||||||||||||| Sbjct 178 AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 217 >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F Length=206 Score = 58.4 bits (31), Expect = 1e-012 Identities = 39/42 (92%), Gaps = 3/42 (7%) Strand=Plus/Plus Query 142 ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 182 |||| || ||| |||||||||||||||||||||||||||||| Sbjct 165 ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT 204 >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F Length=256 Score = 56.5 bits (30), Expect = 5e-012 Identities = 30/30 (100%), Gaps = 0/30 (0%) Strand=Plus/Plus Query 153 AAAACTTTCAACAACGGATCTCTTGGTTCT 182 |||||||||||||||||||||||||||||| Sbjct 225 AAAACTTTCAACAACGGATCTCTTGGTTCT 254 Lambda K H 1.33 0.621 1.12 Gapped Lambda K H 1.28 0.460 0.850 Effective search space used: 530378 Database: ITS Posted date: Aug 27, 2010 9:43 AM Number of letters in database: 1,102 Number of sequences in database: 
5 Matrix: blastn matrix 1 -2 Gap Penalties: Existence: 0, Extension: 2.5 From jaya1786 at gmail.com Thu Sep 9 12:59:51 2010 From: jaya1786 at gmail.com (jayanthijayakumar) Date: Thu, 9 Sep 2010 22:29:51 +0530 Subject: [Bioperl-l] Regarding GSoC 2010 Message-ID: Respected sir/madam, I am Jayanthi Jayakumar, a second-year MS (by Research) student in computational biology at Anna University, Chennai, India. I am very interested in participating in GSoC 2010 under the project "Major Bioperl recognition". Could you please provide the details and eligibility criteria for it? Thanking you, yours faithfully, Jayanthi Jayakumar From Russell.Smithies at agresearch.co.nz Thu Sep 9 18:54:43 2010 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 10 Sep 2010 10:54:43 +1200 Subject: [Bioperl-l] standaloneblastplus In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com> References: <154453.73718.qm@web37504.mail.mud.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz> Is that a typo in your email or are some of your fasta headers in your db incorrect? Eg. >301333052|GU725063|Xiphinema adenohystherum internal transcribed >301333052|GU725063|spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT Shouldn't that be: >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT Maybe the invalid fasta headers are breaking the db formatter? 
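One way to test the header hypothesis before rebuilding the database is to scan the FASTA file for suspicious headers. The following is only a minimal sketch in plain Perl, not a BioPerl API and not code from this thread: the subroutine name check_fasta_headers, the script name check_fasta.pl, and the choice of the first whitespace-delimited token as the identifier are all assumptions made for illustration. It flags headers that are not followed by any sequence (such as the split header quoted above) and identifiers that occur more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): read FASTA text from a
# filehandle and return a list of human-readable problem descriptions.
sub check_fasta_headers {
    my ($fh) = @_;
    my (%seen, @problems);
    my ($current, $has_seq);

    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # ignore blank lines

        if ($line =~ /^>\s*(\S+)/) {       # header: ID is the first token
            my $id = $1;
            # The previous header was never followed by sequence lines.
            push @problems, "header '$current' has no sequence"
                if defined $current && !$has_seq;
            # The same identifier was already used by an earlier header.
            push @problems, "identifier '$id' appears more than once"
                if $seen{$id}++;
            ($current, $has_seq) = ($id, 0);
        }
        else {
            $has_seq = 1;                  # any other line counts as sequence
        }
    }
    push @problems, "header '$current' has no sequence"
        if defined $current && !$has_seq;

    return @problems;
}

# Assumed usage: perl check_fasta.pl smallDB.fas
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for check_fasta_headers($fh);
    close $fh;
}
```

An empty result only means that these two checks passed; it does not show that the database formatter accepted the file.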
Russell Smithies Technical Support T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of sally roberts > Sent: Friday, 10 September 2010 4:10 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] standaloneblastplus > > I am running a test for standaloneblastplus but getting data back that > does not exist in my query or my local database. Below is a outline of my > script small database, query list, and erroneous results. As you will > notice the query list is comprised of the first four sequences found in > the database. The results say it can not find the first two and then the > mathces for the last two do not exist! > > Thanks for any help! > > > > Program > > > #!/usr/bin/perl > > use Bio::Tools::Run::StandAloneBlastPlus; > > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'ITS', > -db_data => 'smallDB.fas', > -create => 1 > ); > > $result = $fac->blastn( -query => , 'sequences.fasta', > -outfile => 'ITStest2.bls'); > > > smallDB.fas Data > > >302585252|HM807352|Waitea circinata internal transcribed spacer 1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >302585252|HM807352|Waitea circinata internal transcribed spacer 2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > 
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > >302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC > CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA > TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT > GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC > CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG > GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC > GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC > ATATCATTAAAGCGGAGGAA > > >301333053|GU725064|Xiphinema turcicum internal transcribed spacer 1 > 
GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC > GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA > TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG > AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT > ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC > TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG > ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT > GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG > TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG > TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA > CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG > TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA > TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA > ACGCA > CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA > AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT > TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA > > >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer > 1 > AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG > CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT > CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG > GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA > ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA > AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA > AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA > ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA > CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA > 
AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA > GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG > TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA > AACTTTTATATATAGTTCGCCGAATAATAGCGAAC > > >301333051|GU725062|Xiphinema sphaerocephalum internal transcribed spacer > 1 > AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG > CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC > GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG > AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC > GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG > AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC > GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG > GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC > CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG > TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT > GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG > TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT > ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC > > >301333050|GU725061|Xiphinema hispanum internal transcribed spacer 1 > AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT > TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC > TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG > GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA > TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA > AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA > TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC > 
GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC > GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC > ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA > GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC > AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT > AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG > > >301333049|GU725060|Xiphinema pyrenaicum internal transcribed spacer 1 > AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC > TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT > TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC > TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG > ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA > GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT > TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA > CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT > AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA > GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA > TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT > ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG > > > > sequences.fasta data > > >Test1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >Test2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >Test3 > CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > 
CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >Test4 > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > > > > Results > > BLASTN 2.2.24+ > > > Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb > Miller (2000), "A greedy algorithm for aligning DNA sequences", J > Comput Biol 2000; 7(1-2):203-14. > > > > Database: ITS > 5 sequences; 1,102 total letters > > > > Query= Test1 > Length=204 > > > ***** No hits found ***** > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 202071 > > > Query= Test2 > Length=192 > > > ***** No hits found ***** > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 189507 > > > Query= Test3 > Length=437 > > Score E > Sequences producing significant alignments: > (Bits) Value > > dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 300 2e-085 > dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 69.4 6e-016 > dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 58.4 1e-012 > dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 
> 56.5 4e-012 > > > >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G59F > Length=203 > > Score = 300 bits (162), Expect = 2e-085 > Identities = 176/182 (96%), Gaps = 4/182 (2%) > Strand=Plus/Plus > > Query 10 TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC > 66 > ||||||||||| | |||||| |||||| |||||||| |||| |||||||||||||||||| > Sbjct 23 TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC > 81 > > Query 67 AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT > 126 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 82 AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT > 141 > > Query 127 GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT > 186 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 142 GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT > 201 > > Query 187 GG 188 > || > Sbjct 202 GG 203 > > > >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G64F > Length=217 > > Score = 69.4 bits (37), Expect = 6e-016 > Identities = 39/40 (97%), Gaps = 0/40 (0%) > Strand=Plus/Plus > > Query 149 AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 188 > ||||| |||||||||||||||||||||||||||||||||| > Sbjct 178 AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 217 > > > >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G60F > Length=206 > > Score = 58.4 bits (31), Expect = 1e-012 > Identities = 39/42 (92%), Gaps = 3/42 (7%) > Strand=Plus/Plus > > Query 146 ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 186 > |||| || ||| |||||||||||||||||||||||||||||| > Sbjct 165 ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT 204 > > > >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G65F > Length=256 > > Score = 56.5 bits (30), Expect = 4e-012 > Identities = 30/30 (100%), Gaps = 0/30 (0%) > Strand=Plus/Plus > > 
Query 157 AAAACTTTCAACAACGGATCTCTTGGTTCT 186 > |||||||||||||||||||||||||||||| > Sbjct 225 AAAACTTTCAACAACGGATCTCTTGGTTCT 254 > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 442850 > > > Query= Test4 > Length=521 > > Score E > Sequences producing significant alignments: > (Bits) Value > > dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 309 4e-088 > dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 69.4 7e-016 > dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 58.4 1e-012 > dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 56.5 5e-012 > > > >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G59F > Length=203 > > Score = 309 bits (167), Expect = 4e-088 > Identities = 177/181 (97%), Gaps = 3/181 (1%) > Strand=Plus/Plus > > Query 7 TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA > 63 > ||||||||||| | |||||| |||||| |||||||||||||||||||||||||||||||| > Sbjct 23 TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA > 82 > > Query 64 GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG > 123 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 83 GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG > 142 > > Query 124 TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG > 183 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 143 TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG > 202 > > Query 184 G 184 > | > Sbjct 203 G 203 > > > >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G64F > Length=217 > > Score = 69.4 bits (37), Expect = 7e-016 > Identities = 39/40 (97%), Gaps = 0/40 (0%) > Strand=Plus/Plus > > Query 145 AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 184 > ||||| |||||||||||||||||||||||||||||||||| > Sbjct 178 
AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 217 > > > >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G60F > Length=206 > > Score = 58.4 bits (31), Expect = 1e-012 > Identities = 39/42 (92%), Gaps = 3/42 (7%) > Strand=Plus/Plus > > Query 142 ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 182 > |||| || ||| |||||||||||||||||||||||||||||| > Sbjct 165 ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT 204 > > > >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G65F > Length=256 > > Score = 56.5 bits (30), Expect = 5e-012 > Identities = 30/30 (100%), Gaps = 0/30 (0%) > Strand=Plus/Plus > > Query 153 AAAACTTTCAACAACGGATCTCTTGGTTCT 182 > |||||||||||||||||||||||||||||| > Sbjct 225 AAAACTTTCAACAACGGATCTCTTGGTTCT 254 > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 530378 > > > Database: ITS > Posted date: Aug 27, 2010 9:43 AM > Number of letters in database: 1,102 > Number of sequences in database: 5 > > > > Matrix: blastn matrix 1 -2 > Gap Penalties: Existence: 0, Extension: 2.5 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From elanorbust2 at yahoo.com Fri Sep 10 11:13:08 2010 From: elanorbust2 at yahoo.com (sally roberts) Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT) Subject: [Bioperl-l] standaloneblastplus In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz> Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com> I think that is just an email error. Thanks for looking, though! --- On Thu, 9/9/10, Smithies, Russell wrote: From: Smithies, Russell Subject: RE: [Bioperl-l] standaloneblastplus To: "'sally roberts'" , "'bioperl-l at lists.open-bio.org'" Date: Thursday, September 9, 2010, 6:54 PM Is that a typo in your email or are some of your fasta headers in your db incorrect? Eg. >301333052|GU725063|Xiphinema adenohystherum internal transcribed >301333052|GU725063|spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT Shouldn't that be: >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT Maybe the invalid fasta headers are breaking the db formatter? Russell Smithies Technical Support T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of sally roberts > Sent: Friday, 10 September 2010 4:10 a.m. 
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] standaloneblastplus > > I am running a test for standaloneblastplus but getting data back that > does not exist in my query or my local database. Below is a outline of my > script small database, query list, and erroneous results. As you will > notice the query list is comprised of the first four sequences found in > the database. The results say it can not find the first two and then the > mathces for the last two do not exist! > > Thanks for any help! > > > > Program > > > #!/usr/bin/perl > > use Bio::Tools::Run::StandAloneBlastPlus; > > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'ITS', > -db_data => 'smallDB.fas', > -create => 1 > ); > > $result = $fac->blastn( -query => , 'sequences.fasta', > -outfile => 'ITStest2.bls'); > > > smallDB.fas Data > > >302585252|HM807352|Waitea circinata internal transcribed spacer 1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >302585252|HM807352|Waitea circinata internal transcribed spacer 2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >302585250|HM802273|Fusarium oxysporum 
contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >302585249|HM802272|Fusarium oxysporum? contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > >302585248|HM802271|Fusarium oxysporum? 
contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC > CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA > TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT > GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC > CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG > GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC > GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC > ATATCATTAAAGCGGAGGAA > > >301333053|GU725064|Xiphinema turcicum? internal transcribed spacer 1 > GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC > GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA > TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG > AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT > ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC > TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG > ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT > GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG > TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG > TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA > CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG > TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA > TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA > ACGCA > CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA > AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT > TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA > > 
>301333052|GU725063|Xiphinema adenohystherum? internal transcribed spacer > 1 > AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG > CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT > CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG > GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA > ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA > AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA > AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA > ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA > CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA > AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA > GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG > TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA > AACTTTTATATATAGTTCGCCGAATAATAGCGAAC > > >301333051|GU725062|Xiphinema sphaerocephalum? 
internal transcribed spacer > 1 > AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG > CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC > GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG > AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC > GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG > AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC > GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG > GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC > CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG > TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT > GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG > TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT > ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC > > >301333050|GU725061|Xiphinema hispanum? 
internal transcribed spacer 1 > AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT > TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC > TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG > GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA > TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA > AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA > TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC > GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC > GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC > ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA > GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC > AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT > AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG > > >301333049|GU725060|Xiphinema pyrenaicum? 
internal transcribed spacer 1 > AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC > TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT > TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC > TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG > ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA > GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT > TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA > CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT > AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA > GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA > TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT > ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG > > > > sequences.fasta data > > >Test1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >Test2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >Test3 > CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >Test4 > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > 
GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
>
>
>
> Results
>
> BLASTN 2.2.24+
>
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
>
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
>
>
> Query=  Test1
> Length=204
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 202071
>
>
> Query=  Test2
> Length=192
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 189507
>
>
> Query=  Test3
> Length=437
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    4e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC
> 66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC
> 81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
> 126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
> 141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
> 186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
> 201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 442850
>
>
> Query=  Test4
> Length=521
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    5e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA
> 63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA
> 82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
> 123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
> 142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
> 183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
> 202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 530378
>
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
>
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From David.Messina at sbc.su.se Fri Sep 10 12:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID:

Hi Sally,

Did you run the same search on the command line, outside of BioPerl? The issue you're having may be with Blast+ and not BioPerl.

For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches.

Dave

On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking though!

From jun.yin at ucd.ie Sat Sep 11 12:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To:
References:
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC is already finished this year.
You can check the information here: http://socghop.appspot.com/gsoc/program/home/google/gsoc2010 However, you can still contribute to the BioPerl project if you like. You can talk to people on this mailing list, or you can join the IRC channel (http://www.bioperl.org/wiki/IRC). Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar Sent: Thursday, September 09, 2010 6:00 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Regarding GSoC 2010 Respected sir/madam, I am Jayanthi Jayakumar, doing my second year MS (By Research) in computational biology at Anna University, Chennai, India. I am very much interested in participating in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and eligibility criteria for the same. Thanking you, yours faithfully, Jayanthi Jayakumar _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. http://www.eset.com
From david.breimann at gmail.com Sun Sep 12 09:16:29 2010 From: david.breimann at gmail.com (David Breimann) Date: Sun, 12 Sep 2010 15:16:29 +0200 Subject: [Bioperl-l] Circular genomes Message-ID: Hello, As a continuation of http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I would like to ask: Was the fix implemented yet? That is, do GFF3 files created for circular genomes comply with the GFF3 specs for such genomes? I just find it difficult to keep track using git, so I'm not sure if this was already handled. Also, will the start and end coordinates of such genes loaded from a GFF3 file be "normal" (i.e. no coordinate is larger than the size of the genome) or just as written in the GFF3 (which demands that end > start even if end > genome length)? Thanks, David From David.Messina at sbc.su.se Mon Sep 13 11:10:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 13 Sep 2010 17:10:42 +0200 Subject: [Bioperl-l] BioPerl net installer Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se> Hi everyone, I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. It's already part of bioperl-live, and you can also get it here: http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl Dave From maj at fortinbras.us Mon Sep 13 12:47:45 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 13 Sep 2010 16:47:45 +0000 Subject: [Bioperl-l] BioPerl net installer Message-ID: Dear Scott- You rock! 
Sincerely, Mark >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Monday, September 13, 2010 11:10 AM >To: 'BioPerl List' >Subject: [Bioperl-l] BioPerl net installer > >Hi everyone, > >I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. > >The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. > >It's already part of bioperl-live, and you can also get it here: > > http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Sep 13 17:15:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 13 Sep 2010 16:15:45 -0500 Subject: [Bioperl-l] BioPerl net installer In-Reply-To: References: Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu> Ditto! chris (briefly resurfacing) On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote: > Dear Scott- > You rock! > Sincerely, > Mark > >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Monday, September 13, 2010 11:10 AM >> To: 'BioPerl List' >> Subject: [Bioperl-l] BioPerl net installer >> >> Hi everyone, >> >> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. >> >> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. 
>> >> It's already part of bioperl-live, and you can also get it here: >> >> http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl >> >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From timmcilveen at talktalk.net Mon Sep 13 19:07:00 2010 From: timmcilveen at talktalk.net (tim) Date: Tue, 14 Sep 2010 00:07:00 +0100 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 Message-ID: <201009140007.00798.timmcilveen@talktalk.net> Hi, I have just installed Bioperl on my Linux system using the CPAN install. The install summary is as follows: Test Summary Report ------------------- t/RemoteDB/GenPept.t (Wstat: 256 Tests: 21 Failed: 1) Failed test: 17 Non-zero exit status: 1 t/RemoteDB/Query/GenBank.t (Wstat: 256 Tests: 18 Failed: 1) Failed test: 9 Non-zero exit status: 1 Parse errors: Bad plan. You planned 21 tests but ran 18. t/RemoteDB/Taxonomy.t (Wstat: 512 Tests: 103 Failed: 2) Failed tests: 15, 98 Non-zero exit status: 2 t/Root/RootIO.t (Wstat: 7424 Tests: 30 Failed: 0) Non-zero exit status: 29 Parse errors: Bad plan. You planned 31 tests but ran 30. Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr 0.91 sys + 156.68 cusr 9.16 csys = 172.94 CPU) Result: FAIL Failed 4/329 test programs. 4/18407 subtests failed. CJFIELDS/BioPerl-1.6.1.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.1.tar.gz Running Build install make test had returned bad status, won't install without force Failed during this command: CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO Is Bioperl properly installed? 
During the install process I was getting quite a lot of this error (hundreds of instances): 'replacement list longer than search list'. This happened with t/tools, t/seq, t/search and many others. Any advice would be great. Tim From David.Messina at sbc.su.se Tue Sep 14 03:56:33 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 14 Sep 2010 09:56:33 +0200 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net> References: <201009140007.00798.timmcilveen@talktalk.net> Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se> Hi Tim, Thanks for your report. > Is Bioperl properly installed? No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line: > make test had returned bad status, won't install without force As for the error(s) > 'replacement list longer than search list' I believe this was fixed a couple of months ago. For details, see: http://bugzilla.open-bio.org/show_bug.cgi?id=3116 So I would recommend that you grab the latest copy of bioperl-live from GitHub, in which the bug should be fixed: http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots Give that a shot and let us know how it goes. Dave From jskittrell at unmc.edu Thu Sep 16 12:15:49 2010 From: jskittrell at unmc.edu (Jeff Kittrell) Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC) Subject: [Bioperl-l] mpiblast Message-ID: Does Bioperl work with mpiblast? Is there a standalone-like module that allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast output file correctly? 
Thanks for any help, Jeff From David.Messina at sbc.su.se Thu Sep 16 14:25:57 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 16 Sep 2010 20:25:57 +0200 Subject: [Bioperl-l] mpiblast In-Reply-To: References: Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> > Is the there a standalone like module that allows you to easily call mpiblast? No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > I'm assuming seqio with parse a mpiblast output file correctly? Yes, although I see that a new version of mpiblast was recently released. Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? Dave From shalabh.sharma7 at gmail.com Thu Sep 16 17:38:14 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 16 Sep 2010 17:38:14 -0400 Subject: [Bioperl-l] IUPAC code similarity Message-ID: Hi All, I have few nucleotide sequences that are composed of IUPAC codes. Like >test VGSRVBSSSSSNSC Similarly i have a database made of of these kind of sequences. I want to find sequences that are 100% similar to the query sequence. Is there any bioPerl module to deal with this, i tried normal blast but it didn't worked. Do i have to convert these sequences to 4 base codes or there is any other way out. Thanks Shalabh From amackey at virginia.edu Fri Sep 17 10:28:15 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 10:28:15 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Convert the IUPAC code to a regular expression, and use regular expressions (in Perl or grep or similar) to find 100% identical matches. -Aaron On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma wrote: > Hi All, > I have few nucleotide sequences that are composed of IUPAC codes. Like > >test > VGSRVBSSSSSNSC > > Similarly i have a database made of of these kind of sequences. 
I want to > find sequences that are 100% similar to the query sequence. > > Is there any bioPerl module to deal with this, i tried normal blast but it > didn't worked. > Do i have to convert these sequences to 4 base codes or there is any other > way out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Fri Sep 17 11:07:38 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 11:07:38 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron for your reply. Actually i tried that first, but there is another problem, i have to divide each query sequence to window size 5 with 1 base shift and its not possible to divide regular expression in that way. So what i am trying is to convert those iupac codes to 4 base code sequence and then do the normal search. Now the problem is that i cant able to convert those IUPAC sequences to normal ones, i am still trying to write a script but its taking time. Thanks Shalabh On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. >> Like >> >test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out. 
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From roy.chaudhuri at gmail.com Fri Sep 17 11:04:28 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 17 Sep 2010 16:04:28 +0100 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: <4C93837C.4080008@gmail.com> Hi Shalabh, The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC codes to regular expressions: $ perl -e 'use Bio::Tools::SeqPattern; print Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand' [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C Although that won't work if there are also ambiguity codes in your database. For a non-BioPerl solution you could try fuzznuc from EMBOSS. Cheers. Roy. On 17/09/2010 15:28, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. Like >> >test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out. 
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From david.breimann at gmail.com Fri Sep 17 14:13:22 2010 From: david.breimann at gmail.com (David Breimann) Date: Fri, 17 Sep 2010 20:13:22 +0200 Subject: [Bioperl-l] Installing using git after an older installation Message-ID: Hello, I'm sharing a server with some other lab members. I would like to install the latest version of BioPerl for my own use, without affecting my colleagues. I used git to clone a copy of bioperl-live and exported PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB". Now perl -MBio::Perl -le 'print Bio::Perl->VERSION;' returns 1.0069. My question is: is that all? Am I now using the latest version? Should I include anything special in my scripts? Also, what about all the bp_***.pl scripts? Are they now using the latest version, too? I guess not, since I didn't build anything. So what should I do about them? Thanks, Dave From amackey at virginia.edu Fri Sep 17 15:24:44 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:24:44 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: <4C93837C.4080008@gmail.com> References: <4C93837C.4080008@gmail.com> Message-ID: If there are ambiguity codes in the database, then the expanded character class has to also include the original ambiguity code; non-ambiguous nucleotides must also be expanded to include all ambiguity codes that represent the nucleotide. 
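A minimal sketch of the rule described above, in plain Perl with no BioPerl dependency. It rests on an assumption that goes slightly beyond what the message states: a query code and a database code are treated as compatible whenever the sets of bases they stand for overlap. The helper names `compatible_class` and `iupac_to_regex` are hypothetical and not part of any BioPerl module.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Character class of every database symbol compatible with one query code.
# Assumption: "compatible" means the two base sets share at least one base.
sub compatible_class {
    my ($code) = @_;
    my $bases = $IUPAC{ uc $code } or die "Unknown IUPAC code: $code\n";
    my @hits = grep { $IUPAC{$_} =~ /[$bases]/ } sort keys %IUPAC;
    return '[' . join('', @hits) . ']';
}

# Hypothetical helper: turn a whole IUPAC query into an anchored regex.
sub iupac_to_regex {
    my ($query) = @_;
    my $pattern = join '', map { compatible_class($_) } split //, $query;
    return qr/^$pattern$/i;
}

# Try a short query against a few made-up database sequences.
my $re = iupac_to_regex('VGS');
for my $db_seq (qw(AGC NGS TGC VKG)) {
    printf "%s %s\n", $db_seq, ( $db_seq =~ $re ? 'match' : 'no match' );
}
```

With this definition a plain A in the query expands to [ADHMNRVW], that is, A itself plus every ambiguity code that can represent A. A stricter notion of "100% identical" (for example, requiring the database code to be a subset of the query code) would only change the test inside `compatible_class`.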
-Aaron On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri wrote: > Hi Shalabh, > > The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC > codes to regular expressions: > > $perl -e 'use Bio::Tools::SeqPattern; print > Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand' > [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C > > Although that won't work if there are also abiguity codes in your database. > For a non-BioPerl solution you could try fuzznuc from Emboss. > > Cheers. > Roy. > > > On 17/09/2010 15:28, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions >> (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma >> wrote: >> >> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >>>> test >>>> >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From amackey at virginia.edu Fri Sep 17 15:25:54 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:25:54 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: do your windowing/shifting on the unexpanded query sequences; then transform the 5-bp queries into regular expressions. 
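The windowing-then-expanding order suggested above can be sketched as follows, again in plain Perl. This is only an illustration under stated assumptions: the window size of 5 and shift of 1 come from the thread, while the helper names `windows` and `window_to_regex` are hypothetical, and the expansion covers only the four plain bases (it does not handle ambiguity codes on the database side).

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Slide a fixed-size window along the raw IUPAC string, one base at a time.
# Returns an empty list when the sequence is shorter than the window.
sub windows {
    my ( $seq, $size ) = @_;
    return map { substr( $seq, $_, $size ) } 0 .. length($seq) - $size;
}

# Expand one (still unexpanded) window into a regular-expression string.
sub window_to_regex {
    my ($window) = @_;
    return join '', map {
        my $bases = $IUPAC{ uc $_ } // die "Unknown IUPAC code: $_\n";
        length($bases) > 1 ? "[$bases]" : $bases;
    } split //, $window;
}

# Window first, expand second: each 5-base window becomes its own pattern.
for my $w ( windows( 'VGSRVBSSSSSNSC', 5 ) ) {
    print "$w\t", window_to_regex($w), "\n";
}
```

Cutting the windows from the raw IUPAC string sidesteps the problem of slicing a regular expression, because every window is a fixed number of codes long before any bracket expressions are introduced.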
-Aaron On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma wrote: > Thanks Aaron for your reply. > Actually i tried that first, but there is another problem, i have to divide > each query sequence to window size 5 with 1 base shift and its not possible > to divide regular expression in that way. > So what i am trying is to convert those iupac codes to 4 base code sequence > and then do the normal search. > Now the problem is that i cant able to convert those IUPAC sequences to > normal ones, i am still trying to write a script but its taking time. > > Thanks > Shalabh > > > On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >> shalabh.sharma7 at gmail.com> wrote: >> >>> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >test >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From Kevin.M.Brown at asu.edu Fri Sep 17 16:09:34 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 17 Sep 2010 13:09:34 -0700 Subject: [Bioperl-l] Installing using git after an older installation In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA From shalabh.sharma7 at gmail.com Fri Sep 17 16:45:50 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 16:45:50 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron, changing the query sequence worked well but I am still struggling with the database. -Shalabh On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey wrote: > do your windowing/shifting on the unexpanded query sequences; then > transform the 5-bp queries into regular expressions. > > -Aaron > > > On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma < > shalabh.sharma7 at gmail.com> wrote: > >> Thanks Aaron for your reply. >> Actually i tried that first, but there is another problem, i have to >> divide each query sequence to window size 5 with 1 base shift and its not >> possible to divide regular expression in that way. >> So what i am trying is to convert those iupac codes to 4 base code >> sequence and then do the normal search. >> Now the problem is that i cant able to convert those IUPAC sequences to >> normal ones, i am still trying to write a script but its taking time. >> >> Thanks >> Shalabh >> >> >> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: >> >>> Convert the IUPAC code to a regular expression, and use regular >>> expressions (in Perl or grep or similar) to find 100% identical matches. 
>>> >>> -Aaron >>> >>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >>> shalabh.sharma7 at gmail.com> wrote: >>> >>>> Hi All, >>>> I have few nucleotide sequences that are composed of IUPAC codes. >>>> Like >>>> >test >>>> VGSRVBSSSSSNSC >>>> >>>> Similarly i have a database made of of these kind of sequences. I want >>>> to >>>> find sequences that are 100% similar to the query sequence. >>>> >>>> Is there any bioPerl module to deal with this, i tried normal blast but >>>> it >>>> didn't worked. >>>> Do i have to convert these sequences to 4 base codes or there is any >>>> other >>>> way out. >>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >> > From heikki.lehvaslaiho at gmail.com Sat Sep 18 03:41:22 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Sat, 18 Sep 2010 10:41:22 +0300 Subject: [Bioperl-l] mpiblast In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> References: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> Message-ID: Been running 1.6 and its betas on Blue Gene/P for months. The output is identical to standard BLAST output. No issues in parsing it with BioPerl. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 16 September 2010 21:25, Dave Messina wrote: >> Is the there a standalone like module that allows you to easily call mpiblast? > > No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. > > http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > > >> I'm assuming seqio with parse a mpiblast output file correctly? 
> > Yes, although I see that a new version of mpiblast was recently released. > > Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From david.breimann at gmail.com Sat Sep 18 05:05:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:05:58 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl Message-ID: Hello, I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` in the fields and sometimes it doesn't, even though the GenBank file has a locus tag. Also, is the ID always equivalent to the locus tag? Thanks, Dave From scott at scottcain.net Sat Sep 18 05:17:24 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 10:17:24 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, bp_genbank2gff3.pl suffers from the fact that it has to deal with GenBank files :-) It was designed initially to work on whole genome refseqs, and contains several ad hoc rules for trying to make it "do the right thing." In practice, it is not unusual for a post-processing step (either by hand or a quick Perl script) to be required to really get it right. I don't recall the specifics (if I ever knew :-) for when and how the locus tag is used, but I do know that there is a list of things that it will try to use for the ID, and while the locus tag is on the list, I don't know where it comes in the list, so it's possible that other items might supersede it. Scott On Sat, Sep 18, 2010 at 10:05 AM, David Breimann wrote: > Hello, > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` > in the fields and sometime it doesn't, even though the genabank has a locus > tag. > Also, is the ID always equivalent to the locus tag? 
> > Thanks, > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From david.breimann at gmail.com Sat Sep 18 05:20:33 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:20:33 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Since locus_tag is an essential tag in GenBank, I suggest that locus_tag always be added to the last column of the GFF if it exists in the GenBank file, whether it is used as the ID in the GFF or not. On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote: > Hi Dave, > > bp_genbank2gff3.pl suffers from the fact that it has to deal with > GenBank files :-) It was designed initially to work on whole genome > refseqs, and contains several ad hoc rules for trying to make it "do > the right thing." In practice, it is not unusual for a post > processing step (either by hand or a quicky perl script) to be > required to really get it right. I don't recall the specifics (if I > ever knew :-) for when and how the locus tag is used, but I do know > that there is a list of things that it will try to use for the ID, and > while the locus is on the list, I don't know where it comes in the > list, so it's possible that other items might supersede it. > > Scott > > > On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > wrote: > > Hello, > > > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a > `locus_tag` > > in the fields and sometime it doesn't, even though the genabank has a > locus > > tag. > > Also, is the ID always equivalent to the locus tag? 
> > > > Thanks, > > Dave > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From scott at scottcain.net Sat Sep 18 06:08:26 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 11:08:26 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, That seems perfectly reasonable. If you could point out a GenBank entry for which that does not happen, I could try to figure out why not. Scott On Sat, Sep 18, 2010 at 10:20 AM, David Breimann wrote: > Since locus_tag is an essential tag in genbank, I suggest locus_tag will be > always added to the GFF last column if it exists in the genbank, whether it > is used as ID in the GFF or not. > > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote: >> >> Hi Dave, >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with >> GenBank files :-) It was designed initially to work on whole genome >> refseqs, and contains several ad hoc rules for trying to make it "do >> the right thing." In practice, it is not unusual for a post >> processing step (either by hand or a quicky perl script) to be >> required to really get it right. I don't recall the specifics (if I >> ever knew :-) for when and how the locus tag is used, but I do know >> that there is a list of things that it will try to use for the ID, and >> while the locus is on the list, I don't know where it comes in the >> list, so it's possible that other items might supersede it. >> >> Scott >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann >> wrote: >> > Hello, >> > >> > I'm not sure how bp_genbank2gff3.pl works. 
Sometimes it adds a >> > `locus_tag` >> > in the fields and sometime it doesn't, even though the genabank has a >> > locus >> > tag. >> > Also, is the ID always equivalent to the locus tag? >> > >> > Thanks, >> > Dave >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> Ontario Institute for Cancer Research > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From david.breimann at gmail.com Sat Sep 18 06:20:50 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 12:20:50 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Scott, Here is a very short genbank: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk Note all genes in the genbank have locus tags. In the resulting GFF3, however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no idea why it deserves a special treatment... :) p.s. making this change (i.e., copying locus_tag to the GFF3 last column whenever available) will really make my life easier. Thank you, Dave On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain wrote: > Hi Dave, > > That seems perfectly reasonable. If you could point out a GenBank > entry for which that does not happen, I could try to figure out why > not. 
> > Scott > > > On Sat, Sep 18, 2010 at 10:20 AM, David Breimann > wrote: > > Since locus_tag is an essential tag in genbank, I suggest locus_tag will > be > > always added to the GFF last column if it exists in the genbank, whether > it > > is used as ID in the GFF or not. > > > > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain > wrote: > >> > >> Hi Dave, > >> > >> bp_genbank2gff3.pl suffers from the fact that it has to deal with > >> GenBank files :-) It was designed initially to work on whole genome > >> refseqs, and contains several ad hoc rules for trying to make it "do > >> the right thing." In practice, it is not unusual for a post > >> processing step (either by hand or a quicky perl script) to be > >> required to really get it right. I don't recall the specifics (if I > >> ever knew :-) for when and how the locus tag is used, but I do know > >> that there is a list of things that it will try to use for the ID, and > >> while the locus is on the list, I don't know where it comes in the > >> list, so it's possible that other items might supersede it. > >> > >> Scott > >> > >> > >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > >> wrote: > >> > Hello, > >> > > >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a > >> > `locus_tag` > >> > in the fields and sometime it doesn't, even though the genabank has a > >> > locus > >> > tag. > >> > Also, is the ID always equivalent to the locus tag? > >> > > >> > Thanks, > >> > Dave > >> > _______________________________________________ > >> > Bioperl-l mailing list > >> > Bioperl-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. 
scott at scottcain > >> dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From david.breimann at gmail.com Sat Sep 18 06:45:13 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 12:45:13 +0200 Subject: [Bioperl-l] Extracting sequences from GFF3 Message-ID: As you know, GFF3 files can contain FASTA sequences after the features. How do I extract a specific FASTA sequence given it's ID? I tried: use Bio::Tools::GFF; use Data::Dumper; my $gffio = Bio::Tools::GFF->new( -file => "/path/to/file.gff", -gff_version => 3 ); print Dumper $gffio->get_seqs(); but $gffio->get_seqs() seems to return nothing, although the GFF3 has sequences and is also valid. By the way, I am able to parse the features themselves (using $gffio->next_feature()). Thanks, Dave From scott at scottcain.net Sat Sep 18 07:07:13 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 12:07:13 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, A fresh "pull" of the bioperl git repository shows that bp_genbank2gff3.pl already does this. It creates a locus_tag for all features that have a locus_tag, and uses the locus_tag for the ID when it can (it can't blindly use the locus tag for the ID since both the gene and the CDS have the same tag). Scott On Sat, Sep 18, 2010 at 11:20 AM, David Breimann wrote: > Hi Scott, > > Here is a very short genbank: > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk > > Note all genes in the genbank have locus tags. In the resulting GFF3, > however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no > idea why it deserves a special treatment... :) > > p.s. 
making this change (i.e., copying locus_tag to the GFF3 last column > whenever available) will really make my life easier. > > Thank you, > Dave > > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain wrote: >> >> Hi Dave, >> >> That seems perfectly reasonable. ?If you could point out a GenBank >> entry for which that does not happen, I could try to figure out why >> not. >> >> Scott >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann >> wrote: >> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will >> > be >> > always added to the GFF last column if it exists in the genbank, whether >> > it >> > is used as ID in the GFF or not. >> > >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain >> > wrote: >> >> >> >> Hi Dave, >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with >> >> GenBank files :-) ?It was designed initially to work on whole genome >> >> refseqs, and contains several ad hoc rules for trying to make it "do >> >> the right thing." ?In practice, it is not unusual for a post >> >> processing step (either by hand or a quicky perl script) to be >> >> required to really get it right. ?I don't recall the specifics (if I >> >> ever knew :-) for when and how the locus tag is used, but I do know >> >> that there is a list of things that it will try to use for the ID, and >> >> while the locus is on the list, I don't know where it comes in the >> >> list, so it's possible that other items might supersede it. >> >> >> >> Scott >> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann >> >> wrote: >> >> > Hello, >> >> > >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a >> >> > `locus_tag` >> >> > in the fields and sometime it doesn't, even though the genabank has a >> >> > locus >> >> > tag. >> >> > Also, is the ID always equivalent to the locus tag? 
>> >> > >> >> > Thanks, >> >> > Dave >> >> > _______________________________________________ >> >> > Bioperl-l mailing list >> >> > Bioperl-l at lists.open-bio.org >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > >> >> >> >> >> >> >> >> -- >> >> >> >> ------------------------------------------------------------------------ >> >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> >> dot net >> >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> >> Ontario Institute for Cancer Research >> > >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> Ontario Institute for Cancer Research > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Sat Sep 18 07:13:23 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 12:13:23 +0100 Subject: [Bioperl-l] Extracting sequences from GFF3 In-Reply-To: References: Message-ID: Hi Dave, I would use Bio::DB::SeqFeature::Store (either with a database on the backend or a flat file if a database isn't warranted): my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory', -dir => 'path/to/file' ); # Warning: this returns a string, and not a PrimarySeq object my $sequence = $db->fetch_sequence('Chr1',5000=>6000); Scott On Sat, Sep 18, 2010 at 11:45 AM, David Breimann wrote: > As you know, GFF3 files can contain FASTA sequences after the features. > > How do I extract a specific FASTA sequence given it's ID? 
> > I tried: > > use Bio::Tools::GFF; > use Data::Dumper; > > my $gffio = Bio::Tools::GFF->new( > -file => > "/path/to/file.gff", > -gff_version => 3 > ); > > print Dumper $gffio->get_seqs(); > > but $gffio->get_seqs() seems to return nothing, although the GFF3 has > sequences and is also valid. > > By the way, I am able to parse the features themselves (using > $gffio->next_feature()). > > > Thanks, > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Sat Sep 18 09:40:35 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 14:40:35 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, Let's keep the discussion on the mailing list so we can make sure that when this problem is solved, its resolution will be archived. I don't really understand what is going on either, though it would probably be a good idea to set your PERL5LIB env variable so that when you execute this script from the git repository that it will also uses BioPerl modules in the git repository instead of the ones that are installed in your "normal" path. Also, are you using any command line flags when executing it? I didn't. Scott On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote: > Yes, I'm using Ubuntu 10.04. > > That is really weired. 
> I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>   $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>   $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery to me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running Linux? If
>> so, are you familiar with the "locate" command:
>>
>>   locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>   sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann wrote:
>> > Your GFF seems fine. I get a very similar one, but with `Name=`
>> > instead of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with git is how the scripts are supposed to
>> > be updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for. Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Sat Sep 18 09:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation. The
script resides in scripts/Bio-DB-GFF/

Scott

On Sat, Sep 18, 2010 at 2:40 PM, David Breimann wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
>   $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From david.breimann at gmail.com Sat Sep 18 09:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

So let's do an intermediate summary of my situation:

I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), whereas Scott gets the expected
results using the latest version of bioperl.

I cloned a fresh version of bioperl-live into my ~/src:

  $ cd ~/src
  $ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:

  export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"

and ran

  $ source ~/.profile

I then downloaded a small genome from NCBI

  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

and tested the script:

  $ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789 GenBank region 1 6199 . + 1 ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789 GenBank mRNA 665 781 . - 1 ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789 GenBank CDS 665 781 . - 1 ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789 GenBank region 1 6199 . + 1 ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789 GenBank mRNA 665 781 . - 1 ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789 GenBank CDS 665 781 . - 1 ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

Note that the "Name=" tags in my version are replaced by "locus_tag=" in
Scott's, as desired.

I have no idea what is going on here...

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain wrote:

> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository, it also uses the
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it? I didn't.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote:
> > Yes, I'm using Ubuntu 10.04.
> >
> > That is really weird. I tried running the script from the bioperl-live dir
> > (which I just pulled using git), and I get the same results as before
> > (`Name` instead of `locus_tag`):
> >
> >   $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> >   $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
> >
> > Attached is the resulting GFF3.
> > I also attach a copy of bp_genbank2gff3.pl as found under
> > /home/dave/src/bioperl-live/blib/script.
> >
> > This is a real mystery to me!
> >
> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
> >>
> >> Typically I do build and install, but you can run it directly from the
> >> git checkout directory.
> >>
> >> For locating other versions of the script, are you running Linux?
If > >> so, are you familiar with the "locate" command: > >> > >> locate bp_genbank2gff3.pl > >> > >> If you've never used it before, you may need to update the database > >> the locate command uses as root: > >> > >> sudo updatedb > >> > >> Scott > >> > >> > >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann > >> wrote: > >> > Your gff seems fine. I get a vey similiar one, but with `Name=` > instaed > >> > of > >> > `locus_tag=`. > >> > > >> > I don't really know how to check for multiple bioperl installations. > >> > I'm using my personal server, so I don't mind removing and installing > >> > everything from scratch -- but I do'nt know ho to do that. > >> > > >> > Also, what I don't get with the git is how the scripts are supposed to > >> > be > >> > updated (unless you build and install). > >> > > >> > Thanks you! > >> > > >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain > wrote: > >> >> > >> >> Well, if you aren't getting the same results as me then I'd say you > >> >> aren't using the same version of the script :-) > >> >> > >> >> Unfortunately, the scripts are no longer automatically marked with > the > >> >> "internal" version information when committed, so there really isn't > >> >> anything in the script I can tell you to look for. Check for more > >> >> than one bioperl instance on your computer. > >> >> > >> >> I've attached the GFF3 file I got so you can look at it and tell me > if > >> >> it is what you expect. > >> >> > >> >> Scott > >> >> > >> >> > >> >> > >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann > >> >> wrote: > >> >> > Hi Scott, > >> >> > > >> >> > I just pulled the lated bioperl-live using git. > >> >> > I'm not sure how the scripts are updated, so I Build and installed > >> >> > anyway > >> >> > (perhaps exporting the path is supposed to be enough?) > >> >> > Anyway, I still get the same results. No locus_tag. > >> >> > How can I tell if I'm using the latest version of the script? > >> >> > > >> >> > Thanks again. 
> >> >> > > >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain > >> >> > wrote: > >> >> >> > >> >> >> Hi Dave, > >> >> >> > >> >> >> A fresh "pull" of the bioperl git repository shows that > >> >> >> bp_genbank2gff3.pl already does this. It creates a locus_tag for > >> >> >> all > >> >> >> features that have a locus_tag, and uses the locus_tag for the ID > >> >> >> when > >> >> >> it can (it can't blindly use the locus tag for the ID since both > the > >> >> >> gene and the CDS have the same tag). > >> >> >> > >> >> >> Scott > >> >> >> > >> >> >> > >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann > >> >> >> wrote: > >> >> >> > Hi Scott, > >> >> >> > > >> >> >> > Here is a very short genbank: > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk > >> >> >> > > >> >> >> > Note all genes in the genbank have locus tags. In the resulting > >> >> >> > GFF3, > >> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag. > I > >> >> >> > have > >> >> >> > no > >> >> >> > idea why it deserves a special treatment... :) > >> >> >> > > >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 > last > >> >> >> > column > >> >> >> > whenever available) will really make my life easier. > >> >> >> > > >> >> >> > Thank you, > >> >> >> > Dave > >> >> >> > > >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain < > scott at scottcain.net> > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> Hi Dave, > >> >> >> >> > >> >> >> >> That seems perfectly reasonable. If you could point out a > >> >> >> >> GenBank > >> >> >> >> entry for which that does not happen, I could try to figure out > >> >> >> >> why > >> >> >> >> not. 
> >> >> >> >> > >> >> >> >> Scott > >> >> >> >> > >> >> >> >> > >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann > >> >> >> >> wrote: > >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest > >> >> >> >> > locus_tag > >> >> >> >> > will > >> >> >> >> > be > >> >> >> >> > always added to the GFF last column if it exists in the > >> >> >> >> > genbank, > >> >> >> >> > whether > >> >> >> >> > it > >> >> >> >> > is used as ID in the GFF or not. > >> >> >> >> > > >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain > >> >> >> >> > > >> >> >> >> > wrote: > >> >> >> >> >> > >> >> >> >> >> Hi Dave, > >> >> >> >> >> > >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to > deal > >> >> >> >> >> with > >> >> >> >> >> GenBank files :-) It was designed initially to work on > whole > >> >> >> >> >> genome > >> >> >> >> >> refseqs, and contains several ad hoc rules for trying to > make > >> >> >> >> >> it > >> >> >> >> >> "do > >> >> >> >> >> the right thing." In practice, it is not unusual for a post > >> >> >> >> >> processing step (either by hand or a quicky perl script) to > be > >> >> >> >> >> required to really get it right. I don't recall the > specifics > >> >> >> >> >> (if I > >> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I > do > >> >> >> >> >> know > >> >> >> >> >> that there is a list of things that it will try to use for > the > >> >> >> >> >> ID, > >> >> >> >> >> and > >> >> >> >> >> while the locus is on the list, I don't know where it comes > in > >> >> >> >> >> the > >> >> >> >> >> list, so it's possible that other items might supersede it. > >> >> >> >> >> > >> >> >> >> >> Scott > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > >> >> >> >> >> wrote: > >> >> >> >> >> > Hello, > >> >> >> >> >> > > >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. 
Sometimes it > adds > >> >> >> >> >> > a > >> >> >> >> >> > `locus_tag` > >> >> >> >> >> > in the fields and sometime it doesn't, even though the > >> >> >> >> >> > genabank > >> >> >> >> >> > has a > >> >> >> >> >> > locus > >> >> >> >> >> > tag. > >> >> >> >> >> > Also, is the ID always equivalent to the locus tag? > >> >> >> >> >> > > >> >> >> >> >> > Thanks, > >> >> >> >> >> > Dave > >> >> >> >> >> > _______________________________________________ > >> >> >> >> >> > Bioperl-l mailing list > >> >> >> >> >> > Bioperl-l at lists.open-bio.org > >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> >> >> >> >> > > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> -- > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > ------------------------------------------------------------------------ > >> >> >> >> >> Scott Cain, Ph. D. scott > at > >> >> >> >> >> scottcain > >> >> >> >> >> dot net > >> >> >> >> >> GMOD Coordinator (http://gmod.org/) > >> >> >> >> >> 216-392-3087 > >> >> >> >> >> Ontario Institute for Cancer Research > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> -- > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > ------------------------------------------------------------------------ > >> >> >> >> Scott Cain, Ph. D. scott at > >> >> >> >> scottcain > >> >> >> >> dot net > >> >> >> >> GMOD Coordinator (http://gmod.org/) > >> >> >> >> 216-392-3087 > >> >> >> >> Ontario Institute for Cancer Research > >> >> >> > > >> >> >> > > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> > >> >> >> > >> >> >> > ------------------------------------------------------------------------ > >> >> >> Scott Cain, Ph. D. 
> >> >> >> scott at scottcain dot net
> >> >> >> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >> >> >> Ontario Institute for Cancer Research
>

From scott at scottcain.net Sat Sep 18 10:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

The only thing I can add is that I did a 'git diff genbank2gff3.PLS'
and found no differences. It occurred to me that perhaps I'd done
some fixing and not committed it, but it looks to me like that's not the
case (assuming I've managed to use git correctly (not a great
assumption, but I don't have another one to work with :-)).

Scott

On Sat, Sep 18, 2010 at 2:57 PM, David Breimann wrote:
> So let's do an intermediate summary of my situation:
> I'm using Ubuntu 10.04 and Perl 5.10.1.
> I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
> "locus_tag=" in the last GFF3 column), while Scott gets the expected results
> while using the latest version of bioperl.
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From j.scholtalbers at gmail.com Mon Sep 20 04:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working. I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields wrote:
> > Sounds like this is going through an initial indexing step (for
> > flatfiles). I would expect the initial indexing of the tables to take time
> > as you have to create the DB, but subsequent lookups post-indexing should be
> > much faster if the index is already present. Maybe Jason could answer in
> > more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow?
> >> after replacing 'flatfile' with a call to 'entrez' it took only
> >> 0.02 cpu seconds to come up with the same result.
> >>
> >> thanks,
> >> Eric
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From pcantalupo at gmail.com Mon Sep 20 10:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the subroutine
will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use Bio::DB::EUtilities;
use XML::Simple;    # XMLin is assumed to come from XML::Simple

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );
   my $res = $factory->get_Response->content;

   my $data = XMLin($res);
   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();
   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }

   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list context: the array; scalar context: a "; "-joined string
   return wantarray ? @lineage : join("; ", @lineage);
}

Paul Cantalupo
University of Pittsburgh

From jason at bioperl.org Mon Sep 20 11:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <4C977FFC.5000205@bioperl.org>

This works for me to get all the descendents from a sub-node. You have to
call the function with the database handle. I am not sure if the Taxon
implementation has a reference to the dbhandle or not:

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;

# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi/';
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => "$dbdir/nodes.dmp",
                                -namesfile => "$dbdir/names.dmp",
                               );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name } @d), "\n";

Hope that helps.

Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working. I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree.
However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > >> Thanks, that was indeed the answer to #2. Any idea about each_Descendent? >> Eric >> >> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields >> wrote: >> >>> Sounds like this is going through an initial indexing step (for >>> >> flatfiles). I would expect the initial indexing of the tables to take time >> as you have to create the DB, but subsequent lookups post-indexing should be >> much faster if the index is already present. Maybe Jason could answer in >> more detail? >> >>> chris >>> >>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: >>> >>> >>>> Hello, >>>> >>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl >>>> 5.8.5 with BioPerl 1.6.0 >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> It ran for 100 cpu seconds and output: >>>> >>>> 33090 Viridiplantae kingdom >>>> >>>> I was expecting it to also output the descendents. Some questions: >>>> >>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually >>>> implemented? It looks to be in Taxon.pm but it is not documented and >>>> when I ran Data::Dumper on $node the value '_desc' was empty. >>>> >>>> 2) is the flatfile reader always so slow? after replacing 'flatfile' >>>> with a call to 'entrez' it took only 0.02 cpu seconds to come >>>> up with the same result. 
>>>> >>>> thanks, >>>> Eric >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From j.scholtalbers at gmail.com Wed Sep 22 03:46:35 2010 From: j.scholtalbers at gmail.com (Jelle Scholtalbers) Date: Wed, 22 Sep 2010 09:46:35 +0200 Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent In-Reply-To: <4C977FFC.5000205@bioperl.org> References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu> <4C977FFC.5000205@bioperl.org> Message-ID: Hi Jason, this was the same method I was using. With the taxdump it works apparently, however it does not work with Entrez as source. So I will just stick to a up2date taxdump then. Thanks for your example. @Paul: Your method gives indeed the lineage but will only retrieve the ancestors. I want to retrieve all the descendents. Thx anyway. Cheers, Jelle On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich wrote: > > This works for me to get all the descendents from sub-node. You have to > call the function with the dabatase handle. I am not sure if the Taxon > implementation has reference to the dbhandle or not: > #!/usr/bin/perl -w > use strict; > use Bio::DB::Taxonomy; > my $dbdir = '/db/taxonomy/ncbi/'; #downloaded data from NCBI taxdump into > this directory > my $db = Bio::DB::Taxonomy->new(-source => 'flatfile', > -nodesfile => "$dbdir/nodes.dmp", > -namesfile => "$dbdir/names.dmp", > ); > my $taxa = $db->get_taxon(-taxonid => 151341); > my @d = $db->get_all_Descendents($taxa); > > print join("\n", map { $_->id . " " . $_->rank . " " . 
$_->scientific_name > } @d), "\n"; > > > Hope that helps. > Jelle Scholtalbers wrote, On 9/20/10 1:04 AM: > > Hi, > > I'm trying to get all descendents for a specific taxon using Entrez. > each_Descendent and get_all_Descendents don't seem to be implemented or > working. I then tried by getting the tree for this taxon using > Bio::DB::Taxonomy's get_tree. However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > > > Thanks, that was indeed the answer to #2. Any idea about each_Descendent? > Eric > > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields > wrote: > > > Sounds like this is going through an initial indexing step (for > > > flatfiles). I would expect the initial indexing of the tables to take time > as you have to create the DB, but subsequent lookups post-indexing should be > much faster if the index is already present. Maybe Jason could answer in > more detail? > > > chris > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: > > > > Hello, > > I tried the Bio::DB::Taxonomy example on this wiki page using perl > 5.8.5 with BioPerl 1.6.0http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > It ran for 100 cpu seconds and output: > > 33090 Viridiplantae kingdom > > I was expecting it to also output the descendents. Some questions: > > 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually > implemented? It looks to be in Taxon.pm but it is not documented and > when I ran Data::Dumper on $node the value '_desc' was empty. > > 2) is the flatfile reader always so slow? after replacing 'flatfile' > with a call to 'entrez' it took only 0.02 cpu seconds to come > up with the same result. 
> > thanks,
> Eric
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From waldenhe at muohio.edu Fri Sep 24 15:15:48 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Fri, 24 Sep 2010 15:15:48 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>

Hello Bioperl Masters,

I am trying to perform a local blast with a query list of fasta files against a db of other fasta files. I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module. I have downloaded BLAST+ 2.2.24+ from the NCBI website and installed it on my ubuntu machine. I am using bioperl-1.5.2.

The snippet of code that is giving me errors is below:

my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

The error I am getting is:

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284.

I think the real problem is the "cannot find path to blastall" warning. From reading around on different forums, I have to make a .ncbirc text file with the location of BLAST+ 2.2.24+ on my machine. I have that file in my /home folder.

How do I get StandAloneBlastPlus synced up with BLAST+ 2.2.24+? Am I approaching this right?
Thankyou, Hans Waldenmaier From ross at cuhk.edu.hk Sat Sep 25 04:30:39 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Sat, 25 Sep 2010 16:30:39 +0800 Subject: [Bioperl-l] perl for GO In-Reply-To: References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu> Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk> Given a set of GO IDs, e.g. GO:0008150 GO:0005750 GO:0006122 GO:0008121 GO:0003674 GO:0005575 GO:0008150 GO:0009507 GO:0009535 GO:0009567 GO:0009977 GO:0010027 GO:0031361 from http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo one can manually examine the hierarchy. Although there is go-perl (http://search.cpan.org/~cmungall/go-perl/) and go-db-perl (http://search.cpan.org/~cmungall/go-db-perl/), as a life science student who just learns Perl, I find it difficult to draw a hierarchy tree (or simply make it a table to count the occurrence) to produce something like: biological_process (4) *** cellular process (4) ****** cell adhesion (1) ****** cell differention (3) Molecular function (4) Cellular component (4) Can anybody advise? I don't need any fancy figures at all... From David.Messina at sbc.su.se Sun Sep 26 12:11:54 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Sun, 26 Sep 2010 18:11:54 +0200 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu> References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu> Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se> Hi Hans, > I think the real problem is the "cannot find path to Blastall. Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc. See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code. Also, you probably need to upgrade your BioPerl installation. 
I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it. Dave From maj at fortinbras.us Sun Sep 26 20:43:15 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 27 Sep 2010 00:43:15 +0000 Subject: [Bioperl-l] StandAloneBlastPlus Message-ID: Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following: git clone http://github.com/bioperl/bioperl-live.git git clone http://github.com/bioperl/bioperl-run.git (i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl cheers MAJ -------------------------- Mark A. Jensen, PhD Senior Consultant Fortinbras Research http://www.fortinbras.us >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Sunday, September 26, 2010 12:11 PM >To: 'Waldenmaier, Hans Eugene' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Hi Hans, > > >> I think the real problem is the "cannot find path to Blastall. > >Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc. > >See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code. > >Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it. > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Sep 27 17:07:11 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Sep 2010 16:07:11 -0500 Subject: [Bioperl-l] Client-side Scansite Bioperl module In-Reply-To: References: Message-ID: Sorry, didn't see this being responded to on-list (been off the radar the last month). 
I think this is a good idea, but I'm wondering if this might be better as a separate release on CPAN from bioperl core, seeing as we're in the prelim stages after the next bioperl release of modularizing the current bioperl core into smaller independent releases. chris On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote: > hi guys > > it seems Bioperl contains a wrapper [1] for Scansite [2]. in what extent would it make sense to integrate a client-sided version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-sided version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D! > > best wishes > johnny > > [1] Bio::Tools::Analysis::Protein::Scansite > [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593 > > ******************** > Jonathan Rameseder > Ph.D. Candidate > Computational Systems Biology Initiative > Koch Institute for Integrative Cancer Research > Massachusetts Institute of Technology > ******************** From gandipalem at gmail.com Tue Sep 28 00:09:06 2010 From: gandipalem at gmail.com (bv s) Date: Tue, 28 Sep 2010 09:39:06 +0530 Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19 In-Reply-To: References: Message-ID: Dear Sir/Madam, Any one can tell how to use the make_primers.pl script? What is Coordination file? Regards Suresh Scholar, National Bureau Of Plant Genetic Resources, New Delhi. 
From David.Messina at sbc.su.se Tue Sep 28 03:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running. But I still cannot get Perl to see BLAST.

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?

> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
>
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.

Dave

From David.Messina at sbc.su.se Tue Sep 28 03:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts of genes by transformation of PCR products (ref: Datsenko & Wanner, PNAS 2000). A tab-delimited file containing ORF START STOP is read, and primers flanking the start & stop coordinates are designed based on the user-designated sequence file. In addition, primers flanking the knockout regions are chosen for PCR screening purposes once the knockout is generated. The script uses Bioperl in order to determine the primer sequences, which requires getting subsequences and reverse complementing some of the objects.

Dave

From maj at fortinbras.us Tue Sep 28 07:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

The module checks the env variable BLASTPLUSDIR for the executable; you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
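
To double-check the setting above, here is a minimal shell sketch, assuming the BLAST+ programs really live in /home/hans/BLAST/bin as elsewhere in this thread. The helper name check_blast_dir is made up for illustration and is not part of BioPerl or BLAST+; it only tests that an executable such as blastp exists in the directory BLASTPLUSDIR points to.

```shell
#!/bin/sh
# Hypothetical helper (not part of BioPerl or BLAST+): report whether a
# given program exists and is executable inside a directory.
check_blast_dir() {
    dir="$1"
    prog="${2:-blastp}"   # default to blastp, one of the BLAST+ programs
    if [ -x "$dir/$prog" ]; then
        echo "found $dir/$prog"
        return 0
    fi
    echo "no executable $prog in $dir" >&2
    return 1
}

# Assumption: fall back to the path used in this thread when the
# variable is not already set in the environment.
BLASTPLUSDIR="${BLASTPLUSDIR:-/home/hans/BLAST/bin}"
export BLASTPLUSDIR

check_blast_dir "$BLASTPLUSDIR" blastp || echo "check BLASTPLUSDIR and your .bashrc"
```

If the check fails in a fresh terminal but passes right after typing the export by hand, the .bashrc line is probably not being read or is being overridden later.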
MAJ

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Tuesday, September 28, 2010 03:53 AM
>To: 'Waldenmaier, Hans Eugene'
>Cc: 'Mark A. Jensen', bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>> I can get the command-line Blast running. But I still cannot get Perl to see BLAST.
>
>Type the following on the command line:
>perl -e 'print $ENV{PATH}, "\n"'
>
>You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing
>export PATH=/home/hans/BLAST/bin:${PATH}
>
>on the command line and then type
>perl -e 'print $ENV{PATH}, "\n"'
>
>again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?
>
>
>> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
>> export PATH=${PATH}:/home/hans/BLAST/bin
>> export BLASTDIR=/home/hans/BLAST/
>>
>> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.
>
>It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.
>
>It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.
> > >Dave >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From waldenhe at muohio.edu Tue Sep 28 00:52:56 2010 From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene) Date: Tue, 28 Sep 2010 00:52:56 -0400 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: References: Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu> Thanks Guys, I have run those steps, my current version now is: hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' 1.006001 But I am still having problems. I am having slightly more luck with using StandAloneBlast and the regular BLAST form NCBI. I can get the command-line Blast running. But I still cannot get Perl to see BLAST. Following the instructions from the HOWTO's and the O'reilly book BLAST, I have gotten to the setting up the environmental variables part, which is where I think my problems are arising now. I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST: export PATH=${PATH}:/home/hans/BLAST/bin export BLASTDIR=/home/hans/BLAST/ Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special. Thanks for the help, Hans ________________________________________ From: Mark A. Jensen [maj at fortinbras.us] Sent: Sunday, September 26, 2010 8:43 To: Dave Messina; Waldenmaier, Hans Eugene Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] StandAloneBlastPlus Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following: git clone http://github.com/bioperl/bioperl-live.git git clone http://github.com/bioperl/bioperl-run.git (i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl cheers MAJ -------------------------- Mark A. 
Jensen, PhD Senior Consultant Fortinbras Research http://www.fortinbras.us >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Sunday, September 26, 2010 12:11 PM >To: 'Waldenmaier, Hans Eugene' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Hi Hans, > > >> I think the real problem is the "cannot find path to Blastall. > >Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc. > >See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code. > >Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it. > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Tue Sep 28 11:04:07 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 28 Sep 2010 15:04:07 +0000 Subject: [Bioperl-l] StandAloneBlastPlus Message-ID: Should work from .bashrc, Hans. Also add export BLASTPLUSDIR=/home/hans/BLAST/bin It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program. You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing set | grep BLAST cheers MAJ >-----Original Message----- >From: Waldenmaier, Hans Eugene [mailto:waldenhe at muohio.edu] >Sent: Tuesday, September 28, 2010 12:52 AM >To: 'Mark A. 
Jensen', 'Dave Messina' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Thanks Guys, > >I have run those steps, my current version now is: >hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' >1.006001 > >But I am still having problems. > >I am having slightly more luck with using StandAloneBlast and the regular BLAST form NCBI. I can get the command-line Blast running. But I still cannot get Perl to see BLAST. >Following the instructions from the HOWTO's and the O'reilly book BLAST, I have gotten to the setting up the environmental variables part, which is where I think my problems are arising now. >I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST: >export PATH=${PATH}:/home/hans/BLAST/bin >export BLASTDIR=/home/hans/BLAST/ > >Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special. > >Thanks for the help, > >Hans >________________________________________ >From: Mark A. Jensen [maj at fortinbras.us] >Sent: Sunday, September 26, 2010 8:43 >To: Dave Messina; Waldenmaier, Hans Eugene >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following: > >git clone http://github.com/bioperl/bioperl-live.git >git clone http://github.com/bioperl/bioperl-run.git > >(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl > >cheers MAJ > >-------------------------- >Mark A. 
Jensen, PhD >Senior Consultant >Fortinbras Research >http://www.fortinbras.us > >>-----Original Message----- >>From: Dave Messina [mailto:David.Messina at sbc.su.se] >>Sent: Sunday, September 26, 2010 12:11 PM >>To: 'Waldenmaier, Hans Eugene' >>Cc: bioperl-l at bioperl.org >>Subject: Re: [Bioperl-l] StandAloneBlastPlus >> >>Hi Hans, >> >> >>> I think the real problem is the "cannot find path to Blastall. >> >>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc. >> >>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code. >> >>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it. >> >> >> >>Dave >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From chiragmatkarbioinfo at gmail.com Thu Sep 30 08:20:35 2010 From: chiragmatkarbioinfo at gmail.com (chirag matkar) Date: Thu, 30 Sep 2010 19:20:35 +0700 Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id Message-ID: Hello all, Is there any module to fetch dna sequence data from ensemble gene id? -- Regards, Chirag Matkar From jun.yin at ucd.ie Thu Sep 30 09:36:31 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Thu, 30 Sep 2010 14:36:31 +0100 Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id In-Reply-To: References: Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie> Hi, Chirag, BioPerl does not have any module to retrieve data from Ensembl. But Ensembl provides a BioPerl-like interface on that function. 
You can visit Ensembl's website on how to use that module:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
Sent: Thursday, September 30, 2010 1:21 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id

Hello all,
Is there any module to fetch dna sequence data from Ensembl gene id?

--
Regards,
Chirag Matkar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com

From cjfields at illinois.edu Thu Sep 30 11:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID:

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
>
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back. I think it still works rather well; at least the tests pass.
You might get more out of using the Ensembl API directly, as Jun states, though; YMMV.

BTW, the Ensembl API also works with the latest BioPerl code, regardless of what the Ensembl website says (e.g. that they only support v1.2.3). I haven't heard more about whether this discrepancy was supposed to be addressed at some point.

chris

> You can visit Ensembl's website on how to use that module:
> http://www.ensembl.org/info/data/api.html
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
> Sent: Thursday, September 30, 2010 1:21 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
>
> Hello all,
> Is there any module to fetch dna sequence data from Ensembl gene id?
>
> --
> Regards,
> Chirag Matkar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5377 (20100818) __________
>
> The message was checked by ESET Smart Security.
> > http://www.eset.com > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From A.Vakhrusheva at lumc.nl Wed Sep 29 09:28:54 2010 From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl) Date: Wed, 29 Sep 2010 15:28:54 +0200 Subject: [Bioperl-l] Bio::Matrix::MatrixI Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern> Bio::Matrix::MatrixI I have a question concerning this interface. I want to calculate p distances matrix, but what format is acceptable for input? Phylip doesn't work Anna From shalabh.sharma7 at gmail.com Wed Sep 1 16:56:35 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 1 Sep 2010 16:56:35 -0400 Subject: [Bioperl-l] Bio::SearchIO::hmmer Message-ID: Hi , I am trying to parse hmmsearch report (from HMMER3). I am using the script mentioned here: http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm I am not getting anything but this "amoA_10genes_align.fasta.2 [M=247] for HMM" as the output, i am not even getting any error. I am attaching the hmmsearch report (just a test report) which i tried to test against the parser. I would really appreciate if anyone can help me out. Thanks Shalabh Sharma -------------- next part -------------- # hmmsearch :: search profile(s) against a sequence database # HMMER 3.0 (March 2010); http://hmmer.org/ # Copyright (C) 2010 Howard Hughes Medical Institute. # Freely distributed under the GNU General Public License (GPLv3). 
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # query HMM file: amoA_10genes.hmm # target sequence database: test.faa # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: amoA_10genes_align.fasta.2 [M=247] Scores for complete sequences (score includes all domains): --- full sequence --- --- best 1 domain --- -#dom- E-value score bias E-value score bias exp N Sequence Description ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- 1.6e-72 231.1 5.1 1.7e-72 231.0 3.5 1.0 1 gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte 1.6e-72 231.1 5.1 1.7e-72 231.0 3.5 1.0 1 gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte Domain annotation for each sequence (and alignments): >> gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacterium] # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 231.0 3.5 1.7e-72 1.7e-72 113 245 .. 1 144 [. 1 146 [. 
0.95 Alignments for each domain: == domain 1 score: 231.0 bits; conditional E-value: 1.7e-72 amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe gi|63021979|gb|AAY26564.1| 1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 8********************************************************************************** PP amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af +k+ gi|63021979|gb|AAY26564.1| 84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144 **********************************************966666666655555 PP >> gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacterium] # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 231.0 3.5 1.7e-72 1.7e-72 113 245 .. 1 144 [. 1 146 [. 
0.95 Alignments for each domain: == domain 1 score: 231.0 bits; conditional E-value: 1.7e-72 amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe gi|63021981|gb|AAY26565.1| 1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 8********************************************************************************** PP amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af +k+ gi|63021981|gb|AAY26565.1| 84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144 **********************************************966666666655555 PP Internal pipeline statistics summary: ------------------------------------- Query model(s): 1 (247 nodes) Target sequences: 2 (300 residues) Passed MSV filter: 2 (1); expected 0.0 (0.02) Passed bias filter: 2 (1); expected 0.0 (0.02) Passed Vit filter: 2 (1); expected 0.0 (0.001) Passed Fwd filter: 2 (1); expected 0.0 (1e-05) Initial search space (Z): 2 [actual number of targets] Domain search space (domZ): 2 [number of targets reported over threshold] # CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08 # Mc/sec: 0.93 // From thomas.sharpton at gmail.com Wed Sep 1 17:29:26 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 1 Sep 2010 14:29:26 -0700 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: References: Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Hi Shalabh, We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to use the HMMER3 version, as found here: http://github.com/bioperl/bioperl-hmmer3 Hope this helps, T On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote: > Hi , > I am trying to parse hmmsearch report (from HMMER3). 
I am using > the > script mentioned here: > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm > > I am not getting anything but this "amoA_10genes_align.fasta.2 > [M=247] for > HMM" as the output, i am not even getting any error. > I am attaching the hmmsearch report (just a test report) which i > tried to > test against the parser. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Thu Sep 2 04:44:58 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 2 Sep 2010 10:44:58 +0200 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> On Wed, 1 Sep 2010 14:29:26 -0700 Thomas Sharpton wrote: Hi, > We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to > use the HMMER3 version, as found here: > > http://github.com/bioperl/bioperl-hmmer3 Actually it's now included in the bioperl-live repository, but the code hasn't made it into a release yet. http://github.com/bioperl/bioperl-live.git Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From e.stupka at ucl.ac.uk Thu Sep 2 08:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia


---
'"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
~ Stephen Hawkings

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912

From cjfields at illinois.edu Thu Sep 2 10:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID:

Done! Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
>
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
>
> thanks!
> > Elia > > > --- > '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." > ~ Stephen Hawkings > > Senior Lecturer, Bioinformatics > Scientific Director - Bioinformatics, UCL Genomics > > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Institute of Cell and Molecular Science > Barts and The London School of Medicine and Dentistry > 4 Newark Street > Whitechapel > London > E1 2AT > > Office (UCL): +44 207 679 6493 > Fax: +44 0207 6796817 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 787 6478912 > > > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From J.Christopher.Ellis at duke.edu Thu Sep 2 10:53:34 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 2 Sep 2010 10:53:34 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <53096.1283439214@duke.edu> Chris have you had any luck with this? Thanks, Chris On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent: Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>                                        -db             => 'taxonomy',
>                                        -dbfrom         => 'protein',
>                                        -correspondence => 1,
>                                        -id             => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db    => 'taxonomy',
>                                     -id    => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
>
> Chris

On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
>
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
>
> chris
>
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>
> > Hi All,
> >
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> > > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found athttp://bioperl.org/wiki/Species_names_from_accession_numbers">http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > > > Thanks for all your help in advance! > > > > Chris > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l">http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Sep 2 12:21:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 02 Sep 2010 11:21:48 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <53096.1283439214@duke.edu> References: <53096.1283439214@duke.edu> Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu> Chris, There are a few things wrong with the original script, so I'll fix them. Basically, it makes the assumption that every ID in the original list is found. The problem: eutils only reports back data it finds, silently discarding IDs that don't match. So, using the original ID list when building the hashes needs a bit more error checking. Here's the revised script that works for me. https://gist.github.com/f5db90a432fed68548d4 I'm also adding a check to ensure all IDs are defined prior to adding them to the param string, just in case. chris On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote: > Chris have you had any luck with this? 
> > Thanks, > Chris > > On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent: > Yes, I see that one. It may be the ID hash that is being > returned is empty. I'll look into it. > > -c > > On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > > > Hi Chris, > > > > The error is... > > > > "Use of uninitialized value $id in join or string at > C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm > line 363." > > > > The script from > http://bioperl.org/wiki/Species_names_from_accession_numbers">http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... > > > > use Bio::DB::EUtilities; > > > > > > > > > > > > > > > > > > my (%taxa, @taxa); > > > > > > > > my (%names, %idmap); > > > > > > > > > > > > > > > > > > # these are protein ids; nuc ids will work by changing > -dbfrom => 'nucleotide', > > > > > > > > # (probably) > > > > > > > > > > > > > > > > > > my @ids = qw(1621261 89318838 68536103 > > > > 20807972 > > 730439); > > > > > > > > > > > > > > my $factory = Bio::DB::EUtilities->new( > > > > - > > eutil => 'elink', > > > > > > -db => 'taxonomy', > > > > > > > > > > -dbfrom => 'protein', > > > > > > > > > > -correspondence => 1, > > > > > > > > > > -id => \@ids); > > > > > > > > > > > > > > > > > > # iterate through the LinkSet objects > > > > > > > > while (my $ds = $factory->next_LinkSet) { > > > > > > > > > > $taxa{($ds->get_submitted_ids)[0] > > > > } > > = ($ds->get_ids)[0] > > > > } > > > > > > > > > > > > > > > > > > @taxa = @taxa{@ids}; > > > > > > > > > > > > > > > > > > $factory = Bio::DB::EUtilities->new(-eutil > > > > => > > 'esummary', > > > > > > -db => 'taxonomy', > > > > > > > > > > -id => \@taxa ); > > > > > > > > > > > > > > > > > > while (local $_ = $factory->next_DocSum) > > > > > > { > > > > > > $names{($_->get_contents_by_name('TaxId')) > > > > [ > > 0]} = > > > > ($_->get_contents_by_name('ScientificName'))[0 > > > > ] > > ; > > > > } > > > > > > > > > > > > > > > > > > foreach (@ids) { > > > > 
> > > > > > $idmap{$_} = $names{$taxa{$_ > > > > } > > }; > > > > } > > > > > > > > > > > > > > > > > > # %idmap is > > > > > > > > # 1621261 => 'Mycobacterium tuberculosis H37Rv' > > > > > > > > # 20807972 => 'Thermoanaerobacter tengcongensis MB4' > > > > > > > > # 68536103 => 'Corynebacterium jeikeium K411' > > > > > > > > # 730439 => 'Bacillus caldolyticus' > > > > > > > > # 89318838 => undef (this record has been removed from the > db) > > > > > > > > > > > > > > > > > > 1; > > > > > > Thanks, > > > > > > > > Chris > > > > > > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu > sent: > > Chris, > > > > Regarding a fix for that script, we would have to see your > modified script and the error. However, there are modules > within BioPerl to essentially do what you want, in particular, > Bio::DB::Taxonomy. > > > > chris > > > > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote: > > > > > Hi All, > > > > > > I am trying to extract the entire taxonomy of an organism > including the > > > classifications. Some thing like... > > > > > > Phylum:Proteobacteria, Class:Gammaproteobacteria, > Order:Enterobacteriales, Family:Enterobacteriaceae, > Genus:Escherichia > > > > > > I am not worried about format just that I get the > information and the associated level of hierarchy. The script > found > http://bioperl.org/wiki/Species_names_from_accession_numbers% > 26quot%3B%26gt% > 3Bhttp://bioperl.org/wiki/Species_names_from_accession_numbers">athttp://bioperl.org/wiki/Species_names_from_accession_numbers">http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried run it but got an error. > > > > > > My first question is "Is there a known fix for this?" and > my second question is how do I get the full hierarchical > information (as seen above) with the taxonomy db? > > > > > > Thanks for all your help in advance! 
> > > > > > Chris > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l% > 26quot%3B%26gt% > 3Bhttp://lists.open-bio.org/mailman/listinfo/bioperl-l">http://lists.open-bio.org/mailman/listinfo/bioperl-l">http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From thomas.sharpton at gmail.com Thu Sep 2 12:34:07 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Thu, 2 Sep 2010 09:34:07 -0700 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> Message-ID: So it is! I'm paying attention, I swear I am.... Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem, do let us know. Best, Tom On Sep 2, 2010, at 1:44 AM, Kai Blin wrote: > On Wed, 1 Sep 2010 14:29:26 -0700 > Thomas Sharpton wrote: > > Hi, > >> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to >> use the HMMER3 version, as found here: >> >> http://github.com/bioperl/bioperl-hmmer3 > > Actually it's now included in the bioperl-live repository, but the > code > hasn't made it into a release yet. > > http://github.com/bioperl/bioperl-live.git > > Cheers, > Kai > -- > Dipl.-Inform. 
> Kai Blin kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From johnny at mit.edu Sat Sep 4 11:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID:

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (e.g. enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow?

please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************

From David.Messina at sbc.su.se Mon Sep 6 08:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To:
References:
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl!

In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?
If you haven't already, you might want to take a look at some of our developer documentation. For example: http://www.bioperl.org/wiki/Bioperl_Best_Practices http://www.bioperl.org/wiki/Advanced_BioPerl Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway. I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway: http://www.bioperl.org/wiki/Proposed_BioPerl_changes http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208 Dave From ross at cuhk.edu.hk Tue Sep 7 04:28:00 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 16:28:00 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> By the following codes, I wanna index the 4G nr database, however, the index file is > 1T and the job has been running for weeks and still hasn't finished. Could anybody tell me how you accomplish the goal? Thanks in advance. 

use strict;
use Bio::DB::Flat::BinarySearch;

(my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

# use single quotes so you don't have to write
# regular expressions like "gi\\|(\\d+)"
#my $primary_pattern = '^>(\S+)';
#if ($fullHeader == 1) {
my $primary_pattern = '^>(.+)';
#}
my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
#$string =~ s/$primary_pattern/RRR/g;
#print "$string\n";

# one or more patterns stored in a hash:
my $secondary_patterns = {GI => 'gi\|(\d+)'};

my $db = Bio::DB::Flat::BinarySearch->new(
    -directory          => $baseDir,
    -dbname             => $dbName,
    -write_flag         => 1,
    -primary_pattern    => $primary_pattern,
    -primary_namespace  => 'ACC',
    -secondary_patterns => $secondary_patterns,
    -verbose            => 1,
    -format             => 'fasta' );

$db->build_index($seqFile);

From David.Messina at sbc.su.se Tue Sep 7 05:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for? If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl. They're distributed with Blast+ and available here:

ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.

I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:

ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.

Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself).
Some nice examples there showing off the flexibility of this latest version of the software. Dave From ross at cuhk.edu.hk Tue Sep 7 05:18:16 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 17:18:16 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <4C860148.3030000@fmi.ch> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> The reason is that I have to retrieve the specific information of the matched sequences, e.g. extract the 64th amino acid of the top matched sequence. Is there any way to achieve that? -----Original Message----- From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] Sent: Tuesday, September 07, 2010 5:09 PM To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk Subject: Re: [Bioperl-l] Indexing nr database Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? Thanks in > advance. 
> > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 05:09:28 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:09:28 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> Message-ID: <4C860148.3030000@fmi.ch> Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? 
Thanks in > advance. > > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 05:33:46 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:33:46 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> Message-ID: <4C8606FA.3000509@fmi.ch> On 09/07/2010 11:18 AM, Ross KK Leung wrote: > The reason is that I have to retrieve the specific information of the > matched sequences, e.g. extract the 64th amino acid of the top matched > sequence. Is there any way to achieve that? 
"blastdbcmd" has several options like "-range" and even if "blastdbcmd" does not give you the subset of information you want to fetch, I am still convinced you are quicker by fetching the complete entry with"blastdbcmd" and then parse the required data out of just one entry. Hans > -----Original Message----- > From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] > Sent: Tuesday, September 07, 2010 5:09 PM > To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk > Subject: Re: [Bioperl-l] Indexing nr database > > Hi > > > why don't you use the pre-indexed BLAST files from NCBI: > > ftp://ftp.ncbi.nih.gov/blast/db/ > > you can use them to fetch individual sequences by gi number or accession > with the tool "blastdbcmd" from blast+ binaries: > > ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ > > > regards, Hans > > > > On 09/07/2010 10:28 AM, Ross KK Leung wrote: >> By the following codes, I wanna index the 4G nr database, however, the > index >> file is> 1T and the job has been running for weeks and still hasn't >> finished. Could anybody tell me how you accomplish the goal? Thanks in >> advance. 
>>
>> use strict;
>> use Bio::DB::Flat::BinarySearch;
>>
>> (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>>
>> # use single quotes so you don't have to write
>> # regular expressions like "gi\\|(\\d+)"
>> #my $primary_pattern = '^>(\S+)';
>> #if ($fullHeader == 1) {
>> my $primary_pattern = '^>(.+)';
>> #}
>> my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
>> H37Rv complete genome";
>> #$string =~ s/$primary_pattern/RRR/g;
>> #print "$string\n";
>>
>> # one or more patterns stored in a hash:
>> my $secondary_patterns = {GI => 'gi\|(\d+)'};
>>
>> my $db = Bio::DB::Flat::BinarySearch->new(
>>     -directory          => $baseDir,
>>     -dbname             => $dbName,
>>     -write_flag         => 1,
>>     -primary_pattern    => $primary_pattern,
>>     -primary_namespace  => 'ACC',
>>     -secondary_patterns => $secondary_patterns,
>>     -verbose            => 1,
>>     -format             => 'fasta' );
>>
>> $db->build_index($seqFile);
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From fs5 at sanger.ac.uk Tue Sep 7 08:09:52 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 07 Sep 2010 13:09:52 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>

I am working a lot with feature-rich Bio::Seq objects these days and thought that it would be really nice if I could do something like:

my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');

instead of having to grep for the feature every time.
There could then be 'by_tag' and 'by_region' options as well.

According to the Bio::Seq docs, something like this seems to be planned at some stage. I would be willing to contribute to this feature if I can and if this isn't already being implemented by somebody else. Does anybody know the state of this feature? Frank -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jason at bioperl.org Tue Sep 7 13:36:07 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 07 Sep 2010 10:36:07 -0700 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <4C867807.2040907@bioperl.org> And the implementation would just be something like this? my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] eq 'my_gene' } $seq->get_SeqFeatures(); I think any implementation would be if we moved from the in-memory arrays & hash-based system to a sqlite db on the back-end for how Sequence and Feature objects are stored. This would be a somewhat slower but wouldn't have performance/memory problems we get for sequences with many annotations. -jason Frank Schwach wrote, On 9/7/10 5:09 AM: > I am working a lot with feature-rich Bio::Seq objects these days and > thought that it would be really nice if I could do something like: > > my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > > instead of having to grep for the feature every time. > There could then be 'by_tag' and 'by_region' options as well. > > According to the Bio::Seq docs, something like this seems to be planned > at some stage. 
I would be willing to contribute to this feature if I can > and if this isn't already being implemented by somebody else. > Does anybody know the state of this feature? > > Frank > > > > > > > From fs5 at sanger.ac.uk Wed Sep 8 04:42:57 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 08 Sep 2010 09:42:57 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <4C867807.2040907@bioperl.org> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> Hi Jason, Yes, I guess that would be the simplest way of doing it - basically just doing it the way the docs suggest for getting at a specific feature but hiding the grep behind a Bio::Seq method with search parameters. But we could also build a hash of feature tags as the Bio::Seq is built so that retrieval is more efficient. This could also be used to implement a bin indexing scheme for range queries, similar to what Bio::DB::GFF does. Is a move to an sqlite backend planend for the near future? Frank On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: > And the implementation would just be something like this? > > my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] > eq 'my_gene' } $seq->get_SeqFeatures(); > > I think any implementation would be if we moved from the in-memory > arrays & hash-based system to a sqlite db on the back-end for how > Sequence and Feature objects are stored. > This would be a somewhat slower but wouldn't have performance/memory > problems we get for sequences with many annotations. 
> > -jason > Frank Schwach wrote, On 9/7/10 5:09 AM: > > I am working a lot with feature-rich Bio::Seq objects these days and > > thought that it would be really nice if I could do something like: > > > > my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > > > > instead of having to grep for the feature every time. > > There could then be 'by_tag' and 'by_region' options as well. > > > > According to the Bio::Seq docs, something like this seems to be planned > > at some stage. I would be willing to contribute to this feature if I can > > and if this isn't already being implemented by somebody else. > > Does anybody know the state of this feature? > > > > Frank > > > > > > > > > > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From stefan.kirov at bms.com Wed Sep 8 11:09:55 2010 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 08 Sep 2010 11:09:55 -0400 Subject: [Bioperl-l] Another interesting Javascript library Message-ID: <4C87A743.5010109@bms.com> Sorry for off topic, but I believe a lot of people can find this quite useful: "CanvasXpress is a javascript library based on the tag implemented in HTML5. I developed this library as the core visualization component for our BMS systems biology platform which I hope to release soon. The basic idea was to have generic and simple way to display genomics data. CanvasXpress supports bar graphs, line graphs, bar-line combination graphs, boxplots, dotplots, area graphs, stacked graphs, percentage-stacked graphs, correlation plots, Venn diagrams, heatmaps, newick trees, 2D-scatter plots, 2D-scatter bubble plots, 3D-scatter plots, pie charts, networks (or pathways), and a genome browser. 
It also supports a few data transformations like log and exponential transformation, z-score, percentile transformation and ratio. It also support grouping of samples, zooming, events ... yada, yada, yada ... and more importantly I created an Ext panel for it. Take a look. http://canvasxpress.org/" Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: stefan_kirov.vcf Type: text/x-vcard Size: 207 bytes Desc: not available URL: From alperyilmaz at gmail.com Wed Sep 8 12:47:42 2010 From: alperyilmaz at gmail.com (Alper Yilmaz) Date: Wed, 8 Sep 2010 12:47:42 -0400 Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates Message-ID: Hi, I have a GFF file listing mRNA and CDS coordinates for every transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates based on that information. I was wondering, if there's already made script for that purpose that you're aware of. I already uploaded the GFF file into Bio::DB::SeqFeature database, so I can utilize both flat file or database based scripts. thanks, Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 From cjfields at illinois.edu Wed Sep 8 19:20:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Sep 2010 18:20:09 -0500 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. 
This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. >> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? 
>>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Sep 9 01:51:53 2010 From: jason at bioperl.org (Jason Stajich) Date: Wed, 08 Sep 2010 22:51:53 -0700 Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates In-Reply-To: References: Message-ID: <4C8875F9.6020502@bioperl.org> Hi Alper - This script operates on gtf so doesn't quite do what you want but could be modified to be simpler to just look at the CDS and mRNA rather than the exon,start/stop codon info http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl Otherwise I think there make be some easy ways to do this from some tools in MAKER too. -jason Alper Yilmaz wrote, On 9/8/10 9:47 AM: > Hi, > > I have a GFF file listing mRNA and CDS coordinates for every > transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates > based on that information. I was wondering, if there's already made > script for that purpose that you're aware of. > > I already uploaded the GFF file into Bio::DB::SeqFeature database, so > I can utilize both flat file or database based scripts. 
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From fs5 at sanger.ac.uk Thu Sep 9 04:10:36 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 09:10:36 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk>

So something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features, and that would then be
sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan?
Is there any way I can get involved, or is it better to wait for other
features to be developed first?

Cheers,

Frank

On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote:
> Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
> > chris > > On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > > > Hi Jason, > > > > Yes, I guess that would be the simplest way of doing it - basically just > > doing it the way the docs suggest for getting at a specific feature but > > hiding the grep behind a Bio::Seq method with search parameters. But we > > could also build a hash of feature tags as the Bio::Seq is built so that > > retrieval is more efficient. This could also be used to implement a bin > > indexing scheme for range queries, similar to what Bio::DB::GFF does. > > Is a move to an sqlite backend planend for the near future? > > > > Frank > > > > > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: > >> And the implementation would just be something like this? > >> > >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] > >> eq 'my_gene' } $seq->get_SeqFeatures(); > >> > >> I think any implementation would be if we moved from the in-memory > >> arrays & hash-based system to a sqlite db on the back-end for how > >> Sequence and Feature objects are stored. > >> This would be a somewhat slower but wouldn't have performance/memory > >> problems we get for sequences with many annotations. > >> > >> -jason > >> Frank Schwach wrote, On 9/7/10 5:09 AM: > >>> I am working a lot with feature-rich Bio::Seq objects these days and > >>> thought that it would be really nice if I could do something like: > >>> > >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > >>> > >>> instead of having to grep for the feature every time. > >>> There could then be 'by_tag' and 'by_region' options as well. > >>> > >>> According to the Bio::Seq docs, something like this seems to be planned > >>> at some stage. I would be willing to contribute to this feature if I can > >>> and if this isn't already being implemented by somebody else. > >>> Does anybody know the state of this feature? 
> >>>
> >>> Frank
> >>>
> >
> > --
> > The Wellcome Trust Sanger Institute is operated by Genome Research
> > Limited, a charity registered in England with number 1021457 and a
> > company registered in England with number 2742969, whose registered
> > office is 215 Euston Road, London, NW1 2BE.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From jun.yin at ucd.ie Thu Sep 9 04:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a
similar package to Bio::LocatableSeq). The idea is to save the index of
sequences to a local database (AnyDBM) instead of keeping it in memory,
which should reduce memory usage. This idea actually comes from
Bio::DB::Fasta, as implemented by Lincoln Stein.

Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Thursday, September 09, 2010 12:20 AM To: Frank Schwach Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Seq, search for specific features Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. 
>> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? >>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. http://www.eset.com __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. 
http://www.eset.com

From s1012635 at student.hsleiden.nl Thu Sep 9 05:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi,

I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format).
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl to get this package to work. So far we have had courses in Python and Java, not in Perl.

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
use Bio::Tools::Tmhmm;
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
while( my $tmhmm_feat = $parser->next_result ) {
    #do something
    #eg
    push @tmhmm_feat, $tmhmm_feat;
}

How do I feed an input.txt (containing the proteins in fasta format) to this parser and how do I save the output?

cheers!

Stefan Lelieveld

From fs5 at sanger.ac.uk Thu Sep 9 06:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it was run. To run TMHMM you should use
Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm
Follow the synopsis to feed the tool with your sequences.
You can learn how to read a FASTA file and access each sequence in a
loop here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file => "<$file", -format => 'FASTA' );
while (my $seq = $inseq->next_seq) {
    print $seq->accession_number,"\n";
}

as an example for printing out accession numbers from $seq, which is a
Bio::Seq object.
So what you have to do now is to feed each of those Bio::Seq objects
into your TMHMM runner.

Frank

On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi,
>
> I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format).
> I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl to get this package to work. So far we have had courses in Python and Java, not in Perl.
>
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
>     #do something
>     #eg
>     push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed an input.txt (containing the proteins in fasta format) to this parser and how do I save the output?
>
> cheers!
>
> Stefan Lelieveld
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
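
The two steps discussed above (reading the FASTA records, then parsing TMHMM output that was produced separately) can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested pipeline: the two protein records are made up so the script runs on its own, and the TMHMM output file is whatever path is passed as the first command-line argument (that step is skipped when no path is given). It prints display_id rather than accession_number because, as far as I know, plain FASTA input leaves accession_number at its default of 'unknown'.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up two-record FASTA text, used only so the sketch is self-contained.
my $fasta = ">protA demo protein\nMKTLLLTLVVVTIVCLDLGYT\n"
          . ">protB demo protein\nMALWMRLLPLLALLALWGPDPAAA\n";

# Step 1: read the FASTA records (here from an in-memory filehandle;
# use -file => 'input.txt' instead for a file on disk).
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @ids;
while (my $seq = $in->next_seq) {
    push @ids, $seq->display_id;
    printf "%s\t%d residues\n", $seq->display_id, $seq->length;
}

# Step 2 (optional): parse TMHMM output that was produced by running
# TMHMM beforehand. The path comes from the command line.
if (my $tmhmm_out = shift @ARGV) {
    require Bio::Tools::Tmhmm;
    my $parser = Bio::Tools::Tmhmm->new(-file => $tmhmm_out);
    while (my $feat = $parser->next_result) {
        # Each result is a sequence feature; print its coordinates.
        print join("\t",
            $feat->seq_id // 'unknown',
            $feat->primary_tag, $feat->start, $feat->end), "\n";
    }
}
```

To save the results, redirect the script's standard output to a file, or open an output filehandle and print to that instead.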
From kai.blin at biotech.uni-tuebingen.de Thu Sep 9 06:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl> <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
>     #do something
>     #eg
>     push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output?

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some fasta
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = new Bio::Tools::Tmhmm(-file => "input.txt");
"""
will load and parse the TMHMM output for you.

HTH,
Kai

--
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From elanorbust2 at yahoo.com Thu Sep 9 12:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test for standaloneblastplus but getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list is comprised of the first four sequences found in the database. The results say it cannot find the first two, and then the matches for the last two do not exist!

Thanks for any help!

Program

#!/usr/bin/perl
use Bio::Tools::Run::StandAloneBlastPlus;
$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'ITS',
    -db_data => 'smallDB.fas',
    -create => 1
);
$result = $fac->blastn( -query => 'sequences.fasta',
                        -outfile => 'ITStest2.bls');

smallDB.fas Data

>302585252|HM807352|Waitea circinata internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>302585252|HM807352|Waitea circinata internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>302585250|HM802273|Fusarium oxysporum
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >302585249|HM802272|Fusarium oxysporum? contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA >302585248|HM802271|Fusarium oxysporum? 
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA >301333053|GU725064|Xiphinema turcicum? internal transcribed spacer 1 GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA >301333052|GU725063|Xiphinema adenohystherum? 
internal transcribed spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC >301333051|GU725062|Xiphinema sphaerocephalum? internal transcribed spacer 1 AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC >301333050|GU725061|Xiphinema hispanum? 
internal transcribed spacer 1 AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG >301333049|GU725060|Xiphinema pyrenaicum? internal transcribed spacer 1 AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG sequences.fasta data >Test1 
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA >Test2 GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA >Test3 CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >Test4 GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA Results BLASTN 2.2.24+ Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14. Database: ITS ?????????? 5 sequences; 1,102 total letters Query=? Test1 Length=204 ***** No hits found ***** Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 202071 Query=? Test2 Length=192 ***** No hits found ***** Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 
0.850 Effective search space used: 189507 Query=? Test3 Length=437 ????????????????????????????????????????????????????????????????????? Score???? E Sequences producing significant alignments:????????????????????????? (Bits)? Value dbj|AB581518.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...?? 300??? 2e-085 dbj|AB581521.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 69.4??? 6e-016 dbj|AB581519.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 58.4??? 1e-012 dbj|AB581522.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 56.5??? 4e-012 >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F Length=203 ?Score =? 300 bits (162),? Expect = 2e-085 ?Identities = 176/182 (96%), Gaps = 4/182 (2%) ?Strand=Plus/Plus Query? 10?? TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC? 66 ??????????? ||||||||||| | |||||| |||||| |||||||| |||| |||||||||||||||||| Sbjct? 23?? TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC? 81 Query? 67?? AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT? 126 ??????????? |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 82?? AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT? 141 Query? 127? GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 186 ??????????? |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 142? GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 201 Query? 187? GG? 188 ??????????? || Sbjct? 202? GG? 203 >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F Length=217 ?Score = 69.4 bits (37),? Expect = 6e-016 ?Identities = 39/40 (97%), Gaps = 0/40 (0%) ?Strand=Plus/Plus Query? 149? AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 188 ??????????? ||||| |||||||||||||||||||||||||||||||||| Sbjct? 178? AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 
217 >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F Length=206 ?Score = 58.4 bits (31),? Expect = 1e-012 ?Identities = 39/42 (92%), Gaps = 3/42 (7%) ?Strand=Plus/Plus Query? 146? ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 186 ??????????? |||| || ||| |||||||||||||||||||||||||||||| Sbjct? 165? ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT? 204 >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F Length=256 ?Score = 56.5 bits (30),? Expect = 4e-012 ?Identities = 30/30 (100%), Gaps = 0/30 (0%) ?Strand=Plus/Plus Query? 157? AAAACTTTCAACAACGGATCTCTTGGTTCT? 186 ??????????? |||||||||||||||||||||||||||||| Sbjct? 225? AAAACTTTCAACAACGGATCTCTTGGTTCT? 254 Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 442850 Query=? Test4 Length=521 ????????????????????????????????????????????????????????????????????? Score???? E Sequences producing significant alignments:????????????????????????? (Bits)? Value dbj|AB581518.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...?? 309??? 4e-088 dbj|AB581521.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 69.4??? 7e-016 dbj|AB581519.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 58.4??? 1e-012 dbj|AB581522.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 56.5??? 5e-012 >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F Length=203 ?Score =? 309 bits (167),? Expect = 4e-088 ?Identities = 177/181 (97%), Gaps = 3/181 (1%) ?Strand=Plus/Plus Query? 7??? TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA? 63 ??????????? ||||||||||| | |||||| |||||| |||||||||||||||||||||||||||||||| Sbjct? 23?? TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA? 82 Query? 64?? GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG? 123 ??????????? 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 83?? GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG? 142 Query? 124? TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG? 183 ??????????? |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 143? TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG? 202 Query? 184? G? 184 ??????????? | Sbjct? 203? G? 203 >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F Length=217 ?Score = 69.4 bits (37),? Expect = 7e-016 ?Identities = 39/40 (97%), Gaps = 0/40 (0%) ?Strand=Plus/Plus Query? 145? AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 184 ??????????? ||||| |||||||||||||||||||||||||||||||||| Sbjct? 178? AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 217 >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F Length=206 ?Score = 58.4 bits (31),? Expect = 1e-012 ?Identities = 39/42 (92%), Gaps = 3/42 (7%) ?Strand=Plus/Plus Query? 142? ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 182 ??????????? |||| || ||| |||||||||||||||||||||||||||||| Sbjct? 165? ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT? 204 >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F Length=256 ?Score = 56.5 bits (30),? Expect = 5e-012 ?Identities = 30/30 (100%), Gaps = 0/30 (0%) ?Strand=Plus/Plus Query? 153? AAAACTTTCAACAACGGATCTCTTGGTTCT? 182 ??????????? |||||||||||||||||||||||||||||| Sbjct? 225? AAAACTTTCAACAACGGATCTCTTGGTTCT? 254 Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 530378 ? Database: ITS ??? Posted date:? Aug 27, 2010? 9:43 AM ? Number of letters in database: 1,102 ? Number of sequences in database:? 
5 Matrix: blastn matrix 1 -2 Gap Penalties: Existence: 0, Extension: 2.5

From jaya1786 at gmail.com Thu Sep 9 12:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID:

Respected sir/madam,

I am Jayanthi Jayakumar, in the second year of my MS (by Research) in computational biology at Anna University, Chennai, India. I am very interested in participating in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and eligibility criteria for the same.

Thanking you,
yours faithfully,
Jayanthi Jayakumar

From Russell.Smithies at agresearch.co.nz Thu Sep 9 18:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?

Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed >301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
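
One way to test the wrapped-header idea without involving BLAST at all is to scan the FASTA file directly. The sketch below is only an illustration: `check_fasta` is a hypothetical helper (it is not part of BioPerl), and it assumes a nucleotide file where every sequence line consists solely of IUPAC nucleotide codes or gap characters. Any other non-header line, such as a stray "spacer 1", is reported as a possibly wrapped header. It also counts records and letters, which can be compared with the "Database: ITS 5 sequences; 1,102 total letters" line in the report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): scan FASTA text from a
# filehandle and return the number of records, the total number of
# sequence letters, and a list of suspicious non-header lines.
sub check_fasta {
    my ($fh) = @_;
    my ( $records, $letters ) = ( 0, 0 );
    my @suspect;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate DOS line endings
        next if $line =~ /^\s*$/;         # skip blank lines
        if ( $line =~ /^>/ ) {            # header line starts a record
            $records++;
            next;
        }
        # Assumption: nucleotide data only (IUPAC codes plus '-').
        if ( $line =~ /^[ACGTURYKMSWBDHVN-]+$/i ) {
            $letters += length $line;
        }
        else {
            # Not a header and not sequence: possibly a wrapped header.
            push @suspect, "$.: $line";
        }
    }
    return ( $records, $letters, \@suspect );
}

# Small demonstration on an in-memory sample with one wrapped header.
{
    my $sample = ">seq1 internal transcribed\nspacer 1\nAAAGACGA\n>seq2\nACGT\n";
    open my $fh, '<', \$sample or die "cannot open sample: $!";
    my ( $records, $letters, $suspect ) = check_fasta($fh);
    close $fh;
    printf "%d records, %d letters\n", $records, $letters;
    print "suspect line $_\n" for @$suspect;
}
```

To run it against a real file, open `smallDB.fas` in place of the in-memory sample. If the record and letter counts do not match what the BLAST report prints for the database, the search may not be using the database built from that file.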
Russell Smithies Technical Support T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of sally roberts > Sent: Friday, 10 September 2010 4:10 a.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] standaloneblastplus > > I am running a test for standaloneblastplus but getting data back that > does not exist in my query or my local database. Below is a outline of my > script small database, query list, and erroneous results. As you will > notice the query list is comprised of the first four sequences found in > the database. The results say it can not find the first two and then the > mathces for the last two do not exist! > > Thanks for any help! > > > > Program > > > #!/usr/bin/perl > > use Bio::Tools::Run::StandAloneBlastPlus; > > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( > -db_name => 'ITS', > -db_data => 'smallDB.fas', > -create => 1 > ); > > $result = $fac->blastn( -query => , 'sequences.fasta', > -outfile => 'ITStest2.bls'); > > > smallDB.fas Data > > >302585252|HM807352|Waitea circinata internal transcribed spacer 1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >302585252|HM807352|Waitea circinata internal transcribed spacer 2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > 
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > >302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC > CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA > TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT > GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC > CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG > GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC > GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC > ATATCATTAAAGCGGAGGAA > > >301333053|GU725064|Xiphinema turcicum internal transcribed spacer 1 > 
GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC > GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA > TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG > AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT > ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC > TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG > ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT > GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG > TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG > TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA > CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG > TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA > TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA > ACGCA > CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA > AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT > TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA > > >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer > 1 > AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG > CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT > CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG > GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA > ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA > AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA > AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA > ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA > CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA > 
AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA > GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG > TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA > AACTTTTATATATAGTTCGCCGAATAATAGCGAAC > > >301333051|GU725062|Xiphinema sphaerocephalum internal transcribed spacer > 1 > AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG > CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC > GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG > AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC > GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG > AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC > GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG > GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC > CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG > TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT > GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG > TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT > ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC > > >301333050|GU725061|Xiphinema hispanum internal transcribed spacer 1 > AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT > TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC > TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG > GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA > TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA > AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA > TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC > 
GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC > GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC > ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA > GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC > AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT > AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG > > >301333049|GU725060|Xiphinema pyrenaicum internal transcribed spacer 1 > AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC > TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT > TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC > TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG > ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA > GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT > TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA > CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT > AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA > GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA > TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT > ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG > > > > sequences.fasta data > > >Test1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >Test2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >Test3 > CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > 
CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >Test4 > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > > > > Results > > BLASTN 2.2.24+ > > > Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb > Miller (2000), "A greedy algorithm for aligning DNA sequences", J > Comput Biol 2000; 7(1-2):203-14. > > > > Database: ITS > 5 sequences; 1,102 total letters > > > > Query= Test1 > Length=204 > > > ***** No hits found ***** > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 202071 > > > Query= Test2 > Length=192 > > > ***** No hits found ***** > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 189507 > > > Query= Test3 > Length=437 > > Score E > Sequences producing significant alignments: > (Bits) Value > > dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 300 2e-085 > dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 69.4 6e-016 > dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 58.4 1e-012 > dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... 
> 56.5 4e-012 > > > >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G59F > Length=203 > > Score = 300 bits (162), Expect = 2e-085 > Identities = 176/182 (96%), Gaps = 4/182 (2%) > Strand=Plus/Plus > > Query 10 TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC > 66 > ||||||||||| | |||||| |||||| |||||||| |||| |||||||||||||||||| > Sbjct 23 TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC > 81 > > Query 67 AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT > 126 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 82 AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT > 141 > > Query 127 GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT > 186 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 142 GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT > 201 > > Query 187 GG 188 > || > Sbjct 202 GG 203 > > > >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G64F > Length=217 > > Score = 69.4 bits (37), Expect = 6e-016 > Identities = 39/40 (97%), Gaps = 0/40 (0%) > Strand=Plus/Plus > > Query 149 AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 188 > ||||| |||||||||||||||||||||||||||||||||| > Sbjct 178 AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 217 > > > >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G60F > Length=206 > > Score = 58.4 bits (31), Expect = 1e-012 > Identities = 39/42 (92%), Gaps = 3/42 (7%) > Strand=Plus/Plus > > Query 146 ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 186 > |||| || ||| |||||||||||||||||||||||||||||| > Sbjct 165 ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT 204 > > > >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G65F > Length=256 > > Score = 56.5 bits (30), Expect = 4e-012 > Identities = 30/30 (100%), Gaps = 0/30 (0%) > Strand=Plus/Plus > > 
Query 157 AAAACTTTCAACAACGGATCTCTTGGTTCT 186 > |||||||||||||||||||||||||||||| > Sbjct 225 AAAACTTTCAACAACGGATCTCTTGGTTCT 254 > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 442850 > > > Query= Test4 > Length=521 > > Score E > Sequences producing significant alignments: > (Bits) Value > > dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 309 4e-088 > dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 69.4 7e-016 > dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 58.4 1e-012 > dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 56.5 5e-012 > > > >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G59F > Length=203 > > Score = 309 bits (167), Expect = 4e-088 > Identities = 177/181 (97%), Gaps = 3/181 (1%) > Strand=Plus/Plus > > Query 7 TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA > 63 > ||||||||||| | |||||| |||||| |||||||||||||||||||||||||||||||| > Sbjct 23 TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA > 82 > > Query 64 GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG > 123 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 83 GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG > 142 > > Query 124 TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG > 183 > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct 143 TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG > 202 > > Query 184 G 184 > | > Sbjct 203 G 203 > > > >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G64F > Length=217 > > Score = 69.4 bits (37), Expect = 7e-016 > Identities = 39/40 (97%), Gaps = 0/40 (0%) > Strand=Plus/Plus > > Query 145 AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 184 > ||||| |||||||||||||||||||||||||||||||||| > Sbjct 178 
AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG 217 > > > >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G60F > Length=206 > > Score = 58.4 bits (31), Expect = 1e-012 > Identities = 39/42 (92%), Gaps = 3/42 (7%) > Strand=Plus/Plus > > Query 142 ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT 182 > |||| || ||| |||||||||||||||||||||||||||||| > Sbjct 165 ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT 204 > > > >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G65F > Length=256 > > Score = 56.5 bits (30), Expect = 5e-012 > Identities = 30/30 (100%), Gaps = 0/30 (0%) > Strand=Plus/Plus > > Query 153 AAAACTTTCAACAACGGATCTCTTGGTTCT 182 > |||||||||||||||||||||||||||||| > Sbjct 225 AAAACTTTCAACAACGGATCTCTTGGTTCT 254 > > > > Lambda K H > 1.33 0.621 1.12 > > Gapped > Lambda K H > 1.28 0.460 0.850 > > Effective search space used: 530378 > > > Database: ITS > Posted date: Aug 27, 2010 9:43 AM > Number of letters in database: 1,102 > Number of sequences in database: 5 > > > > Matrix: blastn matrix 1 -2 > Gap Penalties: Existence: 0, Extension: 2.5 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
=======================================================================

From elanorbust2 at yahoo.com Fri Sep 10 11:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking though!

--- On Thu, 9/9/10, Smithies, Russell wrote:

From: Smithies, Russell
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'" , "'bioperl-l at lists.open-bio.org'"
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email or are some of your fasta headers in your db incorrect?

Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed >301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?

Russell Smithies
Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test for standaloneblastplus but getting data back that
> does not exist in my query or my local database. Below is an outline of my
> script, small database, query list, and erroneous results. As you will
> notice, the query list is comprised of the first four sequences found in
> the database. The results say it cannot find the first two and then the
> matches for the last two do not exist!
>
> Thanks for any help!
>
>
>
> Program
>
>
> #!/usr/bin/perl
>
> use Bio::Tools::Run::StandAloneBlastPlus;
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_name => 'ITS',
>   -db_data => 'smallDB.fas',
>   -create => 1
> );
>
> $result = $fac->blastn( -query => 'sequences.fasta',
>                         -outfile => 'ITStest2.bls');
>
>
> smallDB.fas Data
>
> >302585252|HM807352|Waitea circinata internal transcribed spacer 1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >302585252|HM807352|Waitea circinata internal transcribed spacer 2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> >302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
> ATATCATTAAAGCGGAGGAA
>
> >301333053|GU725064|Xiphinema turcicum internal transcribed spacer 1
> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
> ACGCA
> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>
>
> >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer
> 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>
> >301333051|GU725062|Xiphinema sphaerocephalum internal transcribed spacer
> 1
> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>
> >301333050|GU725061|Xiphinema hispanum internal transcribed spacer 1
> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>
> >301333049|GU725060|Xiphinema pyrenaicum internal transcribed spacer 1
> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>
>
>
> sequences.fasta data
>
> >Test1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >Test2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >Test3
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >Test4
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
>
>
>
> Results
>
> BLASTN 2.2.24+
>
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
>
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
>
>
> Query=  Test1
> Length=204
>
>
> ***** No hits found *****
>
>
>
> Lambda      K        H
>     1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>     1.28    0.460    0.850
>
> Effective search space used: 202071
>
>
> Query=  Test2
> Length=192
>
>
> ***** No hits found *****
>
>
>
> Lambda      K        H
>     1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>     1.28    0.460    0.850
>
> Effective search space used: 189507
>
>
> Query=  Test3
> Length=437
>
>                                                                       Score     E
> Sequences producing significant alignments:                          (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda      K        H
>     1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>     1.28    0.460    0.850
>
> Effective search space used: 442850
>
>
> Query=  Test4
> Length=521
>
>                                                                       Score     E
> Sequences producing significant alignments:                          (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda      K        H
>     1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>     1.28    0.460    0.850
>
> Effective search space used: 530378
>
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
>
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From David.Messina at sbc.su.se Fri Sep 10 12:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID:

Hi Sally,

Did you run the same search on the command line, outside of BioPerl? The issue you're having may be with Blast+ and not BioPerl.

For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches.

Dave

On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking though!

From jun.yin at ucd.ie Sat Sep 11 12:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To:
References:
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC is already finished this year.
You can check the information here:
http://socghop.appspot.com/gsoc/program/home/google/gsoc2010

However, you can still contribute to the BioPerl project if you like. You can talk to people on this mailing list, or you can join the IRC channel (http://www.bioperl.org/wiki/IRC).

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar
Sent: Thursday, September 09, 2010 6:00 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Regarding GSoC 2010

Respected sir/madam,

I am Jayanthi Jayakumar, in the second year of my MS (by Research) in computational biology at Anna University, Chennai, India. I am very much interested in participating in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and eligibility criteria for the same.

Thanking you,
yours faithfully,
Jayanthi Jayakumar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________

The message was checked by ESET Smart Security.
http://www.eset.com

From david.breimann at gmail.com Sun Sep 12 09:16:29 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sun, 12 Sep 2010 15:16:29 +0200
Subject: [Bioperl-l] Circular genomes
Message-ID:

Hello,

As a continuation of http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I would like to ask:

Has the fix been implemented yet? That is, do GFF3 files created for circular genomes comply with the GFF3 specs for such genomes? I find it difficult to keep track using git, so I'm not sure if this was already handled.

Also, will the start and end coordinates of such genes loaded from a GFF3 file be "normal" (i.e. no coordinate is larger than the size of the genome) or just as written in the GFF3 (which demands that end > start even if end > genome length)?

Thanks,
David

From David.Messina at sbc.su.se Mon Sep 13 11:10:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 13 Sep 2010 17:10:42 +0200
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se>

Hi everyone,

I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.

The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.

It's already part of bioperl-live, and you can also get it here:

http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl

Dave

From maj at fortinbras.us Mon Sep 13 12:47:45 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 13 Sep 2010 16:47:45 +0000
Subject: [Bioperl-l] BioPerl net installer
Message-ID:

Dear Scott-
You rock!
Sincerely, Mark >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Monday, September 13, 2010 11:10 AM >To: 'BioPerl List' >Subject: [Bioperl-l] BioPerl net installer > >Hi everyone, > >I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. > >The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. > >It's already part of bioperl-live, and you can also get it here: > > http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Sep 13 17:15:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 13 Sep 2010 16:15:45 -0500 Subject: [Bioperl-l] BioPerl net installer In-Reply-To: References: Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu> Ditto! chris (briefly resurfacing) On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote: > Dear Scott- > You rock! > Sincerely, > Mark > >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Monday, September 13, 2010 11:10 AM >> To: 'BioPerl List' >> Subject: [Bioperl-l] BioPerl net installer >> >> Hi everyone, >> >> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. >> >> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. 
>> >> It's already part of bioperl-live, and you can also get it here: >> >> http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl >> >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From timmcilveen at talktalk.net Mon Sep 13 19:07:00 2010 From: timmcilveen at talktalk.net (tim) Date: Tue, 14 Sep 2010 00:07:00 +0100 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 Message-ID: <201009140007.00798.timmcilveen@talktalk.net> Hi, I have just installed Bioperl on my Linux system using the CPAN install. The install summary is as follows: Test Summary Report ------------------- t/RemoteDB/GenPept.t (Wstat: 256 Tests: 21 Failed: 1) Failed test: 17 Non-zero exit status: 1 t/RemoteDB/Query/GenBank.t (Wstat: 256 Tests: 18 Failed: 1) Failed test: 9 Non-zero exit status: 1 Parse errors: Bad plan. You planned 21 tests but ran 18. t/RemoteDB/Taxonomy.t (Wstat: 512 Tests: 103 Failed: 2) Failed tests: 15, 98 Non-zero exit status: 2 t/Root/RootIO.t (Wstat: 7424 Tests: 30 Failed: 0) Non-zero exit status: 29 Parse errors: Bad plan. You planned 31 tests but ran 30. Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr 0.91 sys + 156.68 cusr 9.16 csys = 172.94 CPU) Result: FAIL Failed 4/329 test programs. 4/18407 subtests failed. CJFIELDS/BioPerl-1.6.1.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.1.tar.gz Running Build install make test had returned bad status, won't install without force Failed during this command: CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO Is Bioperl properly installed? 
During the install process I was getting quite a lot of this error (hundreds of instances):

'replacement list longer than search list'

This happened with t/tools, t/seq, t/search and many others.

Any advice would be great.

Tim

From David.Messina at sbc.su.se Tue Sep 14 03:56:33 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 14 Sep 2010 09:56:33 +0200
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net>
References: <201009140007.00798.timmcilveen@talktalk.net>
Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se>

Hi Tim,

Thanks for your report.

> Is Bioperl properly installed?

No, it wasn't. When installing through CPAN, the installation is aborted if any tests fail. You can always check by looking for this line:

> make test had returned bad status, won't install without force

As for the error(s)

> 'replacement list longer than search list'

I believe this was fixed a couple of months ago. For details, see:

http://bugzilla.open-bio.org/show_bug.cgi?id=3116

So I would recommend that you grab the latest copy of bioperl-live from github, in which the bug should be fixed:

http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots

Give that a shot and let us know how it goes.

Dave

From jskittrell at unmc.edu Thu Sep 16 12:15:49 2010
From: jskittrell at unmc.edu (Jeff Kittrell)
Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC)
Subject: [Bioperl-l] mpiblast
Message-ID:

Does Bioperl work with mpiblast? Is there a standalone-like module that allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast output file correctly?
Thanks for any help, Jeff From David.Messina at sbc.su.se Thu Sep 16 14:25:57 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 16 Sep 2010 20:25:57 +0200 Subject: [Bioperl-l] mpiblast In-Reply-To: References: Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> > Is the there a standalone like module that allows you to easily call mpiblast? No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > I'm assuming seqio with parse a mpiblast output file correctly? Yes, although I see that a new version of mpiblast was recently released. Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? Dave From shalabh.sharma7 at gmail.com Thu Sep 16 17:38:14 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 16 Sep 2010 17:38:14 -0400 Subject: [Bioperl-l] IUPAC code similarity Message-ID: Hi All, I have few nucleotide sequences that are composed of IUPAC codes. Like >test VGSRVBSSSSSNSC Similarly i have a database made of of these kind of sequences. I want to find sequences that are 100% similar to the query sequence. Is there any bioPerl module to deal with this, i tried normal blast but it didn't worked. Do i have to convert these sequences to 4 base codes or there is any other way out. Thanks Shalabh From amackey at virginia.edu Fri Sep 17 10:28:15 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 10:28:15 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Convert the IUPAC code to a regular expression, and use regular expressions (in Perl or grep or similar) to find 100% identical matches. -Aaron On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma wrote: > Hi All, > I have few nucleotide sequences that are composed of IUPAC codes. Like > >test > VGSRVBSSSSSNSC > > Similarly i have a database made of of these kind of sequences. 
I want to > find sequences that are 100% similar to the query sequence. > > Is there any bioPerl module to deal with this, i tried normal blast but it > didn't worked. > Do i have to convert these sequences to 4 base codes or there is any other > way out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Fri Sep 17 11:07:38 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 11:07:38 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron for your reply. Actually i tried that first, but there is another problem, i have to divide each query sequence to window size 5 with 1 base shift and its not possible to divide regular expression in that way. So what i am trying is to convert those iupac codes to 4 base code sequence and then do the normal search. Now the problem is that i cant able to convert those IUPAC sequences to normal ones, i am still trying to write a script but its taking time. Thanks Shalabh On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. >> Like >> >test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out. 
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

From roy.chaudhuri at gmail.com Fri Sep 17 11:04:28 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 17 Sep 2010 16:04:28 +0100
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To:
References:
Message-ID: <4C93837C.4080008@gmail.com>

Hi Shalabh,

The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC codes to regular expressions:

$ perl -e 'use Bio::Tools::SeqPattern; print Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand'
[ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C

Although that won't work if there are also ambiguity codes in your database. For a non-BioPerl solution you could try fuzznuc from EMBOSS.

Cheers.
Roy.

On 17/09/2010 15:28, Aaron Mackey wrote:
> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
> wrote:
>
>> Hi All,
>> I have few nucleotide sequences that are composed of IUPAC codes. Like
>> >test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From david.breimann at gmail.com Fri Sep 17 14:13:22 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 17 Sep 2010 20:13:22 +0200
Subject: [Bioperl-l] Installing using git after an older installation
Message-ID:

Hello,

I'm sharing a server with some other lab members. I would like to install the latest version of bioperl for my own use, without affecting my colleagues.

I used git to clone a copy of bioperl-live and exported PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB".

Now

perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

returns 1.0069

My question is: is that all? Am I now using the latest version? Should I include anything special in my scripts?

Also, what about all the bp_***.pl scripts? Are they now using the latest version, too? I guess not, since I didn't build anything. So what should I do about them?

Thanks,
Dave

From amackey at virginia.edu Fri Sep 17 15:24:44 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:24:44 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <4C93837C.4080008@gmail.com>
References: <4C93837C.4080008@gmail.com>
Message-ID:

If there are ambiguity codes in the database, then the expanded character class also has to include the original ambiguity code; non-ambiguous nucleotides must also be expanded to include all ambiguity codes that represent the nucleotide.
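A minimal sketch of the rule just described, in plain Perl with no BioPerl dependency. The names iupac_class and iupac_to_regex and the %IUPAC hash are made up for this illustration and are not part of any BioPerl module; the table itself is the standard IUPAC nucleotide code set. The sketch reads the rule literally: an ambiguity code in the query matches the bases it stands for plus that same code, and a plain base matches itself plus every ambiguity code that contains it.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Build the character class for one query symbol (hypothetical helper).
sub iupac_class {
    my ($code) = @_;
    $code = uc $code;
    my $bases = $IUPAC{$code};
    die "Unknown IUPAC code: $code\n" unless defined $bases;

    my @members;
    if (length($bases) == 1) {
        # Plain base: itself plus every ambiguity code that contains it.
        @members = grep { index($IUPAC{$_}, $code) >= 0 } sort keys %IUPAC;
    }
    else {
        # Ambiguity code: the bases it stands for plus the code itself.
        @members = (split(//, $bases), $code);
    }
    return '[' . join('', @members) . ']';
}

# Compile a whole IUPAC query into one case-insensitive regex.
sub iupac_to_regex {
    my ($query) = @_;
    my $pattern = join '', map { iupac_class($_) } split //, $query;
    return qr/$pattern/i;
}

# Try a 5-base query against a few made-up database sequences.
my $re = iupac_to_regex('VGSRV');
for my $db_seq (qw(TTAGCAATT TTVGSRVTT TTTTTTTTT)) {
    printf "%s\t%s\n", $db_seq, ($db_seq =~ $re ? 'match' : 'no match');
}
```

Under this literal reading a query S does not match a database N or B, even though either could stand for G or C. Treating any two codes with overlapping base sets as compatible would be a different, looser rule, and the class builder would have to change accordingly.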
-Aaron On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri wrote: > Hi Shalabh, > > The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC > codes to regular expressions: > > $perl -e 'use Bio::Tools::SeqPattern; print > Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand' > [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C > > Although that won't work if there are also abiguity codes in your database. > For a non-BioPerl solution you could try fuzznuc from Emboss. > > Cheers. > Roy. > > > On 17/09/2010 15:28, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions >> (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma >> wrote: >> >> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >>>> test >>>> >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From amackey at virginia.edu Fri Sep 17 15:25:54 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:25:54 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: do your windowing/shifting on the unexpanded query sequences; then transform the 5-bp queries into regular expressions. 
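The windowing suggestion above could be sketched like this in plain Perl. The window size of 5 and the shift of 1 come from the earlier message in this thread; the sub names windows and window_to_pattern are hypothetical, and the %IUPAC table is the standard nucleotide code set rather than anything taken from BioPerl. N is written out as [ACGT] here rather than as ".".

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Cut the unexpanded query into overlapping windows.
sub windows {
    my ($seq, $size, $step) = @_;
    my @out;
    for (my $i = 0; $i + $size <= length $seq; $i += $step) {
        push @out, substr($seq, $i, $size);
    }
    return @out;
}

# Turn one window of IUPAC codes into a regular expression string.
sub window_to_pattern {
    my ($window) = @_;
    return join '', map {
        my $bases = $IUPAC{uc $_};
        die "Unknown IUPAC code: $_\n" unless defined $bases;
        length($bases) > 1 ? "[$bases]" : $bases;
    } split //, $window;
}

# Window first, expand second: each 5-base window gets its own pattern.
my $query = 'VGSRVBSSSSSNSC';
for my $win (windows($query, 5, 1)) {
    printf "%s\t%s\n", $win, window_to_pattern($win);
}
```

Because the windows are cut from the raw IUPAC string, every pattern covers exactly five positions, which is what splitting an already expanded regular expression cannot guarantee.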
-Aaron On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma wrote: > Thanks Aaron for your reply. > Actually i tried that first, but there is another problem, i have to divide > each query sequence to window size 5 with 1 base shift and its not possible > to divide regular expression in that way. > So what i am trying is to convert those iupac codes to 4 base code sequence > and then do the normal search. > Now the problem is that i cant able to convert those IUPAC sequences to > normal ones, i am still trying to write a script but its taking time. > > Thanks > Shalabh > > > On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >> shalabh.sharma7 at gmail.com> wrote: >> >>> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >test >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From Kevin.M.Brown at asu.edu Fri Sep 17 16:09:34 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 17 Sep 2010 13:09:34 -0700 Subject: [Bioperl-l] Installing using git after an older installation In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPE RL_IN_A_PERSONAL_MODULE_AREA From shalabh.sharma7 at gmail.com Fri Sep 17 16:45:50 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 16:45:50 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron, changing the query sequence worked well but i am still struggling with the database. -Shalabh On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey wrote: > do your windowing/shifting on the unexpanded query sequences; then > transform the 5-bp queries into regular expressions. > > -Aaron > > > On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma < > shalabh.sharma7 at gmail.com> wrote: > >> Thanks Aaron for your reply. >> Actually i tried that first, but there is another problem, i have to >> divide each query sequence to window size 5 with 1 base shift and its not >> possible to divide regular expression in that way. >> So what i am trying is to convert those iupac codes to 4 base code >> sequence and then do the normal search. >> Now the problem is that i cant able to convert those IUPAC sequences to >> normal ones, i am still trying to write a script but its taking time. >> >> Thanks >> Shalabh >> >> >> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: >> >>> Convert the IUPAC code to a regular expression, and use regular >>> expressions (in Perl or grep or similar) to find 100% identical matches. 
>>>
>>> -Aaron
>>>
>>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>>> shalabh.sharma7 at gmail.com> wrote:
>>>
>>>> Hi All,
>>>> I have few nucleotide sequences that are composed of IUPAC codes.
>>>> Like
>>>> >test
>>>> VGSRVBSSSSSNSC
>>>>
>>>> Similarly i have a database made of of these kind of sequences. I want
>>>> to
>>>> find sequences that are 100% similar to the query sequence.
>>>>
>>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>>> it
>>>> didn't worked.
>>>> Do i have to convert these sequences to 4 base codes or there is any
>>>> other
>>>> way out.
>>>>
>>>> Thanks
>>>> Shalabh
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>

From heikki.lehvaslaiho at gmail.com Sat Sep 18 03:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Sat, 18 Sep 2010 10:41:22 +0300
Subject: [Bioperl-l] mpiblast
In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
References: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
Message-ID:

Been running 1.6 and its betas on Blue Gene/P for months. The output is identical to standard BLAST output. No issues in parsing it with BioPerl.

   -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429
Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia

On 16 September 2010 21:25, Dave Messina wrote:
>> Is the there a standalone like module that allows you to easily call mpiblast?
>
> No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.
>
>        http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase
>
>
>> I'm assuming seqio with parse a mpiblast output file correctly?
> > Yes, although I see that a new version of mpiblast was recently released. > > Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From david.breimann at gmail.com Sat Sep 18 05:05:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:05:58 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl Message-ID: Hello, I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` in the fields and sometime it doesn't, even though the genabank has a locus tag. Also, is the ID always equivalent to the locus tag? Thanks, Dave From scott at scottcain.net Sat Sep 18 05:17:24 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 10:17:24 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, bp_genbank2gff3.pl suffers from the fact that it has to deal with GenBank files :-) It was designed initially to work on whole genome refseqs, and contains several ad hoc rules for trying to make it "do the right thing." In practice, it is not unusual for a post processing step (either by hand or a quicky perl script) to be required to really get it right. I don't recall the specifics (if I ever knew :-) for when and how the locus tag is used, but I do know that there is a list of things that it will try to use for the ID, and while the locus is on the list, I don't know where it comes in the list, so it's possible that other items might supersede it. Scott On Sat, Sep 18, 2010 at 10:05 AM, David Breimann wrote: > Hello, > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` > in the fields and sometime it doesn't, even though the genabank has a locus > tag. > Also, is the ID always equivalent to the locus tag? 
>
> Thanks,
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)         216-392-3087
Ontario Institute for Cancer Research

From david.breimann at gmail.com Sat Sep 18 05:20:33 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:20:33 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Since locus_tag is an essential tag in GenBank, I suggest that locus_tag always be added to the last column of the GFF if it exists in the GenBank file, whether or not it is used as the ID in the GFF.

On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote:

> Hi Dave,
>
> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> GenBank files :-) It was designed initially to work on whole genome
> refseqs, and contains several ad hoc rules for trying to make it "do
> the right thing." In practice, it is not unusual for a post
> processing step (either by hand or a quicky perl script) to be
> required to really get it right. I don't recall the specifics (if I
> ever knew :-) for when and how the locus tag is used, but I do know
> that there is a list of things that it will try to use for the ID, and
> while the locus is on the list, I don't know where it comes in the
> list, so it's possible that other items might supersede it.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> wrote:
> > Hello,
> >
> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> `locus_tag`
> > in the fields and sometime it doesn't, even though the genabank has a
> locus
> > tag.
> > Also, is the ID always equivalent to the locus tag?
> >
> > Thanks,
> > Dave

From scott at scottcain.net Sat Sep 18 06:08:26 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 11:08:26 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

That seems perfectly reasonable. If you could point out a GenBank
entry for which that does not happen, I could try to figure out why
not.

Scott

From david.breimann at gmail.com Sat Sep 18 06:20:50 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:20:50 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Scott,

Here is a very short GenBank file:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

Note that all genes in the GenBank file have locus tags. In the resulting
GFF3, however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have
no idea why it deserves special treatment... :)

P.S. Making this change (i.e., copying locus_tag to the last GFF3 column
whenever available) would really make my life easier.

Thank you,
Dave

From david.breimann at gmail.com Sat Sep 18 06:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID:

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

use Bio::Tools::GFF;
use Data::Dumper;

my $gffio = Bio::Tools::GFF->new(
    -file        => "/path/to/file.gff",
    -gff_version => 3
);

print Dumper $gffio->get_seqs();

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).

Thanks,

Dave

From scott at scottcain.net Sat Sep 18 07:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this. It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott

From scott at scottcain.net Sat Sep 18 07:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To:
References:
Message-ID:

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

  use Bio::DB::SeqFeature::Store;

  my $db = Bio::DB::SeqFeature::Store->new(
      -adaptor => 'memory',
      -dir     => 'path/to/file'
  );

  # Warning: this returns a string, and not a PrimarySeq object
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);

Scott


On Sat, Sep 18, 2010 at 11:45 AM, David Breimann wrote:
> As you know, GFF3 files can contain FASTA sequences after the features.
>
> How do I extract a specific FASTA sequence given its ID?
>
> I tried:
>
> use Bio::Tools::GFF;
> use Data::Dumper;
>
> my $gffio = Bio::Tools::GFF->new(
>     -file        => "/path/to/file.gff",
>     -gff_version => 3
> );
>
> print Dumper $gffio->get_seqs();
>
> but $gffio->get_seqs() seems to return nothing, although the GFF3 has
> sequences and is also valid.
>
> By the way, I am able to parse the features themselves (using
> $gffio->next_feature()).
>
>
> Thanks,
>
> Dave

From scott at scottcain.net Sat Sep 18 09:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository, it will also use the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.

Also, are you using any command line flags when executing it? I didn't.

Scott


On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running linux? If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann wrote:
>> > Your gff seems fine. I get a very similar one, but with `Name=` instead
>> > of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain wrote:
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for. Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.

From scott at scottcain.net Sat Sep 18 09:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation. The
directory in which the script resides is scripts/Bio-DB-GFF/

Scott


On Sat, Sep 18, 2010 at 2:40 PM, David Breimann wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...

From david.breimann at gmail.com Sat Sep 18 09:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

So let's do an intermediate summary of my situation:

I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), while Scott gets the expected results
using the latest version of bioperl.

I cloned a fresh version of bioperl-live into my ~/src:

$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:

export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"

and ran

$ source ~/.profile

I then downloaded a small genome from NCBI

$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

and tested the script:

$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789 GenBank region 1 6199 . + 1 ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789 GenBank mRNA 665 781 . - 1 ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789 GenBank CDS 665 781 . - 1 ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789 GenBank region 1 6199 . + 1 ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789 GenBank mRNA 665 781 . - 1 ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789 GenBank CDS 665 781 . - 1 ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
as desired.

I have no idea what is going on here...

Best,
Dave
> >> >> > > >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain > >> >> > wrote: > >> >> >> > >> >> >> Hi Dave, > >> >> >> > >> >> >> A fresh "pull" of the bioperl git repository shows that > >> >> >> bp_genbank2gff3.pl already does this. It creates a locus_tag for > >> >> >> all > >> >> >> features that have a locus_tag, and uses the locus_tag for the ID > >> >> >> when > >> >> >> it can (it can't blindly use the locus tag for the ID since both > the > >> >> >> gene and the CDS have the same tag). > >> >> >> > >> >> >> Scott > >> >> >> > >> >> >> > >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann > >> >> >> wrote: > >> >> >> > Hi Scott, > >> >> >> > > >> >> >> > Here is a very short genbank: > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk > >> >> >> > > >> >> >> > Note all genes in the genbank have locus tags. In the resulting > >> >> >> > GFF3, > >> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag. > I > >> >> >> > have > >> >> >> > no > >> >> >> > idea why it deserves a special treatment... :) > >> >> >> > > >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 > last > >> >> >> > column > >> >> >> > whenever available) will really make my life easier. > >> >> >> > > >> >> >> > Thank you, > >> >> >> > Dave > >> >> >> > > >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain < > scott at scottcain.net> > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> Hi Dave, > >> >> >> >> > >> >> >> >> That seems perfectly reasonable. If you could point out a > >> >> >> >> GenBank > >> >> >> >> entry for which that does not happen, I could try to figure out > >> >> >> >> why > >> >> >> >> not. 
> >> >> >> >> > >> >> >> >> Scott > >> >> >> >> > >> >> >> >> > >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann > >> >> >> >> wrote: > >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest > >> >> >> >> > locus_tag > >> >> >> >> > will > >> >> >> >> > be > >> >> >> >> > always added to the GFF last column if it exists in the > >> >> >> >> > genbank, > >> >> >> >> > whether > >> >> >> >> > it > >> >> >> >> > is used as ID in the GFF or not. > >> >> >> >> > > >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain > >> >> >> >> > > >> >> >> >> > wrote: > >> >> >> >> >> > >> >> >> >> >> Hi Dave, > >> >> >> >> >> > >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to > deal > >> >> >> >> >> with > >> >> >> >> >> GenBank files :-) It was designed initially to work on > whole > >> >> >> >> >> genome > >> >> >> >> >> refseqs, and contains several ad hoc rules for trying to > make > >> >> >> >> >> it > >> >> >> >> >> "do > >> >> >> >> >> the right thing." In practice, it is not unusual for a post > >> >> >> >> >> processing step (either by hand or a quicky perl script) to > be > >> >> >> >> >> required to really get it right. I don't recall the > specifics > >> >> >> >> >> (if I > >> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I > do > >> >> >> >> >> know > >> >> >> >> >> that there is a list of things that it will try to use for > the > >> >> >> >> >> ID, > >> >> >> >> >> and > >> >> >> >> >> while the locus is on the list, I don't know where it comes > in > >> >> >> >> >> the > >> >> >> >> >> list, so it's possible that other items might supersede it. > >> >> >> >> >> > >> >> >> >> >> Scott > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > >> >> >> >> >> wrote: > >> >> >> >> >> > Hello, > >> >> >> >> >> > > >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. 
Sometimes it > adds > >> >> >> >> >> > a > >> >> >> >> >> > `locus_tag` > >> >> >> >> >> > in the fields and sometime it doesn't, even though the > >> >> >> >> >> > genabank > >> >> >> >> >> > has a > >> >> >> >> >> > locus > >> >> >> >> >> > tag. > >> >> >> >> >> > Also, is the ID always equivalent to the locus tag? > >> >> >> >> >> > > >> >> >> >> >> > Thanks, > >> >> >> >> >> > Dave > >> >> >> >> >> > _______________________________________________ > >> >> >> >> >> > Bioperl-l mailing list > >> >> >> >> >> > Bioperl-l at lists.open-bio.org > >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> >> >> >> >> > > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> -- > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > ------------------------------------------------------------------------ > >> >> >> >> >> Scott Cain, Ph. D. scott > at > >> >> >> >> >> scottcain > >> >> >> >> >> dot net > >> >> >> >> >> GMOD Coordinator (http://gmod.org/) > >> >> >> >> >> 216-392-3087 > >> >> >> >> >> Ontario Institute for Cancer Research > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> -- > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > ------------------------------------------------------------------------ > >> >> >> >> Scott Cain, Ph. D. scott at > >> >> >> >> scottcain > >> >> >> >> dot net > >> >> >> >> GMOD Coordinator (http://gmod.org/) > >> >> >> >> 216-392-3087 > >> >> >> >> Ontario Institute for Cancer Research > >> >> >> > > >> >> >> > > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> > >> >> >> > >> >> >> > ------------------------------------------------------------------------ > >> >> >> Scott Cain, Ph. D. 
scott at
> >> >> >> scottcain
> >> >> >> dot net
> >> >> >> GMOD Coordinator (http://gmod.org/)
> 216-392-3087
> >> >> >> Ontario Institute for Cancer Research
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >>
> ------------------------------------------------------------------------
> >> >> Scott Cain, Ph. D. scott at
> scottcain
> >> >> dot net
> >> >> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >> >> Ontario Institute for Cancer Research
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D. scott at scottcain
> >> dot net
> >> GMOD Coordinator (http://gmod.org/) 216-392-3087
> >> Ontario Institute for Cancer Research
> >
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>

From scott at scottcain.net Sat Sep 18 10:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

The only thing I can add is that I did a 'git diff genbank2gff3.PLS' and found no differences. It occurred to me that perhaps I'd done some fixing and not committed it, but it looks to me like that's not the case (assuming I've managed to use git correctly (not a great assumption, but I don't have another one to work with :-))

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
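
Scott's earlier suggestion to check for more than one BioPerl instance can be sketched as a small shell helper. This is only an illustration and was not posted in the thread: the function name find_module_copies is made up, and Bio/SeqIO.pm is used merely as a file that a BioPerl installation is expected to contain. The helper walks a colon-separated list of directories in order and prints every copy of the module it finds. Since Perl searches the PERL5LIB entries before its built-in library directories, the first hit should be the copy that gets loaded.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list every copy of a Perl module
# file found along a colon-separated search path, in search order.
# usage: find_module_copies SEARCH_PATH [MODULE_FILE]
find_module_copies() {
    search_path=$1
    module=${2:-Bio/SeqIO.pm}   # assumption: any BioPerl install ships this file
    old_ifs=$IFS
    IFS=:
    # Word splitting on ":" happens once, when the for list is expanded.
    for dir in $search_path; do
        IFS=$old_ifs
        [ -n "$dir" ] || continue
        if [ -f "$dir/$module" ]; then
            printf '%s\n' "$dir/$module"
        fi
    done
    IFS=$old_ifs
}
```

Calling find_module_copies "$PERL5LIB" only inspects the directories added by hand. To cover the whole search path, the list can be taken from Perl itself, for example find_module_copies "$(perl -le 'print join ":", @INC')". More than one line of output would suggest that several BioPerl copies are installed side by side.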

From j.scholtalbers at gmail.com Mon Sep 20 04:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Hi,

I'm trying to get all descendents for a specific taxon using Entrez. each_Descendent and get_all_Descendents don't seem to be implemented or working. I then tried getting the tree for this taxon using Bio::DB::Taxonomy's get_tree. However, this only retrieves the ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields wrote:
> > Sounds like this is going through an initial indexing step (for
> > flatfiles). I would expect the initial indexing of the tables to take time
> > as you have to create the DB, but subsequent lookups post-indexing should be
> > much faster if the index is already present. Maybe Jason could answer in
> > more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> >> up with the same result.
> >>
> >> thanks,
> >> Eric

From pcantalupo at gmail.com Mon Sep 20 10:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Jelle,

Below is my subroutine that returns the lineage corresponding to a Taxonomy id. For example, if you use 10633 as the taxid, the subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use Bio::DB::EUtilities;
use XML::Simple;   # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );
   my $res = $factory->get_Response->content;
   my $data = XMLin($res);
   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();
   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }
   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list context: the array itself; scalar context: one "; "-joined string
   return wantarray ? @lineage : join("; ", @lineage);
}

Paul Cantalupo
University of Pittsburgh
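
A comparable lineage can also be derived offline from the NCBI taxdump files instead of Entrez. The sketch below is not Paul's method and is not taken from the thread. It assumes the usual taxdump layout, in which columns are separated by tab, pipe, tab, nodes.dmp holds the tax_id and parent tax_id in its first two columns, and names.dmp marks the preferred name with the class "scientific name". The function name taxon_lineage is hypothetical. Because it reports every node between the taxon and the root, the result may contain more levels than the Entrez lineage.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the lineage of a taxon,
# root-most name first, from NCBI taxdump files instead of Entrez.
# usage: taxon_lineage TAXID NODES_FILE NAMES_FILE
taxon_lineage() {
    awk -F '\t[|]\t' -v id="$1" '
        # First file (nodes.dmp): column 1 is the tax_id, column 2 its parent.
        FNR == NR { parent[$1] = $2; next }
        # Second file (names.dmp): keep only the "scientific name" rows.
        $4 ~ /^scientific name/ { name[$1] = $2 }
        END {
            n = 0
            cur = id
            # Climb towards the root; tax_id 1 is the root and is its own parent.
            while ((cur in parent) && cur != 1) {
                path[++n] = name[cur]
                cur = parent[cur]
            }
            # Print from the top of the tree down to the requested taxon.
            for (i = n; i >= 1; i--)
                printf "%s%s", path[i], (i > 1 ? "; " : "\n")
        }
    ' "$2" "$3"
}
```

With the taxdump unpacked in the current directory, taxon_lineage 10633 nodes.dmp names.dmp would print the names joined by "; ", similar to what taxid2lineage returns in scalar context.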

From jason at bioperl.org Mon Sep 20 11:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <4C977FFC.5000205@bioperl.org>

This works for me to get all the descendents from a sub-node. You have to call the function with the database handle. I am not sure if the Taxon implementation has a reference to the dbhandle or not:

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;

# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi/';
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => "$dbdir/nodes.dmp",
                                -namesfile => "$dbdir/names.dmp",
                                );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name } @d), "\n";

Hope that helps.
>>>> >>>> thanks, >>>> Eric >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From j.scholtalbers at gmail.com Wed Sep 22 03:46:35 2010 From: j.scholtalbers at gmail.com (Jelle Scholtalbers) Date: Wed, 22 Sep 2010 09:46:35 +0200 Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent In-Reply-To: <4C977FFC.5000205@bioperl.org> References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu> <4C977FFC.5000205@bioperl.org> Message-ID: Hi Jason, this was the same method I was using. With the taxdump it works apparently, however it does not work with Entrez as source. So I will just stick to a up2date taxdump then. Thanks for your example. @Paul: Your method gives indeed the lineage but will only retrieve the ancestors. I want to retrieve all the descendents. Thx anyway. Cheers, Jelle On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich wrote: > > This works for me to get all the descendents from sub-node. You have to > call the function with the dabatase handle. I am not sure if the Taxon > implementation has reference to the dbhandle or not: > #!/usr/bin/perl -w > use strict; > use Bio::DB::Taxonomy; > my $dbdir = '/db/taxonomy/ncbi/'; #downloaded data from NCBI taxdump into > this directory > my $db = Bio::DB::Taxonomy->new(-source => 'flatfile', > -nodesfile => "$dbdir/nodes.dmp", > -namesfile => "$dbdir/names.dmp", > ); > my $taxa = $db->get_taxon(-taxonid => 151341); > my @d = $db->get_all_Descendents($taxa); > > print join("\n", map { $_->id . " " . $_->rank . " " . 
$_->scientific_name > } @d), "\n"; > > > Hope that helps. > Jelle Scholtalbers wrote, On 9/20/10 1:04 AM: > > Hi, > > I'm trying to get all descendents for a specific taxon using Entrez. > each_Descendent and get_all_Descendents don't seem to be implemented or > working. I then tried by getting the tree for this taxon using > Bio::DB::Taxonomy's get_tree. However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > > > Thanks, that was indeed the answer to #2. Any idea about each_Descendent? > Eric > > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields > wrote: > > > Sounds like this is going through an initial indexing step (for > > > flatfiles). I would expect the initial indexing of the tables to take time > as you have to create the DB, but subsequent lookups post-indexing should be > much faster if the index is already present. Maybe Jason could answer in > more detail? > > > chris > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: > > > > Hello, > > I tried the Bio::DB::Taxonomy example on this wiki page using perl > 5.8.5 with BioPerl 1.6.0http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > It ran for 100 cpu seconds and output: > > 33090 Viridiplantae kingdom > > I was expecting it to also output the descendents. Some questions: > > 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually > implemented? It looks to be in Taxon.pm but it is not documented and > when I ran Data::Dumper on $node the value '_desc' was empty. > > 2) is the flatfile reader always so slow? after replacing 'flatfile' > with a call to 'entrez' it took only 0.02 cpu seconds to come > up with the same result. 
> > thanks, > Eric > _______________________________________________ > Bioperl-l mailing listBioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing listBioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing listBioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > From waldenhe at muohio.edu Fri Sep 24 15:15:48 2010 From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene) Date: Fri, 24 Sep 2010 15:15:48 -0400 Subject: [Bioperl-l] StandAloneBlastPlus Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu> Hello Bioperl Masters, I am trying to perform a local blast with a query list of fasta files against a db of other fasta files. I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module. I have downleaded from the NCBI website BLAST+ 2.2.24+ and installed on my ubuntu machine. I am using bioperl-1.5.2. so the snibbit of code that is giving me errors is below: my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq); my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = $report_obj->next_result; print $result_obj->num_hits; The error I am getting is: --------------------- WARNING --------------------- MSG: cannot find path to blastall --------------------------------------------------- Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284. I think the real problem is the "cannot find path to Blastall. >From reading around on different forums I have to make a .ncbirc text file with the location of BLAST+2.2.24+ on my machine. I have that file in my /home folder. How do I get StandAloneBlastPlus synced up with BLAST+2.2.24+ ? Am I approaching this right? 

Thank you,
Hans Waldenmaier

From ross at cuhk.edu.hk Sat Sep 25 04:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>

Given a set of GO IDs, e.g.

GO:0008150 GO:0005750 GO:0006122 GO:0008121 GO:0003674 GO:0005575 GO:0008150
GO:0009507 GO:0009535 GO:0009567 GO:0009977 GO:0010027 GO:0031361

from
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo
one can manually examine the hierarchy. Although there is go-perl
(http://search.cpan.org/~cmungall/go-perl/) and go-db-perl
(http://search.cpan.org/~cmungall/go-db-perl/), as a life science student
who is just learning Perl, I find it difficult to draw a hierarchy tree (or
simply make a table that counts the occurrences) to produce something like:

biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
Cellular component (4)

Can anybody advise? I don't need any fancy figures at all...

From David.Messina at sbc.su.se Sun Sep 26 12:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,

> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable; it has blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation.
I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.

Dave

From maj at fortinbras.us Sun Sep 26 20:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

From cjfields at illinois.edu Mon Sep 27 17:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To:
References:
Message-ID:

Sorry, didn't see this being responded to on-list (been off the radar the last month).
I think this is a good idea, but I'm wondering if this might be better as a separate release on CPAN from bioperl core, seeing as we're in the preliminary stages of modularizing the current bioperl core into smaller independent releases after the next bioperl release.

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
>
> it seems Bioperl contains a wrapper [1] for Scansite [2]. in what extent would it make sense to integrate a client-sided version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-sided version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!
>
> best wishes
> johnny
>
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
>
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************

From gandipalem at gmail.com Tue Sep 28 00:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID:

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the Coordination file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.

From David.Messina at sbc.su.se Tue Sep 28 03:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running. But I still cannot get Perl to see BLAST.

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?

> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
>
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.

Dave

From David.Messina at sbc.su.se Tue Sep 28 03:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts of genes
by transformation of PCR products (ref: Datsenko & Wanner, PNAS 2000). A
tab-delimited file containing ORF START STOP is read, and primers flanking the
start & stop coordinates are designed based on the user-designated sequence
file. In addition, primers flanking the knockout regions are chosen for PCR
screening purposes once the knockout is generated. The script uses Bioperl in
order to determine the primer sequences, which requires getting subsequences
and reverse complementing some of the objects.

Dave

From maj at fortinbras.us Tue Sep 28 07:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

The module checks the env variable BLASTPLUSDIR for the executable; you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
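
A minimal shell sketch of the check implied above, assuming the BLAST+ programs live in /home/hans/BLAST/bin as elsewhere in this thread. The function name check_blast is hypothetical and is not part of BioPerl or BLAST+; it only reports whether a program is reachable through BLASTPLUSDIR or PATH, which may help separate environment problems from module problems. The order of the two checks is this sketch's own choice and is not claimed to match what the module does internally.

```shell
# Hypothetical helper (not part of BioPerl or BLAST+): report where a
# BLAST+ program would be found, looking in BLASTPLUSDIR first, then PATH.
check_blast() {
    prog="${1:-blastp}"   # program to look for; defaults to blastp
    if [ -n "${BLASTPLUSDIR:-}" ] && [ -x "${BLASTPLUSDIR}/${prog}" ]; then
        # BLASTPLUSDIR is set and holds an executable with that name
        echo "${prog}: found via BLASTPLUSDIR (${BLASTPLUSDIR})"
    elif command -v "${prog}" >/dev/null 2>&1; then
        # otherwise fall back to whatever the shell finds on PATH
        echo "${prog}: found on PATH ($(command -v "${prog}"))"
    else
        echo "${prog}: not found; set BLASTPLUSDIR or extend PATH"
        return 1
    fi
}
```

After running the export line above in the same shell, calling check_blast blastp should print the BLASTPLUSDIR message if the executable really is in that directory; a "not found" answer suggests the directory is wrong rather than anything in BioPerl.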

MAJ

From waldenhe at muohio.edu Tue Sep 28 00:52:56 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Tue, 28 Sep 2010 00:52:56 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>

Thanks Guys,

I have run those steps; my current version now is:
hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.006001

But I am still having problems.

I am having slightly more luck with using StandAloneBlast and the regular BLAST from NCBI. I can get the command-line Blast running. But I still cannot get Perl to see BLAST.
Following the instructions from the HOWTOs and the O'Reilly book BLAST, I have gotten to the part about setting up the environment variables, which is where I think my problems are arising now.
I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
export PATH=${PATH}:/home/hans/BLAST/bin
export BLASTDIR=/home/hans/BLAST/

Am I just supposed to add these to the end of the .bashrc file, or am I supposed to put them someplace special?

Thanks for the help,

Hans

From maj at fortinbras.us Tue Sep 28 11:04:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 15:04:07 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

Should work from .bashrc, Hans. Also add

export BLASTPLUSDIR=/home/hans/BLAST/bin

It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program. You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing

set | grep BLAST

cheers MAJ

From chiragmatkarbioinfo at gmail.com Thu Sep 30 08:20:35 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Thu, 30 Sep 2010 19:20:35 +0700
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
Message-ID:

Hello all,
Is there any module to fetch DNA sequence data from an Ensembl gene id?

--
Regards,
Chirag Matkar

From jun.yin at ucd.ie Thu Sep 30 09:36:31 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 30 Sep 2010 14:36:31 +0100
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To:
References:
Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>

Hi, Chirag,

BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
provides a BioPerl-like interface for that function.
You can visit Ensembl's website on how to use that module:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From cjfields at illinois.edu Thu Sep 30 11:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID:

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
>
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back. I think it still works rather well; at least the tests pass. You might get more out of using the Ensembl API directly as Jun states though, YMMV.

BTW, the Ensembl API also works with the latest bioperl code, regardless of what the Ensembl website says (i.e. that they only support v1.2.3). Haven't heard more about whether this discrepancy was supposed to be addressed at some point.

chris

From A.Vakhrusheva at lumc.nl Wed Sep 29 09:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface. I want to calculate a p-distance matrix, but what format is acceptable for input? Phylip doesn't work.

Anna

From shalabh.sharma7 at gmail.com Wed Sep 1 16:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID:

Hi,
I am trying to parse an hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm

I am not getting anything but this "amoA_10genes_align.fasta.2 [M=247] for
HMM" as the output; I am not even getting any error.
I am attaching the hmmsearch report (just a test report) which I tried to
test against the parser.

I would really appreciate it if anyone can help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # query HMM file: amoA_10genes.hmm # target sequence database: test.faa # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: amoA_10genes_align.fasta.2 [M=247] Scores for complete sequences (score includes all domains): --- full sequence --- --- best 1 domain --- -#dom- E-value score bias E-value score bias exp N Sequence Description ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- 1.6e-72 231.1 5.1 1.7e-72 231.0 3.5 1.0 1 gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte 1.6e-72 231.1 5.1 1.7e-72 231.0 3.5 1.0 1 gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte Domain annotation for each sequence (and alignments): >> gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacterium] # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 231.0 3.5 1.7e-72 1.7e-72 113 245 .. 1 144 [. 1 146 [. 
0.95 Alignments for each domain: == domain 1 score: 231.0 bits; conditional E-value: 1.7e-72 amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe gi|63021979|gb|AAY26564.1| 1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 8********************************************************************************** PP amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af +k+ gi|63021979|gb|AAY26564.1| 84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144 **********************************************966666666655555 PP >> gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacterium] # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 231.0 3.5 1.7e-72 1.7e-72 113 245 .. 1 144 [. 1 146 [. 
0.95 Alignments for each domain: == domain 1 score: 231.0 bits; conditional E-value: 1.7e-72 amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe gi|63021981|gb|AAY26565.1| 1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 8********************************************************************************** PP amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af +k+ gi|63021981|gb|AAY26565.1| 84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144 **********************************************966666666655555 PP Internal pipeline statistics summary: ------------------------------------- Query model(s): 1 (247 nodes) Target sequences: 2 (300 residues) Passed MSV filter: 2 (1); expected 0.0 (0.02) Passed bias filter: 2 (1); expected 0.0 (0.02) Passed Vit filter: 2 (1); expected 0.0 (0.001) Passed Fwd filter: 2 (1); expected 0.0 (1e-05) Initial search space (Z): 2 [actual number of targets] Domain search space (domZ): 2 [number of targets reported over threshold] # CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08 # Mc/sec: 0.93 // From thomas.sharpton at gmail.com Wed Sep 1 17:29:26 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 1 Sep 2010 14:29:26 -0700 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: References: Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Hi Shalabh, We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to use the HMMER3 version, as found here: http://github.com/bioperl/bioperl-hmmer3 Hope this helps, T On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote: > Hi , > I am trying to parse hmmsearch report (from HMMER3). 
I am using > the > script mentioned here: > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm > > I am not getting anything but this "amoA_10genes_align.fasta.2 > [M=247] for > HMM" as the output, i am not even getting any error. > I am attaching the hmmsearch report (just a test report) which i > tried to > test against the parser. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Thu Sep 2 04:44:58 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 2 Sep 2010 10:44:58 +0200 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> On Wed, 1 Sep 2010 14:29:26 -0700 Thomas Sharpton wrote: Hi, > We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to > use the HMMER3 version, as found here: > > http://github.com/bioperl/bioperl-hmmer3 Actually it's now included in the bioperl-live repository, but the code hasn't made it into a release yet. http://github.com/bioperl/bioperl-live.git Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From e.stupka at ucl.ac.uk Thu Sep 2 08:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia

---
"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
~ Stephen Hawkings

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912

From cjfields at illinois.edu Thu Sep 2 10:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID:

Done! Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
>
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
>
> thanks!
> > Elia > > > --- > '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." > ~ Stephen Hawkings > > Senior Lecturer, Bioinformatics > Scientific Director - Bioinformatics, UCL Genomics > > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Institute of Cell and Molecular Science > Barts and The London School of Medicine and Dentistry > 4 Newark Street > Whitechapel > London > E1 2AT > > Office (UCL): +44 207 679 6493 > Fax: +44 0207 6796817 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 787 6478912 > > > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From J.Christopher.Ellis at duke.edu Thu Sep 2 10:53:34 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 2 Sep 2010 10:53:34 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <53096.1283439214@duke.edu> Chris have you had any luck with this? Thanks, Chris On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent: Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
>
> Chris
>
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
>
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
>
> chris
>
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>
> > Hi All,
> >
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Some thing like...
> >
> > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> >
> > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried to run it but got an error.
> >
> > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> >
> > Thanks for all your help in advance!
> >
> > Chris
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Thu Sep 2 12:21:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 02 Sep 2010 11:21:48 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <53096.1283439214@duke.edu>
References: <53096.1283439214@duke.edu>
Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu>

Chris,

There are a few things wrong with the original script, so I'll fix them. Basically, it makes the assumption that every ID in the original list is found. The problem: eutils only reports back data it finds, silently discarding IDs that don't match. So, using the original ID list when building the hashes needs a bit more error checking.

Here's the revised script that works for me.

https://gist.github.com/f5db90a432fed68548d4

I'm also adding a check to ensure all IDs are defined prior to adding them to the param string, just in case.

chris

On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote:
> Chris have you had any luck with this?
>
> Thanks,
> Chris
>
> On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
> Yes, I see that one. It may be the ID hash that is being
> returned is empty. I'll look into it.
>
> -c
>
> On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:
>
> > Hi Chris,
> >
> > The error is...
> >
> > "Use of uninitialized value $id in join or string at
> > C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm
> > line 363."
> >
> > The script from
> > http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
> >
> > use Bio::DB::EUtilities;
> >
> > my (%taxa, @taxa);
> > my (%names, %idmap);
> >
> > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> > # (probably)
> >
> > my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >
> > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >                                        -db => 'taxonomy',
> >                                        -dbfrom => 'protein',
> >                                        -correspondence => 1,
> >                                        -id => \@ids);
> >
> > # iterate through the LinkSet objects
> > while (my $ds = $factory->next_LinkSet) {
> >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > }
> >
> > @taxa = @taxa{@ids};
> >
> > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >                                     -db => 'taxonomy',
> >                                     -id => \@taxa );
> >
> > while (local $_ = $factory->next_DocSum) {
> >     $names{($_->get_contents_by_name('TaxId'))[0]} =
> >         ($_->get_contents_by_name('ScientificName'))[0];
> > }
> >
> > foreach (@ids) {
> >     $idmap{$_} = $names{$taxa{$_}};
> > }
> >
> > # %idmap is
> > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > # 68536103 => 'Corynebacterium jeikeium K411'
> > # 730439 => 'Bacillus caldolyticus'
> > # 89318838 => undef (this record has been removed from the db)
> >
> > 1;
> >
> > Thanks,
> >
> > Chris
> >
> > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> > Chris,
> >
> > Regarding a fix for that script, we would have to see your
> > modified script and the error. However, there are modules
> > within BioPerl to essentially do what you want, in particular,
> > Bio::DB::Taxonomy.
> >
> > chris
> >
> > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
> >
> > > Hi All,
> > >
> > > I am trying to extract the entire taxonomy of an organism including the
> > > classifications. Some thing like...
> > >
> > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
> > >
> > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried to run it but got an error.
> > >
> > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db?
> > >
> > > Thanks for all your help in advance!
> > >
> > > Chris
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >

From thomas.sharpton at gmail.com Thu Sep 2 12:34:07 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Thu, 2 Sep 2010 09:34:07 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
Message-ID:

So it is! I'm paying attention, I swear I am....

Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem, do let us know.

Best,
Tom

On Sep 2, 2010, at 1:44 AM, Kai Blin wrote:

> On Wed, 1 Sep 2010 14:29:26 -0700
> Thomas Sharpton wrote:
>
> Hi,
>
>> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
>> use the HMMER3 version, as found here:
>>
>> http://github.com/bioperl/bioperl-hmmer3
>
> Actually it's now included in the bioperl-live repository, but the code
> hasn't made it into a release yet.
>
> http://github.com/bioperl/bioperl-live.git
>
> Cheers,
> Kai
> --
> Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
> D-72076 Tübingen Fax : ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From johnny at mit.edu Sat Sep 4 11:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID:

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (e.g. enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow?

please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************

From David.Messina at sbc.su.se Mon Sep 6 08:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To:
References:
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl! In general, we are liberal in what we accept.

I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?
If you haven't already, you might want to take a look at some of our developer documentation. For example: http://www.bioperl.org/wiki/Bioperl_Best_Practices http://www.bioperl.org/wiki/Advanced_BioPerl Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway. I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway: http://www.bioperl.org/wiki/Proposed_BioPerl_changes http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208 Dave From ross at cuhk.edu.hk Tue Sep 7 04:28:00 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 16:28:00 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> By the following codes, I wanna index the 4G nr database, however, the index file is > 1T and the job has been running for weeks and still hasn't finished. Could anybody tell me how you accomplish the goal? Thanks in advance. 
use strict;
use Bio::DB::Flat::BinarySearch;

(my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

# use single quotes so you don't have to write
# regular expressions like "gi\\|(\\d+)"
#my $primary_pattern = '^>(\S+)';
#if ($fullHeader == 1) {
my $primary_pattern = '^>(.+)';
#}
my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
#$string =~ s/$primary_pattern/RRR/g;
#print "$string\n";

# one or more patterns stored in a hash:
my $secondary_patterns = {GI => 'gi\|(\d+)'};

my $db = Bio::DB::Flat::BinarySearch->new(
    -directory          => $baseDir,
    -dbname             => $dbName,
    -write_flag         => 1,
    -primary_pattern    => $primary_pattern,
    -primary_namespace  => 'ACC',
    -secondary_patterns => $secondary_patterns,
    -verbose            => 1,
    -format             => 'fasta' );

$db->build_index($seqFile);

From David.Messina at sbc.su.se Tue Sep 7 05:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for? If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl.

They're distributed with Blast+ and available here:
ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.

I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:
ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.

Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself).
Some nice examples there showing off the flexibility of this latest version of the software. Dave From ross at cuhk.edu.hk Tue Sep 7 05:18:16 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 17:18:16 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <4C860148.3030000@fmi.ch> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> The reason is that I have to retrieve the specific information of the matched sequences, e.g. extract the 64th amino acid of the top matched sequence. Is there any way to achieve that? -----Original Message----- From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] Sent: Tuesday, September 07, 2010 5:09 PM To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk Subject: Re: [Bioperl-l] Indexing nr database Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? Thanks in > advance. 
> > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 05:09:28 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:09:28 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> Message-ID: <4C860148.3030000@fmi.ch> Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? 
Thanks in > advance. > > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 05:33:46 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:33:46 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> Message-ID: <4C8606FA.3000509@fmi.ch> On 09/07/2010 11:18 AM, Ross KK Leung wrote: > The reason is that I have to retrieve the specific information of the > matched sequences, e.g. extract the 64th amino acid of the top matched > sequence. Is there any way to achieve that? 
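For the single-residue case asked about above, a minimal sketch, assuming the full protein sequence has already been fetched as a plain string by whatever means. The sequence below is a made-up placeholder and residue_at is a hypothetical helper, not a BioPerl or BLAST+ call:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the residue at a
# 1-based position, or nothing if the position is outside the sequence.
sub residue_at {
    my ($seq, $pos) = @_;
    return if $pos < 1 || $pos > length $seq;
    return substr($seq, $pos - 1, 1);    # substr is 0-based
}

# Made-up sequence standing in for a fetched database entry:
# 'MKT' (positions 1-3), 60 x 'A' (positions 4-63), 'WLV' (positions 64-66).
my $protein = 'MKT' . ('A' x 60) . 'WLV';

my $aa = residue_at($protein, 64);
print defined $aa ? "residue 64: $aa\n" : "sequence too short\n";
# prints "residue 64: W"
```

The only subtlety is the off-by-one: biological coordinates start at 1, Perl's substr starts at 0.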
"blastdbcmd" has several options like "-range" and even if "blastdbcmd" does not give you the subset of information you want to fetch, I am still convinced you are quicker by fetching the complete entry with"blastdbcmd" and then parse the required data out of just one entry. Hans > -----Original Message----- > From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] > Sent: Tuesday, September 07, 2010 5:09 PM > To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk > Subject: Re: [Bioperl-l] Indexing nr database > > Hi > > > why don't you use the pre-indexed BLAST files from NCBI: > > ftp://ftp.ncbi.nih.gov/blast/db/ > > you can use them to fetch individual sequences by gi number or accession > with the tool "blastdbcmd" from blast+ binaries: > > ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ > > > regards, Hans > > > > On 09/07/2010 10:28 AM, Ross KK Leung wrote: >> By the following codes, I wanna index the 4G nr database, however, the > index >> file is> 1T and the job has been running for weeks and still hasn't >> finished. Could anybody tell me how you accomplish the goal? Thanks in >> advance. 
>> >> use strict; >> >> use Bio::DB::Flat::BinarySearch; >> >> >> >> (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = > @ARGV; >> >> >> >> # use single quotes so you don't have to write >> >> # regular expressions like "gi\\|(\\d+)" >> >> #my $primary_pattern = '^>(\S+)'; >> >> #if ($fullHeader == 1) { >> >> my $primary_pattern = '^>(.+)'; >> >> #} >> >> my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis >> H37Rv complete genome"; >> #$string =~ s/$primary_pattern/RRR/g; >> >> #print "$string\n"; >> >> >> >> # one or more patterns stored in a hash: >> >> my $secondary_patterns = {GI => 'gi\|(\d+)'}; >> >> >> >> my $db = Bio::DB::Flat::BinarySearch->new( >> >> -directory => $baseDir, >> >> -dbname => $dbName, >> >> -write_flag => 1, >> >> -primary_pattern => $primary_pattern, >> >> -primary_namespace => 'ACC', >> >> -secondary_patterns => $secondary_patterns, >> >> -verbose => 1, >> >> -format => 'fasta' ); >> >> >> >> $db->build_index($seqFile); >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From fs5 at sanger.ac.uk Tue Sep 7 08:09:52 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 07 Sep 2010 13:09:52 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> I am working a lot with feature-rich Bio::Seq objects these days and thought that it would be really nice if I could do something like: my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); instead of having to grep for the feature every time. There could then be 'by_tag' and 'by_region' options as well. 
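To make the idea concrete, here is a rough sketch of what a tag-based lookup could wrap. It only uses calls that exist today (get_SeqFeatures, has_tag, get_tag_values); the helper name features_by_tag and the toy sequence are assumptions for illustration, not an existing Bio::Seq method:

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Hypothetical helper, NOT an existing Bio::Seq method: return every
# top-level feature of $seq that carries $tag with a value equal to $value.
sub features_by_tag {
    my ($seq, $tag, $value) = @_;
    my @hits;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag($tag);
        push @hits, $feat
            if grep { $_ eq $value } $feat->get_tag_values($tag);
    }
    return @hits;
}

# Toy sequence with two features, purely for demonstration.
my $seq = Bio::Seq->new(-id => 'demo', -seq => 'ATGAAATTTGGGCCCTAA');
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
    -start => 1, -end => 9, -primary_tag => 'gene',
    -tag   => { id => 'my_gene' },
));
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
    -start => 10, -end => 18, -primary_tag => 'gene',
    -tag   => { id => 'other_gene' },
));

my @features = features_by_tag($seq, 'id', 'my_gene');
printf "%d feature(s) tagged id=my_gene\n", scalar @features;
```

This is still a linear scan over the features, so it only saves typing, not time.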
According to the Bio::Seq docs, something like this seems to be planned
at some stage. I would be willing to contribute to this feature if I can
and if this isn't already being implemented by somebody else.
Does anybody know the state of this feature?

Frank

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From jason at bioperl.org Tue Sep 7 13:36:07 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 07 Sep 2010 10:36:07 -0700
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <4C867807.2040907@bioperl.org>

And the implementation would just be something like this?

my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0]
eq 'my_gene' } $seq->get_SeqFeatures();

I think any implementation would be better if we moved from the in-memory
arrays & hash-based system to a sqlite db on the back-end for how
Sequence and Feature objects are stored.
This would be somewhat slower but wouldn't have the performance/memory
problems we get for sequences with many annotations.

-jason

Frank Schwach wrote, On 9/7/10 5:09 AM:
> I am working a lot with feature-rich Bio::Seq objects these days and
> thought that it would be really nice if I could do something like:
>
> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');
>
> instead of having to grep for the feature every time.
> There could then be 'by_tag' and 'by_region' options as well.
>
> According to the Bio::Seq docs, something like this seems to be planned
> at some stage.
I would be willing to contribute to this feature if I can
> and if this isn't already being implemented by somebody else.
> Does anybody know the state of this feature?
>
> Frank
>

From fs5 at sanger.ac.uk Wed Sep 8 04:42:57 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 08 Sep 2010 09:42:57 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <4C867807.2040907@bioperl.org>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
 <4C867807.2040907@bioperl.org>
Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Jason,

Yes, I guess that would be the simplest way of doing it - basically just
doing it the way the docs suggest for getting at a specific feature but
hiding the grep behind a Bio::Seq method with search parameters. But we
could also build a hash of feature tags as the Bio::Seq is built so that
retrieval is more efficient. This could also be used to implement a bin
indexing scheme for range queries, similar to what Bio::DB::GFF does.
Is a move to an sqlite backend planned for the near future?

Frank

On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> And the implementation would just be something like this?
>
> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0]
> eq 'my_gene' } $seq->get_SeqFeatures();
>
> I think any implementation would be if we moved from the in-memory
> arrays & hash-based system to a sqlite db on the back-end for how
> Sequence and Feature objects are stored.
> This would be a somewhat slower but wouldn't have performance/memory
> problems we get for sequences with many annotations.
>
> -jason
> Frank Schwach wrote, On 9/7/10 5:09 AM:
> > I am working a lot with feature-rich Bio::Seq objects these days and
> > thought that it would be really nice if I could do something like:
> >
> > my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');
> >
> > instead of having to grep for the feature every time.
> > There could then be 'by_tag' and 'by_region' options as well.
> >
> > According to the Bio::Seq docs, something like this seems to be planned
> > at some stage. I would be willing to contribute to this feature if I can
> > and if this isn't already being implemented by somebody else.
> > Does anybody know the state of this feature?
> >
> > Frank
> >
>

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From stefan.kirov at bms.com Wed Sep 8 11:09:55 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 08 Sep 2010 11:09:55 -0400
Subject: [Bioperl-l] Another interesting Javascript library
Message-ID: <4C87A743.5010109@bms.com>

Sorry for off topic, but I believe a lot of people can find this quite
useful:

"CanvasXpress is a javascript library based on the <canvas> tag implemented
in HTML5. I developed this library as the core visualization component
for our BMS systems biology platform which I hope to release soon. The
basic idea was to have a generic and simple way to display genomics data.
CanvasXpress supports bar graphs, line graphs, bar-line combination
graphs, boxplots, dotplots, area graphs, stacked graphs,
percentage-stacked graphs, correlation plots, Venn diagrams, heatmaps,
newick trees, 2D-scatter plots, 2D-scatter bubble plots, 3D-scatter
plots, pie charts, networks (or pathways), and a genome browser.
It also supports a few data transformations like log and exponential
transformation, z-score, percentile transformation and ratio. It also
supports grouping of samples, zooming, events ... yada, yada, yada ... and
more importantly I created an Ext panel for it. Take a look.
http://canvasxpress.org/"

Stefan

From alperyilmaz at gmail.com Wed Sep 8 12:47:42 2010
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Wed, 8 Sep 2010 12:47:42 -0400
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
Message-ID:

Hi,

I have a GFF file listing mRNA and CDS coordinates for every
transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
based on that information. I was wondering if there is a ready-made
script for that purpose that you're aware of.

I already uploaded the GFF file into a Bio::DB::SeqFeature database, so
I can use either flat-file or database-based scripts.

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954

From cjfields at illinois.edu Wed Sep 8 19:20:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Sep 2010 18:20:09 -0500
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
 <4C867807.2040907@bioperl.org>
 <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>

Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.
This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. >> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? 
>>>
>>> Frank
>>>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org Thu Sep 9 01:51:53 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 08 Sep 2010 22:51:53 -0700
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
In-Reply-To:
References:
Message-ID: <4C8875F9.6020502@bioperl.org>

Hi Alper -

This script operates on gtf so doesn't quite do what you want, but it
could be modified to be simpler and just look at the CDS and mRNA rather
than the exon and start/stop codon info:

http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl

Otherwise I think there may be some easy ways to do this from some tools
in MAKER too.

-jason

Alper Yilmaz wrote, On 9/8/10 9:47 AM:
> Hi,
>
> I have a GFF file listing mRNA and CDS coordinates for every
> transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
> based on that information. I was wondering, if there's already made
> script for that purpose that you're aware of.
>
> I already uploaded the GFF file into Bio::DB::SeqFeature database, so
> I can utilize both flat file or database based scripts.
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From fs5 at sanger.ac.uk Thu Sep 9 04:10:36 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 09:10:36 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
 <4C867807.2040907@bioperl.org>
 <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
 <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk>

so something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features and that would then be
sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan?
Is there any way I can get involved or is it better to wait for other
features to be developed first?

Cheers,
Frank

On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote:
> Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
> > chris > > On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > > > Hi Jason, > > > > Yes, I guess that would be the simplest way of doing it - basically just > > doing it the way the docs suggest for getting at a specific feature but > > hiding the grep behind a Bio::Seq method with search parameters. But we > > could also build a hash of feature tags as the Bio::Seq is built so that > > retrieval is more efficient. This could also be used to implement a bin > > indexing scheme for range queries, similar to what Bio::DB::GFF does. > > Is a move to an sqlite backend planend for the near future? > > > > Frank > > > > > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: > >> And the implementation would just be something like this? > >> > >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] > >> eq 'my_gene' } $seq->get_SeqFeatures(); > >> > >> I think any implementation would be if we moved from the in-memory > >> arrays & hash-based system to a sqlite db on the back-end for how > >> Sequence and Feature objects are stored. > >> This would be a somewhat slower but wouldn't have performance/memory > >> problems we get for sequences with many annotations. > >> > >> -jason > >> Frank Schwach wrote, On 9/7/10 5:09 AM: > >>> I am working a lot with feature-rich Bio::Seq objects these days and > >>> thought that it would be really nice if I could do something like: > >>> > >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > >>> > >>> instead of having to grep for the feature every time. > >>> There could then be 'by_tag' and 'by_region' options as well. > >>> > >>> According to the Bio::Seq docs, something like this seems to be planned > >>> at some stage. I would be willing to contribute to this feature if I can > >>> and if this isn't already being implemented by somebody else. > >>> Does anybody know the state of this feature? 
> >>>
> >>> Frank
> >>>
> >
> > --
> > The Wellcome Trust Sanger Institute is operated by Genome Research
> > Limited, a charity registered in England with number 1021457 and a
> > company registered in England with number 2742969, whose registered
> > office is 215 Euston Road, London, NW1 2BE.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.

From jun.yin at ucd.ie Thu Sep 9 04:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
 <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
 <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
 <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
 <4C867807.2040907@bioperl.org>
 <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
 <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a
package similar to Bio::LocatableSeq). The idea is to save the index of
sequences to a local database (AnyDBM) instead of keeping it in memory,
which will free some memory. This idea actually comes from
Bio::DB::Fasta, as implemented by Lincoln Stein.

Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Thursday, September 09, 2010 12:20 AM To: Frank Schwach Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Seq, search for specific features Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. 
>> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? >>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. http://www.eset.com __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. 
http://www.eset.com

From s1012635 at student.hsleiden.nl Thu Sep 9 05:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi,

I am a bio-informatics student working on a new project. For this project
I need to get the TMHMM prediction for a list of proteins (in fasta format).
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked
promising. The problem is that I lack the advanced knowledge of Perl to get
this package to work. So far we have had courses in Python and Java, not in
Perl.

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
use Bio::Tools::Tmhmm;
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
while( my $tmhmm_feat = $parser->next_result ) {
    #do something
    #eg
    push @tmhmm_feat, $tmhmm_feat;
}

How do I feed an input.txt (containing the proteins in fasta format) to
this parser and how do I save the output?

cheers!

Stefan Lelieveld

From fs5 at sanger.ac.uk Thu Sep 9 06:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it was run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm

http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm

Follow the synopsis to feed the tool with your sequences.
You can learn how to read a FASTA file and access each sequence in a
loop here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file => "<$file", -format => 'FASTA' );
while (my $seq = $inseq->next_seq) {
    print $seq->accession_number,"\n";
}

as an example for printing out accession numbers from $seq, which is a
Bio::Seq object.
So what you have to do now is to feed each of those Bio::Seq objects into
your TMHMM runner.

Frank

On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi,
>
> I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format).
> I came across the Bio::Tools::TMHMM; package for BioPerl which looked promesing. The problem is I lack the advanced knowlegde of perl to get this package to work. So far we had courses in Python and Java not in Perl.
>
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
>     #do something
>     #eg
>     push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output?
>
> cheers!
>
> Stefan Lelieveld
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
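
[Editorial sketch, not part of the original thread.] The advice in the message above (read each sequence with Bio::SeqIO, hand it to a TMHMM runner, save what comes back) could look roughly like the following. Several things here are assumptions rather than facts from the thread: that BioPerl and BioPerl-run are installed, that a local TMHMM installation can be found by the wrapper, and that `run()` on a Bio::Tools::Run::Tmhmm factory returns a list of Bio::SeqFeature::Generic objects, as recalled from that module's synopsis. The names `format_region` and `predict_tm_regions`, and the output file name in the usage line, are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one predicted region as a tab-separated line.
# (format_region is a hypothetical helper, not part of BioPerl.)
sub format_region {
    my ($seq_id, $tag, $start, $end) = @_;
    return join("\t", $seq_id, $tag, $start, $end) . "\n";
}

# Read every protein from a FASTA file, run TMHMM on it, and write the
# predicted regions to $out_file, one region per line.
sub predict_tm_regions {
    my ($fasta_file, $out_file) = @_;

    # Loaded at run time so the helper above works without BioPerl installed.
    require Bio::SeqIO;
    require Bio::Tools::Run::Tmhmm;    # from the BioPerl-run distribution

    my $in      = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');
    my $factory = Bio::Tools::Run::Tmhmm->new();

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    while (my $seq = $in->next_seq) {
        # Assumption: run() returns a list of Bio::SeqFeature::Generic objects.
        for my $feat ($factory->run($seq)) {
            print {$out} format_region($seq->display_id, $feat->primary_tag,
                                       $feat->start, $feat->end);
        }
    }
    close $out or die "Cannot close $out_file: $!";
    return;
}

# Usage: perl run_tmhmm.pl input.txt tmhmm_regions.tsv
predict_tm_regions(@ARGV) if @ARGV == 2;
```

If TMHMM has already been run by hand, the runner can be skipped and the saved report parsed with Bio::Tools::Tmhmm, as in the synopsis quoted above; the same loop over features would then apply to whatever `next_result` returns.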
From kai.blin at biotech.uni-tuebingen.de Thu Sep 9 06:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
 <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
>     #do something
>     #eg
>     push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output?

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some fasta
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = new Bio::Tools::Tmhmm(-file => "input.txt");
"""
will load and parse the TMHMM output for you.

HTH,
Kai

--
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax   : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From elanorbust2 at yahoo.com Thu Sep 9 12:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test for standaloneblastplus but getting data back that
does not exist in my query or my local database. Below is an outline of my
script, small database, query list, and erroneous results. As you will
notice, the query list is comprised of the first four sequences found in
the database. The results say it cannot find the first two, and then the
matches for the last two do not exist!

Thanks for any help!

Program

#!/usr/bin/perl
use Bio::Tools::Run::StandAloneBlastPlus;
$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create => 1
);
$result = $fac->blastn( -query => , 'sequences.fasta',
                        -outfile => 'ITStest2.bls');

smallDB.fas Data

>302585252|HM807352|Waitea circinata internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>302585252|HM807352|Waitea circinata internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>302585250|HM802273|Fusarium oxysporum
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >302585249|HM802272|Fusarium oxysporum? contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA >302585248|HM802271|Fusarium oxysporum? 
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA >301333053|GU725064|Xiphinema turcicum? internal transcribed spacer 1 GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA >301333052|GU725063|Xiphinema adenohystherum? 
internal transcribed spacer 1 AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC >301333051|GU725062|Xiphinema sphaerocephalum? internal transcribed spacer 1 AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC >301333050|GU725061|Xiphinema hispanum? 
internal transcribed spacer 1 AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG >301333049|GU725060|Xiphinema pyrenaicum? internal transcribed spacer 1 AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG sequences.fasta data >Test1 
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA >Test2 GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA >Test3 CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >Test4 GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA Results BLASTN 2.2.24+ Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14. Database: ITS ?????????? 5 sequences; 1,102 total letters Query=? Test1 Length=204 ***** No hits found ***** Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 202071 Query=? Test2 Length=192 ***** No hits found ***** Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 
0.850 Effective search space used: 189507 Query=? Test3 Length=437 ????????????????????????????????????????????????????????????????????? Score???? E Sequences producing significant alignments:????????????????????????? (Bits)? Value dbj|AB581518.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...?? 300??? 2e-085 dbj|AB581521.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 69.4??? 6e-016 dbj|AB581519.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 58.4??? 1e-012 dbj|AB581522.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 56.5??? 4e-012 >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F Length=203 ?Score =? 300 bits (162),? Expect = 2e-085 ?Identities = 176/182 (96%), Gaps = 4/182 (2%) ?Strand=Plus/Plus Query? 10?? TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC? 66 ??????????? ||||||||||| | |||||| |||||| |||||||| |||| |||||||||||||||||| Sbjct? 23?? TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC? 81 Query? 67?? AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT? 126 ??????????? |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 82?? AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT? 141 Query? 127? GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 186 ??????????? |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 142? GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 201 Query? 187? GG? 188 ??????????? || Sbjct? 202? GG? 203 >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F Length=217 ?Score = 69.4 bits (37),? Expect = 6e-016 ?Identities = 39/40 (97%), Gaps = 0/40 (0%) ?Strand=Plus/Plus Query? 149? AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 188 ??????????? ||||| |||||||||||||||||||||||||||||||||| Sbjct? 178? AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 
217 >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F Length=206 ?Score = 58.4 bits (31),? Expect = 1e-012 ?Identities = 39/42 (92%), Gaps = 3/42 (7%) ?Strand=Plus/Plus Query? 146? ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 186 ??????????? |||| || ||| |||||||||||||||||||||||||||||| Sbjct? 165? ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT? 204 >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F Length=256 ?Score = 56.5 bits (30),? Expect = 4e-012 ?Identities = 30/30 (100%), Gaps = 0/30 (0%) ?Strand=Plus/Plus Query? 157? AAAACTTTCAACAACGGATCTCTTGGTTCT? 186 ??????????? |||||||||||||||||||||||||||||| Sbjct? 225? AAAACTTTCAACAACGGATCTCTTGGTTCT? 254 Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 442850 Query=? Test4 Length=521 ????????????????????????????????????????????????????????????????????? Score???? E Sequences producing significant alignments:????????????????????????? (Bits)? Value dbj|AB581518.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...?? 309??? 4e-088 dbj|AB581521.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 69.4??? 7e-016 dbj|AB581519.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 58.4??? 1e-012 dbj|AB581522.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5...? 56.5??? 5e-012 >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F Length=203 ?Score =? 309 bits (167),? Expect = 4e-088 ?Identities = 177/181 (97%), Gaps = 3/181 (1%) ?Strand=Plus/Plus Query? 7??? TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA? 63 ??????????? ||||||||||| | |||||| |||||| |||||||||||||||||||||||||||||||| Sbjct? 23?? TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA? 82 Query? 64?? GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG? 123 ??????????? 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 83?? GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG? 142 Query? 124? TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG? 183 ??????????? |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct? 143? TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG? 202 Query? 184? G? 184 ??????????? | Sbjct? 203? G? 203 >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F Length=217 ?Score = 69.4 bits (37),? Expect = 7e-016 ?Identities = 39/40 (97%), Gaps = 0/40 (0%) ?Strand=Plus/Plus Query? 145? AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 184 ??????????? ||||| |||||||||||||||||||||||||||||||||| Sbjct? 178? AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 217 >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F Length=206 ?Score = 58.4 bits (31),? Expect = 1e-012 ?Identities = 39/42 (92%), Gaps = 3/42 (7%) ?Strand=Plus/Plus Query? 142? ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 182 ??????????? |||| || ||| |||||||||||||||||||||||||||||| Sbjct? 165? ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT? 204 >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F Length=256 ?Score = 56.5 bits (30),? Expect = 5e-012 ?Identities = 30/30 (100%), Gaps = 0/30 (0%) ?Strand=Plus/Plus Query? 153? AAAACTTTCAACAACGGATCTCTTGGTTCT? 182 ??????????? |||||||||||||||||||||||||||||| Sbjct? 225? AAAACTTTCAACAACGGATCTCTTGGTTCT? 254 Lambda???? K????? H ??? 1.33??? 0.621???? 1.12 Gapped Lambda???? K????? H ??? 1.28??? 0.460??? 0.850 Effective search space used: 530378 ? Database: ITS ??? Posted date:? Aug 27, 2010? 9:43 AM ? Number of letters in database: 1,102 ? Number of sequences in database:? 
5

Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5

From jaya1786 at gmail.com Thu Sep 9 12:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID:

Respected sir/madam,

I am Jayanthi Jayakumar, doing my second year MS (by Research) in computational biology at Anna University, Chennai, India. I am very much interested in participating in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and eligibility criteria for the same.

Thanking you,
Yours faithfully,
Jayanthi Jayakumar

From Russell.Smithies at agresearch.co.nz Thu Sep 9 18:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?

Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
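A quick way to test the header idea, offered only as a rough sketch and not as anything taken from this exchange: scan the FASTA text and report any header that is not followed by sequence, and any sequence line containing characters other than letters, `*` or `-` (which is what a header wrapped onto a second line would look like). The subroutine name `check_fasta_headers` and the exact validity rule are assumptions made for illustration; BioPerl and BLAST+ may be more or less tolerant than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns one message per
# suspicious line found in a block of FASTA text.
# Assumed rules: every '>' header must be followed by at least one
# sequence line, and sequence lines hold only letters, '*' or '-'.
sub check_fasta_headers {
    my ($text) = @_;
    my @problems;
    my ($prev_header, $prev_line_no, $seq_lines) = (undef, 0, 0);
    my $line_no = 0;

    for my $line (split /\r?\n/, $text) {
        $line_no++;
        next if $line =~ /^\s*$/;    # blank lines are ignored

        if ($line =~ /^>/) {
            # Two headers in a row: the earlier one has no sequence.
            if (defined $prev_header && $seq_lines == 0) {
                push @problems,
                    "line $prev_line_no: header has no sequence: $prev_header";
            }
            ($prev_header, $prev_line_no, $seq_lines) = ($line, $line_no, 0);
        }
        else {
            if (!defined $prev_header) {
                push @problems, "line $line_no: sequence data before any header";
            }
            elsif ($line !~ /^[A-Za-z*-]+$/) {
                # e.g. the tail of a header that was wrapped by a mailer
                push @problems,
                    "line $line_no: unexpected characters in sequence line: $line";
            }
            $seq_lines++;
        }
    }

    # The last header in the file also needs sequence after it.
    if (defined $prev_header && $seq_lines == 0) {
        push @problems,
            "line $prev_line_no: header has no sequence: $prev_header";
    }
    return @problems;
}

# Usage: perl check_fasta.pl smallDB.fas
if (@ARGV) {
    local $/;    # slurp the whole file at once
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = <$fh>;
    close $fh;
    print "$_\n" for check_fasta_headers($text);
}
```

If this prints nothing for smallDB.fas, the wrapped headers are most likely only an artifact of the email and the cause lies elsewhere.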
Russell Smithies
Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test for standaloneblastplus but getting data back that
> does not exist in my query or my local database. Below is a outline of my
> script small database, query list, and erroneous results. As you will
> notice the query list is comprised of the first four sequences found in
> the database. The results say it can not find the first two and then the
> mathces for the last two do not exist!
>
> Thanks for any help!
>
> Program
>
> #!/usr/bin/perl
>
> use Bio::Tools::Run::StandAloneBlastPlus;
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>    -db_name => 'ITS',
>    -db_data => 'smallDB.fas',
>    -create => 1
> );
>
> $result = $fac->blastn( -query => , 'sequences.fasta',
>                         -outfile => 'ITStest2.bls');

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From elanorbust2 at yahoo.com Fri Sep 10 11:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking though!

--- On Thu, 9/9/10, Smithies, Russell wrote:

From: Smithies, Russell
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'", "'bioperl-l at lists.open-bio.org'"
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email or are some of your fasta headers in your db incorrect?

Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?

Russell Smithies
Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] standaloneblastplus > > I am running a test for standaloneblastplus but getting data back that > does not exist in my query or my local database. Below is a outline of my > script small database, query list, and erroneous results. As you will > notice the query list is comprised of the first four sequences found in > the database. The results say it can not find the first two and then the > mathces for the last two do not exist! > > Thanks for any help! > > > > Program > > > #!/usr/bin/perl > > use Bio::Tools::Run::StandAloneBlastPlus; > > > $fac = Bio::Tools::Run::StandAloneBlastPlus->new( >???-db_name => 'ITS', >???-db_data => 'smallDB.fas', >???-create => 1 > ); > > $result = $fac->blastn( -query => , 'sequences.fasta', >? ? ? ? ? ? ? ? ? ? ? ???-outfile => 'ITStest2.bls'); > > > smallDB.fas Data > > >302585252|HM807352|Waitea circinata? internal transcribed spacer 1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >302585252|HM807352|Waitea circinata? internal transcribed spacer 2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >302585250|HM802273|Fusarium oxysporum? 
contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >302585249|HM802272|Fusarium oxysporum? contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > >302585248|HM802271|Fusarium oxysporum? 
contains 18S ribosomal RNA, > internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed > spacer 2, and 28S ribosomal RNA" > CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC > CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA > TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT > GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC > CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG > GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC > GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC > ATATCATTAAAGCGGAGGAA > > >301333053|GU725064|Xiphinema turcicum? internal transcribed spacer 1 > GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC > GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA > TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG > AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT > ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC > TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG > ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT > GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG > TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG > TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA > CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG > TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA > TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA > ACGCA > CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA > AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT > TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA > > 
>301333052|GU725063|Xiphinema adenohystherum? internal transcribed spacer > 1 > AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG > CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT > CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG > GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA > ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA > AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA > AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA > ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA > CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA > AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA > GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG > TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA > AACTTTTATATATAGTTCGCCGAATAATAGCGAAC > > >301333051|GU725062|Xiphinema sphaerocephalum? 
internal transcribed spacer > 1 > AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG > CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC > GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG > AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC > GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG > AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC > GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG > GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC > CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG > TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT > GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG > TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT > ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC > > >301333050|GU725061|Xiphinema hispanum? 
internal transcribed spacer 1 > AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT > TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC > TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG > GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA > TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA > AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA > TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC > GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC > GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC > ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA > GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC > AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT > AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG > > >301333049|GU725060|Xiphinema pyrenaicum? 
internal transcribed spacer 1 > AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC > TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT > TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC > TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG > ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA > GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT > TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA > CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT > AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA > GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA > TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT > ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG > > > > sequences.fasta data > > >Test1 > ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC > ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT > TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA > > >Test2 > GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT > CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA > GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA > > >Test3 > CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT > CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA > AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA > ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT > GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC > CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC > > >Test4 > GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG > 
GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA > AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT > GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT > GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT > TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA > AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG > GAA > > > > > Results > > BLASTN 2.2.24+ > > > Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb > Miller (2000), "A greedy algorithm for aligning DNA sequences", J > Comput Biol 2000; 7(1-2):203-14. > > > > Database: ITS >? ? ? ? ? ? 5 sequences; 1,102 total letters > > > > Query=? Test1 > Length=204 > > > ***** No hits found ***** > > > > Lambda? ???K? ? ? H >? ???1.33? ? 0.621? ???1.12 > > Gapped > Lambda? ???K? ? ? H >? ???1.28? ? 0.460? ? 0.850 > > Effective search space used: 202071 > > > Query=? Test2 > Length=192 > > > ***** No hits found ***** > > > > Lambda? ???K? ? ? H >? ???1.33? ? 0.621? ???1.12 > > Gapped > Lambda? ???K? ? ? H >? ???1.28? ? 0.460? ? 0.850 > > Effective search space used: 189507 > > > Query=? Test3 > Length=437 > > Score? ???E > Sequences producing significant alignments: > (Bits)? Value > > dbj|AB581518.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 300? ? 2e-085 > dbj|AB581521.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 69.4? ? 6e-016 > dbj|AB581519.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 58.4? ? 1e-012 > dbj|AB581522.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 56.5? ? 4e-012 > > > >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G59F > Length=203 > >? Score =? 300 bits (162),? Expect = 2e-085 >? Identities = 176/182 (96%), Gaps = 4/182 (2%) >? Strand=Plus/Plus > > Query? 10???TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC > 66 >? ? ? ? ? 
???||||||||||| | |||||| |||||| |||||||| |||| |||||||||||||||||| > Sbjct? 23???TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC > 81 > > Query? 67???AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT > 126 >? ? ? ? ? ???|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct? 82???AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT > 141 > > Query? 127? GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT > 186 >? ? ? ? ? ???|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct? 142? GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT > 201 > > Query? 187? GG? 188 >? ? ? ? ? ???|| > Sbjct? 202? GG? 203 > > > >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G64F > Length=217 > >? Score = 69.4 bits (37),? Expect = 6e-016 >? Identities = 39/40 (97%), Gaps = 0/40 (0%) >? Strand=Plus/Plus > > Query? 149? AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 188 >? ? ? ? ? ???||||| |||||||||||||||||||||||||||||||||| > Sbjct? 178? AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 217 > > > >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G60F > Length=206 > >? Score = 58.4 bits (31),? Expect = 1e-012 >? Identities = 39/42 (92%), Gaps = 3/42 (7%) >? Strand=Plus/Plus > > Query? 146? ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 186 >? ? ? ? ? ???|||| || ||| |||||||||||||||||||||||||||||| > Sbjct? 165? ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT? 204 > > > >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G65F > Length=256 > >? Score = 56.5 bits (30),? Expect = 4e-012 >? Identities = 30/30 (100%), Gaps = 0/30 (0%) >? Strand=Plus/Plus > > Query? 157? AAAACTTTCAACAACGGATCTCTTGGTTCT? 186 >? ? ? ? ? ???|||||||||||||||||||||||||||||| > Sbjct? 225? AAAACTTTCAACAACGGATCTCTTGGTTCT? 254 > > > > Lambda? ???K? ? ? H >? ???1.33? ? 0.621? ???1.12 > > Gapped > Lambda? ???K? ? 
? H >? ???1.28? ? 0.460? ? 0.850 > > Effective search space used: 442850 > > > Query=? Test4 > Length=521 > > Score? ???E > Sequences producing significant alignments: > (Bits)? Value > > dbj|AB581518.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 309? ? 4e-088 > dbj|AB581521.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 69.4? ? 7e-016 > dbj|AB581519.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 58.4? ? 1e-012 > dbj|AB581522.1|? Uncultured fungus genes for 18S rRNA, ITS1 and 5... > 56.5? ? 5e-012 > > > >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G59F > Length=203 > >? Score =? 309 bits (167),? Expect = 4e-088 >? Identities = 177/181 (97%), Gaps = 3/181 (1%) >? Strand=Plus/Plus > > Query? 7? ? TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA > 63 >? ? ? ? ? ???||||||||||| | |||||| |||||| |||||||||||||||||||||||||||||||| > Sbjct? 23???TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA > 82 > > Query? 64???GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG > 123 >? ? ? ? ? ???|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct? 83???GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG > 142 > > Query? 124? TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG > 183 >? ? ? ? ? ???|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct? 143? TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG > 202 > > Query? 184? G? 184 >? ? ? ? ? ???| > Sbjct? 203? G? 203 > > > >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G64F > Length=217 > >? Score = 69.4 bits (37),? Expect = 7e-016 >? Identities = 39/40 (97%), Gaps = 0/40 (0%) >? Strand=Plus/Plus > > Query? 145? AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 184 >? ? ? ? ? ???||||| |||||||||||||||||||||||||||||||||| > Sbjct? 178? AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG? 
217 > > > >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G60F > Length=206 > >? Score = 58.4 bits (31),? Expect = 1e-012 >? Identities = 39/42 (92%), Gaps = 3/42 (7%) >? Strand=Plus/Plus > > Query? 142? ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT? 182 >? ? ? ? ? ???|||| || ||| |||||||||||||||||||||||||||||| > Sbjct? 165? ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT? 204 > > > >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, > partial > sequence, clone: G65F > Length=256 > >? Score = 56.5 bits (30),? Expect = 5e-012 >? Identities = 30/30 (100%), Gaps = 0/30 (0%) >? Strand=Plus/Plus > > Query? 153? AAAACTTTCAACAACGGATCTCTTGGTTCT? 182 >? ? ? ? ? ???|||||||||||||||||||||||||||||| > Sbjct? 225? AAAACTTTCAACAACGGATCTCTTGGTTCT? 254 > > > > Lambda? ???K? ? ? H >? ???1.33? ? 0.621? ???1.12 > > Gapped > Lambda? ???K? ? ? H >? ???1.28? ? 0.460? ? 0.850 > > Effective search space used: 530378 > > >???Database: ITS >? ???Posted date:? Aug 27, 2010? 9:43 AM >???Number of letters in database: 1,102 >???Number of sequences in database:? 5 > > > > Matrix: blastn matrix 1 -2 > Gap Penalties: Existence: 0, Extension: 2.5 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From David.Messina at sbc.su.se Fri Sep 10 12:23:26 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 10 Sep 2010 18:23:26 +0200 Subject: [Bioperl-l] standaloneblastplus In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com> References: <23696.14536.qm@web37508.mail.mud.yahoo.com> Message-ID: Hi Sally, Did you run the same search on the command line, outside of BioPerl? The issue you're having may be with Blast+ and not BioPerl. For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches. Dave On Sep 10, 2010, at 17:13 , sally roberts wrote: > I think that is just a email error. Thanks for looking though! > > --- On Thu, 9/9/10, Smithies, Russell wrote: > > From: Smithies, Russell > Subject: RE: [Bioperl-l] standaloneblastplus > To: "'sally roberts'" , "'bioperl-l at lists.open-bio.org'" > Date: Thursday, September 9, 2010, 6:54 PM > > Is that a typo in your email or are some of your fasta headers in your db incorrect? > Eg. >> 301333052|GU725063|Xiphinema adenohystherum internal transcribed >> 301333052|GU725063|spacer 1 > AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT > > Shouldn't that be: >> 301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1 > AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT > > Maybe the invalid fasta headers are breaking the db formatter? 
From jun.yin at ucd.ie Sat Sep 11 12:13:09 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Sat, 11 Sep 2010 17:13:09 +0100 Subject: [Bioperl-l] Regarding GSoC 2010 In-Reply-To: References: Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie> Hi, Jayanthi Jayakumar, GSoC is already finished this year.
You can check the information here: http://socghop.appspot.com/gsoc/program/home/google/gsoc2010 However, you can still contribute to the BioPerl project if you like. You can talk to people on this mailing list. Or you can join the IRC channel (http://www.bioperl.org/wiki/IRC). Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar Sent: Thursday, September 09, 2010 6:00 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Regarding GSoC 2010 Respected sir/madam, I am Jayanthi Jayakumar doing my second year MS(By Research) in computational biology in Anna University Chennai, India. I am very much interested to participate in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and eligibility criteria for the same. Thanking you, yours faithfully, Jayanthi Jayakumar _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security. http://www.eset.com
From david.breimann at gmail.com Sun Sep 12 09:16:29 2010 From: david.breimann at gmail.com (David Breimann) Date: Sun, 12 Sep 2010 15:16:29 +0200 Subject: [Bioperl-l] Circular genomes Message-ID: Hello, As a continuation of http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I would like to ask: Has the fix been implemented yet? That is, do GFF3 files created for circular genomes comply with the GFF3 spec for such genomes? I find it difficult to keep track using git, so I'm not sure whether this was already handled. Also, will the start and end coordinates of such genes loaded from a GFF3 file be "normal" (i.e. no coordinate is larger than the size of the genome) or just as written in the GFF3 (which demands that end > start even if end > genome length)? Thanks, David From David.Messina at sbc.su.se Mon Sep 13 11:10:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 13 Sep 2010 17:10:42 +0200 Subject: [Bioperl-l] BioPerl net installer Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se> Hi everyone, I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. It's already part of bioperl-live, and you can also get it here: http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl Dave From maj at fortinbras.us Mon Sep 13 12:47:45 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 13 Sep 2010 16:47:45 +0000 Subject: [Bioperl-l] BioPerl net installer Message-ID: Dear Scott- You rock!
Sincerely, Mark >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Monday, September 13, 2010 11:10 AM >To: 'BioPerl List' >Subject: [Bioperl-l] BioPerl net installer > >Hi everyone, > >I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. > >The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. > >It's already part of bioperl-live, and you can also get it here: > > http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Sep 13 17:15:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 13 Sep 2010 16:15:45 -0500 Subject: [Bioperl-l] BioPerl net installer In-Reply-To: References: Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu> Ditto! chris (briefly resurfacing) On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote: > Dear Scott- > You rock! > Sincerely, > Mark > >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Monday, September 13, 2010 11:10 AM >> To: 'BioPerl List' >> Subject: [Bioperl-l] BioPerl net installer >> >> Hi everyone, >> >> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. >> >> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. 
>> >> It's already part of bioperl-live, and you can also get it here: >> >> http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl >> >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From timmcilveen at talktalk.net Mon Sep 13 19:07:00 2010 From: timmcilveen at talktalk.net (tim) Date: Tue, 14 Sep 2010 00:07:00 +0100 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 Message-ID: <201009140007.00798.timmcilveen@talktalk.net> Hi, I have just installed Bioperl on my Linux system using the CPAN install. The install summary is as follows: Test Summary Report ------------------- t/RemoteDB/GenPept.t (Wstat: 256 Tests: 21 Failed: 1) Failed test: 17 Non-zero exit status: 1 t/RemoteDB/Query/GenBank.t (Wstat: 256 Tests: 18 Failed: 1) Failed test: 9 Non-zero exit status: 1 Parse errors: Bad plan. You planned 21 tests but ran 18. t/RemoteDB/Taxonomy.t (Wstat: 512 Tests: 103 Failed: 2) Failed tests: 15, 98 Non-zero exit status: 2 t/Root/RootIO.t (Wstat: 7424 Tests: 30 Failed: 0) Non-zero exit status: 29 Parse errors: Bad plan. You planned 31 tests but ran 30. Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr 0.91 sys + 156.68 cusr 9.16 csys = 172.94 CPU) Result: FAIL Failed 4/329 test programs. 4/18407 subtests failed. CJFIELDS/BioPerl-1.6.1.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.1.tar.gz Running Build install make test had returned bad status, won't install without force Failed during this command: CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO Is Bioperl properly installed? 
During the install process I was getting quite a lot of this error (100's of instances): 'replacement list longer than search list'. This happened with t/tools, t/seq, t/search and many others. Any advice would be great. Tim From David.Messina at sbc.su.se Tue Sep 14 03:56:33 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 14 Sep 2010 09:56:33 +0200 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net> References: <201009140007.00798.timmcilveen@talktalk.net> Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se> Hi Tim, Thanks for your report. > Is Bioperl properly installed? No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line: > make test had returned bad status, won't install without force As for the error(s) > 'replacement list longer than search list' I believe this was fixed a couple of months ago. For details, see: http://bugzilla.open-bio.org/show_bug.cgi?id=3116 So I would recommend that you grab the latest copy of bioperl-live from github, where the bug is fixed: http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots Give that a shot and let us know how it goes. Dave From jskittrell at unmc.edu Thu Sep 16 12:15:49 2010 From: jskittrell at unmc.edu (Jeff Kittrell) Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC) Subject: [Bioperl-l] mpiblast Message-ID: Does Bioperl work with mpiblast? Is there a standalone-like module that allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast output file correctly?
Thanks for any help, Jeff From David.Messina at sbc.su.se Thu Sep 16 14:25:57 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 16 Sep 2010 20:25:57 +0200 Subject: [Bioperl-l] mpiblast In-Reply-To: References: Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> > Is the there a standalone like module that allows you to easily call mpiblast? No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > I'm assuming seqio with parse a mpiblast output file correctly? Yes, although I see that a new version of mpiblast was recently released. Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? Dave From shalabh.sharma7 at gmail.com Thu Sep 16 17:38:14 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 16 Sep 2010 17:38:14 -0400 Subject: [Bioperl-l] IUPAC code similarity Message-ID: Hi All, I have a few nucleotide sequences that are composed of IUPAC codes, like: >test VGSRVBSSSSSNSC Similarly, I have a database made of these kinds of sequences. I want to find sequences that are 100% similar to the query sequence. Is there any BioPerl module to deal with this? I tried normal BLAST but it didn't work. Do I have to convert these sequences to 4-base codes, or is there another way? Thanks Shalabh From amackey at virginia.edu Fri Sep 17 10:28:15 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 10:28:15 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Convert the IUPAC code to a regular expression, and use regular expressions (in Perl or grep or similar) to find 100% identical matches. -Aaron On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma wrote: > Hi All, > I have few nucleotide sequences that are composed of IUPAC codes. Like > >test > VGSRVBSSSSSNSC > > Similarly i have a database made of of these kind of sequences.
I want to > find sequences that are 100% similar to the query sequence. > > Is there any bioPerl module to deal with this, i tried normal blast but it > didn't worked. > Do i have to convert these sequences to 4 base codes or there is any other > way out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Fri Sep 17 11:07:38 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 11:07:38 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks, Aaron, for your reply. Actually, I tried that first, but there is another problem: I have to divide each query sequence into windows of size 5 with a 1-base shift, and it's not possible to divide a regular expression that way. So what I am trying is to convert those IUPAC codes to 4-base-code sequences and then do the normal search. Now the problem is that I am not able to convert those IUPAC sequences to normal ones; I am still trying to write a script, but it's taking time. Thanks Shalabh On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. >> Like >> >test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out.
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From roy.chaudhuri at gmail.com Fri Sep 17 11:04:28 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 17 Sep 2010 16:04:28 +0100 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: <4C93837C.4080008@gmail.com> Hi Shalabh, The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC codes to regular expressions: $ perl -e 'use Bio::Tools::SeqPattern; print Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand' [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C Although that won't work if there are also ambiguity codes in your database. For a non-BioPerl solution you could try fuzznuc from EMBOSS. Cheers. Roy. On 17/09/2010 15:28, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. Like >>> test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out.
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From david.breimann at gmail.com Fri Sep 17 14:13:22 2010 From: david.breimann at gmail.com (David Breimann) Date: Fri, 17 Sep 2010 20:13:22 +0200 Subject: [Bioperl-l] Installing using git after an older installation Message-ID: Hello, I'm sharing a server with some other lab members. I would like to install the latest version of bioperl for my own use, without affecting my colleagues. I used git to clone a copy of bioperl-live and exported PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB". Now perl -MBio::Perl -le 'print Bio::Perl->VERSION;' returns 1.0069. My question is: is that all? Am I now using the latest version? Should I include anything special in my scripts? Also, what about all the bp_***.pl scripts? Are they now using the latest version, too? I guess not, since I didn't build anything. So what should I do about them? Thanks, Dave From amackey at virginia.edu Fri Sep 17 15:24:44 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:24:44 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: <4C93837C.4080008@gmail.com> References: <4C93837C.4080008@gmail.com> Message-ID: If there are ambiguity codes in the database, then the expanded character class also has to include the original ambiguity code; non-ambiguous nucleotides must also be expanded to include all ambiguity codes that represent the nucleotide.
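A minimal sketch of that rule in plain Perl (no BioPerl needed), for illustration only. It assumes DNA codes only (no U), and the sub names class_for and iupac_to_regex are made up for this example. A plain base is widened to every IUPAC code that can represent it, and an ambiguity code is widened to its bases plus the code itself, which is one literal reading of the description above:

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for (DNA only).
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Character class for one query symbol (hypothetical helper).
sub class_for {
    my ($code) = @_;
    $code = uc $code;
    die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
    my @members;
    if (length($IUPAC{$code}) == 1) {
        # Plain base: accept every code whose expansion contains it.
        @members = grep { index($IUPAC{$_}, $code) >= 0 } sort keys %IUPAC;
    }
    else {
        # Ambiguity code: accept its bases plus the code itself.
        @members = (split(//, $IUPAC{$code}), $code);
    }
    return '[' . join('', @members) . ']';
}

# Regex source for a whole query sequence (hypothetical helper).
sub iupac_to_regex {
    my ($seq) = @_;
    return join '', map { class_for($_) } split //, $seq;
}

# Small demo: the query VGS against three made-up database strings.
my $re = iupac_to_regex('VGS');
for my $db (qw(AGS VGC TGS)) {
    printf "%s %s\n", $db, ($db =~ /^$re$/ ? 'matches' : 'does not match');
}
```

Whether an ambiguity code in the query should also accept other overlapping codes in the database (for example V against R) depends on what "100% similar" is meant to cover; this sketch does not do that.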
-Aaron On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri wrote: > Hi Shalabh, > > The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC > codes to regular expressions: > > $perl -e 'use Bio::Tools::SeqPattern; print > Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand' > [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C > > Although that won't work if there are also abiguity codes in your database. > For a non-BioPerl solution you could try fuzznuc from Emboss. > > Cheers. > Roy. > > > On 17/09/2010 15:28, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions >> (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma >> wrote: >> >> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >>>> test >>>> >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From amackey at virginia.edu Fri Sep 17 15:25:54 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:25:54 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: do your windowing/shifting on the unexpanded query sequences; then transform the 5-bp queries into regular expressions. 
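A rough sketch of that order of operations in plain Perl, offered as an illustration rather than a tested solution. The sub names windows and to_regex are invented for this example, the window size of 5 and shift of 1 come from Shalabh's message, and only DNA codes are assumed:

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for (DNA only).
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Cut the unexpanded sequence into overlapping windows (hypothetical helper).
sub windows {
    my ($seq, $size, $step) = @_;
    my @w;
    for (my $i = 0; $i + $size <= length($seq); $i += $step) {
        push @w, substr($seq, $i, $size);
    }
    return @w;
}

# Turn one IUPAC string into regex source (hypothetical helper).
sub to_regex {
    my ($seq) = @_;
    return join '', map {
        my $code = uc $_;
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        my $bases = $IUPAC{$code};
        length($bases) == 1 ? $bases : "[$bases]";
    } split //, $seq;
}

# Window first, expand second.
for my $w (windows('VGSRVBSSSSSNSC', 5, 1)) {
    print "$w\t", to_regex($w), "\n";
}
# first line printed: VGSRV	[ACG]G[CG][AG][ACG]
```

Each printed pattern can then be compiled with qr// and run against the database sequences one window at a time.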
-Aaron On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma wrote: > Thanks Aaron for your reply. > Actually i tried that first, but there is another problem, i have to divide > each query sequence to window size 5 with 1 base shift and its not possible > to divide regular expression in that way. > So what i am trying is to convert those iupac codes to 4 base code sequence > and then do the normal search. > Now the problem is that i cant able to convert those IUPAC sequences to > normal ones, i am still trying to write a script but its taking time. > > Thanks > Shalabh > > > On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >> shalabh.sharma7 at gmail.com> wrote: >> >>> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >test >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From Kevin.M.Brown at asu.edu Fri Sep 17 16:09:34 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 17 Sep 2010 13:09:34 -0700 Subject: [Bioperl-l] Installing using git after an older installation In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA From shalabh.sharma7 at gmail.com Fri Sep 17 16:45:50 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 16:45:50 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron, changing the query sequence worked well but i am still struggling with the database. -Shalabh On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey wrote: > do your windowing/shifting on the unexpanded query sequences; then > transform the 5-bp queries into regular expressions. > > -Aaron > > > On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma < > shalabh.sharma7 at gmail.com> wrote: > >> Thanks Aaron for your reply. >> Actually i tried that first, but there is another problem, i have to >> divide each query sequence to window size 5 with 1 base shift and its not >> possible to divide regular expression in that way. >> So what i am trying is to convert those iupac codes to 4 base code >> sequence and then do the normal search. >> Now the problem is that i cant able to convert those IUPAC sequences to >> normal ones, i am still trying to write a script but its taking time. >> >> Thanks >> Shalabh >> >> >> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: >> >>> Convert the IUPAC code to a regular expression, and use regular >>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>> >>> -Aaron >>> >>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >>> shalabh.sharma7 at gmail.com> wrote: >>> >>>> Hi All, >>>> I have few nucleotide sequences that are composed of IUPAC codes. >>>> Like >>>> >test >>>> VGSRVBSSSSSNSC >>>> >>>> Similarly i have a database made of of these kind of sequences. I want >>>> to >>>> find sequences that are 100% similar to the query sequence. >>>> >>>> Is there any bioPerl module to deal with this, i tried normal blast but >>>> it >>>> didn't worked. >>>> Do i have to convert these sequences to 4 base codes or there is any >>>> other >>>> way out. >>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >> > From heikki.lehvaslaiho at gmail.com Sat Sep 18 03:41:22 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Sat, 18 Sep 2010 10:41:22 +0300 Subject: [Bioperl-l] mpiblast In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> References: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> Message-ID: Been running 1.6 and its betas on Blue Gene/P for months. The output is identical to standard BLAST output. No issues in parsing it with BioPerl. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 16 September 2010 21:25, Dave Messina wrote: >> Is the there a standalone like module that allows you to easily call mpiblast? > > No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. > > http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > > >> I'm assuming seqio with parse a mpiblast output file correctly?
> > Yes, although I see that a new version of mpiblast was recently released. > > Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From david.breimann at gmail.com Sat Sep 18 05:05:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:05:58 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl Message-ID: Hello, I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` to the attributes and sometimes it doesn't, even though the GenBank file has a locus tag. Also, is the ID always equivalent to the locus tag? Thanks, Dave From scott at scottcain.net Sat Sep 18 05:17:24 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 10:17:24 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, bp_genbank2gff3.pl suffers from the fact that it has to deal with GenBank files :-) It was designed initially to work on whole genome refseqs, and contains several ad hoc rules for trying to make it "do the right thing." In practice, it is not unusual for a post-processing step (either by hand or a quick Perl script) to be required to really get it right. I don't recall the specifics (if I ever knew :-) for when and how the locus tag is used, but I do know that there is a list of things that it will try to use for the ID, and while the locus is on the list, I don't know where it comes in the list, so it's possible that other items might supersede it. Scott On Sat, Sep 18, 2010 at 10:05 AM, David Breimann wrote: > Hello, > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` > in the fields and sometime it doesn't, even though the genabank has a locus > tag. > Also, is the ID always equivalent to the locus tag?
> > Thanks, > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From david.breimann at gmail.com Sat Sep 18 05:20:33 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:20:33 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Since locus_tag is an essential tag in GenBank, I suggest that locus_tag always be added to the last column of the GFF if it exists in the GenBank record, whether or not it is used as the ID in the GFF. On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote: > Hi Dave, > > bp_genbank2gff3.pl suffers from the fact that it has to deal with > GenBank files :-) It was designed initially to work on whole genome > refseqs, and contains several ad hoc rules for trying to make it "do > the right thing." In practice, it is not unusual for a post > processing step (either by hand or a quicky perl script) to be > required to really get it right. I don't recall the specifics (if I > ever knew :-) for when and how the locus tag is used, but I do know > that there is a list of things that it will try to use for the ID, and > while the locus is on the list, I don't know where it comes in the > list, so it's possible that other items might supersede it. > > Scott > > > On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > wrote: > > Hello, > > > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a > `locus_tag` > > in the fields and sometime it doesn't, even though the genabank has a > locus > > tag. > > Also, is the ID always equivalent to the locus tag?
> > > > Thanks, > > Dave > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From scott at scottcain.net Sat Sep 18 06:08:26 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 11:08:26 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, That seems perfectly reasonable. If you could point out a GenBank entry for which that does not happen, I could try to figure out why not. Scott On Sat, Sep 18, 2010 at 10:20 AM, David Breimann wrote: > Since locus_tag is an essential tag in genbank, I suggest locus_tag will be > always added to the GFF last column if it exists in the genbank, whether it > is used as ID in the GFF or not. > > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote: >> >> Hi Dave, >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with >> GenBank files :-) It was designed initially to work on whole genome >> refseqs, and contains several ad hoc rules for trying to make it "do >> the right thing." In practice, it is not unusual for a post >> processing step (either by hand or a quicky perl script) to be >> required to really get it right. I don't recall the specifics (if I >> ever knew :-) for when and how the locus tag is used, but I do know >> that there is a list of things that it will try to use for the ID, and >> while the locus is on the list, I don't know where it comes in the >> list, so it's possible that other items might supersede it. >> >> Scott >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann >> wrote: >> > Hello, >> > >> > I'm not sure how bp_genbank2gff3.pl works.
Sometimes it adds a >> > `locus_tag` >> > in the fields and sometime it doesn't, even though the genabank has a >> > locus >> > tag. >> > Also, is the ID always equivalent to the locus tag? >> > >> > Thanks, >> > Dave >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> Ontario Institute for Cancer Research > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From david.breimann at gmail.com Sat Sep 18 06:20:50 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 12:20:50 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Scott, Here is a very short genbank: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk Note all genes in the genbank have locus tags. In the resulting GFF3, however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no idea why it deserves a special treatment... :) p.s. making this change (i.e., copying locus_tag to the GFF3 last column whenever available) will really make my life easier. Thank you, Dave On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain wrote: > Hi Dave, > > That seems perfectly reasonable. If you could point out a GenBank > entry for which that does not happen, I could try to figure out why > not. 
>
> Scott

From david.breimann at gmail.com Sat Sep 18 06:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID: 

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

use Bio::Tools::GFF;
use Data::Dumper;

my $gffio = Bio::Tools::GFF->new(
    -file        => "/path/to/file.gff",
    -gff_version => 3
);

print Dumper $gffio->get_seqs();

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).

Thanks,
Dave

From scott at scottcain.net Sat Sep 18 07:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: 
References: 
Message-ID: 

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this. It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott

On Sat, Sep 18, 2010 at 11:20 AM, David Breimann wrote:
> Hi Scott,
>
> Here is a very short genbank:
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>
> Note all genes in the genbank have locus tags. In the resulting GFF3,
> however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
> idea why it deserves a special treatment... :)
>
> p.s. 
making this change (i.e., copying locus_tag to the GFF3 last column
> whenever available) will really make my life easier.
>
> Thank you,
> Dave

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                  216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Sat Sep 18 07:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To: 
References: 
Message-ID: 

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

use Bio::DB::SeqFeature::Store;

my $db = Bio::DB::SeqFeature::Store->new(
    -adaptor => 'memory',
    -dir     => 'path/to/file'
);

# Warning: this returns a string, and not a PrimarySeq object
my $sequence = $db->fetch_sequence('Chr1', 5000 => 6000);

Scott

On Sat, Sep 18, 2010 at 11:45 AM, David Breimann wrote:
> As you know, GFF3 files can contain FASTA sequences after the features.
>
> How do I extract a specific FASTA sequence given its ID?
>
> I tried:
>
> use Bio::Tools::GFF;
> use Data::Dumper;
>
> my $gffio = Bio::Tools::GFF->new(
>     -file        => "/path/to/file.gff",
>     -gff_version => 3
> );
>
> print Dumper $gffio->get_seqs();
>
> but $gffio->get_seqs() seems to return nothing, although the GFF3 has
> sequences and is also valid.
>
> By the way, I am able to parse the features themselves (using
> $gffio->next_feature()).
>
> Thanks,
> Dave

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                  216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Sat Sep 18 09:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: 
References: 
Message-ID: 

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository, it will also use the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.

Also, are you using any command line flags when executing it? I didn't.

Scott

On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. 
I tried running the script from the perl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running linux? If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann wrote:
>> > Your gff seems fine. I get a very similar one, but with `Name=` instead
>> > of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for. Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                  216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net Sat Sep 18 09:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: 
References: 
Message-ID: 

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation. The
directory in which the script resides is scripts/Bio-DB-GFF/

Scott

On Sat, Sep 18, 2010 at 2:40 PM, David Breimann wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From david.breimann at gmail.com Sat Sep 18 09:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: 
References: 
Message-ID: 

So let's do an intermediate summary of my situation:

I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), while Scott gets the expected results
while using the latest version of bioperl.

I cloned a fresh version of bioperl-live into my ~/src:

$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:

export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"

and ran

$ source ~/.profile

I then downloaded a small genome from NCBI

$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

and tested the script:

$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789 GenBank region 1 6199 . + 1 ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789 GenBank mRNA 665 781 . - 1 ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789 GenBank CDS 665 781 . - 1 ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789 GenBank region 1 6199 . + 1 ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789 GenBank gene 665 781 . - 1 ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789 GenBank mRNA 665 781 . - 1 ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789 GenBank CDS 665 781 . - 1 ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

Note the "Name=" tags in my version are replaced by "locus_tag=" in
Scott's, as desired.

I have no idea what is going on here...

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain wrote:
> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository, it will also use the
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it? I didn't.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote:
> > Yes, I'm using Ubuntu 10.04.
> >
> > That is really weird. I tried running the script from the perl-live dir
> > (which I just pulled using git), and I get the same results as before
> > (`Name` instead of `locus_tag`):
> >
> >  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> >  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
> >
> > Attached is the resulting GFF3.
> > I also attach a copy of bp_genbank2gff3.pl as found under
> > /home/dave/src/bioperl-live/blib/script.
> >
> > This is a real mystery for me!
> >
> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
> >>
> >> Typically I do build and install, but you can run it directly from the
> >> git checkout directory.
> >>
> >> For locating other versions of the script, are you running linux? 
If > >> so, are you familiar with the "locate" command: > >> > >> locate bp_genbank2gff3.pl > >> > >> If you've never used it before, you may need to update the database > >> the locate command uses as root: > >> > >> sudo updatedb > >> > >> Scott > >> > >> > >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann > >> wrote: > >> > Your gff seems fine. I get a vey similiar one, but with `Name=` > instaed > >> > of > >> > `locus_tag=`. > >> > > >> > I don't really know how to check for multiple bioperl installations. > >> > I'm using my personal server, so I don't mind removing and installing > >> > everything from scratch -- but I do'nt know ho to do that. > >> > > >> > Also, what I don't get with the git is how the scripts are supposed to > >> > be > >> > updated (unless you build and install). > >> > > >> > Thanks you! > >> > > >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain > wrote: > >> >> > >> >> Well, if you aren't getting the same results as me then I'd say you > >> >> aren't using the same version of the script :-) > >> >> > >> >> Unfortunately, the scripts are no longer automatically marked with > the > >> >> "internal" version information when committed, so there really isn't > >> >> anything in the script I can tell you to look for. Check for more > >> >> than one bioperl instance on your computer. > >> >> > >> >> I've attached the GFF3 file I got so you can look at it and tell me > if > >> >> it is what you expect. > >> >> > >> >> Scott > >> >> > >> >> > >> >> > >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann > >> >> wrote: > >> >> > Hi Scott, > >> >> > > >> >> > I just pulled the lated bioperl-live using git. > >> >> > I'm not sure how the scripts are updated, so I Build and installed > >> >> > anyway > >> >> > (perhaps exporting the path is supposed to be enough?) > >> >> > Anyway, I still get the same results. No locus_tag. > >> >> > How can I tell if I'm using the latest version of the script? > >> >> > > >> >> > Thanks again. 
> >> >> > > >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain > >> >> > wrote: > >> >> >> > >> >> >> Hi Dave, > >> >> >> > >> >> >> A fresh "pull" of the bioperl git repository shows that > >> >> >> bp_genbank2gff3.pl already does this. It creates a locus_tag for > >> >> >> all > >> >> >> features that have a locus_tag, and uses the locus_tag for the ID > >> >> >> when > >> >> >> it can (it can't blindly use the locus tag for the ID since both > the > >> >> >> gene and the CDS have the same tag). > >> >> >> > >> >> >> Scott > >> >> >> > >> >> >> > >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann > >> >> >> wrote: > >> >> >> > Hi Scott, > >> >> >> > > >> >> >> > Here is a very short genbank: > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk > >> >> >> > > >> >> >> > Note all genes in the genbank have locus tags. In the resulting > >> >> >> > GFF3, > >> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag. > I > >> >> >> > have > >> >> >> > no > >> >> >> > idea why it deserves a special treatment... :) > >> >> >> > > >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 > last > >> >> >> > column > >> >> >> > whenever available) will really make my life easier. > >> >> >> > > >> >> >> > Thank you, > >> >> >> > Dave > >> >> >> > > >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain < > scott at scottcain.net> > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> Hi Dave, > >> >> >> >> > >> >> >> >> That seems perfectly reasonable. If you could point out a > >> >> >> >> GenBank > >> >> >> >> entry for which that does not happen, I could try to figure out > >> >> >> >> why > >> >> >> >> not. 
> >> >> >> >> > >> >> >> >> Scott > >> >> >> >> > >> >> >> >> > >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann > >> >> >> >> wrote: > >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest > >> >> >> >> > locus_tag > >> >> >> >> > will > >> >> >> >> > be > >> >> >> >> > always added to the GFF last column if it exists in the > >> >> >> >> > genbank, > >> >> >> >> > whether > >> >> >> >> > it > >> >> >> >> > is used as ID in the GFF or not. > >> >> >> >> > > >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain > >> >> >> >> > > >> >> >> >> > wrote: > >> >> >> >> >> > >> >> >> >> >> Hi Dave, > >> >> >> >> >> > >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to > deal > >> >> >> >> >> with > >> >> >> >> >> GenBank files :-) It was designed initially to work on > whole > >> >> >> >> >> genome > >> >> >> >> >> refseqs, and contains several ad hoc rules for trying to > make > >> >> >> >> >> it > >> >> >> >> >> "do > >> >> >> >> >> the right thing." In practice, it is not unusual for a post > >> >> >> >> >> processing step (either by hand or a quicky perl script) to > be > >> >> >> >> >> required to really get it right. I don't recall the > specifics > >> >> >> >> >> (if I > >> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I > do > >> >> >> >> >> know > >> >> >> >> >> that there is a list of things that it will try to use for > the > >> >> >> >> >> ID, > >> >> >> >> >> and > >> >> >> >> >> while the locus is on the list, I don't know where it comes > in > >> >> >> >> >> the > >> >> >> >> >> list, so it's possible that other items might supersede it. > >> >> >> >> >> > >> >> >> >> >> Scott > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > >> >> >> >> >> wrote: > >> >> >> >> >> > Hello, > >> >> >> >> >> > > >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. 
Sometimes it > adds > >> >> >> >> >> > a > >> >> >> >> >> > `locus_tag` > >> >> >> >> >> > in the fields and sometime it doesn't, even though the > >> >> >> >> >> > genabank > >> >> >> >> >> > has a > >> >> >> >> >> > locus > >> >> >> >> >> > tag. > >> >> >> >> >> > Also, is the ID always equivalent to the locus tag? > >> >> >> >> >> > > >> >> >> >> >> > Thanks, > >> >> >> >> >> > Dave > >> >> >> >> >> > _______________________________________________ > >> >> >> >> >> > Bioperl-l mailing list > >> >> >> >> >> > Bioperl-l at lists.open-bio.org > >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> >> >> >> >> > > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> -- > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > ------------------------------------------------------------------------ > >> >> >> >> >> Scott Cain, Ph. D. scott > at > >> >> >> >> >> scottcain > >> >> >> >> >> dot net > >> >> >> >> >> GMOD Coordinator (http://gmod.org/) > >> >> >> >> >> 216-392-3087 > >> >> >> >> >> Ontario Institute for Cancer Research > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> -- > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > ------------------------------------------------------------------------ > >> >> >> >> Scott Cain, Ph. D. scott at > >> >> >> >> scottcain > >> >> >> >> dot net > >> >> >> >> GMOD Coordinator (http://gmod.org/) > >> >> >> >> 216-392-3087 > >> >> >> >> Ontario Institute for Cancer Research > >> >> >> > > >> >> >> > > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> > >> >> >> > >> >> >> > ------------------------------------------------------------------------ > >> >> >> Scott Cain, Ph. D. 
scott at > >> >> >> scottcain > >> >> >> dot net > >> >> >> GMOD Coordinator (http://gmod.org/) > 216-392-3087 > >> >> >> Ontario Institute for Cancer Research > >> >> > > >> >> > > >> >> > >> >> > >> >> > >> >> -- > >> >> > >> >> > ------------------------------------------------------------------------ > >> >> Scott Cain, Ph. D. scott at > scottcain > >> >> dot net > >> >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> >> Ontario Institute for Cancer Research > >> > > >> > > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. scott at scottcain > >> dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From scott at scottcain.net Sat Sep 18 10:03:43 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 15:03:43 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: The only thing I can add is that I did a 'git diff genbank2gff3.PLS' and found no differences. It occurred to me that perhaps I'd done some fixing and not commited it, but it looks to me that that's not the case (assuming I've managed to use git correctly (not a great assumption, but I don't have another one to work with :-)) Scott On Sat, Sep 18, 2010 at 2:57 PM, David Breimann wrote: > So let's do an intermediate summary of my situation: > I'm using Ubuntu 10.04 and Perl 5.10.1. > I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of > "locus_tag=" in the last GFF3 column), while Scott gets the expected results > while using the latest version of bioperl. 
> I cloned a fresh version of bioperl live into my ~/src:
> $ cd ~/src
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> I then added the following line to the end of ~/.profile:
> export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
> and ran
> $ source ~/.profile
>
> I then downloaded a small genome from NCBI
> $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> and tested the script:
> $ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk
>
> Following are the top 10 lines of the resulting GFF3:
>
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia
> coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This
> record has not yet been subject to final NCBI review. The reference sequence
> was derived from CP000798. Source DNA and bacteria available from Jacques
> Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL
> REFSEQ: This record has not yet been subject to final NCBI review. The
> reference sequence was derived from CP000798. Source DNA and bacteria
> available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified
> by glimmer%3B putative;codon_start=1;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
> while these are from Scotts' file:
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A
> plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has
> not yet been subject to final NCBI review. The reference sequence was
> derived from CP000798. Source DNA and bacteria available from Jacques Ravel
> (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ:
> This record has not yet been subject to final NCBI review. The reference
> sequence was derived from CP000798. Source DNA and bacteria available from
> Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified
> by glimmer%3B
> putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
>
> Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
> as desired.
> I have no idea what is going on here...
>
> Best,
> Dave

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From j.scholtalbers at gmail.com  Mon Sep 20 04:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working. I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields
> wrote:
> > Sounds like this is going through an initial indexing step (for
> flatfiles). I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present. Maybe Jason could answer in
> more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow?
after replacing 'flatfile'
> >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> >> up with the same result.
> >>
> >> thanks,
> >> Eric
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From pcantalupo at gmail.com  Mon Sep 20 10:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the
subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use Bio::DB::EUtilities;
use XML::Simple;    # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );
   my $res = $factory->get_Response->content;
   my $data = XMLin($res);
   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();
   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }
   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list context: the lineage as a list; scalar context: a "; "-joined string
   return wantarray ? @lineage : join("; ", @lineage);
}

Paul Cantalupo
University of Pittsburgh

On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working. I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote:
>
> > Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> > Eric
> >
> > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields
> > wrote:
> > > Sounds like this is going through an initial indexing step (for
> > flatfiles). I would expect the initial indexing of the tables to take time
> > as you have to create the DB, but subsequent lookups post-indexing should be
> > much faster if the index is already present. Maybe Jason could answer in
> > more detail?
> > >
> > > chris
> > >
> > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> > >
> > >> Hello,
> > >>
> > >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> > >> 5.8.5 with BioPerl 1.6.0
> > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > >>
> > >> It ran for 100 cpu seconds and output:
> > >>
> > >> 33090 Viridiplantae kingdom
> > >>
> > >> I was expecting it to also output the descendents. Some questions:
> > >>
> > >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> > >> implemented? It looks to be in Taxon.pm but it is not documented and
> > >> when I ran Data::Dumper on $node the value '_desc' was empty.
> > >>
> > >> 2) is the flatfile reader always so slow?
after replacing 'flatfile'
> > >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> > >> up with the same result.
> > >>
> > >> thanks,
> > >> Eric
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jason at bioperl.org  Mon Sep 20 11:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <4C977FFC.5000205@bioperl.org>

This works for me to get all the descendents from a sub-node. You have to
call the function with the database handle. I am not sure if the Taxon
implementation has a reference to the dbhandle or not:

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;

# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi/';
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => "$dbdir/nodes.dmp",
                                -namesfile => "$dbdir/names.dmp",
                                );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name } @d), "\n";

Hope that helps.

Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working. I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree.
However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > >> Thanks, that was indeed the answer to #2. Any idea about each_Descendent? >> Eric >> >> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields >> wrote: >> >>> Sounds like this is going through an initial indexing step (for >>> >> flatfiles). I would expect the initial indexing of the tables to take time >> as you have to create the DB, but subsequent lookups post-indexing should be >> much faster if the index is already present. Maybe Jason could answer in >> more detail? >> >>> chris >>> >>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: >>> >>> >>>> Hello, >>>> >>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl >>>> 5.8.5 with BioPerl 1.6.0 >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> It ran for 100 cpu seconds and output: >>>> >>>> 33090 Viridiplantae kingdom >>>> >>>> I was expecting it to also output the descendents. Some questions: >>>> >>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually >>>> implemented? It looks to be in Taxon.pm but it is not documented and >>>> when I ran Data::Dumper on $node the value '_desc' was empty. >>>> >>>> 2) is the flatfile reader always so slow? after replacing 'flatfile' >>>> with a call to 'entrez' it took only 0.02 cpu seconds to come >>>> up with the same result. 
>>>> >>>> thanks, >>>> Eric >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From j.scholtalbers at gmail.com Wed Sep 22 03:46:35 2010 From: j.scholtalbers at gmail.com (Jelle Scholtalbers) Date: Wed, 22 Sep 2010 09:46:35 +0200 Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent In-Reply-To: <4C977FFC.5000205@bioperl.org> References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu> <4C977FFC.5000205@bioperl.org> Message-ID: Hi Jason, this was the same method I was using. With the taxdump it works apparently, however it does not work with Entrez as source. So I will just stick to a up2date taxdump then. Thanks for your example. @Paul: Your method gives indeed the lineage but will only retrieve the ancestors. I want to retrieve all the descendents. Thx anyway. Cheers, Jelle On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich wrote: > > This works for me to get all the descendents from sub-node. You have to > call the function with the dabatase handle. I am not sure if the Taxon > implementation has reference to the dbhandle or not: > #!/usr/bin/perl -w > use strict; > use Bio::DB::Taxonomy; > my $dbdir = '/db/taxonomy/ncbi/'; #downloaded data from NCBI taxdump into > this directory > my $db = Bio::DB::Taxonomy->new(-source => 'flatfile', > -nodesfile => "$dbdir/nodes.dmp", > -namesfile => "$dbdir/names.dmp", > ); > my $taxa = $db->get_taxon(-taxonid => 151341); > my @d = $db->get_all_Descendents($taxa); > > print join("\n", map { $_->id . " " . $_->rank . " " . 
$_->scientific_name > } @d), "\n"; > > > Hope that helps. > Jelle Scholtalbers wrote, On 9/20/10 1:04 AM: > > Hi, > > I'm trying to get all descendents for a specific taxon using Entrez. > each_Descendent and get_all_Descendents don't seem to be implemented or > working. I then tried by getting the tree for this taxon using > Bio::DB::Taxonomy's get_tree. However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > > > Thanks, that was indeed the answer to #2. Any idea about each_Descendent? > Eric > > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields > wrote: > > > Sounds like this is going through an initial indexing step (for > > > flatfiles). I would expect the initial indexing of the tables to take time > as you have to create the DB, but subsequent lookups post-indexing should be > much faster if the index is already present. Maybe Jason could answer in > more detail? > > > chris > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: > > > > Hello, > > I tried the Bio::DB::Taxonomy example on this wiki page using perl > 5.8.5 with BioPerl 1.6.0http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > It ran for 100 cpu seconds and output: > > 33090 Viridiplantae kingdom > > I was expecting it to also output the descendents. Some questions: > > 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually > implemented? It looks to be in Taxon.pm but it is not documented and > when I ran Data::Dumper on $node the value '_desc' was empty. > > 2) is the flatfile reader always so slow? after replacing 'flatfile' > with a call to 'entrez' it took only 0.02 cpu seconds to come > up with the same result. 
> > thanks, > Eric > _______________________________________________ > Bioperl-l mailing listBioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing listBioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing listBioperl-l at lists.open-bio.orghttp://lists.open-bio.org/mailman/listinfo/bioperl-l > > From waldenhe at muohio.edu Fri Sep 24 15:15:48 2010 From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene) Date: Fri, 24 Sep 2010 15:15:48 -0400 Subject: [Bioperl-l] StandAloneBlastPlus Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu> Hello Bioperl Masters, I am trying to perform a local blast with a query list of fasta files against a db of other fasta files. I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module. I have downleaded from the NCBI website BLAST+ 2.2.24+ and installed on my ubuntu machine. I am using bioperl-1.5.2. so the snibbit of code that is giving me errors is below: my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq); my $report_obj = $blast_obj->blastall($seq_obj); my $result_obj = $report_obj->next_result; print $result_obj->num_hits; The error I am getting is: --------------------- WARNING --------------------- MSG: cannot find path to blastall --------------------------------------------------- Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284. I think the real problem is the "cannot find path to Blastall. >From reading around on different forums I have to make a .ncbirc text file with the location of BLAST+2.2.24+ on my machine. I have that file in my /home folder. How do I get StandAloneBlastPlus synced up with BLAST+2.2.24+ ? Am I approaching this right? 
Thank you,
Hans Waldenmaier

From ross at cuhk.edu.hk Sat Sep 25 04:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>

Given a set of GO IDs, e.g.

GO:0008150 GO:0005750 GO:0006122 GO:0008121 GO:0003674 GO:0005575 GO:0008150
GO:0009507 GO:0009535 GO:0009567 GO:0009977 GO:0010027 GO:0031361

from
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo
one can manually examine the hierarchy. Although there are go-perl
(http://search.cpan.org/~cmungall/go-perl/) and go-db-perl
(http://search.cpan.org/~cmungall/go-db-perl/), as a life science student
who is just learning Perl, I find it difficult to draw a hierarchy tree (or
simply make a table that counts the occurrences) to produce something like:

biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
Cellular component (4)

Can anybody advise? I don't need any fancy figures at all...

From David.Messina at sbc.su.se Sun Sep 26 12:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,

> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for
the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has
blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation. I'm pretty
sure BioPerl 1.5.2 doesn't have the Blast+ code in it.

Dave

From maj at fortinbras.us Sun Sep 26 20:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

Hi Hans--

Dave is right; you'll need both the new blast+ as well as the latest BioPerl
trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see
http://www.bioperl.org/wiki/Installing_BioPerl

cheers
MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

From cjfields at illinois.edu Mon Sep 27 17:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To:
References:
Message-ID:

Sorry, didn't see this being responded to on-list (been off the radar the last month).
I think this is a good idea, but I'm wondering if this might be better as a
separate release on CPAN from bioperl core, seeing as we're in the
preliminary stages of modularizing the current bioperl core into smaller
independent releases after the next bioperl release.

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
>
> it seems Bioperl contains a wrapper [1] for Scansite [2]. in what extent
> would it make sense to integrate a client-sided version of Scansite with
> some statistical analysis features (eg enrichment tests) in Bioperl? that
> would give users the opportunity to customize their own version of the
> Scansite algorithm. i developed an object-oriented client-sided version
> and am currently writing test cases. maybe it could be integrated with the
> server wrapper somehow? please let me know what you think :-D!
>
> best wishes
> johnny
>
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
>
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************

From gandipalem at gmail.com Tue Sep 28 00:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID:

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the Coordination file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.

From David.Messina at sbc.su.se Tue Sep 28 03:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running. But I still cannot get Perl to see BLAST.

Type the following on the command line:

perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you
don't, try typing

export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type

perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then
something else is going on with your system. For example, you might have
more than one version of Perl or Blast installed. Is the perl you're running
on the command line the same perl that's called by the #! line at the top of
your script?

> I have added these lines to my /home/hans/.bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
>
> Am I just supposed to add these to the end of the .bashrc file or am I supposed to put it someplace special.
It doesn't matter where in your .bashrc it goes, although it's possible
there's something else in your .bashrc (or in the system bashrc, which is
often read in. Look for mention of /etc/bashrc or similar.) that is
overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet,
so if you're still having trouble, you might try to find someone locally who
is knowledgeable about Unix and can work directly in your account with you.

Dave

From David.Messina at sbc.su.se Tue Sep 28 03:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Can anyone tell me how to use the make_primers.pl script?
> What is the Coordination file?

From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts of
genes by transformation of PCR products (ref: Datsenko & Wanner, PNAS 2000).
A tab-delimited file containing ORF START STOP is read, and primers flanking
the start & stop coordinates are designed based on the user-designated
sequence file. In addition, primers flanking the knockout regions are chosen
for PCR screening purposes once the knockout is generated. The script uses
Bioperl in order to determine the primer sequences, which requires getting
subsequences and reverse complementing some of the objects.

Dave

From maj at fortinbras.us Tue Sep 28 07:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

The module checks the env variable BLASTPLUSDIR for the executable; you can
set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
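A rough sketch of the two checks discussed in this thread, for anyone who wants to script them: is the BLAST+ directory listed in PATH, and does the directory named by BLASTPLUSDIR actually hold an executable blastp? The helper names dir_on_path and blast_dir_ok are made up for illustration (they are not part of BioPerl or BLAST+), and /home/hans/BLAST/bin is just the location used in the thread.

```shell
#!/bin/sh
# Hypothetical helpers, not part of BioPerl or BLAST+.

# dir_on_path DIR [PATHSTRING]
# Succeeds if DIR is one of the colon-separated entries of PATHSTRING
# (defaults to the current $PATH). A trailing slash is ignored.
dir_on_path() {
    dir=${1%/}
    case ":${2-$PATH}:" in
        *":$dir:"* | *":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# blast_dir_ok DIR
# Succeeds if DIR contains an executable file named blastp.
blast_dir_ok() {
    [ -x "${1%/}/blastp" ]
}

# Assumption: fall back to the directory from the thread if BLASTPLUSDIR is unset.
blast_bin=${BLASTPLUSDIR:-/home/hans/BLAST/bin}

if dir_on_path "$blast_bin"; then
    echo "$blast_bin is listed in PATH"
else
    echo "$blast_bin is not listed in PATH"
fi

if blast_dir_ok "$blast_bin"; then
    echo "found an executable blastp in $blast_bin"
else
    echo "no executable blastp in $blast_bin"
fi
```

If the first check fails, the export lines are probably not being read by the shell that launches perl; if the second fails, the variable most likely points at the wrong directory.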
MAJ

From waldenhe at muohio.edu Tue Sep 28 00:52:56 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Tue, 28 Sep 2010 00:52:56 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To:
References:
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>

Thanks Guys,

I have run those steps, my current version now is:
hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.006001

But I am still having problems.

I am having slightly more luck with using StandAloneBlast and the regular
BLAST from NCBI. I can get the command-line Blast running. But I still
cannot get Perl to see BLAST.
Following the instructions from the HOWTOs and the O'Reilly book BLAST, I
have gotten to the part about setting up the environment variables, which is
where I think my problems are arising now.
I have added these lines to my /home/hans/.bashrc file in order to get perl
to find BLAST:
export PATH=${PATH}:/home/hans/BLAST/bin
export BLASTDIR=/home/hans/BLAST/

Am I just supposed to add these to the end of the .bashrc file, or am I
supposed to put them someplace special?

Thanks for the help,

Hans

From maj at fortinbras.us Tue Sep 28 11:04:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 15:04:07 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

Should work from .bashrc, Hans. Also add

export BLASTPLUSDIR=/home/hans/BLAST/bin

It really should see it in the PATH as you have it, so that may be a bug;
however the BLASTPLUSDIR should force it to see the program.

You can also execute the export commands in the shell, and the variables
will be set and visible to programs for the duration of the login session.
You can see what they are set to in the shell by doing

set | grep BLAST

cheers
MAJ

From chiragmatkarbioinfo at gmail.com Thu Sep 30 08:20:35 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Thu, 30 Sep 2010 19:20:35 +0700
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
Message-ID:

Hello all,
Is there any module to fetch DNA sequence data from an Ensembl gene id?

--
Regards,
Chirag Matkar

From jun.yin at ucd.ie Thu Sep 30 09:36:31 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 30 Sep 2010 14:36:31 +0100
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To:
References:
Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>

Hi, Chirag,

BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
provides a BioPerl-like interface for that function.
You can visit Ensembl's website to see how to use that module:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

From cjfields at illinois.edu Thu Sep 30 11:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID:

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
>
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by
Sendu Bala a few years back. I think it still works rather well, at least
tests pass. You might get more out of using the Ensembl API directly as Jun
states though, YMMV.

BTW, the ensembl API also works with the latest bioperl code, regardless of
what the Ensembl website says (e.g. they only support v1.2.3). Haven't heard
more about whether this discrepancy was supposed to be addressed at some
point.

chris

From A.Vakhrusheva at lumc.nl Wed Sep 29 09:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface. I want to calculate a
p-distance matrix, but what format is acceptable for input? Phylip doesn't
work.

Anna

From shalabh.sharma7 at gmail.com Wed Sep 1 20:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID:

Hi,

I am trying to parse an hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm
I am not getting anything but this "amoA_10genes_align.fasta.2 [M=247] for
HMM" as the output; I am not even getting any error.
I am attaching the hmmsearch report (just a test report) which I tried to
test against the parser.
I would really appreciate it if anyone can help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # query HMM file: amoA_10genes.hmm # target sequence database: test.faa # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Query: amoA_10genes_align.fasta.2 [M=247] Scores for complete sequences (score includes all domains): --- full sequence --- --- best 1 domain --- -#dom- E-value score bias E-value score bias exp N Sequence Description ------- ------ ----- ------- ------ ----- ---- -- -------- ----------- 1.6e-72 231.1 5.1 1.7e-72 231.0 3.5 1.0 1 gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte 1.6e-72 231.1 5.1 1.7e-72 231.0 3.5 1.0 1 gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte Domain annotation for each sequence (and alignments): >> gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacterium] # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 231.0 3.5 1.7e-72 1.7e-72 113 245 .. 1 144 [. 1 146 [. 
0.95 Alignments for each domain: == domain 1 score: 231.0 bits; conditional E-value: 1.7e-72 amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe gi|63021979|gb|AAY26564.1| 1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 8********************************************************************************** PP amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af +k+ gi|63021979|gb|AAY26564.1| 84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144 **********************************************966666666655555 PP >> gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacterium] # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc --- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ---- 1 ! 231.0 3.5 1.7e-72 1.7e-72 113 245 .. 1 144 [. 1 146 [. 
0.95 Alignments for each domain: == domain 1 score: 231.0 bits; conditional E-value: 1.7e-72 amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe gi|63021981|gb|AAY26565.1| 1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 8********************************************************************************** PP amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af +k+ gi|63021981|gb|AAY26565.1| 84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144 **********************************************966666666655555 PP Internal pipeline statistics summary: ------------------------------------- Query model(s): 1 (247 nodes) Target sequences: 2 (300 residues) Passed MSV filter: 2 (1); expected 0.0 (0.02) Passed bias filter: 2 (1); expected 0.0 (0.02) Passed Vit filter: 2 (1); expected 0.0 (0.001) Passed Fwd filter: 2 (1); expected 0.0 (1e-05) Initial search space (Z): 2 [actual number of targets] Domain search space (domZ): 2 [number of targets reported over threshold] # CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08 # Mc/sec: 0.93 // From thomas.sharpton at gmail.com Wed Sep 1 21:29:26 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Wed, 1 Sep 2010 14:29:26 -0700 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: References: Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Hi Shalabh, We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to use the HMMER3 version, as found here: http://github.com/bioperl/bioperl-hmmer3 Hope this helps, T On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote: > Hi , > I am trying to parse hmmsearch report (from HMMER3). 
I am using > the > script mentioned here: > http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm > > I am not getting anything but this "amoA_10genes_align.fasta.2 > [M=247] for > HMM" as the output, i am not even getting any error. > I am attaching the hmmsearch report (just a test report) which i > tried to > test against the parser. > > I would really appreciate if anyone can help me out. > > Thanks > Shalabh Sharma > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From kai.blin at biotech.uni-tuebingen.de Thu Sep 2 08:44:58 2010 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 2 Sep 2010 10:44:58 +0200 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> On Wed, 1 Sep 2010 14:29:26 -0700 Thomas Sharpton wrote: Hi, > We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to > use the HMMER3 version, as found here: > > http://github.com/bioperl/bioperl-hmmer3 Actually it's now included in the bioperl-live repository, but the code hasn't made it into a release yet. http://github.com/bioperl/bioperl-live.git Cheers, Kai -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-University of Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Deutschland Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From e.stupka at ucl.ac.uk Thu Sep 2 12:32:02 2010 From: e.stupka at ucl.ac.uk (Elia Stupka) Date: Thu, 2 Sep 2010 13:32:02 +0100 Subject: [Bioperl-l] git account Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk> Hello there, I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any? thanks! Elia --- '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." ~ Stephen Hawkings Senior Lecturer, Bioinformatics Scientific Director - Bioinformatics, UCL Genomics UCL Cancer Institute Paul O' Gorman Building University College London Gower Street WC1E 6BT London UK Institute of Cell and Molecular Science Barts and The London School of Medicine and Dentistry 4 Newark Street Whitechapel London E1 2AT Office (UCL): +44 207 679 6493 Fax: +44 0207 6796817 Office (ICMS): +44 0207 8822374 Mobile: +44 787 6478912 From cjfields at illinois.edu Thu Sep 2 14:29:40 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 2 Sep 2010 09:29:40 -0500 Subject: [Bioperl-l] git account In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk> References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk> Message-ID: Done! Let us know if you run into problems. chris On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote: > Hello there, > > I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any? > > thanks!
> > Elia > > > --- > '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." > ~ Stephen Hawkings > > Senior Lecturer, Bioinformatics > Scientific Director - Bioinformatics, UCL Genomics > > UCL Cancer Institute > Paul O' Gorman Building > University College London > Gower Street > WC1E 6BT > London > UK > > Institute of Cell and Molecular Science > Barts and The London School of Medicine and Dentistry > 4 Newark Street > Whitechapel > London > E1 2AT > > Office (UCL): +44 207 679 6493 > Fax: +44 0207 6796817 > Office (ICMS): +44 0207 8822374 > > Mobile: +44 787 6478912 > > > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From J.Christopher.Ellis at duke.edu Thu Sep 2 14:53:34 2010 From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis) Date: Thu, 2 Sep 2010 10:53:34 -0400 Subject: [Bioperl-l] Taxonomy DB problem Message-ID: <53096.1283439214@duke.edu> Chris have you had any luck with this? Thanks, Chris On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent: Yes, I see that one. It may be the ID hash that is being returned is empty. I'll look into it. -c On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote: > Hi Chris, > > The error is... > > "Use of uninitialized value $id in join or string at C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363." > > The script from http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows.... 
>
> use Bio::DB::EUtilities;
>
> my (%taxa, @taxa);
> my (%names, %idmap);
>
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> # (probably)
>
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
>
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
>
> @taxa = @taxa{@ids};
>
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>                                     -db => 'taxonomy',
>                                     -id => \@taxa );
>
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         ($_->get_contents_by_name('ScientificName'))[0];
> }
>
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
>
> # %idmap is
> # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> # 68536103 => 'Corynebacterium jeikeium K411'
> # 730439 => 'Bacillus caldolyticus'
> # 89318838 => undef (this record has been removed from the db)
>
> 1;
>
> Thanks,
>
> Chris
>
> On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> Chris,
>
> Regarding a fix for that script, we would have to see your modified script and the error. However, there are modules within BioPerl to essentially do what you want, in particular, Bio::DB::Taxonomy.
>
> chris
>
> On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>
> > Hi All,
> >
> > I am trying to extract the entire taxonomy of an organism including the
> > classifications. Something like...
> > > > Phylum:Proteobacteria, Class:Gammaproteobacteria, Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia > > > > I am not worried about format just that I get the information and the associated level of hierarchy. The script found at http://bioperl.org/wiki/Species_names_from_accession_numbers seemed like a good starting point so I copied it and tried to run it but got an error. > > > > My first question is "Is there a known fix for this?" and my second question is how do I get the full hierarchical information (as seen above) with the taxonomy db? > > > > Thanks for all your help in advance! > > > > Chris > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Thu Sep 2 16:21:48 2010 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 02 Sep 2010 11:21:48 -0500 Subject: [Bioperl-l] Taxonomy DB problem In-Reply-To: <53096.1283439214@duke.edu> References: <53096.1283439214@duke.edu> Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu> Chris, There are a few things wrong with the original script, so I'll fix them. Basically, it makes the assumption that every ID in the original list is found. The problem: eutils only reports back data it finds, silently discarding IDs that don't match. So, using the original ID list when building the hashes needs a bit more error checking. Here's the revised script that works for me. https://gist.github.com/f5db90a432fed68548d4 I'm also adding a check to ensure all IDs are defined prior to adding them to the param string, just in case. chris On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote: > Chris have you had any luck with this?
>
> Thanks,
> Chris
>
> On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
> Yes, I see that one. It may be the ID hash that is being
> returned is empty. I'll look into it.
>
> -c
>
> On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:
>
> > Hi Chris,
> >
> > The error is...
> >
> > "Use of uninitialized value $id in join or string at
> > C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
> >
> > The script from
> > http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
> >
> > use Bio::DB::EUtilities;
> >
> > my (%taxa, @taxa);
> > my (%names, %idmap);
> >
> > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
> > # (probably)
> >
> > my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >
> > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >                                        -db => 'taxonomy',
> >                                        -dbfrom => 'protein',
> >                                        -correspondence => 1,
> >                                        -id => \@ids);
> >
> > # iterate through the LinkSet objects
> > while (my $ds = $factory->next_LinkSet) {
> >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > }
> >
> > @taxa = @taxa{@ids};
> >
> > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >                                     -db => 'taxonomy',
> >                                     -id => \@taxa );
> >
> > while (local $_ = $factory->next_DocSum) {
> >     $names{($_->get_contents_by_name('TaxId'))[0]} =
> >         ($_->get_contents_by_name('ScientificName'))[0];
> > }
> >
> > foreach (@ids) {
> >     $idmap{$_} = $names{$taxa{$_}};
> > }
> >
> > # %idmap is
> > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
> > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > # 68536103 => 'Corynebacterium jeikeium K411'
> > # 730439 => 'Bacillus caldolyticus'
> > # 89318838 => undef (this record has been removed from the db)
> >
> > 1;
> >
> > Thanks,
> >
> > Chris
> >
> > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
> > Chris,
> >
> > Regarding a fix for that script, we would have to see your
> > modified script and the error. However, there are modules
> > within BioPerl to essentially do what you want, in particular,
> > Bio::DB::Taxonomy.
> >
> > chris
> >
> > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
> >
> > > Hi All,
> > >
> > > I am trying to extract the entire taxonomy of an organism including the
> > > classifications. Some thing like...
> > >
> > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
> > > Order:Enterobacteriales, Family:Enterobacteriaceae,
> > > Genus:Escherichia
> > >
> > > I am not worried about format just that I get the
> > > information and the associated level of hierarchy. The script
> > > found at http://bioperl.org/wiki/Species_names_from_accession_numbers
> > > seemed like a good starting point so I copied it and tried run it but got an error.
> > >
> > > My first question is "Is there a known fix for this?" and
> > > my second question is how do I get the full hierarchical
> > > information (as seen above) with the taxonomy db?
> > >
> > > Thanks for all your help in advance!
> > > > > > Chris > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > From thomas.sharpton at gmail.com Thu Sep 2 16:34:07 2010 From: thomas.sharpton at gmail.com (Thomas Sharpton) Date: Thu, 2 Sep 2010 09:34:07 -0700 Subject: [Bioperl-l] Bio::SearchIO::hmmer In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> References: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com> <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de> Message-ID: So it is! I'm paying attention, I swear I am.... Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem, do let us know. Best, Tom On Sep 2, 2010, at 1:44 AM, Kai Blin wrote: > On Wed, 1 Sep 2010 14:29:26 -0700 > Thomas Sharpton wrote: > > Hi, > >> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to >> use the HMMER3 version, as found here: >> >> http://github.com/bioperl/bioperl-hmmer3 > > Actually it's now included in the bioperl-live repository, but the > code > hasn't made it into a release yet. > > http://github.com/bioperl/bioperl-live.git > > Cheers, > Kai > -- > Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de > Institute for Microbiology and Infection Medicine > Division of Microbiology/Biotechnology > Eberhard-Karls-University of Tübingen > Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 > D-72076 Tübingen Fax : ++49 7071 29-5979 > Deutschland > Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From johnny at mit.edu Sat Sep 4 15:40:37 2010 From: johnny at mit.edu (Jonathan Rameseder) Date: Sat, 4 Sep 2010 11:40:37 -0400 Subject: [Bioperl-l] Client-side Scansite Bioperl module Message-ID: hi guys it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D! best wishes johnny [1] Bio::Tools::Analysis::Protein::Scansite [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593 ******************** Jonathan Rameseder Ph.D. Candidate Computational Systems Biology Initiative Koch Institute for Integrative Cancer Research Massachusetts Institute of Technology ******************** From David.Messina at sbc.su.se Mon Sep 6 12:14:20 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Sep 2010 14:14:20 +0200 Subject: [Bioperl-l] Client-side Scansite Bioperl module In-Reply-To: References: Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se> Hi Jonathan, Great to hear you're interested in including your code in BioPerl! In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?
If you haven't already, you might want to take a look at some of our developer documentation. For example: http://www.bioperl.org/wiki/Bioperl_Best_Practices http://www.bioperl.org/wiki/Advanced_BioPerl Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway. I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway: http://www.bioperl.org/wiki/Proposed_BioPerl_changes http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208 Dave From ross at cuhk.edu.hk Tue Sep 7 08:28:00 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 16:28:00 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> By the following codes, I wanna index the 4G nr database, however, the index file is > 1T and the job has been running for weeks and still hasn't finished. Could anybody tell me how you accomplish the goal? Thanks in advance. 

use strict;
use Bio::DB::Flat::BinarySearch;

(my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

# use single quotes so you don't have to write
# regular expressions like "gi\\|(\\d+)"
#my $primary_pattern = '^>(\S+)';
#if ($fullHeader == 1) {
my $primary_pattern = '^>(.+)';
#}
my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
#$string =~ s/$primary_pattern/RRR/g;
#print "$string\n";

# one or more patterns stored in a hash:
my $secondary_patterns = {GI => 'gi\|(\d+)'};

my $db = Bio::DB::Flat::BinarySearch->new(
    -directory => $baseDir,
    -dbname => $dbName,
    -write_flag => 1,
    -primary_pattern => $primary_pattern,
    -primary_namespace => 'ACC',
    -secondary_patterns => $secondary_patterns,
    -verbose => 1,
    -format => 'fasta' );

$db->build_index($seqFile);

From David.Messina at sbc.su.se Tue Sep 7 09:23:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 7 Sep 2010 11:23:42 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> Hi Ross, What do you need the index for? If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl. They're distributed with Blast+ and available here: ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'. I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here: ftp://ftp.ncbi.nih.gov//blast/db You can use that directly with the BLAST+ database tools. Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself).
Some nice examples there showing off the flexibility of this latest version of the software. Dave From ross at cuhk.edu.hk Tue Sep 7 09:18:16 2010 From: ross at cuhk.edu.hk (Ross KK Leung) Date: Tue, 7 Sep 2010 17:18:16 +0800 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <4C860148.3030000@fmi.ch> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> The reason is that I have to retrieve the specific information of the matched sequences, e.g. extract the 64th amino acid of the top matched sequence. Is there any way to achieve that? -----Original Message----- From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] Sent: Tuesday, September 07, 2010 5:09 PM To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk Subject: Re: [Bioperl-l] Indexing nr database Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? Thanks in > advance. 
> > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 09:09:28 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:09:28 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> Message-ID: <4C860148.3030000@fmi.ch> Hi why don't you use the pre-indexed BLAST files from NCBI: ftp://ftp.ncbi.nih.gov/blast/db/ you can use them to fetch individual sequences by gi number or accession with the tool "blastdbcmd" from blast+ binaries: ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ regards, Hans On 09/07/2010 10:28 AM, Ross KK Leung wrote: > By the following codes, I wanna index the 4G nr database, however, the index > file is> 1T and the job has been running for weeks and still hasn't > finished. Could anybody tell me how you accomplish the goal? 
Thanks in > advance. > > use strict; > > use Bio::DB::Flat::BinarySearch; > > > > (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV; > > > > # use single quotes so you don't have to write > > # regular expressions like "gi\\|(\\d+)" > > #my $primary_pattern = '^>(\S+)'; > > #if ($fullHeader == 1) { > > my $primary_pattern = '^>(.+)'; > > #} > > my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis > H37Rv complete genome"; > #$string =~ s/$primary_pattern/RRR/g; > > #print "$string\n"; > > > > # one or more patterns stored in a hash: > > my $secondary_patterns = {GI => 'gi\|(\d+)'}; > > > > my $db = Bio::DB::Flat::BinarySearch->new( > > -directory => $baseDir, > > -dbname => $dbName, > > -write_flag => 1, > > -primary_pattern => $primary_pattern, > > -primary_namespace => 'ACC', > > -secondary_patterns => $secondary_patterns, > > -verbose => 1, > > -format => 'fasta' ); > > > > $db->build_index($seqFile); > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hrh at fmi.ch Tue Sep 7 09:33:46 2010 From: hrh at fmi.ch (Hans-Rudolf Hotz) Date: Tue, 07 Sep 2010 11:33:46 +0200 Subject: [Bioperl-l] Indexing nr database In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch> <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk> Message-ID: <4C8606FA.3000509@fmi.ch> On 09/07/2010 11:18 AM, Ross KK Leung wrote: > The reason is that I have to retrieve the specific information of the > matched sequences, e.g. extract the 64th amino acid of the top matched > sequence. Is there any way to achieve that? 
"blastdbcmd" has several options like "-range" and even if "blastdbcmd" does not give you the subset of information you want to fetch, I am still convinced you are quicker by fetching the complete entry with"blastdbcmd" and then parse the required data out of just one entry. Hans > -----Original Message----- > From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] > Sent: Tuesday, September 07, 2010 5:09 PM > To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk > Subject: Re: [Bioperl-l] Indexing nr database > > Hi > > > why don't you use the pre-indexed BLAST files from NCBI: > > ftp://ftp.ncbi.nih.gov/blast/db/ > > you can use them to fetch individual sequences by gi number or accession > with the tool "blastdbcmd" from blast+ binaries: > > ftp://ftp.ncbi.nih.gov/blast/executables/blast+/ > > > regards, Hans > > > > On 09/07/2010 10:28 AM, Ross KK Leung wrote: >> By the following codes, I wanna index the 4G nr database, however, the > index >> file is> 1T and the job has been running for weeks and still hasn't >> finished. Could anybody tell me how you accomplish the goal? Thanks in >> advance. 
>> >> use strict; >> >> use Bio::DB::Flat::BinarySearch; >> >> >> >> (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = > @ARGV; >> >> >> >> # use single quotes so you don't have to write >> >> # regular expressions like "gi\\|(\\d+)" >> >> #my $primary_pattern = '^>(\S+)'; >> >> #if ($fullHeader == 1) { >> >> my $primary_pattern = '^>(.+)'; >> >> #} >> >> my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis >> H37Rv complete genome"; >> #$string =~ s/$primary_pattern/RRR/g; >> >> #print "$string\n"; >> >> >> >> # one or more patterns stored in a hash: >> >> my $secondary_patterns = {GI => 'gi\|(\d+)'}; >> >> >> >> my $db = Bio::DB::Flat::BinarySearch->new( >> >> -directory => $baseDir, >> >> -dbname => $dbName, >> >> -write_flag => 1, >> >> -primary_pattern => $primary_pattern, >> >> -primary_namespace => 'ACC', >> >> -secondary_patterns => $secondary_patterns, >> >> -verbose => 1, >> >> -format => 'fasta' ); >> >> >> >> $db->build_index($seqFile); >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From fs5 at sanger.ac.uk Tue Sep 7 12:09:52 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 07 Sep 2010 13:09:52 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> I am working a lot with feature-rich Bio::Seq objects these days and thought that it would be really nice if I could do something like: my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); instead of having to grep for the feature every time. There could then be 'by_tag' and 'by_region' options as well. 
According to the Bio::Seq docs, something like this seems to be planned at some stage. I would be willing to contribute to this feature if I can and if this isn't already being implemented by somebody else. Does anybody know the state of this feature? Frank -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jason at bioperl.org Tue Sep 7 17:36:07 2010 From: jason at bioperl.org (Jason Stajich) Date: Tue, 07 Sep 2010 10:36:07 -0700 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <4C867807.2040907@bioperl.org> And the implementation would just be something like this? my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] eq 'my_gene' } $seq->get_SeqFeatures(); I think any implementation would be if we moved from the in-memory arrays & hash-based system to a sqlite db on the back-end for how Sequence and Feature objects are stored. This would be a somewhat slower but wouldn't have performance/memory problems we get for sequences with many annotations. -jason Frank Schwach wrote, On 9/7/10 5:09 AM: > I am working a lot with feature-rich Bio::Seq objects these days and > thought that it would be really nice if I could do something like: > > my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > > instead of having to grep for the feature every time. > There could then be 'by_tag' and 'by_region' options as well. > > According to the Bio::Seq docs, something like this seems to be planned > at some stage. 
I would be willing to contribute to this feature if I can > and if this isn't already being implemented by somebody else. > Does anybody know the state of this feature? > > Frank > > > > > > > From fs5 at sanger.ac.uk Wed Sep 8 08:42:57 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 08 Sep 2010 09:42:57 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <4C867807.2040907@bioperl.org> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> Hi Jason, Yes, I guess that would be the simplest way of doing it - basically just doing it the way the docs suggest for getting at a specific feature but hiding the grep behind a Bio::Seq method with search parameters. But we could also build a hash of feature tags as the Bio::Seq is built so that retrieval is more efficient. This could also be used to implement a bin indexing scheme for range queries, similar to what Bio::DB::GFF does. Is a move to an sqlite backend planend for the near future? Frank On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: > And the implementation would just be something like this? > > my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] > eq 'my_gene' } $seq->get_SeqFeatures(); > > I think any implementation would be if we moved from the in-memory > arrays & hash-based system to a sqlite db on the back-end for how > Sequence and Feature objects are stored. > This would be a somewhat slower but wouldn't have performance/memory > problems we get for sequences with many annotations. 
> > -jason > Frank Schwach wrote, On 9/7/10 5:09 AM: > > I am working a lot with feature-rich Bio::Seq objects these days and > > thought that it would be really nice if I could do something like: > > > > my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > > > > instead of having to grep for the feature every time. > > There could then be 'by_tag' and 'by_region' options as well. > > > > According to the Bio::Seq docs, something like this seems to be planned > > at some stage. I would be willing to contribute to this feature if I can > > and if this isn't already being implemented by somebody else. > > Does anybody know the state of this feature? > > > > Frank > > > > > > > > > > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From stefan.kirov at bms.com Wed Sep 8 15:09:55 2010 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 08 Sep 2010 11:09:55 -0400 Subject: [Bioperl-l] Another interesting Javascript library Message-ID: <4C87A743.5010109@bms.com> Sorry for off topic, but I believe a lot of people can find this quite useful: "CanvasXpress is a javascript library based on the tag implemented in HTML5. I developed this library as the core visualization component for our BMS systems biology platform which I hope to release soon. The basic idea was to have generic and simple way to display genomics data. CanvasXpress supports bar graphs, line graphs, bar-line combination graphs, boxplots, dotplots, area graphs, stacked graphs, percentage-stacked graphs, correlation plots, Venn diagrams, heatmaps, newick trees, 2D-scatter plots, 2D-scatter bubble plots, 3D-scatter plots, pie charts, networks (or pathways), and a genome browser. 
It also supports a few data transformations like log and exponential transformation, z-score, percentile transformation and ratio. It also support grouping of samples, zooming, events ... yada, yada, yada ... and more importantly I created an Ext panel for it. Take a look. http://canvasxpress.org/" Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: stefan_kirov.vcf Type: text/x-vcard Size: 207 bytes Desc: not available URL: From alperyilmaz at gmail.com Wed Sep 8 16:47:42 2010 From: alperyilmaz at gmail.com (Alper Yilmaz) Date: Wed, 8 Sep 2010 12:47:42 -0400 Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates Message-ID: Hi, I have a GFF file listing mRNA and CDS coordinates for every transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates based on that information. I was wondering, if there's already made script for that purpose that you're aware of. I already uploaded the GFF file into Bio::DB::SeqFeature database, so I can utilize both flat file or database based scripts. thanks, Alper Yilmaz Post-doctoral Researcher Plant Biotechnology Center The Ohio State University 1060 Carmack Rd Columbus, OH 43210 (614)688-4954 From cjfields at illinois.edu Wed Sep 8 23:20:09 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Sep 2010 18:20:09 -0500 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. 
This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. >> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? 
>>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu Sep 9 05:51:53 2010 From: jason at bioperl.org (Jason Stajich) Date: Wed, 08 Sep 2010 22:51:53 -0700 Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates In-Reply-To: References: Message-ID: <4C8875F9.6020502@bioperl.org> Hi Alper - This script operates on gtf so doesn't quite do what you want but could be modified to be simpler to just look at the CDS and mRNA rather than the exon,start/stop codon info http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl Otherwise I think there make be some easy ways to do this from some tools in MAKER too. -jason Alper Yilmaz wrote, On 9/8/10 9:47 AM: > Hi, > > I have a GFF file listing mRNA and CDS coordinates for every > transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates > based on that information. I was wondering, if there's already made > script for that purpose that you're aware of. > > I already uploaded the GFF file into Bio::DB::SeqFeature database, so > I can utilize both flat file or database based scripts. 
> > thanks, > > Alper Yilmaz > Post-doctoral Researcher > Plant Biotechnology Center > The Ohio State University > 1060 Carmack Rd > Columbus, OH 43210 > (614)688-4954 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Sep 9 08:10:36 2010 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 09 Sep 2010 09:10:36 +0100 Subject: [Bioperl-l] Bio::Seq, search for specific features In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu> Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk> so something like an abstract Bio::Seq::FeatureContainer that defines the methods for storing and retrieving features and that would then be sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or Bio::Seq::FeatureContainer:Sqlite - is that the plan? Is there any way I can get involved or is it better to wait for other features to be developed first? Cheers, Frank On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote: > Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). 
> > chris > > On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > > > Hi Jason, > > > > Yes, I guess that would be the simplest way of doing it - basically just > > doing it the way the docs suggest for getting at a specific feature but > > hiding the grep behind a Bio::Seq method with search parameters. But we > > could also build a hash of feature tags as the Bio::Seq is built so that > > retrieval is more efficient. This could also be used to implement a bin > > indexing scheme for range queries, similar to what Bio::DB::GFF does. > > Is a move to an sqlite backend planend for the near future? > > > > Frank > > > > > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: > >> And the implementation would just be something like this? > >> > >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] > >> eq 'my_gene' } $seq->get_SeqFeatures(); > >> > >> I think any implementation would be if we moved from the in-memory > >> arrays & hash-based system to a sqlite db on the back-end for how > >> Sequence and Feature objects are stored. > >> This would be a somewhat slower but wouldn't have performance/memory > >> problems we get for sequences with many annotations. > >> > >> -jason > >> Frank Schwach wrote, On 9/7/10 5:09 AM: > >>> I am working a lot with feature-rich Bio::Seq objects these days and > >>> thought that it would be really nice if I could do something like: > >>> > >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); > >>> > >>> instead of having to grep for the feature every time. > >>> There could then be 'by_tag' and 'by_region' options as well. > >>> > >>> According to the Bio::Seq docs, something like this seems to be planned > >>> at some stage. I would be willing to contribute to this feature if I can > >>> and if this isn't already being implemented by somebody else. > >>> Does anybody know the state of this feature? 
> >>> > >>> Frank > >>> > >>> > >>> > >>> > >>> > >>> > >>> > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l >

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From jun.yin at ucd.ie Thu Sep 9 08:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk> <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se> <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk> <4C867807.2040907@bioperl.org> <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk> <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a package similar to Bio::LocatableSeq). The idea is to save the index of sequences to a local database (AnyDBM) instead of keeping it in memory, which should reduce memory usage. This idea actually comes from Bio::DB::Fasta, as implemented by Lincoln Stein.

Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Thursday, September 09, 2010 12:20 AM To: Frank Schwach Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Seq, search for specific features Well, no move has been concretely made yet. It would be nice to abstract the backend, so one could use possibly any db or memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing). chris On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote: > Hi Jason, > > Yes, I guess that would be the simplest way of doing it - basically just > doing it the way the docs suggest for getting at a specific feature but > hiding the grep behind a Bio::Seq method with search parameters. But we > could also build a hash of feature tags as the Bio::Seq is built so that > retrieval is more efficient. This could also be used to implement a bin > indexing scheme for range queries, similar to what Bio::DB::GFF does. > Is a move to an sqlite backend planend for the near future? > > Frank > > > > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote: >> And the implementation would just be something like this? >> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] >> eq 'my_gene' } $seq->get_SeqFeatures(); >> >> I think any implementation would be if we moved from the in-memory >> arrays & hash-based system to a sqlite db on the back-end for how >> Sequence and Feature objects are stored. >> This would be a somewhat slower but wouldn't have performance/memory >> problems we get for sequences with many annotations. 
>> >> -jason >> Frank Schwach wrote, On 9/7/10 5:09 AM: >>> I am working a lot with feature-rich Bio::Seq objects these days and >>> thought that it would be really nice if I could do something like: >>> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene'); >>> >>> instead of having to grep for the feature every time. >>> There could then be 'by_tag' and 'by_region' options as well. >>> >>> According to the Bio::Seq docs, something like this seems to be planned >>> at some stage. I would be willing to contribute to this feature if I can >>> and if this isn't already being implemented by somebody else. >>> Does anybody know the state of this feature? >>> >>> Frank >>> >>> >>> >>> >>> >>> >>> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security.
http://www.eset.com

From s1012635 at student.hsleiden.nl Thu Sep 9 09:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi,

I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format).
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl needed to get this package to work. So far we have had courses in Python and Java, not in Perl.

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
use Bio::Tools::Tmhmm;
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
while( my $tmhmm_feat = $parser->next_result ) {
    #do something
    #eg
    push @tmhmm_feat, $tmhmm_feat;
}

How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output?

cheers!

Stefan Lelieveld

From fs5 at sanger.ac.uk Thu Sep 9 10:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself, but it appears to be a parser for results from TMHMM, i.e. you don't feed it the FASTA file but the output from TMHMM after it has been run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm
Follow the synopsis to feed the tool with your sequences.
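For the parsing side, a minimal sketch might look like the following. It assumes TMHMM has already been run and its output saved to a file; the helper name tmhmm_features and the file names in the usage comment are made up for illustration. Only the -file argument and next_result call shown in this thread are used, plus the usual seq_id, start and end accessors of the returned feature objects.

```perl
use strict;
use warnings;
use Bio::Tools::Tmhmm;

# Hypothetical helper: parse a file written by TMHMM (not the FASTA input)
# and return the features that the parser reports.
sub tmhmm_features {
    my ($tmhmm_output) = @_;
    my $parser = Bio::Tools::Tmhmm->new(-file => $tmhmm_output);
    my @features;
    while (my $feat = $parser->next_result) {
        push @features, $feat;
    }
    return @features;
}

# Usage (file names are assumptions): perl parse_tmhmm.pl tmhmm.out > regions.tsv
if (@ARGV) {
    for my $feat (tmhmm_features($ARGV[0])) {
        # One tab-separated line per predicted region.
        print join("\t", $feat->seq_id, $feat->start, $feat->end), "\n";
    }
}
```

Redirecting standard output, as in the usage comment, is probably the simplest way to save the parsed regions.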
You can learn how to read a FASTA file and access each sequence in a loop here:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file => "<$file",-format => 'FASTA' );
while (my $seq = $inseq->next_seq) {
    print $seq->accession_number,"\n";
}

as an example for printing out accession numbers from $seq, which is a Bio::Seq object. So what you have to do now is to feed each of those Bio::Seq objects into your TMHMM runner.

Frank

On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi,
>
> I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format).
> I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl needed to get this package to work. So far we have had courses in Python and Java, not in Perl.
>
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
> #do something
> #eg
> push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output?
>
> cheers!
>
> Stefan Lelieveld
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
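A note on the FASTA loop in the message above: for plain FASTA input the identifier from the header line is normally available through display_id, whereas accession_number is, as far as I know, not filled in by the FASTA parser and falls back to a placeholder value. The sketch below uses two made-up protein records read from an in-memory string, so it needs no file on disk; the records and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up protein records, read from a string instead of a file on disk.
my $fasta = ">prot1 first example protein\nMKTAYIAKQR\n"
          . ">prot2 second example protein\nMSLLTEVETY\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $inseq = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @ids;
while (my $seq = $inseq->next_seq) {
    # display_id is the first word of the header line; desc is the rest.
    push @ids, $seq->display_id;
    print join("\t", $seq->display_id, $seq->length, $seq->desc), "\n";
}
```

Inside the loop, each $seq is the Bio::Seq object that would be handed on to the TMHMM runner.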
From kai.blin at biotech.uni-tuebingen.de Thu Sep 9 10:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl> <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST) "_Lelieveld, Stefan - s1012635" wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm :
> use Bio::Tools::Tmhmm;
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle );
> while( my $tmhmm_feat = $parser->next_result ) {
> #do something
> #eg
> push @tmhmm_feat, $tmhmm_feat;
> }
>
> How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output?

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses the TMHMM output file and returns an object that you can ask for Bio::SeqFeature objects. So if you want to run TMHMM on some FASTA files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = new Bio::Tools::Tmhmm(-file => "input.txt");
"""
will load and parse the TMHMM output for you.

HTH,
Kai

--
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From elanorbust2 at yahoo.com Thu Sep 9 16:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test for standaloneblastplus but getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list consists of the first four sequences found in the database. The results say it cannot find the first two, and the matches for the last two do not exist! Thanks for any help!

Program

#!/usr/bin/perl
use Bio::Tools::Run::StandAloneBlastPlus;
$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create => 1
);
$result = $fac->blastn( -query => 'sequences.fasta',
                        -outfile => 'ITStest2.bls');

smallDB.fas Data

>302585252|HM807352|Waitea circinata internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>302585252|HM807352|Waitea circinata internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>302585250|HM802273|Fusarium oxysporum
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC >302585249|HM802272|Fusarium oxysporum? contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA >302585248|HM802271|Fusarium oxysporum? 
contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA" CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA >301333053|GU725064|Xiphinema turcicum? internal transcribed spacer 1 GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA >301333052|GU725063|Xiphinema adenohystherum? 
internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC

>301333051|GU725062|Xiphinema sphaerocephalum internal transcribed spacer 1
AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC

>301333050|GU725061|Xiphinema hispanum internal transcribed spacer 1
AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG

>301333049|GU725060|Xiphinema pyrenaicum internal transcribed spacer 1
AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG


sequences.fasta data

>Test1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>Test2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>Test3
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>Test4
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA


Results

BLASTN 2.2.24+

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.

Database: ITS
           5 sequences; 1,102 total letters

Query= Test1
Length=204

***** No hits found *****

Lambda      K        H
    1.33    0.621     1.12

Gapped
Lambda      K        H
    1.28    0.460    0.850

Effective search space used: 202071

Query= Test2
Length=192

***** No hits found *****

Lambda      K        H
    1.33    0.621     1.12

Gapped
Lambda      K        H
    1.28    0.460    0.850

Effective search space used: 189507

Query= Test3
Length=437
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012

>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F
Length=203

 Score =  300 bits (162),  Expect = 2e-085
 Identities = 176/182 (96%), Gaps = 4/182 (2%)
 Strand=Plus/Plus

Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
            ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81

Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141

Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201

Query  187  GG  188
            ||
Sbjct  202  GG  203

>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 6e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217

>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204

>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 4e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254

Lambda      K        H
    1.33    0.621     1.12

Gapped
Lambda      K        H
    1.28    0.460    0.850

Effective search space used: 442850

Query= Test4
Length=521
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012

>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G59F
Length=203

 Score =  309 bits (167),  Expect = 4e-088
 Identities = 177/181 (97%), Gaps = 3/181 (1%)
 Strand=Plus/Plus

Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
            ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82

Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142

Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202

Query  184  G  184
            |
Sbjct  203  G  203

>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 7e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217

>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204

>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 5e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254

Lambda      K        H
    1.33    0.621     1.12

Gapped
Lambda      K        H
    1.28    0.460    0.850

Effective search space used: 530378

  Database: ITS
    Posted date:  Aug 27, 2010  9:43 AM
  Number of letters in database: 1,102
  Number of sequences in database:  5

Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5

From jaya1786 at gmail.com  Thu Sep  9 16:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID:

Respected sir/madam,

I am Jayanthi Jayakumar, doing my second year of an MS (by Research) in computational biology at Anna University, Chennai, India. I am very much interested in participating in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and the eligibility criteria for the same.

Thanking you,
yours faithfully,
Jayanthi Jayakumar

From Russell.Smithies at agresearch.co.nz  Thu Sep  9 22:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
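One way to test the wrapped-header idea is to scan the FASTA file for lines that are neither headers nor plain sequence letters, and to count the records so the total can be compared with the "Number of sequences in database" line of the BLAST report. The following is only a rough sketch in plain Perl: check_fasta() is a hypothetical helper (not part of BioPerl), and the in-memory sample is an assumption standing in for a real file such as smallDB.fas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# check_fasta() is a hypothetical helper, not a BioPerl function.
# It counts FASTA records and collects lines that are neither a header
# (starting with '>') nor a run of plain sequence letters.
sub check_fasta {
    my ($fh) = @_;
    my $records = 0;
    my @suspect;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;          # strip Unix or DOS line ending
        next if $line =~ /^\s*$/;      # ignore blank lines
        if ( $line =~ /^>/ ) {
            $records++;                # one header per record
        }
        elsif ( $line !~ /^[A-Za-z*-]+$/ ) {
            # e.g. the tail of a header that was wrapped onto its own line
            push @suspect, sprintf( 'line %d: %s', $., $line );
        }
    }
    return ( $records, \@suspect );
}

# Small in-memory sample (an assumption for illustration only).
my $sample = ">seq1 internal transcribed\nspacer 1\nACGTACGT\n>seq2 demo\nACGT\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my ( $records, $suspect ) = check_fasta($fh);
close $fh;

print "records: $records\n";
print "suspect $_\n" for @$suspect;
```

To run it against a real file, open that file instead of the in-memory string. Note the limits of the check: a wrapped header fragment made up only of letters would pass as sequence, so a clean result does not prove the file is well formed.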

Russell Smithies
Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From elanorbust2 at yahoo.com  Fri Sep 10 15:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking though!

--- On Thu, 9/9/10, Smithies, Russell wrote:

From: Smithies, Russell
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'" , "'bioperl-l at lists.open-bio.org'"
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?

Russell Smithies
Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test for standaloneblastplus but getting data back that
> does not exist in my query or my local database. Below is an outline of my
> script, small database, query list, and erroneous results. As you will
> notice, the query list is comprised of the first four sequences found in
> the database. The results say it cannot find the first two, and then the
> matches for the last two do not exist!
>
> Thanks for any help!
>
> Program
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::Tools::Run::StandAloneBlastPlus;
>
> # Factory for a local database named 'ITS', to be created from smallDB.fas
> my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>    -db_name => 'ITS',
>    -db_data => 'smallDB.fas',
>    -create  => 1
> );
>
> # Search the query file against it and write the report to ITStest2.bls
> my $result = $fac->blastn( -query   => 'sequences.fasta',
>                            -outfile => 'ITStest2.bls' );
>
> smallDB.fas Data
>
> >302585252|HM807352|Waitea circinata internal transcribed spacer 1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >302585252|HM807352|Waitea circinata internal transcribed spacer 2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> >302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
> ATATCATTAAAGCGGAGGAA
>
> >301333053|GU725064|Xiphinema turcicum internal transcribed spacer 1
> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
> ACGCA
> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>
> >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer
> 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>
> >301333051|GU725062|Xiphinema sphaerocephalum internal transcribed spacer
> 1
> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>
> >301333050|GU725061|Xiphinema hispanum internal transcribed spacer 1
> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>
> >301333049|GU725060|Xiphinema pyrenaicum internal transcribed spacer 1
> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>
> sequences.fasta data
>
> >Test1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >Test2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >Test3
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >Test4
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> Results
>
> BLASTN 2.2.24+
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
> Query=  Test1
> Length=204
>
> ***** No hits found *****
>
> Lambda      K        H
>    1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>    1.28    0.460    0.850
>
> Effective search space used: 202071
>
> Query=  Test2
> Length=192
>
> ***** No hits found *****
>
> Lambda      K        H
>    1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>    1.28    0.460    0.850
>
> Effective search space used: 189507
>
> Query=  Test3
> Length=437
>
>                                                                       Score     E
> Sequences producing significant alignments:                          (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
> Lambda      K        H
>    1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>    1.28    0.460    0.850
>
> Effective search space used: 442850
>
> Query=  Test4
> Length=521
>
>                                                                       Score     E
> Sequences producing significant alignments:                          (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
> Lambda      K        H
>    1.33    0.621     1.12
>
> Gapped
> Lambda      K        H
>    1.28    0.460    0.850
>
> Effective search space used: 530378
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
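The database summary in the report above reads "5 sequences; 1,102 total letters" with a posted date of Aug 27, 2010, while the smallDB.fas listing shows ten records. That mismatch may mean the search ran against an ITS database built earlier from other data, though nothing in the thread confirms this. Below is a minimal core-Perl sketch for counting what a FASTA file actually contains, so the numbers can be compared with that summary line. `fasta_stats` is a hypothetical helper written for this illustration; it is not part of BioPerl or BLAST+.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): summarise a FASTA stream.
# Returns the record count, the total number of sequence letters, and any
# IDs (first word after '>') that occur more than once.
sub fasta_stats {
    my ($fh) = @_;
    my $n_seqs    = 0;
    my $n_letters = 0;
    my %seen;
    my @dup_ids;

    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\r$//;             # tolerate DOS line endings
        next if $line =~ /^\s*$/;     # skip blank lines
        if ( $line =~ /^>(\S*)/ ) {   # header line: new record
            $n_seqs++;
            push @dup_ids, $1 if $seen{$1}++;
        }
        else {                        # anything else counts as sequence
            $line =~ s/\s+//g;
            $n_letters += length $line;
        }
    }
    return { seqs => $n_seqs, letters => $n_letters, dup_ids => \@dup_ids };
}

# Command-line use: perl fasta_stats.pl smallDB.fas
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $stats = fasta_stats($fh);
    close $fh;
    printf "%d sequences; %d total letters\n",
        $stats->{seqs}, $stats->{letters};
    print "Duplicate ID: $_\n" for @{ $stats->{dup_ids} };
}
```

Saved under a made-up name such as fasta_stats.pl, it would be run as `perl fasta_stats.pl smallDB.fas`. A header that has been wrapped onto a second line, which is what Russell asks about, would be counted here as sequence letters, or as an extra record if the wrapped part starts with `>`, so either count drifting from what is expected is a hint that the headers need attention.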
=======================================================================
From David.Messina at sbc.su.se Fri Sep 10 16:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID:

Hi Sally,

Did you run the same search on the command line, outside of BioPerl? The issue you're having may be with Blast+ and not BioPerl.

For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches.

Dave

On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking though!

From jun.yin at ucd.ie Sat Sep 11 16:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To:
References:
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC is already finished this year.
You can check the information here: http://socghop.appspot.com/gsoc/program/home/google/gsoc2010 However, you can still contribute to the BioPerl project if you like. You can talk to people on this mailing list. Or you can join the IRC channel (http://www.bioperl.org/wiki/IRC). Cheers, Jun Yin Ph.D. student in U.C.D. Bioinformatics Laboratory Conway Institute University College Dublin -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar Sent: Thursday, September 09, 2010 6:00 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Regarding GSoC 2010 Respected sir/madam, I am Jayanthi Jayakumar doing my second year MS(By Research) in computational biology in Anna University Chennai, India. I am very much interested to participate in GSoC 2010 under the project "Major Bioperl recognition". I request you to provide details and eligibility criteria for the same. Thanking you, yours faithfully, Jayanthi Jayakumar _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l __________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________ The message was checked by ESET Smart Security.
http://www.eset.com From david.breimann at gmail.com Sun Sep 12 13:16:29 2010 From: david.breimann at gmail.com (David Breimann) Date: Sun, 12 Sep 2010 15:16:29 +0200 Subject: [Bioperl-l] Circular genomes Message-ID: Hello, As a continuation of http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I would like to ask: Has the fix been implemented yet? That is, do GFF3 files created for circular genomes comply with the GFF3 specs for such genomes? I just find it difficult to keep track using git, so I'm not sure if this was already handled. Also, will the start and end coordinates of such genes loaded from a GFF3 file be "normal" (i.e. no coordinate is larger than the size of the genome) or just as written in the GFF3 (which demands that end > start even if end > genome length)? Thanks, David From David.Messina at sbc.su.se Mon Sep 13 15:10:42 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 13 Sep 2010 17:10:42 +0200 Subject: [Bioperl-l] BioPerl net installer Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se> Hi everyone, I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. It's already part of bioperl-live, and you can also get it here: http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl Dave From maj at fortinbras.us Mon Sep 13 16:47:45 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 13 Sep 2010 16:47:45 +0000 Subject: [Bioperl-l] BioPerl net installer Message-ID: Dear Scott- You rock!
Sincerely, Mark >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Monday, September 13, 2010 11:10 AM >To: 'BioPerl List' >Subject: [Bioperl-l] BioPerl net installer > >Hi everyone, > >I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. > >The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. > >It's already part of bioperl-live, and you can also get it here: > > http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Mon Sep 13 21:15:45 2010 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 13 Sep 2010 16:15:45 -0500 Subject: [Bioperl-l] BioPerl net installer In-Reply-To: References: Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu> Ditto! chris (briefly resurfacing) On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote: > Dear Scott- > You rock! > Sincerely, > Mark > >> -----Original Message----- >> From: Dave Messina [mailto:David.Messina at sbc.su.se] >> Sent: Monday, September 13, 2010 11:10 AM >> To: 'BioPerl List' >> Subject: [Bioperl-l] BioPerl net installer >> >> Hi everyone, >> >> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl. >> >> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running. 
>> >> It's already part of bioperl-live, and you can also get it here: >> >> http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl >> >> >> >> Dave >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From timmcilveen at talktalk.net Mon Sep 13 23:07:00 2010 From: timmcilveen at talktalk.net (tim) Date: Tue, 14 Sep 2010 00:07:00 +0100 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 Message-ID: <201009140007.00798.timmcilveen@talktalk.net> Hi, I have just installed Bioperl on my Linux system using the CPAN install. The install summary is as follows: Test Summary Report ------------------- t/RemoteDB/GenPept.t (Wstat: 256 Tests: 21 Failed: 1) Failed test: 17 Non-zero exit status: 1 t/RemoteDB/Query/GenBank.t (Wstat: 256 Tests: 18 Failed: 1) Failed test: 9 Non-zero exit status: 1 Parse errors: Bad plan. You planned 21 tests but ran 18. t/RemoteDB/Taxonomy.t (Wstat: 512 Tests: 103 Failed: 2) Failed tests: 15, 98 Non-zero exit status: 2 t/Root/RootIO.t (Wstat: 7424 Tests: 30 Failed: 0) Non-zero exit status: 29 Parse errors: Bad plan. You planned 31 tests but ran 30. Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr 0.91 sys + 156.68 cusr 9.16 csys = 172.94 CPU) Result: FAIL Failed 4/329 test programs. 4/18407 subtests failed. CJFIELDS/BioPerl-1.6.1.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.1.tar.gz Running Build install make test had returned bad status, won't install without force Failed during this command: CJFIELDS/BioPerl-1.6.1.tar.gz : make_test NO Is Bioperl properly installed? 
During the install process I was getting quite a lot of this error (100's of instances): 'replacement list longer than search list'. This happened with t/tools, t/seq, t/search and many others. Any advice would be great. Tim From David.Messina at sbc.su.se Tue Sep 14 07:56:33 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 14 Sep 2010 09:56:33 +0200 Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3 In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net> References: <201009140007.00798.timmcilveen@talktalk.net> Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se> Hi Tim, Thanks for your report. > Is Bioperl properly installed? No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line: > make test had returned bad status, won't install without force As for the error(s) > 'replacement list longer than search list' I believe this was fixed a couple of months ago. For details, see: http://bugzilla.open-bio.org/show_bug.cgi?id=3116 So I would recommend that you grab the latest copy of bioperl-live from github, in which the bug is fixed: http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots Give that a shot and let us know how it goes. Dave From jskittrell at unmc.edu Thu Sep 16 16:15:49 2010 From: jskittrell at unmc.edu (Jeff Kittrell) Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC) Subject: [Bioperl-l] mpiblast Message-ID: Does Bioperl work with mpiblast? Is there a standalone-like module that allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast output file correctly?
Thanks for any help, Jeff From David.Messina at sbc.su.se Thu Sep 16 18:25:57 2010 From: David.Messina at sbc.su.se (Dave Messina) Date: Thu, 16 Sep 2010 20:25:57 +0200 Subject: [Bioperl-l] mpiblast In-Reply-To: References: Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> > Is the there a standalone like module that allows you to easily call mpiblast? No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > I'm assuming seqio with parse a mpiblast output file correctly? Yes, although I see that a new version of mpiblast was recently released. Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? Dave From shalabh.sharma7 at gmail.com Thu Sep 16 21:38:14 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Thu, 16 Sep 2010 17:38:14 -0400 Subject: [Bioperl-l] IUPAC code similarity Message-ID: Hi All, I have a few nucleotide sequences that are composed of IUPAC codes, like >test VGSRVBSSSSSNSC Similarly, I have a database made of these kinds of sequences. I want to find sequences that are 100% similar to the query sequence. Is there any BioPerl module to deal with this? I tried normal BLAST but it didn't work. Do I have to convert these sequences to 4-base codes, or is there any other way out? Thanks Shalabh From amackey at virginia.edu Fri Sep 17 14:28:15 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 10:28:15 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Convert the IUPAC code to a regular expression, and use regular expressions (in Perl or grep or similar) to find 100% identical matches. -Aaron On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma wrote: > Hi All, > I have few nucleotide sequences that are composed of IUPAC codes. Like > >test > VGSRVBSSSSSNSC > > Similarly i have a database made of of these kind of sequences.
I want to > find sequences that are 100% similar to the query sequence. > > Is there any bioPerl module to deal with this, i tried normal blast but it > didn't worked. > Do i have to convert these sequences to 4 base codes or there is any other > way out. > > Thanks > Shalabh > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From shalabh.sharma7 at gmail.com Fri Sep 17 15:07:38 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 11:07:38 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron for your reply. Actually i tried that first, but there is another problem, i have to divide each query sequence to window size 5 with 1 base shift and its not possible to divide regular expression in that way. So what i am trying is to convert those iupac codes to 4 base code sequence and then do the normal search. Now the problem is that i cant able to convert those IUPAC sequences to normal ones, i am still trying to write a script but its taking time. Thanks Shalabh On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. >> Like >> >test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out. 
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From roy.chaudhuri at gmail.com Fri Sep 17 15:04:28 2010 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Fri, 17 Sep 2010 16:04:28 +0100 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: <4C93837C.4080008@gmail.com> Hi Shalabh, The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC codes to regular expressions: $ perl -e 'use Bio::Tools::SeqPattern; print Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand' [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C Although that won't work if there are also ambiguity codes in your database. For a non-BioPerl solution you could try fuzznuc from EMBOSS. Cheers. Roy. On 17/09/2010 15:28, Aaron Mackey wrote: > Convert the IUPAC code to a regular expression, and use regular expressions > (in Perl or grep or similar) to find 100% identical matches. > > -Aaron > > On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma > wrote: > >> Hi All, >> I have few nucleotide sequences that are composed of IUPAC codes. Like >>> test >> VGSRVBSSSSSNSC >> >> Similarly i have a database made of of these kind of sequences. I want to >> find sequences that are 100% similar to the query sequence. >> >> Is there any bioPerl module to deal with this, i tried normal blast but it >> didn't worked. >> Do i have to convert these sequences to 4 base codes or there is any other >> way out.
>> >> Thanks >> Shalabh >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From david.breimann at gmail.com Fri Sep 17 18:13:22 2010 From: david.breimann at gmail.com (David Breimann) Date: Fri, 17 Sep 2010 20:13:22 +0200 Subject: [Bioperl-l] Installing using git after an older installation Message-ID: Hello, I'm sharing a server with some other lab members. I would like to install the latest version of BioPerl for my own use, without affecting my colleagues. I used git to clone a copy of bioperl-live and exported PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB". Now perl -MBio::Perl -le 'print Bio::Perl->VERSION;' returns 1.0069 My question is: is that all? Am I now using the latest version? Should I include anything special in my scripts? Also, what about all the bp_***.pl scripts? Are they now using the latest version, too? I guess not, since I didn't build anything. So what should I do about them? Thanks, Dave From amackey at virginia.edu Fri Sep 17 19:24:44 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:24:44 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: <4C93837C.4080008@gmail.com> References: <4C93837C.4080008@gmail.com> Message-ID: If there are ambiguity codes in the database, then the expanded character class has to also include the original ambiguity code; non-ambiguous nucleotides must also be expanded to include all ambiguity codes that represent the nucleotide.
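A minimal sketch of that expansion in plain Perl, not taken from BioPerl: the %IUPAC table and the helper names iupac_class and iupac_to_regex are made up for illustration, and U (uracil) and gap characters are not handled. Each query symbol becomes a character class holding its bases plus the code itself; a plain base additionally picks up every ambiguity code that can stand for it.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Build the character class for one query symbol (hypothetical helper).
sub iupac_class {
    my ($code) = @_;
    $code = uc $code;
    my $bases = $IUPAC{$code} or die "Unknown IUPAC code '$code'\n";

    # Start with the bases the code stands for, plus the code itself.
    my %members = map { $_ => 1 } split //, $bases;
    $members{$code} = 1;

    # A plain base also matches any ambiguity code that can represent it.
    if (length($bases) == 1) {
        for my $amb (keys %IUPAC) {
            $members{$amb} = 1 if index($IUPAC{$amb}, $code) >= 0;
        }
    }
    return '[' . join('', sort keys %members) . ']';
}

# Turn a whole IUPAC query into a regular expression string.
sub iupac_to_regex {
    my ($seq) = @_;
    return join '', map { iupac_class($_) } split //, $seq;
}

print iupac_to_regex('VGS'), "\n";   # prints [ACGV][BDGKNRSV][CGS]
```

Matching with the /i modifier would also cover lower-case database sequences. Whether an ambiguous query code should additionally accept its sub-codes (say, M for a V in the query) depends on what "100% similar" should mean for the data at hand; this sketch only applies the two rules stated above.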
-Aaron On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri wrote: > Hi Shalabh, > > The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC > codes to regular expressions: > > $perl -e 'use Bio::Tools::SeqPattern; print > Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand' > [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C > > Although that won't work if there are also abiguity codes in your database. > For a non-BioPerl solution you could try fuzznuc from Emboss. > > Cheers. > Roy. > > > On 17/09/2010 15:28, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions >> (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma >> wrote: >> >> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >>>> test >>>> >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. >>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From amackey at virginia.edu Fri Sep 17 19:25:54 2010 From: amackey at virginia.edu (Aaron Mackey) Date: Fri, 17 Sep 2010 15:25:54 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: do your windowing/shifting on the unexpanded query sequences; then transform the 5-bp queries into regular expressions. 
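As a rough sketch of that order of operations (plain Perl rather than BioPerl; windows and to_regex are hypothetical helper names, and the table only covers the standard IUPAC DNA codes): cut the unexpanded query into 5-base windows with a 1-base shift first, then turn each window into its own regular expression.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Slide a window of $size symbols along $seq, moving $step symbols each time.
sub windows {
    my ($seq, $size, $step) = @_;
    my @found;
    for (my $i = 0; $i + $size <= length($seq); $i += $step) {
        push @found, substr($seq, $i, $size);
    }
    return @found;
}

# Convert one (still unexpanded) IUPAC window into a regex string.
sub to_regex {
    my ($seq) = @_;
    my $regex = '';
    for my $code (split //, uc $seq) {
        my $bases = $IUPAC{$code} or die "Unknown IUPAC code '$code'\n";
        $regex .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $regex;
}

# Window first, convert second.
for my $window (windows('VGSRVBSSSSSNSC', 5, 1)) {
    print $window, "\t", to_regex($window), "\n";
}
```

Because the windows are taken on the IUPAC string itself, each one is exactly five symbols long no matter how wide the character classes become after conversion. The classes here only list plain bases, so this version assumes the database contains no ambiguity codes.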
-Aaron On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma wrote: > Thanks Aaron for your reply. > Actually i tried that first, but there is another problem, i have to divide > each query sequence to window size 5 with 1 base shift and its not possible > to divide regular expression in that way. > So what i am trying is to convert those iupac codes to 4 base code sequence > and then do the normal search. > Now the problem is that i cant able to convert those IUPAC sequences to > normal ones, i am still trying to write a script but its taking time. > > Thanks > Shalabh > > > On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: > >> Convert the IUPAC code to a regular expression, and use regular >> expressions (in Perl or grep or similar) to find 100% identical matches. >> >> -Aaron >> >> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >> shalabh.sharma7 at gmail.com> wrote: >> >>> Hi All, >>> I have few nucleotide sequences that are composed of IUPAC codes. >>> Like >>> >test >>> VGSRVBSSSSSNSC >>> >>> Similarly i have a database made of of these kind of sequences. I want to >>> find sequences that are 100% similar to the query sequence. >>> >>> Is there any bioPerl module to deal with this, i tried normal blast but >>> it >>> didn't worked. >>> Do i have to convert these sequences to 4 base codes or there is any >>> other >>> way out. 
>>> >>> Thanks >>> Shalabh >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > From Kevin.M.Brown at asu.edu Fri Sep 17 20:09:34 2010 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Fri, 17 Sep 2010 13:09:34 -0700 Subject: [Bioperl-l] Installing using git after an older installation In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPE RL_IN_A_PERSONAL_MODULE_AREA From shalabh.sharma7 at gmail.com Fri Sep 17 20:45:50 2010 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 17 Sep 2010 16:45:50 -0400 Subject: [Bioperl-l] IUPAC code similarity In-Reply-To: References: Message-ID: Thanks Aaron, changing the query sequence worked well but i am still struggling with the database. -Shalabh On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey wrote: > do your windowing/shifting on the unexpanded query sequences; then > transform the 5-bp queries into regular expressions. > > -Aaron > > > On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma < > shalabh.sharma7 at gmail.com> wrote: > >> Thanks Aaron for your reply. >> Actually i tried that first, but there is another problem, i have to >> divide each query sequence to window size 5 with 1 base shift and its not >> possible to divide regular expression in that way. >> So what i am trying is to convert those iupac codes to 4 base code >> sequence and then do the normal search. >> Now the problem is that i cant able to convert those IUPAC sequences to >> normal ones, i am still trying to write a script but its taking time. >> >> Thanks >> Shalabh >> >> >> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey wrote: >> >>> Convert the IUPAC code to a regular expression, and use regular >>> expressions (in Perl or grep or similar) to find 100% identical matches. 
>>> >>> -Aaron >>> >>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma < >>> shalabh.sharma7 at gmail.com> wrote: >>> >>>> Hi All, >>>> I have few nucleotide sequences that are composed of IUPAC codes. >>>> Like >>>> >test >>>> VGSRVBSSSSSNSC >>>> >>>> Similarly i have a database made of of these kind of sequences. I want >>>> to >>>> find sequences that are 100% similar to the query sequence. >>>> >>>> Is there any bioPerl module to deal with this, i tried normal blast but >>>> it >>>> didn't worked. >>>> Do i have to convert these sequences to 4 base codes or there is any >>>> other >>>> way out. >>>> >>>> Thanks >>>> Shalabh >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >> > From heikki.lehvaslaiho at gmail.com Sat Sep 18 07:41:22 2010 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Sat, 18 Sep 2010 10:41:22 +0300 Subject: [Bioperl-l] mpiblast In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> References: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se> Message-ID: Been running 1.6 and its betas on Blue Gene/P for months. The output is identical to standard BLAST output. No issues in parsing it with BioPerl. -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +966 545 595 849 office: +966 2 808 2429 Computational Bioscience Research Centre (CBRC), Building #2, Office #4216 4700 King Abdullah University of Science and Technology (KAUST) Thuwal 23955-6900, Kingdom of Saudi Arabia On 16 September 2010 21:25, Dave Messina wrote: >> Is the there a standalone like module that allows you to easily call mpiblast? > > No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward. > > http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase > > >> I'm assuming seqio with parse a mpiblast output file correctly?
> > Yes, although I see that a new version of mpiblast was recently released. > > Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet? > > > Dave > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From david.breimann at gmail.com Sat Sep 18 09:05:58 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:05:58 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl Message-ID: Hello, I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` in the fields and sometime it doesn't, even though the genabank has a locus tag. Also, is the ID always equivalent to the locus tag? Thanks, Dave From scott at scottcain.net Sat Sep 18 09:17:24 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 10:17:24 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, bp_genbank2gff3.pl suffers from the fact that it has to deal with GenBank files :-) It was designed initially to work on whole genome refseqs, and contains several ad hoc rules for trying to make it "do the right thing." In practice, it is not unusual for a post processing step (either by hand or a quicky perl script) to be required to really get it right. I don't recall the specifics (if I ever knew :-) for when and how the locus tag is used, but I do know that there is a list of things that it will try to use for the ID, and while the locus is on the list, I don't know where it comes in the list, so it's possible that other items might supersede it. Scott On Sat, Sep 18, 2010 at 10:05 AM, David Breimann wrote: > Hello, > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag` > in the fields and sometime it doesn't, even though the genabank has a locus > tag. > Also, is the ID always equivalent to the locus tag? 
> > Thanks, > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From david.breimann at gmail.com Sat Sep 18 09:20:33 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 11:20:33 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Since locus_tag is an essential tag in GenBank, I suggest that locus_tag always be added to the last column of the GFF if it exists in the GenBank file, whether or not it is used as the ID in the GFF. On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote: > Hi Dave, > > bp_genbank2gff3.pl suffers from the fact that it has to deal with > GenBank files :-) It was designed initially to work on whole genome > refseqs, and contains several ad hoc rules for trying to make it "do > the right thing." In practice, it is not unusual for a post > processing step (either by hand or a quicky perl script) to be > required to really get it right. I don't recall the specifics (if I > ever knew :-) for when and how the locus tag is used, but I do know > that there is a list of things that it will try to use for the ID, and > while the locus is on the list, I don't know where it comes in the > list, so it's possible that other items might supersede it. > > Scott > > > On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > wrote: > > Hello, > > > > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a > `locus_tag` > > in the fields and sometime it doesn't, even though the genabank has a > locus > > tag. > > Also, is the ID always equivalent to the locus tag?
> > > > Thanks, > > Dave > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain dot > net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From scott at scottcain.net Sat Sep 18 10:08:26 2010 From: scott at scottcain.net (Scott Cain) Date: Sat, 18 Sep 2010 11:08:26 +0100 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Dave, That seems perfectly reasonable. If you could point out a GenBank entry for which that does not happen, I could try to figure out why not. Scott On Sat, Sep 18, 2010 at 10:20 AM, David Breimann wrote: > Since locus_tag is an essential tag in genbank, I suggest locus_tag will be > always added to the GFF last column if it exists in the genbank, whether it > is used as ID in the GFF or not. > > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote: >> >> Hi Dave, >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with >> GenBank files :-) It was designed initially to work on whole genome >> refseqs, and contains several ad hoc rules for trying to make it "do >> the right thing." In practice, it is not unusual for a post >> processing step (either by hand or a quicky perl script) to be >> required to really get it right. I don't recall the specifics (if I >> ever knew :-) for when and how the locus tag is used, but I do know >> that there is a list of things that it will try to use for the ID, and >> while the locus is on the list, I don't know where it comes in the >> list, so it's possible that other items might supersede it. >> >> Scott >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann >> wrote: >> > Hello, >> > >> > I'm not sure how bp_genbank2gff3.pl works.
Sometimes it adds a >> > `locus_tag` >> > in the fields and sometime it doesn't, even though the genabank has a >> > locus >> > tag. >> > Also, is the ID always equivalent to the locus tag? >> > >> > Thanks, >> > Dave >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 >> Ontario Institute for Cancer Research > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain dot net GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087 Ontario Institute for Cancer Research From david.breimann at gmail.com Sat Sep 18 10:20:50 2010 From: david.breimann at gmail.com (David Breimann) Date: Sat, 18 Sep 2010 12:20:50 +0200 Subject: [Bioperl-l] bp_genbank2gff3.pl In-Reply-To: References: Message-ID: Hi Scott, Here is a very short genbank: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk Note all genes in the genbank have locus tags. In the resulting GFF3, however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no idea why it deserves a special treatment... :) p.s. making this change (i.e., copying locus_tag to the GFF3 last column whenever available) will really make my life easier. Thank you, Dave On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain wrote: > Hi Dave, > > That seems perfectly reasonable. If you could point out a GenBank > entry for which that does not happen, I could try to figure out why > not. 
> > Scott > > > On Sat, Sep 18, 2010 at 10:20 AM, David Breimann > wrote: > > Since locus_tag is an essential tag in genbank, I suggest locus_tag will > be > > always added to the GFF last column if it exists in the genbank, whether > it > > is used as ID in the GFF or not. > > > > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain > wrote: > >> > >> Hi Dave, > >> > >> bp_genbank2gff3.pl suffers from the fact that it has to deal with > >> GenBank files :-) It was designed initially to work on whole genome > >> refseqs, and contains several ad hoc rules for trying to make it "do > >> the right thing." In practice, it is not unusual for a post > >> processing step (either by hand or a quicky perl script) to be > >> required to really get it right. I don't recall the specifics (if I > >> ever knew :-) for when and how the locus tag is used, but I do know > >> that there is a list of things that it will try to use for the ID, and > >> while the locus is on the list, I don't know where it comes in the > >> list, so it's possible that other items might supersede it. > >> > >> Scott > >> > >> > >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann > >> wrote: > >> > Hello, > >> > > >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a > >> > `locus_tag` > >> > in the fields and sometime it doesn't, even though the genabank has a > >> > locus > >> > tag. > >> > Also, is the ID always equivalent to the locus tag? > >> > > >> > Thanks, > >> > Dave > >> > _______________________________________________ > >> > Bioperl-l mailing list > >> > Bioperl-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. 
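One way to pin down which genes lost their tag is to list the gene features in the generated GFF3 whose attribute column carries no locus_tag. What follows is only a minimal sketch in POSIX shell and awk, not part of bp_genbank2gff3.pl. The function name genes_without_locus_tag is made up for illustration, and the sketch assumes tab-separated GFF3 with semicolon-separated attributes in column 9.

```shell
#!/bin/sh
# Hypothetical helper (not part of BioPerl): print the ID of every "gene"
# feature in a GFF3 file whose attribute column has no locus_tag.
genes_without_locus_tag() {
    awk -F '\t' '
        /^##FASTA/     { exit }   # embedded sequences follow; no more features
        /^#/ || NF < 9 { next }   # skip directives, comments and short lines
        $3 == "gene" {
            id = "(no ID)"
            has_tag = 0
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)        id = substr(attrs[i], 4)
                if (attrs[i] ~ /^locus_tag=/) has_tag = 1
            }
            if (!has_tag) print id
        }
    ' "$1"
}
```

Calling `genes_without_locus_tag out.gff3`, where out.gff3 stands for whatever file the conversion produced, should print one ID per affected gene and nothing at all when every gene carries the attribute.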
From david.breimann at gmail.com Sat Sep 18 10:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID:

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

    use Bio::Tools::GFF;
    use Data::Dumper;

    my $gffio = Bio::Tools::GFF->new(
        -file        => "/path/to/file.gff",
        -gff_version => 3
    );

    print Dumper $gffio->get_seqs();

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).

Thanks,
Dave

From scott at scottcain.net Sat Sep 18 11:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this. It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott

From scott at scottcain.net Sat Sep 18 11:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To:
References:
Message-ID:

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

    use Bio::DB::SeqFeature::Store;

    my $db = Bio::DB::SeqFeature::Store->new(
        -adaptor => 'memory',
        -dir     => 'path/to/file'
    );

    # Warning: this returns a string, and not a PrimarySeq object
    my $sequence = $db->fetch_sequence('Chr1', 5000 => 6000);

Scott
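If only the raw FASTA text is needed, the embedded sequences can also be pulled out without BioPerl. The sketch below is an alternative to the Bio::DB::SeqFeature::Store suggestion, not a description of how BioPerl does it. It assumes the sequences follow an explicit ##FASTA directive and that the ID is the first word of each header line; gff3_fasta_record is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the FASTA record with the given ID from the
# ##FASTA section of a GFF3 file.
# Usage: gff3_fasta_record ID FILE
gff3_fasta_record() {
    awk -v want="$1" '
        # Ignore everything up to and including the ##FASTA directive.
        !in_fasta { if ($0 ~ /^##FASTA/) in_fasta = 1; next }
        /^>/ {
            id = substr($1, 2)        # text after ">" up to the first whitespace
            printing = (id == want)   # switch output on only for the wanted record
        }
        printing { print }
    ' "$2"
}
```

For example, `gff3_fasta_record Chr1 file.gff3 > Chr1.fa` would write the header and sequence lines for Chr1, and produce an empty file if no record has that ID.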
From scott at scottcain.net Sat Sep 18 13:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository, it also uses the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.

Also, are you using any command line flags when executing it? I didn't.

Scott

On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running Linux? If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann wrote:
>> > Your GFF seems fine. I get a very similar one, but with `Name=` instead
>> > of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for. Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.
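The two checks suggested in this exchange (look for more than one copy of the script, and confirm which BioPerl modules Perl actually loads) can be sketched in plain shell. This is an illustrative sketch under stated assumptions, not a BioPerl tool: the function names copies_on_path and perl_module_path are invented here, and the second one relies only on Perl's standard %INC hash.

```shell
#!/bin/sh
# Hypothetical helper: print every executable file called NAME found in the
# directories on $PATH, in search order (the first hit is the one the shell runs).
copies_on_path() (
    IFS=:
    set -f                        # no globbing while splitting $PATH
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.    # an empty PATH entry means the current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
        fi
    done
)

# Hypothetical helper: print the file from which Perl loads MODULE,
# honouring PERL5LIB and the rest of @INC.
perl_module_path() {
    perl -e '
        my $module = shift;
        (my $file = "$module.pm") =~ s{::}{/}g;
        require $file;
        print "$INC{$file}\n";
    ' "$1"
}
```

Running `copies_on_path bp_genbank2gff3.pl` shows whether an older installed copy shadows the new one, and `perl_module_path Bio::SeqIO` shows whether the module comes from the git checkout or from a system-wide installation.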
From scott at scottcain.net Sat Sep 18 13:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation. The
directory in which the script resides is scripts/Bio-DB-GFF/

Scott

On Sat, Sep 18, 2010 at 2:40 PM, David Breimann wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...
216-392-3087
Ontario Institute for Cancer Research

From david.breimann at gmail.com Sat Sep 18 13:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

So let's do an intermediate summary of my situation:
I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), while Scott gets the expected results
while using the latest version of bioperl.

I cloned a fresh version of bioperl-live into my ~/src:
$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:
export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
and ran
$ source ~/.profile

I then downloaded a small genome from NCBI
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
and tested the script:
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789	GenBank	region	1	6199	.	+	1	ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789	GenBank	gene	665	781	.	-	1	ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789	GenBank	mRNA	665	781	.	-	1	ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789	GenBank	CDS	665	781	.	-	1	ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789	GenBank	region	1	6199	.	+	1	ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789	GenBank	gene	665	781	.	-	1	ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789	GenBank	mRNA	665	781	.	-	1	ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789	GenBank	CDS	665	781	.	-	1	ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
as desired.
I have no idea what is going on here...

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain wrote:
> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository it will also use the
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it? I didn't.
>
> Scott
>
> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann wrote:
> > Yes, I'm using Ubuntu 10.04.
> >
> > That is really weird. I tried running the script from the perl-live dir
> > (which I just pulled using git), and I get the same results as before
> > (`Name` instead of `locus_tag`):
> >
> > $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> > $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
> >
> > Attached is the resulting GFF3.
> > I also attach a copy of bp_genbank2gff3.pl as found under
> > /home/dave/src/bioperl-live/blib/script.
> >
> > This is a real mystery for me!
> >
> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain wrote:
> >>
> >> Typically I do build and install, but you can run it directly from the
> >> git checkout directory.
> >>
> >> For locating other versions of the script, are you running linux?
> >> If so, are you familiar with the "locate" command:
> >>
> >>   locate bp_genbank2gff3.pl
> >>
> >> If you've never used it before, you may need to update the database
> >> the locate command uses as root:
> >>
> >>   sudo updatedb
> >>
> >> Scott
> >>
> >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann wrote:
> >> > Your gff seems fine. I get a very similar one, but with `Name=`
> >> > instead of `locus_tag=`.
> >> >
> >> > I don't really know how to check for multiple bioperl installations.
> >> > I'm using my personal server, so I don't mind removing and installing
> >> > everything from scratch -- but I don't know how to do that.
> >> >
> >> > Also, what I don't get with the git is how the scripts are supposed
> >> > to be updated (unless you build and install).
> >> >
> >> > Thank you!
> >> >
> >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain wrote:
> >> >>
> >> >> Well, if you aren't getting the same results as me then I'd say you
> >> >> aren't using the same version of the script :-)
> >> >>
> >> >> Unfortunately, the scripts are no longer automatically marked with
> >> >> the "internal" version information when committed, so there really
> >> >> isn't anything in the script I can tell you to look for. Check for
> >> >> more than one bioperl instance on your computer.
> >> >>
> >> >> I've attached the GFF3 file I got so you can look at it and tell me
> >> >> if it is what you expect.
> >> >>
> >> >> Scott
> >> >>
> >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann wrote:
> >> >> > Hi Scott,
> >> >> >
> >> >> > I just pulled the latest bioperl-live using git.
> >> >> > I'm not sure how the scripts are updated, so I built and installed
> >> >> > anyway (perhaps exporting the path is supposed to be enough?)
> >> >> > Anyway, I still get the same results. No locus_tag.
> >> >> > How can I tell if I'm using the latest version of the script?
> >> >> >
> >> >> > Thanks again.
> >> >> >
> >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain wrote:
> >> >> >>
> >> >> >> Hi Dave,
> >> >> >>
> >> >> >> A fresh "pull" of the bioperl git repository shows that
> >> >> >> bp_genbank2gff3.pl already does this. It creates a locus_tag for
> >> >> >> all features that have a locus_tag, and uses the locus_tag for the
> >> >> >> ID when it can (it can't blindly use the locus tag for the ID since
> >> >> >> both the gene and the CDS have the same tag).
> >> >> >>
> >> >> >> Scott
> >> >> >>
> >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann wrote:
> >> >> >> > Hi Scott,
> >> >> >> >
> >> >> >> > Here is a very short genbank:
> >> >> >> >
> >> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> >> >> >> >
> >> >> >> > Note all genes in the genbank have locus tags. In the resulting
> >> >> >> > GFF3, however, only the last gene (EcE24377A_B0005) gets a
> >> >> >> > locus_tag. I have no idea why it deserves a special treatment... :)
> >> >> >> >
> >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
> >> >> >> > column whenever available) will really make my life easier.
> >> >> >> >
> >> >> >> > Thank you,
> >> >> >> > Dave
> >> >> >> >
> >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:
> >> >> >> >>
> >> >> >> >> Hi Dave,
> >> >> >> >>
> >> >> >> >> That seems perfectly reasonable. If you could point out a GenBank
> >> >> >> >> entry for which that does not happen, I could try to figure out
> >> >> >> >> why not.
> >> >> >> >>
> >> >> >> >> Scott
> >> >> >> >>
> >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann wrote:
> >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest
> >> >> >> >> > locus_tag will be always added to the GFF last column if it
> >> >> >> >> > exists in the genbank, whether it is used as ID in the GFF or not.
> >> >> >> >> >
> >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain wrote:
> >> >> >> >> >>
> >> >> >> >> >> Hi Dave,
> >> >> >> >> >>
> >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> >> >> >> >> >> GenBank files :-) It was designed initially to work on whole
> >> >> >> >> >> genome refseqs, and contains several ad hoc rules for trying to
> >> >> >> >> >> make it "do the right thing." In practice, it is not unusual for
> >> >> >> >> >> a post processing step (either by hand or a quick perl script)
> >> >> >> >> >> to be required to really get it right. I don't recall the
> >> >> >> >> >> specifics (if I ever knew :-) for when and how the locus tag is
> >> >> >> >> >> used, but I do know that there is a list of things that it will
> >> >> >> >> >> try to use for the ID, and while the locus is on the list, I
> >> >> >> >> >> don't know where it comes in the list, so it's possible that
> >> >> >> >> >> other items might supersede it.
> >> >> >> >> >>
> >> >> >> >> >> Scott
> >> >> >> >> >>
> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann wrote:
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> >> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
> >> >> >> >> >> > the genbank has a locus tag.
> >> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks,
> >> >> >> >> >> > Dave

From scott at scottcain.net Sat Sep 18 14:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To:
References:
Message-ID:

The only thing I can add is that I did a 'git diff genbank2gff3.PLS' and
found no differences. It occurred to me that perhaps I'd done some fixing
and not committed it, but it looks to me that that's not the case (assuming
I've managed to use git correctly (not a great assumption, but I don't
have another one to work with :-))

Scott

On Sat, Sep 18, 2010 at 2:57 PM, David Breimann wrote:
> So let's do an intermediate summary of my situation:
> I'm using Ubuntu 10.04 and Perl 5.10.1.
> I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
> "locus_tag=" in the last GFF3 column), while Scott gets the expected results
> while using the latest version of bioperl.

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From j.scholtalbers at gmail.com Mon Sep 20 08:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working. I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote:
> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields wrote:
> > Sounds like this is going through an initial indexing step (for
> > flatfiles). I would expect the initial indexing of the tables to take
> > time as you have to create the DB, but subsequent lookups post-indexing
> > should be much faster if the index is already present. Maybe Jason
> > could answer in more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow?
> >> After replacing 'flatfile' with a call to 'entrez' it took only
> >> 0.02 cpu seconds to come up with the same result.
> >>
> >> thanks,
> >> Eric

From pcantalupo at gmail.com Mon Sep 20 14:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID:

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the
subroutine will return:

  Viruses
  dsDNA viruses, no RNA stage
  Polyomaviridae
  Polyomavirus
  Simian virus 40

I hope this is what you wanted. Good luck

# modules the subroutine relies on
use Bio::DB::EUtilities;
use XML::Simple;    # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );
   my $res = $factory->get_Response->content;
   my $data = XMLin($res);
   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();
   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }
   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});
   # list context: the array; scalar context: a "; "-joined string
   return wantarray ? @lineage : join("; ", @lineage);
}

Paul Cantalupo
University of Pittsburgh

On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working. I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle

From jason at bioperl.org Mon Sep 20 15:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <4C977FFC.5000205@bioperl.org>

This works for me to get all the descendents from a sub-node. You have to
call the function with the database handle. I am not sure if the Taxon
implementation has a reference to the dbhandle or not:

#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
my $dbdir = '/db/taxonomy/ncbi/'; # downloaded data from NCBI taxdump into this directory
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => "$dbdir/nodes.dmp",
                                -namesfile => "$dbdir/names.dmp",
                                );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name } @d), "\n";

Hope that helps.

Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working. I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree.
However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > >> Thanks, that was indeed the answer to #2. Any idea about each_Descendent? >> Eric >> >> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields >> wrote: >> >>> Sounds like this is going through an initial indexing step (for >>> >> flatfiles). I would expect the initial indexing of the tables to take time >> as you have to create the DB, but subsequent lookups post-indexing should be >> much faster if the index is already present. Maybe Jason could answer in >> more detail? >> >>> chris >>> >>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: >>> >>> >>>> Hello, >>>> >>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl >>>> 5.8.5 with BioPerl 1.6.0 >>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy >>>> >>>> It ran for 100 cpu seconds and output: >>>> >>>> 33090 Viridiplantae kingdom >>>> >>>> I was expecting it to also output the descendents. Some questions: >>>> >>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually >>>> implemented? It looks to be in Taxon.pm but it is not documented and >>>> when I ran Data::Dumper on $node the value '_desc' was empty. >>>> >>>> 2) is the flatfile reader always so slow? after replacing 'flatfile' >>>> with a call to 'entrez' it took only 0.02 cpu seconds to come >>>> up with the same result. 
>>>>
>>>> thanks,
>>>> Eric
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>

From j.scholtalbers at gmail.com Wed Sep 22 07:46:35 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Wed, 22 Sep 2010 09:46:35 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <4C977FFC.5000205@bioperl.org>
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu> <4C977FFC.5000205@bioperl.org>
Message-ID:

Hi Jason,

this was the same method I was using. With the taxdump it apparently works,
however it does not work with Entrez as the source. So I will just stick to an
up-to-date taxdump then. Thanks for your example.

@Paul: Your method does indeed give the lineage, but it will only retrieve the
ancestors. I want to retrieve all the descendents. Thx anyway.

Cheers,
Jelle

On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich wrote:
>
> This works for me to get all the descendents from sub-node. You have to
> call the function with the database handle. I am not sure if the Taxon
> implementation has reference to the dbhandle or not:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Taxonomy;
> my $dbdir = '/db/taxonomy/ncbi/'; #downloaded data from NCBI taxdump into this directory
> my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
> -nodesfile => "$dbdir/nodes.dmp",
> -namesfile => "$dbdir/names.dmp",
> );
> my $taxa = $db->get_taxon(-taxonid => 151341);
> my @d = $db->get_all_Descendents($taxa);
>
> print join("\n", map { $_->id . " " . $_->rank . " " .
$_->scientific_name > } @d), "\n"; > > > Hope that helps. > Jelle Scholtalbers wrote, On 9/20/10 1:04 AM: > > Hi, > > I'm trying to get all descendents for a specific taxon using Entrez. > each_Descendent and get_all_Descendents don't seem to be implemented or > working. I then tried by getting the tree for this taxon using > Bio::DB::Taxonomy's get_tree. However this only retrieves the > ancestors/parents. > What would be the best approach here? > > Cheers, > Jelle > > On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins wrote: > > > > Thanks, that was indeed the answer to #2. Any idea about each_Descendent? > Eric > > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields > wrote: > > > Sounds like this is going through an initial indexing step (for > > > flatfiles). I would expect the initial indexing of the tables to take time > as you have to create the DB, but subsequent lookups post-indexing should be > much faster if the index is already present. Maybe Jason could answer in > more detail? > > > chris > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote: > > > > Hello, > > I tried the Bio::DB::Taxonomy example on this wiki page using perl > 5.8.5 with BioPerl 1.6.0http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy > > It ran for 100 cpu seconds and output: > > 33090 Viridiplantae kingdom > > I was expecting it to also output the descendents. Some questions: > > 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually > implemented? It looks to be in Taxon.pm but it is not documented and > when I ran Data::Dumper on $node the value '_desc' was empty. > > 2) is the flatfile reader always so slow? after replacing 'flatfile' > with a call to 'entrez' it took only 0.02 cpu seconds to come > up with the same result. 
>
> thanks,
> Eric
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From waldenhe at muohio.edu Fri Sep 24 19:15:48 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Fri, 24 Sep 2010 15:15:48 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>

Hello Bioperl Masters,

I am trying to perform a local blast with a query list of fasta files against a db of other fasta files. I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module. I have downloaded BLAST+ 2.2.24+ from the NCBI website and installed it on my ubuntu machine. I am using bioperl-1.5.2.

The snippet of code that is giving me errors is below:

my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

The error I am getting is:

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284.

I think the real problem is the "cannot find path to blastall" warning.
From reading around on different forums, I have to make a .ncbirc text file with the location of BLAST+ 2.2.24+ on my machine. I have that file in my /home folder.
How do I get StandAloneBlastPlus synced up with BLAST+ 2.2.24+? Am I approaching this right?

Thank you,
Hans Waldenmaier

From ross at cuhk.edu.hk Sat Sep 25 08:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To:
References: <9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>

Given a set of GO IDs, e.g.

GO:0008150 GO:0005750 GO:0006122 GO:0008121 GO:0003674 GO:0005575 GO:0008150 GO:0009507 GO:0009535 GO:0009567 GO:0009977 GO:0010027 GO:0031361

from http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo one can manually examine the hierarchy. Although there are go-perl (http://search.cpan.org/~cmungall/go-perl/) and go-db-perl (http://search.cpan.org/~cmungall/go-db-perl/), as a life science student who is just learning Perl, I find it difficult to draw a hierarchy tree (or simply make it a table that counts the occurrences) to produce something like:

biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
Cellular component (4)

Can anybody advise? I don't need any fancy figures at all...

From David.Messina at sbc.su.se Sun Sep 26 16:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,

> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable; it has blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation.
I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.

Dave

From maj at fortinbras.us Mon Sep 27 00:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see
http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Sunday, September 26, 2010 12:11 PM
>To: 'Waldenmaier, Hans Eugene'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans,
>
>
>> I think the real problem is the "cannot find path to Blastall.
>
>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>
>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>
>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at illinois.edu Mon Sep 27 21:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To:
References:
Message-ID:

Sorry, didn't see this being responded to on-list (been off the radar the last month).
I think this is a good idea, but I'm wondering if this might be better as a separate release on CPAN from bioperl core, seeing as we're in the prelim stages of modularizing the current bioperl core into smaller independent releases (after the next bioperl release).

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
>
> it seems Bioperl contains a wrapper [1] for Scansite [2]. in what extent would it make sense to integrate a client-sided version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-sided version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!
>
> best wishes
> johnny
>
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
>
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************

From gandipalem at gmail.com Tue Sep 28 04:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID:

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the Coordination file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.

From David.Messina at sbc.su.se Tue Sep 28 07:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running. But I still cannot get Perl to see BLAST.

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?


> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
>
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.


Dave

From David.Messina at sbc.su.se Tue Sep 28 07:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To:
References:
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts of genes by transformation of PCR products (ref: Datsenko & Wanner, PNAS 2000). A tab-delimited file containing ORF START STOP is read, and primers flanking the start & stop coordinates are designed based on the user-designated sequence file. In addition, primers flanking the knockout regions are chosen for PCR screening purposes once the knockout is generated. The script uses Bioperl in order to determine the primer sequences, which requires getting subsequences and reverse complementing some of the objects.

Dave

From maj at fortinbras.us Tue Sep 28 11:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID:

The module checks the env variable BLASTPLUSDIR for the executable; you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
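
To confirm that the setting took effect before going back to Perl, something like the following can be run in the same shell. This is only a rough sketch: the helper name find_blast_exe is made up for illustration and is not part of BLAST+ or BioPerl, and the lookup order (BLASTPLUSDIR first, then PATH) is an assumption based on the advice above, not a description of what the module does internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+ or BioPerl): print the location of a
# BLAST+ program, looking in $BLASTPLUSDIR first and then on the normal PATH.
find_blast_exe() {
    prog="${1:-blastn}"
    # 1. The directory named by BLASTPLUSDIR, if it is set and holds the program
    if [ -n "${BLASTPLUSDIR:-}" ] && [ -x "${BLASTPLUSDIR%/}/$prog" ]; then
        printf '%s\n' "${BLASTPLUSDIR%/}/$prog"
        return 0
    fi
    # 2. Otherwise fall back to an ordinary PATH lookup
    if found=$(command -v "$prog" 2>/dev/null); then
        printf '%s\n' "$found"
        return 0
    fi
    return 1
}

# Report on a few of the BLAST+ programs mentioned in this thread.
for prog in blastn blastp makeblastdb; do
    if path=$(find_blast_exe "$prog"); then
        echo "$prog: $path"
    else
        echo "$prog: not found (BLASTPLUSDIR=${BLASTPLUSDIR:-unset})"
    fi
done
```

If every program is reported as not found, the export lines are probably not being read by the shell that launches your script. Note that this only checks the shell environment; a Perl started from a different login or desktop session may see different variables.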
MAJ >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Tuesday, September 28, 2010 03:53 AM >To: 'Waldenmaier, Hans Eugene' >Cc: 'Mark A. Jensen', bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >> I can get the command-line Blast running. But I still cannot get Perl to see BLAST. > >Type the following on the command line: >perl -e 'print $ENV{PATH}, "\n"' > >You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing >export /home/hans/BLAST/bin:PATH=${PATH} > >on the command line and then type >perl -e 'print $ENV{PATH}, "\n"' > >again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script? > > >> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST: >> export PATH=${PATH}:/home/hans/BLAST/bin >> export BLASTDIR=/home/hans/BLAST/ >> >> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special. > >It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added. > >It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you. 
> > >Dave >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From waldenhe at muohio.edu Tue Sep 28 04:52:56 2010 From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene) Date: Tue, 28 Sep 2010 00:52:56 -0400 Subject: [Bioperl-l] StandAloneBlastPlus In-Reply-To: References: Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu> Thanks Guys, I have run those steps, my current version now is: hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' 1.006001 But I am still having problems. I am having slightly more luck with using StandAloneBlast and the regular BLAST form NCBI. I can get the command-line Blast running. But I still cannot get Perl to see BLAST. Following the instructions from the HOWTO's and the O'reilly book BLAST, I have gotten to the setting up the environmental variables part, which is where I think my problems are arising now. I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST: export PATH=${PATH}:/home/hans/BLAST/bin export BLASTDIR=/home/hans/BLAST/ Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special. Thanks for the help, Hans ________________________________________ From: Mark A. Jensen [maj at fortinbras.us] Sent: Sunday, September 26, 2010 8:43 To: Dave Messina; Waldenmaier, Hans Eugene Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] StandAloneBlastPlus Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following: git clone http://github.com/bioperl/bioperl-live.git git clone http://github.com/bioperl/bioperl-run.git (i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl cheers MAJ -------------------------- Mark A. 
Jensen, PhD Senior Consultant Fortinbras Research http://www.fortinbras.us >-----Original Message----- >From: Dave Messina [mailto:David.Messina at sbc.su.se] >Sent: Sunday, September 26, 2010 12:11 PM >To: 'Waldenmaier, Hans Eugene' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Hi Hans, > > >> I think the real problem is the "cannot find path to Blastall. > >Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc. > >See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code. > >Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it. > > > >Dave > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Tue Sep 28 15:04:07 2010 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 28 Sep 2010 15:04:07 +0000 Subject: [Bioperl-l] StandAloneBlastPlus Message-ID: Should work from .bashrc, Hans. Also add export BLASTPLUSDIR=/home/hans/BLAST/bin It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program. You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing set | grep BLAST cheers MAJ >-----Original Message----- >From: Waldenmaier, Hans Eugene [mailto:waldenhe at muohio.edu] >Sent: Tuesday, September 28, 2010 12:52 AM >To: 'Mark A. 
Jensen', 'Dave Messina' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Thanks Guys, > >I have run those steps, my current version now is: >hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;' >1.006001 > >But I am still having problems. > >I am having slightly more luck with using StandAloneBlast and the regular BLAST form NCBI. I can get the command-line Blast running. But I still cannot get Perl to see BLAST. >Following the instructions from the HOWTO's and the O'reilly book BLAST, I have gotten to the setting up the environmental variables part, which is where I think my problems are arising now. >I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST: >export PATH=${PATH}:/home/hans/BLAST/bin >export BLASTDIR=/home/hans/BLAST/ > >Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special. > >Thanks for the help, > >Hans >________________________________________ >From: Mark A. Jensen [maj at fortinbras.us] >Sent: Sunday, September 26, 2010 8:43 >To: Dave Messina; Waldenmaier, Hans Eugene >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] StandAloneBlastPlus > >Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following: > >git clone http://github.com/bioperl/bioperl-live.git >git clone http://github.com/bioperl/bioperl-run.git > >(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl > >cheers MAJ > >-------------------------- >Mark A. 
Jensen, PhD >Senior Consultant >Fortinbras Research >http://www.fortinbras.us > >>-----Original Message----- >>From: Dave Messina [mailto:David.Messina at sbc.su.se] >>Sent: Sunday, September 26, 2010 12:11 PM >>To: 'Waldenmaier, Hans Eugene' >>Cc: bioperl-l at bioperl.org >>Subject: Re: [Bioperl-l] StandAloneBlastPlus >> >>Hi Hans, >> >> >>> I think the real problem is the "cannot find path to Blastall. >> >>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc. >> >>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code. >> >>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it. >> >> >> >>Dave >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From chiragmatkarbioinfo at gmail.com Thu Sep 30 12:20:35 2010 From: chiragmatkarbioinfo at gmail.com (chirag matkar) Date: Thu, 30 Sep 2010 19:20:35 +0700 Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id Message-ID: Hello all, Is there any module to fetch dna sequence data from ensemble gene id? -- Regards, Chirag Matkar From jun.yin at ucd.ie Thu Sep 30 13:36:31 2010 From: jun.yin at ucd.ie (Jun Yin) Date: Thu, 30 Sep 2010 14:36:31 +0100 Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id In-Reply-To: References: Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie> Hi, Chirag, BioPerl does not have any module to retrieve data from Ensembl. But Ensembl provides a BioPerl-like interface on that function. 
See Ensembl's website for how to use that module:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
Sent: Thursday, September 30, 2010 1:21 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id

Hello all,
Is there any module to fetch dna sequence data from ensemble gene id?

--
Regards,
Chirag Matkar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com

From cjfields at illinois.edu Thu Sep 30 15:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID:

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
>
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back. I think it still works rather well; at least the tests pass.
You might get more out of using the Ensembl API directly, as Jun states, though; YMMV.

BTW, the Ensembl API also works with the latest bioperl code, regardless of what the Ensembl website says (i.e. that they only support v1.2.3). Haven't heard more about whether this discrepancy was supposed to be addressed at some point.

chris

> You can visit Ensembl's website on how to use that module:
> http://www.ensembl.org/info/data/api.html
>
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
>
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
> Sent: Thursday, September 30, 2010 1:21 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
>
> Hello all,
> Is there any module to fetch dna sequence data from ensemble gene id?
>
> --
> Regards,
> Chirag Matkar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> __________ Information from ESET Smart Security, version of virus signature
> database 5377 (20100818) __________
>
> The message was checked by ESET Smart Security.
>
> http://www.eset.com
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From A.Vakhrusheva at lumc.nl Wed Sep 29 13:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface. I want to calculate a p-distance matrix, but what input format is acceptable? Phylip doesn't work.

Anna