From shalabh.sharma7 at gmail.com  Wed Sep  1 16:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>

Hi,
     I am trying to parse an hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm

The only output I get is "amoA_10genes_align.fasta.2  [M=247] for
HMM"; I am not even getting an error.
I am attaching the hmmsearch report (just a test report) that I ran
through the parser.

I would really appreciate it if anyone could help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  amoA_10genes.hmm
# target sequence database:        test.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       amoA_10genes_align.fasta.2  [M=247]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte


Domain annotation for each sequence (and alignments):
>> gi|63021979|gb|AAY26564.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021979|gb|AAY26564.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021979|gb|AAY26564.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP

>> gi|63021981|gb|AAY26565.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021981|gb|AAY26565.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021981|gb|AAY26565.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (247 nodes)
Target sequences:                          2  (300 residues)
Passed MSV filter:                         2  (1); expected 0.0 (0.02)
Passed bias filter:                        2  (1); expected 0.0 (0.02)
Passed Vit filter:                         2  (1); expected 0.0 (0.001)
Passed Fwd filter:                         2  (1); expected 0.0 (1e-05)
Initial search space (Z):                  2  [actual number of targets]
Domain search space  (domZ):               2  [number of targets reported over threshold]
# CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08
# Mc/sec: 0.93
//

From thomas.sharpton at gmail.com  Wed Sep  1 17:29:26 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 1 Sep 2010 14:29:26 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>

Hi Shalabh,

We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
use the HMMER3 version, as found here:

http://github.com/bioperl/bioperl-hmmer3

Hope this helps,
T

On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote:

> Hi ,
>     I am trying to parse hmmsearch report (from HMMER3). I am using  
> the
> script mentioned here:
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm
>
> I am not getting anything but this "amoA_10genes_align.fasta.2   
> [M=247] for
> HMM" as the output, i am not even getting any error.
> I am attaching the hmmsearch report (just a test report) which i  
> tried to
> test against the parser.
>
> I would really appreciate if anyone can help me out.
>
> Thanks
> Shalabh Sharma
> <test_amao.txt>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From kai.blin at biotech.uni-tuebingen.de  Thu Sep  2 04:44:58 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 2 Sep 2010 10:44:58 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>

On Wed, 1 Sep 2010 14:29:26 -0700
Thomas Sharpton <thomas.sharpton at gmail.com> wrote:

Hi,

> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
> use the HMMER3 version, as found here:
> 
> http://github.com/bioperl/bioperl-hmmer3

Actually it's now included in the bioperl-live repository, but the code
hasn't made it into a release yet.

http://github.com/bioperl/bioperl-live.git
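
For a quick look at the hits before installing the new parser, the per-sequence score table in a report like the one attached earlier in this thread can also be read with plain Perl. This is only a minimal sketch under assumptions: parse_hmmsearch_hits is a hypothetical helper, not part of BioPerl and not the Bio::SearchIO HMMER3 parser, and it assumes the table layout of the attached HMMER 3.0 report.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): parse the
# "Scores for complete sequences" table of an HMMER 3.0 hmmsearch report.
# Takes the report text, returns a list of hash refs, one per hit row.
sub parse_hmmsearch_hits {
    my ($report) = @_;
    my (@hits, $in_table);
    for my $line (split /\n/, $report) {
        if ($line =~ /^Scores for complete sequences/) { $in_table = 1; next }
        next unless $in_table;
        last if $line =~ /^Domain annotation/;    # the table ends here
        # Data rows start with an E-value and a score; header rows,
        # dashed rows and blank lines do not match and are skipped.
        next unless $line =~ /^\s*[\d.]+(?:e[-+]?\d+)?\s+-?[\d.]+\s/;
        # Columns: full-sequence E-value, score, bias; best-domain
        # E-value, score, bias; exp; N; sequence name; description.
        my ($evalue, $score, $bias, undef, undef, undef, undef,
            $ndom, $name, $desc) = split ' ', $line, 10;
        push @hits, {
            name   => $name,
            evalue => $evalue,
            score  => $score,
            bias   => $bias,
            ndom   => $ndom,
            desc   => defined $desc ? $desc : '',
        };
    }
    return @hits;
}
```

Called on the text of the attached report, this should yield one entry per row of the score table; the domain table and alignments are deliberately ignored, which is where the real parser earns its keep.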

Cheers,
Kai
-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From e.stupka at ucl.ac.uk  Thu Sep  2 08:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia


---
"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
~ Stephen Hawking

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817 
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912


From cjfields at illinois.edu  Thu Sep  2 10:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID: <C9AD72E7-32C2-48D0-A7AB-18F42BABD28D@illinois.edu>

Done!  Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
> 
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
> 
> thanks!
> 
> Elia
> 
> 
> ---
> "We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
> ~ Stephen Hawking
> 
> Senior Lecturer, Bioinformatics
> Scientific Director - Bioinformatics, UCL Genomics
> 
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
> 
> Institute of Cell and Molecular Science
> Barts and The London School of Medicine and Dentistry
> 4 Newark Street
> Whitechapel
> London
> E1 2AT
> 
> Office (UCL): +44 207 679 6493
> Fax: +44 0207 6796817 
> Office (ICMS): +44 0207 8822374
> 
> Mobile: +44 787 6478912
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From J.Christopher.Ellis at duke.edu  Thu Sep  2 10:53:34 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu,  2 Sep 2010 10:53:34 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <53096.1283439214@duke.edu>

 Chris have you had any luck with this?

 Thanks,
 Chris

 On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
 Yes, I see that one. It may be the ID hash that is being returned is
empty. I'll look into it.

 -c 

 On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

 > Hi Chris,
 > 
 > The error is...
 > 
 > "Use of uninitialized value $id in join or string at
C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
 > 
 > The script from
 > http://bioperl.org/wiki/Species_names_from_accession_numbers is as
 > follows....
 > 
 > use Bio::DB::EUtilities;
 > 
 > my (%taxa, @taxa);
 > my (%names, %idmap);
 > 
 > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
 > # (probably)
 > 
 > my @ids = qw(1621261 89318838 68536103 20807972 730439);
 > 
 > my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
 >                                        -db             => 'taxonomy',
 >                                        -dbfrom         => 'protein',
 >                                        -correspondence => 1,
 >                                        -id             => \@ids);
 > 
 > # iterate through the LinkSet objects
 > while (my $ds = $factory->next_LinkSet) {
 >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
 > }
 > 
 > @taxa = @taxa{@ids};
 > 
 > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
 >                                     -db    => 'taxonomy',
 >                                     -id    => \@taxa );
 > 
 > while (local $_ = $factory->next_DocSum) {
 >     $names{($_->get_contents_by_name('TaxId'))[0]} =
 >         ($_->get_contents_by_name('ScientificName'))[0];
 > }
 > 
 > foreach (@ids) {
 >     $idmap{$_} = $names{$taxa{$_}};
 > }
 > 
 > # %idmap is
 > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
 > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
 > # 68536103 => 'Corynebacterium jeikeium K411'
 > # 730439 => 'Bacillus caldolyticus'
 > # 89318838 => undef (this record has been removed from the db)
 > 
 > 1;
 > 
 > 
 > Thanks,
 > 
 > 
 > 
 > Chris
 > 
 > 
 > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 > Chris,
 > 
 > Regarding a fix for that script, we would have to see your modified
script and the error. However, there are modules within BioPerl to
essentially do what you want, in particular, Bio::DB::Taxonomy.
 > 
 > chris
 > 
 > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
 > 
 > > Hi All,
 > > 
 > > I am trying to extract the entire taxonomy of an organism including
the
 > > classifications. Some thing like...
 > > 
 > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > > 
 > > I am not worried about format just that I get the information and the
associated level of hierarchy. The script found at
http://bioperl.org/wiki/Species_names_from_accession_numbers
seemed like a good starting point so I copied it and tried to run it but got
an error.
 > > 
 > > My first question is "Is there a known fix for this?" and my second
question is how do I get the full hierarchical information (as seen above)
with the taxonomy db?
 > > 
 > > Thanks for all your help in advance!
 > > 
 > > Chris 
 > > 
 > > 
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l at lists.open-bio.org
 > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
 > 
 > 

 
From cjfields at illinois.edu  Thu Sep  2 12:21:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 02 Sep 2010 11:21:48 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <53096.1283439214@duke.edu>
References: <53096.1283439214@duke.edu>
Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu>

Chris,

There are a few things wrong with the original script, so I'll fix them.
Basically, it makes the assumption that every ID in the original list is
found.  The problem: eutils only reports back data it finds, silently
discarding IDs that don't match.  So, using the original ID list when
building the hashes needs a bit more error checking.

Here's the revised script that works for me.

https://gist.github.com/f5db90a432fed68548d4

I'm also adding a check to ensure all IDs are defined prior to adding
them to the param string, just in case.
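
The kind of guard described above can be sketched in plain Perl. This is not the revised script from the gist, just a minimal illustration under assumptions: build_idmap and defined_taxids are hypothetical helper names, and the two hashes stand in for what the elink and esummary loops would have collected.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the taxonomy IDs that were actually
# returned, so no undefined value is passed on as an -id parameter.
#   $ids  : array ref of submitted IDs
#   $taxa : hash ref, submitted ID => taxonomy ID (found IDs only)
sub defined_taxids {
    my ($ids, $taxa) = @_;
    return grep { defined $_ } map { $taxa->{$_} } @$ids;
}

# Hypothetical helper: map every submitted ID to a name, recording undef
# for IDs that came back without a link instead of using an undefined
# value as a hash key.
#   $names : hash ref, taxonomy ID => scientific name
sub build_idmap {
    my ($ids, $taxa, $names) = @_;
    my %idmap;
    for my $id (@$ids) {
        my $taxid = $taxa->{$id};
        $idmap{$id} = defined $taxid ? $names->{$taxid} : undef;
    }
    return \%idmap;
}
```

In the original script the slice @taxa{@ids} yields undef for every ID that elink dropped, and those undefs are what reach the parameter string; filtering them first, as in defined_taxids, avoids that.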

chris

On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote:
> Chris have you had any luck with this?
> 
> Thanks,
> Chris
> 
> On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
>         Yes, I see that one. It may be the ID hash that is being
>         returned is empty. I'll look into it.
>         
>         -c 
>         
>         On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:
>         
>         > Hi Chris,
>         > 
>         > The error is...
>         > 
>         > "Use of uninitialized value $id in join or string at
>         C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm
>         line 363."
>         > 
>         > The script from
>         > http://bioperl.org/wiki/Species_names_from_accession_numbers
>         > is as follows....
>         > 
>         > use Bio::DB::EUtilities;
>         > 
>         > my (%taxa, @taxa);
>         > my (%names, %idmap);
>         > 
>         > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
>         > # (probably)
>         > 
>         > my @ids = qw(1621261 89318838 68536103 20807972 730439);
>         > 
>         > my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>         >                                        -db             => 'taxonomy',
>         >                                        -dbfrom         => 'protein',
>         >                                        -correspondence => 1,
>         >                                        -id             => \@ids);
>         > 
>         > # iterate through the LinkSet objects
>         > while (my $ds = $factory->next_LinkSet) {
>         >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>         > }
>         > 
>         > @taxa = @taxa{@ids};
>         > 
>         > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         >                                     -db    => 'taxonomy',
>         >                                     -id    => \@taxa );
>         > 
>         > while (local $_ = $factory->next_DocSum) {
>         >     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         >         ($_->get_contents_by_name('ScientificName'))[0];
>         > }
>         > 
>         > foreach (@ids) {
>         >     $idmap{$_} = $names{$taxa{$_}};
>         > }
>         > 
>         > # %idmap is
>         > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
>         > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
>         > # 68536103 => 'Corynebacterium jeikeium K411'
>         > # 730439 => 'Bacillus caldolyticus'
>         > # 89318838 => undef (this record has been removed from the db)
>         > 
>         > 1;
>         > 
>         > 
>         > Thanks,
>         > 
>         > 
>         > 
>         > Chris
>         > 
>         > 
>         > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu
>         sent:
>         > Chris,
>         > 
>         > Regarding a fix for that script, we would have to see your
>         modified script and the error. However, there are modules
>         within BioPerl to essentially do what you want, in particular,
>         Bio::DB::Taxonomy.
>         > 
>         > chris
>         > 
>         > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>         > 
>         > > Hi All,
>         > > 
>         > > I am trying to extract the entire taxonomy of an organism
>         including the
>         > > classifications. Some thing like...
>         > > 
>         > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
>         Order:Enterobacteriales, Family:Enterobacteriaceae,
>         Genus:Escherichia
>         > > 
>         > > I am not worried about format just that I get the
>         information and the associated level of hierarchy. The script
>         found at
>         http://bioperl.org/wiki/Species_names_from_accession_numbers
>         seemed like a good starting point so I copied it and tried to
>         run it but got an error.
>         > > 
>         > > My first question is "Is there a known fix for this?" and
>         my second question is how do I get the full hierarchical
>         information (as seen above) with the taxonomy db?
>         > > 
>         > > Thanks for all your help in advance!
>         > > 
>         > > Chris 
>         > > 
>         > > 
>         > > _______________________________________________
>         > > Bioperl-l mailing list
>         > > Bioperl-l at lists.open-bio.org
>         > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>         > 
>         > 
>         
>         


From thomas.sharpton at gmail.com  Thu Sep  2 12:34:07 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Thu, 2 Sep 2010 09:34:07 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
	<20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <C272A232-C37D-40F9-B729-626B8DB17BB3@gmail.com>

So it is! I'm paying attention, I swear I am....

Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem,  
do let us know.

Best,
Tom

On Sep 2, 2010, at 1:44 AM, Kai Blin wrote:

> On Wed, 1 Sep 2010 14:29:26 -0700
> Thomas Sharpton <thomas.sharpton at gmail.com> wrote:
>
> Hi,
>
>> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
>> use the HMMER3 version, as found here:
>>
>> http://github.com/bioperl/bioperl-hmmer3
>
> Actually it's now included in the bioperl-live repository, but the  
> code
> hasn't made it into a release yet.
>
> http://github.com/bioperl/bioperl-live.git
>
> Cheers,
> Kai
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From johnny at mit.edu  Sat Sep  4 11:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (e.g. enrichment tests) into Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************


From David.Messina at sbc.su.se  Mon Sep  6 08:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl!

In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?

If you haven't already, you might want to take a look at some of our developer documentation. For example:
 
	http://www.bioperl.org/wiki/Bioperl_Best_Practices
	http://www.bioperl.org/wiki/Advanced_BioPerl


Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway.

I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway:

	http://www.bioperl.org/wiki/Proposed_BioPerl_changes
	http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208


Dave


From ross at cuhk.edu.hk  Tue Sep  7 04:28:00 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 16:28:00 +0800
Subject: [Bioperl-l]  Indexing nr database
In-Reply-To: <C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>

With the following code, I want to index the 4G nr database. However, the index
file is > 1T, and the job has been running for weeks and still hasn't
finished. Could anybody tell me how to accomplish this? Thanks in
advance.

     use strict;
     use Bio::DB::Flat::BinarySearch;

     (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

     # use single quotes so you don't have to write
     # regular expressions like "gi\\|(\\d+)"
     #my $primary_pattern = '^>(\S+)';
     #if ($fullHeader == 1) {
        my $primary_pattern = '^>(.+)';
     #}
     my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
     #$string =~ s/$primary_pattern/RRR/g;
     #print "$string\n";

     # one or more patterns stored in a hash:
     my $secondary_patterns = {GI => 'gi\|(\d+)'};

     my $db = Bio::DB::Flat::BinarySearch->new(
                           -directory          => $baseDir,
                           -dbname             => $dbName,
                           -write_flag         => 1,
                           -primary_pattern    => $primary_pattern,
                           -primary_namespace  => 'ACC',
                           -secondary_patterns => $secondary_patterns,
                           -verbose            => 1,
                           -format             => 'fasta'  );

     $db->build_index($seqFile);
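
To see what each candidate primary pattern in the script actually captures as an index key, the test header from the script can be run through both patterns. This is a small sketch in plain Perl; captured_key is a hypothetical helper for illustration only, not part of Bio::DB::Flat::BinarySearch, and a leading ">" is added to the header here because both patterns are anchored on it.

```perl
use strict;
use warnings;

# Hypothetical helper: return what a primary pattern would capture
# from a FASTA header line, or undef if the pattern does not match.
sub captured_key {
    my ($pattern, $header) = @_;
    return $header =~ /$pattern/ ? $1 : undef;
}

my $header = '>gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome';

my $short = captured_key('^>(\S+)', $header);    # first token only
my $full  = captured_key('^>(.+)',  $header);    # the entire header line
my ($gi)  = $header =~ /gi\|(\d+)/;              # the secondary GI pattern

printf "short key (%d chars): %s\n", length($short), $short;
printf "full key  (%d chars): %s\n", length($full),  $full;
print  "GI: $gi\n";
```

With '^>(.+)' the whole description line becomes the key, so keys are much longer than with '^>(\S+)'. Whether that accounts for the index size seen here is not established in this thread, but it is cheap to check.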

                                              
From David.Messina at sbc.su.se  Tue Sep  7 05:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for?

If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl.

They're distributed with Blast+ and available here:

	ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.


I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:

		ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.


Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself). Some nice examples there showing off the flexibility of this latest version of the software.


Dave


From ross at cuhk.edu.hk  Tue Sep  7 05:18:16 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 17:18:16 +0800
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <4C860148.3030000@fmi.ch>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>

The reason is that I have to retrieve the specific information of the
matched sequences, e.g. extract the 64th amino acid of the top matched
sequence. Is there any way to achieve that?
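
Once the matched sequence is in hand as a string, pulling out a single residue is a one-liner. A minimal sketch, assuming the sequence has already been fetched by some other means; residue_at is a hypothetical helper name, and positions are counted from 1 as is usual for sequence coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: return the residue at a 1-based position,
# or nothing (undef in scalar context) if the position is out of range.
sub residue_at {
    my ($seq, $pos) = @_;
    return if $pos < 1 || $pos > length($seq);
    return substr($seq, $pos - 1, 1);    # substr offsets are 0-based
}

# Example with the start of the first hit from the report earlier in
# this archive; position 64 is out of range for this short fragment.
my $fragment = 'HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLV';
my $third    = residue_at($fragment, 3);
print "residue 3: $third\n";
print "residue 64 is out of range\n" unless defined residue_at($fragment, 64);
```

For a BioPerl sequence object, the subseq method with the same start and end position gives the equivalent result.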

-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] 
Sent: Tuesday, September 07, 2010 5:09 PM
To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
Subject: Re: [Bioperl-l] Indexing nr database

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>      use strict;
>      use Bio::DB::Flat::BinarySearch;
>
>      (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>      # use single quotes so you don't have to write
>      # regular expressions like "gi\\|(\\d+)"
>      #my $primary_pattern = '^>(\S+)';
>      #if ($fullHeader == 1) {
>         my $primary_pattern = '^>(.+)';
>      #}
>      my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
>      #$string =~ s/$primary_pattern/RRR/g;
>      #print "$string\n";
>
>      # one or more patterns stored in a hash:
>      my $secondary_patterns = {GI => 'gi\|(\d+)'};
>
>      my $db = Bio::DB::Flat::BinarySearch->new(
>                            -directory          => $baseDir,
>                            -dbname             => $dbName,
>                            -write_flag         => 1,
>                            -primary_pattern    => $primary_pattern,
>                            -primary_namespace  => 'ACC',
>                            -secondary_patterns => $secondary_patterns,
>                            -verbose            => 1,
>                            -format             => 'fasta'  );
>
>      $db->build_index($seqFile);
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hrh at fmi.ch  Tue Sep  7 05:09:28 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:09:28 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <4C860148.3030000@fmi.ch>

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>      use strict;
>
>       use Bio::DB::Flat::BinarySearch;
>
>
>
>       (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>
>
>       # use single quotes so you don't have to write
>
>       # regular expressions like "gi\\|(\\d+)"
>
>       #my $primary_pattern = '^>(\S+)';
>
>       #if ($fullHeader == 1) {
>
>          my $primary_pattern = '^>(.+)';
>
>       #}
>
>       my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
> H37Rv complete genome";
> #$string =~ s/$primary_pattern/RRR/g;
>
>       #print "$string\n";
>
>
>
>       # one or more patterns stored in a hash:
>
>       my $secondary_patterns = {GI =>  'gi\|(\d+)'};
>
>
>
>       my $db = Bio::DB::Flat::BinarySearch->new(
>
>                             -directory          =>  $baseDir,
>
>                             -dbname             =>  $dbName,
>
>                             -write_flag         =>  1,
>
>                             -primary_pattern    =>  $primary_pattern,
>
>                             -primary_namespace  =>  'ACC',
>
>                             -secondary_patterns =>  $secondary_patterns,
>
>                             -verbose            =>  1,
>
>                             -format             =>  'fasta'  );
>
>
>
>       $db->build_index($seqFile);
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hrh at fmi.ch  Tue Sep  7 05:33:46 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:33:46 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
	<007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
Message-ID: <4C8606FA.3000509@fmi.ch>


On 09/07/2010 11:18 AM, Ross KK Leung wrote:
> The reason is that I have to retrieve the specific information of the
> matched sequences, e.g. extract the 64th amino acid of the top matched
> sequence. Is there any way to achieve that?

"blastdbcmd" has several options like "-range"

and even if "blastdbcmd" does not give you the subset of information you
want to fetch, I am still convinced you are quicker fetching the
complete entry with "blastdbcmd" and then parsing the required data out of
just one entry.
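
As a minimal sketch of the "-range" route (not part of the original
exchange), the Perl snippet below assembles a blastdbcmd call that
fetches a single residue of one entry. The database name 'nr' and the
entry ID are placeholders, and the helper name blastdbcmd_args is made
up for illustration; -db, -entry, -range and -outfmt are standard
blastdbcmd options.

```perl
use strict;
use warnings;

# Build the argument list for a blastdbcmd call that returns only the
# residues $from..$to of one database entry ('%s' = sequence data only).
sub blastdbcmd_args {
    my ($db, $entry, $from, $to) = @_;
    return ('blastdbcmd',
            '-db',     $db,
            '-entry',  $entry,
            '-range',  "$from-$to",
            '-outfmt', '%s');
}

# 'YOUR_ID' is a placeholder for a gi number or accession.
my @cmd = blastdbcmd_args('nr', 'YOUR_ID', 64, 64);
print join(' ', @cmd), "\n";

# To actually run it (needs the BLAST+ binaries and a local nr database):
#   open(my $fh, '-|', @cmd) or die "cannot run blastdbcmd: $!";
#   my $residue = <$fh>;
#   close $fh;
```

Passing the command as a list to open() avoids going through a shell,
so IDs containing '|' need no extra quoting.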

Hans

> -----Original Message-----
> From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch]
> Sent: Tuesday, September 07, 2010 5:09 PM
> To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
> Subject: Re: [Bioperl-l] Indexing nr database
>
> Hi
>
>
> why don't you use the pre-indexed BLAST files from NCBI:
>
> ftp://ftp.ncbi.nih.gov/blast/db/
>
> you can use them to fetch individual sequences by gi number or accession
> with the tool "blastdbcmd" from blast+ binaries:
>
> ftp://ftp.ncbi.nih.gov/blast/executables/blast+/
>
>
> regards, Hans
>
>
>
> On 09/07/2010 10:28 AM, Ross KK Leung wrote:
>> By the following codes, I wanna index the 4G nr database, however, the index
>> file is > 1T and the job has been running for weeks and still hasn't
>> finished. Could anybody tell me how you accomplish the goal? Thanks in
>> advance.
>>
>>       use strict;
>>
>>        use Bio::DB::Flat::BinarySearch;
>>
>>
>>
>>        (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>>
>>
>>
>>        # use single quotes so you don't have to write
>>
>>        # regular expressions like "gi\\|(\\d+)"
>>
>>        #my $primary_pattern = '^>(\S+)';
>>
>>        #if ($fullHeader == 1) {
>>
>>           my $primary_pattern = '^>(.+)';
>>
>>        #}
>>
>>        my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
>> H37Rv complete genome";
>> #$string =~ s/$primary_pattern/RRR/g;
>>
>>        #print "$string\n";
>>
>>
>>
>>        # one or more patterns stored in a hash:
>>
>>        my $secondary_patterns = {GI =>   'gi\|(\d+)'};
>>
>>
>>
>>        my $db = Bio::DB::Flat::BinarySearch->new(
>>
>>                              -directory          =>   $baseDir,
>>
>>                              -dbname             =>   $dbName,
>>
>>                              -write_flag         =>   1,
>>
>>                              -primary_pattern    =>   $primary_pattern,
>>
>>                              -primary_namespace  =>   'ACC',
>>
>>                              -secondary_patterns =>   $secondary_patterns,
>>
>>                              -verbose            =>   1,
>>
>>                              -format             =>   'fasta'  );
>>
>>
>>
>>        $db->build_index($seqFile);
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From fs5 at sanger.ac.uk  Tue Sep  7 08:09:52 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 07 Sep 2010 13:09:52 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>

I am working a lot with feature-rich Bio::Seq objects these days and
thought that it would be really nice if I could do something like:

my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');

instead of having to grep for the feature every time. 
There could then be 'by_tag' and 'by_region' options as well. 

According to the Bio::Seq docs, something like this seems to be planned
at some stage. I would be willing to contribute to this feature if I can
and if this isn't already being implemented by somebody else. 
Does anybody know the state of this feature?

Frank


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From jason at bioperl.org  Tue Sep  7 13:36:07 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 07 Sep 2010 10:36:07 -0700
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <4C867807.2040907@bioperl.org>

And the implementation would just be something like this?

my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
eq 'my_gene' } $seq->get_SeqFeatures();

I think any implementation would be if we moved from the in-memory 
arrays & hash-based system to a sqlite db on the back-end for how 
Sequence and Feature objects are stored.
This would be somewhat slower but wouldn't have the performance/memory
problems we get for sequences with many annotations.

-jason
Frank Schwach wrote, On 9/7/10 5:09 AM:
> I am working a lot with feature-rich Bio::Seq objects these days and
> thought that it would be really nice if I could do something like:
>
> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>
> instead of having to grep for the feature every time.
> There could then be 'by_tag' and 'by_region' options as well.
>
> According to the Bio::Seq docs, something like this seems to be planned
> at some stage. I would be willing to contribute to this feature if I can
> and if this isn't already being implemented by somebody else.
> Does anybody know the state of this feature?
>
> Frank
>
>
>
>
>
>
>    

From fs5 at sanger.ac.uk  Wed Sep  8 04:42:57 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 08 Sep 2010 09:42:57 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <4C867807.2040907@bioperl.org>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Jason,

Yes, I guess that would be the simplest way of doing it - basically just
doing it the way the docs suggest for getting at a specific feature but
hiding the grep behind a Bio::Seq method with search parameters. But we
could also build a hash of feature tags as the Bio::Seq is built so that
retrieval is more efficient. This could also be used to implement a bin
indexing scheme for range queries, similar to what Bio::DB::GFF does.
Is a move to an sqlite backend planned for the near future?
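
A rough sketch of the hash-of-tags idea, under stated assumptions: the
index is built in one pass and then answers lookups without a grep.
DemoFeature is only a stand-in so the snippet runs without BioPerl; it
mimics the has_tag() and get_tag_values() methods that BioPerl feature
objects provide. The name build_tag_index and the index layout are made
up for illustration and are not an existing Bio::Seq API.

```perl
use strict;
use warnings;

# Stand-in for a BioPerl feature: offers only the two methods used below.
package DemoFeature;
sub new {
    my ($class, %tags) = @_;
    return bless { tags => {%tags} }, $class;
}
sub has_tag        { my ($self, $tag) = @_; return exists $self->{tags}{$tag} }
sub get_tag_values { my ($self, $tag) = @_; return @{ $self->{tags}{$tag} || [] } }

package main;

# Build { tag => { value => [ features ] } } in a single pass.
sub build_tag_index {
    my ($tags, @features) = @_;
    my %index;
    for my $feat (@features) {
        for my $tag (@$tags) {
            next unless $feat->has_tag($tag);
            push @{ $index{$tag}{$_} }, $feat for $feat->get_tag_values($tag);
        }
    }
    return \%index;
}

my @features = (
    DemoFeature->new(id => ['my_gene'], note => ['kinase']),
    DemoFeature->new(id => ['other_gene']),
);
my $index = build_tag_index(['id', 'note'], @features);

# Lookup is now a hash access instead of a grep over all features.
my @hits = @{ $index->{id}{my_gene} || [] };
print scalar(@hits), " feature(s) with id my_gene\n";
```

The trade-off is extra memory for the index and the need to keep it in
sync whenever features are added or removed.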

Frank


On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> And the implementation would just be something like this?
> 
> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
> eq 'my_gene' } $seq->get_SeqFeatures();
> 
> I think any implementation would be if we moved from the in-memory 
> arrays & hash-based system to a sqlite db on the back-end for how 
> Sequence and Feature objects are stored.
> This would be a somewhat slower but wouldn't have performance/memory 
> problems we get for sequences with many annotations.
> 
> -jason
> Frank Schwach wrote, On 9/7/10 5:09 AM:
> > I am working a lot with feature-rich Bio::Seq objects these days and
> > thought that it would be really nice if I could do something like:
> >
> > my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >
> > instead of having to grep for the feature every time.
> > There could then be 'by_tag' and 'by_region' options as well.
> >
> > According to the Bio::Seq docs, something like this seems to be planned
> > at some stage. I would be willing to contribute to this feature if I can
> > and if this isn't already being implemented by somebody else.
> > Does anybody know the state of this feature?
> >
> > Frank
> >
> >
> >
> >
> >
> >
> >    


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From stefan.kirov at bms.com  Wed Sep  8 11:09:55 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 08 Sep 2010 11:09:55 -0400
Subject: [Bioperl-l] Another interesting Javascript library
Message-ID: <4C87A743.5010109@bms.com>


    Sorry for off topic, but I believe a lot of people can
    find this quite useful:

        "CanvasXpress is a javascript library based on the
        <canvas> tag implemented in HTML5. I developed this
        library as the core visualization component for our
        BMS systems biology platform which I hope to release
        soon. The basic idea was to have generic and simple
        way to display genomics data. CanvasXpress supports
        bar graphs, line graphs, bar-line combination
        graphs, boxplots, dotplots, area graphs, stacked
        graphs, percentage-stacked graphs, correlation
        plots, Venn diagrams, heatmaps, newick trees,
        2D-scatter plots, 2D-scatter bubble plots,
        3D-scatter plots, pie charts, networks (or
        pathways), and a genome browser. It also supports a
        few data transformations like log and exponential
        transformation, z-score, percentile transformation
        and ratio. It also support grouping of samples,
        zooming, events ... yada, yada, yada ... and more
        importantly I created an Ext panel for it. Take a look.

        http://canvasxpress.org/"


    Stefan


From alperyilmaz at gmail.com  Wed Sep  8 12:47:42 2010
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Wed, 8 Sep 2010 12:47:42 -0400
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
Message-ID: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>

Hi,

I have a GFF file listing mRNA and CDS coordinates for every
transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
based on that information. I was wondering if there is a ready-made
script for that purpose that you're aware of.

I have already loaded the GFF file into a Bio::DB::SeqFeature database, so
I can use either flat-file or database-based scripts.

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954

From cjfields at illinois.edu  Wed Sep  8 19:20:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Sep 2010 18:20:09 -0500
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>

Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planned for the near future?
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
>> eq 'my_gene' } $seq->get_SeqFeatures();
>> 
>> I think any implementation would be if we moved from the in-memory 
>> arrays & hash-based system to a sqlite db on the back-end for how 
>> Sequence and Feature objects are stored.
>> This would be a somewhat slower but wouldn't have performance/memory 
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Thu Sep  9 01:51:53 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 08 Sep 2010 22:51:53 -0700
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
In-Reply-To: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
References: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
Message-ID: <4C8875F9.6020502@bioperl.org>

Hi Alper -

This script operates on GTF, so it doesn't quite do what you want, but it
could be simplified to look only at the CDS and mRNA features rather than
the exon and start/stop codon info:
   
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl

Otherwise I think there may be some easy ways to do this with some
tools in MAKER too.
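
If only the mRNA span and the CDS segments are available, the UTR
coordinates can be approximated by subtracting the CDS extent from the
mRNA extent. A minimal Perl sketch, assuming 1-based inclusive GFF
coordinates and ignoring any introns inside the UTRs; the function name
utr_coords and the example numbers are made up for illustration:

```perl
use strict;
use warnings;

# Return { five_prime => [start, end] | undef, three_prime => [...] | undef }
# for one transcript. @cds is a list of [start, end] pairs.
sub utr_coords {
    my ($mrna_start, $mrna_end, $strand, @cds) = @_;

    # Outer bounds of the coding region in genomic coordinates.
    my ($cds_min) = sort { $a <=> $b } map { $_->[0] } @cds;
    my ($cds_max) = sort { $b <=> $a } map { $_->[1] } @cds;

    # Whatever lies left and right of the CDS within the mRNA span.
    my $left  = $cds_min > $mrna_start ? [$mrna_start, $cds_min - 1] : undef;
    my $right = $cds_max < $mrna_end   ? [$cds_max + 1, $mrna_end]   : undef;

    # On the minus strand the 5' end is on the right-hand side.
    my %utr = $strand eq '-'
        ? (five_prime => $right, three_prime => $left)
        : (five_prime => $left,  three_prime => $right);
    return \%utr;
}

my $utr = utr_coords(100, 900, '+', [200, 300], [500, 800]);
for my $name (sort keys %$utr) {
    my $range = $utr->{$name} or next;    # skip a missing UTR
    print "$name UTR: $range->[0]-$range->[1]\n";
}
```

With exon features in the GFF, the same subtraction would have to be
done exon by exon to get spliced UTR segments.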

-jason
Alper Yilmaz wrote, On 9/8/10 9:47 AM:
> Hi,
>
> I have a GFF file listing mRNA and CDS coordinates for every
> transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
> based on that information. I was wondering, if there's already made
> script for that purpose that you're aware of.
>
> I already uploaded the GFF file into Bio::DB::SeqFeature database, so
> I can utilize both flat file or database based scripts.
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>    

From fs5 at sanger.ac.uk  Thu Sep  9 04:10:36 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 09:10:36 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk>

So something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features and that would then be
sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan? Is there any way I
can get involved or is it better to wait for other features to be
developed first?
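
To make the shape of that idea concrete, here is a minimal sketch in
plain Perl. None of these classes exist in BioPerl; the package names
are shortened versions of the hypothetical ones above, and the method
names add_feature and features_by_tag are assumptions made only for
this illustration.

```perl
use strict;
use warnings;

# Hypothetical abstract base class: defines the interface only.
package FeatureContainer;
sub new             { my ($class) = @_; return bless {}, $class }
sub add_feature     { die ref(shift) . " must implement add_feature\n" }
sub features_by_tag { die ref(shift) . " must implement features_by_tag\n" }

# Hypothetical in-memory backend; an SQLite backend would subclass the
# same base class and implement the same two methods with SQL queries.
package FeatureContainer::Memory;
our @ISA = ('FeatureContainer');

sub new { my ($class) = @_; return bless { by_tag => {} }, $class }

sub add_feature {
    my ($self, $feature, %tags) = @_;
    push @{ $self->{by_tag}{$_}{ $tags{$_} } }, $feature for keys %tags;
    return $self;
}

sub features_by_tag {
    my ($self, $tag, $value) = @_;
    return @{ $self->{by_tag}{$tag}{$value} || [] };
}

package main;

# Plain strings stand in for feature objects here.
my $store = FeatureContainer::Memory->new;
$store->add_feature('feature A', id => 'my_gene');
$store->add_feature('feature B', id => 'other_gene');

my @found = $store->features_by_tag(id => 'my_gene');
print "@found\n";
```

Calling code would depend only on the base-class interface, so the
storage backend could be swapped without touching it.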

Cheers,

Frank


On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote:
> Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
> 
> chris
> 
> On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:
> 
> > Hi Jason,
> > 
> > Yes, I guess that would be the simplest way of doing it - basically just
> > doing it the way the docs suggest for getting at a specific feature but
> > hiding the grep behind a Bio::Seq method with search parameters. But we
> > could also build a hash of feature tags as the Bio::Seq is built so that
> > retrieval is more efficient. This could also be used to implement a bin
> > indexing scheme for range queries, similar to what Bio::DB::GFF does.
> > Is a move to an sqlite backend planned for the near future?
> > 
> > Frank
> > 
> > 
> > 
> > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> >> And the implementation would just be something like this?
> >> 
> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
> >> eq 'my_gene' } $seq->get_SeqFeatures();
> >> 
> >> I think any implementation would be if we moved from the in-memory 
> >> arrays & hash-based system to a sqlite db on the back-end for how 
> >> Sequence and Feature objects are stored.
> >> This would be a somewhat slower but wouldn't have performance/memory 
> >> problems we get for sequences with many annotations.
> >> 
> >> -jason
> >> Frank Schwach wrote, On 9/7/10 5:09 AM:
> >>> I am working a lot with feature-rich Bio::Seq objects these days and
> >>> thought that it would be really nice if I could do something like:
> >>> 
> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >>> 
> >>> instead of having to grep for the feature every time.
> >>> There could then be 'by_tag' and 'by_region' options as well.
> >>> 
> >>> According to the Bio::Seq docs, something like this seems to be planned
> >>> at some stage. I would be willing to contribute to this feature if I can
> >>> and if this isn't already being implemented by somebody else.
> >>> Does anybody know the state of this feature?
> >>> 
> >>> Frank
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> > 
> > 
> > 
> > -- 
> > The Wellcome Trust Sanger Institute is operated by Genome Research 
> > Limited, a charity registered in England with number 1021457 and a 
> > company registered in England with number 2742969, whose registered 
> > office is 215 Euston Road, London, NW1 2BE. 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From jun.yin at ucd.ie  Thu Sep  9 04:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a
package similar to Bio::LocatableSeq). The idea is to save the index of
sequences to a local database (AnyDBM) instead of keeping it in memory,
which should reduce memory usage. This idea actually comes from
Bio::DB::Fasta, as implemented by Lincoln Stein.
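
A minimal sketch of the on-disk part, assuming a simple layout in which
each sequence ID maps to a "start,end" string. The IDs, coordinates and
file name are invented for illustration, and this is not how
Bio::DB::Fasta lays out its own index. AnyDBM_File, Fcntl and
File::Temp ship with Perl.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use AnyDBM_File;
use File::Temp qw(tempdir);

# Temporary location for the demo index; removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/feature_index";

# Write: the hash lives on disk, keyed by sequence ID.
tie my %index, 'AnyDBM_File', $path, O_RDWR | O_CREAT, 0644
    or die "cannot create DBM file $path: $!";
$index{seqA} = join ',', 1, 1200;
$index{seqB} = join ',', 1201, 3400;
untie %index;

# Read: reopen the file and look up one entry without loading the rest.
tie my %lookup, 'AnyDBM_File', $path, O_RDWR, 0644
    or die "cannot open DBM file $path: $!";
my ($start, $end) = split /,/, $lookup{seqB};
print "seqB spans $start-$end\n";
untie %lookup;
```

DBM values are plain strings, so structured records have to be
serialised (joined, packed, or similar) before they are stored.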

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, September 09, 2010 12:20 AM
To: Frank Schwach
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Seq, search for specific features

Well, no move has been concretely made yet.  It would be nice to abstract
the backend, so one could use possibly any db or memory adaptor.  This is
essentially the direction I would like to take the alignment data as well
(part of the GSoC project for BioPerl this year was to tackle this very
thing).

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planned for the near future?
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
>> eq 'my_gene' } $seq->get_SeqFeatures();
>> 
>> I think any implementation would be if we moved from the in-memory 
>> arrays & hash-based system to a sqlite db on the back-end for how 
>> Sequence and Feature objects are stored.
>> This would be a somewhat slower but wouldn't have performance/memory 
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From s1012635 at student.hsleiden.nl  Thu Sep  9 05:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi, 

I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format).
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl needed to get this package to work. So far we have had courses in Python and Java, not in Perl.

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
use Bio::Tools::Tmhmm; 
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
while( my $tmhmm_feat = $parser->next_result ) { 
#do something 
#eg 
push @tmhmm_feat, $tmhmm_feat; 
} 

How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output?

cheers! 

Stefan Lelieveld 


From fs5 at sanger.ac.uk  Thu Sep  9 06:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it was run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm

Follow the synopsis to feed the tool with your sequences. You can learn
how to read a FASTA file and access each sequence in a loop here: 
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file   => "<$file",-format => 'FASTA' );
while (my $seq = $inseq->next_seq) {
  print $seq->accession_number,"\n";
}

as an example for printing out accession numbers from $seq, which is a Bio::Seq object. 
So what you have to do now is to feed each of those Bio::Seq objects into your TMHMM runner. 

Frank


On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi, 
> 
> I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format).
> I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl needed to get this package to work. So far we have had courses in Python and Java, not in Perl.
> 
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output? 
> 
> cheers! 
> 
> Stefan Lelieveld 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From kai.blin at biotech.uni-tuebingen.de  Thu Sep  9 06:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
	<814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" <s1012635 at student.hsleiden.nl> wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed an input.txt (containing the proteins in fasta format) to this parser and how do I save the output?

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some fasta
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = Bio::Tools::Tmhmm->new(-file => "input.txt");
"""
will parse the TMHMM output for you.
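
To also cover the "how do I save the output" part, here is a minimal sketch. It assumes TMHMM has already been run and its output is sitting in a file; the sub name save_tmhmm_features and the file names are made up for illustration. It only relies on next_result from the documentation quoted above, plus the generic feature accessors seq_id, primary_tag, start and end.

```perl
use strict;
use warnings;
use Bio::Tools::Tmhmm;

# Parse a TMHMM output file and write one tab-separated line per feature
# returned by the parser. Returns the number of features written.
# (save_tmhmm_features is a hypothetical helper, not part of BioPerl.)
sub save_tmhmm_features {
    my ($tmhmm_output, $report) = @_;

    my $parser = Bio::Tools::Tmhmm->new(-file => $tmhmm_output);

    open my $out, '>', $report or die "Cannot write $report: $!";
    my $count = 0;
    while (my $feat = $parser->next_result) {
        print {$out} join("\t", $feat->seq_id, $feat->primary_tag,
                                $feat->start, $feat->end), "\n";
        $count++;
    }
    close $out or die "Cannot close $report: $!";
    return $count;
}

# Hypothetical usage: perl save_tm.pl tmhmm_output.txt tm_features.tsv
if (@ARGV == 2) {
    my $n = save_tmhmm_features(@ARGV);
    print "Wrote $n features to $ARGV[1]\n";
}
```

The resulting file can be opened in a spreadsheet or read back with a simple split on tabs.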

HTH,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From elanorbust2 at yahoo.com  Thu Sep  9 12:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test for standaloneblastplus but getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list is comprised of the first four sequences found in the database. The results say it cannot find the first two, and then the matches for the last two do not exist!

Thanks for any help! 


Program 


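#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::Run::StandAloneBlastPlus;

# Factory for a local BLAST database named 'ITS', with data from smallDB.fas
my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create  => 1
);

# Run blastn on the query file and write the report to ITStest2.bls
my $result = $fac->blastn( -query   => 'sequences.fasta',
                           -outfile => 'ITStest2.bls' );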


smallDB.fas Data

>302585252|HM807352|Waitea circinata  internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>302585252|HM807352|Waitea circinata  internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA

>302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA

>301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA
CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA

>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC

>301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer 1
AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC

>301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG

>301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG


sequences.fasta data

>Test1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>Test2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>Test3
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>Test4
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA


Results

BLASTN 2.2.24+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.


Database: ITS
           5 sequences; 1,102 total letters


Query=  Test1
Length=204


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 202071


Query=  Test2
Length=192


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 189507


Query=  Test3
Length=437
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G59F
Length=203

 Score =  300 bits (162),  Expect = 2e-085
 Identities = 176/182 (96%), Gaps = 4/182 (2%)
 Strand=Plus/Plus

Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
            ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81

Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141

Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201

Query  187  GG  188
            ||
Sbjct  202  GG  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 6e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 4e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 442850


Query=  Test4
Length=521
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G59F
Length=203

 Score =  309 bits (167),  Expect = 4e-088
 Identities = 177/181 (97%), Gaps = 3/181 (1%)
 Strand=Plus/Plus

Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
            ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82

Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142

Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202

Query  184  G  184
            |
Sbjct  203  G  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 7e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 5e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 530378


  Database: ITS
    Posted date:  Aug 27, 2010  9:43 AM
  Number of letters in database: 1,102
  Number of sequences in database:  5


Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5


From jaya1786 at gmail.com  Thu Sep  9 12:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>

Respected sir/madam,

I am Jayanthi Jayakumar, doing my second year MS (By Research) in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

I request you to provide details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar

From Russell.Smithies at agresearch.co.nz  Thu Sep  9 18:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
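
A quick way to check the database file for this kind of problem before formatting it, as a rough sketch only: the sub name check_fasta_lines and the two checks it performs (repeated IDs, and non-header lines that are not plain nucleotide sequence, which is what a wrapped header line would look like) are my own assumptions about what is worth flagging, not something the BLAST+ formatter is known to enforce.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of human-readable problems found in FASTA lines.
# Illustrative checks only:
#   1. the same ID (first whitespace-delimited token after '>') used twice
#   2. a non-header line containing anything other than IUPAC nucleotide
#      codes, '-' or '*' (e.g. the tail of a wrapped header such as "spacer 1")
sub check_fasta_lines {
    my @lines = @_;
    my (%seen, @problems);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        chomp(my $text = $line);
        next if $text =~ /^\s*$/;              # skip blank lines
        if ($text =~ /^>(\S*)/) {
            my $id = $1;
            if ($id eq '') {
                push @problems, "line $n: header with empty ID";
            }
            elsif ($seen{$id}++) {
                push @problems, "line $n: duplicate ID '$id'";
            }
        }
        elsif ($text !~ /^[ACGTUNRYKMSWBDHV\-\*]+$/i) {
            push @problems, "line $n: not a header and not nucleotide sequence";
        }
    }
    return @problems;
}

# Usage: perl check_fasta.pl smallDB.fas
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for check_fasta_lines(<$fh>);
    close $fh;
}
```

If this prints nothing, the file at least has unique IDs and clean sequence lines, and the problem is more likely somewhere else.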


Russell Smithies

Technical Support
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz
Invermay  Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T  +64 3 489 3809
F  +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test for standaloneblastplus but getting data back that
> does not exist in my query or my local database. Below is an outline of my
> script, small database, query list, and erroneous results. As you will
> notice, the query list is comprised of the first four sequences found in
> the database. The results say it cannot find the first two, and then the
> matches for the last two do not exist!
>
> Thanks for any help!
>
>
>
> Program
>
>
> #!/usr/bin/perl
>
> use Bio::Tools::Run::StandAloneBlastPlus;
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_name => 'ITS',
>   -db_data => 'smallDB.fas',
>   -create => 1
> );
>
> $result = $fac->blastn( -query => 'sequences.fasta',
>                         -outfile => 'ITStest2.bls');
>
>
> smallDB.fas Data
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> >302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
> ATATCATTAAAGCGGAGGAA
>
> >301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
> ACGCA
> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>
> >301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
> 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>
> >301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
> 1
> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>
> >301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>
> >301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>
>
>
> sequences.fasta data
>
> >Test1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >Test2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >Test3
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >Test4
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
>
>
>
> Results
>
> BLASTN 2.2.24+
>
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
>
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
>
>
> Query=  Test1
> Length=204
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 202071
>
>
> Query=  Test2
> Length=192
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 189507
>
>
> Query=  Test3
> Length=437
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    4e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC
> 66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC
> 81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
> 126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
> 141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
> 186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
> 201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 442850
>
>
> Query=  Test4
> Length=521
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    5e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA
> 63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA
> 82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
> 123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
> 142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
> 183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
> 202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 530378
>
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
>
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
>
>
>
>
>
>
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From elanorbust2 at yahoo.com  Fri Sep 10 11:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking though!

--- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email, or are some of the FASTA headers in your db incorrect?
E.g.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
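
A quick way to check for this, sketched in plain Perl (no BioPerl needed): the hypothetical helper `suspect_lines` below flags any line that is neither a `>` header nor plain nucleotide sequence, which is what a wrapped header continuation would look like. It is only a heuristic and assumes a nucleotide FASTA file.

```perl
use strict;
use warnings;

# Return the 1-based numbers of lines that are neither FASTA headers
# nor plain nucleotide sequence; these are likely wrapped header text.
# (suspect_lines is an illustrative name, not part of any library.)
sub suspect_lines {
    my @lines = @_;
    my @bad;
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        $line =~ s/\s+$//;                           # drop trailing whitespace
        next if $line eq '';                         # blank separators are fine
        next if $line =~ /^>/;                       # header line
        next if $line =~ /^[ACGTUNRYKMSWBDHV-]+$/i;  # IUPAC nucleotide codes
        push @bad, $i + 1;
    }
    return @bad;
}

# Example taken from the header in question: the "1" has wrapped
# onto its own line and is not valid sequence.
my @example = (
    ">301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer",
    "1",
    "AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGG",
);
my @bad = suspect_lines(@example);
print "suspect line(s): @bad\n";
```

To run it over a real file, read the lines from the file handle and pass them in; any line number it reports is worth looking at by hand before the database is rebuilt.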


Russell Smithies

Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz



From David.Messina at sbc.su.se  Fri Sep 10 12:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID: <CFF0CF54-E5BD-4FD1-990D-E3AC219FBF5B@sbc.su.se>

Hi Sally,

Did you run the same search on the command line, outside of BioPerl?

The issue you're having may be with Blast+ and not BioPerl. For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches.
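
A minimal sketch of such a command-line check, written in Perl so it can sit next to the script. It assumes the BLAST+ `blastn` binary is on the PATH and that the `ITS` database files are in the working directory; the helper name `blastn_command` and the output file name are made up for illustration. For a nucleotide search the default low-complexity filter is DUST, which `-dust no` switches off; compositional adjustment is, as far as I know, a protein-search setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for a direct blastn run.
# -dust takes 'yes' or 'no'; 'no' disables low-complexity filtering.
# (blastn_command is an illustrative helper, not a BioPerl function.)
sub blastn_command {
    my (%opt) = @_;
    return (
        'blastn',
        '-query', $opt{query},
        '-db',    $opt{db},
        '-out',   $opt{out},
        '-dust',  ($opt{dust} ? 'yes' : 'no'),
    );
}

my @cmd = blastn_command(
    query => 'sequences.fasta',       # query file from the original post
    db    => 'ITS',                   # database name from the original post
    out   => 'ITStest_cmdline.bls',   # hypothetical output file name
    dust  => 0,
);

# Show the command so it can be pasted into a shell.
print join(' ', @cmd), "\n";

# Uncomment to run it from Perl instead (requires BLAST+ installed):
# system(@cmd) == 0 or die "blastn failed: $?";
```

If the command-line report lists the same unexpected hits, the problem is in the database or the BLAST+ settings rather than in the BioPerl wrapper.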


Dave


On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking though!
> 
> --- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
> 
> From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
> Subject: RE: [Bioperl-l] standaloneblastplus
> To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
> Date: Thursday, September 9, 2010, 6:54 PM
> 
> Is that a typo in your email or are some of your fasta headers in your db incorrect?
> Eg.
> >301333052|GU725063|Xiphinema adenohystherum internal transcribed
> >301333052|GU725063|spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Shouldn't that be:
> >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Maybe the invalid fasta headers are breaking the db formatter?
> 
> 
> Russell Smithies
> 
> Technical Support
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of sally roberts
>> Sent: Friday, 10 September 2010 4:10 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] standaloneblastplus
>> 
>> I am running a test for standaloneblastplus but getting data back that
>> does not exist in my query or my local database. Below is an outline of my
>> script, small database, query list, and erroneous results. As you will
>> notice, the query list comprises the first four sequences found in
>> the database. The results say it cannot find the first two, and then the
>> matches for the last two do not exist!
>> 
>> Thanks for any help!
>> 
>> 
>> 
>> Program
>> 
>> 
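>> #!/usr/bin/perl
>> 
>> use strict;
>> use warnings;
>> use Bio::Tools::Run::StandAloneBlastPlus;
>> 
>> 
>> # Factory for a local BLAST database named 'ITS', built from smallDB.fas
>> my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>>    -db_name => 'ITS',
>>    -db_data => 'smallDB.fas',
>>    -create => 1
>> );
>> 
>> # Run blastn on the query file and write the report to ITStest2.bls
>> my $result = $fac->blastn( -query => 'sequences.fasta',
>>                            -outfile => 'ITStest2.bls');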
>> 
>> 
>> smallDB.fas Data
>> 
>> >302585252|HM807352|Waitea circinata internal transcribed spacer 1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>> >302585252|HM807352|Waitea circinata internal transcribed spacer 2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>> >302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>> >302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>> >302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
>> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
>> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
>> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
>> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
>> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
>> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
>> ATATCATTAAAGCGGAGGAA
>> 
>>> 301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
>> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
>> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
>> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
>> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
>> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
>> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
>> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
>> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
>> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
>> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
>> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
>> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
>> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
>> ACGCA
>> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
>> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
>> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>> 
>>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
>> 1
>> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
>> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
>> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
>> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
>> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
>> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
>> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
>> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
>> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
>> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
>> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
>> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
>> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>> 
>>> 301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
>> 1
>> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
>> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
>> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
>> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
>> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
>> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
>> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
>> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
>> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
>> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
>> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
>> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
>> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>> 
>>> 301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
>> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
>> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
>> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
>> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
>> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
>> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
>> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
>> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
>> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
>> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
>> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
>> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
>> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>> 
>>> 301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
>> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
>> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
>> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
>> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
>> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
>> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
>> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
>> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
>> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
>> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
>> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
>> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>> 
>> 
>> 
>> sequences.fasta data
>> 
>>> Test1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>>> Test2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>>> Test3
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>>> Test4
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>> 
>> 
>> 
>> Results
>> 
>> BLASTN 2.2.24+
>> 
>> 
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
>> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
>> Comput Biol 2000; 7(1-2):203-14.
>> 
>> 
>> 
>> Database: ITS
>>             5 sequences; 1,102 total letters
>> 
>> 
>> 
>> Query=  Test1
>> Length=204
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 202071
>> 
>> 
>> Query=  Test2
>> Length=192
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 189507
>> 
>> 
>> Query=  Test3
>> Length=437
>> 
>>                                                                       Score     E
>> Sequences producing significant alignments:                          (Bits)  Value
>> 
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  300 bits (162),  Expect = 2e-085
>>   Identities = 176/182 (96%), Gaps = 4/182 (2%)
>>   Strand=Plus/Plus
>> 
>> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>>              ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>> 
>> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>> 
>> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>> 
>> Query  187  GG  188
>>              ||
>> Sbjct  202  GG  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 6e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 4e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 442850
>> 
>> 
>> Query=  Test4
>> Length=521
>> 
>>                                                                       Score     E
>> Sequences producing significant alignments:                          (Bits)  Value
>> 
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  309 bits (167),  Expect = 4e-088
>>   Identities = 177/181 (97%), Gaps = 3/181 (1%)
>>   Strand=Plus/Plus
>> 
>> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>>              ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>> 
>> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>> 
>> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>> 
>> Query  184  G  184
>>              |
>> Sbjct  203  G  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 7e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 5e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 530378
>> 
>> 
>>    Database: ITS
>>      Posted date:  Aug 27, 2010  9:43 AM
>>    Number of letters in database: 1,102
>>    Number of sequences in database:  5
>> 
>> 
>> 
>> Matrix: blastn matrix 1 -2
>> Gap Penalties: Existence: 0, Extension: 2.5
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jun.yin at ucd.ie  Sat Sep 11 12:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
References: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC has already finished for this year. You can check the information here:
http://socghop.appspot.com/gsoc/program/home/google/gsoc2010

However, you can still contribute to the BioPerl project if you like. You
can talk to people on this mailing list, or you can join the IRC channel
(http://www.bioperl.org/wiki/IRC).

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar
Sent: Thursday, September 09, 2010 6:00 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Regarding GSoC 2010

Respected sir/madam,

I am Jayanthi Jayakumar, in the second year of my MS (by research) in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

Please provide the details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar




From david.breimann at gmail.com  Sun Sep 12 09:16:29 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sun, 12 Sep 2010 15:16:29 +0200
Subject: [Bioperl-l] Circular genomes
Message-ID: <AANLkTi=LgWkUVeUd6WdaoDZQBTJMm7aCRe86p2-+R9PE@mail.gmail.com>

Hello,

As a continuation of
http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I
would like to ask:
Has the fix been implemented yet? That is, do GFF3 files created for circular
genomes comply with the GFF3 spec for such genomes?
I find it difficult to keep track using git, so I'm not sure if this
was already handled.

Also, will the start and end coordinates of such genes loaded from a GFF3
file be "normal" (i.e. no coordinate is larger than the size of the
genome) or just as written in the GFF3 (which demands that end > start even
if end > genome length)?
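
For reference, the two conventions differ only by a modulo operation. Below is
a minimal sketch in plain Perl (not BioPerl API). normalize_circular is a
hypothetical helper name, and the sketch assumes 1-based inclusive coordinates
where a feature crossing the origin is written with end = real end + landmark
length, as the GFF3 spec describes for circular landmarks.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
# Takes 1-based inclusive GFF3 coordinates and the landmark (genome) length;
# returns (start, end, wraps) with end folded back into 1..$genome_len.
sub normalize_circular {
    my ($start, $end, $genome_len) = @_;
    my $wraps    = $end > $genome_len ? 1 : 0;     # feature crosses the origin
    my $norm_end = ($end - 1) % $genome_len + 1;   # fold end into 1..length
    return ($start, $norm_end, $wraps);
}

# A gene spanning the origin of a 5000 bp circular genome,
# written GFF3-style as 4900..5100.
my ($s, $e, $wraps) = normalize_circular(4900, 5100, 5000);
print "start=$s end=$e wraps=$wraps\n";   # start=4900 end=100 wraps=1
```

The wraps flag is kept because, after folding, end < start is the only
remaining hint that the feature crosses the origin.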

Thanks,
David

From David.Messina at sbc.su.se  Mon Sep 13 11:10:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 13 Sep 2010 17:10:42 +0200
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se>

Hi everyone,

I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.

The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.

It's already part of bioperl-live, and you can also get it here:

	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl


Dave


From maj at fortinbras.us  Mon Sep 13 12:47:45 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 13 Sep 2010 16:47:45 +0000
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <W964423161995101284396465@webmail24>

Dear Scott-
You rock!
Sincerely, 
Mark



From cjfields at illinois.edu  Mon Sep 13 17:15:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 13 Sep 2010 16:15:45 -0500
Subject: [Bioperl-l] BioPerl net installer
In-Reply-To: <W964423161995101284396465@webmail24>
References: <W964423161995101284396465@webmail24>
Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu>

Ditto!  

chris (briefly resurfacing)

On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote:

> Dear Scott-
> You rock!
> Sincerely, 
> Mark
> 


From timmcilveen at talktalk.net  Mon Sep 13 19:07:00 2010
From: timmcilveen at talktalk.net (tim)
Date: Tue, 14 Sep 2010 00:07:00 +0100
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
Message-ID: <201009140007.00798.timmcilveen@talktalk.net>

Hi,
I have just installed Bioperl on my Linux system using the CPAN install. The 
install summary is as follows:
Test Summary Report
-------------------
t/RemoteDB/GenPept.t                       (Wstat: 256 Tests: 21 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
t/RemoteDB/Query/GenBank.t                 (Wstat: 256 Tests: 18 Failed: 1)
  Failed test:  9
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 21 tests but ran 18.
t/RemoteDB/Taxonomy.t                      (Wstat: 512 Tests: 103 Failed: 2)
  Failed tests:  15, 98
  Non-zero exit status: 2
t/Root/RootIO.t                            (Wstat: 7424 Tests: 30 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 31 tests but ran 30.
Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr  0.91 sys + 156.68 cusr  9.16 csys = 172.94 CPU)
Result: FAIL
Failed 4/329 test programs. 4/18407 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO

Is Bioperl properly installed? During the install process I was getting quite
a lot of this error (hundreds of instances):
'replacement list longer than search list'. This happened with t/tools, t/seq,
t/search and many others.

Any advice would be great.
Tim


From David.Messina at sbc.su.se  Tue Sep 14 03:56:33 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 14 Sep 2010 09:56:33 +0200
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net>
References: <201009140007.00798.timmcilveen@talktalk.net>
Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se>

Hi Tim,

Thanks for your report.


> Is Bioperl properly installed?

No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line:

>  make test had returned bad status, won't install without force


As for the error(s)

> 'replacement list longer than search list'

I believe this was fixed a couple of months ago. For details, see:

	http://bugzilla.open-bio.org/show_bug.cgi?id=3116


So I would recommend that you grab the latest copy of bioperl-live from github, where the bug should be fixed:

	http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots


Give that a shot and let us know how it goes.


Dave


From jskittrell at unmc.edu  Thu Sep 16 12:15:49 2010
From: jskittrell at unmc.edu (Jeff Kittrell)
Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC)
Subject: [Bioperl-l] mpiblast
Message-ID: <loom.20100916T180946-13@post.gmane.org>

Does Bioperl work with mpiblast? Is there a standalone-like module that
allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast
output file correctly?

Thanks for any help,

Jeff 


From David.Messina at sbc.su.se  Thu Sep 16 14:25:57 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 16 Sep 2010 20:25:57 +0200
Subject: [Bioperl-l] mpiblast
In-Reply-To: <loom.20100916T180946-13@post.gmane.org>
References: <loom.20100916T180946-13@post.gmane.org>
Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>

> Is there a standalone-like module that allows you to easily call mpiblast?

No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.

	http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase


> I'm assuming seqio will parse an mpiblast output file correctly?

Yes, although I see that a new version of mpiblast was recently released. 

Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?


Dave


From shalabh.sharma7 at gmail.com  Thu Sep 16 17:38:14 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 16 Sep 2010 17:38:14 -0400
Subject: [Bioperl-l] IUPAC code similarity
Message-ID: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>

Hi All,
      I have a few nucleotide sequences that are composed of IUPAC codes, like:
>test
VGSRVBSSSSSNSC

Similarly, I have a database made up of these kinds of sequences. I want to
find sequences that are 100% similar to the query sequence.

Is there any BioPerl module to deal with this? I tried normal BLAST but it
didn't work.
Do I have to convert these sequences to 4-base codes, or is there any other
way?

Thanks
Shalabh

From amackey at virginia.edu  Fri Sep 17 10:28:15 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 10:28:15 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
Message-ID: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>

Convert the IUPAC code to a regular expression, and use regular expressions
(in Perl or grep or similar) to find 100% identical matches.

-Aaron


From shalabh.sharma7 at gmail.com  Fri Sep 17 11:07:38 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 11:07:38 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>

Thanks Aaron for your reply.
Actually, I tried that first, but there is another problem: I have to divide
each query sequence into windows of size 5 with a 1-base shift, and it's not
possible to divide a regular expression that way.
So what I am trying is to convert the IUPAC codes to 4-base-code sequences
and then do the normal search.
Now the problem is that I am not able to convert those IUPAC sequences to
normal ones; I am still trying to write a script, but it's taking time.
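
One possible way around the windowing problem is to cut the IUPAC query into
windows first and only then turn each window into a regular expression, so the
regex itself never has to be split. Below is a minimal sketch in core Perl.
iupac_to_regex and window_regexes are hypothetical helper names (not BioPerl
API), and the table is the standard IUPAC nucleotide code set.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: expand an IUPAC string into a regex fragment.
sub iupac_to_regex {
    my ($seq) = @_;
    my $regex = '';
    for my $code (split //, uc $seq) {
        my $bases = $IUPAC{$code} or die "Unknown IUPAC code: $code\n";
        # Single bases stay literal; ambiguous codes become a character class.
        $regex .= length($bases) > 1 ? "[$bases]" : $bases;
    }
    return $regex;
}

# Hypothetical helper: one regex per window of $size codes, moving by $step.
sub window_regexes {
    my ($seq, $size, $step) = @_;
    my @out;
    for (my $i = 0; $i + $size <= length $seq; $i += $step) {
        push @out, iupac_to_regex(substr $seq, $i, $size);
    }
    return @out;
}

my @windows = window_regexes('VGSRVBSSSSSNSC', 5, 1);
print "$_\n" for @windows;   # first line: [ACG]G[CG][AG][ACG]
```

Each pattern can then be matched against a database sequence with =~, for
example $db_seq =~ /$windows[0]/.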

Thanks
Shalabh



From roy.chaudhuri at gmail.com  Fri Sep 17 11:04:28 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 17 Sep 2010 16:04:28 +0100
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <4C93837C.4080008@gmail.com>

Hi Shalabh,

The expand method in Bio::Tools::SeqPattern may be useful to convert 
IUPAC codes to regular expressions:

$ perl -e 'use Bio::Tools::SeqPattern; print
Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand'
[ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C

Although that won't work if there are also ambiguity codes in your 
database. For a non-BioPerl solution you could try fuzznuc from EMBOSS.

Cheers.
Roy.



From david.breimann at gmail.com  Fri Sep 17 14:13:22 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 17 Sep 2010 20:13:22 +0200
Subject: [Bioperl-l] Installing using git after an older installation
Message-ID: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>

Hello,

I'm sharing a server with some other lab members.
I would like to install the latest version of bioperl for my own use,
without affecting my colleagues.
I used git to clone a copy of bioperl-live and exported
  PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB".
Now
  perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
returns
  1.0069

My question is: is that all?
Now I'm using the latest version?
Should I include anything special in my scripts?

Also, what about all the bp_***.pl scripts?
Are the now using the latest version, too?
I guess not, since I didn't build anything. So what should I do about them?

Thanks,
Dave

From amackey at virginia.edu  Fri Sep 17 15:24:44 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:24:44 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <4C93837C.4080008@gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<4C93837C.4080008@gmail.com>
Message-ID: <AANLkTi=Nx5fQ=YvxzB4mSsJh+38B-2+87Lvf=WHU_rCH@mail.gmail.com>

If there are ambi. codes in the database, then the expanded character class
has to also include the original ambiguity code; non-ambiguous nucleotides
must also be expanded to include all ambiguity codes that represent the
nucleotide.
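
A minimal sketch of that idea in plain Perl follows. It rests on an
assumption that goes slightly beyond the text above: a query code and a
database code are treated as compatible whenever their base sets share at
least one base. That covers both cases described (an ambiguity code
matches itself, and a plain nucleotide matches every ambiguity code that
represents it), but it is one possible reading, not a definitive rule.
The names compatible_class and iupac_to_ambiguous_regex are hypothetical.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Character class of every database symbol compatible with one query
# code: any code whose base set shares a base with the query code.
sub compatible_class {
    my ($query_code) = @_;
    my $query_bases = $IUPAC{ uc $query_code };
    die "Unknown IUPAC code '$query_code'\n" unless defined $query_bases;
    my @ok = grep { $IUPAC{$_} =~ /[$query_bases]/ } sort keys %IUPAC;
    return '[' . join('', @ok) . ']';
}

# Build a regex string for a whole query (assumption: overlap semantics).
sub iupac_to_ambiguous_regex {
    my ($seq) = @_;
    return join '', map { compatible_class($_) } split //, $seq;
}

print compatible_class('G'), "\n";
print iupac_to_ambiguous_regex('VGSRVBSSSSSNSC'), "\n";
```

If the stricter behaviour is wanted (an ambiguity code in the query
matching only itself and its own bases), the grep condition is the place
to change.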

-Aaron


From amackey at virginia.edu  Fri Sep 17 15:25:54 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:25:54 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
Message-ID: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>

Do your windowing/shifting on the unexpanded query sequences; then transform
the 5-bp queries into regular expressions.
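
A minimal sketch of that order of operations, in plain Perl: cut the raw
IUPAC query into overlapping windows first, then expand each window. The
helper names windows and iupac_to_regex and the %IUPAC table are made up
for this illustration; the window size of 5 and shift of 1 come from the
question quoted below.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG',
    N => 'ACGT',
);

# Expand an IUPAC string into a regex string (hypothetical helper).
sub iupac_to_regex {
    my ($seq) = @_;
    return join '', map {
        my $bases = $IUPAC{$_};
        die "Unknown IUPAC code '$_'\n" unless defined $bases;
        length($bases) == 1 ? $bases : "[$bases]";
    } split //, uc $seq;
}

# Slide a window of $size over the unexpanded sequence, moving $step
# positions at a time; only full-length windows are returned.
sub windows {
    my ($seq, $size, $step) = @_;
    my @w;
    for (my $i = 0; $i + $size <= length $seq; $i += $step) {
        push @w, substr($seq, $i, $size);
    }
    return @w;
}

my @w = windows('VGSRVBSSSSSNSC', 5, 1);
for my $window (@w) {
    print "$window\t", iupac_to_regex($window), "\n";
}
```

Because the windows are cut from the raw string, every window is exactly
five IUPAC symbols long no matter how long its regular expression is.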

-Aaron

On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Thanks Aaron for your reply.
> Actually I tried that first, but there is another problem: I have to divide
> each query sequence into windows of size 5 with a 1-base shift, and it's not
> possible to divide a regular expression in that way.
> So what I am trying is to convert those IUPAC codes to 4-base code sequences
> and then do the normal search.
> Now the problem is that I am not able to convert those IUPAC sequences to
> normal ones. I am still trying to write a script, but it's taking time.
>
> Thanks
> Shalabh

From Kevin.M.Brown at asu.edu  Fri Sep 17 16:09:34 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 17 Sep 2010 13:09:34 -0700
Subject: [Bioperl-l] Installing using git after an older installation
In-Reply-To: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
References: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA


From shalabh.sharma7 at gmail.com  Fri Sep 17 16:45:50 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 16:45:50 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
	<AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
Message-ID: <AANLkTi=b7MYxC=2DuY3N5Udg9CsfoVDV=Q9cFUOUML0x@mail.gmail.com>

Thanks Aaron,
Changing the query sequence worked well, but I am still struggling with the
database.

-Shalabh



From heikki.lehvaslaiho at gmail.com  Sat Sep 18 03:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Sat, 18 Sep 2010 10:41:22 +0300
Subject: [Bioperl-l] mpiblast
In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
References: <loom.20100916T180946-13@post.gmane.org>
	<0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
Message-ID: <AANLkTi=SOzf5PH605OkPYf2r6w=xaTdXqU+zY7LBwkNw@mail.gmail.com>

Been running 1.6 and its betas on Blue Gene/P for months. The output
is identical to standard BLAST output. No issues in parsing it with
BioPerl.

     -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia


On 16 September 2010 21:25, Dave Messina <David.Messina at sbc.su.se> wrote:
>> Is there a standalone-like module that allows you to easily call mpiblast?
>
> No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.
>
>        http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase
>
>
>> I'm assuming seqio will parse a mpiblast output file correctly?
>
> Yes, although I see that a new version of mpiblast was recently released.
>
> Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?
>
>
> Dave


From david.breimann at gmail.com  Sat Sep 18 05:05:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:05:58 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
Message-ID: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>

Hello,

I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
in the fields and sometimes it doesn't, even though the GenBank file has a
locus tag.
Also, is the ID always equivalent to the locus tag?

Thanks,
Dave

From scott at scottcain.net  Sat Sep 18 05:17:24 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 10:17:24 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
Message-ID: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>

Hi Dave,

bp_genbank2gff3.pl suffers from the fact that it has to deal with
GenBank files :-)  It was designed initially to work on whole genome
refseqs, and contains several ad hoc rules for trying to make it "do
the right thing."  In practice, it is not unusual for a post-processing
step (either by hand or with a quick Perl script) to be
required to really get it right.  I don't recall the specifics (if I
ever knew :-) for when and how the locus tag is used, but I do know
that there is a list of things that it will try to use for the ID, and
while the locus is on the list, I don't know where it comes in the
list, so it's possible that other items might supersede it.
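
As an illustration of the kind of quick post-processing check meant
above, here is a minimal sketch in plain Perl that lists the features in
a GFF3 file whose last column has no locus_tag attribute. It is not part
of bp_genbank2gff3.pl; the function names and the sample lines are
hypothetical, and the attribute parsing is deliberately simple (no
URL-unescaping, no multi-valued attributes).

```perl
use strict;
use warnings;

# Parse GFF3 column 9 ("key=value;key=value") into a hash.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key && defined $value;
        $attr{$key} = $value;
    }
    return %attr;
}

# Return the ID (or "type:start-end" when there is no ID) of every
# feature line that lacks a locus_tag attribute.
sub features_without_locus_tag {
    my (@lines) = @_;
    my @missing;
    for my $line (@lines) {
        last if $line =~ /^##FASTA/;              # sequence section begins
        next if $line =~ /^#/ || $line !~ /\S/;   # directives and blank lines
        chomp(my $copy = $line);
        my @col = split /\t/, $copy;
        next unless @col == 9;
        my %attr = parse_attributes($col[8]);
        push @missing, $attr{ID} // "$col[2]:$col[3]-$col[4]"
            unless exists $attr{locus_tag};
    }
    return @missing;
}

# Hypothetical sample input, for demonstration only.
my @sample = (
    "##gff-version 3\n",
    "seq1\tGenBank\tgene\t1\t900\t.\t+\t.\tID=gene1;locus_tag=TAG_0001\n",
    "seq1\tGenBank\tgene\t1000\t1900\t.\t-\t.\tID=gene2;Name=TAG_0002\n",
);
print "$_\n" for features_without_locus_tag(@sample);
```

To run it on a real file, read the lines from a filehandle instead of the
sample array.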

Scott



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 05:20:33 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:20:33 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
Message-ID: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>

Since locus_tag is an essential tag in GenBank, I suggest that locus_tag always
be added to the last column of the GFF if it exists in the GenBank file, whether
or not it is used as the ID in the GFF.


From scott at scottcain.net  Sat Sep 18 06:08:26 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 11:08:26 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
Message-ID: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>

Hi Dave,

That seems perfectly reasonable.  If you could point out a GenBank
entry for which that does not happen, I could try to figure out why
not.

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 06:20:50 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:20:50 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
Message-ID: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>

Hi Scott,

Here is a very short genbank:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

Note that all genes in the GenBank file have locus tags. In the resulting GFF3,
however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
idea why it deserves special treatment... :)

P.S. Making this change (i.e., copying locus_tag to the last column of the GFF3
whenever available) would really make my life easier.

Thank you,
Dave


From david.breimann at gmail.com  Sat Sep 18 06:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

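use strict;
use warnings;
use Bio::Tools::GFF;
use Data::Dumper;

# Open the GFF3 file with the BioPerl GFF parser.
my $gffio = Bio::Tools::GFF->new(
    -file        => "/path/to/file.gff",
    -gff_version => 3,
);

# Dump whatever sequences the parser has collected so far.
print Dumper $gffio->get_seqs();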

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).


Thanks,

Dave

From scott at scottcain.net  Sat Sep 18 07:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
Message-ID: <AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this.  It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 07:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
References: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
Message-ID: <AANLkTimc9Zf2a2N64wUUancrHWJ05wRKgfMuUQ0X9UTe@mail.gmail.com>

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

  use Bio::DB::SeqFeature::Store;

  # In-memory store; the data is loaded from the given path.
  my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory',
                                            -dir     => 'path/to/file' );

  # Warning: this returns a string, and not a PrimarySeq object
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);
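
If pulling in a feature store is more than is needed, a plain-Perl
alternative (not a BioPerl API, just a sketch) is to read the ##FASTA
section of the GFF3 file directly. The function name fasta_from_gff3 and
the sample data are hypothetical, and the sketch assumes the sequence
section is introduced by a ##FASTA directive and that the ID is the
first word after ">".

```perl
use strict;
use warnings;

# Return the sequence whose FASTA header ID equals $wanted_id, read from
# the ##FASTA section of a GFF3 filehandle; undef if it is not found.
sub fasta_from_gff3 {
    my ($fh, $wanted_id) = @_;
    my ($in_fasta, $seq) = (0, undef);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        if (!$in_fasta) {
            # Skip the feature lines until the sequence section starts.
            $in_fasta = 1 if $line =~ /^##FASTA/;
            next;
        }
        if ($line =~ /^>(\S+)/) {
            last if defined $seq;            # the wanted record has ended
            $seq = '' if $1 eq $wanted_id;   # start collecting this record
            next;
        }
        $seq .= $line if defined $seq;
    }
    return $seq;
}

# Hypothetical in-memory GFF3 used only to demonstrate the call.
my $gff3 = join '',
    "##gff-version 3\n",
    "chr1\t.\tgene\t1\t4\t.\t+\t.\tID=g1\n",
    "##FASTA\n",
    ">chr1 first sequence\n", "ACGT\n", "AC\n",
    ">chr2\n", "TTTT\n";

open my $fh, '<', \$gff3 or die "Cannot open in-memory file: $!";
my $found = fasta_from_gff3($fh, 'chr1');
close $fh;
print defined $found ? "$found\n" : "not found\n";
```

For a file on disk, open the path instead of the string reference and
pass the resulting filehandle.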

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From scott at scottcain.net  Sat Sep 18 09:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
Message-ID: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository, it will also use the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.
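
One way to see which copy of a module perl actually picks up is to load
it and look at %INC. The following is a minimal sketch; module_path is a
hypothetical helper name, and File::Spec is used as the default only
because it ships with perl. Pass Bio::Perl or any other BioPerl module
name on the command line to check your own setup.

```perl
use strict;
use warnings;

# Load a module by name and report the file perl loaded it from,
# or undef if the module cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line (File::Spec by default).
for my $module (@ARGV ? @ARGV : ('File::Spec')) {
    my $path = module_path($module);
    printf "%-20s %s\n", $module, defined $path ? $path : '(not found)';
}
```

If the path printed for a BioPerl module is not under the git checkout,
then PERL5LIB is not taking effect for that invocation.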

Also, are you using any command line flags when executing it?  I didn't.

Scott


On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running linux?  If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Your gff seems fine. I get a very similar one, but with `Name=` instead
>> > of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for.  Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.
>> >> >
>> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Hi Dave,
>> >> >>
>> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> all features that have a locus_tag, and uses the locus_tag for the ID
>> >> >> when it can (it can't blindly use the locus tag for the ID since both
>> >> >> the gene and the CDS have the same tag).
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > Here is a very short genbank:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >
>> >> >> > Note all genes in the genbank have locus tags. In the resulting GFF3,
>> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have
>> >> >> > no idea why it deserves a special treatment... :)
>> >> >> >
>> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> > column whenever available) will really make my life easier.
>> >> >> >
>> >> >> > Thank you,
>> >> >> > Dave
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> That seems perfectly reasonable.  If you could point out a GenBank
>> >> >> >> entry for which that does not happen, I could try to figure out why
>> >> >> >> not.
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Since locus_tag is an essential tag in GenBank, I suggest that
>> >> >> >> > locus_tag always be added to the GFF last column if it exists in
>> >> >> >> > the GenBank file, whether it is used as ID in the GFF or not.
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> >> >> >> >> GenBank files :-)  It was designed initially to work on whole
>> >> >> >> >> genome refseqs, and contains several ad hoc rules for trying to
>> >> >> >> >> make it "do the right thing."  In practice, it is not unusual for
>> >> >> >> >> a post-processing step (either by hand or a quick Perl script) to
>> >> >> >> >> be required to really get it right.  I don't recall the specifics
>> >> >> >> >> (if I ever knew :-) for when and how the locus tag is used, but I
>> >> >> >> >> do know that there is a list of things that it will try to use for
>> >> >> >> >> the ID, and while the locus is on the list, I don't know where it
>> >> >> >> >> comes in the list, so it's possible that other items might
>> >> >> >> >> supersede it.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>> >> >> >> >> > the GenBank file has a locus tag.
>> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Dave
>> >> >> >> >> > _______________________________________________
>> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> ------------------------------------------------------------------------
>> >> >> >> >> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> >> >> >> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> >> >> >> >> Ontario Institute for Cancer Research
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 09:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
Message-ID: <AANLkTimMZ4vZUw2Gsd-NmaWL8xiD4hryH8Z-7UezT=LO@mail.gmail.com>

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build, as a staging area before installation.  In the
repository itself, the script resides in scripts/Bio-DB-GFF/.
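
Since a stale copy of the script in blib/ or in an installed location is one possible explanation, a rough way to compare copies is to checksum every file with a given name. The sketch below is only an illustration: the function name script_copies is made up here, and the directories in the usage line are guesses about where copies might live on a given machine.

```shell
# Hypothetical helper (not part of BioPerl): list every file called NAME
# under the given directories together with its checksum and size.
# Usage: script_copies NAME DIR [DIR...]
script_copies() {
    name=$1
    shift
    # cksum is POSIX; byte-identical files get the same checksum and size,
    # and sorting places identical copies on adjacent lines.
    find "$@" -type f -name "$name" -exec cksum {} + 2>/dev/null | sort
}
```

For example, `script_copies bp_genbank2gff3.pl ~/src/bioperl-live /usr/local/bin /usr/bin` would show whether the copy staged in blib/script matches the installed one. The repository version has a different file name (genbank2gff3.PLS), so it would have to be checked separately.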

Scott


On Sat, Sep 18, 2010 at 2:40 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 09:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
Message-ID: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>

So let's do an intermediate summary of my situation:
I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), whereas Scott gets the expected
results with the latest version of bioperl.
I cloned a fresh copy of bioperl-live into my ~/src:
$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:
export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
and ran
$ source ~/.profile
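
To double-check that the checkout really comes first, one could look for the first directory in a PERL5LIB-style list that provides a given module file. This is only a sketch: first_provider is a made-up name, and it inspects just the list it is handed, not Perl's full @INC (which also contains the built-in library paths).

```shell
# Hypothetical helper: print the first directory in a colon-separated
# search path (PERL5LIB-style) that contains the given module file.
# Usage: first_provider Bio/SeqIO.pm "$PERL5LIB"
# The body runs in a subshell, so the IFS and globbing changes stay local.
first_provider() (
    relpath=$1
    IFS=:
    set -f                      # no filename globbing while splitting
    for dir in $2; do           # unquoted on purpose: split on ':'
        if [ -n "$dir" ] && [ -f "$dir/$relpath" ]; then
            printf '%s\n' "$dir"
            exit 0
        fi
    done
    exit 1                      # no directory in the list has the file
)
```

With the setting above, `first_provider Bio/SeqIO.pm "$PERL5LIB"` should print the bioperl-live directory. For what Perl itself ends up loading, `perl -MBio::SeqIO -e 'print $INC{"Bio/SeqIO.pm"}, "\n"'` prints the path of the file that was actually used.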

I then downloaded a small genome from NCBI
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
and tested the script:
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1
ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia
coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This
record has not yet been subject to final NCBI review. The reference sequence
was derived from CP000798. Source DNA and bacteria available from Jacques
Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL
REFSEQ: This record has not yet been subject to final NCBI review. The
reference sequence was derived from CP000798. Source DNA and bacteria
available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1
ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1
ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1
ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified
by glimmer%3B putative;codon_start=1;product=hypothetical
protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:
##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1
ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A
plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has
not yet been subject to final NCBI review. The reference sequence was
derived from CP000798. Source DNA and bacteria available from Jacques Ravel
(jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ:
This record has not yet been subject to final NCBI review. The reference
sequence was derived from CP000798. Source DNA and bacteria available from
Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1
ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1
ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1
ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified
by glimmer%3B
putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical
protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38


Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
as desired.
I have no idea what is going on here...
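
For comparing the two outputs as a whole rather than by eye, the attribute keys in column 9 can be counted per file. The following is a sketch under assumptions: gff3_attr_keys is a made-up name, and it expects the real tab-separated GFF3 files, not the wrapped listings pasted above.

```shell
# Hypothetical helper: count how often each attribute key (ID, Name,
# locus_tag, ...) occurs in column 9 of a tab-separated GFF3 file.
# Usage: gff3_attr_keys FILE
gff3_attr_keys() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }     # skip directives and incomplete lines
        {
            n = split($9, pairs, ";")
            for (i = 1; i <= n; i++) {
                eq = index(pairs[i], "=")
                if (eq > 1) count[substr(pairs[i], 1, eq - 1)]++
            }
        }
        END { for (k in count) print k, count[k] }
    ' "$1" | sort
}
```

Running it on both files and diffing the two key lists (for instance into mine.keys and scott.keys, names chosen here only for illustration) would show at a glance whether Name and locus_tag are the only keys that differ.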

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository, it will also use
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it?  I didn't.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Yes, I'm using Ubuntu 10.04.
> >
> > That is really weired. I tried running the script from the perl-live dir
> > (which I just pulled using git), and I get the same results as before
> > (`Name` instead of `locus_tag`):
> >
> >  $ wget
> >
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> >  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y
> > NC_009789.genbank
> >
> > Attached is the resulting GFF3.
> > I also attach a copy of bp_genbank2gff3.pl as found under
> > /home/dave/src/bioperl-live/blib/script.
> >
> > This is a real mystery for me!
> >
> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
> >>
> >> Typically I do build and install, but you can run it directly from the
> >> git checkout directory.
> >>
> >> For locating other versions of the script, are you running linux?  If
> >> so, are you familiar with the "locate" command:
> >>
> >>  locate bp_genbank2gff3.pl
> >>
> >> If you've never used it before, you may need to update the database
> >> the locate command uses as root:
> >>
> >>  sudo updatedb
> >>
> >> Scott
> >>
> >>
> >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
> >> <david.breimann at gmail.com> wrote:
> >> > Your gff seems fine. I get a vey similiar one, but with `Name=`
> instaed
> >> > of
> >> > `locus_tag=`.
> >> >
> >> > I don't really know how to check for multiple bioperl installations.
> >> > I'm using my personal server, so I don't mind removing and installing
> >> > everything from scratch -- but I do'nt know ho to do that.
> >> >
> >> > Also, what I don't get with the git is how the scripts are supposed to
> >> > be
> >> > updated (unless you build and install).
> >> >
> >> > Thanks you!
> >> >
> >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net>
> wrote:
> >> >>
> >> >> Well, if you aren't getting the same results as me then I'd say you
> >> >> aren't using the same version of the script :-)
> >> >>
> >> >> Unfortunately, the scripts are no longer automatically marked with
> the
> >> >> "internal" version information when committed, so there really isn't
> >> >> anything in the script I can tell you to look for.  Check for more
> >> >> than one bioperl instance on your  computer.
> >> >>
> >> >> I've attached the GFF3 file I got so you can look at it and tell me
> if
> >> >> it is what you expect.
> >> >>
> >> >> Scott
> >> >>
> >> >>
> >> >>
> >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
> >> >> <david.breimann at gmail.com> wrote:
> >> >> > Hi Scott,
> >> >> >
> >> >> > I just pulled the lated bioperl-live using git.
> >> >> > I'm not sure how the scripts are updated, so I Build and installed
> >> >> > anyway
> >> >> > (perhaps exporting the path is supposed to be enough?)
> >> >> > Anyway, I still get the same results. No locus_tag.
> >> >> > How can I tell if I'm using the latest version of the script?
> >> >> >
> >> >> > Thanks again.
> >> >> >
> >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi Dave,
> >> >> >>
> >> >> >> A fresh "pull" of the bioperl git repository shows that
> >> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
> >> >> >> all
> >> >> >> features that have a locus_tag, and uses the locus_tag for the ID
> >> >> >> when
> >> >> >> it can (it can't blindly use the locus tag for the ID since both
> the
> >> >> >> gene and the CDS have the same tag).
> >> >> >>
> >> >> >> Scott
> >> >> >>
> >> >> >>
> >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
> >> >> >> <david.breimann at gmail.com> wrote:
> >> >> >> > Hi Scott,
> >> >> >> >
> >> >> >> > Here is a very short genbank:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> >> >> >> >
> >> >> >> > Note all genes in the genbank have locus tags. In the resulting
> >> >> >> > GFF3,
> >> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag.
> I
> >> >> >> > have
> >> >> >> > no
> >> >> >> > idea why it deserves a special treatment... :)
> >> >> >> >
> >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3
> last
> >> >> >> > column
> >> >> >> > whenever available) will really make my life easier.
> >> >> >> >
> >> >> >> > Thank you,
> >> >> >> > Dave
> >> >> >> >
> >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <
> scott at scottcain.net>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> Hi Dave,
> >> >> >> >>
> >> >> >> >> That seems perfectly reasonable.  If you could point out a
> >> >> >> >> GenBank
> >> >> >> >> entry for which that does not happen, I could try to figure out
> >> >> >> >> why
> >> >> >> >> not.
> >> >> >> >>
> >> >> >> >> Scott
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
> >> >> >> >> <david.breimann at gmail.com> wrote:
> >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest
> >> >> >> >> > locus_tag
> >> >> >> >> > will
> >> >> >> >> > be
> >> >> >> >> > always added to the GFF last column if it exists in the
> >> >> >> >> > genbank,
> >> >> >> >> > whether
> >> >> >> >> > it
> >> >> >> >> > is used as ID in the GFF or not.
> >> >> >> >> >
> >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
> >> >> >> >> > <scott at scottcain.net>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >> Hi Dave,
> >> >> >> >> >>
> >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to
> deal
> >> >> >> >> >> with
> >> >> >> >> >> GenBank files :-)  It was designed initially to work on
> whole
> >> >> >> >> >> genome
> >> >> >> >> >> refseqs, and contains several ad hoc rules for trying to
> make
> >> >> >> >> >> it
> >> >> >> >> >> "do
> >> >> >> >> >> the right thing."  In practice, it is not unusual for a post
> >> >> >> >> >> processing step (either by hand or a quicky perl script) to
> be
> >> >> >> >> >> required to really get it right.  I don't recall the
> specifics
> >> >> >> >> >> (if I
> >> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I
> do
> >> >> >> >> >> know
> >> >> >> >> >> that there is a list of things that it will try to use for
> the
> >> >> >> >> >> ID,
> >> >> >> >> >> and
> >> >> >> >> >> while the locus is on the list, I don't know where it comes
> in
> >> >> >> >> >> the
> >> >> >> >> >> list, so it's possible that other items might supersede it.
> >> >> >> >> >>
> >> >> >> >> >> Scott
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> >> >> >> >> >> <david.breimann at gmail.com> wrote:
> >> >> >> >> >> > Hello,
> >> >> >> >> >> >
> >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it
> adds
> >> >> >> >> >> > a
> >> >> >> >> >> > `locus_tag`
> >> >> >> >> >> > in the fields and sometime it doesn't, even though the
> >> >> >> >> >> > genabank
> >> >> >> >> >> > has a
> >> >> >> >> >> > locus
> >> >> >> >> >> > tag.
> >> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
> >> >> >> >> >> >
> >> >> >> >> >> > Thanks,
> >> >> >> >> >> > Dave
> >> >> >> >> >> > _______________________________________________
> >> >> >> >> >> > Bioperl-l mailing list
> >> >> >> >> >> > Bioperl-l at lists.open-bio.org
> >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> --
> >> >> >> >> >> ------------------------------------------------------------------------
> >> >> >> >> >> Scott Cain, Ph. D.                   scott at scottcain dot net
> >> >> >> >> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> >> >> >> >> >> Ontario Institute for Cancer Research

From scott at scottcain.net  Sat Sep 18 10:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
	<AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
Message-ID: <AANLkTi=yGr+VHDinKtSiy7XE3f7_F73hGRA4bNFzRGxz@mail.gmail.com>

The only thing I can add is that I did a 'git diff genbank2gff3.PLS'
and found no differences.  It occurred to me that perhaps I'd done
some fixing and not committed it, but it looks to me like that's not
the case (assuming I've managed to use git correctly, which is not a
great assumption, but I don't have another one to work with :-)
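
Since the question in this thread is which copy of the script and modules
actually gets picked up, here is a minimal sketch of one way to check what
Perl loads for a given module name. It uses only core Perl; the helper name
module_path is made up for illustration, and Bio::Root::Version is assumed
to be a module that the installed BioPerl provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file Perl actually loads for a module name,
# or undef if the module cannot be found on @INC.
# (module_path is a hypothetical helper, not part of BioPerl.)
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Root::Version -> Bio/Root/Version.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line, or a BioPerl one by default.
for my $module (@ARGV ? @ARGV : ('Bio::Root::Version')) {
    my $path = module_path($module);
    print "$module => ", (defined $path ? $path : 'not found on @INC'), "\n";
}
```

Running it once with PERL5LIB set and once without should show whether the
git checkout or an installed copy wins.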

Scott


On Sat, Sep 18, 2010 at 2:57 PM, David Breimann
<david.breimann at gmail.com> wrote:
> So let's do an intermediate summary of my situation:
> I'm using Ubuntu 10.04 and Perl 5.10.1.
> I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
> "locus_tag=" in the last GFF3 column), while Scott gets the expected results
> while using the latest version of bioperl.
> I cloned a fresh version of bioperl live into my ~/src:
> $ cd ~/src
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> I then added the following line to the end of ~/.profile:
> export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
> and ran
> $ source ~/.profile
>
> I then downloaded a small genome from NCBI
> $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> and tested the script:
> $ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk
>
> Following are the top 10 lines of the resulting GFF3:
>
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia
> coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This
> record has not yet been subject to final NCBI review. The reference sequence
> was derived from CP000798. Source DNA and bacteria available from Jacques
> Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL
> REFSEQ: This record has not yet been subject to final NCBI review. The
> reference sequence was derived from CP000798. Source DNA and bacteria
> available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified
> by glimmer%3B putative;codon_start=1;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
> while these are from Scott's file:
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A
> plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has
> not yet been subject to final NCBI review. The reference sequence was
> derived from CP000798. Source DNA and bacteria available from Jacques Ravel
> (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ:
> This record has not yet been subject to final NCBI review. The reference
> sequence was derived from CP000798. Source DNA and bacteria available from
> Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified
> by glimmer%3B
> putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
>
> Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
> as desired.
> I have no idea what is going on here...
>
> Best,
> Dave
>
> On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> Let's keep the discussion on the mailing list so we can make sure that
>> when this problem is solved, its resolution will be archived.
>>
>> I don't really understand what is going on either, though it would
>> probably be a good idea to set your PERL5LIB env variable so that when
>> you execute this script from the git repository, it will also use the
>> BioPerl modules in the git repository instead of the ones that are
>> installed in your "normal" path.
>>
>> Also, are you using any command line flags when executing it?  I didn't.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Yes, I'm using Ubuntu 10.04.
>> >
>> > That is really weird. I tried running the script from the bioperl-live dir
>> > (which I just pulled using git), and I get the same results as before
>> > (`Name` instead of `locus_tag`):
>> >
>> >  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>> >
>> > Attached is the resulting GFF3.
>> > I also attach a copy of bp_genbank2gff3.pl as found under
>> > /home/dave/src/bioperl-live/blib/script.
>> >
>> > This is a real mystery for me!
>> >
>> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Typically I do build and install, but you can run it directly from the
>> >> git checkout directory.
>> >>
>> >> For locating other versions of the script, are you running linux?  If
>> >> so, are you familiar with the "locate" command:
>> >>
>> >>  locate bp_genbank2gff3.pl
>> >>
>> >> If you've never used it before, you may need to update the database
>> >> the locate command uses as root:
>> >>
>> >>  sudo updatedb
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Your gff seems fine. I get a very similar one, but with `Name=`
>> >> > instead of `locus_tag=`.
>> >> >
>> >> > I don't really know how to check for multiple bioperl installations.
>> >> > I'm using my personal server, so I don't mind removing and installing
>> >> > everything from scratch -- but I don't know how to do that.
>> >> >
>> >> > Also, what I don't get with the git is how the scripts are supposed
>> >> > to be updated (unless you build and install).
>> >> >
>> >> > Thank you!
>> >> >
>> >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Well, if you aren't getting the same results as me then I'd say you
>> >> >> aren't using the same version of the script :-)
>> >> >>
>> >> >> Unfortunately, the scripts are no longer automatically marked with
>> >> >> the "internal" version information when committed, so there really
>> >> >> isn't anything in the script I can tell you to look for.  Check for
>> >> >> more than one bioperl instance on your computer.
>> >> >>
>> >> >> I've attached the GFF3 file I got so you can look at it and tell me
>> >> >> if it is what you expect.
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > I just pulled the latest bioperl-live using git.
>> >> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> >> > Anyway, I still get the same results. No locus_tag.
>> >> >> > How can I tell if I'm using the latest version of the script?
>> >> >> >
>> >> >> > Thanks again.
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> >> all features that have a locus_tag, and uses the locus_tag for the
>> >> >> >> ID when it can (it can't blindly use the locus tag for the ID since
>> >> >> >> both the gene and the CDS have the same tag).
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Hi Scott,
>> >> >> >> >
>> >> >> >> > Here is a very short genbank:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >> >
>> >> >> >> > Note all genes in the genbank have locus tags. In the resulting
>> >> >> >> > GFF3, however, only the last gene (EcE24377A_B0005) gets a
>> >> >> >> > locus_tag. I have no idea why it deserves a special treatment... :)
>> >> >> >> >
>> >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> >> > column whenever available) will really make my life easier.
>> >> >> >> >
>> >> >> >> > Thank you,
>> >> >> >> > Dave
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> That seems perfectly reasonable.  If you could point out a
>> >> >> >> >> GenBank entry for which that does not happen, I could try to
>> >> >> >> >> figure out why not.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest
>> >> >> >> >> > locus_tag will be always added to the GFF last column if it
>> >> >> >> >> > exists in the genbank, whether it is used as ID in the GFF or not.
>> >> >> >> >> >
>> >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> >> > <scott at scottcain.net>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Hi Dave,
>> >> >> >> >> >>
>> >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> >> >> >> >> >> GenBank files :-)  It was designed initially to work on whole genome
>> >> >> >> >> >> refseqs, and contains several ad hoc rules for trying to make it "do
>> >> >> >> >> >> the right thing."  In practice, it is not unusual for a post
>> >> >> >> >> >> processing step (either by hand or a quicky perl script) to be
>> >> >> >> >> >> required to really get it right.  I don't recall the specifics (if I
>> >> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I do know
>> >> >> >> >> >> that there is a list of things that it will try to use for the ID, and
>> >> >> >> >> >> while the locus is on the list, I don't know where it comes in the
>> >> >> >> >> >> list, so it's possible that other items might supersede it.
>> >> >> >> >> >>
>> >> >> >> >> >> Scott
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though the
>> >> >> >> >> >> > GenBank file has a locus tag.
>> >> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >> >
>> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> > Dave


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From j.scholtalbers at gmail.com  Mon Sep 20 04:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
Message-ID: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working.  I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
> > Sounds like this is going through an initial indexing step (for
> > flatfiles).  I would expect the initial indexing of the tables to take time
> > as you have to create the DB, but subsequent lookups post-indexing should be
> > much faster if the index is already present.  Maybe Jason could answer in
> > more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> >> up with the same result.
> >>
> >> thanks,
> >> Eric

From pcantalupo at gmail.com  Mon Sep 20 10:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <AANLkTikCi7XiKptEghr7uckAUBcD_K3n_YAgHEJjGo_F@mail.gmail.com>

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the
subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use Bio::DB::EUtilities;   # efetch client
use XML::Simple;           # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );

   my $res = $factory->get_Response->content;
   my $data = XMLin($res);

   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();

   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }

   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list in list context, "; "-joined string in scalar context
   return wantarray ? @lineage : join("; ", @lineage);
}

Paul Cantalupo
University of Pittsburgh


On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers
<j.scholtalbers at gmail.com> wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
> > Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> > Eric
> >
> > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> > wrote:
> > > Sounds like this is going through an initial indexing step (for
> > > flatfiles).  I would expect the initial indexing of the tables to take time
> > > as you have to create the DB, but subsequent lookups post-indexing should be
> > > much faster if the index is already present.  Maybe Jason could answer in
> > > more detail?
> > >
> > > chris
> > >
> > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> > >
> > >> Hello,
> > >>
> > >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> > >> 5.8.5 with BioPerl 1.6.0
> > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > >>
> > >> It ran for 100 cpu seconds and output:
> > >>
> > >> 33090 Viridiplantae kingdom
> > >>
> > >> I was expecting it to also output the descendents. Some questions:
> > >>
> > >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> > >> implemented? It looks to be in Taxon.pm but it is not documented and
> > >> when I ran Data::Dumper on $node the value '_desc' was empty.
> > >>
> > >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> > >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> > >> up with the same result.
> > >>
> > >> thanks,
> > >> Eric


From jason at bioperl.org  Mon Sep 20 11:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <4C977FFC.5000205@bioperl.org>


This works for me to get all the descendents from a sub-node. You have to
call the function with the database handle.   I am not sure if the Taxon
implementation has a reference to the dbhandle or not:
#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi/';
my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
                                 -nodesfile => "$dbdir/nodes.dmp",
                                 -namesfile => "$dbdir/names.dmp",
                                 );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . 
$_->scientific_name } @d), "\n";


Hope that helps.
Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
>> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
>> Eric
>>
>> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
>> wrote:
>>> Sounds like this is going through an initial indexing step (for
>>> flatfiles).  I would expect the initial indexing of the tables to take time
>>> as you have to create the DB, but subsequent lookups post-indexing should be
>>> much faster if the index is already present.  Maybe Jason could answer in
>>> more detail?
>>>
>>> chris
>>>
>>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>>>
>>>> Hello,
>>>>
>>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl
>>>> 5.8.5 with BioPerl 1.6.0
>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>>>
>>>> It ran for 100 cpu seconds and output:
>>>>
>>>> 33090 Viridiplantae kingdom
>>>>
>>>> I was expecting it to also output the descendents. Some questions:
>>>>
>>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
>>>> implemented? It looks to be in Taxon.pm but it is not documented and
>>>> when I ran Data::Dumper on $node the value '_desc' was empty.
>>>>
>>>> 2) is the flatfile reader always so slow? after replacing 'flatfile'
>>>> with a call to 'entrez' it took only 0.02 cpu seconds to come
>>>> up with the same result.
>>>>
>>>> thanks,
>>>> Eric

From j.scholtalbers at gmail.com  Wed Sep 22 03:46:35 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Wed, 22 Sep 2010 09:46:35 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <4C977FFC.5000205@bioperl.org>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
	<4C977FFC.5000205@bioperl.org>
Message-ID: <AANLkTi=nbAJcCs6a2F3pacUaxZE3JZ690dHTasLGug_a@mail.gmail.com>

Hi Jason,

this was the same method I was using. Apparently it works with the taxdump,
but it does not work with Entrez as the source. So I will just stick to an
up-to-date taxdump then. Thanks for your example.

@Paul: Your method does give the lineage, but it only retrieves the
ancestors. I want to retrieve all the descendents. Thx anyway.
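
For the archive, the descendant lookup against a taxdump can also be
sketched without BioPerl. This is only an illustration of the idea, not what
Bio::DB::Taxonomy does internally: it assumes the nodes.dmp layout from the
NCBI taxdump (fields separated by tab, pipe, tab, with the tax_id first and
the parent tax_id second). The helper names read_children and
all_descendants and the toy IDs are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a parent -> children map from nodes.dmp-style lines
# (field 1 = tax_id, field 2 = parent tax_id).
sub read_children {
    my ($fh) = @_;
    my %children;
    while (my $line = <$fh>) {
        my ($id, $parent) = split /\t\|\t/, $line;
        next unless defined $parent;
        next if $id == $parent;          # the root node lists itself as parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Collect every descendant of $taxid, breadth first.
sub all_descendants {
    my ($children, $taxid) = @_;
    my @queue = ($taxid);
    my @found;
    while (defined(my $id = shift @queue)) {
        my @kids = @{ $children->{$id} || [] };
        push @found, @kids;
        push @queue, @kids;
    }
    return @found;
}

# Hypothetical toy data in nodes.dmp layout, not real NCBI taxon IDs.
my $toy = join '', map { join("\t|\t", @$_) . "\t|\n" }
    [1, 1, 'no rank'], [10, 1, 'family'], [11, 10, 'genus'],
    [12, 10, 'genus'], [13, 11, 'species'];
open my $fh, '<', \$toy or die "cannot open in-memory data: $!";
my $children = read_children($fh);
print join(' ', all_descendants($children, 10)), "\n";   # prints "11 12 13"
```

With a real taxdump, the in-memory filehandle would be replaced by one opened
on nodes.dmp; names and ranks would still have to come from names.dmp and the
remaining columns.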

Cheers,
Jelle

On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich <jason at bioperl.org> wrote:

>
> This works for me to get all the descendents from sub-node. You have to
> call the function with the database handle.   I am not sure if the Taxon
> implementation has reference to the dbhandle or not:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Taxonomy;
> my $dbdir = '/db/taxonomy/ncbi/'; # downloaded data from NCBI taxdump into this directory
> my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
>                                 -nodesfile => "$dbdir/nodes.dmp",
>                                 -namesfile => "$dbdir/names.dmp",
>                                 );
> my $taxa = $db->get_taxon(-taxonid => 151341);
> my @d = $db->get_all_Descendents($taxa);
>
> print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name
> } @d), "\n";
>
>
> Hope that helps.
> Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
>
>
>  Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>  Sounds like this is going through an initial indexing step (for flatfiles).
> I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
>
>
>  chris
>
> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>
>
>
>  Hello,
>
> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> 5.8.5 with BioPerl 1.6.0
> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>
> It ran for 100 cpu seconds and output:
>
> 33090 Viridiplantae kingdom
>
> I was expecting it to also output the descendents. Some questions:
>
> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> implemented? It looks to be in Taxon.pm but it is not documented and
> when I ran Data::Dumper on $node the value '_desc' was empty.
>
> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> with a call to 'entrez' it took only 0.02 cpu seconds to come
> up with the same result.
>
> thanks,
> Eric

From waldenhe at muohio.edu  Fri Sep 24 15:15:48 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Fri, 24 Sep 2010 15:15:48 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>

Hello Bioperl Masters,

I am trying to perform a local blast with a query list of fasta files against a db of other fasta files.  I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module.  I have downloaded BLAST+ 2.2.24+ from the NCBI website and installed it on my Ubuntu machine.  I am using bioperl-1.5.2.

The snippet of code that is giving me errors is below:

my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

The error I am getting is:

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284.

I think the real problem is the "cannot find path to blastall" warning.

From reading around on different forums, I understand that I have to make a .ncbirc text file with the location of BLAST+ 2.2.24+ on my machine.  I have that file in my /home folder.

How do I get StandAloneBlastPlus synced up with BLAST+ 2.2.24+?  Am I approaching this right?

Thank you,

Hans Waldenmaier

From ross at cuhk.edu.hk  Sat Sep 25 04:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>


Given a set of GO IDs, e.g. 

GO:0008150
GO:0005750
GO:0006122
GO:0008121
GO:0003674
GO:0005575
GO:0008150
GO:0009507
GO:0009535
GO:0009567
GO:0009977
GO:0010027
GO:0031361

from
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo

one can manually examine the hierarchy. Although there is go-perl
(http://search.cpan.org/~cmungall/go-perl/) and go-db-perl
(http://search.cpan.org/~cmungall/go-db-perl/), as a life science student
who is just learning Perl, I find it difficult to draw a hierarchy tree (or
simply build a table that counts the occurrences) and produce something like:


biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
<snip>
Cellular component (4)
<snip>

Can anybody advise? I don't need any fancy figures at all...


From David.Messina at sbc.su.se  Sun Sep 26 12:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,


> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.


Dave


From maj at fortinbras.us  Sun Sep 26 20:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W479062635438381285548195@webmail34>

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Sunday, September 26, 2010 12:11 PM
>To: 'Waldenmaier, Hans Eugene'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans,
>
>
>> I think the real problem is the "cannot find path to Blastall.
>
>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>
>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>
>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Mon Sep 27 17:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <D8C2E980-BEB9-4231-8014-7318778CD215@illinois.edu>

Sorry, didn't see this being responded to on-list (I've been off the radar for the last month).  I think this is a good idea, but I'm wondering if it might be better as a separate release on CPAN rather than part of bioperl core, seeing as we're in the preliminary stages of planning to split the current bioperl core into smaller independent releases after the next bioperl release.

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
> 
> it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases.  maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!
> 
> best wishes
> johnny
> 
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
> 
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************


From gandipalem at gmail.com  Tue Sep 28 00:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
Message-ID: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the Coordination file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.



From David.Messina at sbc.su.se  Tue Sep 28 03:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <W479062635438381285548195@webmail34>
	<23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing 
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?


> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
> 
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible that something else in your .bashrc (or in the system bashrc, which is often read in as well; look for mention of /etc/bashrc or similar) is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.


Dave
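
The PATH checks described above can be sketched as a short shell script. This is only a sketch under stated assumptions: the helper name dir_on_path is made up for illustration, and /home/hans/BLAST/bin is simply the directory quoted in this thread.

```shell
#!/bin/sh
# Sketch: report whether a directory is on PATH and whether a BLAST+
# program can be found. /home/hans/BLAST/bin is the directory quoted in
# this thread; substitute your own install location.

# dir_on_path DIR [PATHLIST]: succeed if DIR is an entry in PATHLIST
# (PATHLIST defaults to the current $PATH). Hypothetical helper.
dir_on_path() {
    dir=${1%/}                 # drop one trailing slash, if any
    list=${2:-$PATH}
    case ":$list:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

blast_bin=/home/hans/BLAST/bin

if dir_on_path "$blast_bin"; then
    echo "$blast_bin is on PATH"
else
    echo "$blast_bin is NOT on PATH; prepending it for this session"
    PATH=$blast_bin:$PATH
    export PATH
fi

# command -v prints the path the shell would run, or fails if there is none.
command -v blastn || echo "blastn not found on PATH"

# Perl inherits the environment of the shell that starts it, so this
# should print the same PATH the shell sees.
if command -v perl >/dev/null 2>&1; then
    perl -e 'print $ENV{PATH}, "\n"'
fi
```

Changes made this way last only for the current shell session; the export line still has to go into .bashrc to persist.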

From David.Messina at sbc.su.se  Tue Sep 28 03:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
	<AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts
of genes by transformation of PCR products (ref: Datsenko & Wanner,
PNAS 2000).

A tab-delimited file containing ORF START STOP is read, and primers
flanking the start & stop coordinates are designed based on the
user-designated sequence file.

In addition, primers flanking the knockout regions are chosen for PCR
screening purposes once the knockout is generated.  The script uses
Bioperl in order to determine the primer sequences, which requires
getting subsequences and reverse complementing some of the objects.


Dave
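
As an illustration of the "tab-delimited file containing ORF START STOP" mentioned in the description, here is a minimal sketch that writes a toy coordinate file and sanity-checks its layout. The ORF names and coordinates are invented, check_coords is a hypothetical helper, and whether make_primers.pl accepts a header line or has other requirements is not stated in the description, so none is assumed.

```shell
#!/bin/sh
# Sketch: build and sanity-check a tab-delimited ORF/START/STOP file.
# The ORF names and coordinates below are invented for illustration.

# check_coords FILE: succeed only if every non-empty line has exactly
# three tab-separated fields and fields 2 and 3 are non-negative integers.
check_coords() {
    awk -F'\t' '
        /^$/ { next }
        NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d looks wrong: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

coords=$(mktemp)
printf 'orfA\t1200\t2100\n' >  "$coords"
printf 'orfB\t4500\t5400\n' >> "$coords"

if check_coords "$coords"; then
    echo "coordinate file looks well-formed"
fi
rm -f "$coords"
```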


From maj at fortinbras.us  Tue Sep 28 07:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W726055216153101285672714@webmail57>

The module checks the env variable BLASTPLUSDIR for the executable; 
you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
MAJ

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Tuesday, September 28, 2010 03:53 AM
>To: 'Waldenmaier, Hans Eugene'
>Cc: 'Mark A. Jensen', bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>> I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  
>
>Type the following on the command line:
>perl -e 'print $ENV{PATH}, "\n"'
>
>You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing 
>export PATH=/home/hans/BLAST/bin:${PATH}
>
>on the command line and then type
>perl -e 'print $ENV{PATH}, "\n"'
>
>again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?
>
>
>> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
>> export PATH=${PATH}:/home/hans/BLAST/bin
>> export BLASTDIR=/home/hans/BLAST/
>> 
>> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.
>
>It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.
>
>It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.
>
>
>Dave
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From waldenhe at muohio.edu  Tue Sep 28 00:52:56 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Tue, 28 Sep 2010 00:52:56 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <W479062635438381285548195@webmail34>
References: <W479062635438381285548195@webmail34>
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>

Thanks Guys,

I have run those steps, my current version now is:
hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.006001

But I am still having problems.

I am having slightly more luck with using StandAloneBlast and the regular BLAST from NCBI.  I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  
Following the instructions from the HOWTOs and the O'Reilly book BLAST, I have gotten to the part about setting up the environment variables, which is where I think my problems are arising now.  
I have added these lines to my /home/hans/.bashrc file in order to get perl to find BLAST:
export PATH=${PATH}:/home/hans/BLAST/bin
export BLASTDIR=/home/hans/BLAST/

Am I just supposed to add these to the end of the .bashrc file, or am I supposed to put them someplace special?

Thanks for the help,

Hans
________________________________________
From: Mark A. Jensen [maj at fortinbras.us]
Sent: Sunday, September 26, 2010 8:43 
To: Dave Messina; Waldenmaier, Hans Eugene
Cc: bioperl-l at bioperl.org
Subject: Re:  [Bioperl-l] StandAloneBlastPlus

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Sunday, September 26, 2010 12:11 PM
>To: 'Waldenmaier, Hans Eugene'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans,
>
>
>> I think the real problem is the "cannot find path to Blastall.
>
>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>
>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>
>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From maj at fortinbras.us  Tue Sep 28 11:04:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 15:04:07 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W8604325546183031285686247@webmail66>

Should work from .bashrc, Hans. Also add  

export BLASTPLUSDIR=/home/hans/BLAST/bin

It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program.

You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing

set | grep BLAST

cheers MAJ
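
A quick way to confirm that BLASTPLUSDIR points at a usable directory is sketched below. Only the BLASTPLUSDIR variable and the /home/hans/BLAST/bin path come from this thread; the helper name missing_blast_programs and the list of programs checked are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: confirm that BLASTPLUSDIR names a directory holding BLAST+
# executables. The program list and the helper name are illustrative.

# missing_blast_programs DIR: print each expected program that is not
# an executable file in DIR; print nothing if all are present.
missing_blast_programs() {
    for prog in blastn blastp blastx tblastn makeblastdb; do
        [ -x "$1/$prog" ] || echo "$prog"
    done
}

# Fall back to the directory quoted in the thread if the variable is unset.
dir=${BLASTPLUSDIR:-/home/hans/BLAST/bin}
missing=$(missing_blast_programs "$dir")

if [ -z "$missing" ]; then
    echo "all expected BLAST+ programs found in $dir"
else
    echo "not found in $dir:" $missing
fi
```

Note that "set | grep BLAST" also lists shell variables that were never exported; "env | grep BLAST" shows only what a child process such as perl will actually receive.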

>-----Original Message-----
>From: Waldenmaier, Hans Eugene [mailto:waldenhe at muohio.edu]
>Sent: Tuesday, September 28, 2010 12:52 AM
>To: 'Mark A. Jensen', 'Dave Messina'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Thanks Guys,
>
>I have run those steps, my current version now is:
>hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
>1.006001
>
>But I am still having problems.
>
>I am having slightly more luck with using StandAloneBlast and the regular BLAST form NCBI.  I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  
>Following the instructions from the HOWTO's and the O'reilly book BLAST, I have gotten to the setting up the environmental variables part, which is where I think my problems are arising now.  
>I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
>export PATH=${PATH}:/home/hans/BLAST/bin
>export BLASTDIR=/home/hans/BLAST/
>
>Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.
>
>Thanks for the help,
>
>Hans
>________________________________________
>From: Mark A. Jensen [maj at fortinbras.us]
>Sent: Sunday, September 26, 2010 8:43 
>To: Dave Messina; Waldenmaier, Hans Eugene
>Cc: bioperl-l at bioperl.org
>Subject: Re:  [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:
>
>git clone http://github.com/bioperl/bioperl-live.git
>git clone http://github.com/bioperl/bioperl-run.git
>
>(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl
>
>cheers MAJ
>
>--------------------------
>Mark A. Jensen, PhD
>Senior Consultant
>Fortinbras Research
>http://www.fortinbras.us
>
>>-----Original Message-----
>>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>>Sent: Sunday, September 26, 2010 12:11 PM
>>To: 'Waldenmaier, Hans Eugene'
>>Cc: bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>>
>>Hi Hans,
>>
>>
>>> I think the real problem is the "cannot find path to Blastall.
>>
>>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>>
>>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>>
>>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>>
>>
>>
>>Dave
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From chiragmatkarbioinfo at gmail.com  Thu Sep 30 08:20:35 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Thu, 30 Sep 2010 19:20:35 +0700
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
Message-ID: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>

Hello all,
Is there any module to fetch DNA sequence data from an Ensembl gene ID?

-- 
Regards,
Chirag Matkar

From jun.yin at ucd.ie  Thu Sep 30 09:36:31 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 30 Sep 2010 14:36:31 +0100
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>

Hi, Chirag,

BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
provides a BioPerl-like interface for that purpose.

See Ensembl's website for how to use that interface:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
Sent: Thursday, September 30, 2010 1:21 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id

Hello all,
Is there any module to fetch dna sequence data from ensemble gene id?

-- 
Regards,
Chirag Matkar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Thu Sep 30 11:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
	<011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID: <E4D361DE-CEB6-45CC-83EE-6547C497ABE3@illinois.edu>

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
> 
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back.  I think it still works rather well; at least the tests pass.  You might get more out of using the Ensembl API directly as Jun states, though; YMMV.

BTW, the Ensembl API also works with the latest bioperl code, regardless of what the Ensembl website says (e.g. that they only support v1.2.3).  I haven't heard more about whether this discrepancy was supposed to be addressed at some point.

chris

> You can visit Ensembl's website on how to use that module:
> http://www.ensembl.org/info/data/api.html
> 
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
> 
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
> Sent: Thursday, September 30, 2010 1:21 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
> 
> Hello all,
> Is there any module to fetch dna sequence data from ensemble gene id?
> 
> -- 
> Regards,
> Chirag Matkar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> __________ Information from ESET Smart Security, version of virus signature
> database 5377 (20100818) __________
> 
> The message was checked by ESET Smart Security.
> 
> http://www.eset.com
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From A.Vakhrusheva at lumc.nl  Wed Sep 29 09:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface.

I want to calculate a p-distance matrix, but what input format is
acceptable? Phylip doesn't work.

Anna 


From shalabh.sharma7 at gmail.com  Wed Sep  1 16:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>

Hi,
     I am trying to parse a hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm

The only output I get is "amoA_10genes_align.fasta.2  [M=247] for
HMM"; I am not even getting an error.
I am attaching the hmmsearch report (just a test report) that I ran
through the parser.

I would really appreciate it if anyone could help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  amoA_10genes.hmm
# target sequence database:        test.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       amoA_10genes_align.fasta.2  [M=247]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte


Domain annotation for each sequence (and alignments):
>> gi|63021979|gb|AAY26564.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021979|gb|AAY26564.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021979|gb|AAY26564.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP

>> gi|63021981|gb|AAY26565.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021981|gb|AAY26565.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021981|gb|AAY26565.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (247 nodes)
Target sequences:                          2  (300 residues)
Passed MSV filter:                         2  (1); expected 0.0 (0.02)
Passed bias filter:                        2  (1); expected 0.0 (0.02)
Passed Vit filter:                         2  (1); expected 0.0 (0.001)
Passed Fwd filter:                         2  (1); expected 0.0 (1e-05)
Initial search space (Z):                  2  [actual number of targets]
Domain search space  (domZ):               2  [number of targets reported over threshold]
# CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08
# Mc/sec: 0.93
//

From thomas.sharpton at gmail.com  Wed Sep  1 17:29:26 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 1 Sep 2010 14:29:26 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>

Hi Shalabh,

We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
use the HMMER3 version, as found here:

http://github.com/bioperl/bioperl-hmmer3

Hope this helps,
T

On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote:

> Hi ,
>     I am trying to parse hmmsearch report (from HMMER3). I am using  
> the
> script mentioned here:
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm
>
> I am not getting anything but this "amoA_10genes_align.fasta.2   
> [M=247] for
> HMM" as the output, i am not even getting any error.
> I am attaching the hmmsearch report (just a test report) which i  
> tried to
> test against the parser.
>
> I would really appreciate if anyone can help me out.
>
> Thanks
> Shalabh Sharma
> <test_amao.txt>_______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From kai.blin at biotech.uni-tuebingen.de  Thu Sep  2 04:44:58 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 2 Sep 2010 10:44:58 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>

On Wed, 1 Sep 2010 14:29:26 -0700
Thomas Sharpton <thomas.sharpton at gmail.com> wrote:

Hi,

> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
> use the HMMER3 version, as found here:
> 
> http://github.com/bioperl/bioperl-hmmer3

Actually it's now included in the bioperl-live repository, but the code
hasn't made it into a release yet.

http://github.com/bioperl/bioperl-live.git
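
For anyone who cannot install bioperl-live yet, the per-sequence hit table
can be read with plain Perl as a stopgap. This is only a rough sketch and
not the SearchIO parser: it assumes the HMMER 3.0 column layout seen in the
posted report (nine whitespace-separated columns followed by a free-text
description), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Lines copied from the "Scores for complete sequences" table in the
# report posted at the start of this thread.
my @table = (
    '    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description',
    '    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------',
    '    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte',
    '    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte',
);

my @hits;
for my $line (@table) {
    # Nine whitespace-separated columns, then the free-text description.
    my @f = split ' ', $line, 10;

    # Skip the header and the dashed rule: a hit row starts with a digit.
    next unless @f >= 9 && $f[0] =~ /^\d/;

    push @hits, {
        evalue => $f[0],                        # full-sequence E-value
        score  => $f[1],                        # full-sequence bit score
        name   => $f[8],                        # target sequence name
        desc   => defined $f[9] ? $f[9] : '',   # (truncated) description
    };
}

printf "%s\tE=%s\tscore=%s\n", @{$_}{qw(name evalue score)} for @hits;
```

The domain table and the alignments are not touched here; those are what the
HMMER3-aware SearchIO code is for.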

Cheers,
Kai
-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From e.stupka at ucl.ac.uk  Thu Sep  2 08:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia


---
'"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." 
~ Stephen Hawkings

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817 
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912


From cjfields at illinois.edu  Thu Sep  2 10:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID: <C9AD72E7-32C2-48D0-A7AB-18F42BABD28D@illinois.edu>

Done!  Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
> 
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
> 
> thanks!
> 
> Elia
> 
> 
> ---
> '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." 
> ~ Stephen Hawkings
> 
> Senior Lecturer, Bioinformatics
> Scientific Director - Bioinformatics, UCL Genomics
> 
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
> 
> Institute of Cell and Molecular Science
> Barts and The London School of Medicine and Dentistry
> 4 Newark Street
> Whitechapel
> London
> E1 2AT
> 
> Office (UCL): +44 207 679 6493
> Fax: +44 0207 6796817 
> Office (ICMS): +44 0207 8822374
> 
> Mobile: +44 787 6478912
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From J.Christopher.Ellis at duke.edu  Thu Sep  2 10:53:34 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu,  2 Sep 2010 10:53:34 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <53096.1283439214@duke.edu>

 Chris, have you had any luck with this?

 Thanks,
 Chris

 On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
 Yes, I see that one. It may be the ID hash that is being returned is
empty. I'll look into it.

 -c 

 On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

 > Hi Chris,
 > 
 > The error is...
 > 
 > "Use of uninitialized value $id in join or string at
C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
 > 
 > The script from
http://bioperl.org/wiki/Species_names_from_accession_numbers is as
follows....
 > 
 > use Bio::DB::EUtilities;
 > 
 > my (%taxa, @taxa);
 > my (%names, %idmap);
 > 
 > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
 > # (probably)
 > 
 > my @ids = qw(1621261 89318838 68536103 20807972 730439);
 > 
 > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
 >                                        -db => 'taxonomy',
 >                                        -dbfrom => 'protein',
 >                                        -correspondence => 1,
 >                                        -id => \@ids);
 > 
 > # iterate through the LinkSet objects
 > while (my $ds = $factory->next_LinkSet) {
 >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
 > }
 > 
 > @taxa = @taxa{@ids};
 > 
 > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
 >                                     -db => 'taxonomy',
 >                                     -id => \@taxa );
 > 
 > while (local $_ = $factory->next_DocSum) {
 >     $names{($_->get_contents_by_name('TaxId'))[0]} =
 >         ($_->get_contents_by_name('ScientificName'))[0];
 > }
 > 
 > foreach (@ids) {
 >     $idmap{$_} = $names{$taxa{$_}};
 > }
 > 
 > # %idmap is
 > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
 > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
 > # 68536103 => 'Corynebacterium jeikeium K411'
 > # 730439 => 'Bacillus caldolyticus'
 > # 89318838 => undef (this record has been removed from the db)
 > 
 > 1;
 > 
 > Thanks,
 > 
 > Chris
 > 
 > 
 > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 > Chris,
 > 
 > Regarding a fix for that script, we would have to see your modified
script and the error. However, there are modules within BioPerl to
essentially do what you want, in particular, Bio::DB::Taxonomy.
 > 
 > chris
 > 
 > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
 > 
 > > Hi All,
 > > 
 > > I am trying to extract the entire taxonomy of an organism including
the
 > > classifications. Some thing like...
 > > 
 > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > > 
 > > I am not worried about format, just that I get the information and the
associated level of hierarchy. The script found at
http://bioperl.org/wiki/Species_names_from_accession_numbers
seemed like a good starting point, so I copied it and tried to run it but got
an error.
 > > 
 > > My first question is "Is there a known fix for this?" and my second
question is how do I get the full hierarchical information (as seen above)
with the taxonomy db?
 > > 
 > > Thanks for all your help in advance!
 > > 
 > > Chris 
 > > 
 > > 
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l at lists.open-bio.org
 > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
 > 
 > 

 
From cjfields at illinois.edu  Thu Sep  2 12:21:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 02 Sep 2010 11:21:48 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <53096.1283439214@duke.edu>
References: <53096.1283439214@duke.edu>
Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu>

Chris,

There are a few things wrong with the original script, so I'll fix them.
Basically, it makes the assumption that every ID in the original list is
found.  The problem: eutils only reports back data it finds, silently
discarding IDs that don't match.  So, using the original ID list when
building the hashes needs a bit more error checking.

Here's the revised script that works for me.

https://gist.github.com/f5db90a432fed68548d4

I'm also adding a check to ensure all IDs are defined prior to adding
them to the param string, just in case.
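
A minimal sketch of that kind of check, not the gist itself: the two hashes
below are hard-coded stand-ins for what elink and esummary would return, the
taxonomy IDs are placeholders, and one protein ID is left out on purpose to
mimic an ID that was dropped without complaint.

```perl
use strict;
use warnings;

my @ids = qw(1621261 89318838 68536103 20807972 730439);

# Stand-in for the elink result: protein ID => taxonomy ID.
# The taxonomy IDs are placeholders; 89318838 is deliberately absent.
my %taxa = (
    1621261  => 'tax_A',
    68536103 => 'tax_B',
    20807972 => 'tax_C',
    730439   => 'tax_D',
);

# Split the submitted IDs into those that came back and those that did not,
# so that no undefined value is passed on to the next request.
my @found   = grep {  defined $taxa{$_} } @ids;
my @missing = grep { !defined $taxa{$_} } @ids;
my @taxa    = @taxa{@found};

# Stand-in for the esummary result: taxonomy ID => scientific name.
my %names = (
    tax_A => 'Mycobacterium tuberculosis H37Rv',
    tax_B => 'Corynebacterium jeikeium K411',
    tax_C => 'Thermoanaerobacter tengcongensis MB4',
    tax_D => 'Bacillus caldolyticus',
);

# Build the final map; IDs without a taxonomy link stay undef explicitly.
my %idmap;
for my $id (@ids) {
    my $taxid = $taxa{$id};
    $idmap{$id} = defined $taxid ? $names{$taxid} : undef;
}

print "no taxonomy link for: @missing\n" if @missing;
```

In the real script, @taxa (built only from @found) is what would be handed
to the second EUtilities request instead of a slice over the full @ids list.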

chris

On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote:
> Chris have you had any luck with this?
> 
> Thanks,
> Chris
> 
> On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
>         Yes, I see that one. It may be the ID hash that is being
>         returned is empty. I'll look into it.
>         
>         -c 
>         
>         On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:
>         
>         > Hi Chris,
>         > 
>         > The error is...
>         > 
>         > "Use of uninitialized value $id in join or string at
>         C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm
>         line 363."
>         > 
>         > The script from
>         http://bioperl.org/wiki/Species_names_from_accession_numbers is as follows....
>         > 
>         > use Bio::DB::EUtilities;
>         > 
>         > my (%taxa, @taxa);
>         > my (%names, %idmap);
>         > 
>         > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
>         > # (probably)
>         > 
>         > my @ids = qw(1621261 89318838 68536103 20807972 730439);
>         > 
>         > my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>         >                                        -db => 'taxonomy',
>         >                                        -dbfrom => 'protein',
>         >                                        -correspondence => 1,
>         >                                        -id => \@ids);
>         > 
>         > # iterate through the LinkSet objects
>         > while (my $ds = $factory->next_LinkSet) {
>         >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>         > }
>         > 
>         > @taxa = @taxa{@ids};
>         > 
>         > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         >                                     -db => 'taxonomy',
>         >                                     -id => \@taxa );
>         > 
>         > while (local $_ = $factory->next_DocSum) {
>         >     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         >         ($_->get_contents_by_name('ScientificName'))[0];
>         > }
>         > 
>         > foreach (@ids) {
>         >     $idmap{$_} = $names{$taxa{$_}};
>         > }
>         > 
>         > # %idmap is
>         > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
>         > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
>         > # 68536103 => 'Corynebacterium jeikeium K411'
>         > # 730439 => 'Bacillus caldolyticus'
>         > # 89318838 => undef (this record has been removed from the db)
>         > 
>         > 1;
>         > 
>         > Thanks,
>         > 
>         > Chris
>         > 
>         > 
>         > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu
>         sent:
>         > Chris,
>         > 
>         > Regarding a fix for that script, we would have to see your
>         modified script and the error. However, there are modules
>         within BioPerl to essentially do what you want, in particular,
>         Bio::DB::Taxonomy.
>         > 
>         > chris
>         > 
>         > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>         > 
>         > > Hi All,
>         > > 
>         > > I am trying to extract the entire taxonomy of an organism
>         including the
>         > > classifications. Some thing like...
>         > > 
>         > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
>         Order:Enterobacteriales, Family:Enterobacteriaceae,
>         Genus:Escherichia
>         > > 
>         > > I am not worried about format, just that I get the
>         information and the associated level of hierarchy. The script
>         found at
>         http://bioperl.org/wiki/Species_names_from_accession_numbers
>         seemed like a good starting point, so I copied it and tried to
>         run it but got an error.
>         > > 
>         > > My first question is "Is there a known fix for this?" and
>         my second question is how do I get the full hierarchical
>         information (as seen above) with the taxonomy db?
>         > > 
>         > > Thanks for all your help in advance!
>         > > 
>         > > Chris 
>         > > 
>         > > 
>         > > _______________________________________________
>         > > Bioperl-l mailing list
>         > > Bioperl-l at lists.open-bio.org
>         > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>         > 
>         > 
>         
>         


From thomas.sharpton at gmail.com  Thu Sep  2 12:34:07 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Thu, 2 Sep 2010 09:34:07 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
	<20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <C272A232-C37D-40F9-B729-626B8DB17BB3@gmail.com>

So it is! I'm paying attention, I swear I am....

Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem,  
do let us know.

Best,
Tom

On Sep 2, 2010, at 1:44 AM, Kai Blin wrote:

> On Wed, 1 Sep 2010 14:29:26 -0700
> Thomas Sharpton <thomas.sharpton at gmail.com> wrote:
>
> Hi,
>
>> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
>> use the HMMER3 version, as found here:
>>
>> http://github.com/bioperl/bioperl-hmmer3
>
> Actually it's now included in the bioperl-live repository, but the  
> code
> hasn't made it into a release yet.
>
> http://github.com/bioperl/bioperl-live.git
>
> Cheers,
> Kai
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From johnny at mit.edu  Sat Sep  4 11:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases.  maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************


From David.Messina at sbc.su.se  Mon Sep  6 08:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl!

In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?

If you haven't already, you might want to take a look at some of our developer documentation. For example:
 
	http://www.bioperl.org/wiki/Bioperl_Best_Practices
	http://www.bioperl.org/wiki/Advanced_BioPerl


Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway.

I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway:

	http://www.bioperl.org/wiki/Proposed_BioPerl_changes
	http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208


Dave


From ross at cuhk.edu.hk  Tue Sep  7 04:28:00 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 16:28:00 +0800
Subject: [Bioperl-l]  Indexing nr database
In-Reply-To: <C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>

By the following codes, I wanna index the 4G nr database, however, the index
file is > 1T and the job has been running for weeks and still hasn't
finished. Could anybody tell me how you accomplish the goal? Thanks in
advance.

     use strict;
     use Bio::DB::Flat::BinarySearch;

     (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

     # use single quotes so you don't have to write
     # regular expressions like "gi\\|(\\d+)"
     #my $primary_pattern = '^>(\S+)';
     #if ($fullHeader == 1) {
        my $primary_pattern = '^>(.+)';
     #}
     my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
     #$string =~ s/$primary_pattern/RRR/g;
     #print "$string\n";

     # one or more patterns stored in a hash:
     my $secondary_patterns = {GI => 'gi\|(\d+)'};

     my $db = Bio::DB::Flat::BinarySearch->new(
                           -directory          => $baseDir,
                           -dbname             => $dbName,
                           -write_flag         => 1,
                           -primary_pattern    => $primary_pattern,
                           -primary_namespace  => 'ACC',
                           -secondary_patterns => $secondary_patterns,
                           -verbose            => 1,
                           -format             => 'fasta'  );

     $db->build_index($seqFile);
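
To see what each of the two candidate primary patterns in the script would
capture as a key, here is a small stand-alone check on the example header.
It is only a sketch: the leading '>' is added on the assumption that the
patterns are matched against raw FASTA header lines, and whether key length
has anything to do with the index size is not established here.

```perl
use strict;
use warnings;

# Example header from the script, with the FASTA '>' prepended (assumption).
my $header = '>gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome';

my %primary = (
    first_word  => '^>(\S+)',   # the pattern commented out in the script
    full_header => '^>(.+)',    # the pattern the script actually uses
);
my $secondary = 'gi\|(\d+)';    # the GI secondary pattern from the script

# Apply each primary pattern and report the captured key and its length.
my %key;
for my $label (sort keys %primary) {
    ($key{$label}) = $header =~ /$primary{$label}/;
    printf "%-11s (%2d chars): %s\n",
        $label, length($key{$label}), $key{$label};
}

# The secondary pattern pulls out just the GI number.
my ($gi) = $header =~ /$secondary/;
print "GI: $gi\n";
```

The first pattern stops at the first whitespace, while the second keeps the
whole description line as the key.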

                                              
From David.Messina at sbc.su.se  Tue Sep  7 05:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for?

If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl.

They're distributed with Blast+ and available here:

	ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.


I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:

		ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.


Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself). There are some nice examples there showing off the flexibility of this latest version of the software.


Dave


From ross at cuhk.edu.hk  Tue Sep  7 05:18:16 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 17:18:16 +0800
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <4C860148.3030000@fmi.ch>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>

The reason is that I have to retrieve the specific information of the
matched sequences, e.g. extract the 64th amino acid of the top matched
sequence. Is there any way to achieve that?
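
To make the goal concrete: once the top hit's sequence is available as a
plain string, picking out one residue is a single substr call. The sketch
below is illustrative only; the sequence is a placeholder (an AmoA fragment
posted earlier on this list), and $pos and $residue are made-up names.

```perl
use strict;
use warnings;

# Placeholder sequence (83 residues); any protein string retrieved for the
# top hit would be handled the same way.
my $seq = 'HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE';

my $pos = 64;    # 1-based residue position of interest
die "sequence is shorter than $pos residues\n" if length($seq) < $pos;

# substr offsets are 0-based, so position 64 is offset 63.
my $residue = substr($seq, $pos - 1, 1);
print "residue $pos: $residue\n";
```

With a Bio::Seq object the equivalent would be $seq_obj->subseq(64, 64),
which takes 1-based coordinates; blastdbcmd also has a -range option for
extracting part of an entry.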

-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] 
Sent: Tuesday, September 07, 2010 5:09 PM
To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
Subject: Re: [Bioperl-l] Indexing nr database

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>      use strict;
>      use Bio::DB::Flat::BinarySearch;
>
>      (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>      # use single quotes so you don't have to write
>      # regular expressions like "gi\\|(\\d+)"
>      #my $primary_pattern = '^>(\S+)';
>      #if ($fullHeader == 1) {
>         my $primary_pattern = '^>(.+)';
>      #}
>      my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
>      #$string =~ s/$primary_pattern/RRR/g;
>      #print "$string\n";
>
>      # one or more patterns stored in a hash:
>      my $secondary_patterns = {GI => 'gi\|(\d+)'};
>
>      my $db = Bio::DB::Flat::BinarySearch->new(
>                            -directory          => $baseDir,
>                            -dbname             => $dbName,
>                            -write_flag         => 1,
>                            -primary_pattern    => $primary_pattern,
>                            -primary_namespace  => 'ACC',
>                            -secondary_patterns => $secondary_patterns,
>                            -verbose            => 1,
>                            -format             => 'fasta'  );
>
>      $db->build_index($seqFile);
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hrh at fmi.ch  Tue Sep  7 05:09:28 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:09:28 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <4C860148.3030000@fmi.ch>

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>      use strict;
>
>       use Bio::DB::Flat::BinarySearch;
>
>
>
>       (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>
>
>       # use single quotes so you don't have to write
>
>       # regular expressions like "gi\\|(\\d+)"
>
>       #my $primary_pattern = '^>(\S+)';
>
>       #if ($fullHeader == 1) {
>
>          my $primary_pattern = '^>(.+)';
>
>       #}
>
>       my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
> H37Rv complete genome";
> #$string =~ s/$primary_pattern/RRR/g;
>
>       #print "$string\n";
>
>
>
>       # one or more patterns stored in a hash:
>
>       my $secondary_patterns = {GI =>  'gi\|(\d+)'};
>
>
>
>       my $db = Bio::DB::Flat::BinarySearch->new(
>
>                             -directory          =>  $baseDir,
>
>                             -dbname             =>  $dbName,
>
>                             -write_flag         =>  1,
>
>                             -primary_pattern    =>  $primary_pattern,
>
>                             -primary_namespace  =>  'ACC',
>
>                             -secondary_patterns =>  $secondary_patterns,
>
>                             -verbose            =>  1,
>
>                             -format             =>  'fasta'  );
>
>
>
>       $db->build_index($seqFile);
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hrh at fmi.ch  Tue Sep  7 05:33:46 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:33:46 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
	<007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
Message-ID: <4C8606FA.3000509@fmi.ch>


On 09/07/2010 11:18 AM, Ross KK Leung wrote:
> The reason is that I have to retrieve the specific information of the
> matched sequences, e.g. extract the 64th amino acid of the top matched
> sequence. Is there any way to achieve that?

"blastdbcmd" has several options, such as "-range", which restricts the
output to part of a sequence.

Even if "blastdbcmd" cannot return exactly the subset of information you
want, I am still convinced it will be quicker to fetch the complete entry
with "blastdbcmd" and then parse the required data out of that one entry.

Hans
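
To illustrate the suggestion above, here is a minimal Perl sketch that asks
"blastdbcmd" for a single residue. It assumes the blast+ binaries are on the
PATH and that a formatted database (e.g. "nr") is available locally; the
script name and the helper blastdbcmd_args are made up for this example, and
the options used ("-db", "-entry", "-range", "-outfmt") should be checked
against "blastdbcmd -help" for the installed version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for one blastdbcmd call.
# $db is a BLAST database name (e.g. "nr"), $id a gi number or accession,
# $pos a 1-based residue position. (Helper name is hypothetical.)
sub blastdbcmd_args {
    my ($db, $id, $pos) = @_;
    return ('blastdbcmd',
            '-db',     $db,
            '-entry',  $id,
            '-range',  "$pos-$pos",   # a one-residue range
            '-outfmt', '%s');         # print the sequence data only
}

# Usage: perl fetch_residue.pl nr 63021979 64
if (@ARGV == 3) {
    my @cmd = blastdbcmd_args(@ARGV);
    # List form of open: no shell is involved, so ids containing "|" are safe.
    open(my $fh, '-|', @cmd) or die "cannot run blastdbcmd: $!";
    my $residue = <$fh>;
    close($fh) or die "blastdbcmd failed (exit status $?)";
    die "no sequence returned for $ARGV[1]\n" unless defined $residue;
    chomp $residue;
    print "residue $ARGV[2] of $ARGV[1]: $residue\n";
}
```

The id of the top hit would come from the BLAST report itself, so only one
entry is ever pulled out of the pre-indexed database.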

> -----Original Message-----
> From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch]
> Sent: Tuesday, September 07, 2010 5:09 PM
> To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
> Subject: Re: [Bioperl-l] Indexing nr database
>
> Hi
>
>
> why don't you use the pre-indexed BLAST files from NCBI:
>
> ftp://ftp.ncbi.nih.gov/blast/db/
>
> you can use them to fetch individual sequences by gi number or accession
> with the tool "blastdbcmd" from blast+ binaries:
>
> ftp://ftp.ncbi.nih.gov/blast/executables/blast+/
>
>
> regards, Hans
>
>
>
> On 09/07/2010 10:28 AM, Ross KK Leung wrote:
>> By the following codes, I wanna index the 4G nr database, however, the index
>> file is > 1T and the job has been running for weeks and still hasn't
>> finished. Could anybody tell me how you accomplish the goal? Thanks in
>> advance.
>>
>>       use strict;
>>
>>        use Bio::DB::Flat::BinarySearch;
>>
>>
>>
>>        (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>>
>>
>>
>>        # use single quotes so you don't have to write
>>
>>        # regular expressions like "gi\\|(\\d+)"
>>
>>        #my $primary_pattern = '^>(\S+)';
>>
>>        #if ($fullHeader == 1) {
>>
>>           my $primary_pattern = '^>(.+)';
>>
>>        #}
>>
>>        my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
>> H37Rv complete genome";
>> #$string =~ s/$primary_pattern/RRR/g;
>>
>>        #print "$string\n";
>>
>>
>>
>>        # one or more patterns stored in a hash:
>>
>>        my $secondary_patterns = {GI =>   'gi\|(\d+)'};
>>
>>
>>
>>        my $db = Bio::DB::Flat::BinarySearch->new(
>>
>>                              -directory          =>   $baseDir,
>>
>>                              -dbname             =>   $dbName,
>>
>>                              -write_flag         =>   1,
>>
>>                              -primary_pattern    =>   $primary_pattern,
>>
>>                              -primary_namespace  =>   'ACC',
>>
>>                              -secondary_patterns =>   $secondary_patterns,
>>
>>                              -verbose            =>   1,
>>
>>                              -format             =>   'fasta'  );
>>
>>
>>
>>        $db->build_index($seqFile);
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From fs5 at sanger.ac.uk  Tue Sep  7 08:09:52 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 07 Sep 2010 13:09:52 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>

I am working a lot with feature-rich Bio::Seq objects these days and
thought that it would be really nice if I could do something like:

my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');

instead of having to grep for the feature every time. 
There could then be 'by_tag' and 'by_region' options as well. 

According to the Bio::Seq docs, something like this seems to be planned
at some stage. I would be willing to contribute to this feature if I can
and if this isn't already being implemented by somebody else. 
Does anybody know the state of this feature?

Frank


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jason at bioperl.org  Tue Sep  7 13:36:07 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 07 Sep 2010 10:36:07 -0700
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <4C867807.2040907@bioperl.org>

And the implementation would just be something like this?

my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
eq 'my_gene' } $seq->get_SeqFeatures();

I think any implementation would be better if we moved from the in-memory 
arrays & hash-based system to a sqlite db on the back-end for how 
Sequence and Feature objects are stored.
This would be somewhat slower but wouldn't have the performance/memory 
problems we get for sequences with many annotations.

-jason
Frank Schwach wrote, On 9/7/10 5:09 AM:
> I am working a lot with feature-rich Bio::Seq objects these days and
> thought that it would be really nice if I could do something like:
>
> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>
> instead of having to grep for the feature every time.
> There could then be 'by_tag' and 'by_region' options as well.
>
> According to the Bio::Seq docs, something like this seems to be planned
> at some stage. I would be willing to contribute to this feature if I can
> and if this isn't already being implemented by somebody else.
> Does anybody know the state of this feature?
>
> Frank
>
>
>
>
>
>
>    


From fs5 at sanger.ac.uk  Wed Sep  8 04:42:57 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 08 Sep 2010 09:42:57 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <4C867807.2040907@bioperl.org>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Jason,

Yes, I guess that would be the simplest way of doing it - basically just
doing it the way the docs suggest for getting at a specific feature but
hiding the grep behind a Bio::Seq method with search parameters. But we
could also build a hash of feature tags as the Bio::Seq is built so that
retrieval is more efficient. This could also be used to implement a bin
indexing scheme for range queries, similar to what Bio::DB::GFF does.
Is a move to an sqlite backend planned for the near future? 

Frank
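
The "hash of feature tags" idea above could be sketched roughly as follows.
This is only an illustration, not an existing BioPerl API: build_tag_index
and features_by_tag are hypothetical names, and the code assumes nothing
beyond the get_all_tags and get_tag_values methods that Bio::SeqFeatureI
objects provide.

```perl
use strict;
use warnings;

# Build a lookup table: tag name -> tag value -> [features].
# Works on any objects offering get_all_tags/get_tag_values,
# as Bio::SeqFeatureI implementations do.
sub build_tag_index {
    my @features = @_;
    my %index;
    for my $feat (@features) {
        for my $tag ($feat->get_all_tags) {
            for my $value ($feat->get_tag_values($tag)) {
                push @{ $index{$tag}{$value} }, $feat;
            }
        }
    }
    return \%index;
}

# Hypothetical counterpart of get_SeqFeatures(-by_id => 'my_gene'):
# a hash lookup instead of a grep over every feature.
sub features_by_tag {
    my ($index, $tag, $value) = @_;
    my $by_value = $index->{$tag} or return;
    return @{ $by_value->{$value} || [] };
}
```

With a Bio::Seq object $seq, the index would be built once with
build_tag_index($seq->get_SeqFeatures) and then queried with
features_by_tag($index, 'id', 'my_gene'). The trade-off is extra memory for
the index and the need to rebuild or update it whenever features are added.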


On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> And the implementation would just be something like this?
> 
> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
> eq 'my_gene' } $seq->get_SeqFeatures();
> 
> I think any implementation would be if we moved from the in-memory 
> arrays & hash-based system to a sqlite db on the back-end for how 
> Sequence and Feature objects are stored.
> This would be a somewhat slower but wouldn't have performance/memory 
> problems we get for sequences with many annotations.
> 
> -jason
> Frank Schwach wrote, On 9/7/10 5:09 AM:
> > I am working a lot with feature-rich Bio::Seq objects these days and
> > thought that it would be really nice if I could do something like:
> >
> > my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >
> > instead of having to grep for the feature every time.
> > There could then be 'by_tag' and 'by_region' options as well.
> >
> > According to the Bio::Seq docs, something like this seems to be planned
> > at some stage. I would be willing to contribute to this feature if I can
> > and if this isn't already being implemented by somebody else.
> > Does anybody know the state of this feature?
> >
> > Frank
> >
> >
> >
> >
> >
> >
> >    


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From stefan.kirov at bms.com  Wed Sep  8 11:09:55 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 08 Sep 2010 11:09:55 -0400
Subject: [Bioperl-l] Another interesting Javascript library
Message-ID: <4C87A743.5010109@bms.com>


    Sorry for off topic, but I believe a lot of people can
    find this quite useful:

        "CanvasXpress is a javascript library based on the
        <canvas> tag implemented in HTML5. I developed this
        library as the core visualization component for our
        BMS systems biology platform which I hope to release
        soon. The basic idea was to have generic and simple
        way to display genomics data. CanvasXpress supports
        bar graphs, line graphs, bar-line combination
        graphs, boxplots, dotplots, area graphs, stacked
        graphs, percentage-stacked graphs, correlation
        plots, Venn diagrams, heatmaps, newick trees,
        2D-scatter plots, 2D-scatter bubble plots,
        3D-scatter plots, pie charts, networks (or
        pathways), and a genome browser. It also supports a
        few data transformations like log and exponential
        transformation, z-score, percentile transformation
        and ratio. It also support grouping of samples,
        zooming, events ... yada, yada, yada ... and more
        importantly I created an Ext panel for it. Take a look.

        http://canvasxpress.org/"


    Stefan


From alperyilmaz at gmail.com  Wed Sep  8 12:47:42 2010
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Wed, 8 Sep 2010 12:47:42 -0400
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
Message-ID: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>

Hi,

I have a GFF file listing mRNA and CDS coordinates for every
transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
based on that information. I was wondering if there's an already-made
script for that purpose that you're aware of.

I already uploaded the GFF file into Bio::DB::SeqFeature database, so
I can utilize both flat file or database based scripts.

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954


From cjfields at illinois.edu  Wed Sep  8 19:20:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Sep 2010 18:20:09 -0500
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>

Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planend for the near future? 
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
>> eq 'my_gene' } $seq->get_SeqFeatures();
>> 
>> I think any implementation would be if we moved from the in-memory 
>> arrays & hash-based system to a sqlite db on the back-end for how 
>> Sequence and Feature objects are stored.
>> This would be a somewhat slower but wouldn't have performance/memory 
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Thu Sep  9 01:51:53 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 08 Sep 2010 22:51:53 -0700
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
In-Reply-To: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
References: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
Message-ID: <4C8875F9.6020502@bioperl.org>

Hi Alper -

This script operates on gtf so doesn't quite do what you want but could 
be modified to be simpler to just look at the CDS and mRNA rather than 
the exon,start/stop codon info
   
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl

Otherwise I think there may be some easy ways to do this from some 
tools in MAKER too.
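
For the simple case the coordinate arithmetic can be sketched in a few lines
of Perl. This is a rough illustration under stated assumptions, not an
existing script: utr_coords is a hypothetical helper, it takes the mRNA span
and the CDS segments of one transcript (1-based, inclusive, as in GFF), and
because it has no exon features it returns one genomic interval per UTR,
which would still contain introns if the UTR is spliced.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Derive UTR intervals for one transcript from its mRNA span and CDS segments.
# $strand is '+' or '-'; @cds is a list of [start, end] pairs.
sub utr_coords {
    my ($mrna_start, $mrna_end, $strand, @cds) = @_;
    return unless @cds;                       # non-coding: nothing to report
    my $cds_start = min(map { $_->[0] } @cds);
    my $cds_end   = max(map { $_->[1] } @cds);

    # Flank on the low-coordinate side and on the high-coordinate side;
    # undef when the CDS reaches that end of the mRNA.
    my $left  = $cds_start > $mrna_start ? [$mrna_start, $cds_start - 1] : undef;
    my $right = $cds_end   < $mrna_end   ? [$cds_end + 1, $mrna_end]     : undef;

    # On the minus strand the 5' end is the high-coordinate side.
    return $strand eq '-'
        ? (five_prime_UTR => $right, three_prime_UTR => $left)
        : (five_prime_UTR => $left,  three_prime_UTR => $right);
}

my %utr = utr_coords(100, 900, '+', [200, 300], [400, 750]);
printf "5'UTR %d-%d, 3'UTR %d-%d\n",
    @{ $utr{five_prime_UTR} }, @{ $utr{three_prime_UTR} };
# prints: 5'UTR 100-199, 3'UTR 751-900
```

If exon features are available, intersecting each exon with these two
intervals would give the spliced UTR segments instead.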

-jason
Alper Yilmaz wrote, On 9/8/10 9:47 AM:
> Hi,
>
> I have a GFF file listing mRNA and CDS coordinates for every
> transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
> based on that information. I was wondering, if there's already made
> script for that purpose that you're aware of.
>
> I already uploaded the GFF file into Bio::DB::SeqFeature database, so
> I can utilize both flat file or database based scripts.
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>    


From fs5 at sanger.ac.uk  Thu Sep  9 04:10:36 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 09:10:36 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk>

so something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features and that would then be
sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan? Is there any way I
can get involved or is it better to wait for other features to be
developed first?

Cheers,

Frank


On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote:
> Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
> 
> chris
> 
> On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:
> 
> > Hi Jason,
> > 
> > Yes, I guess that would be the simplest way of doing it - basically just
> > doing it the way the docs suggest for getting at a specific feature but
> > hiding the grep behind a Bio::Seq method with search parameters. But we
> > could also build a hash of feature tags as the Bio::Seq is built so that
> > retrieval is more efficient. This could also be used to implement a bin
> > indexing scheme for range queries, similar to what Bio::DB::GFF does.
> > Is a move to an sqlite backend planend for the near future? 
> > 
> > Frank
> > 
> > 
> > 
> > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> >> And the implementation would just be something like this?
> >> 
> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
> >> eq 'my_gene' } $seq->get_SeqFeatures();
> >> 
> >> I think any implementation would be if we moved from the in-memory 
> >> arrays & hash-based system to a sqlite db on the back-end for how 
> >> Sequence and Feature objects are stored.
> >> This would be a somewhat slower but wouldn't have performance/memory 
> >> problems we get for sequences with many annotations.
> >> 
> >> -jason
> >> Frank Schwach wrote, On 9/7/10 5:09 AM:
> >>> I am working a lot with feature-rich Bio::Seq objects these days and
> >>> thought that it would be really nice if I could do something like:
> >>> 
> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >>> 
> >>> instead of having to grep for the feature every time.
> >>> There could then be 'by_tag' and 'by_region' options as well.
> >>> 
> >>> According to the Bio::Seq docs, something like this seems to be planned
> >>> at some stage. I would be willing to contribute to this feature if I can
> >>> and if this isn't already being implemented by somebody else.
> >>> Does anybody know the state of this feature?
> >>> 
> >>> Frank
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> > 
> > 
> > 
> > -- 
> > The Wellcome Trust Sanger Institute is operated by Genome Research 
> > Limited, a charity registered in England with number 1021457 and a 
> > company registered in England with number 2742969, whose registered 
> > office is 215 Euston Road, London, NW1 2BE. 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jun.yin at ucd.ie  Thu Sep  9 04:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a
package similar to Bio::LocatableSeq). The idea is to save the index of
sequences to a local database (AnyDBM) instead of keeping it in memory,
which should reduce memory usage. This idea actually comes from
Bio::DB::Fasta, as implemented by Lincoln Stein.
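
A minimal sketch of the on-disk index idea, using only core Perl modules.
It is loosely modelled on the approach described above rather than on the
actual Bio::DB::Fasta code; index_fasta and fetch_record are hypothetical
names, and only the byte offset of each FASTA header is stored.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY SEEK_SET);
use AnyDBM_File;

# Record the byte offset of every FASTA header in a DBM file on disk,
# keyed by the first word of the header line.
sub index_fasta {
    my ($fasta, $dbm) = @_;
    tie my %offset, 'AnyDBM_File', $dbm, O_RDWR | O_CREAT, 0644
        or die "cannot open index $dbm: $!";
    open my $in, '<', $fasta or die "cannot read $fasta: $!";
    my $pos = tell $in;
    while (my $line = <$in>) {
        $offset{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell $in;                     # offset of the next line
    }
    close $in;
    untie %offset;
    return;
}

# Look one id up in the DBM file and read just that record from the FASTA file.
sub fetch_record {
    my ($fasta, $dbm, $id) = @_;
    tie my %offset, 'AnyDBM_File', $dbm, O_RDONLY, 0644
        or die "cannot open index $dbm: $!";
    my $pos = $offset{$id};
    untie %offset;
    return unless defined $pos;

    open my $in, '<', $fasta or die "cannot read $fasta: $!";
    seek $in, $pos, SEEK_SET or die "cannot seek in $fasta: $!";
    my $record = <$in>;                      # header line
    while (my $line = <$in>) {
        last if $line =~ /^>/;               # next record starts here
        $record .= $line;
    }
    close $in;
    return $record;
}
```

Only the ids and offsets live in the DBM file, so memory use stays flat
however many sequences are indexed. The same pattern could hold feature bins
instead of sequence offsets.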

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, September 09, 2010 12:20 AM
To: Frank Schwach
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Seq, search for specific features

Well, no move has been concretely made yet.  It would be nice to abstract
the backend, so one could use possibly any db or memory adaptor.  This is
essentially the direction I would like to take the alignment data as well
(part of the GSoC project for BioPerl this year was to tackle this very
thing).

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planend for the near future? 
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
>> eq 'my_gene' } $seq->get_SeqFeatures();
>> 
>> I think any implementation would be if we moved from the in-memory 
>> arrays & hash-based system to a sqlite db on the back-end for how 
>> Sequence and Feature objects are stored.
>> This would be a somewhat slower but wouldn't have performance/memory 
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From s1012635 at student.hsleiden.nl  Thu Sep  9 05:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi, 

I am a bioinformatics student working on a new project. For this project I need to get the TMHMM predictions for a list of proteins (in FASTA format). 
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl needed to get this package to work. So far we have had courses in Python and Java, not in Perl. 

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
use Bio::Tools::Tmhmm; 
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
while( my $tmhmm_feat = $parser->next_result ) { 
#do something 
#eg 
push @tmhmm_feat, $tmhmm_feat; 
} 

How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output? 

cheers! 

Stefan Lelieveld 


From fs5 at sanger.ac.uk  Thu Sep  9 06:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it was run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm

Follow the synopsis to feed the tool with your sequences. You can learn
how to read a FASTA file and access each sequence in a loop here: 
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file   => "<$file",-format => 'FASTA' );
while (my $seq = $inseq->next_seq) {
  print $seq->accession_number,"\n";
}

as an example for printing out accession numbers from $seq, which is a Bio::Seq object. 
So what you have to do now is to feed each of those Bio::Seq objects into your TMHMM runner. 

Frank
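
On the "how do I save the output" part: once TMHMM has been run and its
report is in a file, the parsed features can be written out as a plain table.
A minimal sketch, assuming the report is in a format Bio::Tools::Tmhmm can
read and that the returned features carry a seq_id. The script name and
write_tmhmm_table are made up for this example.

```perl
use strict;
use warnings;

# Write one tab-separated line per predicted region.
# $parser is anything with next_result() returning feature objects,
# e.g. a Bio::Tools::Tmhmm parser; $out is an open output filehandle.
sub write_tmhmm_table {
    my ($parser, $out) = @_;
    my $count = 0;
    while (my $feat = $parser->next_result) {
        print {$out} join("\t", $feat->seq_id, $feat->primary_tag,
                                $feat->start,  $feat->end), "\n";
        $count++;
    }
    return $count;
}

# Usage: perl tmhmm2table.pl tmhmm_output.txt regions.tsv
if (@ARGV == 2) {
    require Bio::Tools::Tmhmm;               # loaded only when actually parsing
    my ($tmhmm_file, $table_file) = @ARGV;
    my $parser = Bio::Tools::Tmhmm->new(-file => $tmhmm_file);
    open my $out, '>', $table_file or die "cannot write $table_file: $!";
    my $n = write_tmhmm_table($parser, $out);
    close $out or die "cannot close $table_file: $!";
    print "wrote $n regions to $table_file\n";
}
```

The resulting file has one line per region (sequence id, region type, start,
end) and can be opened in a spreadsheet or read back with a simple split on
tabs.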


On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi, 
> 
> I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format). 
> I came across the Bio::Tools::TMHMM; package for BioPerl which looked promesing. The problem is I lack the advanced knowlegde of perl to get this package to work. So far we had courses in Python and Java not in Perl. 
> 
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output? 
> 
> cheers! 
> 
> Stefan Lelieveld 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From kai.blin at biotech.uni-tuebingen.de  Thu Sep  9 06:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
	<814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" <s1012635 at student.hsleiden.nl> wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output? 

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some fasta
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = new Bio::Tools::Tmhmm(-file => "input.txt");
"""
will parse the TMHMM output for you.
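
For completeness, the whole loop might look something like the sketch below. It assumes TMHMM has already been run by hand and its output saved to a file, that BioPerl is installed, and that each result behaves like an ordinary Bio::SeqFeatureI object. The helper name tm_features and the file names in the usage comment are made up for illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Tmhmm;

# Collect the features reported in a saved TMHMM output file.
# Returns a list of hash references, one per feature.
# (tm_features is an illustrative name, not a BioPerl function.)
sub tm_features {
    my ($file) = @_;
    my $parser = Bio::Tools::Tmhmm->new(-file => $file);
    my @rows;
    while (my $feat = $parser->next_result) {
        # Each result is a sequence feature object.
        push @rows, {
            seq_id => $feat->seq_id,
            tag    => $feat->primary_tag,
            start  => $feat->start,
            end    => $feat->end,
        };
    }
    return @rows;
}

# Usage (file names are hypothetical):
#   perl tm_features.pl tmhmm_output.txt > features.tsv
if (@ARGV) {
    for my $row (tm_features($ARGV[0])) {
        print join("\t", @{$row}{qw(seq_id tag start end)}), "\n";
    }
}
```

Redirecting the printed table to a file, as in the usage comment, is probably the simplest way to "save the output".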

HTH,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From elanorbust2 at yahoo.com  Thu Sep  9 12:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test of StandAloneBlastPlus but getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list comprises the first four sequences found in the database. The results say it cannot find the first two, and the matches for the last two do not exist!

Thanks for any help! 


Program 


#!/usr/bin/perl

use Bio::Tools::Run::StandAloneBlastPlus;


$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create => 1
);

$result = $fac->blastn( -query => 'sequences.fasta',
                        -outfile => 'ITStest2.bls');


smallDB.fas Data

>302585252|HM807352|Waitea circinata  internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>302585252|HM807352|Waitea circinata  internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA

>302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA

>301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA
CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA

>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC

>301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer 1
AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC

>301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG

>301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG


sequences.fasta data

>Test1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>Test2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>Test3
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>Test4
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA


Results

BLASTN 2.2.24+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.


Database: ITS
           5 sequences; 1,102 total letters


Query=  Test1
Length=204


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 202071


Query=  Test2
Length=192


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 189507


Query=  Test3
Length=437
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G59F
Length=203

 Score =  300 bits (162),  Expect = 2e-085
 Identities = 176/182 (96%), Gaps = 4/182 (2%)
 Strand=Plus/Plus

Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
            ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81

Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141

Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201

Query  187  GG  188
            ||
Sbjct  202  GG  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 6e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 4e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 442850


Query=  Test4
Length=521
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G59F
Length=203

 Score =  309 bits (167),  Expect = 4e-088
 Identities = 177/181 (97%), Gaps = 3/181 (1%)
 Strand=Plus/Plus

Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
            ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82

Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142

Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202

Query  184  G  184
            |
Sbjct  203  G  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 7e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial 
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 5e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 530378


  Database: ITS
    Posted date:  Aug 27, 2010  9:43 AM
  Number of letters in database: 1,102
  Number of sequences in database:  5


Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5


From jaya1786 at gmail.com  Thu Sep  9 12:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>

Respected sir/madam,

I am Jayanthi Jayakumar, in the second year of my MS (by Research) in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

I request you to provide details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar


From Russell.Smithies at agresearch.co.nz  Thu Sep  9 18:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
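
One rough way to check the file before formatting is sketched below in plain Perl (no BioPerl needed). It treats the first whitespace-delimited word after ">" as the sequence ID, which is a common convention but is only assumed here, not checked against what the BLAST+ formatter actually does. The sub name check_fasta_headers is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan a nucleotide FASTA file and return a list of warnings about
# lines that look like broken or repeated headers.
# (check_fasta_headers is an illustrative name.)
sub check_fasta_headers {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my (%seen, @problems);
    while (my $line = <$in>) {
        my $n = $.;                      # current line number
        $line =~ s/\r?\n\z//;
        next if $line =~ /^\s*$/;        # skip blank lines
        if ($line =~ /^>(\S*)/) {
            my $id = $1;
            if ($id eq '') {
                push @problems, "line $n: empty ID";
            }
            elsif ($seen{$id}++) {
                push @problems, "line $n: duplicate ID $id";
            }
        }
        elsif ($line !~ /^[ACGTUNRYKMSWBDHV*-]+$/i) {
            # Neither a header nor plain nucleotide codes: possibly
            # the tail of a header that was wrapped onto a new line.
            push @problems, "line $n: unexpected text '$line'";
        }
    }
    close $in;
    return @problems;
}

# Usage: perl check_fasta.pl smallDB.fas
if (@ARGV) {
    print "$_\n" for check_fasta_headers($ARGV[0]);
}
```

If it prints nothing, the headers are at least not wrapped or repeated in the ways this sketch looks for.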


Russell Smithies

Technical Support
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz
Invermay  Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T  +64 3 489 3809
F  +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test of StandAloneBlastPlus but getting data back that
> does not exist in my query or my local database. Below is an outline of my
> script, small database, query list, and erroneous results. As you will
> notice, the query list comprises the first four sequences found in
> the database. The results say it cannot find the first two, and the
> matches for the last two do not exist!
>
> Thanks for any help!
>
>
>
> Program
>
>
> #!/usr/bin/perl
>
> use Bio::Tools::Run::StandAloneBlastPlus;
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_name => 'ITS',
>   -db_data => 'smallDB.fas',
>   -create => 1
> );
>
> $result = $fac->blastn( -query => 'sequences.fasta',
>                         -outfile => 'ITStest2.bls');
>
>
> smallDB.fas Data
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> >302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
> ATATCATTAAAGCGGAGGAA
>
> >301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
> ACGCA
> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>
> >301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
> 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>
> >301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
> 1
> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>
> >301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>
> >301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>
>
>
> sequences.fasta data
>
> >Test1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >Test2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >Test3
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >Test4
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
>
>
>
> Results
>
> BLASTN 2.2.24+
>
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
>
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
>
>
> Query=  Test1
> Length=204
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 202071
>
>
> Query=  Test2
> Length=192
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 189507
>
>
> Query=  Test3
> Length=437
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    4e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC
> 66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC
> 81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
> 126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
> 141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
> 186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
> 201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 442850
>
>
> Query=  Test4
> Length=521
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    5e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA
> 63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA
> 82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
> 123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
> 142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
> 183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
> 202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 530378
>
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
>
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From elanorbust2 at yahoo.com  Fri Sep 10 11:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking though!

--- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
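
One way to test the wrapped-header idea before rebuilding the database is to scan the FASTA file for lines that are neither headers nor sequence. The following is only a minimal sketch, not part of BioPerl: the subroutine name suspicious_fasta_lines and the heuristic itself are assumptions made for illustration, and it expects nucleotide data using IUPAC codes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns [line_number, reason]
# pairs for lines that look like damaged or wrapped FASTA headers.
sub suspicious_fasta_lines {
    my ($text) = @_;
    my @bad;
    my ($n, $prev_was_header) = (0, 0);
    for my $line (split /\n/, $text) {
        $n++;
        next if $line =~ /^\s*$/;    # skip blank separator lines
        if ($line =~ /^>/) {
            # Two headers in a row means the first record has no sequence.
            push @bad, [$n, 'header follows a header with no sequence between']
                if $prev_was_header;
            $prev_was_header = 1;
            next;
        }
        $prev_was_header = 0;
        # Accept lines made only of IUPAC nucleotide codes or gap characters.
        next if $line =~ /^[ACGTUNRYKMSWBDHV-]+$/i;
        push @bad, [$n, 'neither a header nor nucleotide sequence'];
    }
    return @bad;
}

# Example input modelled on the wrapped header shown above.
my $example = <<'FASTA';
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer
1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCC
FASTA

for my $hit (suspicious_fasta_lines($example)) {
    printf "line %d: %s\n", @$hit;
}
```

A header continuation made up entirely of letters such as A, C, G or T would slip through this check, so treat a clean result as a hint rather than proof that the file is well formed.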


Russell Smithies

Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz



From David.Messina at sbc.su.se  Fri Sep 10 12:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID: <CFF0CF54-E5BD-4FD1-990D-E3AC219FBF5B@sbc.su.se>

Hi Sally,

Did you run the same search on the command line, outside of BioPerl?

The issue you're having may be with Blast+ and not BioPerl. For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches.
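
As a rough sketch of that check, the snippet below builds a direct blastn command line with low-complexity (DUST) filtering switched off, so the result can be compared with what the BioPerl wrapper produced. The subroutine name blastn_command and the output file name are made up for illustration. The flags -query, -db, -dust and -out are standard blastn options, and DUST is the default filter that applies to nucleotide searches. The database and query names are taken from the script quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the argument list for a direct blastn run
# with DUST low-complexity filtering disabled.
sub blastn_command {
    my (%opt) = @_;
    return (
        'blastn',
        '-query', $opt{query},
        '-db',    $opt{db},
        '-dust',  'no',
        '-out',   $opt{out},
    );
}

my @cmd = blastn_command(
    query => 'sequences.fasta',
    db    => 'ITS',
    out   => 'ITStest2_cli.bls',    # assumed output name, not from the thread
);

print join(' ', @cmd), "\n";

# To actually run the search (requires BLAST+ on the PATH), uncomment:
# system(@cmd) == 0 or die "blastn failed: $?";
```

If the command-line run shows the same unexpected hits, the wrapper is probably not at fault and the database files themselves are worth inspecting.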


Dave


On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking though!
> 
> --- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
> 
> From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
> Subject: RE: [Bioperl-l] standaloneblastplus
> To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
> Date: Thursday, September 9, 2010, 6:54 PM
> 
> Is that a typo in your email or are some of your fasta headers in your db incorrect?
> Eg.
>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed
>> 301333052|GU725063|spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Shouldn't that be:
>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Maybe the invalid fasta headers are breaking the db formatter?
> 
> 
> Russell Smithies
> 
> Technical Support
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of sally roberts
>> Sent: Friday, 10 September 2010 4:10 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] standaloneblastplus
>> 
>> I am running a test for standaloneblastplus but getting data back that
>> does not exist in my query or my local database. Below is an outline of my
>> script, small database, query list, and erroneous results. As you will
>> notice, the query list comprises the first four sequences found in
>> the database. The results say it cannot find the first two, and the
>> matches for the last two do not exist!
>> 
>> Thanks for any help!
>> 
>> 
>> 
>> Program
>> 
>> 
>> #!/usr/bin/perl
>> 
>> use Bio::Tools::Run::StandAloneBlastPlus;
>> 
>> 
>> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>>    -db_name => 'ITS',
>>    -db_data => 'smallDB.fas',
>>    -create => 1
>> );
>> 
>> $result = $fac->blastn( -query => 'sequences.fasta',
>>                          -outfile => 'ITStest2.bls');
>> 
>> 
>> smallDB.fas Data
>> 
>>> 302585252|HM807352|Waitea circinata  internal transcribed spacer 1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>>> 302585252|HM807352|Waitea circinata  internal transcribed spacer 2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>>> 302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>>> 302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>>> 302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
>> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
>> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
>> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
>> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
>> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
>> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
>> ATATCATTAAAGCGGAGGAA
>> 
>>> 301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
>> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
>> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
>> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
>> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
>> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
>> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
>> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
>> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
>> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
>> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
>> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
>> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
>> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
>> ACGCA
>> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
>> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
>> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>> 
>>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
>> 1
>> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
>> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
>> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
>> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
>> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
>> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
>> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
>> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
>> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
>> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
>> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
>> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
>> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>> 
>>> 301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
>> 1
>> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
>> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
>> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
>> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
>> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
>> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
>> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
>> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
>> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
>> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
>> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
>> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
>> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>> 
>>> 301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
>> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
>> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
>> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
>> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
>> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
>> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
>> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
>> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
>> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
>> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
>> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
>> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
>> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>> 
>>> 301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
>> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
>> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
>> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
>> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
>> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
>> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
>> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
>> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
>> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
>> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
>> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
>> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>> 
>> 
>> 
>> sequences.fasta data
>> 
>>> Test1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>>> Test2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>>> Test3
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>>> Test4
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>> 
>> 
>> 
>> Results
>> 
>> BLASTN 2.2.24+
>> 
>> 
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
>> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
>> Comput Biol 2000; 7(1-2):203-14.
>> 
>> 
>> 
>> Database: ITS
>>             5 sequences; 1,102 total letters
>> 
>> 
>> 
>> Query=  Test1
>> Length=204
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 202071
>> 
>> 
>> Query=  Test2
>> Length=192
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 189507
>> 
>> 
>> Query=  Test3
>> Length=437
>> 
>> Score     E
>> Sequences producing significant alignments:
>> (Bits)  Value
>> 
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 300    2e-085
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 69.4    6e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 58.4    1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 56.5    4e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  300 bits (162),  Expect = 2e-085
>>   Identities = 176/182 (96%), Gaps = 4/182 (2%)
>>   Strand=Plus/Plus
>> 
>> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC
>> 66
>>              ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC
>> 81
>> 
>> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
>> 126
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT
>> 141
>> 
>> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
>> 186
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT
>> 201
>> 
>> Query  187  GG  188
>>              ||
>> Sbjct  202  GG  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 6e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 4e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 442850
>> 
>> 
>> Query=  Test4
>> Length=521
>> 
>> Score     E
>> Sequences producing significant alignments:
>> (Bits)  Value
>> 
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 309    4e-088
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 69.4    7e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 58.4    1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
>> 56.5    5e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  309 bits (167),  Expect = 4e-088
>>   Identities = 177/181 (97%), Gaps = 3/181 (1%)
>>   Strand=Plus/Plus
>> 
>> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA
>> 63
>>              ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA
>> 82
>> 
>> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
>> 123
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG
>> 142
>> 
>> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
>> 183
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG
>> 202
>> 
>> Query  184  G  184
>>              |
>> Sbjct  203  G  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 7e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 5e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 530378
>> 
>> 
>>    Database: ITS
>>      Posted date:  Aug 27, 2010  9:43 AM
>>    Number of letters in database: 1,102
>>    Number of sequences in database:  5
>> 
>> 
>> 
>> Matrix: blastn matrix 1 -2
>> Gap Penalties: Existence: 0, Extension: 2.5
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jun.yin at ucd.ie  Sat Sep 11 12:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
References: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC is already finished this year. You can check the information here:
http://socghop.appspot.com/gsoc/program/home/google/gsoc2010

However, you can still contribute to the BioPerl project if you like. You
can talk to people on this mailing list, or you can join the IRC channel
(http://www.bioperl.org/wiki/IRC).

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar
Sent: Thursday, September 09, 2010 6:00 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Regarding GSoC 2010

Respected sir/madam,

I am Jayanthi Jayakumar doing my second year MS(By Research) in
computational biology in Anna University Chennai,India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

I request you to provide the details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From david.breimann at gmail.com  Sun Sep 12 09:16:29 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sun, 12 Sep 2010 15:16:29 +0200
Subject: [Bioperl-l] Circular genomes
Message-ID: <AANLkTi=LgWkUVeUd6WdaoDZQBTJMm7aCRe86p2-+R9PE@mail.gmail.com>

Hello,

As continuation to
http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I
would like to ask:
Has the fix been implemented yet? That is, do GFF3 files created for circular
genomes now comply with the GFF3 spec for such genomes?
I find it difficult to keep track using git, so I'm not sure whether this
has already been handled.

Also, will the start and end coordinates of such genes loaded from a GFF3
file be "normal" (i.e. no coordinate larger than the size of the
genome), or just as written in the GFF3 (which demands that end > start, even
if that makes end > genome length)?
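
To make the two conventions concrete, here is a minimal sketch. As far as I
understand the GFF3 spec, a feature that crosses the origin of a circular
landmark is written with end = true end + landmark length. The function name
normalize_circular and the example numbers are made up for illustration; this
says nothing about what BioPerl itself currently does on load.

```perl
use strict;
use warnings;

# Map a GFF3-style (start, end) pair on a circular landmark of length $len
# back to coordinates within 1..$len.
# Returns (start, end, wraps), where wraps is 1 if the feature crosses the origin.
sub normalize_circular {
    my ($start, $end, $len) = @_;
    die "GFF3 requires start <= end\n" if $start > $end;
    return ($start, $end, 0) if $end <= $len;    # ordinary feature
    return ($start, $end - $len, 1);             # feature wraps past the origin
}

# Hypothetical gene on a 6407 bp circular genome, written GFF3-style.
my ($s, $e, $wraps) = normalize_circular(6400, 6510, 6407);
print "start=$s end=$e wraps=$wraps\n";   # start=6400 end=103 wraps=1
```

With the "normal" representation the end can be smaller than the start, so
the wraps flag (or an equivalent) has to be kept somewhere to tell a wrapped
feature apart from bad input.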

Thanks,
David


From David.Messina at sbc.su.se  Mon Sep 13 11:10:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 13 Sep 2010 17:10:42 +0200
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se>

Hi everyone,

I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.

The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.

It's already part of bioperl-live, and you can also get it here:

	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl


Dave


From maj at fortinbras.us  Mon Sep 13 12:47:45 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 13 Sep 2010 16:47:45 +0000
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <W964423161995101284396465@webmail24>

Dear Scott-
You rock!
Sincerely, 
Mark

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Monday, September 13, 2010 11:10 AM
>To: 'BioPerl List'
>Subject: [Bioperl-l] BioPerl net installer
>
>Hi everyone,
>
>I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.
>
>The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.
>
>It's already part of bioperl-live, and you can also get it here:
>
>	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Mon Sep 13 17:15:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 13 Sep 2010 16:15:45 -0500
Subject: [Bioperl-l] BioPerl net installer
In-Reply-To: <W964423161995101284396465@webmail24>
References: <W964423161995101284396465@webmail24>
Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu>

Ditto!  

chris (briefly resurfacing)

On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote:

> Dear Scott-
> You rock!
> Sincerely, 
> Mark
> 
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Monday, September 13, 2010 11:10 AM
>> To: 'BioPerl List'
>> Subject: [Bioperl-l] BioPerl net installer
>> 
>> Hi everyone,
>> 
>> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.
>> 
>> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.
>> 
>> It's already part of bioperl-live, and you can also get it here:
>> 
>> 	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl
>> 
>> 
>> 
>> Dave
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From timmcilveen at talktalk.net  Mon Sep 13 19:07:00 2010
From: timmcilveen at talktalk.net (tim)
Date: Tue, 14 Sep 2010 00:07:00 +0100
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
Message-ID: <201009140007.00798.timmcilveen@talktalk.net>

Hi,
I have just installed Bioperl on my Linux system using the CPAN install. The 
install summary is as follows:
Test Summary Report
-------------------
t/RemoteDB/GenPept.t                       (Wstat: 256 Tests: 21 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
t/RemoteDB/Query/GenBank.t                 (Wstat: 256 Tests: 18 Failed: 1)
  Failed test:  9
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 21 tests but ran 18.
t/RemoteDB/Taxonomy.t                      (Wstat: 512 Tests: 103 Failed: 2)
  Failed tests:  15, 98
  Non-zero exit status: 2
t/Root/RootIO.t                            (Wstat: 7424 Tests: 30 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 31 tests but ran 30.
Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr  0.91 sys + 156.68 cusr  
9.16 csys = 172.94 CPU)
Result: FAIL
Failed 4/329 test programs. 4/18407 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO

Is Bioperl properly installed? During the install process I was getting quite
a lot of this warning (hundreds of instances):
'replacement list longer than search list'. This happened with t/tools, t/seq,
t/search and many others.

Any advice would be great.
Tim


From David.Messina at sbc.su.se  Tue Sep 14 03:56:33 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 14 Sep 2010 09:56:33 +0200
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net>
References: <201009140007.00798.timmcilveen@talktalk.net>
Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se>

Hi Tim,

Thanks for your report.


> Is Bioperl properly installed?

No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line:

>  make test had returned bad status, won't install without force


As for the error(s)

> 'replacement list longer than search list'

I believe this was fixed a couple of months ago. For details, see:

	http://bugzilla.open-bio.org/show_bug.cgi?id=3116


So I would recommend that you grab the latest copy of bioperl-live from github, wherein the bug will be fixed:

	http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots


Give that a shot and let us know how it goes.


Dave


From jskittrell at unmc.edu  Thu Sep 16 12:15:49 2010
From: jskittrell at unmc.edu (Jeff Kittrell)
Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC)
Subject: [Bioperl-l] mpiblast
Message-ID: <loom.20100916T180946-13@post.gmane.org>

Does Bioperl work with mpiblast? Is there a standalone-like module that
allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast
output file correctly?

Thanks for any help,

Jeff 


From David.Messina at sbc.su.se  Thu Sep 16 14:25:57 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 16 Sep 2010 20:25:57 +0200
Subject: [Bioperl-l] mpiblast
In-Reply-To: <loom.20100916T180946-13@post.gmane.org>
References: <loom.20100916T180946-13@post.gmane.org>
Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>

> Is the there a standalone like module that allows you to easily call mpiblast?

No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.

	http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase


> I'm assuming seqio with parse a mpiblast output file correctly?

Yes, although I see that a new version of mpiblast was recently released. 

Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?


Dave


From shalabh.sharma7 at gmail.com  Thu Sep 16 17:38:14 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 16 Sep 2010 17:38:14 -0400
Subject: [Bioperl-l] IUPAC code similarity
Message-ID: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>

Hi All,
      I have a few nucleotide sequences that are composed of IUPAC codes, like:
>test
VGSRVBSSSSSNSC

Similarly, I have a database made up of these kinds of sequences. I want to
find sequences that are 100% similar to the query sequence.

Is there any BioPerl module to deal with this? I tried normal blast but it
didn't work.
Do I have to convert these sequences to 4-base codes, or is there any other
way out?

Thanks
Shalabh


From amackey at virginia.edu  Fri Sep 17 10:28:15 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 10:28:15 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
Message-ID: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>

Convert the IUPAC code to a regular expression, and use regular expressions
(in Perl or grep or similar) to find 100% identical matches.
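
A minimal sketch of that idea in plain Perl, assuming the database contains
only A/C/G/T. The helper name iupac_to_regex and the two database sequences
are invented for illustration; the code table is the standard IUPAC
nucleotide alphabet.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Turn an IUPAC string into regex source: one character class per
# ambiguous position, a plain letter for unambiguous positions.
sub iupac_to_regex {
    my ($seq) = @_;
    my $re = '';
    for my $code (split //, uc $seq) {
        exists $IUPAC{$code} or die "Unknown IUPAC code '$code'\n";
        my $bases = $IUPAC{$code};
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

my $pattern = iupac_to_regex('VGSRVBSSSSSNSC');

# Hypothetical database sequences; only the first is compatible with the query.
my @db = ('AGCAACGCGCGTGC', 'TGCAACGCGCGTGC');
for my $subject (@db) {
    print "$subject\n" if $subject =~ /^$pattern$/;   # prints AGCAACGCGCGTGC
}
```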

-Aaron

On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Hi All,
>      I have few nucleotide sequences that are composed of IUPAC codes. Like
> >test
> VGSRVBSSSSSNSC
>
> Similarly i have a database made of of these kind of sequences. I want to
> find sequences that are 100% similar to the query sequence.
>
> Is there any bioPerl module to deal with this, i tried normal blast but it
> didn't worked.
> Do i have to convert these sequences to 4 base codes or there is any other
> way out.
>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From shalabh.sharma7 at gmail.com  Fri Sep 17 11:07:38 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 11:07:38 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>

Thanks Aaron for your reply.
Actually I tried that first, but there is another problem: I have to divide
each query sequence into windows of size 5 with a 1-base shift, and it's not
possible to divide a regular expression in that way.
So what I am trying is to convert those IUPAC codes to 4-base code sequences
and then do the normal search.
Now the problem is that I am not able to convert those IUPAC sequences to
normal ones; I am still trying to write a script, but it's taking time.
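
One possible way around this, sketched in plain Perl: cut the windows out of
the IUPAC string first, and only then expand each window, rather than trying
to split a finished regular expression. The helper names windows and
expand_iupac are invented for this sketch. Note that the number of expanded
sequences grows multiplicatively with each ambiguous position.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# All windows of $size codes, shifted by one position each time.
sub windows {
    my ($seq, $size) = @_;
    return map { substr($seq, $_, $size) } 0 .. length($seq) - $size;
}

# Every concrete A/C/G/T sequence that an IUPAC string can stand for.
sub expand_iupac {
    my ($seq) = @_;
    my @out = ('');
    for my $code (split //, uc $seq) {
        exists $IUPAC{$code} or die "Unknown IUPAC code '$code'\n";
        my @bases = split //, $IUPAC{$code};
        # Extend every partial sequence built so far with each possible base.
        @out = map { my $prefix = $_; map { $prefix . $_ } @bases } @out;
    }
    return @out;
}

for my $window (windows('VGSRVBSSSSSNSC', 5)) {
    my @concrete = expand_iupac($window);
    printf "%s -> %d concrete sequences\n", $window, scalar @concrete;
}
```

For the example query this gives 10 windows; the first one, VGSRV, expands
to 3 x 1 x 2 x 2 x 3 = 36 concrete 5-mers.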

Thanks
Shalabh


On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu> wrote:

> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <shalabh.sharma7 at gmail.com
> > wrote:
>
>> Hi All,
>>      I have few nucleotide sequences that are composed of IUPAC codes.
>> Like
>> >test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>


From roy.chaudhuri at gmail.com  Fri Sep 17 11:04:28 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 17 Sep 2010 16:04:28 +0100
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <4C93837C.4080008@gmail.com>

Hi Shalabh,

The expand method in Bio::Tools::SeqPattern may be useful to convert 
IUPAC codes to regular expressions:

$perl -e 'use Bio::Tools::SeqPattern; print 
Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand'
[ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C

Although that won't work if there are also ambiguity codes in your 
database. For a non-BioPerl solution you could try fuzznuc from Emboss.

Cheers.
Roy.

On 17/09/2010 15:28, Aaron Mackey wrote:
> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
> <shalabh.sharma7 at gmail.com>wrote:
>
>> Hi All,
>>       I have few nucleotide sequences that are composed of IUPAC codes. Like
>>> test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From david.breimann at gmail.com  Fri Sep 17 14:13:22 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 17 Sep 2010 20:13:22 +0200
Subject: [Bioperl-l] Installing using git after an older installation
Message-ID: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>

Hello,

I'm sharing a server with some other lab members.
I would like to install the latest version of bioperl for my own use,
without affecting my colleagues.
I used git to clone a copy of bioperl-live and exported
  PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB".
Now
  perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
returns
  1.0069

My question is: is that all?
Now I'm using the latest version?
Should I include anything special in my scripts?

Also, what about all the bp_***.pl scripts?
Are they now using the latest version, too?
I guess not, since I didn't build anything. So what should I do about them?

Thanks,
Dave


From amackey at virginia.edu  Fri Sep 17 15:24:44 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:24:44 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <4C93837C.4080008@gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<4C93837C.4080008@gmail.com>
Message-ID: <AANLkTi=Nx5fQ=YvxzB4mSsJh+38B-2+87Lvf=WHU_rCH@mail.gmail.com>

If there are ambiguity codes in the database, then the expanded character
class also has to include the original ambiguity code; non-ambiguous
nucleotides must likewise be expanded to include all ambiguity codes that
represent the nucleotide.
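
A minimal sketch of that expansion in plain Perl, assuming DNA only and
assuming one possible matching rule: a database letter is accepted at a
position whenever it shares at least one base with the query letter. The
helper names compatible_class and ambiguity_aware_regex are made up for
this illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for (DNA only).
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: character class of every code that shares at least
# one base with $code (the assumed matching rule described above).
sub compatible_class {
    my ($code) = @_;
    my $bases = $IUPAC{ uc $code };
    die "Unknown IUPAC code: $code\n" unless defined $bases;
    my @hits = grep {
        my $other = $IUPAC{$_};
        scalar( grep { index($other, $_) >= 0 } split //, $bases );
    } sort keys %IUPAC;
    return '[' . join('', @hits) . ']';
}

# Hypothetical helper: regex for a whole query, one class per position.
sub ambiguity_aware_regex {
    my ($query) = @_;
    my $pattern = join '', map { compatible_class($_) } split //, $query;
    return qr/$pattern/;
}

# Small demonstration against three made-up database strings.
my $re = ambiguity_aware_regex('GS');
for my $db_seq (qw(RC GG AT)) {
    my $verdict = $db_seq =~ /\A$re\z/ ? 'matches' : 'does not match';
    print "$db_seq $verdict\n";
}
```

If "100% similar" should mean something stricter (for example, the
database code must cover exactly the same bases), only the test inside
compatible_class would need to change.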

-Aaron

On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:

> Hi Shalabh,
>
> The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC
> codes to regular expressions:
>
> $perl -e 'use Bio::Tools::SeqPattern; print
> Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand'
> [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C
>
> However, that won't work if there are also ambiguity codes in your database.
> For a non-BioPerl solution you could try fuzznuc from EMBOSS.
>
> Cheers.
> Roy.
>
>
> On 17/09/2010 15:28, Aaron Mackey wrote:
>
>> Convert the IUPAC code to a regular expression, and use regular
>> expressions
>> (in Perl or grep or similar) to find 100% identical matches.
>>
>> -Aaron
>>
>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
>> <shalabh.sharma7 at gmail.com>wrote:
>>
>>  Hi All,
>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>> Like
>>>
>>>> test
>>>>
>>> VGSRVBSSSSSNSC
>>>
>>> Similarly i have a database made of of these kind of sequences. I want to
>>> find sequences that are 100% similar to the query sequence.
>>>
>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>> it
>>> didn't worked.
>>> Do i have to convert these sequences to 4 base codes or there is any
>>> other
>>> way out.
>>>
>>> Thanks
>>> Shalabh
>>
>
>


From amackey at virginia.edu  Fri Sep 17 15:25:54 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:25:54 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
Message-ID: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>

Do your windowing/shifting on the unexpanded query sequences; then transform
the 5-bp queries into regular expressions.
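
Roughly, in plain Perl (a sketch only: the helper names windows and
iupac_to_regex are made up here, and the hand-written code table stands in
for Bio::Tools::SeqPattern->expand so that the example runs without
BioPerl):

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for (DNA only).
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: all windows of length $size, shifted by one base.
sub windows {
    my ($seq, $size) = @_;
    return map { substr($seq, $_, $size) } 0 .. length($seq) - $size;
}

# Hypothetical helper: turn an IUPAC string into a regex source string.
sub iupac_to_regex {
    my ($seq) = @_;
    return join '', map {
        my $bases = $IUPAC{$_};
        die "Unknown IUPAC code: $_\n" unless defined $bases;
        length($bases) == 1 ? $bases : "[$bases]";
    } split //, uc $seq;
}

# Window the raw IUPAC query first, then expand each 5-mer separately.
for my $window (windows('VGSRVBSSSSSNSC', 5)) {
    print "$window\t", iupac_to_regex($window), "\n";
}
```

Because the split happens before expansion, each window is still five
letters long and there is never a need to cut a regular expression apart.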

-Aaron

On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Thanks Aaron for your reply.
> Actually i tried that first, but there is another problem, i have to divide
> each query sequence to window size 5 with 1 base shift and its not possible
> to divide regular expression in that way.
> So what i am trying is to convert those iupac codes to 4 base code sequence
> and then do the normal search.
> Now the problem is that i cant able to convert those IUPAC sequences to
> normal ones, i am still trying to write a script but its taking time.
>
> Thanks
> Shalabh
>
>
> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu>wrote:
>
>> Convert the IUPAC code to a regular expression, and use regular
>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>
>> -Aaron
>>
>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>> shalabh.sharma7 at gmail.com> wrote:
>>
>>> Hi All,
>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>> Like
>>> >test
>>> VGSRVBSSSSSNSC
>>>
>>> Similarly i have a database made of of these kind of sequences. I want to
>>> find sequences that are 100% similar to the query sequence.
>>>
>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>> it
>>> didn't worked.
>>> Do i have to convert these sequences to 4 base codes or there is any
>>> other
>>> way out.
>>>
>>> Thanks
>>> Shalabh
>>>
>>
>>
>


From Kevin.M.Brown at asu.edu  Fri Sep 17 16:09:34 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 17 Sep 2010 13:09:34 -0700
Subject: [Bioperl-l] Installing using git after an older installation
In-Reply-To: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
References: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA


From shalabh.sharma7 at gmail.com  Fri Sep 17 16:45:50 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 16:45:50 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
	<AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
Message-ID: <AANLkTi=b7MYxC=2DuY3N5Udg9CsfoVDV=Q9cFUOUML0x@mail.gmail.com>

Thanks Aaron,
Changing the query sequence worked well, but I am still struggling with the
database.

-Shalabh


On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey <amackey at virginia.edu> wrote:

> do your windowing/shifting on the unexpanded query sequences; then
> transform the 5-bp queries into regular expressions.
>
> -Aaron
>
>
> On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma <
> shalabh.sharma7 at gmail.com> wrote:
>
>> Thanks Aaron for your reply.
>> Actually i tried that first, but there is another problem, i have to
>> divide each query sequence to window size 5 with 1 base shift and its not
>> possible to divide regular expression in that way.
>> So what i am trying is to convert those iupac codes to 4 base code
>> sequence and then do the normal search.
>> Now the problem is that i cant able to convert those IUPAC sequences to
>> normal ones, i am still trying to write a script but its taking time.
>>
>> Thanks
>> Shalabh
>>
>>
>> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu>wrote:
>>
>>> Convert the IUPAC code to a regular expression, and use regular
>>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>>
>>> -Aaron
>>>
>>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>>> shalabh.sharma7 at gmail.com> wrote:
>>>
>>>> Hi All,
>>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>>> Like
>>>> >test
>>>> VGSRVBSSSSSNSC
>>>>
>>>> Similarly i have a database made of of these kind of sequences. I want
>>>> to
>>>> find sequences that are 100% similar to the query sequence.
>>>>
>>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>>> it
>>>> didn't worked.
>>>> Do i have to convert these sequences to 4 base codes or there is any
>>>> other
>>>> way out.
>>>>
>>>> Thanks
>>>> Shalabh
>>>>
>>>
>>>
>>
>


From heikki.lehvaslaiho at gmail.com  Sat Sep 18 03:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Sat, 18 Sep 2010 10:41:22 +0300
Subject: [Bioperl-l] mpiblast
In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
References: <loom.20100916T180946-13@post.gmane.org>
	<0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
Message-ID: <AANLkTi=SOzf5PH605OkPYf2r6w=xaTdXqU+zY7LBwkNw@mail.gmail.com>

Been running 1.6 and its betas on Blue Gene/P for months. The output
is identical to standard BLAST output. No issues in parsing it with
BioPerl.

    -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia


On 16 September 2010 21:25, Dave Messina <David.Messina at sbc.su.se> wrote:
>> Is the there a standalone like module that allows you to easily call mpiblast?
>
> No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.
>
>     http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase
>
>
>> I'm assuming seqio with parse a mpiblast output file correctly?
>
> Yes, although I see that a new version of mpiblast was recently released.
>
> Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?
>
>
> Dave
>
>
>


From david.breimann at gmail.com  Sat Sep 18 05:05:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:05:58 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
Message-ID: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>

Hello,

I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
to the attributes and sometimes it doesn't, even though the GenBank file has
a locus tag.
Also, is the ID always equivalent to the locus tag?

Thanks,
Dave


From scott at scottcain.net  Sat Sep 18 05:17:24 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 10:17:24 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
Message-ID: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>

Hi Dave,

bp_genbank2gff3.pl suffers from the fact that it has to deal with
GenBank files :-)  It was designed initially to work on whole genome
refseqs, and contains several ad hoc rules for trying to make it "do
the right thing."  In practice, it is not unusual for a post
processing step (either by hand or a quicky perl script) to be
required to really get it right.  I don't recall the specifics (if I
ever knew :-) for when and how the locus tag is used, but I do know
that there is a list of things that it will try to use for the ID, and
while the locus is on the list, I don't know where it comes in the
list, so it's possible that other items might supersede it.

Scott


On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Hello,
>
> I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
> in the fields and sometime it doesn't, even though the genabank has a locus
> tag.
> Also, is the ID always equivalent to the locus tag?
>
> Thanks,
> Dave
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 05:20:33 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:20:33 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
Message-ID: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>

Since locus_tag is an essential tag in GenBank, I suggest that locus_tag
always be added to the last column of the GFF if it exists in the GenBank
file, whether or not it is used as the ID in the GFF.

On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> GenBank files :-)  It was designed initially to work on whole genome
> refseqs, and contains several ad hoc rules for trying to make it "do
> the right thing."  In practice, it is not unusual for a post
> processing step (either by hand or a quicky perl script) to be
> required to really get it right.  I don't recall the specifics (if I
> ever knew :-) for when and how the locus tag is used, but I do know
> that there is a list of things that it will try to use for the ID, and
> while the locus is on the list, I don't know where it comes in the
> list, so it's possible that other items might supersede it.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Hello,
> >
> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> `locus_tag`
> > in the fields and sometime it doesn't, even though the genabank has a
> locus
> > tag.
> > Also, is the ID always equivalent to the locus tag?
> >
> > Thanks,
> > Dave
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


From scott at scottcain.net  Sat Sep 18 06:08:26 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 11:08:26 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
Message-ID: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>

Hi Dave,

That seems perfectly reasonable.  If you could point out a GenBank
entry for which that does not happen, I could try to figure out why
not.

Scott


On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Since locus_tag is an essential tag in genbank, I suggest locus_tag will be
> always added to the GFF last column if it exists in the genbank, whether it
> is used as ID in the GFF or not.
>
> On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> GenBank files :-)  It was designed initially to work on whole genome
>> refseqs, and contains several ad hoc rules for trying to make it "do
>> the right thing."  In practice, it is not unusual for a post
>> processing step (either by hand or a quicky perl script) to be
>> required to really get it right.  I don't recall the specifics (if I
>> ever knew :-) for when and how the locus tag is used, but I do know
>> that there is a list of things that it will try to use for the ID, and
>> while the locus is on the list, I don't know where it comes in the
>> list, so it's possible that other items might supersede it.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Hello,
>> >
>> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> > `locus_tag`
>> > in the fields and sometime it doesn't, even though the genabank has a
>> > locus
>> > tag.
>> > Also, is the ID always equivalent to the locus tag?
>> >
>> > Thanks,
>> > Dave
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 06:20:50 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:20:50 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
Message-ID: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>

Hi Scott,

Here is a very short GenBank file:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

Note that all genes in the GenBank file have locus tags. In the resulting
GFF3, however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have
no idea why it deserves special treatment... :)

p.s. making this change (i.e., copying locus_tag to the GFF3 last column
whenever available) will really make my life easier.

Thank you,
Dave

On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> That seems perfectly reasonable.  If you could point out a GenBank
> entry for which that does not happen, I could try to figure out why
> not.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will
> be
> > always added to the GFF last column if it exists in the genbank, whether
> it
> > is used as ID in the GFF or not.
> >
> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net>
> wrote:
> >>
> >> Hi Dave,
> >>
> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> >> GenBank files :-)  It was designed initially to work on whole genome
> >> refseqs, and contains several ad hoc rules for trying to make it "do
> >> the right thing."  In practice, it is not unusual for a post
> >> processing step (either by hand or a quicky perl script) to be
> >> required to really get it right.  I don't recall the specifics (if I
> >> ever knew :-) for when and how the locus tag is used, but I do know
> >> that there is a list of things that it will try to use for the ID, and
> >> while the locus is on the list, I don't know where it comes in the
> >> list, so it's possible that other items might supersede it.
> >>
> >> Scott
> >>
> >>
> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> >> <david.breimann at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> >> > `locus_tag`
> >> > in the fields and sometime it doesn't, even though the genabank has a
> >> > locus
> >> > tag.
> >> > Also, is the ID always equivalent to the locus tag?
> >> >
> >> > Thanks,
> >> > Dave
> >> >
> >>
> >>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                                   scott at scottcain
> >> dot net
> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> >> Ontario Institute for Cancer Research
> >
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


From david.breimann at gmail.com  Sat Sep 18 06:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

use Bio::Tools::GFF;
use Data::Dumper;

my $gffio = Bio::Tools::GFF->new(
    -file =>
        "/path/to/file.gff",
    -gff_version => 3
);

print Dumper $gffio->get_seqs();

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).


Thanks,

Dave


From scott at scottcain.net  Sat Sep 18 07:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
Message-ID: <AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this.  It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott


On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Hi Scott,
>
> Here is a very short genbank:
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>
> Note all genes in the genbank have locus tags. In the resulting GFF3,
> however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
> idea why it deserves a special treatment... :)
>
> p.s. making this change (i.e., copying locus_tag to the GFF3 last column
> whenever available) will really make my life easier.
>
> Thank you,
> Dave
>
> On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> That seems perfectly reasonable.  If you could point out a GenBank
>> entry for which that does not happen, I could try to figure out why
>> not.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will
>> > be
>> > always added to the GFF last column if it exists in the genbank, whether
>> > it
>> > is used as ID in the GFF or not.
>> >
>> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net>
>> > wrote:
>> >>
>> >> Hi Dave,
>> >>
>> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> >> GenBank files :-)  It was designed initially to work on whole genome
>> >> refseqs, and contains several ad hoc rules for trying to make it "do
>> >> the right thing."  In practice, it is not unusual for a post
>> >> processing step (either by hand or a quicky perl script) to be
>> >> required to really get it right.  I don't recall the specifics (if I
>> >> ever knew :-) for when and how the locus tag is used, but I do know
>> >> that there is a list of things that it will try to use for the ID, and
>> >> while the locus is on the list, I don't know where it comes in the
>> >> list, so it's possible that other items might supersede it.
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> > `locus_tag`
>> >> > in the fields and sometime it doesn't, even though the genabank has a
>> >> > locus
>> >> > tag.
>> >> > Also, is the ID always equivalent to the locus tag?
>> >> >
>> >> > Thanks,
>> >> > Dave
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> ------------------------------------------------------------------------
>> >> Scott Cain, Ph. D.                                   scott at scottcain
>> >> dot net
>> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> >> Ontario Institute for Cancer Research
>> >
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 07:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
References: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
Message-ID: <AANLkTimc9Zf2a2N64wUUancrHWJ05wRKgfMuUQ0X9UTe@mail.gmail.com>

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

  use Bio::DB::SeqFeature::Store;

  my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory',
                                            -dir     => 'path/to/file' );

  # Warning: this returns a string, and not a PrimarySeq object
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);

Scott


On Sat, Sep 18, 2010 at 11:45 AM, David Breimann
<david.breimann at gmail.com> wrote:
> As you know, GFF3 files can contain FASTA sequences after the features.
>
> How do I extract a specific FASTA sequence given it's ID?
>
> I tried:
>
> use Bio::Tools::GFF;
> use Data::Dumper;
>
> my $gffio = Bio::Tools::GFF->new(
>    -file =>
>        "/path/to/file.gff",
>    -gff_version => 3
> );
>
> print Dumper $gffio->get_seqs();
>
> but $gffio->get_seqs() seems to return nothing, although the GFF3 has
> sequences and is also valid.
>
> By the way, I am able to parse the features themselves (using
> $gffio->next_feature()).
>
>
> Thanks,
>
> Dave
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 09:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
Message-ID: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository, it also uses the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.
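
One way to check which copy actually gets loaded is to look at %INC after
requiring a module. A small sketch (module_location is a made-up helper
name, and the two module names in the loop are only examples; the script
simply reports "not found" for anything that isn't on @INC):

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module and return the file it came
# from, or undef if it cannot be found on @INC.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Report where each module would be loaded from with the current PERL5LIB.
for my $module (qw(Bio::Perl Bio::SeqIO)) {
    my $path = module_location($module);
    print "$module: ", ( defined $path ? $path : 'not found in @INC' ), "\n";
}
```

If the printed paths are not inside your git checkout, the script is
still picking up the older installed modules.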

Also, are you using any command line flags when executing it?  I didn't.

Scott


On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running linux?  If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Your gff seems fine. I get a very similar one, but with `Name=` instead
>> > of `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for.  Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.
>> >> >
>> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Hi Dave,
>> >> >>
>> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> all features that have a locus_tag, and uses the locus_tag for the
>> >> >> ID when it can (it can't blindly use the locus tag for the ID since
>> >> >> both the gene and the CDS have the same tag).
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > Here is a very short genbank:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >
>> >> >> > Note all genes in the genbank have locus tags. In the resulting
>> >> >> > GFF3, however, only the last gene (EcE24377A_B0005) gets a
>> >> >> > locus_tag. I have no idea why it deserves special treatment... :)
>> >> >> >
>> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> > column whenever available) will really make my life easier.
>> >> >> >
>> >> >> > Thank you,
>> >> >> > Dave
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> That seems perfectly reasonable.  If you could point out a
>> >> >> >> GenBank entry for which that does not happen, I could try to
>> >> >> >> figure out why not.
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest that
>> >> >> >> > locus_tag always be added to the GFF last column if it exists
>> >> >> >> > in the genbank, whether it is used as ID in the GFF or not.
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal
>> >> >> >> >> with GenBank files :-)  It was designed initially to work on
>> >> >> >> >> whole genome refseqs, and contains several ad hoc rules for
>> >> >> >> >> trying to make it "do the right thing."  In practice, it is
>> >> >> >> >> not unusual for a post processing step (either by hand or a
>> >> >> >> >> quick Perl script) to be required to really get it right.  I
>> >> >> >> >> don't recall the specifics (if I ever knew :-) for when and how
>> >> >> >> >> the locus tag is used, but I do know that there is a list of
>> >> >> >> >> things that it will try to use for the ID, and while the locus
>> >> >> >> >> is on the list, I don't know where it comes in the list, so
>> >> >> >> >> it's possible that other items might supersede it.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>> >> >> >> >> > the genbank has a locus tag.
>> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Dave
>> >> >> >> >> > _______________________________________________
>> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> ------------------------------------------------------------------------
>> >> >> >> >> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> >> >> >> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> >> >> >> >> Ontario Institute for Cancer Research
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 09:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
Message-ID: <AANLkTimMZ4vZUw2Gsd-NmaWL8xiD4hryH8Z-7UezT=LO@mail.gmail.com>

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation.  In the
repository itself, the script resides in scripts/Bio-DB-GFF/.
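
A rough way to see which copies of the script exist under one or more
directories, sketched in shell. The helper name list_script_copies and the
throwaway demo tree are hypothetical; only the two directory names
(scripts/Bio-DB-GFF and blib/script) and the file names come from this
thread.

```shell
#!/bin/sh
# List every file under the given roots whose name contains "genbank2gff3",
# prefixed with a checksum so identical and differing copies are easy to spot.
list_script_copies() {
    for root in "$@"; do
        # Silently skip roots that do not exist.
        [ -d "$root" ] || continue
        find "$root" -type f -name '*genbank2gff3*' 2>/dev/null
    done | sort | while IFS= read -r f; do
        sum=$(cksum < "$f" | awk '{print $1}')
        printf '%s\t%s\n' "$sum" "$f"
    done
}

# Demo on a throwaway tree that mimics the layout described above.
# The file contents are placeholders, not the real script.
demo=$(mktemp -d)
mkdir -p "$demo/scripts/Bio-DB-GFF" "$demo/blib/script"
echo 'checkout copy' > "$demo/scripts/Bio-DB-GFF/genbank2gff3.PLS"
echo 'staged copy'   > "$demo/blib/script/bp_genbank2gff3.pl"
list_script_copies "$demo"
rm -rf "$demo"
```

On a real machine the roots would be the git checkout plus wherever an
installed copy might live; differing checksums suggest that the copies are
not the same version.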

Scott


On Sat, Sep 18, 2010 at 2:40 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 09:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
Message-ID: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>

So let's do an intermediate summary of my situation:
I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), whereas Scott gets the expected
results with the latest version of bioperl.
I cloned a fresh copy of bioperl-live into my ~/src:
$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:
export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
and ran
$ source ~/.profile
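
One way to sanity-check that setting, sketched in shell: walk the PERL5LIB
entries in order and report the first one that holds a given module file.
Perl puts PERL5LIB entries ahead of its built-in library paths, so that
first hit is normally the copy a script picks up. The function name
first_perl5lib_hit is made up for this sketch, and Bio/SeqIO.pm is just one
representative BioPerl module to probe for.

```shell
#!/bin/sh
# Print the first PERL5LIB entry that contains the given module file
# (for example Bio/SeqIO.pm). Returns non-zero if no entry contains it.
first_perl5lib_hit() {
    module_path=$1
    old_ifs=$IFS
    IFS=:
    # With IFS set to ":", the unquoted expansion splits on colons.
    for dir in ${PERL5LIB:-}; do
        if [ -n "$dir" ] && [ -f "$dir/$module_path" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

first_perl5lib_hit Bio/SeqIO.pm || echo "Bio/SeqIO.pm not found via PERL5LIB"
```

If Perl and BioPerl are available, asking Perl directly is more reliable:
perl -MBio::SeqIO -e 'print $INC{"Bio/SeqIO.pm"}, "\n"' prints the path of
the file that was actually loaded.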

I then downloaded a small genome from NCBI
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
and tested the script:
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1    ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1    ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1    ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:
##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1    ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1    ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1    ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38


Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
as desired.
I have no idea what is going on here...
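
To make that kind of comparison less eyeball-dependent, here is a small
shell/awk sketch that tallies the attribute keys in column 9 of a GFF3 file,
so two outputs can be diffed by key rather than by whole line. The function
name gff3_attr_keys and the temporary file names are invented for the
example, and it assumes real tab-separated GFF3 (the excerpts pasted in this
mail show the tabs as spaces).

```shell
#!/bin/sh
# Count how often each attribute key (ID, Name, locus_tag, ...) appears in
# column 9 of a tab-separated GFF3 file, skipping comment lines.
gff3_attr_keys() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                key = attrs[i]
                sub(/=.*/, "", key)      # keep only the part before "="
                if (key != "") count[key]++
            }
        }
        END { for (k in count) printf "%s\t%d\n", k, count[k] }
    ' "$1" | sort
}

# Demo with two single-record files modelled on the gene lines above.
tmp=$(mktemp -d)
printf 'NC_009789\tGenBank\tgene\t665\t781\t.\t-\t1\tID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001\n' > "$tmp/mine.gff"
printf 'NC_009789\tGenBank\tgene\t665\t781\t.\t-\t1\tID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001\n' > "$tmp/scott.gff"
gff3_attr_keys "$tmp/mine.gff"  > "$tmp/mine.keys"
gff3_attr_keys "$tmp/scott.gff" > "$tmp/scott.keys"
# diff exits non-zero when the key tallies differ; that is expected here.
diff "$tmp/mine.keys" "$tmp/scott.keys" || true
rm -rf "$tmp"
```

Run on the two full files, the diff would show at a glance which keys (such
as Name versus locus_tag) appear in one output but not the other.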

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository, it also uses the
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it?  I didn't.
>
> Scott
>
>


From scott at scottcain.net  Sat Sep 18 10:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
	<AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
Message-ID: <AANLkTi=yGr+VHDinKtSiy7XE3f7_F73hGRA4bNFzRGxz@mail.gmail.com>

The only thing I can add is that I did a 'git diff genbank2gff3.PLS'
and found no differences.  It occurred to me that perhaps I'd done
some fixing and not committed it, but it looks to me like that's not
the case (assuming I've managed to use git correctly; not a great
assumption, but I don't have another one to work with :-).
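
When two machines give different output from what should be the same
script, one thing worth ruling out is that Perl is loading an older
installed copy of the modules rather than the ones in the git checkout.
The following is only a minimal sketch for checking that, not part of
BioPerl; module_path is a hypothetical helper name, and Bio::SeqIO is
used purely as an example of a module to look up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
# (module_path is a hypothetical helper, not a BioPerl function.)
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Bio::SeqIO is only an example; module names can be given as arguments.
for my $module (@ARGV ? @ARGV : ('Bio::SeqIO')) {
    my $path = module_path($module);
    printf "%-20s %s\n", $module,
        defined $path ? $path : '(not found in @INC)';
}

# The first directory in this list that contains the module wins.
print "\@INC search order:\n";
print "  $_\n" for @INC;
```

If the path printed is not under the git checkout, then PERL5LIB is
probably not being picked up by the shell that runs the script.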

Scott


On Sat, Sep 18, 2010 at 2:57 PM, David Breimann
<david.breimann at gmail.com> wrote:
> So let's do an intermediate summary of my situation:
> I'm using Ubuntu 10.04 and Perl 5.10.1.
> I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
> "locus_tag=" in the last GFF3 column), while Scott gets the expected results
> while using the latest version of bioperl.
> I cloned a fresh version of bioperl live into my ~/src:
> $ cd ~/src
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> I then added the following line to the end of ~/.profile:
> export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
> and ran
> $ source ~/.profile
>
> I then downloaded a small genome from NCBI
> $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> and tested the script:
> $ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk
>
> Following are the top 10 lines of the resulting GFF3:
>
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia
> coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This
> record has not yet been subject to final NCBI review. The reference sequence
> was derived from CP000798. Source DNA and bacteria available from Jacques
> Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL
> REFSEQ: This record has not yet been subject to final NCBI review. The
> reference sequence was derived from CP000798. Source DNA and bacteria
> available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified
> by glimmer%3B putative;codon_start=1;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
> while these are from Scott's file:
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A
> plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has
> not yet been subject to final NCBI review. The reference sequence was
> derived from CP000798. Source DNA and bacteria available from Jacques Ravel
> (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ:
> This record has not yet been subject to final NCBI review. The reference
> sequence was derived from CP000798. Source DNA and bacteria available from
> Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified
> by glimmer%3B
> putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
>
> Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
> as desired.
> I have no idea what is going on here...
>
> Best,
> Dave
>
> On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> Let's keep the discussion on the mailing list so we can make sure that
>> when this problem is solved, its resolution will be archived.
>>
>> I don't really understand what is going on either, though it would
>> probably be a good idea to set your PERL5LIB env variable so that when
>> you execute this script from the git repository it will also use the
>> BioPerl modules in the git repository instead of the ones that are
>> installed in your "normal" path.
>>
>> Also, are you using any command line flags when executing it?  I didn't.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Yes, I'm using Ubuntu 10.04.
>> >
>> > That is really weird. I tried running the script from the bioperl-live dir
>> > (which I just pulled using git), and I get the same results as before
>> > (`Name` instead of `locus_tag`):
>> >
>> >  $ wget
>> >
>> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y
>> > NC_009789.genbank
>> >
>> > Attached is the resulting GFF3.
>> > I also attach a copy of bp_genbank2gff3.pl as found under
>> > /home/dave/src/bioperl-live/blib/script.
>> >
>> > This is a real mystery for me!
>> >
>> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Typically I do build and install, but you can run it directly from the
>> >> git checkout directory.
>> >>
>> >> For locating other versions of the script, are you running linux?  If
>> >> so, are you familiar with the "locate" command:
>> >>
>> >>  locate bp_genbank2gff3.pl
>> >>
>> >> If you've never used it before, you may need to update the database
>> >> the locate command uses as root:
>> >>
>> >>  sudo updatedb
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Your gff seems fine. I get a very similar one, but with `Name=`
>> >> > instead of `locus_tag=`.
>> >> >
>> >> > I don't really know how to check for multiple bioperl installations.
>> >> > I'm using my personal server, so I don't mind removing and installing
>> >> > everything from scratch -- but I don't know how to do that.
>> >> >
>> >> > Also, what I don't get with the git is how the scripts are supposed
>> >> > to be updated (unless you build and install).
>> >> >
>> >> > Thank you!
>> >> >
>> >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Well, if you aren't getting the same results as me then I'd say you
>> >> >> aren't using the same version of the script :-)
>> >> >>
>> >> >> Unfortunately, the scripts are no longer automatically marked with
>> >> >> the
>> >> >> "internal" version information when committed, so there really isn't
>> >> >> anything in the script I can tell you to look for.  Check for more
>> >> >> than one bioperl instance on your computer.
>> >> >>
>> >> >> I've attached the GFF3 file I got so you can look at it and tell me
>> >> >> if
>> >> >> it is what you expect.
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > I just pulled the latest bioperl-live using git.
>> >> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> >> > anyway
>> >> >> > (perhaps exporting the path is supposed to be enough?)
>> >> >> > Anyway, I still get the same results. No locus_tag.
>> >> >> > How can I tell if I'm using the latest version of the script?
>> >> >> >
>> >> >> > Thanks again.
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> >> all
>> >> >> >> features that have a locus_tag, and uses the locus_tag for the ID
>> >> >> >> when
>> >> >> >> it can (it can't blindly use the locus tag for the ID since both
>> >> >> >> the
>> >> >> >> gene and the CDS have the same tag).
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Hi Scott,
>> >> >> >> >
>> >> >> >> > Here is a very short genbank:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >> >
>> >> >> >> > Note all genes in the genbank have locus tags. In the resulting
>> >> >> >> > GFF3,
>> >> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag.
>> >> >> >> > I
>> >> >> >> > have
>> >> >> >> > no
>> >> >> >> > idea why it deserves a special treatment... :)
>> >> >> >> >
>> >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3
>> >> >> >> > last
>> >> >> >> > column
>> >> >> >> > whenever available) will really make my life easier.
>> >> >> >> >
>> >> >> >> > Thank you,
>> >> >> >> > Dave
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> That seems perfectly reasonable.  If you could point out a
>> >> >> >> >> GenBank entry for which that does not happen, I could try to
>> >> >> >> >> figure out why not.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest
>> >> >> >> >> > locus_tag
>> >> >> >> >> > will
>> >> >> >> >> > be
>> >> >> >> >> > always added to the GFF last column if it exists in the
>> >> >> >> >> > genbank,
>> >> >> >> >> > whether
>> >> >> >> >> > it
>> >> >> >> >> > is used as ID in the GFF or not.
>> >> >> >> >> >
>> >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> >> > <scott at scottcain.net>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Hi Dave,
>> >> >> >> >> >>
>> >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal
>> >> >> >> >> >> with GenBank files :-)  It was designed initially to work on
>> >> >> >> >> >> whole genome refseqs, and contains several ad hoc rules for
>> >> >> >> >> >> trying to make it "do the right thing."  In practice, it is
>> >> >> >> >> >> not unusual for a post processing step (either by hand or a
>> >> >> >> >> >> quicky perl script) to be required to really get it right.  I
>> >> >> >> >> >> don't recall the specifics (if I ever knew :-) for when and
>> >> >> >> >> >> how the locus tag is used, but I do know that there is a list
>> >> >> >> >> >> of things that it will try to use for the ID, and while the
>> >> >> >> >> >> locus is on the list, I don't know where it comes in the
>> >> >> >> >> >> list, so it's possible that other items might supersede it.
>> >> >> >> >> >>
>> >> >> >> >> >> Scott
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it
>> >> >> >> >> >> > adds a `locus_tag` in the fields and sometimes it doesn't,
>> >> >> >> >> >> > even though the genbank has a locus tag.
>> >> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >> >
>> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> > Dave
>> >> >> >> >> >> > _______________________________________________
>> >> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >> >
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From j.scholtalbers at gmail.com  Mon Sep 20 04:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
Message-ID: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working.  I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
> > Sounds like this is going through an initial indexing step (for
> flatfiles).  I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> >> up with the same result.
> >>
> >> thanks,
> >> Eric
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From pcantalupo at gmail.com  Mon Sep 20 10:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <AANLkTikCi7XiKptEghr7uckAUBcD_K3n_YAgHEJjGo_F@mail.gmail.com>

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the
subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use Bio::DB::EUtilities;
use XML::Simple;    # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );

   my $res = $factory->get_Response->content;
   my $data = XMLin($res);

   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();

   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }

   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list context: the lineage array; scalar context: a "; "-joined string
   return wantarray ? @lineage : join("; ", @lineage);
}
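
The last line of the subroutine picks its return form with wantarray, so
the caller gets either a list or a single string depending on how it is
called. Here is a small sketch of that behaviour which needs no network
access; lineage_for_demo and its hard-coded names are made up for
illustration and only stand in for the real subroutine:

```perl
use strict;
use warnings;

# Hypothetical stand-in for taxid2lineage with a fixed, shortened lineage.
sub lineage_for_demo {
    my @lineage = ('Viruses', 'Polyomaviridae', 'Simian virus 40');
    # List context returns the array, scalar context a joined string.
    return wantarray ? @lineage : join('; ', @lineage);
}

my @as_list   = lineage_for_demo();    # list context
my $as_string = lineage_for_demo();    # scalar context

print scalar(@as_list), " names\n";    # prints: 3 names
print "$as_string\n";    # prints: Viruses; Polyomaviridae; Simian virus 40
```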

Paul Cantalupo
University of Pittsburgh


On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers
<j.scholtalbers at gmail.com> wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
> > Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> > Eric
> >
> > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> > wrote:
> > > Sounds like this is going through an initial indexing step (for
> > flatfiles).  I would expect the initial indexing of the tables to take time
> > as you have to create the DB, but subsequent lookups post-indexing should be
> > much faster if the index is already present.  Maybe Jason could answer in
> > more detail?
> > >
> > > chris
> > >
> > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> > >
> > >> Hello,
> > >>
> > >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> > >> 5.8.5 with BioPerl 1.6.0
> > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > >>
> > >> It ran for 100 cpu seconds and output:
> > >>
> > >> 33090 Viridiplantae kingdom
> > >>
> > >> I was expecting it to also output the descendents. Some questions:
> > >>
> > >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> > >> implemented? It looks to be in Taxon.pm but it is not documented and
> > >> when I ran Data::Dumper on $node the value '_desc' was empty.
> > >>
> > >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> > >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> > >> up with the same result.
> > >>
> > >> thanks,
> > >> Eric
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Sep 20 11:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <4C977FFC.5000205@bioperl.org>


This works for me to get all the descendents from a sub-node. You have to
call the function with the database handle.   I am not sure if the Taxon
implementation has a reference to the dbhandle or not:
#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi';
my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
                                 -nodesfile => "$dbdir/nodes.dmp",
                                 -namesfile => "$dbdir/names.dmp",
                                 );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

# one line per descendent: taxon id, rank, scientific name
print join("\n", map { $_->id . " " . $_->rank . " " .
$_->scientific_name } @d), "\n";


Hope that helps.
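
For anyone curious what collecting descendents from the taxdump amounts
to, here is a rough plain-Perl sketch that builds a parent-to-children
map from nodes.dmp-style lines and walks it. It is not how BioPerl
implements get_all_Descendents; read_children and all_descendants are
hypothetical names, and the toy data (ids 1 to 20) is invented so the
example runs without the real dump:

```perl
use strict;
use warnings;

# Build a parent -> [children] map from nodes.dmp-style lines, whose
# fields (tax_id, parent tax_id, rank, ...) are separated by "\t|\t".
sub read_children {
    my ($fh) = @_;
    my %children;
    while (my $line = <$fh>) {
        my ($id, $parent) = split /\t\|\t/, $line;
        next if !defined $parent || $id == $parent;  # root is its own parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Breadth-first walk collecting every node below $id.
sub all_descendants {
    my ($children, $id) = @_;
    my @found;
    my @queue = @{ $children->{$id} || [] };
    while (defined(my $node = shift @queue)) {
        push @found, $node;
        push @queue, @{ $children->{$node} || [] };
    }
    return @found;
}

# Invented toy data in nodes.dmp layout: id, parent id, rank.
my $toy = join '', map { join("\t|\t", @$_) . "\t|\n" }
    [1, 1, 'no rank'], [10, 1, 'family'], [11, 10, 'genus'],
    [12, 11, 'species'], [13, 11, 'species'], [20, 1, 'family'];

open my $fh, '<', \$toy or die "cannot open in-memory file: $!";
my $children = read_children($fh);
print join(' ', all_descendants($children, 10)), "\n";   # prints: 11 12 13
```

With the real dump one would open nodes.dmp instead of the in-memory
string; the ids then still have to be mapped to names via names.dmp.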
Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins<rec3141 at mcmaster.ca>  wrote:
>
>    
>> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
>> Eric
>>
>> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields<cjfields at illinois.edu>
>> wrote:
>>      
>>> Sounds like this is going through an initial indexing step (for
>>>        
>> flatfiles).  I would expect the initial indexing of the tables to take time
>> as you have to create the DB, but subsequent lookups post-indexing should be
>> much faster if the index is already present.  Maybe Jason could answer in
>> more detail?
>>      
>>> chris
>>>
>>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>>>
>>>        
>>>> Hello,
>>>>
>>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl
>>>> 5.8.5 with BioPerl 1.6.0
>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>>>
>>>> It ran for 100 cpu seconds and output:
>>>>
>>>> 33090 Viridiplantae kingdom
>>>>
>>>> I was expecting it to also output the descendents. Some questions:
>>>>
>>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
>>>> implemented? It looks to be in Taxon.pm but it is not documented and
>>>> when I ran Data::Dumper on $node the value '_desc' was empty.
>>>>
>>>> 2) is the flatfile reader always so slow? after replacing 'flatfile'
>>>> with a call to 'entrez' it took only 0.02 cpu seconds to come
>>>> up with the same result.
>>>>
>>>> thanks,
>>>> Eric
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>          
>>>        
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>      
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>    


From j.scholtalbers at gmail.com  Wed Sep 22 03:46:35 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Wed, 22 Sep 2010 09:46:35 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <4C977FFC.5000205@bioperl.org>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
	<4C977FFC.5000205@bioperl.org>
Message-ID: <AANLkTi=nbAJcCs6a2F3pacUaxZE3JZ690dHTasLGug_a@mail.gmail.com>

Hi Jason,

this was the same method I was using. Apparently it works with the taxdump,
however it does not work with Entrez as the source. So I will just stick to an
up-to-date taxdump then. Thanks for your example.

@Paul: Your method does indeed give the lineage, but it only retrieves the
ancestors. I want to retrieve all the descendents. Thanks anyway.

Cheers,
Jelle

On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich <jason at bioperl.org> wrote:

>
> This works for me to get all the descendents from a sub-node. You have to
> call the function with the database handle.   I am not sure if the Taxon
> implementation has a reference to the dbhandle or not:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Taxonomy;
> # downloaded data from NCBI taxdump into this directory
> my $dbdir = '/db/taxonomy/ncbi';
> my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
>                                 -nodesfile => "$dbdir/nodes.dmp",
>                                 -namesfile => "$dbdir/names.dmp",
>                                 );
> my $taxa = $db->get_taxon(-taxonid => 151341);
> my @d = $db->get_all_Descendents($taxa);
>
> print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name
> } @d), "\n";
>
>
> Hope that helps.
> Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
>
>
>  Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>  Sounds like this is going through an initial indexing step (for
> flatfiles).  I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
>
>
>  chris
>
> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>
>
>
>  Hello,
>
> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> 5.8.5 with BioPerl 1.6.0:
> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>
> It ran for 100 cpu seconds and output:
>
> 33090 Viridiplantae kingdom
>
> I was expecting it to also output the descendents. Some questions:
>
> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> implemented? It looks to be in Taxon.pm but it is not documented and
> when I ran Data::Dumper on $node the value '_desc' was empty.
>
> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> with a call to 'entrez' it took only 0.02 cpu seconds to come
> up with the same result.
>
> thanks,
> Eric


From waldenhe at muohio.edu  Fri Sep 24 15:15:48 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Fri, 24 Sep 2010 15:15:48 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>

Hello Bioperl Masters,

I am trying to perform a local BLAST with a query list of FASTA files against a db of other FASTA files.  I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module.  I have downloaded BLAST+ 2.2.24+ from the NCBI website and installed it on my Ubuntu machine.  I am using bioperl-1.5.2.

The snippet of code that is giving me errors is below:

my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

The error I am getting is:

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284.

I think the real problem is the "cannot find path to blastall" warning.

>From reading around on different forums, I have to make a .ncbirc text file with the location of BLAST+ 2.2.24+ on my machine.  I have that file in my /home folder.

How do I get StandAloneBlastPlus synced up with BLAST+ 2.2.24+?  Am I approaching this right?

Thank you,

Hans Waldenmaier


From ross at cuhk.edu.hk  Sat Sep 25 04:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>


Given a set of GO IDs, e.g. 

GO:0008150
GO:0005750
GO:0006122
GO:0008121
GO:0003674
GO:0005575
GO:0008150
GO:0009507
GO:0009535
GO:0009567
GO:0009977
GO:0010027
GO:0031361

from
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo

one can manually examine the hierarchy. Although there is go-perl
(http://search.cpan.org/~cmungall/go-perl/) and go-db-perl
(http://search.cpan.org/~cmungall/go-db-perl/), as a life science student
who is just learning Perl, I find it difficult to draw a hierarchy tree (or
simply make a table that counts the occurrences) to produce something like:


biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
<snip>
Cellular component (4)
<snip>

Can anybody advise? I don't need any fancy figures at all...


From David.Messina at sbc.su.se  Sun Sep 26 12:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,


> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
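
A quick way to see which generation of BLAST is installed is to ask the shell which executables it can find. The sketch below assumes a POSIX shell; the function name check_blast_programs is made up for illustration and is not part of BioPerl or BLAST. Legacy BLAST would show blastall, while BLAST+ would show blastn, blastp and so on.

```shell
#!/bin/sh
# Report which BLAST executables are visible on the current PATH.
# check_blast_programs is a hypothetical helper name, not a BLAST or BioPerl tool.
check_blast_programs() {
    for prog in "$@"; do
        if path=$(command -v "$prog" 2>/dev/null); then
            echo "$prog: $path"
        else
            echo "$prog: not found"
        fi
    done
}

# blastall belongs to legacy BLAST; blastn, blastp and makeblastdb belong to BLAST+.
check_blast_programs blastall blastn blastp makeblastdb
```

If blastall is reported as not found while blastn is found, only BLAST+ is installed, which would match the warning in the original post.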


Dave


From maj at fortinbras.us  Sun Sep 26 20:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W479062635438381285548195@webmail34>

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us



From cjfields at illinois.edu  Mon Sep 27 17:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <D8C2E980-BEB9-4231-8014-7318778CD215@illinois.edu>

Sorry, didn't see this being responded to on-list (been off the radar the last month).  I think this is a good idea, but I'm wondering if this might be better as a separate release on CPAN from bioperl core, seeing as we're in the preliminary stages of modularizing the current bioperl core into smaller independent releases, which would happen after the next bioperl release.

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
> 
> it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases.  maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!
> 
> best wishes
> johnny
> 
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
> 
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************


From gandipalem at gmail.com  Tue Sep 28 00:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
Message-ID: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the Coordination file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.




From David.Messina at sbc.su.se  Tue Sep 28 03:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <W479062635438381285548195@webmail34>
	<23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing 
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?
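
The PATH check described above can be sketched as a small shell helper. This is only an illustration under assumptions: the directory /home/hans/BLAST/bin is the one mentioned in this thread, and prepend_to_path is a hypothetical function name. The point it shows is that the variable name comes first in the assignment and that an exported PATH is inherited by child processes such as perl.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already listed.
# prepend_to_path is a hypothetical helper name used for illustration.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_to_path /home/hans/BLAST/bin

# A child process inherits the exported value; perl would see the same
# string in $ENV{PATH}. Show the first few entries, one per line.
sh -c 'printf "%s\n" "$PATH"' | tr ':' '\n' | head -n 3
```

Running the same function twice leaves PATH unchanged the second time, so it should be safe to keep in a .bashrc that is sourced repeatedly.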


> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
> 
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible that something else in your .bashrc (or in the system bashrc, which is often read in too; look for mention of /etc/bashrc or similar) is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.


Dave


From David.Messina at sbc.su.se  Tue Sep 28 03:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
	<AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

>From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts
of genes by transformation of PCR products (ref: Datsenko & Wanner,
PNAS 2000).

A tab-delimited file containing ORF START STOP is read, and primers
flanking the start & stop coordinates are designed based on the
user-designated sequence file.

In addition, primers flanking the knockout regions are chosen for PCR
screening purposes once the knockout is generated.  The script uses
Bioperl in order to determine the primer sequences, which requires
getting subsequences and reverse complementing some of the objects.
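
As a rough illustration of the tab-delimited ORF START STOP file described above, the sketch below writes a small sample and sanity-checks it. The ORF names and coordinates are invented, check_coords is a hypothetical helper, and whether make_primers.pl accepts a header line or extra columns is not something this description confirms.

```shell
#!/bin/sh
# Write a hypothetical three-column coordinate file: ORF name, start, stop,
# separated by tabs. The names and numbers are made up for illustration.
coords=$(mktemp)
printf 'orfA\t1200\t2150\n'  > "$coords"
printf 'orfB\t3480\t4020\n' >> "$coords"

# Succeed only if every line has exactly three tab-separated fields
# and the start and stop fields are plain integers.
check_coords() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            bad = 1
            print "bad line " NR ": " $0
        }
        END { exit bad }
    ' "$1"
}

check_coords "$coords" && echo "coordinate file looks well-formed"
```

A line separated by spaces rather than tabs would be reported as bad, which is a common cause of trouble with tab-delimited input.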


Dave


From maj at fortinbras.us  Tue Sep 28 07:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W726055216153101285672714@webmail57>

The module checks the env variable BLASTPLUSDIR for the executable; 
you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
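
Before rerunning the script, it may be worth confirming that the directory really holds a BLAST+ executable. This is a minimal sketch: blastplusdir_ok is a hypothetical helper name, and checking for blastn specifically is an assumption made for illustration (any BLAST+ program could be checked the same way).

```shell
#!/bin/sh
# Succeed if BLASTPLUSDIR is set and contains an executable blastn.
# blastplusdir_ok is a hypothetical helper, not part of BioPerl.
blastplusdir_ok() {
    [ -n "${BLASTPLUSDIR:-}" ] && [ -x "$BLASTPLUSDIR/blastn" ]
}

export BLASTPLUSDIR=/home/hans/BLAST/bin
if blastplusdir_ok; then
    echo "blastn found in $BLASTPLUSDIR"
else
    echo "no executable blastn in ${BLASTPLUSDIR:-<unset>}"
fi
```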
MAJ



From waldenhe at muohio.edu  Tue Sep 28 00:52:56 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Tue, 28 Sep 2010 00:52:56 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <W479062635438381285548195@webmail34>
References: <W479062635438381285548195@webmail34>
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>

Thanks Guys,

I have run those steps, my current version now is:
hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.006001

But I am still having problems.

I am having slightly more luck with using StandAloneBlast and the regular BLAST from NCBI.  I can get the command-line BLAST running.  But I still cannot get Perl to see BLAST.
Following the instructions from the HOWTOs and the O'Reilly book BLAST, I have gotten to the part about setting up the environment variables, which is where I think my problems are arising now.
I have added these lines to my /home/hans/.bashrc file in order to get perl to find BLAST:
export PATH=${PATH}:/home/hans/BLAST/bin
export BLASTDIR=/home/hans/BLAST/

Should I just add these to the end of the .bashrc file, or do they need to go someplace special?

Thanks for the help,

Hans


From maj at fortinbras.us  Tue Sep 28 11:04:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 15:04:07 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W8604325546183031285686247@webmail66>

Should work from .bashrc, Hans. Also add  

export BLASTPLUSDIR=/home/hans/BLAST/bin

It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program.

You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing

set | grep BLAST
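
One caveat, sketched below under the assumption of a POSIX shell: set lists every shell variable, exported or not, whereas env lists only what child processes such as perl will inherit. A variable that shows up in set but not in env was assigned without export. The values used are the ones from this thread.

```shell
#!/bin/sh
# A variable assigned without export is visible to `set` but is not
# passed on to child processes.
BLASTDIR=/home/hans/BLAST/                    # shell-only for now
export BLASTPLUSDIR=/home/hans/BLAST/bin      # exported to children

echo "set sees:"
set | grep '^BLAST'
echo "env sees:"
env | grep '^BLAST'
```

If BLASTDIR appears only in the first listing, adding export in front of the assignment should make it visible to perl as well.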

cheers MAJ



From chiragmatkarbioinfo at gmail.com  Thu Sep 30 08:20:35 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Thu, 30 Sep 2010 19:20:35 +0700
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
Message-ID: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>

Hello all,
Is there any module to fetch DNA sequence data from an Ensembl gene ID?

-- 
Regards,
Chirag Matkar


From jun.yin at ucd.ie  Thu Sep 30 09:36:31 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 30 Sep 2010 14:36:31 +0100
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>

Hi, Chirag,

BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
provides a BioPerl-like interface for that purpose.

You can visit Ensembl's website on how to use that module:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin



__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Thu Sep 30 11:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
	<011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID: <E4D361DE-CEB6-45CC-83EE-6547C497ABE3@illinois.edu>

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
> 
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back.  I think it still works rather well; at least the tests pass.  You might get more out of using the Ensembl API directly, as Jun states, though; YMMV.

BTW, the Ensembl API also works with the latest bioperl code, regardless of what the Ensembl website says (e.g. that they only support v1.2.3).  I haven't heard more about whether this discrepancy is supposed to be addressed at some point.

chris

> You can visit Ensembl's website on how to use that module:
> http://www.ensembl.org/info/data/api.html
> 
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
> 
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
> 
> 


From A.Vakhrusheva at lumc.nl  Wed Sep 29 09:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface.

I want to calculate a p-distance matrix, but what input format is
acceptable? Phylip doesn't work.

Anna 


From shalabh.sharma7 at gmail.com  Wed Sep  1 16:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>

Hi,
     I am trying to parse an hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm

The only output I get is "amoA_10genes_align.fasta.2  [M=247] for
HMM"; I am not even getting an error.
I am attaching the hmmsearch report (just a test report) that I tried to
test against the parser.

I would really appreciate it if anyone can help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  amoA_10genes.hmm
# target sequence database:        test.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       amoA_10genes_align.fasta.2  [M=247]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte


Domain annotation for each sequence (and alignments):
>> gi|63021979|gb|AAY26564.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021979|gb|AAY26564.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021979|gb|AAY26564.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP

>> gi|63021981|gb|AAY26565.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021981|gb|AAY26565.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021981|gb|AAY26565.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (247 nodes)
Target sequences:                          2  (300 residues)
Passed MSV filter:                         2  (1); expected 0.0 (0.02)
Passed bias filter:                        2  (1); expected 0.0 (0.02)
Passed Vit filter:                         2  (1); expected 0.0 (0.001)
Passed Fwd filter:                         2  (1); expected 0.0 (1e-05)
Initial search space (Z):                  2  [actual number of targets]
Domain search space  (domZ):               2  [number of targets reported over threshold]
# CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08
# Mc/sec: 0.93
//

From thomas.sharpton at gmail.com  Wed Sep  1 17:29:26 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 1 Sep 2010 14:29:26 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>

Hi Shalabh,

We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
use the HMMER3 version, as found here:

http://github.com/bioperl/bioperl-hmmer3

Hope this helps,
T

On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote:

> Hi ,
>     I am trying to parse hmmsearch report (from HMMER3). I am using the
> script mentioned here:
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm
>
> I am not getting anything but this "amoA_10genes_align.fasta.2  [M=247] for
> HMM" as the output, i am not even getting any error.
> I am attaching the hmmsearch report (just a test report) which i tried to
> test against the parser.
>
> I would really appreciate if anyone can help me out.
>
> Thanks
> Shalabh Sharma
> <test_amao.txt>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From kai.blin at biotech.uni-tuebingen.de  Thu Sep  2 04:44:58 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 2 Sep 2010 10:44:58 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>

On Wed, 1 Sep 2010 14:29:26 -0700
Thomas Sharpton <thomas.sharpton at gmail.com> wrote:

Hi,

> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
> use the HMMER3 version, as found here:
> 
> http://github.com/bioperl/bioperl-hmmer3

Actually it's now included in the bioperl-live repository, but the code
hasn't made it into a release yet.

http://github.com/bioperl/bioperl-live.git

Cheers,
Kai
-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From e.stupka at ucl.ac.uk  Thu Sep  2 08:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia


---
'"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." 
~ Stephen Hawkings

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817 
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912


From cjfields at illinois.edu  Thu Sep  2 10:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID: <C9AD72E7-32C2-48D0-A7AB-18F42BABD28D@illinois.edu>

Done!  Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
> 
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
> 
> thanks!
> 
> Elia
> 
> 
> ---
> '"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet." 
> ~ Stephen Hawkings
> 
> Senior Lecturer, Bioinformatics
> Scientific Director - Bioinformatics, UCL Genomics
> 
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
> 
> Institute of Cell and Molecular Science
> Barts and The London School of Medicine and Dentistry
> 4 Newark Street
> Whitechapel
> London
> E1 2AT
> 
> Office (UCL): +44 207 679 6493
> Fax: +44 0207 6796817 
> Office (ICMS): +44 0207 8822374
> 
> Mobile: +44 787 6478912
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From J.Christopher.Ellis at duke.edu  Thu Sep  2 10:53:34 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu,  2 Sep 2010 10:53:34 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <53096.1283439214@duke.edu>

 Chris, have you had any luck with this?

 Thanks,
 Chris

 On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
 Yes, I see that one. It may be the ID hash that is being returned is
 empty. I'll look into it.

 -c 

 On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

 > Hi Chris,
 > 
 > The error is...
 > 
 > "Use of uninitialized value $id in join or string at
 > C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
 > 
 > The script from
 > http://bioperl.org/wiki/Species_names_from_accession_numbers is as
 > follows....
 > 
 > use Bio::DB::EUtilities;
 > 
 > my (%taxa, @taxa);
 > my (%names, %idmap);
 > 
 > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
 > # (probably)
 > 
 > my @ids = qw(1621261 89318838 68536103 20807972 730439);
 > 
 > my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
 >                                        -db             => 'taxonomy',
 >                                        -dbfrom         => 'protein',
 >                                        -correspondence => 1,
 >                                        -id             => \@ids);
 > 
 > # iterate through the LinkSet objects
 > while (my $ds = $factory->next_LinkSet) {
 >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
 > }
 > 
 > @taxa = @taxa{@ids};
 > 
 > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
 >                                     -db    => 'taxonomy',
 >                                     -id    => \@taxa );
 > 
 > while (local $_ = $factory->next_DocSum) {
 >     $names{($_->get_contents_by_name('TaxId'))[0]} =
 >         ($_->get_contents_by_name('ScientificName'))[0];
 > }
 > 
 > foreach (@ids) {
 >     $idmap{$_} = $names{$taxa{$_}};
 > }
 > 
 > # %idmap is
 > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
 > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
 > # 68536103 => 'Corynebacterium jeikeium K411'
 > # 730439 => 'Bacillus caldolyticus'
 > # 89318838 => undef (this record has been removed from the db)
 > 
 > 1;
 > 
 > 
 > Thanks,
 > 
 > 
 > 
 > Chris
 > 
 > 
 > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 > Chris,
 > 
 > Regarding a fix for that script, we would have to see your modified
 > script and the error. However, there are modules within BioPerl to
 > essentially do what you want, in particular, Bio::DB::Taxonomy.
 > 
 > chris
 > 
 > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
 > 
 > > Hi All,
 > > 
 > > I am trying to extract the entire taxonomy of an organism including
 > > the classifications. Something like...
 > > 
 > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
 > > Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > > 
 > > I am not worried about format just that I get the information and the
 > > associated level of hierarchy. The script found at
 > > http://bioperl.org/wiki/Species_names_from_accession_numbers
 > > seemed like a good starting point so I copied it and tried to run it
 > > but got an error.
 > > 
 > > My first question is "Is there a known fix for this?" and my second
 > > question is how do I get the full hierarchical information (as seen
 > > above) with the taxonomy db?
 > > 
 > > Thanks for all your help in advance!
 > > 
 > > Chris 
 > > 
 > > 
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l at lists.open-bio.org
 > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
 > 
 > 

 
From cjfields at illinois.edu  Thu Sep  2 12:21:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 02 Sep 2010 11:21:48 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <53096.1283439214@duke.edu>
References: <53096.1283439214@duke.edu>
Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu>

Chris,

There are a few things wrong with the original script, so I'll fix them.
Basically, it makes the assumption that every ID in the original list is
found.  The problem: eutils only reports back data it finds, silently
discarding IDs that don't match.  So, using the original ID list when
building the hashes needs a bit more error checking.

Here's the revised script that works for me.

https://gist.github.com/f5db90a432fed68548d4

I'm also adding a check to ensure all IDs are defined prior to adding
them to the param string, just in case.
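
The failure mode is easy to reproduce without the network. A minimal
sketch in plain Perl, not the code from the gist: %taxa below is a
hand-made stand-in for what elink hands back, and the taxon values are
placeholders rather than real eutils output.

```perl
use strict;
use warnings;

# IDs we asked about; 89318838 is the one that no longer resolves.
my @ids = qw(1621261 89318838 68536103);

# Hypothetical stand-in for the elink result: only IDs that were found.
# The values on the right are placeholders for illustration.
my %taxa = (
    1621261  => 'taxA',
    68536103 => 'taxB',
);

# A plain hash slice yields undef for every ID that was silently dropped;
# passing that undef on is what produces an "uninitialized value" warning.
my @naive = @taxa{@ids};

# Keep only the IDs that actually came back, and remember the rest.
my @found   = grep {  exists $taxa{$_} } @ids;
my @missing = grep { !exists $taxa{$_} } @ids;
my @safe    = @taxa{@found};

print "found:   @found\n";
print "missing: @missing\n";
```

Building the second query from @safe instead of the raw slice keeps
undef out of the ID list, and @missing says which records to report as
not found.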

chris

On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote:
> Chris have you had any luck with this?
> 
> Thanks,
> Chris
> 
> On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
>         Yes, I see that one. It may be the ID hash that is being
>         returned is empty. I'll look into it.
>         
>         -c 
>         
>         On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:
>         
>         > Hi Chris,
>         > 
>         > The error is...
>         > 
>         > "Use of uninitialized value $id in join or string at
>         > C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm
>         > line 363."
>         > 
>         > The script from
>         > http://bioperl.org/wiki/Species_names_from_accession_numbers
>         > is as follows....
>         > 
>         > use Bio::DB::EUtilities;
>         > 
>         > my (%taxa, @taxa);
>         > my (%names, %idmap);
>         > 
>         > # these are protein ids; nuc ids will work by changing
>         > # -dbfrom => 'nucleotide', (probably)
>         > 
>         > my @ids = qw(1621261 89318838 68536103 20807972 730439);
>         > 
>         > my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
>         >                                        -db             => 'taxonomy',
>         >                                        -dbfrom         => 'protein',
>         >                                        -correspondence => 1,
>         >                                        -id             => \@ids);
>         > 
>         > # iterate through the LinkSet objects
>         > while (my $ds = $factory->next_LinkSet) {
>         >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>         > }
>         > 
>         > @taxa = @taxa{@ids};
>         > 
>         > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         >                                     -db    => 'taxonomy',
>         >                                     -id    => \@taxa );
>         > 
>         > while (local $_ = $factory->next_DocSum) {
>         >     $names{($_->get_contents_by_name('TaxId'))[0]} =
>         >         ($_->get_contents_by_name('ScientificName'))[0];
>         > }
>         > 
>         > foreach (@ids) {
>         >     $idmap{$_} = $names{$taxa{$_}};
>         > }
>         > 
>         > # %idmap is
>         > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
>         > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
>         > # 68536103 => 'Corynebacterium jeikeium K411'
>         > # 730439 => 'Bacillus caldolyticus'
>         > # 89318838 => undef (this record has been removed from the db)
>         > 
>         > 1;
>         > 
>         > 
>         > Thanks,
>         > 
>         > 
>         > 
>         > Chris
>         > 
>         > 
>         > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
>         > Chris,
>         > 
>         > Regarding a fix for that script, we would have to see your
>         > modified script and the error. However, there are modules
>         > within BioPerl to essentially do what you want, in particular,
>         > Bio::DB::Taxonomy.
>         > 
>         > chris
>         > 
>         > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
>         > 
>         > > Hi All,
>         > > 
>         > > I am trying to extract the entire taxonomy of an organism
>         > > including the classifications. Something like...
>         > > 
>         > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
>         > > Order:Enterobacteriales, Family:Enterobacteriaceae,
>         > > Genus:Escherichia
>         > > 
>         > > I am not worried about format just that I get the
>         > > information and the associated level of hierarchy. The script
>         > > found at
>         > > http://bioperl.org/wiki/Species_names_from_accession_numbers
>         > > seemed like a good starting point so I copied it and tried to
>         > > run it but got an error.
>         > > 
>         > > My first question is "Is there a known fix for this?" and
>         > > my second question is how do I get the full hierarchical
>         > > information (as seen above) with the taxonomy db?
>         > > 
>         > > Thanks for all your help in advance!
>         > > 
>         > > Chris 
>         > > 
>         > > 
>         > > _______________________________________________
>         > > Bioperl-l mailing list
>         > > Bioperl-l at lists.open-bio.org
>         > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>         > 
>         > 
>         
>         


From thomas.sharpton at gmail.com  Thu Sep  2 12:34:07 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Thu, 2 Sep 2010 09:34:07 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
	<20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <C272A232-C37D-40F9-B729-626B8DB17BB3@gmail.com>

So it is! I'm paying attention, I swear I am....

Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem,  
do let us know.
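
In the meantime, if you only need the per-sequence scores, the hit table
at the top of the report can be pulled apart by hand. A rough sketch in
plain Perl; this is not the Bio::SearchIO parser, parse_hit_table is a
name made up for the example, and the column layout is assumed from the
report you attached.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hashref per row of the
# "Scores for complete sequences" table in a HMMER3 hmmsearch report.
sub parse_hit_table {
    my ($report) = @_;
    my (@hits, $in_table);
    for my $line (split /\n/, $report) {
        if ($line =~ /^Scores for complete sequences/) { $in_table = 1; next }
        next unless $in_table;
        last if $line =~ /^Domain annotation/;   # end of the table section
        # Row layout: full-seq E-value, score, bias, best-domain E-value,
        # score, bias, exp, N, sequence name, description.
        my @f = split ' ', $line, 10;
        next unless @f >= 9 && $f[0] =~ /^[\d.]+(?:e[-+]?\d+)?$/i;
        push @hits, { evalue => $f[0], score => $f[1], name => $f[8],
                      desc   => defined $f[9] ? $f[9] : '' };
    }
    return @hits;
}

my $report = <<'END';
Query:       amoA_10genes_align.fasta.2  [M=247]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte

Domain annotation for each sequence (and alignments):
END

my @hits = parse_hit_table($report);
printf "%s\t%s\t%s\n", @{$_}{qw(name evalue score)} for @hits;
```

It ignores the domain and alignment sections entirely, so treat it as a
stopgap until the HMMER3 parser is in a release.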

Best,
Tom

On Sep 2, 2010, at 1:44 AM, Kai Blin wrote:

> On Wed, 1 Sep 2010 14:29:26 -0700
> Thomas Sharpton <thomas.sharpton at gmail.com> wrote:
>
> Hi,
>
>> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
>> use the HMMER3 version, as found here:
>>
>> http://github.com/bioperl/bioperl-hmmer3
>
> Actually it's now included in the bioperl-live repository, but the code
> hasn't made it into a release yet.
>
> http://github.com/bioperl/bioperl-live.git
>
> Cheers,
> Kai
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From johnny at mit.edu  Sat Sep  4 11:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (e.g. enrichment tests) into Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************


From David.Messina at sbc.su.se  Mon Sep  6 08:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl!

In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?

If you haven't already, you might want to take a look at some of our developer documentation. For example:
 
	http://www.bioperl.org/wiki/Bioperl_Best_Practices
	http://www.bioperl.org/wiki/Advanced_BioPerl


Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway.

I can't find a good recent thread that discussed the rationale and details, but here are a couple anyway:

	http://www.bioperl.org/wiki/Proposed_BioPerl_changes
	http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208


Dave


From ross at cuhk.edu.hk  Tue Sep  7 04:28:00 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 16:28:00 +0800
Subject: [Bioperl-l]  Indexing nr database
In-Reply-To: <C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>

With the following code I want to index the 4G nr database. However, the
index file is > 1T, and the job has been running for weeks and still hasn't
finished. Could anybody tell me how to accomplish this? Thanks in
advance.

    use strict;
    use Bio::DB::Flat::BinarySearch;

    (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

    # use single quotes so you don't have to write
    # regular expressions like "gi\\|(\\d+)"
    #my $primary_pattern = '^>(\S+)';
    #if ($fullHeader == 1) {
        my $primary_pattern = '^>(.+)';
    #}
    my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
    #$string =~ s/$primary_pattern/RRR/g;
    #print "$string\n";

    # one or more patterns stored in a hash:
    my $secondary_patterns = {GI => 'gi\|(\d+)'};

    my $db = Bio::DB::Flat::BinarySearch->new(
                          -directory          => $baseDir,
                          -dbname             => $dbName,
                          -write_flag         => 1,
                          -primary_pattern    => $primary_pattern,
                          -primary_namespace  => 'ACC',
                          -secondary_patterns => $secondary_patterns,
                          -verbose            => 1,
                          -format             => 'fasta'  );

    $db->build_index($seqFile);
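
For reference, a quick plain-Perl check of what the two primary patterns
and the secondary pattern capture from a header line. It reuses the
example header from the script with a leading '>' added, and it does not
touch Bio::DB::Flat at all. Whether the longer full-header keys
contribute to the index size is only a guess here, not something this
confirms.

```perl
use strict;
use warnings;

# Example header from the script, with the leading '>' of a FASTA defline.
my $header = '>gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome';

my ($full_key)  = $header =~ /^>(.+)/;      # pattern used in the script
my ($short_key) = $header =~ /^>(\S+)/;     # the commented-out alternative
my ($gi)        = $header =~ /gi\|(\d+)/;   # secondary pattern (GI namespace)

printf "full  key (%d chars): %s\n", length($full_key),  $full_key;
printf "short key (%d chars): %s\n", length($short_key), $short_key;
print  "GI: $gi\n";
```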

                                              
From David.Messina at sbc.su.se  Tue Sep  7 05:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for?

If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl.

They're distributed with Blast+ and available here:

	ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.


I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:

		ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.


Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself). Some nice examples there showing off the flexibility of this latest version of the software.


Dave


From ross at cuhk.edu.hk  Tue Sep  7 05:18:16 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 17:18:16 +0800
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <4C860148.3030000@fmi.ch>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>

The reason is that I have to retrieve the specific information of the
matched sequences, e.g. extract the 64th amino acid of the top matched
sequence. Is there any way to achieve that?

-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] 
Sent: Tuesday, September 07, 2010 5:09 PM
To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
Subject: Re: [Bioperl-l] Indexing nr database

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>     use strict;
>     use Bio::DB::Flat::BinarySearch;
>
>     (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>     # use single quotes so you don't have to write
>     # regular expressions like "gi\\|(\\d+)"
>     #my $primary_pattern = '^>(\S+)';
>     #if ($fullHeader == 1) {
>         my $primary_pattern = '^>(.+)';
>     #}
>     my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
>     #$string =~ s/$primary_pattern/RRR/g;
>     #print "$string\n";
>
>     # one or more patterns stored in a hash:
>     my $secondary_patterns = {GI => 'gi\|(\d+)'};
>
>     my $db = Bio::DB::Flat::BinarySearch->new(
>                           -directory          => $baseDir,
>                           -dbname             => $dbName,
>                           -write_flag         => 1,
>                           -primary_pattern    => $primary_pattern,
>                           -primary_namespace  => 'ACC',
>                           -secondary_patterns => $secondary_patterns,
>                           -verbose            => 1,
>                           -format             => 'fasta'  );
>
>     $db->build_index($seqFile);
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hrh at fmi.ch  Tue Sep  7 05:09:28 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:09:28 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <4C860148.3030000@fmi.ch>

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/
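
A minimal sketch of calling "blastdbcmd" from Perl, assuming the blast+ binaries are on your PATH and a local copy of the pre-formatted database is available. The helper name blastdbcmd_args and the script name are made up for illustration; -db, -entry, -range and -outfmt are blastdbcmd options.

```perl
use strict;
use warnings;

# Build the argument list for blastdbcmd (hypothetical helper name).
# '%s' as the output format asks for the bare sequence data only.
sub blastdbcmd_args {
    my ($db, $entry, $from, $to) = @_;
    my @args = ('blastdbcmd', '-db', $db, '-entry', $entry, '-outfmt', '%s');
    # Restrict the output to a sub-range when both ends are given
    push @args, '-range', "$from-$to" if defined $from && defined $to;
    return @args;
}

# Usage: perl fetch_entry.pl <db> <accession or gi> [from to]
if (@ARGV >= 2) {
    my @cmd = blastdbcmd_args(@ARGV);
    # List-form pipe open bypasses the shell, so ids like gi|123|... are safe
    open(my $fh, '-|', @cmd) or die "Cannot run blastdbcmd: $!";
    print while <$fh>;
    close($fh) or die "blastdbcmd exited with an error\n";
}
```

Called with a range such as "64 64", this should print just that one residue of the entry, without indexing anything yourself.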


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>      use strict;
>
>       use Bio::DB::Flat::BinarySearch;
>
>
>
>       (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>
>
>       # use single quotes so you don't have to write
>
>       # regular expressions like "gi\\|(\\d+)"
>
>       #my $primary_pattern = '^>(\S+)';
>
>       #if ($fullHeader == 1) {
>
>          my $primary_pattern = '^>(.+)';
>
>       #}
>
>       my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
> H37Rv complete genome";
> #$string =~ s/$primary_pattern/RRR/g;
>
>       #print "$string\n";
>
>
>
>       # one or more patterns stored in a hash:
>
>       my $secondary_patterns = {GI =>  'gi\|(\d+)'};
>
>
>
>       my $db = Bio::DB::Flat::BinarySearch->new(
>
>                             -directory          =>  $baseDir,
>
>                             -dbname             =>  $dbName,
>
>                             -write_flag         =>  1,
>
>                             -primary_pattern    =>  $primary_pattern,
>
>                             -primary_namespace  =>  'ACC',
>
>                             -secondary_patterns =>  $secondary_patterns,
>
>                             -verbose            =>  1,
>
>                             -format             =>  'fasta'  );
>
>
>
>       $db->build_index($seqFile);
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hrh at fmi.ch  Tue Sep  7 05:33:46 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:33:46 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
	<007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
Message-ID: <4C8606FA.3000509@fmi.ch>


On 09/07/2010 11:18 AM, Ross KK Leung wrote:
> The reason is that I have to retrieve the specific information of the
> matched sequences, e.g. extract the 64th amino acid of the top matched
> sequence. Is there any way to achieve that?

"blastdbcmd" has several options like "-range"

and even if "blastdbcmd" does not give you exactly the subset of
information you want, I am still convinced it is quicker to fetch the
complete entry with "blastdbcmd" and then parse the required data out of
just that one entry.

Hans

> -----Original Message-----
> From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch]
> Sent: Tuesday, September 07, 2010 5:09 PM
> To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
> Subject: Re: [Bioperl-l] Indexing nr database
>
> Hi
>
>
> why don't you use the pre-indexed BLAST files from NCBI:
>
> ftp://ftp.ncbi.nih.gov/blast/db/
>
> you can use them to fetch individual sequences by gi number or accession
> with the tool "blastdbcmd" from blast+ binaries:
>
> ftp://ftp.ncbi.nih.gov/blast/executables/blast+/
>
>
> regards, Hans
>
>
>
> On 09/07/2010 10:28 AM, Ross KK Leung wrote:
>> By the following codes, I wanna index the 4G nr database, however, the index
>> file is > 1T and the job has been running for weeks and still hasn't
>> finished. Could anybody tell me how you accomplish the goal? Thanks in
>> advance.
>>
>>       use strict;
>>
>>        use Bio::DB::Flat::BinarySearch;
>>
>>
>>
>>        (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>>
>>
>>
>>        # use single quotes so you don't have to write
>>
>>        # regular expressions like "gi\\|(\\d+)"
>>
>>        #my $primary_pattern = '^>(\S+)';
>>
>>        #if ($fullHeader == 1) {
>>
>>           my $primary_pattern = '^>(.+)';
>>
>>        #}
>>
>>        my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis
>> H37Rv complete genome";
>> #$string =~ s/$primary_pattern/RRR/g;
>>
>>        #print "$string\n";
>>
>>
>>
>>        # one or more patterns stored in a hash:
>>
>>        my $secondary_patterns = {GI =>   'gi\|(\d+)'};
>>
>>
>>
>>        my $db = Bio::DB::Flat::BinarySearch->new(
>>
>>                              -directory          =>   $baseDir,
>>
>>                              -dbname             =>   $dbName,
>>
>>                              -write_flag         =>   1,
>>
>>                              -primary_pattern    =>   $primary_pattern,
>>
>>                              -primary_namespace  =>   'ACC',
>>
>>                              -secondary_patterns =>   $secondary_patterns,
>>
>>                              -verbose            =>   1,
>>
>>                              -format             =>   'fasta'  );
>>
>>
>>
>>        $db->build_index($seqFile);
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From fs5 at sanger.ac.uk  Tue Sep  7 08:09:52 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 07 Sep 2010 13:09:52 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>

I am working a lot with feature-rich Bio::Seq objects these days and
thought that it would be really nice if I could do something like:

my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');

instead of having to grep for the feature every time. 
There could then be 'by_tag' and 'by_region' options as well. 

According to the Bio::Seq docs, something like this seems to be planned
at some stage. I would be willing to contribute to this feature if I can
and if this isn't already being implemented by somebody else. 
Does anybody know the state of this feature?

Frank


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jason at bioperl.org  Tue Sep  7 13:36:07 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 07 Sep 2010 10:36:07 -0700
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <4C867807.2040907@bioperl.org>

And the implementation would just be something like this?

my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
eq 'my_gene' } $seq->get_SeqFeatures();
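
Wrapped up as a reusable helper, that grep might look roughly like the sketch below. The name features_by_tag is made up here and is not an existing Bio::Seq method; the code only relies on get_SeqFeatures, has_tag and get_tag_values. Unlike the one-liner above, it matches if any value of the tag equals the requested one, not just the first.

```perl
use strict;
use warnings;

# Return the features of $seq that carry $tag with a value equal to $value.
# $seq is expected to be a Bio::Seq (anything with get_SeqFeatures works);
# each feature must provide has_tag and get_tag_values.
sub features_by_tag {
    my ($seq, $tag, $value) = @_;
    return grep {
        my $feat = $_;
        $feat->has_tag($tag)
            && scalar(grep { $_ eq $value } $feat->get_tag_values($tag));
    } $seq->get_SeqFeatures;
}
```

With a Bio::Seq in $seq this would be called as: my @features = features_by_tag($seq, 'id', 'my_gene');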

I think any real implementation would come if we moved from the in-memory 
arrays & hash-based system to a sqlite db on the back-end for how 
Sequence and Feature objects are stored.
This would be somewhat slower but wouldn't have the performance/memory 
problems we get for sequences with many annotations.

-jason
Frank Schwach wrote, On 9/7/10 5:09 AM:
> I am working a lot with feature-rich Bio::Seq objects these days and
> thought that it would be really nice if I could do something like:
>
> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>
> instead of having to grep for the feature every time.
> There could then be 'by_tag' and 'by_region' options as well.
>
> According to the Bio::Seq docs, something like this seems to be planned
> at some stage. I would be willing to contribute to this feature if I can
> and if this isn't already being implemented by somebody else.
> Does anybody know the state of this feature?
>
> Frank
>
>
>
>
>
>
>    


From fs5 at sanger.ac.uk  Wed Sep  8 04:42:57 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 08 Sep 2010 09:42:57 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <4C867807.2040907@bioperl.org>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Jason,

Yes, I guess that would be the simplest way of doing it - basically just
doing it the way the docs suggest for getting at a specific feature but
hiding the grep behind a Bio::Seq method with search parameters. But we
could also build a hash of feature tags as the Bio::Seq is built so that
retrieval is more efficient. This could also be used to implement a bin
indexing scheme for range queries, similar to what Bio::DB::GFF does.
Is a move to an sqlite backend planned for the near future? 
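
To make the bin idea concrete, here is a rough sketch of a much-simplified, single-level bin index over plain hashes with start/end keys. It is not how Bio::DB::GFF itself is implemented; the bin width and both function names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $BIN_SIZE = 1000;    # assumed bin width in bases

# Register a feature (hash ref with start/end) in every bin it overlaps.
sub index_feature {
    my ($index, $feat) = @_;
    for my $bin (int($feat->{start} / $BIN_SIZE) .. int($feat->{end} / $BIN_SIZE)) {
        push @{ $index->{$bin} }, $feat;
    }
}

# Return the features overlapping [$start, $end], scanning only the bins
# that the query range touches instead of the full feature list.
sub features_in_range {
    my ($index, $start, $end) = @_;
    my (%seen, @hits);
    for my $bin (int($start / $BIN_SIZE) .. int($end / $BIN_SIZE)) {
        for my $feat (@{ $index->{$bin} || [] }) {
            next if $seen{$feat}++;    # a feature can sit in several bins
            push @hits, $feat
                if $feat->{start} <= $end && $feat->{end} >= $start;
        }
    }
    return @hits;
}
```

The trade-off is the usual one: long features are stored in many bins, which is why real schemes tend to use several bin sizes.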

Frank


On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> And the implementation would just be something like this?
> 
> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
> eq 'my_gene' } $seq->get_SeqFeatures();
> 
> I think any implementation would be if we moved from the in-memory 
> arrays & hash-based system to a sqlite db on the back-end for how 
> Sequence and Feature objects are stored.
> This would be a somewhat slower but wouldn't have performance/memory 
> problems we get for sequences with many annotations.
> 
> -jason
> Frank Schwach wrote, On 9/7/10 5:09 AM:
> > I am working a lot with feature-rich Bio::Seq objects these days and
> > thought that it would be really nice if I could do something like:
> >
> > my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >
> > instead of having to grep for the feature every time.
> > There could then be 'by_tag' and 'by_region' options as well.
> >
> > According to the Bio::Seq docs, something like this seems to be planned
> > at some stage. I would be willing to contribute to this feature if I can
> > and if this isn't already being implemented by somebody else.
> > Does anybody know the state of this feature?
> >
> > Frank
> >
> >
> >
> >
> >
> >
> >    




From stefan.kirov at bms.com  Wed Sep  8 11:09:55 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 08 Sep 2010 11:09:55 -0400
Subject: [Bioperl-l] Another interesting Javascript library
Message-ID: <4C87A743.5010109@bms.com>


    Sorry for off topic, but I believe a lot of people can
    find this quite useful:

        "CanvasXpress is a javascript library based on the
        <canvas> tag implemented in HTML5. I developed this
        library as the core visualization component for our
        BMS systems biology platform which I hope to release
        soon. The basic idea was to have generic and simple
        way to display genomics data. CanvasXpress supports
        bar graphs, line graphs, bar-line combination
        graphs, boxplots, dotplots, area graphs, stacked
        graphs, percentage-stacked graphs, correlation
        plots, Venn diagrams, heatmaps, newick trees,
        2D-scatter plots, 2D-scatter bubble plots,
        3D-scatter plots, pie charts, networks (or
        pathways), and a genome browser. It also supports a
        few data transformations like log and exponential
        transformation, z-score, percentile transformation
        and ratio. It also support grouping of samples,
        zooming, events ... yada, yada, yada ... and more
        importantly I created an Ext panel for it. Take a look.

        http://canvasxpress.org/"


    Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: stefan_kirov.vcf
Type: text/x-vcard
Size: 207 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100908/30205812/attachment-0003.vcf>

From alperyilmaz at gmail.com  Wed Sep  8 12:47:42 2010
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Wed, 8 Sep 2010 12:47:42 -0400
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
Message-ID: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>

Hi,

I have a GFF file listing mRNA and CDS coordinates for every
transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
based on that information. I was wondering, if there's already made
script for that purpose that you're aware of.

I already uploaded the GFF file into Bio::DB::SeqFeature database, so
I can utilize both flat file or database based scripts.

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954


From cjfields at illinois.edu  Wed Sep  8 19:20:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Sep 2010 18:20:09 -0500
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>

Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planned for the near future? 
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
>> eq 'my_gene' } $seq->get_SeqFeatures();
>> 
>> I think any implementation would be if we moved from the in-memory 
>> arrays & hash-based system to a sqlite db on the back-end for how 
>> Sequence and Feature objects are stored.
>> This would be a somewhat slower but wouldn't have performance/memory 
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Thu Sep  9 01:51:53 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 08 Sep 2010 22:51:53 -0700
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
In-Reply-To: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
References: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
Message-ID: <4C8875F9.6020502@bioperl.org>

Hi Alper -

This script operates on GTF, so it doesn't quite do what you want, but it 
could be modified (and simplified) to just look at the CDS and mRNA rather 
than the exon and start/stop codon info:
   
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl

Otherwise I think there may be some easy ways to do this with some 
tools in MAKER too.
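
For simple cases the UTR coordinates can be derived directly from the mRNA and CDS extents. A rough sketch, assuming 1-based inclusive coordinates and that the CDS start/end passed in are the minimum start and maximum end over all CDS segments of the transcript. It ignores introns, so a reported UTR interval may span one. The function name utr_coords is made up for illustration.

```perl
use strict;
use warnings;

# Given mRNA and CDS extents (start <= end) and the strand ('+' or '-'),
# return the 5' and 3' UTR intervals as [start, end] array refs, or undef
# where the CDS reaches the end of the transcript.
sub utr_coords {
    my ($mrna_start, $mrna_end, $cds_start, $cds_end, $strand) = @_;
    my $left  = $cds_start > $mrna_start ? [ $mrna_start, $cds_start - 1 ] : undef;
    my $right = $cds_end   < $mrna_end   ? [ $cds_end + 1, $mrna_end ]     : undef;
    # On the minus strand the 5' end is at the higher coordinate
    my ($five, $three) = $strand eq '-' ? ($right, $left) : ($left, $right);
    return { five_prime => $five, three_prime => $three };
}

my $utr = utr_coords(100, 1000, 200, 900, '+');
printf "5'UTR %d-%d, 3'UTR %d-%d\n",
    @{ $utr->{five_prime} }, @{ $utr->{three_prime} };
# prints: 5'UTR 100-199, 3'UTR 901-1000
```

To get exon-aware UTRs you would intersect these intervals with the exon features of the transcript.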

-jason
Alper Yilmaz wrote, On 9/8/10 9:47 AM:
> Hi,
>
> I have a GFF file listing mRNA and CDS coordinates for every
> transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
> based on that information. I was wondering, if there's already made
> script for that purpose that you're aware of.
>
> I already uploaded the GFF file into Bio::DB::SeqFeature database, so
> I can utilize both flat file or database based scripts.
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>    


From fs5 at sanger.ac.uk  Thu Sep  9 04:10:36 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 09:10:36 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk>

so something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features and that would then be
sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan? Is there any way I
can get involved or is it better to wait for other features to be
developed first?
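
To illustrate what I mean, here is a rough sketch of such a container with an in-memory backend. Everything in it is hypothetical: the Sketch:: package names and method names are placeholders, not existing BioPerl modules. The only BioPerl calls assumed on the stored features are get_all_tags and get_tag_values.

```perl
use strict;
use warnings;

# Hypothetical abstract interface; backends must override both methods.
package Sketch::FeatureContainer;

sub new             { my ($class) = @_; return bless {}, $class; }
sub add_feature     { die ref(shift) . " must implement add_feature\n" }
sub features_by_tag { die ref(shift) . " must implement features_by_tag\n" }

# In-memory backend: indexes features by tag and value as they are added,
# so lookups are hash accesses instead of a grep over all features.
package Sketch::FeatureContainer::Memory;
our @ISA = ('Sketch::FeatureContainer');

sub add_feature {
    my ($self, $feat) = @_;
    push @{ $self->{all} }, $feat;
    for my $tag ($feat->get_all_tags) {
        push @{ $self->{by_tag}{$tag}{$_} }, $feat
            for $feat->get_tag_values($tag);
    }
    return $self;
}

sub features_by_tag {
    my ($self, $tag, $value) = @_;
    return @{ $self->{by_tag}{$tag}{$value} || [] };
}

package main;
```

An sqlite backend would implement the same two methods with INSERT and SELECT statements, and Bio::Seq would only ever talk to the interface.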

Cheers,

Frank


On Wed, 2010-09-08 at 18:20 -0500, Chris Fields wrote:
> Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
> 
> chris
> 
> On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:
> 
> > Hi Jason,
> > 
> > Yes, I guess that would be the simplest way of doing it - basically just
> > doing it the way the docs suggest for getting at a specific feature but
> > hiding the grep behind a Bio::Seq method with search parameters. But we
> > could also build a hash of feature tags as the Bio::Seq is built so that
> > retrieval is more efficient. This could also be used to implement a bin
> > indexing scheme for range queries, similar to what Bio::DB::GFF does.
> > Is a move to an sqlite backend planned for the near future? 
> > 
> > Frank
> > 
> > 
> > 
> > On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
> >> And the implementation would just be something like this?
> >> 
> >> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
> >> eq 'my_gene' } $seq->get_SeqFeatures();
> >> 
> >> I think any implementation would be if we moved from the in-memory 
> >> arrays & hash-based system to a sqlite db on the back-end for how 
> >> Sequence and Feature objects are stored.
> >> This would be a somewhat slower but wouldn't have performance/memory 
> >> problems we get for sequences with many annotations.
> >> 
> >> -jason
> >> Frank Schwach wrote, On 9/7/10 5:09 AM:
> >>> I am working a lot with feature-rich Bio::Seq objects these days and
> >>> thought that it would be really nice if I could do something like:
> >>> 
> >>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
> >>> 
> >>> instead of having to grep for the feature every time.
> >>> There could then be 'by_tag' and 'by_region' options as well.
> >>> 
> >>> According to the Bio::Seq docs, something like this seems to be planned
> >>> at some stage. I would be willing to contribute to this feature if I can
> >>> and if this isn't already being implemented by somebody else.
> >>> Does anybody know the state of this feature?
> >>> 
> >>> Frank
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> > 
> > 
> > 
> > -- 
> > The Wellcome Trust Sanger Institute is operated by Genome Research 
> > Limited, a charity registered in England with number 1021457 and a 
> > company registered in England with number 2742969, whose registered 
> > office is 215 Euston Road, London, NW1 2BE. 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From jun.yin at ucd.ie  Thu Sep  9 04:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a
package similar to Bio::LocatableSeq). The idea is to save the index of
sequences to a local database (AnyDBM) instead of keeping it in memory,
which should reduce memory usage. This idea actually comes from
Bio::DB::Fasta, as implemented by Lincoln Stein.
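
A minimal sketch of the on-disk index idea, using only core Perl (AnyDBM_File and Fcntl). It is a simplified illustration in the spirit of Bio::DB::Fasta, not its actual code; the function names are made up, and it stores just the byte offset of each FASTA header.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_RDONLY);
use AnyDBM_File;

# Record the byte offset of every FASTA header line in a DBM file,
# keyed by the first word after '>'.
sub build_offset_index {
    my ($fasta, $dbm_path) = @_;
    tie my %index, 'AnyDBM_File', $dbm_path, O_RDWR | O_CREAT, 0644
        or die "Cannot open index $dbm_path: $!";
    open(my $fh, '<', $fasta) or die "Cannot read $fasta: $!";
    my $offset = tell($fh);
    while (my $line = <$fh>) {
        $index{$1} = $offset if $line =~ /^>(\S+)/;
        $offset = tell($fh);    # position where the next line starts
    }
    close $fh;
    untie %index;
}

# Fetch one sequence: look up the offset on disk, seek there, and read
# until the next header. Only one record is ever held in memory.
sub fetch_sequence {
    my ($fasta, $dbm_path, $id) = @_;
    tie my %index, 'AnyDBM_File', $dbm_path, O_RDONLY, 0644
        or die "Cannot open index $dbm_path: $!";
    my $offset = $index{$id};
    untie %index;
    return undef unless defined $offset;

    open(my $fh, '<', $fasta) or die "Cannot read $fasta: $!";
    seek($fh, $offset, 0);
    my $header = <$fh>;         # skip the '>' line itself
    my $seq = '';
    while (my $line = <$fh>) {
        last if $line =~ /^>/;
        chomp $line;
        $seq .= $line;
    }
    close $fh;
    return $seq;
}
```

The same tied-hash approach could hold feature bins instead of offsets, as long as each stored value stays small enough for the DBM backend in use.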

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, September 09, 2010 12:20 AM
To: Frank Schwach
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Seq, search for specific features

Well, no move has been concretely made yet.  It would be nice to abstract
the backend, so one could use possibly any db or memory adaptor.  This is
essentially the direction I would like to take the alignment data as well
(part of the GSoC project for BioPerl this year was to tackle this very
thing).

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planned for the near future? 
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
>> eq 'my_gene' } $seq->get_SeqFeatures();
>> 
>> I think any implementation would be if we moved from the in-memory 
>> arrays & hash-based system to a sqlite db on the back-end for how 
>> Sequence and Feature objects are stored.
>> This would be a somewhat slower but wouldn't have performance/memory 
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From s1012635 at student.hsleiden.nl  Thu Sep  9 05:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi, 

I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format). 
I came across the Bio::Tools::TMHMM; package for BioPerl which looked promising. The problem is I lack the advanced knowledge of Perl to get this package to work. So far we had courses in Python and Java, not in Perl. 

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
use Bio::Tools::Tmhmm; 
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
while( my $tmhmm_feat = $parser->next_result ) { 
#do something 
#eg 
push @tmhmm_feat, $tmhmm_feat; 
} 

How do I feed an input.txt (containing the proteins in fasta format) to this parser and how do I save the output? 

cheers! 

Stefan Lelieveld 


From fs5 at sanger.ac.uk  Thu Sep  9 06:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it was run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm

Follow the synopsis to feed the tool with your sequences. You can learn
how to read a FASTA file and access each sequence in a loop here: 
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use strict;
use warnings;
use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file => "<$file", -format => 'fasta');
while (my $seq = $inseq->next_seq) {
  # display_id is the ID from the FASTA header; accession_number is
  # usually just "unknown" for plain FASTA input
  print $seq->display_id, "\n";
}

as an example for printing out the sequence IDs from $seq, which is a Bio::Seq object. 
So what you have to do now is to feed each of those Bio::Seq objects into your TMHMM runner. 
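
Putting the two steps together could look roughly like the sketch below. It assumes BioPerl, BioPerl-run and a local TMHMM installation are available, and follows the Bio::Tools::Run::Tmhmm synopsis, where run() takes a Bio::Seq and returns feature objects. I have not run this myself; the sub name and the tab-separated output format are just choices made for illustration.

```perl
use strict;
use warnings;

# Run TMHMM on every sequence in a FASTA file and write one tab-separated
# line per predicted feature. The BioPerl modules are loaded at call time.
sub run_tmhmm_on_fasta {
    my ($fasta, $out_fh) = @_;
    require Bio::SeqIO;
    require Bio::Tools::Run::Tmhmm;

    my $in      = Bio::SeqIO->new(-file => $fasta, -format => 'fasta');
    my $factory = Bio::Tools::Run::Tmhmm->new();
    while (my $seq = $in->next_seq) {
        # One feature per predicted region of the protein
        for my $feat ($factory->run($seq)) {
            print {$out_fh} join("\t", $seq->display_id, $feat->primary_tag,
                                 $feat->start, $feat->end), "\n";
        }
    }
}

# Usage: perl run_tmhmm.pl proteins.fasta > predictions.tsv
run_tmhmm_on_fasta($ARGV[0], \*STDOUT) if @ARGV;
```

Redirecting the output to a file, as in the usage line, is the simplest way to save the results.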

Frank


On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi, 
> 
> I am a bio-informatics student working on a new project. For this project I need to get the TMHMM prediction of a list of proteins (in fasta format). 
> I came across the Bio::Tools::TMHMM; package for BioPerl which looked promising. The problem is I lack the advanced knowledge of Perl to get this package to work. So far we had courses in Python and Java, not in Perl. 
> 
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed a input.txt(containing the proteins as fasta format) to this parser and how do I save the output? 
> 
> cheers! 
> 
> Stefan Lelieveld 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From kai.blin at biotech.uni-tuebingen.de  Thu Sep  9 06:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
	<814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" <s1012635 at student.hsleiden.nl> wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output?

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some fasta
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = Bio::Tools::Tmhmm->new(-file => "input.txt");
"""
will load and parse the TMHMM output for you.
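
To cover the "how do I save the output" part, here is a minimal sketch, not a definitive implementation. It assumes TMHMM has already been run and its report saved to a file. The sub name save_tmhmm_features, the file names, and the tab-separated column layout are all made up for illustration. It only relies on the generic seq_id, primary_tag, start and end accessors of the Bio::SeqFeature objects that next_result hands back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a TMHMM report and write one tab-separated line per feature.
# Both file names come from the caller; nothing here runs TMHMM itself.
sub save_tmhmm_features {
    my ($tmhmm_report, $out_file) = @_;

    # Loaded at run time, so this file still compiles where BioPerl is absent.
    require Bio::Tools::Tmhmm;

    my $parser = Bio::Tools::Tmhmm->new(-file => $tmhmm_report);
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";

    my $count = 0;
    while (my $feat = $parser->next_result) {
        # Replace any undefined field with an empty string before printing.
        my @cols = map { defined $_ ? $_ : '' }
                   ($feat->seq_id, $feat->primary_tag, $feat->start, $feat->end);
        print {$out} join("\t", @cols), "\n";
        $count++;
    }
    close $out or die "Cannot close $out_file: $!";
    return $count;
}

# Usage (hypothetical file names):
#   perl tmhmm_to_tsv.pl tmhmm_output.txt features.tsv
if (@ARGV == 2) {
    my $n = save_tmhmm_features(@ARGV);
    print "Wrote $n features to $ARGV[1]\n";
}
```

Writing plain tab-separated text keeps the result easy to read from Python or Java later on.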

HTH,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From elanorbust2 at yahoo.com  Thu Sep  9 12:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test for standaloneblastplus but getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list consists of the first four sequences found in the database. The results say it cannot find the first two, and the matches for the last two do not exist!

Thanks for any help! 


Program 


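#!/usr/bin/perl

use strict;
use warnings;
use Bio::Tools::Run::StandAloneBlastPlus;

# Set up a factory for a local BLAST database named 'ITS' built from smallDB.fas
my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create  => 1
);

# Run blastn with the query file and write the report to ITStest2.bls
my $result = $fac->blastn( -query   => 'sequences.fasta',
                           -outfile => 'ITStest2.bls' );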


smallDB.fas Data

>302585252|HM807352|Waitea circinata  internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>302585252|HM807352|Waitea circinata  internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA

>302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA

>301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA
CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA

>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC

>301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer 1
AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC

>301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG

>301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG


sequences.fasta data

>Test1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>Test2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>Test3
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>Test4
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA


Results

BLASTN 2.2.24+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.


Database: ITS
           5 sequences; 1,102 total letters


Query=  Test1
Length=204


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 202071


Query=  Test2
Length=192


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 189507


Query=  Test3
Length=437
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G59F
Length=203

 Score =  300 bits (162),  Expect = 2e-085
 Identities = 176/182 (96%), Gaps = 4/182 (2%)
 Strand=Plus/Plus

Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
            ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81

Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141

Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201

Query  187  GG  188
            ||
Sbjct  202  GG  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 6e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 4e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 442850


Query=  Test4
Length=521
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G59F
Length=203

 Score =  309 bits (167),  Expect = 4e-088
 Identities = 177/181 (97%), Gaps = 3/181 (1%)
 Strand=Plus/Plus

Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
            ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82

Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142

Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202

Query  184  G  184
            |
Sbjct  203  G  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 7e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 5e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 530378


  Database: ITS
    Posted date:  Aug 27, 2010  9:43 AM
  Number of letters in database: 1,102
  Number of sequences in database:  5


Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5


From jaya1786 at gmail.com  Thu Sep  9 12:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>

Respected sir/madam,

I am Jayanthi Jayakumar, in the second year of my MS (by Research) in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

Could you please provide the details and eligibility criteria for it?


Thanking you,


yours faithfully,
Jayanthi Jayakumar


From Russell.Smithies at agresearch.co.nz  Thu Sep  9 18:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
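
A quick way to test that idea before formatting the db is to scan the file for suspicious lines. This is a rough sketch in plain Perl, with a made-up sub name (check_fasta) and a deliberately strict notion of what a nucleotide line looks like. It flags headers without an ID, repeated IDs (the first whitespace-delimited token after ">"), and non-sequence lines such as the tail of a wrapped header. Note that in the data as posted, the two Waitea circinata records share the same first token; whether that matters to the db formatter is not something this check can tell you.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan FASTA text from a filehandle and return a list of problem descriptions.
sub check_fasta {
    my ($fh) = @_;
    my (%seen, @problems);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                     # tolerate Windows line endings
        next if $line =~ /^\s*$/;             # blank lines are ignored
        if ($line =~ /^>(\S*)/) {
            my $id = $1;
            push @problems, "line $.: header has no ID" if $id eq '';
            push @problems, "line $.: duplicate ID '$id'"
                if $id ne '' && $seen{$id}++;
        }
        elsif ($line !~ /^[ACGTUNRYKMSWBDHV\-\*]+$/i) {
            # e.g. the end of a header that was wrapped onto its own line
            push @problems, "line $.: not a valid nucleotide line: '$line'";
        }
    }
    return @problems;
}

# Usage: perl check_fasta.pl smallDB.fas
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for check_fasta($fh);
    close $fh;
}
```

An empty result only means none of these three checks fired; it does not guarantee the formatter will accept the file.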


Russell Smithies

Technical Support
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz
Invermay  Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T  +64 3 489 3809
F  +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test for standaloneblastplus but getting data back that
> does not exist in my query or my local database. Below is an outline of my
> script, small database, query list, and erroneous results. As you will
> notice, the query list consists of the first four sequences found in
> the database. The results say it cannot find the first two, and the
> matches for the last two do not exist!
>
> Thanks for any help!
>
>
>
> Program
>
>
> #!/usr/bin/perl
>
> use Bio::Tools::Run::StandAloneBlastPlus;
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_name => 'ITS',
>   -db_data => 'smallDB.fas',
>   -create => 1
> );
>
> $result = $fac->blastn( -query => 'sequences.fasta',
>                         -outfile => 'ITStest2.bls');
>
>
> smallDB.fas Data
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> >302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
> ATATCATTAAAGCGGAGGAA
>
> >301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
> ACGCA
> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>
> >301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
> 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>
> >301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
> 1
> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>
> >301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>
> >301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>
>
>
> sequences.fasta data
>
> >Test1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >Test2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >Test3
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >Test4
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
>
>
>
> Results
>
> BLASTN 2.2.24+
>
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
>
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
>
>
> Query=  Test1
> Length=204
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 202071
>
>
> Query=  Test2
> Length=192
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 189507
>
>
> Query=  Test3
> Length=437
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    4e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 442850
>
>
> Query=  Test4
> Length=521
>
>                                                                      Score     E
> Sequences producing significant alignments:                         (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 530378
>
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
>
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From elanorbust2 at yahoo.com  Fri Sep 10 11:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking, though!

--- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email or are some of your fasta headers in your db incorrect?
E.g.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
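
One way to test for the kind of header problem described above is a small stand-alone check. The following is only a sketch under stated assumptions: `check_fasta_lines` is a hypothetical helper (not a BioPerl function), the input is assumed to be plain nucleotide FASTA, and the first whitespace-delimited token after `>` is assumed to be the sequence ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan FASTA lines and report
# lines that look like wrapped header text, plus repeated sequence IDs.
sub check_fasta_lines {
    my @lines = @_;
    my (@problems, %seen);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        if ($line =~ /^>(\S+)/) {
            # Assumption: the first token after '>' is the sequence ID
            push @problems, "line $n: duplicate ID '$1'" if $seen{$1}++;
        }
        elsif ($line !~ /^[ACGTUNRYKMSWBDHV-]+$/i) {
            # Any other line must consist of IUPAC nucleotide codes only
            push @problems, "line $n: not a header or sequence: '$line'";
        }
    }
    return @problems;
}

# Demo input modelled on the wrapped header quoted in this thread
my @demo = (
    ">301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer",
    "1",
    "AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGG",
);
print "$_\n" for check_fasta_lines(@demo);
```

On the demo input this reports the stray "1" on line 2. To check a real file, the lines could be read with a plain `open` and passed to the same function.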


Russell Smithies

Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz



From David.Messina at sbc.su.se  Fri Sep 10 12:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID: <CFF0CF54-E5BD-4FD1-990D-E3AC219FBF5B@sbc.su.se>

Hi Sally,

Did you run the same search on the command line, outside of BioPerl?

The issue you're having may be with BLAST+ rather than BioPerl. For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) is excluding the expected matches.
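
A rough sketch of such a command-line check, written in Perl so it can sit next to the original script. The file and database names are taken from the script quoted below; the output name `ITStest2_cli.bls` and the helper `blast_commands` are made up for illustration. It only switches off the DUST low-complexity filter (`-dust no`) and leaves all other settings at their defaults.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the argument lists for a manual BLAST+ run.
sub blast_commands {
    my (%opt) = @_;
    my @makedb = ('makeblastdb', '-in', $opt{db_data},
                  '-dbtype', 'nucl', '-out', $opt{db_name});
    my @blastn = ('blastn', '-query', $opt{query}, '-db', $opt{db_name},
                  '-out', $opt{outfile},
                  '-dust', 'no');    # switch off low-complexity filtering
    return (\@makedb, \@blastn);
}

my ($makedb, $blastn) = blast_commands(
    db_name => 'ITS',
    db_data => 'smallDB.fas',
    query   => 'sequences.fasta',
    outfile => 'ITStest2_cli.bls',   # assumed name, to keep the old output
);

# Print the commands; pass --run to execute them (needs BLAST+ on the PATH).
for my $cmd ($makedb, $blastn) {
    print join(' ', @$cmd), "\n";
    if (@ARGV && $ARGV[0] eq '--run') {
        system(@$cmd) == 0 or die "command failed: @$cmd\n";
    }
}
```

Comparing the "Database:" summary at the top of the new report with the contents of smallDB.fas would show whether the search ran against the intended sequences.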


Dave


On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking, though!
> 
> --- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
> 
> From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
> Subject: RE: [Bioperl-l] standaloneblastplus
> To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
> Date: Thursday, September 9, 2010, 6:54 PM
> 
> Is that a typo in your email or are some of your fasta headers in your db incorrect?
> Eg.
> >301333052|GU725063|Xiphinema adenohystherum internal transcribed
> >301333052|GU725063|spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Shouldn't that be:
> >301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Maybe the invalid fasta headers are breaking the db formatter?
> 
> 
> Russell Smithies
> 
> Technical Support
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of sally roberts
>> Sent: Friday, 10 September 2010 4:10 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] standaloneblastplus
>> 
>> I am running a test for standaloneblastplus but getting data back that
>> does not exist in my query or my local database. Below is an outline of my
>> script, small database, query list, and erroneous results. As you will
>> notice, the query list consists of the first four sequences found in
>> the database. The results say it cannot find the first two, and the
>> matches for the last two do not exist!
>> 
>> Thanks for any help!
>> 
>> 
>> 
>> Program
>> 
>> 
>> #!/usr/bin/perl
>> 
>> use Bio::Tools::Run::StandAloneBlastPlus;
>> 
>> 
>> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>>    -db_name => 'ITS',
>>    -db_data => 'smallDB.fas',
>>    -create => 1
>> );
>> 
>> $result = $fac->blastn( -query => 'sequences.fasta',
>>                          -outfile => 'ITStest2.bls');
>> 
>> 
>> smallDB.fas Data
>> 
>> >302585252|HM807352|Waitea circinata internal transcribed spacer 1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>> >302585252|HM807352|Waitea circinata internal transcribed spacer 2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>> >302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>> >302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>> >302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
>> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
>> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
>> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
>> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
>> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
>> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
>> ATATCATTAAAGCGGAGGAA
>> 
>> >301333053|GU725064|Xiphinema turcicum internal transcribed spacer 1
>> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
>> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
>> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
>> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
>> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
>> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
>> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
>> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
>> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
>> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
>> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
>> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
>> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
>> ACGCA
>> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
>> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
>> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>> 
>>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
>> 1
>> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
>> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
>> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
>> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
>> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
>> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
>> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
>> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
>> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
>> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
>> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
>> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
>> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>> 
>>> 301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
>> 1
>> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
>> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
>> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
>> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
>> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
>> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
>> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
>> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
>> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
>> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
>> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
>> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
>> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>> 
>>> 301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
>> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
>> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
>> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
>> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
>> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
>> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
>> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
>> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
>> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
>> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
>> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
>> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
>> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>> 
>>> 301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
>> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
>> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
>> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
>> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
>> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
>> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
>> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
>> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
>> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
>> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
>> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
>> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>> 
>> 
>> 
>> sequences.fasta data
>> 
>>> Test1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>>> Test2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>>> Test3
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>>> Test4
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>> 
>> 
>> 
>> Results
>> 
>> BLASTN 2.2.24+
>> 
>> 
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
>> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
>> Comput Biol 2000; 7(1-2):203-14.
>> 
>> 
>> 
>> Database: ITS
>>             5 sequences; 1,102 total letters
>> 
>> 
>> 
>> Query=  Test1
>> Length=204
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 202071
>> 
>> 
>> Query=  Test2
>> Length=192
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 189507
>> 
>> 
>> Query=  Test3
>> Length=437
>> 
>>                                                                       Score     E
>> Sequences producing significant alignments:                          (Bits)  Value
>>
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   69.4   6e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   58.4   1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   56.5   4e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  300 bits (162),  Expect = 2e-085
>>   Identities = 176/182 (96%), Gaps = 4/182 (2%)
>>   Strand=Plus/Plus
>> 
>> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>>              ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>> 
>> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>> 
>> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>> 
>> Query  187  GG  188
>>              ||
>> Sbjct  202  GG  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 6e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 4e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 442850
>> 
>> 
>> Query=  Test4
>> Length=521
>> 
>>                                                                       Score     E
>> Sequences producing significant alignments:                          (Bits)  Value
>>
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   69.4   7e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   58.4   1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   56.5   5e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  309 bits (167),  Expect = 4e-088
>>   Identities = 177/181 (97%), Gaps = 3/181 (1%)
>>   Strand=Plus/Plus
>> 
>> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>>              ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>> 
>> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>> 
>> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>> 
>> Query  184  G  184
>>              |
>> Sbjct  203  G  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 7e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 5e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 530378
>> 
>> 
>>    Database: ITS
>>      Posted date:  Aug 27, 2010  9:43 AM
>>    Number of letters in database: 1,102
>>    Number of sequences in database:  5
>> 
>> 
>> 
>> Matrix: blastn matrix 1 -2
>> Gap Penalties: Existence: 0, Extension: 2.5
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jun.yin at ucd.ie  Sat Sep 11 12:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
References: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC is already finished this year. You can check the information here:
http://socghop.appspot.com/gsoc/program/home/google/gsoc2010

However, you can still contribute to the BioPerl project if you like. You
can talk to people on this mailing list, or you can join the IRC channel
(http://www.bioperl.org/wiki/IRC).

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar
Sent: Thursday, September 09, 2010 6:00 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Regarding GSoC 2010

Respected sir/madam,

I am Jayanthi Jayakumar, a second-year MS (by Research) student in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

I request you to provide details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From david.breimann at gmail.com  Sun Sep 12 09:16:29 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sun, 12 Sep 2010 15:16:29 +0200
Subject: [Bioperl-l] Circular genomes
Message-ID: <AANLkTi=LgWkUVeUd6WdaoDZQBTJMm7aCRe86p2-+R9PE@mail.gmail.com>

Hello,

As a continuation of
http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I
would like to ask:
Has the fix been implemented yet? That is, do GFF3 files created for circular
genomes now comply with the GFF3 spec for such genomes?
I find it difficult to keep track using git, so I'm not sure whether this
has already been handled.

Also, will the start and end coordinates of such genes loaded from a GFF3
file be "normal" (i.e. no coordinate larger than the size of the
genome) or just as written in the GFF3 (which demands that end > start even
if that makes end > genome length)?
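
To make the two coordinate conventions in the question concrete: the GFF3 specification keeps start <= end for a feature that crosses the origin of a circular landmark by adding the landmark length to the end coordinate. The sketch below converts between that form and "wrapped" coordinates. It is plain Perl and says nothing about what BioPerl itself returns; the sub names and the 5000 bp genome are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Map a 1-based coordinate back onto a circular genome of $length bases.
sub wrap_coordinate {
    my ($pos, $length) = @_;
    return (($pos - 1) % $length) + 1;
}

# GFF3-style (end may exceed genome length) -> wrapped coordinates.
sub normalize_feature {
    my ($start, $end, $length) = @_;
    return (wrap_coordinate($start, $length), wrap_coordinate($end, $length));
}

# Wrapped coordinates (end < start across the origin) -> GFF3-style.
sub to_gff3_coords {
    my ($start, $end, $length) = @_;
    $end += $length if $end < $start;
    return ($start, $end);
}

# Hypothetical example: a gene spanning the origin of a 5000 bp genome.
my $genome_length = 5000;
my ($start, $end) = (4900, 5100);
my ($wrapped_start, $wrapped_end) = normalize_feature($start, $end, $genome_length);
print "GFF3: $start..$end  wrapped: $wrapped_start..$wrapped_end\n";
```

Whichever form a parser hands back, a round trip through these two helpers should recover the other, provided the feature is shorter than the genome.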

Thanks,
David


From David.Messina at sbc.su.se  Mon Sep 13 11:10:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 13 Sep 2010 17:10:42 +0200
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se>

Hi everyone,

I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.

The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.

It's already part of bioperl-live, and you can also get it here:

	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl


Dave


From maj at fortinbras.us  Mon Sep 13 12:47:45 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 13 Sep 2010 16:47:45 +0000
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <W964423161995101284396465@webmail24>

Dear Scott-
You rock!
Sincerely, 
Mark

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Monday, September 13, 2010 11:10 AM
>To: 'BioPerl List'
>Subject: [Bioperl-l] BioPerl net installer
>
>Hi everyone,
>
>I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.
>
>The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.
>
>It's already part of bioperl-live, and you can also get it here:
>
>	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Mon Sep 13 17:15:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 13 Sep 2010 16:15:45 -0500
Subject: [Bioperl-l] BioPerl net installer
In-Reply-To: <W964423161995101284396465@webmail24>
References: <W964423161995101284396465@webmail24>
Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu>

Ditto!  

chris (briefly resurfacing)

On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote:

> Dear Scott-
> You rock!
> Sincerely, 
> Mark
> 
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Monday, September 13, 2010 11:10 AM
>> To: 'BioPerl List'
>> Subject: [Bioperl-l] BioPerl net installer
>> 
>> Hi everyone,
>> 
>> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.
>> 
>> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.
>> 
>> It's already part of bioperl-live, and you can also get it here:
>> 
>> 	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl
>> 
>> 
>> 
>> Dave
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From timmcilveen at talktalk.net  Mon Sep 13 19:07:00 2010
From: timmcilveen at talktalk.net (tim)
Date: Tue, 14 Sep 2010 00:07:00 +0100
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
Message-ID: <201009140007.00798.timmcilveen@talktalk.net>

Hi,
I have just installed Bioperl on my Linux system using the CPAN install. The 
install summary is as follows:
Test Summary Report
-------------------
t/RemoteDB/GenPept.t                       (Wstat: 256 Tests: 21 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
t/RemoteDB/Query/GenBank.t                 (Wstat: 256 Tests: 18 Failed: 1)
  Failed test:  9
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 21 tests but ran 18.
t/RemoteDB/Taxonomy.t                      (Wstat: 512 Tests: 103 Failed: 2)
  Failed tests:  15, 98
  Non-zero exit status: 2
t/Root/RootIO.t                            (Wstat: 7424 Tests: 30 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 31 tests but ran 30.
Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr  0.91 sys + 156.68 cusr  
9.16 csys = 172.94 CPU)
Result: FAIL
Failed 4/329 test programs. 4/18407 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO

Is Bioperl properly installed? During the install process I was getting quite
a lot of this error (hundreds of instances):
'replacement list longer than search list'. This happened with t/tools, t/seq,
t/search and many others.

Any advice would be great.
Tim


From David.Messina at sbc.su.se  Tue Sep 14 03:56:33 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 14 Sep 2010 09:56:33 +0200
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net>
References: <201009140007.00798.timmcilveen@talktalk.net>
Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se>

Hi Tim,

Thanks for your report.


> Is Bioperl properly installed?

No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line:

>  make test had returned bad status, won't install without force


As for the error(s)

> 'replacement list longer than search list'

I believe this was fixed a couple of months ago. For details, see:

	http://bugzilla.open-bio.org/show_bug.cgi?id=3116


So I would recommend that you grab the latest copy of bioperl-live from GitHub, where the bug should be fixed:

	http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots


Give that a shot and let us know how it goes.


Dave


From jskittrell at unmc.edu  Thu Sep 16 12:15:49 2010
From: jskittrell at unmc.edu (Jeff Kittrell)
Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC)
Subject: [Bioperl-l] mpiblast
Message-ID: <loom.20100916T180946-13@post.gmane.org>

Does BioPerl work with mpiblast? Is there a standalone-like module that
allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast
output file correctly?

Thanks for any help,

Jeff 


From David.Messina at sbc.su.se  Thu Sep 16 14:25:57 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 16 Sep 2010 20:25:57 +0200
Subject: [Bioperl-l] mpiblast
In-Reply-To: <loom.20100916T180946-13@post.gmane.org>
References: <loom.20100916T180946-13@post.gmane.org>
Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>

> Is there a standalone-like module that allows you to easily call mpiblast?

No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.

	http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase


> I'm assuming seqio will parse an mpiblast output file correctly?

Yes, although I see that a new version of mpiblast was recently released. 

Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?


Dave


From shalabh.sharma7 at gmail.com  Thu Sep 16 17:38:14 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 16 Sep 2010 17:38:14 -0400
Subject: [Bioperl-l] IUPAC code similarity
Message-ID: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>

Hi All,
I have a few nucleotide sequences that are composed of IUPAC codes, like:
>test
VGSRVBSSSSSNSC

Similarly, I have a database made up of these kinds of sequences. I want to
find sequences that are 100% similar to the query sequence.

Is there any BioPerl module to deal with this? I tried normal BLAST but it
didn't work.
Do I have to convert these sequences to 4-base codes, or is there any other
way?

Thanks
Shalabh


From amackey at virginia.edu  Fri Sep 17 10:28:15 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 10:28:15 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
Message-ID: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>

Convert the IUPAC code to a regular expression, and use regular expressions
(in Perl or grep or similar) to find 100% identical matches.
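
A minimal sketch of that conversion in plain Perl, assuming the targets contain only A, C, G and T. The %IUPAC table uses the standard IUPAC nucleotide codes; the sub name iupac_to_regex and the two target sequences are hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Turn an IUPAC string into the source text of a regular expression.
sub iupac_to_regex {
    my ($seq) = @_;
    my $re = '';
    for my $code (split //, uc $seq) {
        my $bases = $IUPAC{$code}
            or die "Unknown IUPAC code '$code'\n";
        # Single bases stay literal; ambiguity codes become character classes.
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

my $pattern = iupac_to_regex('VGSRVBSSSSSNSC');
print "$pattern\n";

# Hypothetical target sequences, anchored so the whole sequence must match.
for my $target (qw(AGCAACGGGGGTGC AGCAACGGGGGTGT)) {
    printf "%s %s\n", $target, ($target =~ /^$pattern$/ ? 'matches' : 'no match');
}
```

Dropping the ^ and $ anchors would instead find the pattern anywhere inside a longer target.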

-Aaron

On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Hi All,
>      I have few nucleotide sequences that are composed of IUPAC codes. Like
> >test
> VGSRVBSSSSSNSC
>
> Similarly i have a database made of of these kind of sequences. I want to
> find sequences that are 100% similar to the query sequence.
>
> Is there any bioPerl module to deal with this, i tried normal blast but it
> didn't worked.
> Do i have to convert these sequences to 4 base codes or there is any other
> way out.
>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From shalabh.sharma7 at gmail.com  Fri Sep 17 11:07:38 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 11:07:38 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>

Thanks Aaron for your reply.
Actually I tried that first, but there is another problem: I have to divide
each query sequence into windows of size 5 with a 1-base shift, and it's not
possible to divide a regular expression in that way.
So what I am trying to do is convert those IUPAC codes to 4-base code sequences
and then do the normal search.
Now the problem is that I am not able to convert those IUPAC sequences to
normal ones. I am still trying to write a script, but it's taking time.
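
One way around the windowing problem is to cut the windows on the IUPAC string itself, before any conversion, and then expand each window into the plain 4-base sequences it stands for. The sketch below does that in plain Perl; the sub names windows and expand_iupac are hypothetical, and the defaults (size 5, step 1) simply mirror the numbers mentioned above.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Split a sequence into overlapping windows (default: size 5, step 1).
sub windows {
    my ($seq, $size, $step) = @_;
    $size ||= 5;
    $step ||= 1;
    my @found;
    for (my $i = 0; $i + $size <= length($seq); $i += $step) {
        push @found, substr($seq, $i, $size);
    }
    return @found;
}

# List every unambiguous sequence that an IUPAC string can stand for.
sub expand_iupac {
    my ($seq) = @_;
    my @expanded = ('');
    for my $code (split //, uc $seq) {
        my $bases = $IUPAC{$code}
            or die "Unknown IUPAC code '$code'\n";
        my @next;
        for my $prefix (@expanded) {
            for my $base (split //, $bases) {
                push @next, $prefix . $base;
            }
        }
        @expanded = @next;
    }
    return @expanded;
}

for my $window (windows('VGSRVBSSSSSNSC')) {
    my @plain = expand_iupac($window);
    printf "%s -> %d sequences (first: %s)\n", $window, scalar @plain, $plain[0];
}
```

The number of expanded sequences grows multiplicatively with each ambiguity code, so this is only practical for short windows; for full-length queries a regular expression per window is likely the cheaper route.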

Thanks
Shalabh


On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu> wrote:

> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <shalabh.sharma7 at gmail.com
> > wrote:
>
>> Hi All,
>>      I have few nucleotide sequences that are composed of IUPAC codes.
>> Like
>> >test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>


From roy.chaudhuri at gmail.com  Fri Sep 17 11:04:28 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 17 Sep 2010 16:04:28 +0100
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <4C93837C.4080008@gmail.com>

Hi Shalabh,

The expand method in Bio::Tools::SeqPattern may be useful to convert 
IUPAC codes to regular expressions:

$ perl -e 'use Bio::Tools::SeqPattern; print
Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand'
[ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C

Although that won't work if there are also ambiguity codes in your
database. For a non-BioPerl solution you could try fuzznuc from EMBOSS.

Cheers.
Roy.

On 17/09/2010 15:28, Aaron Mackey wrote:
> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
> <shalabh.sharma7 at gmail.com>wrote:
>
>> Hi All,
>>       I have few nucleotide sequences that are composed of IUPAC codes. Like
>>> test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From david.breimann at gmail.com  Fri Sep 17 14:13:22 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 17 Sep 2010 20:13:22 +0200
Subject: [Bioperl-l] Installing using git after an older installation
Message-ID: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>

Hello,

I'm sharing a server with some other lab members.
I would like to install the latest version of bioperl for my own use,
without affecting my colleagues.
I used git to clone a copy of bioperl-live and exported
  PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB".
Now
  perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
returns
  1.0069

My question is: is that all?
Now I'm using the latest version?
Should I include anything special in my scripts?

Also, what about all the bp_***.pl scripts?
Are they now using the latest version, too?
I guess not, since I didn't build anything. So what should I do about them?

Thanks,
Dave


From amackey at virginia.edu  Fri Sep 17 15:24:44 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:24:44 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <4C93837C.4080008@gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<4C93837C.4080008@gmail.com>
Message-ID: <AANLkTi=Nx5fQ=YvxzB4mSsJh+38B-2+87Lvf=WHU_rCH@mail.gmail.com>

If there are ambiguity codes in the database, then the expanded character
class also has to include the original ambiguity code; unambiguous nucleotides
must likewise be expanded to include all ambiguity codes that represent the
nucleotide.
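
A minimal sketch of that expansion in plain Perl. It assumes a database
symbol should count as a match whenever it shares at least one base with the
query symbol; that overlap rule is an assumption, and a stricter rule may
suit some data better. The sub names compatible_symbols and
iupac_to_ambiguous_regex are made up for illustration and are not part of
BioPerl.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Return every IUPAC symbol that shares at least one base with $code
# (assumed matching rule: overlap of the two base sets).
sub compatible_symbols {
    my ($code) = @_;
    my $bases = $IUPAC{ uc $code } or die "Unknown IUPAC code: $code\n";
    return join '', grep { $IUPAC{$_} =~ /[$bases]/ } sort keys %IUPAC;
}

# Turn an IUPAC query into a regex whose character classes also accept
# ambiguity codes found in the database sequences.
sub iupac_to_ambiguous_regex {
    my ($query) = @_;
    return join '', map { '[' . compatible_symbols($_) . ']' } split //, $query;
}

my $regex = iupac_to_ambiguous_regex('VGSRV');
print "$regex\n";
```

The resulting classes get wide quickly, so it may be worth checking how many
spurious hits the overlap rule produces on real data.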

-Aaron

On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:

> Hi Shalabh,
>
> The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC
> codes to regular expressions:
>
> $perl -e 'use Bio::Tools::SeqPattern; print
> Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand'
> [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C
>
> Although that won't work if there are also abiguity codes in your database.
> For a non-BioPerl solution you could try fuzznuc from Emboss.
>
> Cheers.
> Roy.
>
>
> On 17/09/2010 15:28, Aaron Mackey wrote:
>
>> Convert the IUPAC code to a regular expression, and use regular
>> expressions
>> (in Perl or grep or similar) to find 100% identical matches.
>>
>> -Aaron
>>
>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
>> <shalabh.sharma7 at gmail.com>wrote:
>>
>>  Hi All,
>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>> Like
>>>
>>>> test
>>>>
>>> VGSRVBSSSSSNSC
>>>
>>> Similarly i have a database made of of these kind of sequences. I want to
>>> find sequences that are 100% similar to the query sequence.
>>>
>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>> it
>>> didn't worked.
>>> Do i have to convert these sequences to 4 base codes or there is any
>>> other
>>> way out.
>>>
>>> Thanks
>>> Shalabh
>>
>
>


From amackey at virginia.edu  Fri Sep 17 15:25:54 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:25:54 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
Message-ID: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>

Do your windowing/shifting on the unexpanded query sequences, then transform
the 5-bp queries into regular expressions.
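
A rough sketch of that order of operations in plain Perl: cut the IUPAC
query into windows first, then convert each window. The sub names windows
and iupac_to_regex are hypothetical, and N is written out as [ACGT] here
rather than as a dot.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Cut $seq into overlapping windows of $size symbols, moving $step each time.
sub windows {
    my ($seq, $size, $step) = @_;
    my @windows;
    for (my $i = 0; $i + $size <= length($seq); $i += $step) {
        push @windows, substr($seq, $i, $size);
    }
    return @windows;
}

# Convert one IUPAC string into a regular expression over A, C, G and T.
sub iupac_to_regex {
    my ($seq) = @_;
    my $regex = '';
    for my $code (split //, uc $seq) {
        my $bases = $IUPAC{$code} or die "Unknown IUPAC code: $code\n";
        $regex .= length($bases) > 1 ? "[$bases]" : $bases;
    }
    return $regex;
}

# Window size 5 with a 1-base shift, as described in the thread.
for my $window (windows('VGSRVBSSSSSNSC', 5, 1)) {
    printf "%s\t%s\n", $window, iupac_to_regex($window);
}
```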

-Aaron

On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Thanks Aaron for your reply.
> Actually i tried that first, but there is another problem, i have to divide
> each query sequence to window size 5 with 1 base shift and its not possible
> to divide regular expression in that way.
> So what i am trying is to convert those iupac codes to 4 base code sequence
> and then do the normal search.
> Now the problem is that i cant able to convert those IUPAC sequences to
> normal ones, i am still trying to write a script but its taking time.
>
> Thanks
> Shalabh
>
>
> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu>wrote:
>
>> Convert the IUPAC code to a regular expression, and use regular
>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>
>> -Aaron
>>
>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>> shalabh.sharma7 at gmail.com> wrote:
>>
>>> Hi All,
>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>> Like
>>> >test
>>> VGSRVBSSSSSNSC
>>>
>>> Similarly i have a database made of of these kind of sequences. I want to
>>> find sequences that are 100% similar to the query sequence.
>>>
>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>> it
>>> didn't worked.
>>> Do i have to convert these sequences to 4 base codes or there is any
>>> other
>>> way out.
>>>
>>> Thanks
>>> Shalabh
>>>
>>
>>
>


From Kevin.M.Brown at asu.edu  Fri Sep 17 16:09:34 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 17 Sep 2010 13:09:34 -0700
Subject: [Bioperl-l] Installing using git after an older installation
In-Reply-To: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
References: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA


From shalabh.sharma7 at gmail.com  Fri Sep 17 16:45:50 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 16:45:50 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
	<AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
Message-ID: <AANLkTi=b7MYxC=2DuY3N5Udg9CsfoVDV=Q9cFUOUML0x@mail.gmail.com>

Thanks Aaron,
Changing the query sequence worked well, but I am still struggling with the
database.

-Shalabh


On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey <amackey at virginia.edu> wrote:

> do your windowing/shifting on the unexpanded query sequences; then
> transform the 5-bp queries into regular expressions.
>
> -Aaron
>
>
> On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma <
> shalabh.sharma7 at gmail.com> wrote:
>
>> Thanks Aaron for your reply.
>> Actually i tried that first, but there is another problem, i have to
>> divide each query sequence to window size 5 with 1 base shift and its not
>> possible to divide regular expression in that way.
>> So what i am trying is to convert those iupac codes to 4 base code
>> sequence and then do the normal search.
>> Now the problem is that i cant able to convert those IUPAC sequences to
>> normal ones, i am still trying to write a script but its taking time.
>>
>> Thanks
>> Shalabh
>>
>>
>> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu>wrote:
>>
>>> Convert the IUPAC code to a regular expression, and use regular
>>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>>
>>> -Aaron
>>>
>>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>>> shalabh.sharma7 at gmail.com> wrote:
>>>
>>>> Hi All,
>>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>>> Like
>>>> >test
>>>> VGSRVBSSSSSNSC
>>>>
>>>> Similarly i have a database made of of these kind of sequences. I want
>>>> to
>>>> find sequences that are 100% similar to the query sequence.
>>>>
>>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>>> it
>>>> didn't worked.
>>>> Do i have to convert these sequences to 4 base codes or there is any
>>>> other
>>>> way out.
>>>>
>>>> Thanks
>>>> Shalabh
>>>>
>>>
>>>
>>
>


From heikki.lehvaslaiho at gmail.com  Sat Sep 18 03:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Sat, 18 Sep 2010 10:41:22 +0300
Subject: [Bioperl-l] mpiblast
In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
References: <loom.20100916T180946-13@post.gmane.org>
	<0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
Message-ID: <AANLkTi=SOzf5PH605OkPYf2r6w=xaTdXqU+zY7LBwkNw@mail.gmail.com>

Been running 1.6 and its betas on Blue Gene/P for months. The output
is identical to standard BLAST output. No issues in parsing it with
BioPerl.

     -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia


On 16 September 2010 21:25, Dave Messina <David.Messina at sbc.su.se> wrote:
>> Is the there a standalone like module that allows you to easily call mpiblast?
>
> No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.
>
>        http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase
>
>
>> I'm assuming seqio with parse a mpiblast output file correctly?
>
> Yes, although I see that a new version of mpiblast was recently released.
>
> Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?
>
>
> Dave
>
>
>


From david.breimann at gmail.com  Sat Sep 18 05:05:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:05:58 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
Message-ID: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>

Hello,

I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
in the fields and sometimes it doesn't, even though the GenBank record has a
locus tag.
Also, is the ID always equivalent to the locus tag?

Thanks,
Dave


From scott at scottcain.net  Sat Sep 18 05:17:24 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 10:17:24 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
Message-ID: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>

Hi Dave,

bp_genbank2gff3.pl suffers from the fact that it has to deal with
GenBank files :-)  It was designed initially to work on whole genome
refseqs, and contains several ad hoc rules for trying to make it "do
the right thing."  In practice, it is not unusual for a post
processing step (either by hand or a quicky perl script) to be
required to really get it right.  I don't recall the specifics (if I
ever knew :-) for when and how the locus tag is used, but I do know
that there is a list of things that it will try to use for the ID, and
while the locus is on the list, I don't know where it comes in the
list, so it's possible that other items might supersede it.

Scott


On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Hello,
>
> I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
> in the fields and sometime it doesn't, even though the genabank has a locus
> tag.
> Also, is the ID always equivalent to the locus tag?
>
> Thanks,
> Dave
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 05:20:33 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:20:33 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
Message-ID: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>

Since locus_tag is an essential tag in GenBank, I suggest that locus_tag
always be added to the last column of the GFF if it exists in the GenBank
file, whether or not it is used as the ID in the GFF.

On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> GenBank files :-)  It was designed initially to work on whole genome
> refseqs, and contains several ad hoc rules for trying to make it "do
> the right thing."  In practice, it is not unusual for a post
> processing step (either by hand or a quicky perl script) to be
> required to really get it right.  I don't recall the specifics (if I
> ever knew :-) for when and how the locus tag is used, but I do know
> that there is a list of things that it will try to use for the ID, and
> while the locus is on the list, I don't know where it comes in the
> list, so it's possible that other items might supersede it.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Hello,
> >
> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> `locus_tag`
> > in the fields and sometime it doesn't, even though the genabank has a
> locus
> > tag.
> > Also, is the ID always equivalent to the locus tag?
> >
> > Thanks,
> > Dave
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


From scott at scottcain.net  Sat Sep 18 06:08:26 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 11:08:26 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
Message-ID: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>

Hi Dave,

That seems perfectly reasonable.  If you could point out a GenBank
entry for which that does not happen, I could try to figure out why
not.

Scott


On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Since locus_tag is an essential tag in genbank, I suggest locus_tag will be
> always added to the GFF last column if it exists in the genbank, whether it
> is used as ID in the GFF or not.
>
> On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> GenBank files :-)  It was designed initially to work on whole genome
>> refseqs, and contains several ad hoc rules for trying to make it "do
>> the right thing."  In practice, it is not unusual for a post
>> processing step (either by hand or a quicky perl script) to be
>> required to really get it right.  I don't recall the specifics (if I
>> ever knew :-) for when and how the locus tag is used, but I do know
>> that there is a list of things that it will try to use for the ID, and
>> while the locus is on the list, I don't know where it comes in the
>> list, so it's possible that other items might supersede it.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Hello,
>> >
>> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> > `locus_tag`
>> > in the fields and sometime it doesn't, even though the genabank has a
>> > locus
>> > tag.
>> > Also, is the ID always equivalent to the locus tag?
>> >
>> > Thanks,
>> > Dave
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 06:20:50 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:20:50 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
Message-ID: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>

Hi Scott,

Here is a very short genbank:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

Note all genes in the genbank have locus tags. In the resulting GFF3,
however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
idea why it deserves a special treatment... :)

p.s. making this change (i.e., copying locus_tag to the GFF3 last column
whenever available) will really make my life easier.

Thank you,
Dave

On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> That seems perfectly reasonable.  If you could point out a GenBank
> entry for which that does not happen, I could try to figure out why
> not.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will
> be
> > always added to the GFF last column if it exists in the genbank, whether
> it
> > is used as ID in the GFF or not.
> >
> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net>
> wrote:
> >>
> >> Hi Dave,
> >>
> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> >> GenBank files :-)  It was designed initially to work on whole genome
> >> refseqs, and contains several ad hoc rules for trying to make it "do
> >> the right thing."  In practice, it is not unusual for a post
> >> processing step (either by hand or a quicky perl script) to be
> >> required to really get it right.  I don't recall the specifics (if I
> >> ever knew :-) for when and how the locus tag is used, but I do know
> >> that there is a list of things that it will try to use for the ID, and
> >> while the locus is on the list, I don't know where it comes in the
> >> list, so it's possible that other items might supersede it.
> >>
> >> Scott
> >>
> >>
> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> >> <david.breimann at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> >> > `locus_tag`
> >> > in the fields and sometime it doesn't, even though the genabank has a
> >> > locus
> >> > tag.
> >> > Also, is the ID always equivalent to the locus tag?
> >> >
> >> > Thanks,
> >> > Dave
> >> >
> >>
> >>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                                   scott at scottcain
> >> dot net
> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> >> Ontario Institute for Cancer Research
> >
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


From david.breimann at gmail.com  Sat Sep 18 06:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

use Bio::Tools::GFF;
use Data::Dumper;

my $gffio = Bio::Tools::GFF->new(
    -file =>
        "/path/to/file.gff",
    -gff_version => 3
);

print Dumper $gffio->get_seqs();

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).


Thanks,

Dave


From scott at scottcain.net  Sat Sep 18 07:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
Message-ID: <AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this.  It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott


On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Hi Scott,
>
> Here is a very short genbank:
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>
> Note all genes in the genbank have locus tags. In the resulting GFF3,
> however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
> idea why it deserves a special treatment... :)
>
> p.s. making this change (i.e., copying locus_tag to the GFF3 last column
> whenever available) will really make my life easier.
>
> Thank you,
> Dave
>
> On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> That seems perfectly reasonable.  If you could point out a GenBank
>> entry for which that does not happen, I could try to figure out why
>> not.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will
>> > be
>> > always added to the GFF last column if it exists in the genbank, whether
>> > it
>> > is used as ID in the GFF or not.
>> >
>> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net>
>> > wrote:
>> >>
>> >> Hi Dave,
>> >>
>> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> >> GenBank files :-)  It was designed initially to work on whole genome
>> >> refseqs, and contains several ad hoc rules for trying to make it "do
>> >> the right thing."  In practice, it is not unusual for a post
>> >> processing step (either by hand or a quicky perl script) to be
>> >> required to really get it right.  I don't recall the specifics (if I
>> >> ever knew :-) for when and how the locus tag is used, but I do know
>> >> that there is a list of things that it will try to use for the ID, and
>> >> while the locus is on the list, I don't know where it comes in the
>> >> list, so it's possible that other items might supersede it.
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> > `locus_tag`
>> >> > in the fields and sometime it doesn't, even though the genabank has a
>> >> > locus
>> >> > tag.
>> >> > Also, is the ID always equivalent to the locus tag?
>> >> >
>> >> > Thanks,
>> >> > Dave
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> ------------------------------------------------------------------------
>> >> Scott Cain, Ph. D.                                   scott at scottcain
>> >> dot net
>> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> >> Ontario Institute for Cancer Research
>> >
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 07:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
References: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
Message-ID: <AANLkTimc9Zf2a2N64wUUancrHWJ05wRKgfMuUQ0X9UTe@mail.gmail.com>

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

  use Bio::DB::SeqFeature::Store;

  # Load the GFF3 data into an in-memory feature store
  my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory',
                                            -dir     => 'path/to/file' );

  # Warning: this returns a string, and not a PrimarySeq object
  my $sequence = $db->fetch_sequence('Chr1',5000=>6000);
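
If a feature store is more than is needed, a dependency-free sketch along
these lines may be enough to pull one record out by ID. It is not a BioPerl
API: the sub name gff3_fasta_seq is made up, and it assumes the sequences
follow a "##FASTA" directive line and that the ID is the first word of the
FASTA header.

```perl
use strict;
use warnings;

# Return the sequence whose FASTA header ID equals $wanted, read from the
# ##FASTA section of a GFF3 filehandle; returns undef if it is not found.
sub gff3_fasta_seq {
    my ($fh, $wanted) = @_;
    my ($in_fasta, $in_record, $seq) = (0, 0, undef);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate DOS line endings
        if (!$in_fasta) {
            # Skip feature lines until the FASTA section starts
            $in_fasta = 1 if $line =~ /^##FASTA/;
            next;
        }
        if ($line =~ /^>(\S+)/) {
            last if $in_record;           # reached the next record
            if ($1 eq $wanted) { $in_record = 1; $seq = ''; }
            next;
        }
        $seq .= $line if $in_record;
    }
    return $seq;
}

# Usage: perl script.pl /path/to/file.gff SEQUENCE_ID
if (@ARGV == 2) {
    my ($path, $id) = @ARGV;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $seq = gff3_fasta_seq($fh, $id);
    print defined $seq ? ">$id\n$seq\n" : "No sequence with ID $id\n";
}
```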

Scott


On Sat, Sep 18, 2010 at 11:45 AM, David Breimann
<david.breimann at gmail.com> wrote:
> As you know, GFF3 files can contain FASTA sequences after the features.
>
> How do I extract a specific FASTA sequence given it's ID?
>
> I tried:
>
> use Bio::Tools::GFF;
> use Data::Dumper;
>
> my $gffio = Bio::Tools::GFF->new(
>    -file =>
>        "/path/to/file.gff",
>    -gff_version => 3
> );
>
> print Dumper $gffio->get_seqs();
>
> but $gffio->get_seqs() seems to return nothing, although the GFF3 has
> sequences and is also valid.
>
> By the way, I am able to parse the features themselves (using
> $gffio->next_feature()).
>
>
> Thanks,
>
> Dave
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 09:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
Message-ID: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository, it also uses the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.
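
One way to check which copy of a module Perl actually picks up is to look
at %INC after loading it. The following is only a sketch; module_path is a
made-up helper name, and Bio::Perl is used as the default simply because it
is the module mentioned earlier on the list.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Perl -> Bio/Perl.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line, or Bio::Perl by default.
for my $module (@ARGV ? @ARGV : ('Bio::Perl')) {
    my $path = module_path($module);
    print "$module => ", (defined $path ? $path : 'not found'), "\n";
}
```

If the printed path is not under the git checkout, PERL5LIB is probably not
taking effect for that shell or script.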

Also, are you using any command line flags when executing it?  I didn't.

Scott


On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y
> NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running linux?  If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Your gff seems fine. I get a very similar one, but with `Name=` instead
>> > of
>> > `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be
>> > updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for.  Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the latest bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.
>> >> >
>> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Hi Dave,
>> >> >>
>> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for all
>> >> >> features that have a locus_tag, and uses the locus_tag for the ID when
>> >> >> it can (it can't blindly use the locus tag for the ID since both the
>> >> >> gene and the CDS have the same tag).
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > Here is a very short genbank:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >
>> >> >> > Note all genes in the genbank have locus tags. In the resulting GFF3,
>> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have
>> >> >> > no idea why it deserves special treatment... :)
>> >> >> >
>> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> > column whenever available) will really make my life easier.
>> >> >> >
>> >> >> > Thank you,
>> >> >> > Dave
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> That seems perfectly reasonable.  If you could point out a GenBank
>> >> >> >> entry for which that does not happen, I could try to figure out why
>> >> >> >> not.
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest that
>> >> >> >> > locus_tag always be added to the GFF last column if it exists in
>> >> >> >> > the genbank, whether it is used as ID in the GFF or not.
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> >> >> >> >> GenBank files :-)  It was designed initially to work on whole genome
>> >> >> >> >> refseqs, and contains several ad hoc rules for trying to make it "do
>> >> >> >> >> the right thing."  In practice, it is not unusual for a post
>> >> >> >> >> processing step (either by hand or a quicky perl script) to be
>> >> >> >> >> required to really get it right.  I don't recall the specifics (if I
>> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I do know
>> >> >> >> >> that there is a list of things that it will try to use for the ID, and
>> >> >> >> >> while the locus is on the list, I don't know where it comes in the
>> >> >> >> >> list, so it's possible that other items might supersede it.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>> >> >> >> >> > the genbank has a locus tag.
>> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Dave


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 09:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
Message-ID: <AANLkTimMZ4vZUw2Gsd-NmaWL8xiD4hryH8Z-7UezT=LO@mail.gmail.com>

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation.  The
directory in which the script resides is scripts/Bio-DB-GFF/.
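
Because blib/script only holds a staged copy, an older installed copy
elsewhere on PATH could still be the one that actually runs. A minimal
sketch, assuming a POSIX shell (sh or bash), that lists every executable
copy of a script on PATH in search order. The function name
list_copies_on_path is made up for illustration and is not part of BioPerl:

```shell
# Hypothetical helper (not part of BioPerl): print every executable file
# named $1 found in the directories of $PATH, in search order. The first
# line printed is the copy the shell would actually run.
list_copies_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # Skip empty PATH entries rather than treating them as ".".
        [ -n "$dir" ] || continue
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Example call: prints nothing if no copy is installed on PATH.
list_copies_on_path bp_genbank2gff3.pl
```

More than one line of output would suggest that several installations are
present; copies that are not on PATH (such as the one in the git checkout)
will not show up here.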

Scott


On Sat, Sep 18, 2010 at 2:40 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...
>
>
> On Sat, Sep 18, 2010 at 3:14 PM, David Breimann <david.breimann at gmail.com>
> wrote:
>>
>> Yes, I'm using Ubuntu 10.04.
>>
>> That is really weird. I tried running the script from the bioperl-live dir
>> (which I just pulled using git), and I get the same results as before
>> (`Name` instead of `locus_tag`):
>>
>>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>>
>> Attached is the resulting GFF3.
>> I also attach a copy of bp_genbank2gff3.pl as found under
>> /home/dave/src/bioperl-live/blib/script.
>>
>> This is a real mystery for me!


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 09:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
Message-ID: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>

So let's do an intermediate summary of my situation:
I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), whereas Scott gets the expected results
using the latest version of bioperl.
I cloned a fresh copy of bioperl-live into my ~/src:
$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:
export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
and ran
$ source ~/.profile

I then downloaded a small genome from NCBI
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
and tested the script:
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1    ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1    ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1    ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are from Scott's file:
##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1    ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1    ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1    ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38


Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
as desired.
I have no idea what is going on here...
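
Comparing the two files by eye is error-prone, so here is a rough sketch of
how the attribute keys could be compared mechanically. It assumes the GFF3
files on disk are tab-separated with one feature per line (not wrapped as in
this mail), and that awk and sort are available. The function name
gff3_attr_keys and the file names in the usage comment are made up for
illustration:

```shell
# Hypothetical helper: print "feature_type<TAB>attribute_key" pairs,
# sorted and de-duplicated, for every feature line of the GFF3 file $1.
gff3_attr_keys() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }       # skip directives, comments, short lines
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                key = attrs[i]
                sub(/=.*/, "", key)   # keep only the tag before "="
                if (key != "") print $3 "\t" key
            }
        }
    ' "$1" | sort -u
}

# Usage (file names are placeholders):
#   gff3_attr_keys mine.gff3   > mine.keys
#   gff3_attr_keys scotts.gff3 > scotts.keys
#   diff mine.keys scotts.keys
```

For the two outputs shown above, such a diff would be expected to report
the Name keys only in my file and the locus_tag keys only in Scott's.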

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository, it also uses the
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it?  I didn't.
>
> Scott


From scott at scottcain.net  Sat Sep 18 10:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
	<AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
Message-ID: <AANLkTi=yGr+VHDinKtSiy7XE3f7_F73hGRA4bNFzRGxz@mail.gmail.com>

The only thing I can add is that I did a 'git diff genbank2gff3.PLS'
and found no differences.  It occurred to me that perhaps I'd done
some fixing and not committed it, but it looks to me like that's not
the case (assuming I've managed to use git correctly; not a great
assumption, but I don't have another one to work with :-)

Scott
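
If the script itself is unchanged, one remaining suspect is which copy of the BioPerl modules Perl actually loads, which is what the PERL5LIB advice quoted below is about. A minimal sketch for checking that follows; `module_location` is a hypothetical helper, not part of BioPerl, and the three module names are only assumed examples of what the converter might load.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): load a module by name and
# return the file Perl actually read it from, or undef if it cannot be loaded.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# The module list is an assumption; substitute whatever the script uses.
for my $module (qw(Bio::Root::Version Bio::SeqIO Bio::Tools::GFF)) {
    my $path = module_location($module);
    printf "%-20s %s\n", $module, defined $path ? $path : '(not found)';
}

# The directories Perl searches, in order; PERL5LIB entries come first.
print "Search path:\n", map { "  $_\n" } @INC;
```

If the printed paths point at an installed copy rather than the git checkout, the checkout's modules are not the ones being used.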


On Sat, Sep 18, 2010 at 2:57 PM, David Breimann
<david.breimann at gmail.com> wrote:
> So let's do an intermediate summary of my situation:
> I'm using Ubuntu 10.04 and Perl 5.10.1.
> I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
> "locus_tag=" in the last GFF3 column), while Scott gets the expected results
> while using the latest version of bioperl.
> I cloned a fresh version of bioperl live into my ~/src:
> $ cd ~/src
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> I then added the following line to the end of ~/.profile:
> export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
> and ran
> $ source ~/.profile
>
> I then downloaded a small genome from NCBI
> $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> and tested the script:
> $ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk
>
> Following are the top 10 lines of the resulting GFF3:
>
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia
> coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This
> record has not yet been subject to final NCBI review. The reference sequence
> was derived from CP000798. Source DNA and bacteria available from Jacques
> Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL
> REFSEQ: This record has not yet been subject to final NCBI review. The
> reference sequence was derived from CP000798. Source DNA and bacteria
> available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified
> by glimmer%3B putative;codon_start=1;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
> while these are from Scott's file:
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A
> plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has
> not yet been subject to final NCBI review. The reference sequence was
> derived from CP000798. Source DNA and bacteria available from Jacques Ravel
> (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ:
> This record has not yet been subject to final NCBI review. The reference
> sequence was derived from CP000798. Source DNA and bacteria available from
> Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified
> by glimmer%3B
> putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
>
> Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
> as desired.
> I have no idea what is going on here...
>
> Best,
> Dave
>
> On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> Let's keep the discussion on the mailing list so we can make sure that
>> when this problem is solved, its resolution will be archived.
>>
>> I don't really understand what is going on either, though it would
>> probably be a good idea to set your PERL5LIB env variable so that when
>> you execute this script from the git repository, it also uses the
>> BioPerl modules in the git repository instead of the ones that are
>> installed in your "normal" path.
>>
>> Also, are you using any command line flags when executing it?  I didn't.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Yes, I'm using Ubuntu 10.04.
>> >
>> > That is really weird. I tried running the script from the bioperl-live dir
>> > (which I just pulled using git), and I get the same results as before
>> > (`Name` instead of `locus_tag`):
>> >
>> >  $ wget
>> >
>> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y
>> > NC_009789.genbank
>> >
>> > Attached is the resulting GFF3.
>> > I also attach a copy of bp_genbank2gff3.pl as found under
>> > /home/dave/src/bioperl-live/blib/script.
>> >
>> > This is a real mystery for me!
>> >
>> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Typically I do build and install, but you can run it directly from the
>> >> git checkout directory.
>> >>
>> >> For locating other versions of the script, are you running Linux?  If
>> >> so, are you familiar with the "locate" command:
>> >>
>> >>  locate bp_genbank2gff3.pl
>> >>
>> >> If you've never used it before, you may need to update the database
>> >> the locate command uses as root:
>> >>
>> >>  sudo updatedb
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Your gff seems fine. I get a very similar one, but with `Name=`
>> >> > instead of `locus_tag=`.
>> >> >
>> >> > I don't really know how to check for multiple bioperl installations.
>> >> > I'm using my personal server, so I don't mind removing and installing
>> >> > everything from scratch -- but I don't know how to do that.
>> >> >
>> >> > Also, what I don't get with the git is how the scripts are supposed
>> >> > to be updated (unless you build and install).
>> >> >
>> >> > Thank you!
>> >> >
>> >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Well, if you aren't getting the same results as me then I'd say you
>> >> >> aren't using the same version of the script :-)
>> >> >>
>> >> >> Unfortunately, the scripts are no longer automatically marked with
>> >> >> the "internal" version information when committed, so there really
>> >> >> isn't anything in the script I can tell you to look for.  Check for
>> >> >> more than one bioperl instance on your computer.
>> >> >>
>> >> >> I've attached the GFF3 file I got so you can look at it and tell me
>> >> >> if it is what you expect.
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > I just pulled the latest bioperl-live using git.
>> >> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> >> > Anyway, I still get the same results. No locus_tag.
>> >> >> > How can I tell if I'm using the latest version of the script?
>> >> >> >
>> >> >> > Thanks again.
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> >> all features that have a locus_tag, and uses the locus_tag for the
>> >> >> >> ID when it can (it can't blindly use the locus tag for the ID since
>> >> >> >> both the gene and the CDS have the same tag).
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Hi Scott,
>> >> >> >> >
>> >> >> >> > Here is a very short genbank:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >> >
>> >> >> >> > Note all genes in the genbank have locus tags. In the resulting
>> >> >> >> > GFF3, however, only the last gene (EcE24377A_B0005) gets a
>> >> >> >> > locus_tag. I have no idea why it deserves special treatment... :)
>> >> >> >> >
>> >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> >> > column whenever available) will really make my life easier.
>> >> >> >> >
>> >> >> >> > Thank you,
>> >> >> >> > Dave
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> That seems perfectly reasonable.  If you could point out a
>> >> >> >> >> GenBank entry for which that does not happen, I could try to
>> >> >> >> >> figure out why not.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest that
>> >> >> >> >> > locus_tag always be added to the GFF last column if it exists
>> >> >> >> >> > in the genbank, whether it is used as ID in the GFF or not.
>> >> >> >> >> >
>> >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> >> > <scott at scottcain.net>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Hi Dave,
>> >> >> >> >> >>
>> >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal
>> >> >> >> >> >> with GenBank files :-)  It was designed initially to work on
>> >> >> >> >> >> whole genome refseqs, and contains several ad hoc rules for
>> >> >> >> >> >> trying to make it "do the right thing."  In practice, it is not
>> >> >> >> >> >> unusual for a post processing step (either by hand or a quickie
>> >> >> >> >> >> perl script) to be required to really get it right.  I don't
>> >> >> >> >> >> recall the specifics (if I ever knew :-) for when and how the
>> >> >> >> >> >> locus tag is used, but I do know that there is a list of things
>> >> >> >> >> >> that it will try to use for the ID, and while the locus is on
>> >> >> >> >> >> the list, I don't know where it comes in the list, so it's
>> >> >> >> >> >> possible that other items might supersede it.
>> >> >> >> >> >>
>> >> >> >> >> >> Scott
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>> >> >> >> >> >> > the GenBank file has a locus tag.
>> >> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >> >
>> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> > Dave
>> >> >> >> >> >> > _______________________________________________
>> >> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >> >


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
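
The thread mentions that a small post-processing script is often needed after the conversion. As a rough sketch of that idea, the code below adds a `locus_tag=` attribute to a GFF3 attribute column that only carries `Name=`. Both helpers (`parse_attributes`, `add_locus_tag`) are hypothetical, and the sketch assumes that `Name` really holds the locus tag, which is the case for the gene line in Dave's output but not for every feature (the region feature's `Name` is the accession).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a GFF3 attribute column (column 9) into an
# ordered list of [tag, value] pairs.
sub parse_attributes {
    my ($column9) = @_;
    return map { [ split /=/, $_, 2 ] } split /;/, $column9;
}

# Hypothetical helper: append locus_tag=<Name value> when the feature has a
# Name attribute but no locus_tag of its own; otherwise leave it unchanged.
sub add_locus_tag {
    my ($column9) = @_;
    my @pairs = parse_attributes($column9);
    my %seen  = map { $_->[0] => $_->[1] } @pairs;
    push @pairs, [ locus_tag => $seen{Name} ]
        if defined $seen{Name} && !exists $seen{locus_tag};
    return join ';', map { "$_->[0]=$_->[1]" } @pairs;
}

# Attribute column of the gene feature from Dave's output in this thread.
my $gene = 'ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001';
print add_locus_tag($gene), "\n";
# prints ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001;locus_tag=EcE24377A_B0001
```

In practice one would probably apply this only to gene, mRNA and CDS rows, and it does not explain why the two runs differ in the first place.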


From j.scholtalbers at gmail.com  Mon Sep 20 04:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
Message-ID: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working.  I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
> > Sounds like this is going through an initial indexing step (for
> flatfiles).  I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> >> up with the same result.
> >>
> >> thanks,
> >> Eric
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From pcantalupo at gmail.com  Mon Sep 20 10:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <AANLkTikCi7XiKptEghr7uckAUBcD_K3n_YAgHEJjGo_F@mail.gmail.com>

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the
subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use Bio::DB::EUtilities;
use XML::Simple;   # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );

   my $res = $factory->get_Response->content;
   my $data = XMLin($res);

   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();

   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }

   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list of names in list context, "; "-joined string in scalar context
   return wantarray ? @lineage : join("; ", @lineage);
}

Paul Cantalupo
University of Pittsburgh
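
The comments in the subroutine above suggest returning the TaxId and Rank as well. A minimal sketch of that extension follows, working on an already parsed record so that it runs without network access. `lineage_details` is a hypothetical helper, and the sample hash is an abbreviated, illustrative stand-in for what `XMLin` is described as returning, not captured Entrez output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the parsed <Taxon> record, return one hash per
# lineage entry holding its name, TaxId and rank.
sub lineage_details {
    my ($taxon) = @_;
    my $entries = $taxon->{LineageEx}{Taxon};
    return () unless defined $entries;
    # A lone child element may be parsed as a hash rather than an array,
    # so accept both shapes.
    my @entries = ref $entries eq 'ARRAY' ? @$entries : ($entries);
    return map {
        +{ name => $_->{ScientificName}, taxid => $_->{TaxId}, rank => $_->{Rank} }
    } @entries;
}

# Illustrative, abbreviated sample data (assumed shape and values).
my $data = {
    Taxon => {
        ScientificName => 'Simian virus 40',
        LineageEx      => {
            Taxon => [
                { ScientificName => 'Viruses',        TaxId => 10239,  Rank => 'superkingdom' },
                { ScientificName => 'Polyomaviridae', TaxId => 151341, Rank => 'family' },
            ],
        },
    },
};

print join("\t", @{$_}{qw(taxid rank name)}), "\n"
    for lineage_details($data->{Taxon});
```

Like the original, this walks ancestors only; it does not retrieve descendents.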


On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers
<j.scholtalbers at gmail.com> wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
> > Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> > Eric
> >
> > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> > wrote:
> > > Sounds like this is going through an initial indexing step (for
> > > flatfiles).  I would expect the initial indexing of the tables to take time
> > > as you have to create the DB, but subsequent lookups post-indexing should be
> > > much faster if the index is already present.  Maybe Jason could answer in
> > more detail?
> > >
> > > chris
> > >
> > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> > >
> > >> Hello,
> > >>
> > >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> > >> 5.8.5 with BioPerl 1.6.0
> > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > >>
> > >> It ran for 100 cpu seconds and output:
> > >>
> > >> 33090 Viridiplantae kingdom
> > >>
> > >> I was expecting it to also output the descendents. Some questions:
> > >>
> > >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> > >> implemented? It looks to be in Taxon.pm but it is not documented and
> > >> when I ran Data::Dumper on $node the value '_desc' was empty.
> > >>
> > >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> > >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> > >> up with the same result.
> > >>
> > >> thanks,
> > >> Eric
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Sep 20 11:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <4C977FFC.5000205@bioperl.org>


This works for me to get all the descendents from a sub-node. You have to 
call the function with the database handle.   I am not sure if the Taxon 
implementation has reference to the dbhandle or not:
#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi/';
my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
                                 -nodesfile => "$dbdir/nodes.dmp",
                                 -namesfile => "$dbdir/names.dmp",
                                 );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . 
$_->scientific_name } @d), "\n";


Hope that helps.
Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins<rec3141 at mcmaster.ca>  wrote:
>
>    
>> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
>> Eric
>>
>> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields<cjfields at illinois.edu>
>> wrote:
>>      
>>> Sounds like this is going through an initial indexing step (for
>>>        
>> flatfiles).  I would expect the initial indexing of the tables to take time
>> as you have to create the DB, but subsequent lookups post-indexing should be
>> much faster if the index is already present.  Maybe Jason could answer in
>> more detail?
>>      
>>> chris
>>>
>>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>>>
>>>        
>>>> Hello,
>>>>
>>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl
>>>> 5.8.5 with BioPerl 1.6.0
>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>>>
>>>> It ran for 100 cpu seconds and output:
>>>>
>>>> 33090 Viridiplantae kingdom
>>>>
>>>> I was expecting it to also output the descendents. Some questions:
>>>>
>>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
>>>> implemented? It looks to be in Taxon.pm but it is not documented and
>>>> when I ran Data::Dumper on $node the value '_desc' was empty.
>>>>
>>>> 2) is the flatfile reader always so slow? after replacing 'flatfile'
>>>> with a call to 'entrez' it took only 0.02 cpu seconds to come
>>>> up with the same result.
>>>>
>>>> thanks,
>>>> Eric
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>          
>>>        
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>      
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>    


From j.scholtalbers at gmail.com  Wed Sep 22 03:46:35 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Wed, 22 Sep 2010 09:46:35 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <4C977FFC.5000205@bioperl.org>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
	<4C977FFC.5000205@bioperl.org>
Message-ID: <AANLkTi=nbAJcCs6a2F3pacUaxZE3JZ690dHTasLGug_a@mail.gmail.com>

Hi Jason,

This was the same method I was using. With the taxdump it apparently works,
however it does not work with Entrez as the source. So I will just stick to an
up-to-date taxdump then. Thanks for your example.

@Paul: Your method does indeed give the lineage, but it only retrieves the
ancestors. I want to retrieve all the descendents. Thanks anyway.

Cheers,
Jelle
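
For reference, when working from the taxdump the descendents can also be collected directly from nodes.dmp without going through BioPerl. The sketch below is not how Bio::DB::Taxonomy does it; it is a plain-Perl alternative that assumes the usual nodes.dmp layout (fields separated by tab, pipe, tab, with the taxid first and the parent taxid second). `read_children` and `all_descendants` are hypothetical helpers, and the four-node tree is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read nodes.dmp-style lines from a filehandle and
# return a hash ref mapping each parent taxid to an array of child taxids.
sub read_children {
    my ($fh) = @_;
    my %children;
    while (my $line = <$fh>) {
        # the first two fields are the taxid and its parent's taxid
        my ($id, $parent) = split /\t\|\t/, $line;
        next unless defined $parent;
        next if $id == $parent;    # the root node lists itself as parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Collect every descendent of $root using an explicit stack (no recursion).
sub all_descendants {
    my ($children, $root) = @_;
    my @found;
    my @todo = ($root);
    while (defined(my $id = pop @todo)) {
        my @kids = @{ $children->{$id} || [] };
        push @found, @kids;
        push @todo,  @kids;
    }
    return @found;
}

# Made-up tree in nodes.dmp layout: 1 is the root, 2 and 3 are its
# children, 4 is a child of 2.
my $sample = join '', map { join("\t|\t", @$_) . "\t|\n" }
    [ 1, 1, 'no rank' ], [ 2, 1, 'family' ], [ 3, 1, 'family' ], [ 4, 2, 'genus' ];
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $children = read_children($fh);
print join(' ', sort { $a <=> $b } all_descendants($children, 1)), "\n";
# prints 2 3 4
```

For a real run, open nodes.dmp from the taxdump directory instead of the in-memory string; names would still have to be looked up separately in names.dmp.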

On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich <jason at bioperl.org> wrote:

>
> This works for me to get all the descendents from a sub-node. You have to
> call the function with the database handle.   I am not sure if the Taxon
> implementation has reference to the dbhandle or not:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Taxonomy;
> # downloaded data from NCBI taxdump into this directory
> my $dbdir = '/db/taxonomy/ncbi/';
> my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
>                                 -nodesfile => "$dbdir/nodes.dmp",
>                                 -namesfile => "$dbdir/names.dmp",
>                                 );
> my $taxa = $db->get_taxon(-taxonid => 151341);
> my @d = $db->get_all_Descendents($taxa);
>
> print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name
> } @d), "\n";
>
>
> Hope that helps.
> Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
>
>
>  Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>
>  Sounds like this is going through an initial indexing step (for
> flatfiles).  I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
>
>
>  chris
>
> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>
>
>
>  Hello,
>
> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> 5.8.5 with BioPerl 1.6.0:
> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>
> It ran for 100 cpu seconds and output:
>
> 33090 Viridiplantae kingdom
>
> I was expecting it to also output the descendents. Some questions:
>
> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> implemented? It looks to be in Taxon.pm but it is not documented and
> when I ran Data::Dumper on $node the value '_desc' was empty.
>
> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> with a call to 'entrez' it took only 0.02 cpu seconds to come
> up with the same result.
>
> thanks,
> Eric


From waldenhe at muohio.edu  Fri Sep 24 15:15:48 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Fri, 24 Sep 2010 15:15:48 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>

Hello Bioperl Masters,

I am trying to perform a local blast with a query list of fasta files against a db of other fasta files.  I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module.  I have downloaded BLAST+ 2.2.24+ from the NCBI website and installed it on my Ubuntu machine.  I am using bioperl-1.5.2.

So the snippet of code that is giving me errors is below:

my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

The error I am getting is:

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284.

I think the real problem is the "cannot find path to blastall" warning.

From reading around on different forums, I understand I have to make a .ncbirc text file with the location of BLAST+ 2.2.24+ on my machine.  I have that file in my /home folder.

How do I get StandAloneBlastPlus synced up with BLAST+ 2.2.24+?  Am I approaching this right?

Thank you,

Hans Waldenmaier


From ross at cuhk.edu.hk  Sat Sep 25 04:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>


Given a set of GO IDs, e.g. 

GO:0008150
GO:0005750
GO:0006122
GO:0008121
GO:0003674
GO:0005575
GO:0008150
GO:0009507
GO:0009535
GO:0009567
GO:0009977
GO:0010027
GO:0031361

from
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo

one can manually examine the hierarchy. Although there is go-perl
(http://search.cpan.org/~cmungall/go-perl/) and go-db-perl
(http://search.cpan.org/~cmungall/go-db-perl/), as a life science student
who just learns Perl, I find it difficult to draw a hierarchy tree (or
simply make it a table to count the occurrence) to produce something like:


biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
<snip>
Cellular component (4)
<snip>

Can anybody advise? I don't need any fancy figures at all...


From David.Messina at sbc.su.se  Sun Sep 26 12:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,


> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.


Dave
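
A quick way to see which generation of BLAST is installed is to look for the executables named above. This is only a sketch: report_exe is a made-up helper name, and the program list is taken from the message (blastall for the old BLAST; blastn and blastp for BLAST+).

```shell
# Hypothetical helper (not part of BLAST or BioPerl): print where an
# executable lives on the current PATH, or say that it is missing.
report_exe() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

# blastall belongs to the old BLAST; blastn and blastp ship with BLAST+.
for exe in blastall blastn blastp; do
    report_exe "$exe"
done
```

If only blastn and blastp are reported, the install is BLAST+, and a wrapper that looks for blastall would be expected to give the warning shown in the original question.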


From maj at fortinbras.us  Sun Sep 26 20:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W479062635438381285548195@webmail34>

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Sunday, September 26, 2010 12:11 PM
>To: 'Waldenmaier, Hans Eugene'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans,
>
>
>> I think the real problem is the "cannot find path to Blastall.
>
>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>
>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>
>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Mon Sep 27 17:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <D8C2E980-BEB9-4231-8014-7318778CD215@illinois.edu>

Sorry, didn't see this being responded to on-list (been off the radar the last month).  I think this is a good idea, but I'm wondering if this might be better as a separate release on CPAN from bioperl core, seeing as we're in the prelim stages after the next bioperl release of modularizing the current bioperl core into smaller independent releases.

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
> 
> it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases.  maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!
> 
> best wishes
> johnny
> 
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
> 
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************


From gandipalem at gmail.com  Tue Sep 28 00:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
Message-ID: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the Coordination file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.




From David.Messina at sbc.su.se  Tue Sep 28 03:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <W479062635438381285548195@webmail34>
	<23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing 
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?


> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
> 
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.


Dave
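
The PATH check described above can be sketched in the shell itself. This is a rough illustration, not something from the thread: on_path is a hypothetical helper name, and the directory is the one Hans mentions.

```shell
# Hypothetical helper (not from the thread): succeed if the given
# directory is one of the colon-separated entries in PATH.
on_path() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

blast_bin=/home/hans/BLAST/bin   # directory named in the thread

if ! on_path "$blast_bin"; then
    # Prepend, so these executables are found before any other install.
    export PATH="${blast_bin}:${PATH}"
fi

# One PATH entry per line, to make the BLAST directory easy to spot.
printf '%s\n' "$PATH" | tr ':' '\n'
```

Because perl inherits the environment of the shell that starts it, running perl -e 'print $ENV{PATH}, "\n"' from the same shell afterwards should list the same entries.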


From David.Messina at sbc.su.se  Tue Sep 28 03:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
	<AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts
of genes by transformation of PCR products (ref: Datsenko & Wanner,
PNAS 2000).

A tab-delimited file containing ORF START STOP is read, and primers
flanking the start & stop coordinates are designed based on the
user-designated sequence file.

In addition, primers flanking the knockout regions are chosen for PCR
screening purposes once the knockout is generated.  The script uses
Bioperl in order to determine the primer sequences, which requires
getting subsequences and reverse complementing some of the objects.


Dave


From maj at fortinbras.us  Tue Sep 28 07:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W726055216153101285672714@webmail57>

The module checks the env variable BLASTPLUSDIR for the executable; 
you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
MAJ

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Tuesday, September 28, 2010 03:53 AM
>To: 'Waldenmaier, Hans Eugene'
>Cc: 'Mark A. Jensen', bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>> I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  
>
>Type the following on the command line:
>perl -e 'print $ENV{PATH}, "\n"'
>
>You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing 
>export PATH=/home/hans/BLAST/bin:${PATH}
>
>on the command line and then type
>perl -e 'print $ENV{PATH}, "\n"'
>
>again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?
>
>
>> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
>> export PATH=${PATH}:/home/hans/BLAST/bin
>> export BLASTDIR=/home/hans/BLAST/
>> 
>> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.
>
>It doesn't matter where in your .bashrc it goes, although it's possible there's something else in your .bashrc (or in the system bashrc, which is often read in. Look for mention of /etc/bashrc or similar.) that is overriding or altering the lines you added.
>
>It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.
>
>
>Dave
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From waldenhe at muohio.edu  Tue Sep 28 00:52:56 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Tue, 28 Sep 2010 00:52:56 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <W479062635438381285548195@webmail34>
References: <W479062635438381285548195@webmail34>
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>

Thanks Guys,

I have run those steps, my current version now is:
hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.006001

But I am still having problems.

I am having slightly more luck with using StandAloneBlast and the regular BLAST from NCBI.  I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  
Following the instructions from the HOWTOs and the O'Reilly book BLAST, I have gotten to the part about setting up the environment variables, which is where I think my problems are arising now.  
I have added these lines to my /home/hans/.bashrc file in order to get perl to find BLAST:
export PATH=${PATH}:/home/hans/BLAST/bin
export BLASTDIR=/home/hans/BLAST/

Am I just supposed to add these to the end of the .bashrc file, or am I supposed to put them someplace special?

Thanks for the help,

Hans
________________________________________
From: Mark A. Jensen [maj at fortinbras.us]
Sent: Sunday, September 26, 2010 8:43 
To: Dave Messina; Waldenmaier, Hans Eugene
Cc: bioperl-l at bioperl.org
Subject: Re:  [Bioperl-l] StandAloneBlastPlus

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Sunday, September 26, 2010 12:11 PM
>To: 'Waldenmaier, Hans Eugene'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans,
>
>
>> I think the real problem is the "cannot find path to Blastall.
>
>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>
>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>
>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>
>
>
>Dave
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From maj at fortinbras.us  Tue Sep 28 11:04:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 15:04:07 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W8604325546183031285686247@webmail66>

Should work from .bashrc, Hans. Also add  

export BLASTPLUSDIR=/home/hans/BLAST/bin

It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program.

You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing

set | grep BLAST

cheers MAJ
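
Putting the pieces of this thread together, the settings could look like the sketch below. The paths are the ones Hans gave and will differ on other machines; the closing check uses env rather than set, which is my own substitution.

```shell
# Lines as given in the thread; adjust the paths to your own install.
export PATH="${PATH}:/home/hans/BLAST/bin"
export BLASTDIR=/home/hans/BLAST/
export BLASTPLUSDIR=/home/hans/BLAST/bin

# `set` lists every shell variable, exported or not; `env` lists only
# the exported ones, which is what a child process such as perl sees.
env | grep '^BLAST'
```

If a variable shows up with set but not with env, it was assigned without export and a Perl script started from that shell will not see it.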

>-----Original Message-----
>From: Waldenmaier, Hans Eugene [mailto:waldenhe at muohio.edu]
>Sent: Tuesday, September 28, 2010 12:52 AM
>To: 'Mark A. Jensen', 'Dave Messina'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>
>Thanks Guys,
>
>I have run those steps, my current version now is:
>hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
>1.006001
>
>But I am still having problems.
>
>I am having slightly more luck with using StandAloneBlast and the regular BLAST form NCBI.  I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  
>Following the instructions from the HOWTO's and the O'reilly book BLAST, I have gotten to the setting up the environmental variables part, which is where I think my problems are arising now.  
>I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
>export PATH=${PATH}:/home/hans/BLAST/bin
>export BLASTDIR=/home/hans/BLAST/
>
>Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.
>
>Thanks for the help,
>
>Hans
>________________________________________
>From: Mark A. Jensen [maj at fortinbras.us]
>Sent: Sunday, September 26, 2010 8:43 
>To: Dave Messina; Waldenmaier, Hans Eugene
>Cc: bioperl-l at bioperl.org
>Subject: Re:  [Bioperl-l] StandAloneBlastPlus
>
>Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:
>
>git clone http://github.com/bioperl/bioperl-live.git
>git clone http://github.com/bioperl/bioperl-run.git
>
>(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl
>
>cheers MAJ
>
>--------------------------
>Mark A. Jensen, PhD
>Senior Consultant
>Fortinbras Research
>http://www.fortinbras.us
>
>>-----Original Message-----
>>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>>Sent: Sunday, September 26, 2010 12:11 PM
>>To: 'Waldenmaier, Hans Eugene'
>>Cc: bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] StandAloneBlastPlus
>>
>>Hi Hans,
>>
>>
>>> I think the real problem is the "cannot find path to Blastall.
>>
>>Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.
>>
>>See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.
>>
>>Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.
>>
>>
>>
>>Dave
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From chiragmatkarbioinfo at gmail.com  Thu Sep 30 08:20:35 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Thu, 30 Sep 2010 19:20:35 +0700
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
Message-ID: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>

Hello all,
Is there any module to fetch DNA sequence data from an Ensembl gene ID?

-- 
Regards,
Chirag Matkar


From jun.yin at ucd.ie  Thu Sep 30 09:36:31 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 30 Sep 2010 14:36:31 +0100
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>

Hi, Chirag,

BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
provides a BioPerl-like interface for that function.

You can visit Ensembl's website on how to use that module:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
Sent: Thursday, September 30, 2010 1:21 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id

Hello all,
Is there any module to fetch dna sequence data from ensemble gene id?

-- 
Regards,
Chirag Matkar
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Thu Sep 30 11:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
	<011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID: <E4D361DE-CEB6-45CC-83EE-6547C497ABE3@illinois.edu>

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
> 
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back.  I think it still works rather well; at least the tests pass.  You might get more out of using the Ensembl API directly as Jun states though, YMMV.

BTW, the Ensembl API also works with the latest bioperl code, regardless of what the Ensembl website says (e.g. they only support v1.2.3).  Haven't heard more about whether this discrepancy was supposed to be addressed at some point.

chris

> You can visit Ensembl's website on how to use that module:
> http://www.ensembl.org/info/data/api.html
> 
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
> 
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chirag matkar
> Sent: Thursday, September 30, 2010 1:21 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
> 
> Hello all,
> Is there any module to fetch dna sequence data from ensemble gene id?
> 
> -- 
> Regards,
> Chirag Matkar
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> __________ Information from ESET Smart Security, version of virus signature
> database 5377 (20100818) __________
> 
> The message was checked by ESET Smart Security.
> 
> http://www.eset.com
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From A.Vakhrusheva at lumc.nl  Wed Sep 29 09:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface.

I want to calculate p distances matrix, but what format is acceptable
for input? Phylip doesn't work

Anna 


From shalabh.sharma7 at gmail.com  Wed Sep  1 20:56:35 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 1 Sep 2010 16:56:35 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer
Message-ID: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>

Hi ,
     I am trying to parse hmmsearch report (from HMMER3). I am using the
script mentioned here:
http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm

I am not getting anything but this "amoA_10genes_align.fasta.2  [M=247] for
HMM" as the output, i am not even getting any error.
I am attaching the hmmsearch report (just a test report) which i tried to
test against the parser.

I would really appreciate if anyone can help me out.

Thanks
Shalabh Sharma
-------------- next part --------------
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  amoA_10genes.hmm
# target sequence database:        test.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       amoA_10genes_align.fasta.2  [M=247]
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                   Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                   -----------
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021979|gb|AAY26564.1| AmoA [uncultured beta proteobacte
    1.6e-72  231.1   5.1    1.7e-72  231.0   3.5    1.0  1  gi|63021981|gb|AAY26565.1| AmoA [uncultured beta proteobacte


Domain annotation for each sequence (and alignments):
>> gi|63021979|gb|AAY26564.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021979|gb|AAY26564.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021979|gb|AAY26564.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP

>> gi|63021981|gb|AAY26565.1|  AmoA [uncultured beta proteobacterium]
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  231.0   3.5   1.7e-72   1.7e-72     113     245 ..       1     144 [.       1     146 [. 0.95

  Alignments for each domain:
  == domain 1    score: 231.0 bits;  conditional E-value: 1.7e-72
  amoA_10genes_align.fasta.2 113 lyPinfvlpsvllPsallldavlalkrnklvtalvGGglfGlllypgnwplfgavhlllvaegvllsladyvgfkyvrtgtPe 195
                                 +yPinfv+ps+++P+al++d+v++l+rn+++talvGGg+fGll+ypgnwp+fg++hl+lvaegvllslady+gf+yvrtgtPe
  gi|63021981|gb|AAY26565.1|   1 HYPINFVFPSTMIPGALIMDTVMLLTRNWMITALVGGGAFGLLFYPGNWPIFGPTHLPLVAEGVLLSLADYTGFLYVRTGTPE 83 
                                 8********************************************************************************** PP

  amoA_10genes_align.fasta.2 196 yvrliekgslrtfgkstvaiaaffsafvsvlmfavwaylgklyskaf...........kkd 245
                                 yvrlie+gslrtfg++t++iaaffsafvs+lmf+vw+y+gkly++af           +k+
  gi|63021981|gb|AAY26565.1|  84 YVRLIEQGSLRTFGGHTTVIAAFFSAFVSMLMFCVWWYFGKLYCTAFyyvkgprgrvtMKN 144
                                 **********************************************966666666655555 PP


Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (247 nodes)
Target sequences:                          2  (300 residues)
Passed MSV filter:                         2  (1); expected 0.0 (0.02)
Passed bias filter:                        2  (1); expected 0.0 (0.02)
Passed Vit filter:                         2  (1); expected 0.0 (0.001)
Passed Fwd filter:                         2  (1); expected 0.0 (1e-05)
Initial search space (Z):                  2  [actual number of targets]
Domain search space  (domZ):               2  [number of targets reported over threshold]
# CPU time: 0.03u 0.00s 00:00:00.03 Elapsed: 00:00:00.08
# Mc/sec: 0.93
//

From thomas.sharpton at gmail.com  Wed Sep  1 21:29:26 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Wed, 1 Sep 2010 14:29:26 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
Message-ID: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>

Hi Shalabh,

We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
use the HMMER3 version, as found here:

http://github.com/bioperl/bioperl-hmmer3
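
Since the two parsers are separate, it can help to check which HMMER
version produced a report before picking one. A minimal sketch in plain
Perl (no BioPerl calls; the helper name hmmer_major_version is made up
for illustration), assuming the version banner looks like the one in the
attached report and that HMMER2 reports carry the same banner without
the leading '#':

```perl
use strict;
use warnings;

# Hypothetical helper: return the major HMMER version found in a report's
# banner line, or undef if no banner is present. The optional '#' lets the
# same pattern accept a banner with or without the comment prefix.
sub hmmer_major_version {
    my ($text) = @_;
    return $text =~ /^#?\s*HMMER\s+(\d+)\./m ? $1 : undef;
}

# First two header lines of the attached hmmsearch report.
my $report = <<'END';
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.0 (March 2010); http://hmmer.org/
END

my $version = hmmer_major_version($report);
print defined $version
    ? "HMMER major version: $version\n"
    : "no HMMER banner found\n";
```

A report that comes back as version 3 would then go to the HMMER3
parser rather than the older hmmer.pm.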

Hope this helps,
T

On Sep 1, 2010, at 1:56 PM, shalabh sharma wrote:

> Hi ,
>     I am trying to parse hmmsearch report (from HMMER3). I am using the
> script mentioned here:
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO/hmmer.pm
>
> I am not getting anything but this "amoA_10genes_align.fasta.2  [M=247] for
> HMM" as the output, i am not even getting any error.
> I am attaching the hmmsearch report (just a test report) which i tried to
> test against the parser.
>
> I would really appreciate if anyone can help me out.
>
> Thanks
> Shalabh Sharma
> <test_amao.txt>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From kai.blin at biotech.uni-tuebingen.de  Thu Sep  2 08:44:58 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 2 Sep 2010 10:44:58 +0200
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
Message-ID: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>

On Wed, 1 Sep 2010 14:29:26 -0700
Thomas Sharpton <thomas.sharpton at gmail.com> wrote:

Hi,

> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to  
> use the HMMER3 version, as found here:
> 
> http://github.com/bioperl/bioperl-hmmer3

Actually it's now included in the bioperl-live repository, but the code
hasn't made it into a release yet.

http://github.com/bioperl/bioperl-live.git

Cheers,
Kai
-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-University of Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Deutschland
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From e.stupka at ucl.ac.uk  Thu Sep  2 12:32:02 2010
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Thu, 2 Sep 2010 13:32:02 +0100
Subject: [Bioperl-l] git account
Message-ID: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>

Hello there,

I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?

thanks!

Elia


---
"We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
~ Stephen Hawking

Senior Lecturer, Bioinformatics
Scientific Director - Bioinformatics, UCL Genomics

UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Institute of Cell and Molecular Science
Barts and The London School of Medicine and Dentistry
4 Newark Street
Whitechapel
London
E1 2AT

Office (UCL): +44 207 679 6493
Fax: +44 0207 6796817 
Office (ICMS): +44 0207 8822374

Mobile: +44 787 6478912


From cjfields at illinois.edu  Thu Sep  2 14:29:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Sep 2010 09:29:40 -0500
Subject: [Bioperl-l] git account
In-Reply-To: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
References: <5FFE2F0F-F20F-4461-A439-63C929897158@ucl.ac.uk>
Message-ID: <C9AD72E7-32C2-48D0-A7AB-18F42BABD28D@illinois.edu>

Done!  Let us know if you run into problems.

chris

On Sep 2, 2010, at 7:32 AM, Elia Stupka wrote:

> Hello there,
> 
> I wanted to poke around our old BioPipe code, could you add my Git account (estupka) so that I can commit some updates if I make any?
> 
> thanks!
> 
> Elia
> 
> 
> ---
> "We only have to look at ourselves to see how intelligent life might develop into something we wouldn't want to meet."
> ~ Stephen Hawking
> 
> Senior Lecturer, Bioinformatics
> Scientific Director - Bioinformatics, UCL Genomics
> 
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
> 
> Institute of Cell and Molecular Science
> Barts and The London School of Medicine and Dentistry
> 4 Newark Street
> Whitechapel
> London
> E1 2AT
> 
> Office (UCL): +44 207 679 6493
> Fax: +44 0207 6796817 
> Office (ICMS): +44 0207 8822374
> 
> Mobile: +44 787 6478912
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From J.Christopher.Ellis at duke.edu  Thu Sep  2 14:53:34 2010
From: J.Christopher.Ellis at duke.edu (J. Christopher Ellis)
Date: Thu,  2 Sep 2010 10:53:34 -0400
Subject: [Bioperl-l] Taxonomy DB problem
Message-ID: <53096.1283439214@duke.edu>

 Chris have you had any luck with this?

 Thanks,
 Chris

 On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
 Yes, I see that one. It may be the ID hash that is being returned is
 empty. I'll look into it.

 -c 

 On Aug 31, 2010, at 6:57 AM, J. Christopher Ellis wrote:

 > Hi Chris,
 > 
 > The error is...
 > 
 > "Use of uninitialized value $id in join or string at
 > C:/Perl64/site/lib/Bio/Tools/EUtilities/EUtilParameters.pm line 363."
 > 
 > The script from
 > http://bioperl.org/wiki/Species_names_from_accession_numbers is as
 > follows....
 > 
 > use Bio::DB::EUtilities;
 > 
 > my (%taxa, @taxa);
 > my (%names, %idmap);
 > 
 > # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
 > # (probably)
 > 
 > my @ids = qw(1621261 89318838 68536103 20807972 730439);
 > 
 > my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
 >                                        -db             => 'taxonomy',
 >                                        -dbfrom         => 'protein',
 >                                        -correspondence => 1,
 >                                        -id             => \@ids);
 > 
 > # iterate through the LinkSet objects
 > while (my $ds = $factory->next_LinkSet) {
 >     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
 > }
 > 
 > @taxa = @taxa{@ids};
 > 
 > $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
 >                                     -db    => 'taxonomy',
 >                                     -id    => \@taxa );
 > 
 > while (local $_ = $factory->next_DocSum) {
 >     $names{($_->get_contents_by_name('TaxId'))[0]} =
 >         ($_->get_contents_by_name('ScientificName'))[0];
 > }
 > 
 > foreach (@ids) {
 >     $idmap{$_} = $names{$taxa{$_}};
 > }
 > 
 > # %idmap is
 > # 1621261 => 'Mycobacterium tuberculosis H37Rv'
 > # 20807972 => 'Thermoanaerobacter tengcongensis MB4'
 > # 68536103 => 'Corynebacterium jeikeium K411'
 > # 730439 => 'Bacillus caldolyticus'
 > # 89318838 => undef (this record has been removed from the db)
 > 
 > 1;
 > 
 > 
 > Thanks,
 > 
 > 
 > 
 > Chris
 > 
 > 
 > On Mon 08/30/10 09:36 , "Chris Fields" cjfields at illinois.edu sent:
 > Chris,
 > 
 > Regarding a fix for that script, we would have to see your modified
 > script and the error. However, there are modules within BioPerl to
 > essentially do what you want, in particular, Bio::DB::Taxonomy.
 > 
 > chris
 > 
 > On Aug 30, 2010, at 7:55 AM, J. Christopher Ellis wrote:
 > 
 > > Hi All,
 > > 
 > > I am trying to extract the entire taxonomy of an organism, including
 > > the classifications. Something like...
 > > 
 > > Phylum:Proteobacteria, Class:Gammaproteobacteria,
 > > Order:Enterobacteriales, Family:Enterobacteriaceae, Genus:Escherichia
 > > 
 > > I am not worried about format, just that I get the information and the
 > > associated level of hierarchy. The script found at
 > > http://bioperl.org/wiki/Species_names_from_accession_numbers
 > > seemed like a good starting point, so I copied it and tried to run it,
 > > but got an error.
 > > 
 > > My first question is "Is there a known fix for this?" and my second
 > > question is how do I get the full hierarchical information (as seen
 > > above) with the taxonomy db?
 > > 
 > > Thanks for all your help in advance!
 > > 
 > > Chris 
 > > 
 > > 
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l at lists.open-bio.org
 > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
 > 
 > 

 
From cjfields at illinois.edu  Thu Sep  2 16:21:48 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 02 Sep 2010 11:21:48 -0500
Subject: [Bioperl-l] Taxonomy DB problem
In-Reply-To: <53096.1283439214@duke.edu>
References: <53096.1283439214@duke.edu>
Message-ID: <1283444508.5339.10.camel@pyrimidine.igb.uiuc.edu>

Chris,

There are a few things wrong with the original script, so I'll fix them.
Basically, it makes the assumption that every ID in the original list is
found.  The problem: eutils only reports back data it finds, silently
discarding IDs that don't match.  So, using the original ID list when
building the hashes needs a bit more error checking.

Here's the revised script that works for me.

https://gist.github.com/f5db90a432fed68548d4

I'm also adding a check to ensure all IDs are defined prior to adding
them to the param string, just in case.
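
The failure mode can be shown without any network access. A minimal
sketch in plain Perl, where a hand-written %taxa hash stands in for what
elink returned (the taxonomy values are placeholders, not real lookups,
and this is not the code in the gist):

```perl
use strict;
use warnings;

# IDs submitted to elink, as in the wiki script.
my @ids = qw(1621261 89318838 68536103 20807972 730439);

# Stand-in for the elink result. 89318838 is absent because eutils
# silently drops IDs it cannot match. Values are placeholders.
my %taxa = (
    1621261  => 'taxon_A',
    68536103 => 'taxon_B',
    20807972 => 'taxon_C',
    730439   => 'taxon_D',
);

# Slicing with the original list yields undef for the missing ID; that
# undef is what triggers the "uninitialized value" warning later on.
my @naive = @taxa{@ids};

# Only keep IDs that actually came back, and remember the rest.
my @found    = grep {  exists $taxa{$_} } @ids;
my @missing  = grep { !exists $taxa{$_} } @ids;
my @taxa_ids = @taxa{@found};

printf "naive slice has %d undef value(s)\n",
    scalar(grep { !defined $_ } @naive);
print "IDs with no taxonomy link: @missing\n";
```

Passing only @taxa_ids on to the esummary call avoids the undefined
entry, and @missing can be reported to the user separately.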

chris

On Thu, 2010-09-02 at 10:53 -0400, J. Christopher Ellis wrote:
> Chris have you had any luck with this?
> 
> Thanks,
> Chris
> 
> On Tue 08/31/10 11:01 , "Chris Fields" cjfields at illinois.edu sent:
>         Yes, I see that one. It may be the ID hash that is being
>         returned is empty. I'll look into it.
>         
>         -c 


From thomas.sharpton at gmail.com  Thu Sep  2 16:34:07 2010
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Thu, 2 Sep 2010 09:34:07 -0700
Subject: [Bioperl-l] Bio::SearchIO::hmmer
In-Reply-To: <20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
References: <AANLkTinLJvSJe1Mi1M0EPxpmdBXPTLaCpdu31W+WNL+K@mail.gmail.com>
	<8734BAC3-32EF-43B8-A531-8725A1FFA043@gmail.com>
	<20100902104458.127b0c42.kai.blin@biotech.uni-tuebingen.de>
Message-ID: <C272A232-C37D-40F9-B729-626B8DB17BB3@gmail.com>

So it is! I'm paying attention, I swear I am....

Shalabh, if the HMMER3 version of SearchIO doesn't solve your problem,  
do let us know.

Best,
Tom

On Sep 2, 2010, at 1:44 AM, Kai Blin wrote:

> On Wed, 1 Sep 2010 14:29:26 -0700
> Thomas Sharpton <thomas.sharpton at gmail.com> wrote:
>
> Hi,
>
>> We forked the SearchIO parser for hmmer3 and hmmer2. You'll want to
>> use the HMMER3 version, as found here:
>>
>> http://github.com/bioperl/bioperl-hmmer3
>
> Actually it's now included in the bioperl-live repository, but the code
> hasn't made it into a release yet.
>
> http://github.com/bioperl/bioperl-live.git
>
> Cheers,
> Kai
> -- 
> Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
> Institute for Microbiology and Infection Medicine
> Division of Microbiology/Biotechnology
> Eberhard-Karls-University of Tübingen
> Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
> D-72076 Tübingen                        Fax :   ++49 7071 29-5979
> Deutschland
> Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From johnny at mit.edu  Sat Sep  4 15:40:37 2010
From: johnny at mit.edu (Jonathan Rameseder)
Date: Sat, 4 Sep 2010 11:40:37 -0400
Subject: [Bioperl-l] Client-side Scansite Bioperl module
Message-ID: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>

hi guys

it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (e.g. enrichment tests) into Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases. maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!

best wishes
johnny

[1] Bio::Tools::Analysis::Protein::Scansite
[2] http://www.ncbi.nlm.nih.gov/pubmed/11283593

********************
Jonathan Rameseder
Ph.D. Candidate
Computational Systems Biology Initiative
Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology
********************


From David.Messina at sbc.su.se  Mon Sep  6 12:14:20 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Sep 2010 14:14:20 +0200
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <0EA1C4B0-66CF-4AE3-9A47-CC6624737821@sbc.su.se>

Hi Jonathan,

Great to hear you're interested in including your code in BioPerl!

In general, we are liberal in what we accept. I think (and I'd like to hear what other BioPerlers think) the value of adding your code depends a lot on how it ties in with existing BioPerl objects: does it make use of Bio::Seq or Bio::SeqIO, for example?

If you haven't already, you might want to take a look at some of our developer documentation. For example:
 
	http://www.bioperl.org/wiki/Bioperl_Best_Practices
	http://www.bioperl.org/wiki/Advanced_BioPerl


Also, the other thing to be aware of is that in the near future BioPerl itself will be splitting up into separately distributed modules anyway.

I can't find a good recent thread that discussed the rationale and details, but here's a couple anyway:

	http://www.bioperl.org/wiki/Proposed_BioPerl_changes
	http://old.nabble.com/Final-BioPerl-1.6-release-td29180027.html#a29195208


Dave


From ross at cuhk.edu.hk  Tue Sep  7 08:28:00 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 16:28:00 +0800
Subject: [Bioperl-l]  Indexing nr database
In-Reply-To: <C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
Message-ID: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>

With the following code I want to index the 4G nr database. However, the
index file is > 1T and the job has been running for weeks and still hasn't
finished. Could anybody tell me how to accomplish this? Thanks in
advance.

    use strict;
    use Bio::DB::Flat::BinarySearch;

    (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;

    # use single quotes so you don't have to write
    # regular expressions like "gi\\|(\\d+)"
    #my $primary_pattern = '^>(\S+)';
    #if ($fullHeader == 1) {
    my $primary_pattern = '^>(.+)';
    #}

    my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
    #$string =~ s/$primary_pattern/RRR/g;
    #print "$string\n";

    # one or more patterns stored in a hash:
    my $secondary_patterns = {GI => 'gi\|(\d+)'};

    my $db = Bio::DB::Flat::BinarySearch->new(
        -directory          => $baseDir,
        -dbname             => $dbName,
        -write_flag         => 1,
        -primary_pattern    => $primary_pattern,
        -primary_namespace  => 'ACC',
        -secondary_patterns => $secondary_patterns,
        -verbose            => 1,
        -format             => 'fasta',
    );

    $db->build_index($seqFile);

                                              
From David.Messina at sbc.su.se  Tue Sep  7 09:23:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Sep 2010 11:23:42 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>

Hi Ross,

What do you need the index for?

If it's random retrieval of sequences using an accession or GI, you'd be better off using NCBI's own database indexing and retrieval tools. They're far faster than BioPerl.

They're distributed with Blast+ and available here:

	ftp://ftp.ncbi.nlm.nih.gov//blast/executables/LATEST

Specifically, I'm talking about 'makeblastdb' and 'blastdbcmd'.


I'm not sure what you mean by "4g" nr, but there's an already-indexed version of nr available here:

		ftp://ftp.ncbi.nih.gov//blast/db

You can use that directly with the BLAST+ database tools.


Also, take a look at the cookbook at the end of the Blast+ user manual (available in the same download directory as Blast+ itself). Some nice examples there showing off the flexibility of this latest version of the software.
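
For random retrieval by GI, a call to blastdbcmd can also be driven from Perl. A minimal sketch, assuming blastdbcmd is on the PATH and a local database named nr; the helper blastdbcmd_args is made up for illustration, and the entry ID and range are arbitrary examples:

```perl
use strict;
use warnings;

# Hypothetical helper: assemble the argument list for one blastdbcmd lookup.
# -db, -entry, -range and -outfmt are blastdbcmd options; the '%s' format
# asks for the sequence data only.
sub blastdbcmd_args {
    my (%opt) = @_;
    my @cmd = ('blastdbcmd', '-db', $opt{db}, '-entry', $opt{entry});
    # -range takes 1-based coordinates written as "start-stop".
    push @cmd, '-range', $opt{range} if defined $opt{range};
    push @cmd, '-outfmt', '%s';
    return @cmd;
}

my @cmd = blastdbcmd_args(db => 'nr', entry => '1621261', range => '64-64');
print join(' ', @cmd), "\n";

# To actually run it (requires BLAST+ and a local copy of the database):
# open(my $fh, '-|', @cmd) or die "cannot run blastdbcmd: $!";
# my $sequence = <$fh>;
# close $fh;
```

Building the command as a list and using the list form of open keeps the shell out of the picture, so IDs containing '|' need no quoting.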


Dave


From ross at cuhk.edu.hk  Tue Sep  7 09:18:16 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Tue, 7 Sep 2010 17:18:16 +0800
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <4C860148.3030000@fmi.ch>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
Message-ID: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>

The reason is that I have to retrieve the specific information of the
matched sequences, e.g. extract the 64th amino acid of the top matched
sequence. Is there any way to achieve that?

-----Original Message-----
From: Hans-Rudolf Hotz [mailto:hrh at fmi.ch] 
Sent: Tuesday, September 07, 2010 5:09 PM
To: bioperl-l at lists.open-bio.org; ross at cuhk.edu.hk
Subject: Re: [Bioperl-l] Indexing nr database

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>     use strict;
>     use Bio::DB::Flat::BinarySearch;
>
>     (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>     # use single quotes so you don't have to write
>     # regular expressions like "gi\\|(\\d+)"
>     #my $primary_pattern = '^>(\S+)';
>     #if ($fullHeader == 1) {
>     my $primary_pattern = '^>(.+)';
>     #}
>
>     my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
>     #$string =~ s/$primary_pattern/RRR/g;
>     #print "$string\n";
>
>     # one or more patterns stored in a hash:
>     my $secondary_patterns = {GI => 'gi\|(\d+)'};
>
>     my $db = Bio::DB::Flat::BinarySearch->new(
>         -directory          => $baseDir,
>         -dbname             => $dbName,
>         -write_flag         => 1,
>         -primary_pattern    => $primary_pattern,
>         -primary_namespace  => 'ACC',
>         -secondary_patterns => $secondary_patterns,
>         -verbose            => 1,
>         -format             => 'fasta'  );
>
>     $db->build_index($seqFile);
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hrh at fmi.ch  Tue Sep  7 09:09:28 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:09:28 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
Message-ID: <4C860148.3030000@fmi.ch>

Hi


why don't you use the pre-indexed BLAST files from NCBI:

ftp://ftp.ncbi.nih.gov/blast/db/

you can use them to fetch individual sequences by gi number or accession 
with the tool "blastdbcmd" from blast+ binaries:

ftp://ftp.ncbi.nih.gov/blast/executables/blast+/


regards, Hans


On 09/07/2010 10:28 AM, Ross KK Leung wrote:
> By the following codes, I wanna index the 4G nr database, however, the index
> file is > 1T and the job has been running for weeks and still hasn't
> finished. Could anybody tell me how you accomplish the goal? Thanks in
> advance.
>
>      use strict;
>
>       use Bio::DB::Flat::BinarySearch;
>
>
>
>       (my $baseDir, my $dbName, my $seqFile, my $testId, my $testGi) = @ARGV;
>
>
>
>       # use single quotes so you don't have to write
>
>       # regular expressions like "gi\\|(\\d+)"
>
>       #my $primary_pattern = '^>(\S+)';
>
>       #if ($fullHeader == 1) {
>
>          my $primary_pattern = '^>(.+)';
>
>       #}
>
>       my $string = "gi|41353971|emb|AL123456.2| Mycobacterium tuberculosis H37Rv complete genome";
> #$string =~ s/$primary_pattern/RRR/g;
>
>       #print "$string\n";
>
>
>
>       # one or more patterns stored in a hash:
>
>       my $secondary_patterns = {GI =>  'gi\|(\d+)'};
>
>
>
>       my $db = Bio::DB::Flat::BinarySearch->new(
>
>                             -directory          =>  $baseDir,
>
>                             -dbname             =>  $dbName,
>
>                             -write_flag         =>  1,
>
>                             -primary_pattern    =>  $primary_pattern,
>
>                             -primary_namespace  =>  'ACC',
>
>                             -secondary_patterns =>  $secondary_patterns,
>
>                             -verbose            =>  1,
>
>                             -format             =>  'fasta'  );
>
>
>
>       $db->build_index($seqFile);
>
>
>


From hrh at fmi.ch  Tue Sep  7 09:33:46 2010
From: hrh at fmi.ch (Hans-Rudolf Hotz)
Date: Tue, 07 Sep 2010 11:33:46 +0200
Subject: [Bioperl-l] Indexing nr database
In-Reply-To: <007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk> <4C860148.3030000@fmi.ch>
	<007501cb4e6d$9b2c3ac0$d184b040$@edu.hk>
Message-ID: <4C8606FA.3000509@fmi.ch>


On 09/07/2010 11:18 AM, Ross KK Leung wrote:
> The reason is that I have to retrieve the specific information of the
> matched sequences, e.g. extract the 64th amino acid of the top matched
> sequence. Is there any way to achieve that?

"blastdbcmd" has several options like "-range"

and even if "blastdbcmd" does not give you exactly the subset of
information you want, I am still convinced it is quicker to fetch the
complete entry with "blastdbcmd" and then parse the required data out of
just that one entry.
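
A rough sketch of that last step, assuming the complete entry is already
available as FASTA text (for example by capturing the output of something
like "blastdbcmd -db nr -entry <id>", which is not run here). The record
below is invented, and residue_at is only an illustrative helper name:

```perl
use strict;
use warnings;

# Return the residue at a 1-based position from one FASTA record given as text.
sub residue_at {
    my ($fasta, $pos) = @_;
    my ($header, @lines) = split /\n/, $fasta;
    die "not a FASTA record\n" unless defined $header && $header =~ /^>/;
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;                       # drop any stray whitespace
    die "position $pos is outside the sequence\n"
        if $pos < 1 || $pos > length $seq;
    return substr $seq, $pos - 1, 1;        # substr is 0-based
}

# Made-up record standing in for what blastdbcmd would print (assumption).
my $record = ">demo_protein hypothetical example\nMKTAYIAKQR\nQISFVKSHFS\n";
print residue_at($record, 12), "\n";
```

For the made-up record this prints "I", the 12th residue. For the 64th
amino acid of a real hit you would pass 64 instead.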

Hans



From fs5 at sanger.ac.uk  Tue Sep  7 12:09:52 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 07 Sep 2010 13:09:52 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
Message-ID: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>

I am working a lot with feature-rich Bio::Seq objects these days and
thought that it would be really nice if I could do something like:

my @features = $bio_seq_obj->get_SeqFeatures(-by_id => 'my_gene');

instead of having to grep for the feature every time. 
There could then be 'by_tag' and 'by_region' options as well. 

According to the Bio::Seq docs, something like this seems to be planned
at some stage. I would be willing to contribute to this feature if I can
and if this isn't already being implemented by somebody else. 
Does anybody know the state of this feature?

Frank


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From jason at bioperl.org  Tue Sep  7 17:36:07 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 07 Sep 2010 10:36:07 -0700
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <4C867807.2040907@bioperl.org>

And the implementation would just be something like this?

my @features = grep { $_->has_tag('id') && ($_->get_tag_values('id'))[0] 
eq 'my_gene' } $seq->get_SeqFeatures();

I think a fuller implementation would only come if we moved from the
in-memory arrays & hash-based system to a sqlite db on the back-end for
how Sequence and Feature objects are stored.
This would be somewhat slower but wouldn't have the performance/memory
problems we get for sequences with many annotations.

-jason
Frank Schwach wrote, On 9/7/10 5:09 AM:
> I am working a lot with feature-rich Bio::Seq objects these days and
> thought that it would be really nice if I could do something like:
>
> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>
> instead of having to grep for the feature every time.
> There could then be 'by_tag' and 'by_region' options as well.
>
> According to the Bio::Seq docs, something like this seems to be planned
> at some stage. I would be willing to contribute to this feature if I can
> and if this isn't already being implemented by somebody else.
> Does anybody know the state of this feature?
>
> Frank


From fs5 at sanger.ac.uk  Wed Sep  8 08:42:57 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 08 Sep 2010 09:42:57 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <4C867807.2040907@bioperl.org>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
Message-ID: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>

Hi Jason,

Yes, I guess that would be the simplest way of doing it - basically just
doing it the way the docs suggest for getting at a specific feature but
hiding the grep behind a Bio::Seq method with search parameters. But we
could also build a hash of feature tags as the Bio::Seq is built so that
retrieval is more efficient. This could also be used to implement a bin
indexing scheme for range queries, similar to what Bio::DB::GFF does.
Is a move to an sqlite backend planned for the near future? 
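
To make the hash idea concrete, here is a minimal sketch. It uses plain
hashes as stand-ins for feature objects, so it runs without BioPerl; the
data and the features_by_tag helper are made up for illustration and are
not an existing Bio::Seq method:

```perl
use strict;
use warnings;

# Plain hashes stand in for feature objects; with real BioPerl features the
# tag values would come from $feat->get_tag_values($tag) instead.
my @features = (
    { start => 100, end => 900,  tags => { id => ['my_gene'], note => ['kinase'] } },
    { start => 950, end => 1400, tags => { id => ['other_gene'] } },
    { start => 120, end => 300,  tags => { id => ['my_gene'] } },
);

# Build the index once: tag name -> tag value -> list of features.
my %by_tag;
for my $feat (@features) {
    for my $tag (keys %{ $feat->{tags} }) {
        push @{ $by_tag{$tag}{$_} }, $feat for @{ $feat->{tags}{$tag} };
    }
}

# Hypothetical lookup helper; returns an empty list when nothing matches.
sub features_by_tag {
    my ($tag, $value) = @_;
    return @{ $by_tag{$tag}{$value} || [] };
}

my @hits = features_by_tag(id => 'my_gene');
print scalar(@hits), " features tagged id=my_gene\n";
```

Each lookup is then a pair of hash accesses rather than a grep over every
feature, at the cost of keeping the index in sync when features are added
or removed.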

Frank




From stefan.kirov at bms.com  Wed Sep  8 15:09:55 2010
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 08 Sep 2010 11:09:55 -0400
Subject: [Bioperl-l] Another interesting Javascript library
Message-ID: <4C87A743.5010109@bms.com>


    Sorry for off topic, but I believe a lot of people can
    find this quite useful:

        "CanvasXpress is a javascript library based on the
        <canvas> tag implemented in HTML5. I developed this
        library as the core visualization component for our
        BMS systems biology platform which I hope to release
        soon. The basic idea was to have generic and simple
        way to display genomics data. CanvasXpress supports
        bar graphs, line graphs, bar-line combination
        graphs, boxplots, dotplots, area graphs, stacked
        graphs, percentage-stacked graphs, correlation
        plots, Venn diagrams, heatmaps, newick trees,
        2D-scatter plots, 2D-scatter bubble plots,
        3D-scatter plots, pie charts, networks (or
        pathways), and a genome browser. It also supports a
        few data transformations like log and exponential
        transformation, z-score, percentile transformation
        and ratio. It also support grouping of samples,
        zooming, events ... yada, yada, yada ... and more
        importantly I created an Ext panel for it. Take a look.

        http://canvasxpress.org/"


    Stefan


From alperyilmaz at gmail.com  Wed Sep  8 16:47:42 2010
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Wed, 8 Sep 2010 12:47:42 -0400
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
Message-ID: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>

Hi,

I have a GFF file listing mRNA and CDS coordinates for every
transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
based on that information. I was wondering, if there's already made
script for that purpose that you're aware of.

I already uploaded the GFF file into Bio::DB::SeqFeature database, so
I can utilize both flat file or database based scripts.

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954


From cjfields at illinois.edu  Wed Sep  8 23:20:09 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Sep 2010 18:20:09 -0500
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
Message-ID: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>

Well, no move has been concretely made yet.  It would be nice to abstract the backend, so one could use possibly any db or memory adaptor.  This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).

chris



From jason at bioperl.org  Thu Sep  9 05:51:53 2010
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 08 Sep 2010 22:51:53 -0700
Subject: [Bioperl-l] extract UTR from cds and mRNA coordinates
In-Reply-To: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
References: <AANLkTimqnnggJn78=PdbYVNisA5=yz3R7fdMWou-1nD9@mail.gmail.com>
Message-ID: <4C8875F9.6020502@bioperl.org>

Hi Alper -

This script operates on gtf so doesn't quite do what you want but could 
be modified to be simpler to just look at the CDS and mRNA rather than 
the exon,start/stop codon info
   
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl

Otherwise I think there may be some easy ways to do this with some 
tools in MAKER too.
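
If you end up rolling your own, the arithmetic is small. The sketch below
assumes 1-based inclusive GFF coordinates, one mRNA span and its CDS
segments as [start, end] pairs; utr_spans is a made-up helper name. Since
only mRNA and CDS lines are available, each UTR is reported as a single
genomic span between the mRNA end and the outermost CDS base, so any
intron inside a UTR is not removed:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Returns (five_prime, three_prime); each is [start, end], or undef if absent.
sub utr_spans {
    my ($mrna_start, $mrna_end, $strand, @cds) = @_;   # @cds = ([start, end], ...)
    my $cds_min = min map { $_->[0] } @cds;
    my $cds_max = max map { $_->[1] } @cds;

    # Genomic left and right flanks of the coding region.
    my $left  = $cds_min > $mrna_start ? [ $mrna_start, $cds_min - 1 ] : undef;
    my $right = $cds_max < $mrna_end   ? [ $cds_max + 1, $mrna_end ]   : undef;

    # On the minus strand the 5' end is the genomic right-hand side.
    return $strand eq '-' ? ($right, $left) : ($left, $right);
}

my ($utr5, $utr3) = utr_spans(1000, 5000, '-', [1200, 1800], [2500, 4700]);
print "5'UTR: @$utr5\n";
print "3'UTR: @$utr3\n";
```

For this invented minus-strand transcript the 5'UTR comes out as 4701 5000
and the 3'UTR as 1000 1199.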

-jason
Alper Yilmaz wrote, On 9/8/10 9:47 AM:
> Hi,
>
> I have a GFF file listing mRNA and CDS coordinates for every
> transcript of each gene. I need to extract 5'UTR and 3'UTR coordinates
> based on that information. I was wondering, if there's already made
> script for that purpose that you're aware of.
>
> I already uploaded the GFF file into Bio::DB::SeqFeature database, so
> I can utilize both flat file or database based scripts.
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954


From fs5 at sanger.ac.uk  Thu Sep  9 08:10:36 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 09:10:36 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <1284019836.4777.281.camel@deskpro15336.dynamic.sanger.ac.uk>

so something like an abstract Bio::Seq::FeatureContainer that defines
the methods for storing and retrieving features and that would then be
sub-classed to e.g. Bio::Seq::FeatureContainer::Memory or
Bio::Seq::FeatureContainer::Sqlite - is that the plan? Is there any way I
can get involved or is it better to wait for other features to be
developed first?
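
Just to illustrate the shape I have in mind, here is a toy version. The
package names are placeholders in a My:: namespace (nothing like this
exists in BioPerl as far as I know), features are plain hashes, and only
an in-memory backend is sketched:

```perl
use strict;
use warnings;

# Abstract interface: subclasses must provide the storage methods.
package My::FeatureContainer;
sub new          { my ($class, %args) = @_; return bless { %args }, $class; }
sub add_feature  { die ref(shift) . " must implement add_feature\n" }
sub get_features { die ref(shift) . " must implement get_features\n" }

# In-memory backend: features live in a plain array.
package My::FeatureContainer::Memory;
our @ISA = ('My::FeatureContainer');

sub add_feature {
    my ($self, $feat) = @_;
    push @{ $self->{features} }, $feat;
    return $self;
}

# With no arguments returns everything; with (-by_id => $id) filters by id.
sub get_features {
    my ($self, %query) = @_;
    my @all = @{ $self->{features} || [] };
    return @all unless defined $query{-by_id};
    return grep { $_->{id} eq $query{-by_id} } @all;
}

package main;

my $store = My::FeatureContainer::Memory->new;
$store->add_feature({ id => 'my_gene', start => 10,  end => 90 });
$store->add_feature({ id => 'other',   start => 200, end => 400 });
my @found = $store->get_features(-by_id => 'my_gene');
print "found ", scalar(@found), " feature(s)\n";
```

An Sqlite-backed subclass would implement the same two methods with SQL
behind them, so calling code would not need to know which backend is in
use.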

Cheers,

Frank




From jun.yin at ucd.ie  Thu Sep  9 08:20:39 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 09 Sep 2010 09:20:39 +0100
Subject: [Bioperl-l] Bio::Seq, search for specific features
In-Reply-To: <03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
References: <005e01caf5be$6d00c9c0$47025d40$@edu.hk>
	<C935BB98-BAAD-4B09-9F3E-EAC25C38E890@sbc.su.se>
	<007401cb4e66$9533ecf0$bf9bc6d0$@edu.hk>
	<5FCDA47A-B8E5-4233-B571-4E1020CFCE91@sbc.su.se>
	<1283861392.4777.247.camel@deskpro15336.dynamic.sanger.ac.uk>
	<4C867807.2040907@bioperl.org>
	<1283935377.4777.257.camel@deskpro15336.dynamic.sanger.ac.uk>
	<03DB35B3-4EC0-4F5A-933B-FB6EE63F218A@illinois.edu>
Message-ID: <00ea01cb4ff7$e30652f0$a912f8d0$%yin@ucd.ie>

Hi,

I would like to have a go at the bin indexing scheme for Bio::Seq (or a
similar package to Bio::LocatableSeq). The idea is to save the index of
sequences to a local database (AnyDBM) instead of keeping it in memory,
which frees up some memory. This idea actually comes from Bio::DB::Fasta,
as implemented by Lincoln Stein.
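
A minimal sketch of the on-disk index part only (not the bin scheme, and
not how Bio::DB::Fasta is actually written): sequence IDs are mapped to
byte offsets in a DBM file via the core AnyDBM_File module, and a record
is read back with seek. The file names, demo data and fetch_seq helper are
made up:

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use AnyDBM_File;
use File::Temp qw(tempdir);

# Write a tiny demo FASTA file into a temporary directory.
my $dir   = tempdir(CLEANUP => 1);
my $fasta = "$dir/demo.fa";
open my $out, '>', $fasta or die "cannot write $fasta: $!";
print {$out} ">seq1\nACGTACGT\n>seq2\nTTTTGGGG\nCCCC\n";
close $out;

# The index lives on disk: sequence id -> byte offset of its header line.
tie my %offset, 'AnyDBM_File', "$dir/demo_index", O_RDWR | O_CREAT, 0644
    or die "cannot open index: $!";

open my $in, '<', $fasta or die "cannot read $fasta: $!";
my $pos = tell $in;
while (my $line = <$in>) {
    $offset{$1} = $pos if $line =~ /^>(\S+)/;
    $pos = tell $in;
}

# Jump straight to one record and read until the next header.
sub fetch_seq {
    my ($id) = @_;
    return unless exists $offset{$id};
    seek $in, $offset{$id}, 0;
    my $header = <$in>;                 # skip the header line itself
    my $seq = '';
    while (my $line = <$in>) {
        last if $line =~ /^>/;
        chomp $line;
        $seq .= $line;
    }
    return $seq;
}

print fetch_seq('seq2'), "\n";
```

For the demo file this prints TTTTGGGGCCCC. Only the offsets are held in
the DBM file, so memory use stays flat however many sequences are indexed.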

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin




From s1012635 at student.hsleiden.nl  Thu Sep  9 09:27:23 2010
From: s1012635 at student.hsleiden.nl (_Lelieveld, Stefan - s1012635)
Date: Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>

Hi, 

I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format). 
I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl to get this package to work. So far we have had courses in Python and Java, not in Perl. 

http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
use Bio::Tools::Tmhmm; 
my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
while( my $tmhmm_feat = $parser->next_result ) { 
#do something 
#eg 
push @tmhmm_feat, $tmhmm_feat; 
} 

How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output? 

cheers! 

Stefan Lelieveld 


From fs5 at sanger.ac.uk  Thu Sep  9 10:28:51 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 09 Sep 2010 11:28:51 +0100
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <1284028131.4777.290.camel@deskpro15336.dynamic.sanger.ac.uk>

I haven't used that module myself but it appears to be a parser for
results from TMHMM, i.e. you don't feed it the FASTA file but the output
from TMHMM after it was run.
To run TMHMM you should use Bio::Tools::Run::Tmhmm
http://search.cpan.org/~cjfields/BioPerl-run-1.6.1/Bio/Tools/Run/Tmhmm.pm

Follow the synopsis to feed the tool with your sequences. You can learn
how to read a FASTA file and access each sequence in a loop here: 
http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

Essentially it boils down to:

use Bio::SeqIO;
my $file = shift; # to get a file path from command line
my $inseq = Bio::SeqIO->new(-file   => "<$file",-format => 'FASTA' );
while (my $seq = $inseq->next_seq) {
  print $seq->accession_number,"\n";
}

as an example for printing out accession numbers from $seq, which is a Bio::Seq object. 
So what you have to do now is to feed each of those Bio::Seq objects into your TMHMM runner. 

Frank


On Thu, 2010-09-09 at 11:27 +0200, _Lelieveld, Stefan - s1012635 wrote:
> Hi, 
> 
> I am a bioinformatics student working on a new project. For this project I need to get the TMHMM prediction for a list of proteins (in FASTA format). 
> I came across the Bio::Tools::Tmhmm package for BioPerl, which looked promising. The problem is that I lack the advanced knowledge of Perl to get this package to work. So far we have had courses in Python and Java, not in Perl. 
> 
> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output? 
> 
> cheers! 
> 
> Stefan Lelieveld 


From kai.blin at biotech.uni-tuebingen.de  Thu Sep  9 10:16:08 2010
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 9 Sep 2010 12:16:08 +0200
Subject: [Bioperl-l] Bio::Tools::TMHMM;
In-Reply-To: <814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
References: <421761374.485633.1284024358748.JavaMail.root@zembox01.zaas.igi.nl>
	<814361158.485667.1284024443202.JavaMail.root@zembox01.zaas.igi.nl>
Message-ID: <20100909121608.2571bbff.kai.blin@biotech.uni-tuebingen.de>

On Thu, 9 Sep 2010 11:27:23 +0200 (CEST)
"_Lelieveld, Stefan - s1012635" <s1012635 at student.hsleiden.nl> wrote:

Hi Stefan,

> http://search.cpan.org/~birney/bioperl-1.2.3/Bio/Tools/Tmhmm.pm : 
> use Bio::Tools::Tmhmm; 
> my $parser = new Bio::Tools::Tmhmm(-fh =>$filehandle ); 
> while( my $tmhmm_feat = $parser->next_result ) { 
> #do something 
> #eg 
> push @tmhmm_feat, $tmhmm_feat; 
> } 
> 
> How do I feed an input.txt (containing the proteins in FASTA format) to this parser, and how do I save the output? 

You need to run TMHMM first, of course. Bio::Tools::Tmhmm only parses
the TMHMM output file and returns an object that you can ask for
Bio::SeqFeature objects. So if you want to run TMHMM on some fasta
files, this module isn't going to do that for you.

Assuming that input.txt contains the TMHMM output,
"""
my $parser = new Bio::Tools::Tmhmm(-file => "input.txt");
"""
will parse the TMHMM output for you.
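
For the "how do I save the output" part, here is a minimal sketch, not tested against real TMHMM output. It assumes the objects returned by next_result offer the generic Bio::SeqFeatureI accessors seq_id, primary_tag, start and end. The helper name save_features, the script name and the file names in the usage line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drain a parser and write one tab-separated line
# per feature. $parser only needs a next_result() method that returns
# objects offering seq_id, primary_tag, start and end.
sub save_features {
    my ($parser, $out_path) = @_;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $count = 0;
    while (my $feat = $parser->next_result) {
        my @fields = (scalar $feat->seq_id, scalar $feat->primary_tag,
                      scalar $feat->start,  scalar $feat->end);
        # Print an empty column rather than warn on an undefined value.
        @fields = map { defined $_ ? $_ : '' } @fields;
        print {$out} join("\t", @fields), "\n";
        $count++;
    }
    close $out or die "Cannot close $out_path: $!";
    return $count;
}

# Usage (file names are placeholders):
#   perl save_tmhmm.pl tmhmm_output.txt features.tsv
if (@ARGV == 2) {
    require Bio::Tools::Tmhmm;    # loaded only when a file is actually parsed
    my ($in_path, $out_path) = @ARGV;
    my $parser = Bio::Tools::Tmhmm->new(-file => $in_path);
    my $n = save_features($parser, $out_path);
    print "Wrote $n features to $out_path\n";
}
```

The first argument must be the file TMHMM produced, not the FASTA file itself. Keeping the writing step in its own sub means the same loop works whether the parser was opened with -file or with -fh.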

HTH,
Kai

-- 
Dipl.-Inform. Kai Blin         kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax :   ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben


From elanorbust2 at yahoo.com  Thu Sep  9 16:10:06 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Thu, 9 Sep 2010 09:10:06 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
Message-ID: <154453.73718.qm@web37504.mail.mud.yahoo.com>

I am running a test for standaloneblastplus but getting data back that does not exist in my query or my local database. Below is an outline of my script, small database, query list, and erroneous results. As you will notice, the query list consists of the first four sequences found in the database. The results say it cannot find the first two, and the matches for the last two do not exist!

Thanks for any help! 


Program 


#!/usr/bin/perl

use Bio::Tools::Run::StandAloneBlastPlus;


$fac = Bio::Tools::Run::StandAloneBlastPlus->new(
  -db_name => 'ITS',
  -db_data => 'smallDB.fas',
  -create => 1
);

$result = $fac->blastn( -query => 'sequences.fasta',
                        -outfile => 'ITStest2.bls');


smallDB.fas Data

>302585252|HM807352|Waitea circinata internal transcribed spacer 1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>302585252|HM807352|Waitea circinata internal transcribed spacer 2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>302585250|HM802273|Fusarium oxysporum contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>302585249|HM802272|Fusarium oxysporum contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA

>302585248|HM802271|Fusarium oxysporum contains 18S ribosomal RNA, internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed spacer 2, and 28S ribosomal RNA"
CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGCGCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGCATATCATTAAAGCGGAGGAA

>301333053|GU725064|Xiphinema turcicum internal transcribed spacer 1
GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGCGCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTATATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAGAATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCTACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACCTCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTGACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGTGAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCGTTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCGTCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTACCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAGTCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATATATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGAACGCA
CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTAAGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCTTATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA

>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTGGGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGAATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGAAACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGAAATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTAACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGACTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAAGTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACGTAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGAAACTTTTATATATAGTTCGCCGAATAATAGCGAAC

>301333051|GU725062|Xiphinema sphaerocephalum internal transcribed spacer 1
AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCGCTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATCGTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTGAGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTCGAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCGAGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACCGGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTGGCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCCCTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTGTATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATATGAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATGTATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCTATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC

>301333050|GU725061|Xiphinema hispanum internal transcribed spacer 1
AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCTTTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATCTCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCGGGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAATAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGAAACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAATATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCCGCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTCGTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTCATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATAGTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATACAAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCATAAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG

>301333049|GU725060|Xiphinema pyrenaicum internal transcribed spacer 1
AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGCTCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATTCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTCTTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCGATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGAGATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAATTACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAACGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGTAGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATATATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG


sequences.fasta data

>Test1
ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGCACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGTTTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA

>Test2
GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTTCTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTAGACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA

>Test3
CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC

>Test4
GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGATTGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTAAACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAGGAA


Results

BLASTN 2.2.24+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.


Database: ITS
           5 sequences; 1,102 total letters


Query=  Test1
Length=204


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 202071


Query=  Test2
Length=192


***** No hits found *****


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 189507


Query=  Test3
Length=437
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G59F
Length=203

 Score =  300 bits (162),  Expect = 2e-085
 Identities = 176/182 (96%), Gaps = 4/182 (2%)
 Strand=Plus/Plus

Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
            ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81

Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141

Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201

Query  187  GG  188
            ||
Sbjct  202  GG  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 6e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 4e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 442850


Query=  Test4
Length=521
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012


>dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G59F
Length=203

 Score =  309 bits (167),  Expect = 4e-088
 Identities = 177/181 (97%), Gaps = 3/181 (1%)
 Strand=Plus/Plus

Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
            ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82

Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142

Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202

Query  184  G  184
            |
Sbjct  203  G  203


>dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G64F
Length=217

 Score = 69.4 bits (37),  Expect = 7e-016
 Identities = 39/40 (97%), Gaps = 0/40 (0%)
 Strand=Plus/Plus

Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
            ||||| ||||||||||||||||||||||||||||||||||
Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217


>dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G60F
Length=206

 Score = 58.4 bits (31),  Expect = 1e-012
 Identities = 39/42 (92%), Gaps = 3/42 (7%)
 Strand=Plus/Plus

Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
            |||| || ||| ||||||||||||||||||||||||||||||
Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204


>dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA, partial
sequence, clone: G65F
Length=256

 Score = 56.5 bits (30),  Expect = 5e-012
 Identities = 30/30 (100%), Gaps = 0/30 (0%)
 Strand=Plus/Plus

Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
            ||||||||||||||||||||||||||||||
Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254


Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 530378


  Database: ITS
    Posted date:  Aug 27, 2010  9:43 AM
  Number of letters in database: 1,102
  Number of sequences in database:  5


Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5


From jaya1786 at gmail.com  Thu Sep  9 16:59:51 2010
From: jaya1786 at gmail.com (jayanthijayakumar)
Date: Thu, 9 Sep 2010 22:29:51 +0530
Subject: [Bioperl-l] Regarding GSoC 2010
Message-ID: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>

Respected sir/madam,

I am Jayanthi Jayakumar, a second-year MS (by Research) student in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

I request you to provide details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar


From Russell.Smithies at agresearch.co.nz  Thu Sep  9 22:54:43 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 10 Sep 2010 10:54:43 +1200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <154453.73718.qm@web37504.mail.mud.yahoo.com>
References: <154453.73718.qm@web37504.mail.mud.yahoo.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>

Is that a typo in your email or are some of your fasta headers in your db incorrect?
Eg.
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
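
One way to check what the formatter was actually given is to count records and letters in the FASTA file and compare them with the "Database:" lines of the report, which say 5 sequences and 1,102 total letters. Below is a minimal sketch in plain Perl (no BioPerl needed). The sub name fasta_summary and the script name are made up for illustration, and treating the first whitespace-delimited token after '>' as the sequence ID is an assumption about how IDs should be compared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Summarise a FASTA stream: number of records, total sequence letters,
# and any ID (first token after '>') that appears more than once.
sub fasta_summary {
    my ($fh) = @_;
    my (%seen, @dups);
    my ($records, $letters) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;               # tolerate Windows line endings
        next if $line =~ /^\s*$/;       # skip blank lines
        if ($line =~ /^>(\S*)/) {
            my $id = $1;
            $records++;
            # Report each repeated ID once, on its second occurrence.
            push @dups, $id if $seen{$id}++ == 1;
        }
        else {
            $line =~ s/\s+//g;
            $letters += length $line;
        }
    }
    return { records => $records, letters => $letters, duplicate_ids => \@dups };
}

# Usage: perl fasta_summary.pl smallDB.fas
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $s = fasta_summary($fh);
    close $fh;
    printf "%d sequences; %d total letters\n", $s->{records}, $s->{letters};
    print "duplicate ID: $_\n" for @{ $s->{duplicate_ids} };
}
```

If the counts for smallDB.fas differ from the figures in the report, the search may not have run against a database built from that file. In the listing as posted, the first two records share the token 302585252|HM807352|Waitea, so this check would flag them as a duplicate ID.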


Russell Smithies

Technical Support
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz
Invermay  Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T  +64 3 489 3809
F  +64 3 489 9174
www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of sally roberts
> Sent: Friday, 10 September 2010 4:10 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] standaloneblastplus
>
> I am running a test for standaloneblastplus but getting data back that
> does not exist in my query or my local database. Below is an outline of my
> script, small database, query list, and erroneous results. As you will
> notice, the query list consists of the first four sequences found in
> the database. The results say it cannot find the first two, and the
> matches for the last two do not exist!
>
> Thanks for any help!
>
>
>
> Program
>
>
> #!/usr/bin/perl
>
> use Bio::Tools::Run::StandAloneBlastPlus;
>
>
> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>   -db_name => 'ITS',
>   -db_data => 'smallDB.fas',
>   -create => 1
> );
>
> $result = $fac->blastn( -query => 'sequences.fasta',
>                         -outfile => 'ITStest2.bls');
>
>
> smallDB.fas Data
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >302585252|HM807352|Waitea circinata  internal transcribed spacer 2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
> >302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA,
> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
> spacer 2, and 28S ribosomal RNA"
> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
> ATATCATTAAAGCGGAGGAA
>
> >301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
> ACGCA
> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>
> >301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
> 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>
> >301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
> 1
> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>
> >301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>
> >301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>
>
>
> sequences.fasta data
>
> >Test1
> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>
> >Test2
> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>
> >Test3
> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>
> >Test4
> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
> GAA
>
>
>
>
> Results
>
> BLASTN 2.2.24+
>
>
> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
> Comput Biol 2000; 7(1-2):203-14.
>
>
>
> Database: ITS
>            5 sequences; 1,102 total letters
>
>
>
> Query=  Test1
> Length=204
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 202071
>
>
> Query=  Test2
> Length=192
>
>
> ***** No hits found *****
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 189507
>
>
> Query=  Test3
> Length=437
>
> Score     E
> Sequences producing significant alignments:
> (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 300    2e-085
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 69.4    6e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...
> 56.5    4e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  300 bits (162),  Expect = 2e-085
>  Identities = 176/182 (96%), Gaps = 4/182 (2%)
>  Strand=Plus/Plus
>
> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>             ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>
> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>
> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>
> Query  187  GG  188
>             ||
> Sbjct  202  GG  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 6e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 4e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 442850
>
>
> Query=  Test4
> Length=521
>
>                                                                       Score     E
> Sequences producing significant alignments:                          (Bits)  Value
>
> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...    309    4e-088
> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   69.4    7e-016
> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   58.4    1e-012
> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   56.5    5e-012
>
>
> >dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G59F
> Length=203
>
>  Score =  309 bits (167),  Expect = 4e-088
>  Identities = 177/181 (97%), Gaps = 3/181 (1%)
>  Strand=Plus/Plus
>
> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>             ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>
> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>
> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>
> Query  184  G  184
>             |
> Sbjct  203  G  203
>
>
> >dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G64F
> Length=217
>
>  Score = 69.4 bits (37),  Expect = 7e-016
>  Identities = 39/40 (97%), Gaps = 0/40 (0%)
>  Strand=Plus/Plus
>
> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>             ||||| ||||||||||||||||||||||||||||||||||
> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>
>
> >dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G60F
> Length=206
>
>  Score = 58.4 bits (31),  Expect = 1e-012
>  Identities = 39/42 (92%), Gaps = 3/42 (7%)
>  Strand=Plus/Plus
>
> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             |||| || ||| ||||||||||||||||||||||||||||||
> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>
>
> >dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
> partial
> sequence, clone: G65F
> Length=256
>
>  Score = 56.5 bits (30),  Expect = 5e-012
>  Identities = 30/30 (100%), Gaps = 0/30 (0%)
>  Strand=Plus/Plus
>
> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>             ||||||||||||||||||||||||||||||
> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>
>
>
> Lambda     K      H
>     1.33    0.621     1.12
>
> Gapped
> Lambda     K      H
>     1.28    0.460    0.850
>
> Effective search space used: 530378
>
>
>   Database: ITS
>     Posted date:  Aug 27, 2010  9:43 AM
>   Number of letters in database: 1,102
>   Number of sequences in database:  5
>
>
>
> Matrix: blastn matrix 1 -2
> Gap Penalties: Existence: 0, Extension: 2.5
>
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From elanorbust2 at yahoo.com  Fri Sep 10 15:13:08 2010
From: elanorbust2 at yahoo.com (sally roberts)
Date: Fri, 10 Sep 2010 08:13:08 -0700 (PDT)
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF3303A3E293B@exchsth.agresearch.co.nz>
Message-ID: <23696.14536.qm@web37508.mail.mud.yahoo.com>

I think that is just an email error. Thanks for looking though!

--- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:

From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
Subject: RE: [Bioperl-l] standaloneblastplus
To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
Date: Thursday, September 9, 2010, 6:54 PM

Is that a typo in your email or are some of your fasta headers in your db incorrect?
E.g.
>301333052|GU725063|Xiphinema adenohystherum internal transcribed
>301333052|GU725063|spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Shouldn't that be:
>301333052|GU725063|Xiphinema adenohystherum internal transcribed spacer 1
AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT

Maybe the invalid fasta headers are breaking the db formatter?
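
One way to test that guess is to scan the FASTA file for a header line that directly follows another header, with no sequence in between, which is how a wrapped description line would look to the formatter. A minimal shell/awk sketch; the function name check_fasta_headers is made up for illustration:

```shell
# Hypothetical helper: report header lines that directly follow another
# header (no sequence in between) and print the total header count.
check_fasta_headers() {
    awk '
        /^>/ {
            headers++
            if (prev_was_header) {
                printf "line %d: header follows another header with no sequence between them\n", NR
            }
            prev_was_header = 1
            next
        }
        NF { prev_was_header = 0 }   # any non-blank line counts as sequence
        END { printf "%d header lines\n", headers }
    ' "$1"
}
```

Running `check_fasta_headers smallDB.fas` lists any suspect lines, and the header count it prints can be compared with the "Number of sequences in database" line in the BLAST report.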


Russell Smithies

Technical Support
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz



From David.Messina at sbc.su.se  Fri Sep 10 16:23:26 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 10 Sep 2010 18:23:26 +0200
Subject: [Bioperl-l] standaloneblastplus
In-Reply-To: <23696.14536.qm@web37508.mail.mud.yahoo.com>
References: <23696.14536.qm@web37508.mail.mud.yahoo.com>
Message-ID: <CFF0CF54-E5BD-4FD1-990D-E3AC219FBF5B@sbc.su.se>

Hi Sally,

Did you run the same search on the command line, outside of BioPerl?

The issue you're having may be with Blast+ and not BioPerl. For example, it's possible that the low-complexity and compositional matrix adjustment filtering (which are turned on by default) are excluding the expected matches.
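
A minimal sketch of that command-line check, assuming the BLAST+ programs makeblastdb and blastn are on the PATH. The function names, the DRY_RUN switch, and the database name ITS_cmdline are made up for illustration; `-dust no` switches off blastn's low-complexity (DUST) filtering of the query:

```shell
# Print each command; run it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Hypothetical helper: build a fresh nucleotide database from a FASTA file
# and search it with low-complexity filtering turned off.
#   $1 = database FASTA, $2 = query FASTA, $3 = name for the new database
rerun_search() {
    db_fasta=$1
    query=$2
    db=$3
    run makeblastdb -in "$db_fasta" -dbtype nucl -out "$db" &&
    run blastn -query "$query" -db "$db" -dust no -out "$db.bls"
}
```

`DRY_RUN=1 rerun_search smallDB.fas sequences.fasta ITS_cmdline` only prints the two commands. Without DRY_RUN it runs them, and ITS_cmdline.bls can then be compared with the report produced through BioPerl.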


Dave


On Sep 10, 2010, at 17:13 , sally roberts wrote:

> I think that is just an email error. Thanks for looking though!
> 
> --- On Thu, 9/9/10, Smithies, Russell <Russell.Smithies at agresearch.co.nz> wrote:
> 
> From: Smithies, Russell <Russell.Smithies at agresearch.co.nz>
> Subject: RE: [Bioperl-l] standaloneblastplus
> To: "'sally roberts'" <elanorbust2 at yahoo.com>, "'bioperl-l at lists.open-bio.org'" <bioperl-l at lists.open-bio.org>
> Date: Thursday, September 9, 2010, 6:54 PM
> 
> Is that a typo in your email or are some of your fasta headers in your db incorrect?
> E.g.
>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed
>> 301333052|GU725063|spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Shouldn't that be:
>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer 1
> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCGCTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGATCTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGT
> 
> Maybe the invalid fasta headers are breaking the db formatter?
> 
> 
> Russell Smithies
> 
> Technical Support
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of sally roberts
>> Sent: Friday, 10 September 2010 4:10 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] standaloneblastplus
>> 
>> I am running a test for standaloneblastplus but getting data back that
>> does not exist in my query or my local database. Below is a outline of my
>> script small database, query list, and erroneous results. As you will
>> notice the query list is comprised of the first four sequences found in
>> the database. The results say it can not find the first two and then the
>> mathces for the last two do not exist!
>> 
>> Thanks for any help!
>> 
>> 
>> 
>> Program
>> 
>> 
>> #!/usr/bin/perl
>> 
>> use Bio::Tools::Run::StandAloneBlastPlus;
>> 
>> 
>> $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
>>    -db_name => 'ITS',
>>    -db_data => 'smallDB.fas',
>>    -create => 1
>> );
>> 
>> $result = $fac->blastn( -query => , 'sequences.fasta',
>>                          -outfile => 'ITStest2.bls');
>> 
>> 
>> smallDB.fas Data
>> 
>>> 302585252|HM807352|Waitea circinata  internal transcribed spacer 1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>>> 302585252|HM807352|Waitea circinata  internal transcribed spacer 2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>>> 302585250|HM802273|Fusarium oxysporum  contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>>> 302585249|HM802272|Fusarium oxysporum  contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>>> 302585248|HM802271|Fusarium oxysporum  contains 18S ribosomal RNA,
>> internal transcribed spacer 1, 5.8S ribosomal RNA, internal transcribed
>> spacer 2, and 28S ribosomal RNA"
>> CCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCAATTGTTGCCTCGGCGGATCAGCCCGCTCC
>> CGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAA
>> TAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTAAT
>> GTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGC
>> CTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGCCG
>> GCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCATTGCGTAGTAGTAAAACCCTCGCAACTGGTACGCGGC
>> GCGGCCAAGCCGTTAAACCCCCAACTTCTGAATGTTGACCTCGGATCAGGTAGGAATACCCGCTGAACTTAAGC
>> ATATCATTAAAGCGGAGGAA
>> 
>>> 301333053|GU725064|Xiphinema turcicum  internal transcribed spacer 1
>> GGAGAGATTATATCTTTCTCGAAAAGAGAAAAAATATCCGAGCCGAGCGAACCGACCGAAAAACGCGGTGAGGC
>> GCCTTTTGCGCAAAGTCCGTACGTCGGTTCTTAGCGAATATAGCCTCGGCCTGGGACCCGAAAGATGTTTCCTA
>> TATGTATCTCGAGACCGACCGTTTAAGACGGTAGCCGGAAAAAAGATTATACCGTGGGTGAAGGTGTCGAAAAG
>> AATAATGTAGGTAAAAAAGAAAGACAGACAGAGGAGAGAAAGAACGAAAGTAGAACTCGAACGTAGTTTGAGCT
>> ACGCAGTAACGGTATCCGTCGTGGGACATCGCGGTGCGTCGGTTGTAGGGAGTTAAGATTACCTACCCGACACC
>> TCGATATTAATCCCGCGCGAATAAATGCGGATTACCGTGAATGTACGCTCTGCTTCGATATCGGGCTTCTTTTG
>> ACACCGAAAATATATATATGAATAAAAATAAAGTCACCCTCGTTGCAACGGTATATATCAAAGCGGTTTTCCGT
>> GAAAAGAAAGAAGGCGGCTTCGGTTCTCGTTATATTAGGAATAATCTAAGTAATTTCAGACGTCCCGGGAATCG
>> TTACTATAGATAGAGAGCGATAGTAACGGTTTCTCCTTCGGGTACTTATCGAACGTTAACACTGCGGTAATCCG
>> TCTGGCCGCAAGGAGAGAGGTGTTACGTTCGGCAGCCCTAAATTTCGACCCGTTCGACTAATGCGACGGCCCTA
>> CCGAGAAAATGTAGGGCCTATGTACATAGTCCGAAAGAAATACGATCGGAATATTAAGGGTTAGGTTTAAAGAG
>> TCATCGGTTCCGAGTACGCGTTCGTTCGGCACGATGCGTGTGTGTATATATCGTAGAGGAGTATTGACGATATA
>> TATGTATGCGTATTCGCCCTTACGATAAGAGAATATCGCGTAATTCGGAGCGGCCGTTCTTCGCGAGAGAGAGA
>> ACGCA
>> CGCGTTAGAAGCTTACGAGTCGGTGTTAAGTTCGAAGGAGAGAGGTTCGAACCGAAGCCGGCGAGTACGCGTTA
>> AGTCGTTTCGCGAGAGACGGTCCGGGACGAAAAGGAGAGAGTATCGTCCGGGTGTCCGCCCGAAATAGATATCT
>> TATCGAGAATATTTTTATATAGTTCGTTAGAAAGAATGCGAACTTTAAA
>> 
>>> 301333052|GU725063|Xiphinema adenohystherum  internal transcribed spacer
>> 1
>> AAAGACGAAAAATATATACTTTCTCACCGAAATATCAGACTATTCGGTTCCGATTTGATTCGCGGTAAGGCGCG
>> CTAGCGCGGTAGCCCGCTATCGGTTCTGACTGTCTGGACCCCGAAAAATAGTAAGAAAGACGGTGTCGTCGGAT
>> CTCGGGTTAGTATTGTATATATCGGGGACTTATCGGTCTGTCGAGTTTCTTTCCGGGGTTCTTTGAGTTTATTG
>> GGACAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAGTCTCGTGAACACGAGCCCGGGA
>> ATAGAAGAGACTCGGCTGATAACGACCGACTATATCTCGTTATATACTCAGAGTTGAATAACTGAGTGGCTCGA
>> AACGGCGACATTGTACTTACTATTTTATGTAGACTCTGGAAATATCAGACGTCCCGGGGAATCGTTACAGAGGA
>> AATATAGGGTACCTGGAAAAAGAATGGTACCCGTTCCTGTAATGATTCCTTATTCGGGTACCTATCGAATACTA
>> ACGGCGCGGATCCCCCGTCTGGCCGCGACGGAATAAGCGTTAGATTCGGTATCCCTATATTCGCGAGTATTCGA
>> CTAGTCATGAAATAGAGCCCTTATCGGGGTATCGACTGTCGATCGGATAGAAAGCGAATTAGGGTTAGGTTTAA
>> AGAGTCATTGGTTCCGTATATATGGGTGGAACGTACCCGTAAAGGAACAGCCGTAGACGCGAGTTCGGAAATAA
>> GTATATTCTCGCGAGAAAGAGGGTCCGTGTACCTTCAAGGTACTTGAATTTAGACCCAGTCTCGTGAATATACG
>> TAACTCGTCGAATGGCTCGGGACATGTAGAATACTATGTCCGGGTGACCGCCCGAAATAAGAATATTCATCAGA
>> AACTTTTATATATAGTTCGCCGAATAATAGCGAAC
>> 
>>> 301333051|GU725062|Xiphinema sphaerocephalum  internal transcribed spacer
>> 1
>> AAAGTCGAAAAAATATACTTTCTCGCGGAGAAATAATACGGACCGTTCAGTCCGACTCTATACGCGGTAAGGCG
>> CTCTTGCGCGAGAGCCCGCTGTCGGTTCTGACGGTCCGGACCCCGAAAAGTAGTAAGTACGACTACGATATATC
>> GTGGTCGAGTATCGGTTAGTAATAGTATATCGGGACTGACCGATCGGTCGGTCGAGTTTCTACCGGCTTCTTTG
>> AGTCTATTCGGGCAGCGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTGTAGAACTCGTGAATTC
>> GAGCTCGGTAACCGGGAACTCGGCTGAGAACGACCGATTACTTCTCGATACGCTCGAACGTATATATCTAACCG
>> AGAAAAGGCGACGTTGTACTTACTATTTATATCAGACGTCCCGAGAGTCGTTACGGTCGGAAATATTGGGTACC
>> GGTATCGGACCCGTTTCCGTATCGGCTCTTTATTCGGGTACCTATCGAATACTAACGCCGCGGTTCACCGTCTG
>> GCCGCGACGGAATACGCGTTAGATTCGGCACCCCCTATATTCGTATATATATCGACTAGTCTCGAAATAGAGCC
>> CTTACTAGGGTGAAGACTATGTCGATCGGAAAGAATCGGATTAGGGGTAGGTTTAAAGAGTCATCGGTTCCGTG
>> TATCCGGGCGAAATATATACCCGTAACGGAACGACCGTTGACGCGAGTTTGAAGATATATACATGTACGTATAT
>> GAGACAAAAAAACGAGGGTCTGTACCGTGAATTTTTTAGGTACCGAAAAGAGGACCCCCGGTCTCGTGAATATG
>> TATTACTCGCCGAACGGTTCGGGACATGGAGAATATTATGTCCGGGTGACCGCCCGAAATAGAAATTTTTTTCT
>> ATAAAGTTTTGATATACGTATAGTTCGTCGAATAAAAGC
>> 
>>> 301333050|GU725061|Xiphinema hispanum  internal transcribed spacer 1
>> AAAGCCGAAAAATATATACTTTCTCAGAGAAATACTAGACTAGTCGATTCCGACTTGATTCGCGGTAAGGCGCT
>> TTCGCGCGATAGCCCGCTGTCGGTTCCGACCGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGATC
>> TCGGTTAGAAATTGTATATATGTCGGGACGGATCGGTCGGTCGAGTTCCTTTCGGTGTTCTTAGAGTTTATTCG
>> GGCAGTGTCGGTTGTAGCGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTTAGAACCCGGAA
>> TAGAGGGAACTCGGCTGATAACGACCGACTTATGTCTCGCCGTATACCGTGAGTTATTTGACCGAGTGGCTCGA
>> AACGGCGGTATTGTACTTACTATTTATCTAGTCTCTGGAAATATCAGACGTCCCGGGAATCGTTACAGCGGAAA
>> TATAGGGTACCCGAAAAACTGGTACCCGTTTCTGAAACGACTCCTTATTCGGGTACCTATCGAATACTAACGCC
>> GCAGTTTCCCGTCTGGCTGCGATGGAAAAAGCGTTAGATTCGGGATCTCTATATTCGCGGGTGTTCGATTAGTC
>> GTGAAATACAGCCCTTACGCGGGTGACGACGGTCGATCGGAAAGAAAGCGAATTAGGGTTAGGTTTAAAGAGTC
>> ATTGGTTCCGTGTACGGGCGAAAAAGTACCCGTTACGGAACGGCCGTCGACGCGAGTGTGGAAATAAGTATATA
>> GTTACGAGAAAGAGGGTCTGTACCTCGGAGTTTTTTGAAGGTACCGTAATCAGGACCCTGTCTCGTGAATATAC
>> AAGTTACTCGCCGAACGGTTCGGCCAATGTAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTTCAT
>> AAAAAGCTTTTATATATAGTTTGCCGAATAATAGCAAACG
>> 
>>> 301333049|GU725060|Xiphinema pyrenaicum  internal transcribed spacer 1
>> AAAGCGGAAAAATTACTTTCTCACCCGGAAAAAACAGACCGTTTATCGGTCCGACTTGAAACGCGGTAAGGCGC
>> TCTTGCGCGATAGCCCGCCGTCGGTTCCGATGGTCTGGACCCCGAAAAATAGTAAGAACGACGGTGTCGTCGAT
>> TCTCGGTTAGTAGTATATCCGGTCGGATCGATATATATCGGTCGGTCGAGTTTCTATCGGGTTCTTTGAGTTTC
>> TTCGGACAGCGTCGGTTGTAGTGGAGTTTGGATACCTACCCGACTGTCCTTATAATCTCGTGAACTCTAGCCCG
>> ATAATAATACGGAACTCGGCTGAGAACGACCGACTTAGGTCTGAGTAGATATACTGAGAATATTACCTAGCCGA
>> GATGAACGAAACGGCGACATTGGAGTTTTACTATTTACTCGTATCAGACGTCCCGGGAATCGTTGCAGTTGAAT
>> TACATATATACGGGTACCTGTAATTGGACTCGTTTCTGTAACGGTTCTTTAGTCGGGTACCTATCGAATACTAA
>> CGCCGCGGTTATCCGTCTGGCCGCGATGGAATAAGCGTTAGATTCGGCATCCCTTTATTCGTATACGTTCGAGT
>> AGTCGTGAATTAGAACCCTTTAACCGGGGTGAAGACTATCGACGGGAGATAAGCGAATTAGGGGTAGGTTTAAA
>> GAGTCATCGGTTCCGGATACGGAGAGAAAAATGCCCGTAATGGAACGACCATTGAAGCGGGATCTATATATATA
>> TATATATGATTCGCCCGATGGTTCGGGACATGGAGAATTTTATGTCCGGGTGACCGCCCGAAATAGAAATATTT
>> ACTTCAAAGTTATTTATATATAGTTCGCCTTATAAGAGCGAACG
>> 
>> 
>> 
>> sequences.fasta data
>> 
>>> Test1
>> ATGATTGGTGGCTGTTGCTGGCTAGTGTTTCTAGTATGTGCACGCCACACCTTCAATCCCACTTACACCTGTGC
>> ACCTTTTGTAGTATTACTTGTGGATATCGAGAGAAAGTTTAGTCTTTCACTCTGTTGAAACCGGTTTACTACGT
>> TTTTTTATACACACACAATAGTCATTGAATGTATTTTTTATTTCTTATGATAAAAA
>> 
>>> Test2
>> GAATCTCTCAAATACAATAATTTTTTTTGATTGTTGTATTTGGACTTGGAAGCTGTTGGCGCAAGTCGACTCTT
>> CTCAAATGTATTAGCTGGGGTTTATATAGTTGGATCCTTGGTGTGATAATTATCTACGCCTTGAAGTCCCTGTA
>> GACTCTGCTTCAAATCGTCTCTTCATGAGACAATATTTGAATCA
>> 
>>> Test3
>> CTCGAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATCAGCCCGCT
>> CCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATA
>> AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCAAAATGCGATAAGTA
>> ATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCAT
>> GCCTGTTCGAGCGTCATTTCAACCCTCAAGCCCCCGGGTTTGGTGTTGGGGATCGGCGAGCCCTTGCGGCAAGC
>> CGGCCCCGAAATCTAGTGGCGGTCTCGCTGCAGCTTCCCCTGCATAGAAAACCCGCGGGGGGGGGAC
>> 
>>> Test4
>> GAGGTCTTACCGAGTTTCACTCCCAACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCAGCCCGCTCCCG
>> GTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATGTAACTTCTGAGTAAAACCATAAATA
>> AATCAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCCAAATGCGATAAGTAATGT
>> GAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCT
>> GTTCGAGCGTCATTTCAACCCTCAAGCACAGCTTGGTGTTGGGACTCGCGTTAATTCGCGTTCCTCAAATTGAT
>> TGTCGTTCACGTCGAGCTTCCATAGCGTAGTAGTAAAACCCTCGTTACTGGTAATCGTCGCGGCCACGCCGTTA
>> AACCCCAACTTCTGAATGTTGACCTCGGATCAGTAAGGAATACCCGCTGAACTTAAGCATATCATTAAGCGGAG
>> GAA
>> 
>> 
>> 
>> 
>> Results
>> 
>> BLASTN 2.2.24+
>> 
>> 
>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
>> Miller (2000), "A greedy algorithm for aligning DNA sequences", J
>> Comput Biol 2000; 7(1-2):203-14.
>> 
>> 
>> 
>> Database: ITS
>>             5 sequences; 1,102 total letters
>> 
>> 
>> 
>> Query=  Test1
>> Length=204
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 202071
>> 
>> 
>> Query=  Test2
>> Length=192
>> 
>> 
>> ***** No hits found *****
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 189507
>> 
>> 
>> Query=  Test3
>> Length=437
>> 
>>                                                                       Score     E
>> Sequences producing significant alignments:                          (Bits)  Value
>> 
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   300    2e-085
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    6e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    4e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  300 bits (162),  Expect = 2e-085
>>   Identities = 176/182 (96%), Gaps = 4/182 (2%)
>>   Strand=Plus/Plus
>> 
>> Query  10   TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATCACCAATTGTTGCCTCGGCGGATC  66
>>              ||||||||||| | |||||| |||||| |||||||| |||| ||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACAT-ACCACTTGTTGCCTCGGCGGATC  81
>> 
>> Query  67   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  126
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  82   AGCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATAT  141
>> 
>> Query  127  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  142  GTAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  201
>> 
>> Query  187  GG  188
>>              ||
>> Sbjct  202  GG  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 6e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  149  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  188
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  146  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 4e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  157  AAAACTTTCAACAACGGATCTCTTGGTTCT  186
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 442850
>> 
>> 
>> Query=  Test4
>> Length=521
>> 
>>                                                                       Score     E
>> Sequences producing significant alignments:                          (Bits)  Value
>> 
>> dbj|AB581518.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...   309    4e-088
>> dbj|AB581521.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  69.4    7e-016
>> dbj|AB581519.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  58.4    1e-012
>> dbj|AB581522.1|  Uncultured fungus genes for 18S rRNA, ITS1 and 5...  56.5    5e-012
>> 
>> 
>>> dbj|AB581518.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G59F
>> Length=203
>> 
>>   Score =  309 bits (167),  Expect = 4e-088
>>   Identities = 177/181 (97%), Gaps = 3/181 (1%)
>>   Strand=Plus/Plus
>> 
>> Query  7    TTACCGAGTTT-C-ACTCCC-AACCCCTGTGAACATACCACTTGTTGCCTCGGCGGATCA  63
>>              ||||||||||| | |||||| |||||| ||||||||||||||||||||||||||||||||
>> Sbjct  23   TTACCGAGTTTACAACTCCCAAACCCCAGTGAACATACCACTTGTTGCCTCGGCGGATCA  82
>> 
>> Query  64   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  123
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  83   GCCCGCTCCCGGTAAAACGGGACGGCCCGCCAGAGGACCCCTAAACTCTGTTTCTATATG  142
>> 
>> Query  124  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  183
>>              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> Sbjct  143  TAACTTCTGAGTAAAACCATAAATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTG  202
>> 
>> Query  184  G  184
>>              |
>> Sbjct  203  G  203
>> 
>> 
>>> dbj|AB581521.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G64F
>> Length=217
>> 
>>   Score = 69.4 bits (37),  Expect = 7e-016
>>   Identities = 39/40 (97%), Gaps = 0/40 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  145  AATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  184
>>              ||||| ||||||||||||||||||||||||||||||||||
>> Sbjct  178  AATAAGTCAAAACTTTCAACAACGGATCTCTTGGTTCTGG  217
>> 
>> 
>>> dbj|AB581519.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G60F
>> Length=206
>> 
>>   Score = 58.4 bits (31),  Expect = 1e-012
>>   Identities = 39/42 (92%), Gaps = 3/42 (7%)
>>   Strand=Plus/Plus
>> 
>> Query  142  ATAA-ATAAATCAAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              |||| || ||| ||||||||||||||||||||||||||||||
>> Sbjct  165  ATAACAT-AAT-AAAACTTTCAACAACGGATCTCTTGGTTCT  204
>> 
>> 
>>> dbj|AB581522.1| Uncultured fungus genes for 18S rRNA, ITS1 and 5.8S rRNA,
>> partial
>> sequence, clone: G65F
>> Length=256
>> 
>>   Score = 56.5 bits (30),  Expect = 5e-012
>>   Identities = 30/30 (100%), Gaps = 0/30 (0%)
>>   Strand=Plus/Plus
>> 
>> Query  153  AAAACTTTCAACAACGGATCTCTTGGTTCT  182
>>              ||||||||||||||||||||||||||||||
>> Sbjct  225  AAAACTTTCAACAACGGATCTCTTGGTTCT  254
>> 
>> 
>> 
>> Lambda     K      H
>>      1.33    0.621     1.12
>> 
>> Gapped
>> Lambda     K      H
>>      1.28    0.460    0.850
>> 
>> Effective search space used: 530378
>> 
>> 
>>    Database: ITS
>>      Posted date:  Aug 27, 2010  9:43 AM
>>    Number of letters in database: 1,102
>>    Number of sequences in database:  5
>> 
>> 
>> 
>> Matrix: blastn matrix 1 -2
>> Gap Penalties: Existence: 0, Extension: 2.5
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jun.yin at ucd.ie  Sat Sep 11 16:13:09 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Sat, 11 Sep 2010 17:13:09 +0100
Subject: [Bioperl-l] Regarding GSoC 2010
In-Reply-To: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
References: <AANLkTim6WiTfdw+Kt=xLxsGizPfOVX6mSG1Srh-SKQeY@mail.gmail.com>
Message-ID: <019501cb51cc$39d15730$ad740590$%yin@ucd.ie>

Hi, Jayanthi Jayakumar,

GSoC has already finished for this year. You can check the information here:
http://socghop.appspot.com/gsoc/program/home/google/gsoc2010

However, you can still contribute to the BioPerl project if you like. You
can talk to people on this mailing list, or you can join the IRC channel
(http://www.bioperl.org/wiki/IRC).

Cheers,
Jun Yin
Ph.D. student at U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jayanthijayakumar
Sent: Thursday, September 09, 2010 6:00 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Regarding GSoC 2010

Respected sir/madam,

I am Jayanthi Jayakumar, a second-year MS (by Research) student in
computational biology at Anna University, Chennai, India.

I am very much interested in participating in GSoC 2010 under the project
"Major Bioperl recognition".

I request you to provide details and eligibility criteria for the same.


Thanking you,


yours faithfully,
Jayanthi Jayakumar

__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From david.breimann at gmail.com  Sun Sep 12 13:16:29 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sun, 12 Sep 2010 15:16:29 +0200
Subject: [Bioperl-l] Circular genomes
Message-ID: <AANLkTi=LgWkUVeUd6WdaoDZQBTJMm7aCRe86p2-+R9PE@mail.gmail.com>

Hello,

As a continuation of
http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033904.html, I
would like to ask:
Has the fix been implemented yet? That is, do GFF3 files created for circular
genomes comply with the GFF3 spec for such genomes?
I find it difficult to keep track using git, so I'm not sure whether this
has already been handled.

Also, will the start and end coordinates of such genes loaded from a GFF3
file be "normal" (i.e. no coordinate is larger than the size of the
genome), or just as written in the GFF3 (which requires that end > start, even
if that makes end > genome length)?
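
To make the two coordinate conventions in the question concrete, here is a minimal Perl sketch of mapping a GFF3-style end coordinate (end > genome length for a feature that crosses the origin) back onto the circle. The helper name wrap_coord and the example numbers are hypothetical, and this is not a statement of what BioPerl itself does when loading such a file.

```perl
use strict;
use warnings;

# Map a 1-based coordinate that may run past the end of a circular
# genome back into the range 1..$genome_len (hypothetical helper).
sub wrap_coord {
    my ($pos, $genome_len) = @_;
    return ($pos - 1) % $genome_len + 1;
}

# Made-up feature crossing the origin of a 5000 bp circular genome,
# written GFF3-style so that start <= end.
my ($start, $end, $genome_len) = (4900, 5100, 5000);

printf "as written in GFF3: %d..%d\n", $start, $end;
printf "wrapped onto genome: %d..%d\n",
    wrap_coord($start, $genome_len), wrap_coord($end, $genome_len);
```

With the wrapped form, a feature whose end is smaller than its start is one that crosses the origin, so any code consuming "normal" coordinates has to handle that case explicitly.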

Thanks,
David


From David.Messina at sbc.su.se  Mon Sep 13 15:10:42 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 13 Sep 2010 17:10:42 +0200
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <80921A33-63E0-481A-B31B-3C0338542F2B@sbc.su.se>

Hi everyone,

I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.

The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.

It's already part of bioperl-live, and you can also get it here:

	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl


Dave


From maj at fortinbras.us  Mon Sep 13 16:47:45 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 13 Sep 2010 16:47:45 +0000
Subject: [Bioperl-l] BioPerl net installer
Message-ID: <W964423161995101284396465@webmail24>

Dear Scott-
You rock!
Sincerely, 
Mark

>-----Original Message-----
>From: Dave Messina [mailto:David.Messina at sbc.su.se]
>Sent: Monday, September 13, 2010 11:10 AM
>To: 'BioPerl List'
>Subject: [Bioperl-l] BioPerl net installer
>
>Hi everyone,
>
>I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.
>
>The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.
>
>It's already part of bioperl-live, and you can also get it here:
>
>	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl
>
>
>
>Dave
>
>


From cjfields at illinois.edu  Mon Sep 13 21:15:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 13 Sep 2010 16:15:45 -0500
Subject: [Bioperl-l] BioPerl net installer
In-Reply-To: <W964423161995101284396465@webmail24>
References: <W964423161995101284396465@webmail24>
Message-ID: <3D7D24C5-B2BD-472E-9611-F3D7112E453D@illinois.edu>

Ditto!  

chris (briefly resurfacing)

On Sep 13, 2010, at 11:47 AM, Mark A. Jensen wrote:

> Dear Scott-
> You rock!
> Sincerely, 
> Mark
> 
>> -----Original Message-----
>> From: Dave Messina [mailto:David.Messina at sbc.su.se]
>> Sent: Monday, September 13, 2010 11:10 AM
>> To: 'BioPerl List'
>> Subject: [Bioperl-l] BioPerl net installer
>> 
>> Hi everyone,
>> 
>> I don't think it's been announced on the list, but at the Bio-hackathon in Boston last July, Scott Cain kindly adapted his Gbrowse net installer for use with BioPerl.
>> 
>> The net installer will grab bioperl-live and all the prerequisites for you and install them, so this should make it dead simple for anyone to get up and running.
>> 
>> It's already part of bioperl-live, and you can also get it here:
>> 
>> 	http://github.com/bioperl/bioperl-live/blob/master/scripts/bioperl_netinstall.pl
>> 
>> 
>> 
>> Dave
>> 
>> 


From timmcilveen at talktalk.net  Mon Sep 13 23:07:00 2010
From: timmcilveen at talktalk.net (tim)
Date: Tue, 14 Sep 2010 00:07:00 +0100
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
Message-ID: <201009140007.00798.timmcilveen@talktalk.net>

Hi,
I have just installed Bioperl on my Linux system using the CPAN install. The 
install summary is as follows:
Test Summary Report
-------------------
t/RemoteDB/GenPept.t                       (Wstat: 256 Tests: 21 Failed: 1)
  Failed test:  17
  Non-zero exit status: 1
t/RemoteDB/Query/GenBank.t                 (Wstat: 256 Tests: 18 Failed: 1)
  Failed test:  9
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 21 tests but ran 18.
t/RemoteDB/Taxonomy.t                      (Wstat: 512 Tests: 103 Failed: 2)
  Failed tests:  15, 98
  Non-zero exit status: 2
t/Root/RootIO.t                            (Wstat: 7424 Tests: 30 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 31 tests but ran 30.
Files=329, Tests=18407, 512 wallclock secs ( 6.19 usr  0.91 sys + 156.68 cusr  
9.16 csys = 172.94 CPU)
Result: FAIL
Failed 4/329 test programs. 4/18407 subtests failed.
  CJFIELDS/BioPerl-1.6.1.tar.gz
  ./Build test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports CJFIELDS/BioPerl-1.6.1.tar.gz
Running Build install
  make test had returned bad status, won't install without force
Failed during this command:
 CJFIELDS/BioPerl-1.6.1.tar.gz                : make_test NO

Is Bioperl properly installed? During the install process I was getting quite
a lot of this error (hundreds of instances):
'replacement list longer than search list'. This happened with t/tools, t/seq,
t/search and many others.

Any advice would be great.
Tim


From David.Messina at sbc.su.se  Tue Sep 14 07:56:33 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 14 Sep 2010 09:56:33 +0200
Subject: [Bioperl-l] Installing Bioperl using CPAN on Suse 11.3
In-Reply-To: <201009140007.00798.timmcilveen@talktalk.net>
References: <201009140007.00798.timmcilveen@talktalk.net>
Message-ID: <5955676D-D3BC-452B-BAA0-6F230EC11EC1@sbc.su.se>

Hi Tim,

Thanks for your report.


> Is Bioperl properly installed?

No, it wasn't. When installing through CPAN, if any tests fail the installation is aborted. You can always check by looking for this line:

>  make test had returned bad status, won't install without force


As for the error(s)

> 'replacement list longer than search list'

I believe this was fixed a couple of months ago. For details, see:

	http://bugzilla.open-bio.org/show_bug.cgi?id=3116


So I would recommend that you grab the latest copy of bioperl-live from github, wherein the bug will be fixed:

	http://www.bioperl.org/wiki/Getting_BioPerl#Snapshots


Give that a shot and let us know how it goes.


Dave


From jskittrell at unmc.edu  Thu Sep 16 16:15:49 2010
From: jskittrell at unmc.edu (Jeff Kittrell)
Date: Thu, 16 Sep 2010 16:15:49 +0000 (UTC)
Subject: [Bioperl-l] mpiblast
Message-ID: <loom.20100916T180946-13@post.gmane.org>

Does Bioperl work with mpiblast? Is there a standalone-like module that
allows you to easily call mpiblast? I'm assuming seqio will parse an mpiblast
output file correctly?

Thanks for any help,

Jeff 


From David.Messina at sbc.su.se  Thu Sep 16 18:25:57 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 16 Sep 2010 20:25:57 +0200
Subject: [Bioperl-l] mpiblast
In-Reply-To: <loom.20100916T180946-13@post.gmane.org>
References: <loom.20100916T180946-13@post.gmane.org>
Message-ID: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>

> Is there a standalone-like module that allows you to easily call mpiblast?

No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.

	http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase


> I'm assuming seqio will parse an mpiblast output file correctly?

Yes, although I see that a new version of mpiblast was recently released. 

Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?


Dave


From shalabh.sharma7 at gmail.com  Thu Sep 16 21:38:14 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Thu, 16 Sep 2010 17:38:14 -0400
Subject: [Bioperl-l] IUPAC code similarity
Message-ID: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>

Hi All,
      I have a few nucleotide sequences that are composed of IUPAC codes, like:
>test
VGSRVBSSSSSNSC

Similarly, I have a database made up of these kinds of sequences. I want to
find sequences that are 100% similar to the query sequence.

Is there any BioPerl module to deal with this? I tried normal BLAST but it
didn't work.
Do I have to convert these sequences to 4-base codes, or is there any other
way?

Thanks
Shalabh


From amackey at virginia.edu  Fri Sep 17 14:28:15 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 10:28:15 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
Message-ID: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>

Convert the IUPAC code to a regular expression, and use regular expressions
(in Perl or grep or similar) to find 100% identical matches.

-Aaron

On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Hi All,
>      I have few nucleotide sequences that are composed of IUPAC codes. Like
> >test
> VGSRVBSSSSSNSC
>
> Similarly i have a database made of of these kind of sequences. I want to
> find sequences that are 100% similar to the query sequence.
>
> Is there any bioPerl module to deal with this, i tried normal blast but it
> didn't worked.
> Do i have to convert these sequences to 4 base codes or there is any other
> way out.
>
> Thanks
> Shalabh


From shalabh.sharma7 at gmail.com  Fri Sep 17 15:07:38 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 11:07:38 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>

Thanks Aaron for your reply.
Actually I tried that first, but there is another problem: I have to divide
each query sequence into windows of size 5 with a 1-base shift, and it's not
possible to divide a regular expression in that way.
So what I am trying is to convert those IUPAC codes to 4-base code sequences
and then do the normal search.
Now the problem is that I am not able to convert those IUPAC sequences to
normal ones; I am still trying to write a script but it's taking time.
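
One way around the windowing problem is to cut the IUPAC string into windows first and only then turn each window into a regular expression. The following is a minimal sketch in plain Perl (no BioPerl), assuming DNA sequences and the standard IUPAC nucleotide codes; the helper names windows and iupac_to_regex are hypothetical.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',  B => 'CGT', D => 'AGT',
    H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Split a sequence into overlapping windows (hypothetical helper).
sub windows {
    my ($seq, $size, $step) = @_;
    my @out;
    for (my $i = 0; $i + $size <= length $seq; $i += $step) {
        push @out, substr($seq, $i, $size);
    }
    return @out;
}

# Convert one IUPAC window into a regex source string (hypothetical helper).
sub iupac_to_regex {
    my ($window) = @_;
    my $re = '';
    for my $code (split //, uc $window) {
        my $bases = $IUPAC{$code};
        die "Unknown IUPAC code: $code\n" unless defined $bases;
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

# Query from the original message, window size 5, shift 1.
for my $w (windows('VGSRVBSSSSSNSC', 5, 1)) {
    print "$w\t", iupac_to_regex($w), "\n";
}
```

Each printed pattern can then be matched against the database sequences with an ordinary regex match. Note that this only handles ambiguity codes on the query side.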

Thanks
Shalabh


On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu> wrote:

> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <shalabh.sharma7 at gmail.com
> > wrote:
>
>> Hi All,
>>      I have few nucleotide sequences that are composed of IUPAC codes.
>> Like
>> >test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh


From roy.chaudhuri at gmail.com  Fri Sep 17 15:04:28 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 17 Sep 2010 16:04:28 +0100
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
Message-ID: <4C93837C.4080008@gmail.com>

Hi Shalabh,

The expand method in Bio::Tools::SeqPattern may be useful to convert 
IUPAC codes to regular expressions:

$ perl -e 'use Bio::Tools::SeqPattern; print
Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>"DNA")->expand'
[ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C

Although that won't work if there are also ambiguity codes in your 
database. For a non-BioPerl solution you could try fuzznuc from EMBOSS.

Cheers.
Roy.

On 17/09/2010 15:28, Aaron Mackey wrote:
> Convert the IUPAC code to a regular expression, and use regular expressions
> (in Perl or grep or similar) to find 100% identical matches.
>
> -Aaron
>
> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
> <shalabh.sharma7 at gmail.com>wrote:
>
>> Hi All,
>>       I have few nucleotide sequences that are composed of IUPAC codes. Like
>>> test
>> VGSRVBSSSSSNSC
>>
>> Similarly i have a database made of of these kind of sequences. I want to
>> find sequences that are 100% similar to the query sequence.
>>
>> Is there any bioPerl module to deal with this, i tried normal blast but it
>> didn't worked.
>> Do i have to convert these sequences to 4 base codes or there is any other
>> way out.
>>
>> Thanks
>> Shalabh


From david.breimann at gmail.com  Fri Sep 17 18:13:22 2010
From: david.breimann at gmail.com (David Breimann)
Date: Fri, 17 Sep 2010 20:13:22 +0200
Subject: [Bioperl-l] Installing using git after an older installation
Message-ID: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>

Hello,

I'm sharing a server with some other lab members.
I would like to install the latest version of bioperl for my own use,
without affecting my colleagues.
I used git to clone a copy of bioperl-live and exported
  PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB".
Now
  perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
returns
  1.0069

My question is: is that all?
Now I'm using the latest version?
Should I include anything special in my scripts?
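
One generic way to double-check which copy of a module Perl is actually picking up is to load it and look at %INC. The sketch below is plain Perl; module_path is a hypothetical helper name, and Bio::Perl is simply the module from the command above.

```perl
use strict;
use warnings;

# Load a module by name and return the file Perl loaded it from
# (hypothetical helper; dies if the module cannot be found).
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Perl -> Bio/Perl.pm
    require $file;
    return $INC{$file};
}

for my $module ('Bio::Perl') {
    my $path = eval { module_path($module) };
    if (defined $path) {
        print "$module loaded from $path\n";
    }
    else {
        print "$module could not be loaded\n";
    }
}
```

If the printed path is under $HOME/src/bioperl-live, the PERL5LIB setting is taking precedence over the system-wide installation for that module.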

Also, what about all the bp_***.pl scripts?
Are they now using the latest version, too?
I guess not, since I didn't build anything. So what should I do about them?

Thanks,
Dave


From amackey at virginia.edu  Fri Sep 17 19:24:44 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:24:44 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <4C93837C.4080008@gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<4C93837C.4080008@gmail.com>
Message-ID: <AANLkTi=Nx5fQ=YvxzB4mSsJh+38B-2+87Lvf=WHU_rCH@mail.gmail.com>

If there are ambiguity codes in the database, then the expanded character
class has to also include the original ambiguity code; non-ambiguous
nucleotides must also be expanded to include all ambiguity codes that
represent the nucleotide.
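
A minimal sketch of that expansion in plain Perl (not code posted in this
thread). It assumes that a database symbol counts as a match whenever it
shares at least one base with the query symbol, which is only one reading of
the rule above; the names %IUPAC, iupac_class and iupac_to_regex are made up
for illustration.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Character class of every code (ambiguous or not) that shares at least
# one base with the given query code. Assumption: "shares a base" is the
# matching rule wanted for an ambiguity-coded database.
sub iupac_class {
    my ($code) = @_;
    my $bases = $IUPAC{ uc $code } or die "Unknown IUPAC code: $code\n";
    my @compatible = grep {
        my $other = $IUPAC{$_};
        $bases =~ /[$other]/;
    } sort keys %IUPAC;
    return '[' . join('', @compatible) . ']';
}

# Expand a whole IUPAC query into a regular expression string.
sub iupac_to_regex {
    my ($seq) = @_;
    return join '', map { iupac_class($_) } split //, $seq;
}

my $re = iupac_to_regex('VGS');
print "$re\n";
print "NRS is compatible with VGS\n" if 'NRS' =~ /^$re$/;
```

If a stricter rule is wanted (for example, the database symbol must be a
subset of the query symbol), only the test inside the grep block would change.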

-Aaron

On Fri, Sep 17, 2010 at 11:04 AM, Roy Chaudhuri <roy.chaudhuri at gmail.com>wrote:

> Hi Shalabh,
>
> The expand method in Bio::Tools::SeqPattern may be useful to convert IUPAC
> codes to regular expressions:
>
> $perl -e 'use Bio::Tools::SeqPattern; print
> Bio::Tools::SeqPattern->new(-seq=>"VGSRVBSSSSSNSC", -type=>'DNA')->expand'
> [ACG]G[GC][AG][ACG][CGT][GC][GC][GC][GC][GC].[GC]C
>
> Although that won't work if there are also abiguity codes in your database.
> For a non-BioPerl solution you could try fuzznuc from Emboss.
>
> Cheers.
> Roy.
>
>
> On 17/09/2010 15:28, Aaron Mackey wrote:
>
>> Convert the IUPAC code to a regular expression, and use regular
>> expressions
>> (in Perl or grep or similar) to find 100% identical matches.
>>
>> -Aaron
>>
>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma
>> <shalabh.sharma7 at gmail.com>wrote:
>>
>>  Hi All,
>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>> Like
>>>
>>>> test
>>>>
>>> VGSRVBSSSSSNSC
>>>
>>> Similarly i have a database made of of these kind of sequences. I want to
>>> find sequences that are 100% similar to the query sequence.
>>>
>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>> it
>>> didn't worked.
>>> Do i have to convert these sequences to 4 base codes or there is any
>>> other
>>> way out.
>>>
>>> Thanks
>>> Shalabh
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>  _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>


From amackey at virginia.edu  Fri Sep 17 19:25:54 2010
From: amackey at virginia.edu (Aaron Mackey)
Date: Fri, 17 Sep 2010 15:25:54 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
Message-ID: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>

Do your windowing/shifting on the unexpanded query sequences; then transform
the 5-bp queries into regular expressions.
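
A rough sketch of that order of operations, in plain Perl rather than with
Bio::Tools::SeqPattern. The helper names (windows, to_regex) and the %IUPAC
table are assumptions made for this example; N is expanded to [ACGT] here
instead of "." and the database is assumed to hold plain A/C/G/T.

```perl
use strict;
use warnings;

my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Cut the unexpanded sequence into overlapping windows first.
sub windows {
    my ($seq, $size, $step) = @_;
    my @w;
    for (my $i = 0; $i + $size <= length $seq; $i += $step) {
        push @w, substr($seq, $i, $size);
    }
    return @w;
}

# Then turn each short window into a regular expression string.
sub to_regex {
    my ($seq) = @_;
    return join '', map {
        my $bases = $IUPAC{ uc $_ } // die "Unknown IUPAC code: $_\n";
        length($bases) == 1 ? $bases : "[$bases]";
    } split //, $seq;
}

# Window size 5, shift 1, as described in the thread.
my @patterns = map { to_regex($_) } windows('VGSRVBSSSSSNSC', 5, 1);
print "$_\n" for @patterns;
```

Because the windows are cut before expansion, each pattern always covers
exactly five positions, which is what splitting an already expanded regular
expression cannot guarantee.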

-Aaron

On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Thanks Aaron for your reply.
> Actually i tried that first, but there is another problem, i have to divide
> each query sequence to window size 5 with 1 base shift and its not possible
> to divide regular expression in that way.
> So what i am trying is to convert those iupac codes to 4 base code sequence
> and then do the normal search.
> Now the problem is that i cant able to convert those IUPAC sequences to
> normal ones, i am still trying to write a script but its taking time.
>
> Thanks
> Shalabh
>
>
> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu>wrote:
>
>> Convert the IUPAC code to a regular expression, and use regular
>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>
>> -Aaron
>>
>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>> shalabh.sharma7 at gmail.com> wrote:
>>
>>> Hi All,
>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>> Like
>>> >test
>>> VGSRVBSSSSSNSC
>>>
>>> Similarly i have a database made of of these kind of sequences. I want to
>>> find sequences that are 100% similar to the query sequence.
>>>
>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>> it
>>> didn't worked.
>>> Do i have to convert these sequences to 4 base codes or there is any
>>> other
>>> way out.
>>>
>>> Thanks
>>> Shalabh
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>


From Kevin.M.Brown at asu.edu  Fri Sep 17 20:09:34 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 17 Sep 2010 13:09:34 -0700
Subject: [Bioperl-l] Installing using git after an older installation
In-Reply-To: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
References: <AANLkTikrrk+=63rf0ivOMvKw4pDiM2NmF13KPxmT7Ytm@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B40701E0A4@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA


From shalabh.sharma7 at gmail.com  Fri Sep 17 20:45:50 2010
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Fri, 17 Sep 2010 16:45:50 -0400
Subject: [Bioperl-l] IUPAC code similarity
In-Reply-To: <AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
References: <AANLkTikVhvt6-39o7g7+SsMS-_vstmk8+LCs7Q7TWzXB@mail.gmail.com>
	<AANLkTimCyYXR8-kQZz2nhtDo4Z-qJxiyNsx3QfEFOuM=@mail.gmail.com>
	<AANLkTimF=eOuT1fPYBXTdW98C0gK=qubB1bpyPPUKu14@mail.gmail.com>
	<AANLkTi=qxrwQ5yTZVWBgEhhc_b4X7qGfSjMPikqrZxbz@mail.gmail.com>
Message-ID: <AANLkTi=b7MYxC=2DuY3N5Udg9CsfoVDV=Q9cFUOUML0x@mail.gmail.com>

Thanks Aaron,
changing the query sequence worked well but i am still struggling with the
database.

-Shalabh


On Fri, Sep 17, 2010 at 3:25 PM, Aaron Mackey <amackey at virginia.edu> wrote:

> do your windowing/shifting on the unexpanded query sequences; then
> transform the 5-bp queries into regular expressions.
>
> -Aaron
>
>
> On Fri, Sep 17, 2010 at 11:07 AM, shalabh sharma <
> shalabh.sharma7 at gmail.com> wrote:
>
>> Thanks Aaron for your reply.
>> Actually i tried that first, but there is another problem, i have to
>> divide each query sequence to window size 5 with 1 base shift and its not
>> possible to divide regular expression in that way.
>> So what i am trying is to convert those iupac codes to 4 base code
>> sequence and then do the normal search.
>> Now the problem is that i cant able to convert those IUPAC sequences to
>> normal ones, i am still trying to write a script but its taking time.
>>
>> Thanks
>> Shalabh
>>
>>
>> On Fri, Sep 17, 2010 at 10:28 AM, Aaron Mackey <amackey at virginia.edu>wrote:
>>
>>> Convert the IUPAC code to a regular expression, and use regular
>>> expressions (in Perl or grep or similar) to find 100% identical matches.
>>>
>>> -Aaron
>>>
>>> On Thu, Sep 16, 2010 at 5:38 PM, shalabh sharma <
>>> shalabh.sharma7 at gmail.com> wrote:
>>>
>>>> Hi All,
>>>>      I have few nucleotide sequences that are composed of IUPAC codes.
>>>> Like
>>>> >test
>>>> VGSRVBSSSSSNSC
>>>>
>>>> Similarly i have a database made of of these kind of sequences. I want
>>>> to
>>>> find sequences that are 100% similar to the query sequence.
>>>>
>>>> Is there any bioPerl module to deal with this, i tried normal blast but
>>>> it
>>>> didn't worked.
>>>> Do i have to convert these sequences to 4 base codes or there is any
>>>> other
>>>> way out.
>>>>
>>>> Thanks
>>>> Shalabh
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>


From heikki.lehvaslaiho at gmail.com  Sat Sep 18 07:41:22 2010
From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho)
Date: Sat, 18 Sep 2010 10:41:22 +0300
Subject: [Bioperl-l] mpiblast
In-Reply-To: <0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
References: <loom.20100916T180946-13@post.gmane.org>
	<0B4D6EFD-69EE-454F-A0DC-E6BD9ADCF16E@sbc.su.se>
Message-ID: <AANLkTi=SOzf5PH605OkPYf2r6w=xaTdXqU+zY7LBwkNw@mail.gmail.com>

Been running 1.6 and its betas on Blue Gene/P for months. The output
is identical to standard BLAST output. No issues in parsing it with
BioPerl.

     -Heikki

Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
cell: +966 545 595 849  office: +966 2 808 2429

Computational Bioscience Research Centre (CBRC), Building #2, Office #4216
4700 King Abdullah University of Science and Technology (KAUST)
Thuwal 23955-6900, Kingdom of Saudi Arabia


On 16 September 2010 21:25, Dave Messina <David.Messina at sbc.su.se> wrote:
>> Is the there a standalone like module that allows you to easily call mpiblast?
>
> No, although with Mark Jensen's new WrapperBase system, writing one would probably be pretty straightforward.
>
>        http://www.bioperl.org/wiki/Module:Bio::Tools::Run::WrapperBase
>
>
>> I'm assuming seqio will parse an mpiblast output file correctly?
>
> Yes, although I see that a new version of mpiblast was recently released.
>
> Has anyone out there tested BioPerl against mpiBLAST 1.6.0 output yet?
>
>
> Dave
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From david.breimann at gmail.com  Sat Sep 18 09:05:58 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:05:58 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
Message-ID: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>

Hello,

I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
in the fields and sometimes it doesn't, even though the GenBank file has a
locus tag.
Also, is the ID always equivalent to the locus tag?

Thanks,
Dave


From scott at scottcain.net  Sat Sep 18 09:17:24 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 10:17:24 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
Message-ID: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>

Hi Dave,

bp_genbank2gff3.pl suffers from the fact that it has to deal with
GenBank files :-)  It was designed initially to work on whole genome
refseqs, and contains several ad hoc rules for trying to make it "do
the right thing."  In practice, it is not unusual for a post
processing step (either by hand or a quicky perl script) to be
required to really get it right.  I don't recall the specifics (if I
ever knew :-) for when and how the locus tag is used, but I do know
that there is a list of things that it will try to use for the ID, and
while the locus is on the list, I don't know where it comes in the
list, so it's possible that other items might supersede it.
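
As an illustration of the kind of quick post-processing script mentioned
above, here is a sketch in plain Perl. It is not part of bp_genbank2gff3.pl,
and it rests on an assumption: that when a feature lacks a locus_tag
attribute, its Name attribute holds the locus tag. The function name
add_locus_tag is made up for this example.

```perl
use strict;
use warnings;

# Given one GFF3 line (without trailing newline), append a locus_tag
# attribute copied from Name when the feature has no locus_tag yet.
# Comment/directive lines and anything without nine columns pass through.
sub add_locus_tag {
    my ($line) = @_;
    return $line if $line =~ /^#/ or $line !~ /\t/;
    my @col = split /\t/, $line, -1;
    return $line unless @col == 9;

    # Column 9 is a ';'-separated list of key=value pairs.
    my %attr = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
               grep { /=/ } split /;/, $col[8];

    if (!exists $attr{locus_tag} && exists $attr{Name}) {
        $col[8] =~ s/;\s*$//;                       # avoid a doubled ';'
        $col[8] .= ";locus_tag=$attr{Name}";
    }
    return join "\t", @col;
}

my $sample = join "\t", qw(NC_009789 GenBank gene 1 100 . + .),
                        'ID=gene1;Name=EcE24377A_B0001';
print add_locus_tag($sample), "\n";
```

To filter a whole file, the function could be called on each chomped line in
a while (<>) loop, printing the result.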

Scott


On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Hello,
>
> I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a `locus_tag`
> in the fields and sometime it doesn't, even though the genabank has a locus
> tag.
> Also, is the ID always equivalent to the locus tag?
>
> Thanks,
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 09:20:33 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 11:20:33 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
Message-ID: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>

Since locus_tag is an essential tag in GenBank, I suggest that locus_tag
always be added to the last GFF column if it exists in the GenBank file,
whether it is used as the ID in the GFF or not.

On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> GenBank files :-)  It was designed initially to work on whole genome
> refseqs, and contains several ad hoc rules for trying to make it "do
> the right thing."  In practice, it is not unusual for a post
> processing step (either by hand or a quicky perl script) to be
> required to really get it right.  I don't recall the specifics (if I
> ever knew :-) for when and how the locus tag is used, but I do know
> that there is a list of things that it will try to use for the ID, and
> while the locus is on the list, I don't know where it comes in the
> list, so it's possible that other items might supersede it.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Hello,
> >
> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> `locus_tag`
> > in the fields and sometime it doesn't, even though the genabank has a
> locus
> > tag.
> > Also, is the ID always equivalent to the locus tag?
> >
> > Thanks,
> > Dave
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


From scott at scottcain.net  Sat Sep 18 10:08:26 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 11:08:26 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
Message-ID: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>

Hi Dave,

That seems perfectly reasonable.  If you could point out a GenBank
entry for which that does not happen, I could try to figure out why
not.

Scott


On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Since locus_tag is an essential tag in genbank, I suggest locus_tag will be
> always added to the GFF last column if it exists in the genbank, whether it
> is used as ID in the GFF or not.
>
> On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> GenBank files :-)  It was designed initially to work on whole genome
>> refseqs, and contains several ad hoc rules for trying to make it "do
>> the right thing."  In practice, it is not unusual for a post
>> processing step (either by hand or a quicky perl script) to be
>> required to really get it right.  I don't recall the specifics (if I
>> ever knew :-) for when and how the locus tag is used, but I do know
>> that there is a list of things that it will try to use for the ID, and
>> while the locus is on the list, I don't know where it comes in the
>> list, so it's possible that other items might supersede it.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Hello,
>> >
>> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> > `locus_tag`
>> > in the fields and sometime it doesn't, even though the genabank has a
>> > locus
>> > tag.
>> > Also, is the ID always equivalent to the locus tag?
>> >
>> > Thanks,
>> > Dave
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 10:20:50 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:20:50 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
Message-ID: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>

Hi Scott,

Here is a very short genbank:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk

Note all genes in the genbank have locus tags. In the resulting GFF3,
however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
idea why it deserves a special treatment... :)

p.s. making this change (i.e., copying locus_tag to the GFF3 last column
whenever available) will really make my life easier.

Thank you,
Dave

On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> That seems perfectly reasonable.  If you could point out a GenBank
> entry for which that does not happen, I could try to figure out why
> not.
>
> Scott
>
>
> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
> <david.breimann at gmail.com> wrote:
> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will
> be
> > always added to the GFF last column if it exists in the genbank, whether
> it
> > is used as ID in the GFF or not.
> >
> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net>
> wrote:
> >>
> >> Hi Dave,
> >>
> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
> >> GenBank files :-)  It was designed initially to work on whole genome
> >> refseqs, and contains several ad hoc rules for trying to make it "do
> >> the right thing."  In practice, it is not unusual for a post
> >> processing step (either by hand or a quicky perl script) to be
> >> required to really get it right.  I don't recall the specifics (if I
> >> ever knew :-) for when and how the locus tag is used, but I do know
> >> that there is a list of things that it will try to use for the ID, and
> >> while the locus is on the list, I don't know where it comes in the
> >> list, so it's possible that other items might supersede it.
> >>
> >> Scott
> >>
> >>
> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
> >> <david.breimann at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
> >> > `locus_tag`
> >> > in the fields and sometime it doesn't, even though the genabank has a
> >> > locus
> >> > tag.
> >> > Also, is the ID always equivalent to the locus tag?
> >> >
> >> > Thanks,
> >> > Dave
> >> > _______________________________________________
> >> > Bioperl-l mailing list
> >> > Bioperl-l at lists.open-bio.org
> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> >
> >>
> >>
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                                   scott at scottcain
> >> dot net
> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> >> Ontario Institute for Cancer Research
> >
> >
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


From david.breimann at gmail.com  Sat Sep 18 10:45:13 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 12:45:13 +0200
Subject: [Bioperl-l] Extracting sequences from GFF3
Message-ID: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>

As you know, GFF3 files can contain FASTA sequences after the features.

How do I extract a specific FASTA sequence given its ID?

I tried:

use Bio::Tools::GFF;
use Data::Dumper;

my $gffio = Bio::Tools::GFF->new(
    -file =>
        "/path/to/file.gff",
    -gff_version => 3
);

print Dumper $gffio->get_seqs();

but $gffio->get_seqs() seems to return nothing, although the GFF3 has
sequences and is also valid.

By the way, I am able to parse the features themselves (using
$gffio->next_feature()).


Thanks,

Dave


From scott at scottcain.net  Sat Sep 18 11:07:13 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:07:13 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
Message-ID: <AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>

Hi Dave,

A fresh "pull" of the bioperl git repository shows that
bp_genbank2gff3.pl already does this.  It creates a locus_tag for all
features that have a locus_tag, and uses the locus_tag for the ID when
it can (it can't blindly use the locus tag for the ID since both the
gene and the CDS have the same tag).

Scott


On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
<david.breimann at gmail.com> wrote:
> Hi Scott,
>
> Here is a very short genbank:
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>
> Note all genes in the genbank have locus tags. In the resulting GFF3,
> however, only the last gene (EcE24377A_B0005) gets a locus_tag. I have no
> idea why it deserves a special treatment... :)
>
> p.s. making this change (i.e., copying locus_tag to the GFF3 last column
> whenever available) will really make my life easier.
>
> Thank you,
> Dave
>
> On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> That seems perfectly reasonable.  If you could point out a GenBank
>> entry for which that does not happen, I could try to figure out why
>> not.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Since locus_tag is an essential tag in genbank, I suggest locus_tag will
>> > be
>> > always added to the GFF last column if it exists in the genbank, whether
>> > it
>> > is used as ID in the GFF or not.
>> >
>> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net>
>> > wrote:
>> >>
>> >> Hi Dave,
>> >>
>> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>> >> GenBank files :-)  It was designed initially to work on whole genome
>> >> refseqs, and contains several ad hoc rules for trying to make it "do
>> >> the right thing."  In practice, it is not unusual for a post
>> >> processing step (either by hand or a quicky perl script) to be
>> >> required to really get it right.  I don't recall the specifics (if I
>> >> ever knew :-) for when and how the locus tag is used, but I do know
>> >> that there is a list of things that it will try to use for the ID, and
>> >> while the locus is on the list, I don't know where it comes in the
>> >> list, so it's possible that other items might supersede it.
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> > `locus_tag`
>> >> > in the fields and sometime it doesn't, even though the genabank has a
>> >> > locus
>> >> > tag.
>> >> > Also, is the ID always equivalent to the locus tag?
>> >> >
>> >> > Thanks,
>> >> > Dave
>> >> > _______________________________________________
>> >> > Bioperl-l mailing list
>> >> > Bioperl-l at lists.open-bio.org
>> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> ------------------------------------------------------------------------
>> >> Scott Cain, Ph. D.                                   scott at scottcain
>> >> dot net
>> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> >> Ontario Institute for Cancer Research
>> >
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 11:13:23 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 12:13:23 +0100
Subject: [Bioperl-l] Extracting sequences from GFF3
In-Reply-To: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
References: <AANLkTi=Ek0A8+SASajoZMYY5-w0w9Y_AE5mFJucNAheP@mail.gmail.com>
Message-ID: <AANLkTimc9Zf2a2N64wUUancrHWJ05wRKgfMuUQ0X9UTe@mail.gmail.com>

Hi Dave,

I would use Bio::DB::SeqFeature::Store (either with a database on the
backend or a flat file if a database isn't warranted):

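  use Bio::DB::SeqFeature::Store;

  # 'memory' adaptor: features are held in RAM, no database server needed
  my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'memory',
                                            -dir     => 'path/to/file' );

  # Warning: this returns a string, and not a PrimarySeq object
  my $sequence = $db->fetch_sequence('Chr1', 5000 => 6000);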
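
For comparison, a dependency-free sketch in plain Perl that pulls one record
out of the FASTA section by ID. This is not a BioPerl API; the function name
gff3_fasta_seq is made up, and the code assumes the sequences follow an
explicit ##FASTA directive.

```perl
use strict;
use warnings;

# Return the sequence with the given ID from the ##FASTA section of a
# GFF3 file handle, or undef if there is no such record.
sub gff3_fasta_seq {
    my ($fh, $want_id) = @_;
    my ($in_fasta, $in_record, $seq) = (0, 0, '');
    while (my $line = <$fh>) {
        chomp $line;
        if (!$in_fasta) {
            # Skip the feature lines until the FASTA section starts.
            $in_fasta = 1 if $line =~ /^##FASTA/;
            next;
        }
        if ($line =~ /^>(\S+)/) {
            last if $in_record;            # next record begins: done
            $in_record = ($1 eq $want_id);
            next;
        }
        if ($in_record) {
            $line =~ s/\s+//g;             # drop stray whitespace and \r
            $seq .= $line;
        }
    }
    return $in_record ? $seq : undef;
}

# Small in-memory example instead of a real file.
my $gff3 = join "\n",
    '##gff-version 3',
    join("\t", qw(Chr1 . gene 1 8 . + . ID=gene1)),
    '##FASTA', '>Chr1', 'ACGTACGT', 'ACGT', '>Chr2', 'TTTT', '';

open my $fh, '<', \$gff3 or die "Cannot open in-memory file: $!";
print gff3_fasta_seq($fh, 'Chr1'), "\n";
close $fh;
```

Unlike fetch_sequence above, this returns the whole record; a sub-range could
be taken afterwards with substr.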

Scott


On Sat, Sep 18, 2010 at 11:45 AM, David Breimann
<david.breimann at gmail.com> wrote:
> As you know, GFF3 files can contain FASTA sequences after the features.
>
> How do I extract a specific FASTA sequence given it's ID?
>
> I tried:
>
> use Bio::Tools::GFF;
> use Data::Dumper;
>
> my $gffio = Bio::Tools::GFF->new(
>    -file =>
>        "/path/to/file.gff",
>    -gff_version => 3
> );
>
> print Dumper $gffio->get_seqs();
>
> but $gffio->get_seqs() seems to return nothing, although the GFF3 has
> sequences and is also valid.
>
> By the way, I am able to parse the features themselves (using
> $gffio->next_feature()).
>
>
> Thanks,
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 13:40:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:40:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
Message-ID: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>

Hi Dave,

Let's keep the discussion on the mailing list so we can make sure that
when this problem is solved, its resolution will be archived.

I don't really understand what is going on either, though it would
probably be a good idea to set your PERL5LIB env variable so that when
you execute this script from the git repository it also uses the
BioPerl modules in the git repository instead of the ones that are
installed in your "normal" path.
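
One way to check which copy of a module Perl actually picks up is to look at
%INC after loading it. The sketch below is generic Perl, not something from
the BioPerl distribution; module_path is a made-up helper name, and
File::Spec is used as the default only so the example runs without BioPerl.
Passing Bio::Perl on the command line would show where that module is loaded
from under the current PERL5LIB.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Perl -> Bio/Perl.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line, or a core module by default.
for my $module (@ARGV ? @ARGV : ('File::Spec')) {
    my $path = module_path($module);
    printf "%-20s %s\n", $module, defined $path ? $path : '(not found)';
}
```

If the printed path points outside the git checkout, the installed copy is
still first in @INC.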

Also, are you using any command line flags when executing it?  I didn't.

Scott


On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Yes, I'm using Ubuntu 10.04.
>
> That is really weird. I tried running the script from the bioperl-live dir
> (which I just pulled using git), and I get the same results as before
> (`Name` instead of `locus_tag`):
>
>  $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y
> NC_009789.genbank
>
> Attached is the resulting GFF3.
> I also attach a copy of bp_genbank2gff3.pl as found under
> /home/dave/src/bioperl-live/blib/script.
>
> This is a real mystery for me!
>
> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Typically I do build and install, but you can run it directly from the
>> git checkout directory.
>>
>> For locating other versions of the script, are you running linux?  If
>> so, are you familiar with the "locate" command:
>>
>>  locate bp_genbank2gff3.pl
>>
>> If you've never used it before, you may need to update the database
>> the locate command uses as root:
>>
>>  sudo updatedb
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Your gff seems fine. I get a very similar one, but with `Name=` instead
>> > of
>> > `locus_tag=`.
>> >
>> > I don't really know how to check for multiple bioperl installations.
>> > I'm using my personal server, so I don't mind removing and installing
>> > everything from scratch -- but I don't know how to do that.
>> >
>> > Also, what I don't get with the git is how the scripts are supposed to
>> > be
>> > updated (unless you build and install).
>> >
>> > Thank you!
>> >
>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Well, if you aren't getting the same results as me then I'd say you
>> >> aren't using the same version of the script :-)
>> >>
>> >> Unfortunately, the scripts are no longer automatically marked with the
>> >> "internal" version information when committed, so there really isn't
>> >> anything in the script I can tell you to look for.  Check for more
>> >> than one bioperl instance on your computer.
>> >>
>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>> >> it is what you expect.
>> >>
>> >> Scott
>> >>
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Hi Scott,
>> >> >
>> >> > I just pulled the lated bioperl-live using git.
>> >> > I'm not sure how the scripts are updated, so I Build and installed
>> >> > anyway
>> >> > (perhaps exporting the path is supposed to be enough?)
>> >> > Anyway, I still get the same results. No locus_tag.
>> >> > How can I tell if I'm using the latest version of the script?
>> >> >
>> >> > Thanks again.
>> >> >
>> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Hi Dave,
>> >> >>
>> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> bp_genbank2gff3.pl already does this. ?It creates a locus_tag for
>> >> >> all
>> >> >> features that have a locus_tag, and uses the locus_tag for the ID
>> >> >> when
>> >> >> it can (it can't blindly use the locus tag for the ID since both the
>> >> >> gene and the CDS have the same tag).
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > Here is a very short genbank:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >
>> >> >> > Note all genes in the genbank have locus tags. In the resulting
>> >> >> > GFF3,
>> >> >> > however, only the last gene (EcE24377A_B0005) gets a locus_tag. I
>> >> >> > have
>> >> >> > no
>> >> >> > idea why it deserves a special treatment... :)
>> >> >> >
>> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> > column
>> >> >> > whenever available) will really make my life easier.
>> >> >> >
>> >> >> > Thank you,
>> >> >> > Dave
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> That seems perfectly reasonable. ?If you could point out a
>> >> >> >> GenBank
>> >> >> >> entry for which that does not happen, I could try to figure out
>> >> >> >> why
>> >> >> >> not.
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest
>> >> >> >> > locus_tag
>> >> >> >> > will
>> >> >> >> > be
>> >> >> >> > always added to the GFF last column if it exists in the
>> >> >> >> > genbank,
>> >> >> >> > whether
>> >> >> >> > it
>> >> >> >> > is used as ID in the GFF or not.
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal
>> >> >> >> >> with
>> >> >> >> >> GenBank files :-) ?It was designed initially to work on whole
>> >> >> >> >> genome
>> >> >> >> >> refseqs, and contains several ad hoc rules for trying to make
>> >> >> >> >> it
>> >> >> >> >> "do
>> >> >> >> >> the right thing." ?In practice, it is not unusual for a post
>> >> >> >> >> processing step (either by hand or a quicky perl script) to be
>> >> >> >> >> required to really get it right. ?I don't recall the specifics
>> >> >> >> >> (if I
>> >> >> >> >> ever knew :-) for when and how the locus tag is used, but I do
>> >> >> >> >> know
>> >> >> >> >> that there is a list of things that it will try to use for the
>> >> >> >> >> ID,
>> >> >> >> >> and
>> >> >> >> >> while the locus is on the list, I don't know where it comes in
>> >> >> >> >> the
>> >> >> >> >> list, so it's possible that other items might supersede it.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Hello,
>> >> >> >> >> >
>> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds
>> >> >> >> >> > a
>> >> >> >> >> > `locus_tag`
>> >> >> >> >> > in the fields and sometime it doesn't, even though the
>> >> >> >> >> > genabank
>> >> >> >> >> > has a
>> >> >> >> >> > locus
>> >> >> >> >> > tag.
>> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >
>> >> >> >> >> > Thanks,
>> >> >> >> >> > Dave
>> >> >> >> >> > _______________________________________________
>> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> ------------------------------------------------------------------------
>> >> >> >> >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at
>> >> >> >> >> scottcain
>> >> >> >> >> dot net
>> >> >> >> >> GMOD Coordinator (http://gmod.org/)
>> >> >> >> >> 216-392-3087
>> >> >> >> >> Ontario Institute for Cancer Research
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> ------------------------------------------------------------------------
>> >> >> >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at
>> >> >> >> scottcain
>> >> >> >> dot net
>> >> >> >> GMOD Coordinator (http://gmod.org/)
>> >> >> >> 216-392-3087
>> >> >> >> Ontario Institute for Cancer Research
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >>
>> >> >> ------------------------------------------------------------------------
>> >> >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at
>> >> >> scottcain
>> >> >> dot net
>> >> >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087
>> >> >> Ontario Institute for Cancer Research
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> ------------------------------------------------------------------------
>> >> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain
>> >> dot net
>> >> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087
>> >> Ontario Institute for Cancer Research
>> >
>> >
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?? scott at scottcain
>> dot net
>> GMOD Coordinator (http://gmod.org/)? ? ? ? ? ? ? ? ? ?? 216-392-3087
>> Ontario Institute for Cancer Research
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Sep 18 13:48:35 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 14:48:35 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTimt5Ty6-qXigmF1w+h5ZvMCaPZ7wO=YXx8TJPqO@mail.gmail.com>
Message-ID: <AANLkTimMZ4vZUw2Gsd-NmaWL8xiD4hryH8Z-7UezT=LO@mail.gmail.com>

Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build, as a staging area before installation.  In the
repository, the script resides in the directory scripts/Bio-DB-GFF/.
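
A minimal sketch for listing the copies of the script present on a machine, assuming a POSIX shell with find. The helper name find_script_copies and the example directories are hypothetical; the pattern is meant to match both file names seen in this thread (genbank2gff3.PLS in the checkout, bp_genbank2gff3.pl under blib/script after ./Build).

```shell
# find_script_copies DIR...
# Print every regular file under the given directories whose name
# contains "genbank2gff3", sorted by path.
find_script_copies() {
    find "$@" -type f -name '*genbank2gff3*' 2>/dev/null | sort
}
```

It could be called as `find_script_copies ~/src/bioperl-live /usr/local/bin /usr/bin`. Unlike locate it needs no database update, but it only searches the directories it is given.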

Scott


On Sat, Sep 18, 2010 at 2:40 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I don't find the script at all (there is no blib dir as before)...
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From david.breimann at gmail.com  Sat Sep 18 13:57:30 2010
From: david.breimann at gmail.com (David Breimann)
Date: Sat, 18 Sep 2010 15:57:30 +0200
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
Message-ID: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>

So here is an interim summary of my situation:
I'm using Ubuntu 10.04 and Perl 5.10.1.
I get unexpected results from bp_genbank2gff3.pl ("Name=" instead of
"locus_tag=" in the last GFF3 column), whereas Scott gets the expected
results with the latest version of bioperl.
I cloned a fresh copy of bioperl-live into my ~/src:
$ cd ~/src
$ git clone http://github.com/bioperl/bioperl-live.git

I then added the following line to the end of ~/.profile:
export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
and ran
$ source ~/.profile
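
A minimal sketch for checking which directory on PERL5LIB would supply a given module file, assuming a POSIX shell. The helper name first_in_perl5lib is hypothetical, and Bio/Root/Version.pm is used only as an example of a BioPerl module path. The helper inspects PERL5LIB alone, not the rest of Perl's @INC, so a miss here does not mean the module is absent.

```shell
# first_in_perl5lib REL_PATH
# Print the first PERL5LIB entry that contains REL_PATH (for example
# Bio/Root/Version.pm); exit status is 1 if no entry contains it.
# The body runs in a subshell so the IFS change does not leak out.
first_in_perl5lib() (
    IFS=:
    set -f                      # no glob expansion while splitting
    for dir in ${PERL5LIB:-}; do
        if [ -n "$dir" ] && [ -f "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)
```

With the export above in effect, `first_in_perl5lib Bio/Root/Version.pm` would be expected to print a path under ~/src/bioperl-live. To see what Perl itself loads, `perl -MBio::Root::Version -e 'print $INC{"Bio/Root/Version.pm"}, "\n"'` prints the file actually used.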

I then downloaded a small genome from NCBI:
$ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
and tested the script:
$ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk

Following are the top 10 lines of the resulting GFF3:

##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1    ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1    ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1    ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified by glimmer%3B putative;codon_start=1;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38

while these are the corresponding lines from Scott's file:
##gff-version 3
# sequence-region NC_009789 1 6199
# conversion-by bp_genbank2gff3.pl
# organism Escherichia coli E24377A
# date 06-JAN-2010
# Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
NC_009789    GenBank    region    1    6199    .    +    1    ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review. The reference sequence was derived from CP000798. Source DNA and bacteria available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length. ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli E24377A;plasmid=pETEC_6;strain=E24377A
NC_009789    GenBank    gene    665    781    .    -    1    ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
NC_009789    GenBank    mRNA    665    781    .    -    1    ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
NC_009789    GenBank    CDS    665    781    .    -    1    ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified by glimmer%3B putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38


Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
as desired.
I have no idea what is going on here...
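
A minimal sketch for comparing two GFF3 files by the attribute keys they use in column 9, assuming a POSIX shell with awk and files on disk that have tab-separated columns and one feature per line. The helper name gff3_attr_keys and the file names in the usage note are hypothetical.

```shell
# gff3_attr_keys FILE
# Print each attribute key found in column 9 of a GFF3 file together
# with the number of features that carry it, sorted by key.
gff3_attr_keys() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }     # skip directives, comments, short lines
        {
            n = split($9, pairs, ";")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                if (kv[1] != "") count[kv[1]]++
            }
        }
        END { for (k in count) print k "\t" count[k] }
    ' "$1" | sort
}
```

Running `gff3_attr_keys mine.gff > mine.keys` and `gff3_attr_keys scott.gff > scott.keys`, then `diff mine.keys scott.keys`, would show at a glance whether Name and locus_tag are the only keys that differ between the two outputs.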

Best,
Dave

On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:

> Hi Dave,
>
> Let's keep the discussion on the mailing list so we can make sure that
> when this problem is solved, its resolution will be archived.
>
> I don't really understand what is going on either, though it would
> probably be a good idea to set your PERL5LIB env variable so that when
> you execute this script from the git repository, it also uses the
> BioPerl modules in the git repository instead of the ones that are
> installed in your "normal" path.
>
> Also, are you using any command line flags when executing it?  I didn't.
>
> Scott
>


From scott at scottcain.net  Sat Sep 18 14:03:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Sat, 18 Sep 2010 15:03:43 +0100
Subject: [Bioperl-l] bp_genbank2gff3.pl
In-Reply-To: <AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
References: <AANLkTinboedH1-CA2wx6jx0HHt=eO43UZg1UhhMx6h1W@mail.gmail.com>
	<AANLkTi=hpmoBnGLY7njgjL43h0eQ94GG+Pj7742P-=nF@mail.gmail.com>
	<AANLkTimFBtgaphiDnmxvhioG1jk-q+5B5JJRYR9+8cw=@mail.gmail.com>
	<AANLkTikzmJS9BH4uUFHUepbH-u3ympL-ci2CTg45rWtu@mail.gmail.com>
	<AANLkTi=pBpr5-MtgQ5JQRr2rK_HP_6MsRK_EM7tPnwwS@mail.gmail.com>
	<AANLkTi=RV1d35597es9XM=g6PQFXG7cq7kW9giEcgQCz@mail.gmail.com>
	<AANLkTi=rNQD4bH5vzMouwEVe=dunRkJKW05aD+XX1LkZ@mail.gmail.com>
	<AANLkTik9z6mySLOnwe0khGX92ftS8Z2B+eqFnSaMB=rV@mail.gmail.com>
	<AANLkTikgiAWWBggqAajoy0hnF-pzjuCUGXaWOmfV0DR=@mail.gmail.com>
	<AANLkTim5ckmntBY+Kvo+Vc2ojXe9M4TcmZoRWuypSePd@mail.gmail.com>
	<AANLkTi=gqXvd5Rps13ry7_Y3_EJYuvvBn19rGYuB6VYR@mail.gmail.com>
	<AANLkTi=HMX7y-NGbWo0At+wPQhMPsrKRUP=jX5FoFRd0@mail.gmail.com>
	<AANLkTimkORAtmRJ-+1E7PORdySMvJD7_oNAeEwDdgKje@mail.gmail.com>
Message-ID: <AANLkTi=yGr+VHDinKtSiy7XE3f7_F73hGRA4bNFzRGxz@mail.gmail.com>

The only thing I can add is that I did a 'git diff genbank2gff3.PLS'
and found no differences.  It occurred to me that perhaps I'd done
some fixing and not committed it, but it looks to me like that's not
the case (assuming I've managed to use git correctly; not a great
assumption, but I don't have another one to work with :-)).

Scott
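
A clean 'git diff' rules out uncommitted local changes, which leaves the
possibility that a different installed copy of BioPerl is being picked up
at run time. A minimal sketch for checking that, using only core Perl: it
reports the file a module was actually loaded from via %INC. The helper
name module_path is made up for illustration, and Bio::SeqIO is used only
as a representative BioPerl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
# (module_path is a hypothetical helper, not part of BioPerl.)
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# File::Spec is a core module, included so the script prints something
# useful even on a machine without BioPerl.
for my $module (qw(Bio::SeqIO File::Spec)) {
    my $path = module_path($module);
    printf "%-12s %s\n", $module, defined $path ? $path : '(not found)';
}
```

If the path printed for Bio::SeqIO is not under the git checkout, then
PERL5LIB is probably not taking effect in that shell.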


On Sat, Sep 18, 2010 at 2:57 PM, David Breimann
<david.breimann at gmail.com> wrote:
> So let's do an intermediate summary of my situation:
> I'm using Ubuntu 10.04 and Perl 5.10.1.
> I get unexpected results when using bp_genbank2gff3.pl ("Name=" instead of
> "locus_tag=" in the last GFF3 column), while Scott gets the expected results
> while using the latest version of bioperl.
> I cloned a fresh version of bioperl live into my ~/src:
> $ cd ~/src
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> I then added the following line to the end of ~/.profile:
> export PERL5LIB="$HOME/src/bioperl-live:$PERL5LIB"
> and ran
> $ source ~/.profile
>
> I then downloaded a small genome from NCBI
> $ wget
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
> and tested the script:
> $ ~/src/bioperl-live/scripts/Bio-DB-GFF/genbank2gff3.PLS NC_009789.gbk
>
> Following are the top 10 lines of the resulting GFF3:
>
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Name=NC_009789;Note=Escherichia
> coli E24377A plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This
> record has not yet been subject to final NCBI review. The reference sequence
> was derived from CP000798. Source DNA and bacteria available from Jacques
> Ravel (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL
> REFSEQ: This record has not yet been subject to final NCBI review. The
> reference sequence was derived from CP000798. Source DNA and bacteria
> available from Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Name=EcE24377A_B0001;Note=identified
> by glimmer%3B putative;codon_start=1;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
> while these are from Scott's file:
> ##gff-version 3
> # sequence-region NC_009789 1 6199
> # conversion-by bp_genbank2gff3.pl
> # organism Escherichia coli E24377A
> # date 06-JAN-2010
> # Note Escherichia coli E24377A plasmid pETEC_6, complete sequence.
> NC_009789	GenBank	region	1	6199	.	+	1
> ID=NC_009789;Dbxref=Project:13960,taxon:331111;Note=Escherichia coli E24377A
> plasmid pETEC_6%2C complete sequence.,PROVISIONAL REFSEQ: This record has
> not yet been subject to final NCBI review. The reference sequence was
> derived from CP000798. Source DNA and bacteria available from Jacques Ravel
> (jravel at tigr.org). COMPLETENESS: full length. ;comment1=PROVISIONAL REFSEQ:
> This record has not yet been subject to final NCBI review. The reference
> sequence was derived from CP000798. Source DNA and bacteria available from
> Jacques Ravel (jravel at tigr.org). COMPLETENESS: full length.
> ;date=06-JAN-2010;mol_type=genomic DNA;organism=Escherichia coli
> E24377A;plasmid=pETEC_6;strain=E24377A
> NC_009789	GenBank	gene	665	781	.	-	1
> ID=EcE24377A_B0001;Dbxref=GeneID:5585816;locus_tag=EcE24377A_B0001
> NC_009789	GenBank	mRNA	665	781	.	-	1
> ID=EcE24377A_B0001.t01;Parent=EcE24377A_B0001
> NC_009789	GenBank	CDS	665	781	.	-	1
> ID=EcE24377A_B0001.p01;Parent=EcE24377A_B0001.t01;Dbxref=GI:157149501,GeneID:5585816;Note=identified
> by glimmer%3B
> putative;codon_start=1;locus_tag=EcE24377A_B0001;product=hypothetical
> protein;protein_id=YP_001451539.1;transl_table=11;translation=length.38
>
>
> Note the "Name=" tags in my version are replaced by "locus_tag=" in Scott's,
> as desired.
> I have no idea what is going on here...
>
> Best,
> Dave
>
> On Sat, Sep 18, 2010 at 3:40 PM, Scott Cain <scott at scottcain.net> wrote:
>>
>> Hi Dave,
>>
>> Let's keep the discussion on the mailing list so we can make sure that
>> when this problem is solved, its resolution will be archived.
>>
>> I don't really understand what is going on either, though it would
>> probably be a good idea to set your PERL5LIB env variable so that when
>> you execute this script from the git repository it will also use the
>> BioPerl modules in the git repository instead of the ones that are
>> installed in your "normal" path.
>>
>> Also, are you using any command line flags when executing it?  I didn't.
>>
>> Scott
>>
>>
>> On Sat, Sep 18, 2010 at 2:14 PM, David Breimann
>> <david.breimann at gmail.com> wrote:
>> > Yes, I'm using Ubuntu 10.04.
>> >
>> > That is really weird. I tried running the script from the bioperl-live dir
>> > (which I just pulled using git), and I get the same results as before
>> > (`Name` instead of `locus_tag`):
>> >
>> >  $ wget
>> >
>> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y
>> > NC_009789.genbank
>> >
>> > Attached is the resulting GFF3.
>> > I also attach a copy of bp_genbank2gff3.pl as found under
>> > /home/dave/src/bioperl-live/blib/script.
>> >
>> > This is a real mystery for me!
>> >
>> > On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>> >>
>> >> Typically I do build and install, but you can run it directly from the
>> >> git checkout directory.
>> >>
>> >> For locating other versions of the script, are you running linux?  If
>> >> so, are you familiar with the "locate" command:
>> >>
>> >>  locate bp_genbank2gff3.pl
>> >>
>> >> If you've never used it before, you may need to update the database
>> >> the locate command uses as root:
>> >>
>> >>  sudo updatedb
>> >>
>> >> Scott
>> >>
>> >>
>> >> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>> >> <david.breimann at gmail.com> wrote:
>> >> > Your gff seems fine. I get a very similar one, but with `Name=`
>> >> > instead of `locus_tag=`.
>> >> >
>> >> > I don't really know how to check for multiple bioperl installations.
>> >> > I'm using my personal server, so I don't mind removing and installing
>> >> > everything from scratch -- but I don't know how to do that.
>> >> >
>> >> > Also, what I don't get with the git is how the scripts are supposed
>> >> > to be updated (unless you build and install).
>> >> >
>> >> > Thank you!
>> >> >
>> >> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net>
>> >> > wrote:
>> >> >>
>> >> >> Well, if you aren't getting the same results as me then I'd say you
>> >> >> aren't using the same version of the script :-)
>> >> >>
>> >> >> Unfortunately, the scripts are no longer automatically marked with
>> >> >> the "internal" version information when committed, so there really
>> >> >> isn't anything in the script I can tell you to look for.  Check for
>> >> >> more than one bioperl instance on your computer.
>> >> >>
>> >> >> I've attached the GFF3 file I got so you can look at it and tell me
>> >> >> if it is what you expect.
>> >> >>
>> >> >> Scott
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>> >> >> <david.breimann at gmail.com> wrote:
>> >> >> > Hi Scott,
>> >> >> >
>> >> >> > I just pulled the latest bioperl-live using git.
>> >> >> > I'm not sure how the scripts are updated, so I built and installed
>> >> >> > anyway (perhaps exporting the path is supposed to be enough?)
>> >> >> > Anyway, I still get the same results. No locus_tag.
>> >> >> > How can I tell if I'm using the latest version of the script?
>> >> >> >
>> >> >> > Thanks again.
>> >> >> >
>> >> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Dave,
>> >> >> >>
>> >> >> >> A fresh "pull" of the bioperl git repository shows that
>> >> >> >> bp_genbank2gff3.pl already does this.  It creates a locus_tag for
>> >> >> >> all features that have a locus_tag, and uses the locus_tag for the
>> >> >> >> ID when it can (it can't blindly use the locus tag for the ID since
>> >> >> >> both the gene and the CDS have the same tag).
>> >> >> >>
>> >> >> >> Scott
>> >> >> >>
>> >> >> >>
>> >> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> > Hi Scott,
>> >> >> >> >
>> >> >> >> > Here is a very short genbank:
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>> >> >> >> >
>> >> >> >> > Note all genes in the genbank have locus tags. In the resulting
>> >> >> >> > GFF3, however, only the last gene (EcE24377A_B0005) gets a
>> >> >> >> > locus_tag. I have no idea why it deserves special treatment... :)
>> >> >> >> >
>> >> >> >> > p.s. making this change (i.e., copying locus_tag to the GFF3 last
>> >> >> >> > column whenever available) will really make my life easier.
>> >> >> >> >
>> >> >> >> > Thank you,
>> >> >> >> > Dave
>> >> >> >> >
>> >> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain
>> >> >> >> > <scott at scottcain.net>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >> Hi Dave,
>> >> >> >> >>
>> >> >> >> >> That seems perfectly reasonable.  If you could point out a GenBank
>> >> >> >> >> entry for which that does not happen, I could try to figure out
>> >> >> >> >> why not.
>> >> >> >> >>
>> >> >> >> >> Scott
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> > Since locus_tag is an essential tag in genbank, I suggest that
>> >> >> >> >> > locus_tag always be added to the GFF last column if it exists
>> >> >> >> >> > in the genbank, whether it is used as ID in the GFF or not.
>> >> >> >> >> >
>> >> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain
>> >> >> >> >> > <scott at scottcain.net>
>> >> >> >> >> > wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> Hi Dave,
>> >> >> >> >> >>
>> >> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal
>> >> >> >> >> >> with GenBank files :-)  It was designed initially to work on
>> >> >> >> >> >> whole genome refseqs, and contains several ad hoc rules for
>> >> >> >> >> >> trying to make it "do the right thing."  In practice, it is not
>> >> >> >> >> >> unusual for a post processing step (either by hand or a quicky
>> >> >> >> >> >> perl script) to be required to really get it right.  I don't
>> >> >> >> >> >> recall the specifics (if I ever knew :-) for when and how the
>> >> >> >> >> >> locus tag is used, but I do know that there is a list of things
>> >> >> >> >> >> that it will try to use for the ID, and while the locus is on
>> >> >> >> >> >> the list, I don't know where it comes in the list, so it's
>> >> >> >> >> >> possible that other items might supersede it.
>> >> >> >> >> >>
>> >> >> >> >> >> Scott
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>> >> >> >> >> >> <david.breimann at gmail.com> wrote:
>> >> >> >> >> >> > Hello,
>> >> >> >> >> >> >
>> >> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>> >> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>> >> >> >> >> >> > the genbank has a locus tag.
>> >> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>> >> >> >> >> >> >
>> >> >> >> >> >> > Thanks,
>> >> >> >> >> >> > Dave
>> >> >> >> >> >> > _______________________________________________
>> >> >> >> >> >> > Bioperl-l mailing list
>> >> >> >> >> >> > Bioperl-l at lists.open-bio.org
>> >> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >> >> >> >> >> >


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
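
The "Name=" versus "locus_tag=" difference discussed in this thread can
also be patched after the fact, in the spirit of the "quicky perl script"
post-processing step mentioned earlier. The following is only a sketch and
not how bp_genbank2gff3.pl itself works: it assumes that on gene and CDS
rows the Name attribute holds the locus tag (as it does in Dave's listing),
and the helper name add_locus_tag is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy Name= to locus_tag= on gene and CDS rows that lack a locus_tag.
# Expects one GFF3 line without its trailing newline.
sub add_locus_tag {
    my ($line) = @_;
    return $line if $line =~ /^#/;                    # directives, comments
    my @col = split /\t/, $line, -1;
    return $line unless @col == 9;                    # not a feature row
    return $line unless $col[2] =~ /^(?:gene|CDS)$/;  # assumption: only these
    return $line if $col[8] =~ /(?:^|;)locus_tag=/;   # already present
    if ($col[8] =~ /(?:^|;)Name=([^;]+)/) {
        $col[8] .= ";locus_tag=$1";
    }
    return join "\t", @col;
}

# Two sample lines modelled on the output quoted in this thread.
my @demo = (
    "##gff-version 3",
    join("\t", qw(NC_009789 GenBank gene 665 781 . - 1),
         "ID=EcE24377A_B0001;Dbxref=GeneID:5585816;Name=EcE24377A_B0001"),
);
print add_locus_tag($_), "\n" for @demo;
```

Rows of other types (the region row, for example) are left untouched,
because their Name is an accession rather than a locus tag.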


From j.scholtalbers at gmail.com  Mon Sep 20 08:04:34 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Mon, 20 Sep 2010 10:04:34 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
Message-ID: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>

Hi,

I'm trying to get all descendents for a specific taxon using Entrez.
each_Descendent and get_all_Descendents don't seem to be implemented or
working.  I then tried by getting the tree for this taxon using
Bio::DB::Taxonomy's get_tree. However this only retrieves the
ancestors/parents.
What would be the best approach here?

Cheers,
Jelle

On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:

> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
> > Sounds like this is going through an initial indexing step (for
> flatfiles).  I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
> >
> > chris
> >
> > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> >
> >> Hello,
> >>
> >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> >> 5.8.5 with BioPerl 1.6.0
> >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> >>
> >> It ran for 100 cpu seconds and output:
> >>
> >> 33090 Viridiplantae kingdom
> >>
> >> I was expecting it to also output the descendents. Some questions:
> >>
> >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> >> implemented? It looks to be in Taxon.pm but it is not documented and
> >> when I ran Data::Dumper on $node the value '_desc' was empty.
> >>
> >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> >> up with the same result.
> >>
> >> thanks,
> >> Eric
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From pcantalupo at gmail.com  Mon Sep 20 14:46:32 2010
From: pcantalupo at gmail.com (Paul Cantalupo)
Date: Mon, 20 Sep 2010 10:46:32 -0400
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <AANLkTikCi7XiKptEghr7uckAUBcD_K3n_YAgHEJjGo_F@mail.gmail.com>

Jelle,

Below is my subroutine that returns the lineage corresponding to a
Taxonomy id. For example, if you use 10633 as the taxid, the
subroutine will return:

Viruses
dsDNA viruses, no RNA stage
Polyomaviridae
Polyomavirus
Simian virus 40

I hope this is what you wanted. Good luck

use strict;
use warnings;
use Bio::DB::EUtilities;
use XML::Simple;   # provides XMLin

sub taxid2lineage {
   my ($id) = @_;
   return undef unless ($id);

   # fetch the Taxonomy record for this taxid from NCBI
   my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                          -db    => 'taxonomy',
                                          -email => 'pcantalupo at gmail.com',
                                          -id    => [ $id ],
                                          );

   my $res = $factory->get_Response->content;
   my $data = XMLin($res);

   if (!ref($data)) {
      # this happens when the Taxid is not found in the Taxonomy DB
      return $data;
   }

   my @lineage = ();

   foreach my $taxa (@{ $data->{Taxon}->{LineageEx}->{Taxon} } ) {
      # taxa is a hash with three keys ScientificName, TaxId, and Rank
      # I'm only saving the ScientificName but possible extensions to this
      # subroutine would be to return the TaxId and Rank as well.
      push (@lineage, $taxa->{ScientificName});
   }

   # add the Species to the end of the Lineage array.
   push (@lineage, $data->{Taxon}->{ScientificName});

   # list in list context, "; "-joined string in scalar context
   return wantarray ? @lineage : join("; ", @lineage);
}
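
One thing to watch for with the loop above: by default XML::Simple folds a
single repeated element into a hash reference instead of an array
reference, so a taxon with only one ancestor would not dereference as an
array. A small sketch of a more defensive extraction step follows. It works
on an already-parsed record, so it needs no network access; the helper name
lineage_names and the hand-written sample record are stand-ins, not real
efetch output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect scientific names (ancestors first, then the taxon itself) from a
# parsed record. lineage_names is a hypothetical helper for illustration.
sub lineage_names {
    my ($data) = @_;
    return unless ref $data eq 'HASH' && ref $data->{Taxon} eq 'HASH';
    my $taxon      = $data->{Taxon};
    my $lineage_ex = $taxon->{LineageEx};
    my $ex = ref $lineage_ex eq 'HASH' ? $lineage_ex->{Taxon} : undef;

    # Accept either an array of ancestors or a single folded hash.
    my @ancestors = ref $ex eq 'ARRAY' ? @$ex
                  : ref $ex eq 'HASH'  ? ($ex)
                  :                      ();
    my @names = map { $_->{ScientificName} } @ancestors;
    push @names, $taxon->{ScientificName};
    return @names;
}

# Hand-written stand-in with the same shape the subroutine above expects.
my $sample = {
    Taxon => {
        ScientificName => 'Simian virus 40',
        LineageEx      => {
            Taxon => [
                { ScientificName => 'Viruses' },
                { ScientificName => 'Polyomaviridae' },
            ],
        },
    },
};
print join("; ", lineage_names($sample)), "\n";
```

Passing ForceArray => ['Taxon'] to XMLin is another way to get a
consistent shape, but it also turns the top-level Taxon into an array, so
the rest of the subroutine would need to change accordingly.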

Paul Cantalupo
University of Pittsburgh


On Mon, Sep 20, 2010 at 4:04 AM, Jelle Scholtalbers
<j.scholtalbers at gmail.com> wrote:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
> > Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> > Eric
> >
> > On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu>
> > wrote:
> > > Sounds like this is going through an initial indexing step (for
> > flatfiles).  I would expect the initial indexing of the tables to take time
> > as you have to create the DB, but subsequent lookups post-indexing should be
> > much faster if the index is already present.  Maybe Jason could answer in
> > more detail?
> > >
> > > chris
> > >
> > > On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
> > >
> > >> Hello,
> > >>
> > >> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> > >> 5.8.5 with BioPerl 1.6.0
> > >> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > >>
> > >> It ran for 100 cpu seconds and output:
> > >>
> > >> 33090 Viridiplantae kingdom
> > >>
> > >> I was expecting it to also output the descendents. Some questions:
> > >>
> > >> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> > >> implemented? It looks to be in Taxon.pm but it is not documented and
> > >> when I ran Data::Dumper on $node the value '_desc' was empty.
> > >>
> > >> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> > >> with a call to 'entrez' it took only 0.02 cpu seconds to come
> > >> up with the same result.
> > >>
> > >> thanks,
> > >> Eric
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Mon Sep 20 15:38:36 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 20 Sep 2010 08:38:36 -0700
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <4C977FFC.5000205@bioperl.org>


This works for me to get all the descendents from a sub-node. You have to 
call the function with the database handle.   I am not sure if the Taxon 
implementation has a reference to the dbhandle or not:
#!/usr/bin/perl -w
use strict;
use Bio::DB::Taxonomy;
# downloaded data from NCBI taxdump into this directory
my $dbdir = '/db/taxonomy/ncbi/';
my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
                                 -nodesfile => "$dbdir/nodes.dmp",
                                 -namesfile => "$dbdir/names.dmp",
                                 );
my $taxa = $db->get_taxon(-taxonid => 151341);
my @d = $db->get_all_Descendents($taxa);

print join("\n", map { $_->id . " " . $_->rank . " " . 
$_->scientific_name } @d), "\n";


Hope that helps.
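
For anyone curious what a descendant lookup over the taxdump amounts to,
here is a sketch in plain Perl. It is not BioPerl's implementation, only an
illustration of the idea: build a parent-to-children map from nodes.dmp
lines, then walk it from the taxon of interest. It assumes the usual
nodes.dmp layout (fields separated by tab, pipe, tab, with the taxon id
first and the parent id second); the helper names and the sample ids are
made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a parent -> [children] map from nodes.dmp-style lines.
sub children_by_parent {
    my (@lines) = @_;
    my %children;
    for my $line (@lines) {
        my ($id, $parent) = split /\t\|\t/, $line;
        next unless defined $parent;
        next if $id == $parent;            # the root lists itself as parent
        push @{ $children{$parent} }, $id;
    }
    return \%children;
}

# Walk the map breadth-first and return every taxon id below $root.
sub all_descendants {
    my ($children, $root) = @_;
    my @found;
    my @queue = ($root);
    while (defined(my $id = shift @queue)) {
        my @kids = @{ $children->{$id} || [] };
        push @found, @kids;
        push @queue, @kids;
    }
    return @found;
}

# Made-up miniature taxonomy: 1 is the root, 10 and 20 are its children.
my @sample = (
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tfamily\t|",
    "11\t|\t10\t|\tgenus\t|",
    "12\t|\t11\t|\tspecies\t|",
    "20\t|\t1\t|\tfamily\t|",
);
my $children = children_by_parent(@sample);
print join(" ", all_descendants($children, 10)), "\n";
```

With a real nodes.dmp the lines would be read from the file instead of an
in-memory list; the walk itself stays the same.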
Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins<rec3141 at mcmaster.ca>  wrote:
>
>    
>> Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
>> Eric
>>
>> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields<cjfields at illinois.edu>
>> wrote:
>>      
>>> Sounds like this is going through an initial indexing step (for
>>>        
>> flatfiles).  I would expect the initial indexing of the tables to take time
>> as you have to create the DB, but subsequent lookups post-indexing should be
>> much faster if the index is already present.  Maybe Jason could answer in
>> more detail?
>>      
>>> chris
>>>
>>> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>>>
>>>        
>>>> Hello,
>>>>
>>>> I tried the Bio::DB::Taxonomy example on this wiki page using perl
>>>> 5.8.5 with BioPerl 1.6.0
>>>> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>>>>
>>>> It ran for 100 cpu seconds and output:
>>>>
>>>> 33090 Viridiplantae kingdom
>>>>
>>>> I was expecting it to also output the descendents. Some questions:
>>>>
>>>> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
>>>> implemented? It looks to be in Taxon.pm but it is not documented and
>>>> when I ran Data::Dumper on $node the value '_desc' was empty.
>>>>
>>>> 2) is the flatfile reader always so slow? after replacing 'flatfile'
>>>> with a call to 'entrez' it took only 0.02 cpu seconds to come
>>>> up with the same result.
>>>>
>>>> thanks,
>>>> Eric
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>          
>>>        
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>      
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>    


From j.scholtalbers at gmail.com  Wed Sep 22 07:46:35 2010
From: j.scholtalbers at gmail.com (Jelle Scholtalbers)
Date: Wed, 22 Sep 2010 09:46:35 +0200
Subject: [Bioperl-l] Bio::DB::Taxonomy and each_Descendent
In-Reply-To: <4C977FFC.5000205@bioperl.org>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>
	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>
	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>
	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
	<4C977FFC.5000205@bioperl.org>
Message-ID: <AANLkTi=nbAJcCs6a2F3pacUaxZE3JZ690dHTasLGug_a@mail.gmail.com>

Hi Jason,

this was the same method I was using. With the taxdump it works apparently,
however it does not work with Entrez as source. So I will just stick to an
up-to-date taxdump then. Thanks for your example.

@Paul: Your method does indeed give the lineage, but it will only retrieve the
ancestors. I want to retrieve all the descendents. Thx anyway.

Cheers,
Jelle

On Mon, Sep 20, 2010 at 5:38 PM, Jason Stajich <jason at bioperl.org> wrote:

>
> This works for me to get all the descendents from a sub-node. You have to
> call the function with the database handle.   I am not sure if the Taxon
> implementation has a reference to the dbhandle or not:
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::Taxonomy;
> # downloaded data from NCBI taxdump into this directory
> my $dbdir = '/db/taxonomy/ncbi/';
> my $db = Bio::DB::Taxonomy->new(-source => 'flatfile',
>                                 -nodesfile => "$dbdir/nodes.dmp",
>                                 -namesfile => "$dbdir/names.dmp",
>                                 );
> my $taxa = $db->get_taxon(-taxonid => 151341);
> my @d = $db->get_all_Descendents($taxa);
>
> print join("\n", map { $_->id . " " . $_->rank . " " . $_->scientific_name
> } @d), "\n";
>
>
> Hope that helps.
> Jelle Scholtalbers wrote, On 9/20/10 1:04 AM:
>
> Hi,
>
> I'm trying to get all descendents for a specific taxon using Entrez.
> each_Descendent and get_all_Descendents don't seem to be implemented or
> working.  I then tried by getting the tree for this taxon using
> Bio::DB::Taxonomy's get_tree. However this only retrieves the
> ancestors/parents.
> What would be the best approach here?
>
> Cheers,
> Jelle
>
> On Wed, Apr 21, 2010 at 5:45 PM, Eric Collins <rec3141 at mcmaster.ca> wrote:
>
>
>
>  Thanks, that was indeed the answer to #2. Any idea about each_Descendent?
> Eric
>
> On Tue, Apr 20, 2010 at 4:48 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
>
> Sounds like this is going through an initial indexing step (for
> flatfiles).  I would expect the initial indexing of the tables to take time
> as you have to create the DB, but subsequent lookups post-indexing should be
> much faster if the index is already present.  Maybe Jason could answer in
> more detail?
>
>
>  chris
>
> On Apr 20, 2010, at 3:20 PM, Eric Collins wrote:
>
>
>
>  Hello,
>
> I tried the Bio::DB::Taxonomy example on this wiki page using perl
> 5.8.5 with BioPerl 1.6.0:
> http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
>
> It ran for 100 cpu seconds and output:
>
> 33090 Viridiplantae kingdom
>
> I was expecting it to also output the descendents. Some questions:
>
> 1) are calls to 'each_Descendent' or 'get_all_Descendents' actually
> implemented? It looks to be in Taxon.pm but it is not documented and
> when I ran Data::Dumper on $node the value '_desc' was empty.
>
> 2) is the flatfile reader always so slow? after replacing 'flatfile'
> with a call to 'entrez' it took only 0.02 cpu seconds to come
> up with the same result.
>
> thanks,
> Eric
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>


From waldenhe at muohio.edu  Fri Sep 24 19:15:48 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Fri, 24 Sep 2010 15:15:48 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>

Hello Bioperl Masters,

I am trying to run a local BLAST with a list of query FASTA files against a database built from other FASTA files.  I am attempting to use the Bio::Tools::Run::StandAloneBlastPlus module.  I have downloaded BLAST+ 2.2.24+ from the NCBI website and installed it on my Ubuntu machine.  I am using bioperl-1.5.2.

The snippet of code that is giving me errors is below:

my $seq_obj = Bio::Seq->new(-id =>$accn, -seq =>$seq);
my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;
print $result_obj->num_hits;

The error I am getting is:

--------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------
Can't call method "next_result" on an undefined value at /media/C8B3-4A4A/Bioinformatics 1.1 beta/BioPerl/bioperl.pm line 284.

I think the real problem is the "cannot find path to blastall" warning.

>From reading around on different forums, I have to make a .ncbirc text file with the location of BLAST+ 2.2.24+ on my machine.  I have that file in my /home folder.

How do I get StandAloneBlastPlus synced up with BLAST+ 2.2.24+?  Am I approaching this right?

Thank you,

Hans Waldenmaier


From ross at cuhk.edu.hk  Sat Sep 25 08:30:39 2010
From: ross at cuhk.edu.hk (Ross KK Leung)
Date: Sat, 25 Sep 2010 16:30:39 +0800
Subject: [Bioperl-l] perl for GO
In-Reply-To: <AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
References: <m2g1961621c1004161449h1e7c84f1zcef0b8af0825ca99@mail.gmail.com>	<i2v1961621c1004201320nedb3c28fm1f24b2d2cb65379@mail.gmail.com>	<9081_1271796557_o3KKnAcq015381_42E5A75A-438A-4AF7-AC60-226395329A9B@illinois.edu>	<j2i1961621c1004210845wac403f7emaf57118e2046f8ef@mail.gmail.com>
	<AANLkTik_cgS-4MH_QuqOomyTr6CRY7C_B+56S3x_2DrH@mail.gmail.com>
Message-ID: <015201cb5c8b$ef693490$ce3b9db0$@edu.hk>


Given a set of GO IDs, e.g. 

GO:0008150
GO:0005750
GO:0006122
GO:0008121
GO:0003674
GO:0005575
GO:0008150
GO:0009507
GO:0009535
GO:0009567
GO:0009977
GO:0010027
GO:0031361

from
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo

one can manually examine the hierarchy. Although there are go-perl
(http://search.cpan.org/~cmungall/go-perl/) and go-db-perl
(http://search.cpan.org/~cmungall/go-db-perl/), as a life science student
who is just learning Perl, I find it difficult to draw a hierarchy tree (or
simply build a table counting the occurrences) to produce something like:


biological_process (4)
*** cellular process (4)
****** cell adhesion (1)
****** cell differentiation (3)
Molecular function (4)
<snip>
Cellular component (4)
<snip>

Can anybody advise? I don't need any fancy figures at all...
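
A minimal plain-Perl sketch of the counting described above, assuming only
the "id:", "name:" and "is_a:" tags of [Term] stanzas in an OBO file matter.
The stanzas and TOY: IDs below are a made-up excerpt, not real GO data, and
the helper names (parse_obo, count_hits, render_tree) are hypothetical. Each
input ID is counted once for itself and once for every ancestor reached
through is_a; other relationship types are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse OBO text into (id => name) and (id => [parent ids]) maps.
sub parse_obo {
    my ($text) = @_;
    my (%name, %parents, $id);
    my $in_term = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\[(\w+)\]/) {          # start of a new stanza
            $in_term = ($1 eq 'Term');
            undef $id;
            next;
        }
        next unless $in_term;
        if ($line =~ /^id:\s*(\S+)/) { $id = $1; $parents{$id} ||= [] }
        elsif (defined $id && $line =~ /^name:\s*(.+?)\s*$/) { $name{$id} = $1 }
        elsif (defined $id && $line =~ /^is_a:\s*(\S+)/) { push @{ $parents{$id} }, $1 }
    }
    return (\%name, \%parents);
}

# A term plus all of its is_a ancestors, each listed once.
sub with_ancestors {
    my ($id, $parents) = @_;
    my %seen;
    my @todo = ($id);
    while (defined(my $term = shift @todo)) {
        next if $seen{$term}++;
        push @todo, @{ $parents->{$term} || [] };
    }
    return keys %seen;
}

# For every term, count how many input IDs fall at or below it.
sub count_hits {
    my ($ids, $parents) = @_;
    my %count;
    for my $id (@$ids) {
        $count{$_}++ for with_ancestors($id, $parents);
    }
    return \%count;
}

# Indented lines for one term and the counted terms below it.
sub tree_lines {
    my ($id, $depth, $name, $children, $count) = @_;
    my $label  = defined $name->{$id} ? $name->{$id} : $id;
    my $prefix = $depth ? ('***' x $depth) . ' ' : '';
    return ("$prefix$label ($count->{$id})",
            map { tree_lines($_, $depth + 1, $name, $children, $count) }
            sort @{ $children->{$id} || [] });
}

# Render all counted terms, starting from those without parents.
sub render_tree {
    my ($name, $parents, $count) = @_;
    my %children;
    for my $id (keys %$count) {
        push @{ $children{$_} }, $id for @{ $parents->{$id} || [] };
    }
    my @roots = sort grep { !@{ $parents->{$_} || [] } } keys %$count;
    return map { tree_lines($_, 0, $name, \%children, $count) } @roots;
}

# Made-up OBO excerpt; a real run would read gene_ontology_ext.obo instead.
my $obo = <<'OBO';
[Term]
id: TOY:0001
name: biological_process

[Term]
id: TOY:0002
name: cellular process
is_a: TOY:0001 ! biological_process

[Term]
id: TOY:0003
name: cell adhesion
is_a: TOY:0002 ! cellular process

[Term]
id: TOY:0004
name: cell differentiation
is_a: TOY:0002 ! cellular process
OBO

my ($name, $parents) = parse_obo($obo);
my $count = count_hits([qw(TOY:0003 TOY:0004 TOY:0004 TOY:0004)], $parents);
print "$_\n" for render_tree($name, $parents, $count);
```

Because GO is a graph rather than a strict tree, a term with two parents is
printed under both of them in this sketch.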


From David.Messina at sbc.su.se  Sun Sep 26 16:11:54 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 26 Sep 2010 18:11:54 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
References: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3267@STUCMS4.it.muohio.edu>
Message-ID: <5A561A87-A3A3-4CEB-A57E-B719ECFF75F0@sbc.su.se>

Hi Hans,


> I think the real problem is the "cannot find path to Blastall.

Yes. But it sounds like you're trying to use the Bio::Tools::Run modules for the old Blast, not Blast+. Blast+ doesn't have a blastall executable, it has blastn, blastp, etc.

See http://www.bioperl.org/wiki/HOWTO:BlastPlus for example code.

Also, you probably need to upgrade your BioPerl installation. I'm pretty sure BioPerl 1.5.2 doesn't have the Blast+ code in it.


Dave
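
A rough sketch of what the BLAST+ route can look like, assuming a recent
bioperl-run with Bio::Tools::Run::StandAloneBlastPlus and the BLAST+ programs
installed. The database name 'mydb', the output file 'query.bls' and the
helper run_blastn are placeholders, and the calls should be checked against
the HOWTO linked above. The factory exposes blastn, blastp, etc. as methods
instead of blastall.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the wrapper at run time so a missing install gives a clear message.
my $have_wrapper = eval { require Bio::Tools::Run::StandAloneBlastPlus; 1 };

# Build a BLAST database from a FASTA file, run blastn with a FASTA query,
# and return the number of hits for the first query sequence.
sub run_blastn {
    my ($db_fasta, $query_fasta) = @_;
    my $factory = Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_name => 'mydb',       # placeholder database name
        -db_data => $db_fasta,    # FASTA file to build the database from
        -create  => 1,
    );
    my $result = $factory->blastn(
        -query   => $query_fasta,
        -outfile => 'query.bls',  # placeholder report file
    );
    my $hits = defined $result ? $result->num_hits : undef;
    $factory->cleanup;            # remove temporary files
    return $hits;
}

if (@ARGV != 2) {
    print "usage: $0 database.fasta query.fasta\n";
}
elsif (!$have_wrapper) {
    print "Bio::Tools::Run::StandAloneBlastPlus is not installed\n";
}
else {
    my $hits = run_blastn(@ARGV);
    print defined $hits ? "$hits hit(s)\n" : "no result returned\n";
}
```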


From maj at fortinbras.us  Mon Sep 27 00:43:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 27 Sep 2010 00:43:15 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W479062635438381285548195@webmail34>

Hi Hans-- Dave is right; you'll need both the new blast+ as well as the latest BioPerl trunk code. Get it by doing both of the following:

git clone http://github.com/bioperl/bioperl-live.git
git clone http://github.com/bioperl/bioperl-run.git

(i.e., you need the latest core and run distributions). To install, see http://www.bioperl.org/wiki/Installing_BioPerl

cheers MAJ

--------------------------
Mark A. Jensen, PhD
Senior Consultant
Fortinbras Research
http://www.fortinbras.us



From cjfields at illinois.edu  Mon Sep 27 21:07:11 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Sep 2010 16:07:11 -0500
Subject: [Bioperl-l] Client-side Scansite Bioperl module
In-Reply-To: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
References: <DA034B5D-5E58-43EE-9AC4-197EE155FC7F@mit.edu>
Message-ID: <D8C2E980-BEB9-4231-8014-7318778CD215@illinois.edu>

Sorry, didn't see this being responded to on-list (been off the radar the last month).  I think this is a good idea, but I'm wondering whether it might be better as a separate release on CPAN rather than part of bioperl core, seeing as we're in the preliminary stages of modularizing the current bioperl core into smaller independent releases after the next bioperl release.

chris

On Sep 4, 2010, at 10:40 AM, Jonathan Rameseder wrote:

> hi guys
> 
> it seems Bioperl contains a wrapper [1] for Scansite [2]. to what extent would it make sense to integrate a client-side version of Scansite with some statistical analysis features (eg enrichment tests) in Bioperl? that would give users the opportunity to customize their own version of the Scansite algorithm. i developed an object-oriented client-side version and am currently writing test cases.  maybe it could be integrated with the server wrapper somehow? please let me know what you think :-D!
> 
> best wishes
> johnny
> 
> [1] Bio::Tools::Analysis::Protein::Scansite
> [2] http://www.ncbi.nlm.nih.gov/pubmed/11283593
> 
> ********************
> Jonathan Rameseder
> Ph.D. Candidate
> Computational Systems Biology Initiative
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> ********************


From gandipalem at gmail.com  Tue Sep 28 04:09:06 2010
From: gandipalem at gmail.com (bv s)
Date: Tue, 28 Sep 2010 09:39:06 +0530
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
Message-ID: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>

Dear Sir/Madam,

Can anyone tell me how to use the make_primers.pl script?
What is the coordinate file?

Regards
Suresh
Scholar,
National Bureau Of Plant Genetic Resources,
New Delhi.




From David.Messina at sbc.su.se  Tue Sep 28 07:53:29 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:53:29 +0200
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
References: <W479062635438381285548195@webmail34>
	<23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>
Message-ID: <0BFD9DB0-40D9-4443-8968-CF5D5A31BD02@sbc.su.se>

> I can get the command-line Blast running.  But I still cannot get Perl to see BLAST.  

Type the following on the command line:
perl -e 'print $ENV{PATH}, "\n"'

You should see /home/hans/BLAST/bin in the output from that command. If you don't, try typing
export PATH=/home/hans/BLAST/bin:${PATH}

on the command line and then type
perl -e 'print $ENV{PATH}, "\n"'

again. If your BLAST bin directory still doesn't appear in that list, then something else is going on with your system. For example, you might have more than one version of Perl or Blast installed. Is the perl you're running on the command line the same perl that's called by the #! line at the top of your script?


> I have added these lines to my /home/hans/ .bashrc file in order to get perl to find BLAST:
> export PATH=${PATH}:/home/hans/BLAST/bin
> export BLASTDIR=/home/hans/BLAST/
> 
> Am I just supposed to add these the end of the .bashrc file or am I supposed to put it someplace special.

It doesn't matter where in your .bashrc it goes, although it's possible that something else in your .bashrc (or in the system bashrc, which is often read in as well; look for a mention of /etc/bashrc or similar) is overriding or altering the lines you added.

It's a little tricky to diagnose and correct PATH issues over the internet, so if you're still having trouble, you might try to find someone locally who is knowledgeable about Unix and can work directly in your account with you.


Dave


From David.Messina at sbc.su.se  Tue Sep 28 07:58:00 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 28 Sep 2010 09:58:00 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 89, Issue 19
In-Reply-To: <AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
References: <mailman.13.1285603208.6769.bioperl-l@lists.open-bio.org>
	<AANLkTin0DTop66SYiOdvQ7787Qot528tMeFOyT=LX9Ef@mail.gmail.com>
Message-ID: <6BACC902-4F5E-466B-B949-FE373831CB92@sbc.su.se>

> Any one can tell how to use the make_primers.pl script?
> What is Coordination file?

>From the documentation at the top of the script:

Description: This program designs primers for constructing knockouts
of genes by transformation of PCR products (ref: Datsenko & Wanner,
PNAS 2000).

A tab-delimited file containing ORF START STOP is read, and primers
flanking the start & stop coordinates are designed based on the
user-designated sequence file.

In addition, primers flanking the knockout regions are chosen for PCR
screening purposes once the knockout is generated.  The script uses
Bioperl in order to determine the primer sequences, which requires
getting subsequences and reverse complementing some of the objects.


Dave
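
For illustration, a small sketch of reading a tab-delimited file with ORF,
START and STOP columns as the description says. The two rows are made up,
parse_coordinates is a hypothetical helper rather than code from
make_primers.pl, and whether the real script expects a header line or a
particular coordinate order is not stated here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "ORF<TAB>START<TAB>STOP" lines from a filehandle into a list of hashes.
# Blank lines and lines starting with '#' are skipped.
sub parse_coordinates {
    my ($fh) = @_;
    my @orfs;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                     # tolerate Windows line endings
        next if $line =~ /^\s*(#|$)/;
        my ($orf, $start, $stop) = split /\t/, $line;
        die "Line $.: expected ORF, START and STOP separated by tabs\n"
            unless defined $stop && $start =~ /^\d+$/ && $stop =~ /^\d+$/;
        push @orfs, { orf => $orf, start => $start, stop => $stop };
    }
    return @orfs;
}

# Made-up example rows, read from an in-memory file.
my $example = "orfA\t100\t400\norfB\t650\t900\n";
open my $fh, '<', \$example or die "Cannot open in-memory file: $!";
for my $o (parse_coordinates($fh)) {
    my $length = abs($o->{stop} - $o->{start}) + 1;
    print "$o->{orf}: $o->{start}..$o->{stop} ($length bp)\n";
}
close $fh;
```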


From maj at fortinbras.us  Tue Sep 28 11:18:34 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 11:18:34 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W726055216153101285672714@webmail57>

The module checks the env variable BLASTPLUSDIR for the executable; 
you can set it directly

export BLASTPLUSDIR=/home/hans/BLAST/bin

and you should be good to go.
MAJ



From waldenhe at muohio.edu  Tue Sep 28 04:52:56 2010
From: waldenhe at muohio.edu (Waldenmaier, Hans Eugene)
Date: Tue, 28 Sep 2010 00:52:56 -0400
Subject: [Bioperl-l] StandAloneBlastPlus
In-Reply-To: <W479062635438381285548195@webmail34>
References: <W479062635438381285548195@webmail34>
Message-ID: <23920BABEC0B6D43BB2EB45E54163D75032B68AA3275@STUCMS4.it.muohio.edu>

Thanks Guys,

I have run those steps, my current version now is:
hans at hans-laptop:~$ perl -MBio::Perl -le 'print Bio::Perl->VERSION;'
1.006001

But I am still having problems.

I am having slightly more luck using StandAloneBlast and the regular BLAST from NCBI.  I can get the command-line BLAST running, but I still cannot get Perl to see BLAST.
Following the instructions from the HOWTOs and the O'Reilly book BLAST, I have gotten to the part about setting up the environment variables, which is where I think my problems are arising now.
I have added these lines to my /home/hans/.bashrc file in order to get perl to find BLAST:
export PATH=${PATH}:/home/hans/BLAST/bin
export BLASTDIR=/home/hans/BLAST/

Am I just supposed to add these to the end of the .bashrc file, or am I supposed to put them someplace special?

Thanks for the help,

Hans


From maj at fortinbras.us  Tue Sep 28 15:04:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 28 Sep 2010 15:04:07 +0000
Subject: [Bioperl-l] StandAloneBlastPlus
Message-ID: <W8604325546183031285686247@webmail66>

Should work from .bashrc, Hans. Also add  

export BLASTPLUSDIR=/home/hans/BLAST/bin

It really should see it in the PATH as you have it, so that may be a bug; however the BLASTPLUSDIR should force it to see the program.

You can also execute the export commands in the shell, and the variables will be set and visible to programs for the duration of the login session. You can see what they are set to in the shell by doing

set | grep BLAST

cheers MAJ
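
A quick way to see what a Perl process can actually find is sketched below.
It assumes the lookup order is BLASTPLUSDIR first and then each PATH entry,
which is a simplification for diagnosis and not necessarily how the module
searches. find_executable and blast_search_dirs are hypothetical helper
names; blastn, blastp and makeblastdb are BLAST+ program names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $program if it is an executable file in one of @dirs.
sub find_executable {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        next unless defined $dir && length $dir;
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Directories to search: BLASTPLUSDIR (if set), then every PATH entry.
sub blast_search_dirs {
    my @dirs;
    push @dirs, $ENV{BLASTPLUSDIR} if defined $ENV{BLASTPLUSDIR};
    push @dirs, File::Spec->path;
    return @dirs;
}

for my $program (qw(blastn blastp makeblastdb)) {
    my $found = find_executable($program, blast_search_dirs());
    printf "%-12s %s\n", $program, defined $found ? $found : 'NOT FOUND';
}
```

Running this with the same perl that the #! line of the failing script points
to shows whether that interpreter inherits the settings from .bashrc.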



From chiragmatkarbioinfo at gmail.com  Thu Sep 30 12:20:35 2010
From: chiragmatkarbioinfo at gmail.com (chirag matkar)
Date: Thu, 30 Sep 2010 19:20:35 +0700
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
Message-ID: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>

Hello all,
Is there any module to fetch DNA sequence data from an Ensembl gene ID?

-- 
Regards,
Chirag Matkar


From jun.yin at ucd.ie  Thu Sep 30 13:36:31 2010
From: jun.yin at ucd.ie (Jun Yin)
Date: Thu, 30 Sep 2010 14:36:31 +0100
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
Message-ID: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>

Hi, Chirag,

BioPerl does not have any module to retrieve data from Ensembl. However,
Ensembl provides a BioPerl-like API for that purpose.

See Ensembl's website for how to use it:
http://www.ensembl.org/info/data/api.html

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin



__________ Information from ESET Smart Security, version of virus signature
database 5377 (20100818) __________

The message was checked by ESET Smart Security.

http://www.eset.com



From cjfields at illinois.edu  Thu Sep 30 15:16:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 30 Sep 2010 10:16:45 -0500
Subject: [Bioperl-l] Retrieve Sequence from Ensembl gene id
In-Reply-To: <011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
References: <AANLkTinePSh9vC76wxzgLQiFKH0T3OdZYXDozDXxXdGp@mail.gmail.com>
	<011901cb60a4$7dc13c30$7943b490$%yin@ucd.ie>
Message-ID: <E4D361DE-CEB6-45CC-83EE-6547C497ABE3@illinois.edu>

On Sep 30, 2010, at 8:36 AM, Jun Yin wrote:

> Hi, Chirag,
> 
> BioPerl does not have any module to retrieve data from Ensembl. But Ensembl
> provides a BioPerl-like interface on that function.

Actually, BioPerl does have Bio::Tools::Run::Ensembl, which was submitted by Sendu Bala a few years back.  I think it still works rather well; at least the tests pass.  You might get more out of using the Ensembl API directly, as Jun states, though; YMMV.

BTW, the Ensembl API also works with the latest bioperl code, regardless of what the Ensembl website says (e.g. that they only support v1.2.3).  I haven't heard more about whether this discrepancy was supposed to be addressed at some point.

chris

> You can visit Ensembl's website on how to use that module:
> http://www.ensembl.org/info/data/api.html
> 
> Cheers,
> Jun Yin
> Ph.D. student in U.C.D.
> 
> Bioinformatics Laboratory
> Conway Institute
> University College Dublin


From A.Vakhrusheva at lumc.nl  Wed Sep 29 13:28:54 2010
From: A.Vakhrusheva at lumc.nl (A.Vakhrusheva at lumc.nl)
Date: Wed, 29 Sep 2010 15:28:54 +0200
Subject: [Bioperl-l] Bio::Matrix::MatrixI
Message-ID: <35D95AF6C5D146479C328BBBA554FB76028C367E@mailf.lumcnet.prod.intern>

Bio::Matrix::MatrixI

I have a question concerning this interface.

I want to calculate a p-distance matrix, but which input format is
acceptable? Phylip doesn't work.

Anna
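
For reference, the p-distance between two aligned sequences is the fraction
of compared sites at which they differ. The sketch below computes it in plain
Perl without Bio::Matrix, assuming the sequences are already aligned to equal
length and that sites with a gap ('-') or '?' in either sequence are skipped
(pairwise deletion). The three sequences are made up and p_distance is a
hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Proportion of differing sites between two aligned sequences.
# Sites where either sequence has '-' or '?' are not compared.
# Returns undef when no comparable site is left.
sub p_distance {
    my ($s1, $s2) = @_;
    die "Sequences must be aligned to the same length\n"
        unless length($s1) == length($s2);
    my ($diff, $sites) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = uc substr($s1, $i, 1);
        my $y = uc substr($s2, $i, 1);
        next if $x =~ /[-?]/ || $y =~ /[-?]/;
        $sites++;
        $diff++ if $x ne $y;
    }
    return $sites ? $diff / $sites : undef;
}

# Made-up aligned sequences, only for illustration.
my %aln = (
    seq1 => 'ACGTACGTAC',
    seq2 => 'ACGTTCGTAA',
    seq3 => 'AC-TACGTAC',
);

# Print a square, tab-separated distance matrix.
my @ids = sort keys %aln;
print join("\t", '', @ids), "\n";
for my $row (@ids) {
    my @cells = map {
        my $d = p_distance($aln{$row}, $aln{$_});
        defined $d ? sprintf('%.3f', $d) : 'NA';
    } @ids;
    print join("\t", $row, @cells), "\n";
}
```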