From bioperlanand at yahoo.com  Mon May  1 14:36:20 2006
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 1 May 2006 11:36:20 -0700 (PDT)
Subject: [Bioperl-l] how to obtain GIs from clone_ids
Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>


Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry) 
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get back just the GI number for that clone_id? Any suggestions?

Thanks in advance.

Anand

		

From cuiw at mail.nih.gov  Mon May  1 15:39:01 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Mon, 1 May 2006 15:39:01 -0400
Subject: [Bioperl-l] how to obtain GIs from clone_ids
In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F48B0@nihcesmlbx10.nih.gov>

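use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Entrez query for the clone_id; the query syntax is kept as originally posted.
my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string);

# Number of matching records, then their GI numbers.
my $count = $query->count;
my @ids   = $query->ids;

# Print one GI per line (a bare "print" would run the IDs together).
print "$_\n" for @ids;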
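
To run a query like the one above once per clone_id, the /clone qualifiers first have to be collected from the GenBank text. Here is a minimal plain-Perl sketch of that step, assuming each qualifier appears as /clone="..." on a single line. The sub name extract_clone_ids is made up for illustration and is not a BioPerl call.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the values of all
# /clone="..." qualifiers found in a chunk of GenBank flat-file text.
sub extract_clone_ids {
    my ($text) = @_;
    my %seen;
    my @ids;
    while ($text =~ m{/clone="([^"]+)"}g) {
        push @ids, $1 unless $seen{$1}++;   # keep the first occurrence only
    }
    return @ids;
}

# Sample text taken from the FEATURES excerpt in the original question.
my $genbank_text = <<'END';
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
END

print "$_\n" for extract_clone_ids($genbank_text);
```

Each returned id could then be placed into the -query string, one query per clone_id.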



From s.ryazansky at gmail.com  Mon May  1 17:55:13 2006
From: s.ryazansky at gmail.com (Sergei Ryazansky)
Date: Mon, 1 May 2006 21:55:13 +0000 (UTC)
Subject: [Bioperl-l] blast program to run locally on windows
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID: <loom.20060501T235327-11@post.gmane.org>

Hi,
Can you post your formatdb.log file here?


From cjfields at uiuc.edu  Tue May  2 00:15:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 May 2006 23:15:19 -0500
Subject: [Bioperl-l] blast program to run locally on windows
In-Reply-To: <loom.20060501T235327-11@post.gmane.org>
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
	<loom.20060501T235327-11@post.gmane.org>
Message-ID: <D54C8321-6A9C-4674-8C7E-5452DEF84599@uiuc.edu>

We managed to work our way through it.  He hadn't set ncbi.ini to the  
correct directories; the database was formatted correctly.

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue May  2 12:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness with using extra parameters ('seq_start',
'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
gone through, fixed, and committed.  I also have added a few tests to DB.t
for everything (all changes were in Bio::DB::WebDBSeqI and
Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
manage to get it added as well (with tests).  This is how NCBI defines
complexity:

complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest
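
For readers who want to see where this value ends up, here is a rough plain-Perl sketch (not the Bio::DB::NCBIHelper code) that checks a complexity value against the table above and appends it to an E-utilities efetch query string. The sub name efetch_url, the base URL, and the placeholder accession are assumptions for illustration only.

```perl
use strict;
use warnings;

# Meanings of the complexity values, copied from the list above.
my %COMPLEXITY = (
    0 => 'whole blob',
    1 => 'bioseq for gi of interest',
    2 => 'minimal bioseq-set containing the gi',
    3 => 'minimal nuc-prot containing the gi',
    4 => 'minimal pub-set containing the gi',
);

# Hypothetical helper: build an efetch URL carrying the complexity parameter.
# The base URL is an assumption; check it against the current NCBI docs.
sub efetch_url {
    my (%arg) = @_;
    my $c = defined $arg{complexity} ? $arg{complexity} : 1;
    die "complexity must be 0-4\n" unless exists $COMPLEXITY{$c};
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    return sprintf '%s?db=nucleotide&id=%s&rettype=fasta&complexity=%d',
                   $base, $arg{id}, $c;
}

# 'J00522' is only a placeholder accession.
print efetch_url(id => 'J00522', complexity => 0), "\n";
```
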

Here's my quandary: when setting complexity to '0', you get a blob back (the
main sequence as well as any subsequences, such as CDS); this is in essence
a sequence stream with multiple alphabet types.  So, I now have it set up to
do this:

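use Bio::DB::GenBank;
use Bio::SeqIO;

my $acc    = 'J00522';   # placeholder accession, not from the original post
my $seqout = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);

my $factory = Bio::DB::GenBank->new(-format     => 'fasta',
                                    -complexity => 0
                                   );

# With complexity 0 this returns a Bio::SeqIO stream rather than a Bio::Seq
my $seqin = $factory->get_Seq_by_acc($acc);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}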

since I thought returning an array would be horrendously expensive on
memory, esp. with larger sequences.  Currently this is only set up for
sequences which are retrieved when complexity is set to '0' so it's a pretty
unique case.  Regardless, I'm worried that, since users expect a Bio::Seq
object instead of a Bio::SeqIO object here, it will cause a lot of confusion
with the API.  Any suggestions/gripes?

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From mamillerpa at yahoo.com  Tue May  2 07:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
make FASTA subset files for some bacterial strains.  I haven't been
able to parse out the strain information from the OS or RC lines. 
These lines typically look like:

OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki.  In addition to the wiki, I have been making use of 
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant.  I have also tried to walk through all of the feature,
annotation and reference objects but I still can't seem to parse out
the information I need.  (For brevity, the example I'm including below
only lists the code I used for the annotation objects.)  Also, this
code only prints the information...  I know that I'll have to write a
FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

---   ---   ---


#!/usr/bin/perl

use strict;
use Bio::SeqIO;

my $usage = "getaccs.pl file format\n";
my $file   = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

  my $species_object = $seq->species;
  my $species_string = $species_object->species;
  my $variant_string = $species_object->variant;
  my $common_string  = $species_object->common_name;
  my $sub_string     = $species_object->sub_species;
  my $binomial       = $species_object->binomial('FULL');

  print "display   ",$seq->display_id,"\n";
  print "accession ",$seq->accession_number,"\n";
  print "desc      ",$seq->desc,"\n";

  print "species   ",$species_string,"\n";
  print "variant   ",$variant_string,"\n";
  print "common    ",$common_string,"\n";
  print "sub       ",$sub_string,"\n";
  print "binomial  ",$binomial,"\n";

  print $seq->seq,"\n";

  my $anno_collection = $seq->annotation;
  for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
      print "tagname : ", $value->tagname, "\n";
      # $value is a Bio::AnnotationI object and has an "as_text" method
      print "  annotation value: ", $value->as_text, "\n";

      # references also expose their fields as a hash
      if ($value->tagname eq "reference") {
        my $hash_ref = $value->hash_tree;
        for my $key (keys %{$hash_ref}) {
          print $key,": ",$hash_ref->{$key},"\n";
        }
      }
    }
  }
  print "\n";
}

exit;


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller


From cjfields at uiuc.edu  Tue May  2 14:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post!  Just wanted to add that I'm adding a
warning to the get_Seq* methods, telling users to use the appropriate
get_Stream* method when complexity == 0, before returning the Bio::SeqIO
object.

CJF



From jason.stajich at duke.edu  Tue May  2 14:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format.

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html

or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has been resolved really so hopefully  
James will speak up if he's implemented anything.
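
In the meantime, one possible stopgap (a sketch only, bypassing the BioPerl species object) is to pull the strain token straight out of the raw RC or OS lines with a regular expression. It assumes the lines look like the two quoted in the original post; the sub name strain_from_lines is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the strain named in the
# RC or OS lines of one flat-file record, or undef if none is found.
# RC lines are tried first because STRAIN=... is the more regular form.
sub strain_from_lines {
    my ($record) = @_;
    my @lines = split /\n/, $record;

    for my $line (@lines) {
        # e.g. "RC STRAIN=ABC123." or "RC   STRAIN=ABC123; TISSUE=..."
        return $1
            if $line =~ /^RC\s+.*?\bSTRAIN=([^;]+?)\s*(?:;|\.?\s*$)/i;
    }
    for my $line (@lines) {
        # e.g. "OS Somegenus somespecies subsp. somesubspecies strain ABC123."
        return $1
            if $line =~ /^OS\s+.*?\bstrain\s+(.+?)\s*\.?\s*$/i;
    }
    return undef;
}

my $record = "OS Somegenus somespecies subsp. somesubspecies strain ABC123.\n"
           . "RC STRAIN=ABC123.\n";
print strain_from_lines($record), "\n";
```

Real OS lines vary a good deal in how they spell out the strain, so the OS pattern in particular should be treated as a starting point.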

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From mblanche at berkeley.edu  Tue May  2 15:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <C07D0179.2183%mblanche@berkeley.edu>

Dear all--

I have been trying to use the intersection function to extract overlapping
regions from alternatively spliced exons, as in the following script. The
object returned from 'my $overlap = $exon1->intersection($exon2);' is
actually losing the strand of $exon1 if $exon1 is from the negative strand.
Is this behavior expected? Should I check the strand of $exon1 before
working on the object returned by any Bio::RangeI function?

Many thanks 

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features( -type       => 'exon',
                                   -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";;
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 16:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D0179.2183%mblanche@berkeley.edu>
Message-ID: <C07D3699.84BC%osborne1@optonline.net>

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result but if they aren't on the same strand then no strand
information is returned.
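
The rule just described can be sketched in plain Perl as follows. This is only an illustration using hash-based ranges, not the Bio::RangeI source; the sub name intersect_ranges and the hash layout are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a range: a hash with start, end and strand.
# Returns undef when the two ranges do not overlap.
sub intersect_ranges {
    my ($r1, $r2) = @_;
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return undef if $start > $end;

    # Keep the strand only when both inputs agree; otherwise report 0.
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;
    return { start => $start, end => $end, strand => $strand };
}

my $ov = intersect_ranges({ start => 10, end => 50, strand => -1 },
                          { start => 30, end => 80, strand => -1 });
printf "%d..%d strand %d\n", @{$ov}{qw(start end strand)};
```

Under this rule, two exons that are both on the -1 strand give an overlap on the -1 strand.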

Brian O.



From mblanche at berkeley.edu  Tue May  2 16:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D3699.84BC%osborne1@optonline.net>
Message-ID: <C07D100A.218A%mblanche@berkeley.edu>

Brian--

Even when both elements of intersection() are from the negative strand, the
returned object is from the positive strand and $overlap is actually the
reverse complement of the intersection between the 2 exons. Here is part
of the output from the script below:

===
ex1     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the returned object is positive, as in:

===
ex1     Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2     Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output:

Many thanks all...

Marco


use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features( -type       => 'exon',
                                   -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";;
                print     "ex1\tStrand: ", $exon1->strand, "\n",
                        $exon1->seq, "\n";
                print     "ex2\tStrand: ", $exon2->strand, "\n",
                        $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n",
                        $overlap->seq, "\n";
            }
        }
    }
}


______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 17:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D100A.218A%mblanche@berkeley.edu>
Message-ID: <C07D4C3D.84C4%osborne1@optonline.net>

Marco,

Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...
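
Until the cause is tracked down, one possible stopgap is to compare the strand of the overlap with the strand of the exons and reverse-complement the overlap sequence when they disagree. The sketch below is plain Perl and does not explain or fix the behaviour itself; the sub names revcomp and overlap_on_exon_strand are made up for illustration.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (plain Perl, no BioPerl needed).
sub revcomp {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: given the strand shared by both exons plus the strand
# and sequence reported for the overlap, return the overlap sequence as read
# on the exons' strand.
sub overlap_on_exon_strand {
    my ($exon_strand, $overlap_strand, $overlap_seq) = @_;
    return $overlap_seq if $exon_strand == $overlap_strand;
    return revcomp($overlap_seq);
}

# First bases of the overlap reported for the two -1 exons in the output.
print overlap_on_exon_strand(-1, 1, 'CAGTCCTTGCG'), "\n";
```

For that input the result is CGCAAGGACTG, which matches the last bases of ex1 in the posted output.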

Brian O.


>>> 
>>> I have been trying to use the intersection function to extract overlapping
>>> region from alternatively spliced exons as in the following script. The
>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
>>> actually losing the strand of $exon1 if $exon1 is from the negative strand.
>>> Is this behavior expected? Should I check the strand of $exon1 before
>>> working on the object returned by any Bio::RangeI function?
>>> 
>>> Many thanks 
>>> 
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::DB::GFF;
>>> 
>>> MAIN:{
>>> 
>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>                                 -dsn =>
>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>                                 -user => 'guest');
>>>     my $test_db = $db->segment('4');
>>>     
>>>     # Load up the exons into $exons_p
>>>     for my $gene ($test_db->features(-types => 'gene')){
>>> 
>>>         my $exons_p = extractExons($gene);
>>> 
>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>> 
>>>     }
>>> }
>>> 
>>> sub extractExons {
>>>     my $gene = shift;
>>>     my %ex_list;
>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>>                                     -attributes =>{Gene => $gene->group});
>>>                
>>>     for my $tc (@tcs){
>>>         my @exons = $tc->features (-type => 'exon',
>>>                                      -attributes => {Parent => $tc->group}
>>> );        
>>>     
>>>         for (@exons){
>>>             my $ex_id    = $_->id;
>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>> 
>>>         }
>>>     
>>>     }
>>>     my @values = values %ex_list;
>>>     return(\@values);
>>> }
>>> 
>>> sub cluster {
>>>     my $exons_p = shift;
>>>     
>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>             my $exon1 = $exons_p->[$s];
>>>             my $exon2 = $exons_p->[$t];
>>>             
>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>             
>>>                 my $overlap = $exon1->intersection($exon2);
>>>                
>>>                 print "===\n";;
>>>                 print "ex1\n", $exon1->seq, "\n";
>>>                 print "ex2\n", $exon2->seq, "\n";
>>>                 print "overlap\n", $overlap->seq, "\n";
>>>             }
>>>         }
>>>     }
>>> }
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>> 
>>> mblanche at uclink.berkeley.edu
>>> 
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>> 
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>> 
>> 
> 
> ______________________________
> Marco Blanchette, Ph.D.
> 
> mblanche at uclink.berkeley.edu
> 
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
> 
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062


From mblanche at berkeley.edu  Tue May  2 18:31:44 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 15:31:44 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D4C3D.84C4%osborne1@optonline.net>
Message-ID: <C07D2BE0.2196%mblanche@berkeley.edu>

Brian--

I checked out last week's version from CVS.

Silly question: How do I get the version of BioPerl I am using... Never had
to check a module/bundle version number before...

Marco


On 5/2/06 14:49, "Brian Osborne" <osborne1 at optonline.net> wrote:

> Marco,
> 
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
> 
> Brian O.
> 
> 

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From arareko at campus.iztacala.unam.mx  Tue May  2 18:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run 
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take 
a look into the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.
-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From heikki at sanbi.ac.za  Wed May  3 02:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

	perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
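
The same check can be done from inside a script, since every Perl package
inherits a VERSION method from UNIVERSAL. A minimal sketch follows; it uses
the core module List::Util purely as a stand-in so that it runs without
BioPerl installed, and module_version is a hypothetical helper name, not
something BioPerl provides:

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return its version.
# Returns undef if the module cannot be loaded or defines no $VERSION.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Perl -> Bio/Perl.pm
    return unless eval { require $file; 1 };
    return $module->VERSION;                  # method inherited from UNIVERSAL
}

# List::Util is only a stand-in; swap in Bio::Perl or any other module.
for my $module (qw(List::Util)) {
    my $version = module_version($module);
    print "$module: ", (defined $version ? $version : 'not available'), "\n";
}
```

Replacing List::Util with Bio::Perl in the list should report the same
number as the one-liner.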

     -Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From nuclearn at gmail.com  Wed May  3 02:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi everybody,

    I am parsing a blastx report with the BioPerl modules (Bio::SearchIO).
The blastx result was created by NCBI-BLAST. How can I obtain the strand
(+ or -) of the query sequence relative to the protein hit? I tried the
strand function, but nothing was reported. I also tried the frame function,
but the result is usually 0, 1 or 2, so it gives no information about the
query strand (+ or -).
  How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************


From cjfields at uiuc.edu  Wed May  3 09:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

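Use $hsp->strand():

use strict;
use warnings;
use Bio::SearchIO;

# Open the BLAST report named on the command line
my $parser = Bio::SearchIO->new (-file => shift @ARGV,
                                 -format => 'blast');

# Walk every result -> hit -> HSP and print the strand of each HSP
while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print $hsp->strand,"\n";
        }
    }
}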

This will give 1 or -1.
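
If what you are after is a signed, NCBI-style reading frame (+1 to +3,
-1 to -3), it can be rebuilt from the two values. A rough sketch, assuming
the frame you get back is zero-based (0, 1 or 2, as you observed) and the
strand is 1 or -1; signed_frame is a hypothetical helper, not a BioPerl
method:

```perl
use strict;
use warnings;

# Hypothetical helper: combine a strand (1 or -1) and a zero-based
# frame (0, 1 or 2) into a signed frame such as +1 or -3.
sub signed_frame {
    my ($strand, $frame) = @_;
    die "strand must be 1 or -1\n"  unless $strand == 1 || $strand == -1;
    die "frame must be 0, 1 or 2\n" unless $frame =~ /^[012]$/;
    return $strand * ($frame + 1);
}

printf "%+d\n", signed_frame( 1, 0);   # prints +1
printf "%+d\n", signed_frame(-1, 2);   # prints -3
```

Inside the loop above it would be called with the strand and frame values
taken from each HSP.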

Chris



From osborne1 at optonline.net  Wed May  3 11:22:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 11:22:27 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <C07E42F3.84E3%osborne1@optonline.net>

Mark,

So you're trying to get the information in the RC line from a Swissprot
format file?
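
If it turns out the parser does not expose that line, one fallback is to
pull the strain straight out of the RC lines of the flat file. A minimal
sketch, assuming the RC lines look like the example in your mail
(RC STRAIN=ABC123.) or carry semicolon-separated TOKEN=value pairs;
strain_from_rc is a made-up helper name and the sample lines are
illustrative, not real entries:

```perl
use strict;
use warnings;

# Hypothetical helper: return the STRAIN value from one RC line,
# or undef if the line is not an RC line or has no STRAIN token.
sub strain_from_rc {
    my ($line) = @_;
    return unless $line =~ /^RC\s+(.*)$/;
    for my $token (split /;\s*/, $1) {
        # Strip the trailing period used in the example from the mail
        return $1 if $token =~ /^STRAIN=(.+?)\.?\s*$/;
    }
    return;
}

# Illustrative input only; in practice read the lines from the flat file.
my @lines = (
    "OS Somegenus somespecies subsp. somesubspecies strain ABC123.",
    "RC STRAIN=ABC123.",
    "RC   STRAIN=XYZ9; TISSUE=Leaf;",
);

for my $line (@lines) {
    my $strain = strain_from_rc($line);
    print "strain: $strain\n" if defined $strain;
}
```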

Brian O.


On 5/2/06 7:41 AM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Hello all.
> 
> I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
> make FASTA subset files for some bacterial strains.  I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
> 
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
> 
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
> 
> I have included some code I pasted together from various pages on the
> bioperl wiki.  In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
> 
> The code I have so far reports the species but not the subspecies or
> variant.  I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need.  (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)  Also, this
> code only prints the information...  I know that I'll have to write a
> FASTA sequence object separately.
> 
> Any suggestions?
> 
> Thanks,
> Mark
> 
> ---   ---   ---
> 
> 
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> 
> my $usage = "getaccs.pl file format\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file",
>                             -format => $format );
> 
> while (my $seq = $inseq->next_seq) {
> 
>   my $species_object = $seq->species;
>   my $species_string = $species_object->species;
>   my $variant_string = $species_object->variant;
>   my $common_string = $species_object->common_name;
>   my $sub_string = $species_object->sub_species;
>   my $binomial = $species_object->binomial('FULL');
> 
>   print "display   ",$seq->display_id,"\n";
>   print "accession ",$seq->accession_number,"\n";
>   print "desc      ",$seq->desc,"\n";
> 
>   print "species   ",$species_string,"\n";
>   print "variant   ",$variant_string,"\n";
>   print "common    ",$common_string,"\n";
>   print "sub       ",$sub_string,"\n";
>   print "binomial  ",$binomial,"\n";
> 
>   print $seq->seq,"\n";
> 
>   my $anno_collection = $seq->annotation;
>   for my $key ( $anno_collection->get_all_annotation_keys ) {
>     my @annotations = $anno_collection->get_Annotations($key);
>     for my $value ( @annotations ) {
>       print "tagname : ", $value->tagname, "\n";
>       # $value is an Bio::Annotation, and has an "as_text" method
>       print "  annotation value: ", $value->as_text, "\n";
> 
>       if ($value->tagname eq "reference") {
>         my $hash_ref = $value->hash_tree;
>         for my $key (keys %{$hash_ref}) {
>           print $key,": ",$hash_ref->{$key},"\n";
>         }
>       }
>     }
>   }
>   print "\n";
> }
> exit;
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From MEC at stowers-institute.org  Wed May  3 11:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>

Marco,

It appears that your code assumes that the exons returned from the call
to Bio::DB::GFF::features are sorted by start; I don't think that is
guaranteed (at least not in the documentation I'm reading).  Also, I
think your code will not report overlap between two exons that have an
intervening overlapping exon.  Depending on what your application is,
you may care.  For example, e1, e2, e3 all intersect pairwise, but your
code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------
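
To be sure every pair is covered whatever order the exons come back in, a
nested loop over all pairs is enough. A minimal sketch in plain Perl, with
hashes standing in for the feature objects: the coordinates are read off
the diagram above, the -1 strand is an assumption for illustration, and
range_intersection and all_overlaps are hypothetical helpers, not
Bio::RangeI methods. The strand rule follows what Brian described (kept
when both agree, otherwise dropped):

```perl
use strict;
use warnings;

# Each exon is a plain hash: name, start, end, strand (1 or -1).
# Starts and ends are the 1-based columns of the stars in the diagram;
# the strand value is assumed for the example.
my @exons = (
    { name => 'e1', start => 4, end => 10, strand => -1 },
    { name => 'e2', start => 6, end => 11, strand => -1 },
    { name => 'e3', start => 7, end => 9,  strand => -1 },
);

# Return the common region of two ranges, or undef if they do not overlap.
# The strand is kept only when both inputs agree; otherwise it is set to 0.
sub range_intersection {
    my ($x, $y) = @_;
    my $start = $x->{start} > $y->{start} ? $x->{start} : $y->{start};
    my $end   = $x->{end}   < $y->{end}   ? $x->{end}   : $y->{end};
    return if $start > $end;
    my $strand = $x->{strand} == $y->{strand} ? $x->{strand} : 0;
    return { start => $start, end => $end, strand => $strand };
}

# Compare every pair exactly once; no sort order is assumed.
sub all_overlaps {
    my @ranges = @_;
    my @found;
    for my $s (0 .. $#ranges - 1) {
        for my $t ($s + 1 .. $#ranges) {
            my $common = range_intersection($ranges[$s], $ranges[$t]) or next;
            push @found, sprintf('%s/%s %d-%d strand %d',
                                 $ranges[$s]{name}, $ranges[$t]{name},
                                 $common->{start}, $common->{end},
                                 $common->{strand});
        }
    }
    return @found;
}

print "$_\n" for all_overlaps(@exons);
```

For the three exons drawn above this lists e1/e2, e1/e3 and e2/e3.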

Out of curiosity, what is your application?  Designing primers for gene
resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract overlapping
>regions from alternatively spliced exons, as in the following script. The
>object returned from 'my $overlap = $exon1->intersection($exon2);' is
>actually losing the strand of $exon1 if $exon1 is from the negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object returned by any Bio::RangeI function?
>
>Many thanks 
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use Bio::DB::GFF;
>
>MAIN:{
>
>    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                -dsn =>
>'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                -user => 'guest');
>    my $test_db = $db->segment('4');
>    
>    # Load up the exons into $exons_p
>    for my $gene ($test_db->features(-types => 'gene')){
>
>        my $exons_p = extractExons($gene);
>
>        cluster($exons_p) unless ($#{$exons_p} == -1);
>
>    }
>}
>
>sub extractExons {
>    my $gene = shift;
>    my %ex_list;
>    my @tcs = $gene->features(    -type =>'processed_transcript',
>                                    -attributes =>{Gene => 
>$gene->group});
>                   
>    for my $tc (@tcs){
>        my @exons = $tc->features (-type => 'exon',
>                                     -attributes => {Parent => 
>$tc->group}
>);        
>    
>        for (@exons){
>            my $ex_id    = $_->id;
>            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>        }
>    
>    }
>    my @values = values %ex_list;
>    return(\@values);
>}
>
>sub cluster {
>    my $exons_p = shift;
>    
>    for (my $s = 0; $s <= $#{$exons_p}; $s++){
>        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>            my $exon1 = $exons_p->[$s];
>            my $exon2 = $exons_p->[$t];
>            
>            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>            
>                my $overlap = $exon1->intersection($exon2);
>                
>                print "===\n";;
>                print "ex1\n", $exon1->seq, "\n";
>                print "ex2\n", $exon2->seq, "\n";
>                print "overlap\n", $overlap->seq, "\n";
>            }
>        }
>    }
>}
>______________________________
>Marco Blanchette, Ph.D.
>
>mblanche at uclink.berkeley.edu
>
>Donald C. Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>-- 
>
>
>


From sdavis2 at mail.nih.gov  Wed May  3 12:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>
Message-ID: <C07E5028.AF8A%sdavis2@mail.nih.gov>


On 5/3/06 11:09 AM, "Cook, Malcolm" <MEC at stowers-institute.org> wrote:

> Marco,
> 
> It appears that your code assumes that the exons returned from the call
> to Bio::DB::GFF::features are sorted by start; I don't think this is
> guaranteed (at least not in the documentation I'm reading).  Also, I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon.  Depending on what your application is,
> you may care.  For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
> 
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC table
browser or via Penn State University's Galaxy server (written in python and
downloadable) in case you want a quick solution to what I think is your
problem....

Sean


From osborne1 at optonline.net  Wed May  3 16:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID: <C07E8961.84F2%osborne1@optonline.net>

Mark,

The RC line is part of the description of a reference; I'm guessing 'RC'
stands for Reference Comment. To get the attributes of a reference,
you'll first do something like:

my $anno_collection = $seq->annotation;
my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

$references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the
Reference object is a kind of Annotation object.

Brian O.


On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Yeah.  Do you have any experience with that?
> 
> Mark
> 
> --- Brian Osborne <osborne1 at optonline.net> wrote:
> 
>> Mark,
>> 
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>> 
>> Brian O.
> 
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 


From cjfields at uiuc.edu  Wed May  3 17:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in
	Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to
Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
epost/efetch.  I didn't want to break anything too severely so you can only
use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
methods yet).  I also added tests to DB.t, a few each for protein and
nucleotide retrieval using batch mode and so far they all pass fine.  

I haven't tested the upper sequence limit for this yet to see if it's at all
comparable to just using efetch but it seems a bit faster.  The eutils
coursebook states that one should only post ~500 at a time (I think you can
get a bit higher though).

Also, at the moment it only works for GIs (NOT accessions,
which apparently epost does not accept).  If we want to continue using this
method for retrieval then we may need a workaround for accs.
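
As a rough illustration of the ~500-at-a-time advice, here is a small
sketch that splits a list of GIs into batches before each post. The
batches() helper and the GI numbers are made up for the example, and the
500 figure is only the rule of thumb mentioned above, not a limit I have
verified:

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size entries.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    # splice removes up to $size IDs from the front on each pass.
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

my @gis = (1 .. 1234);    # made-up GI numbers
for my $batch (batches(500, @gis)) {
    printf "would post %d ids (%s..%s)\n",
        scalar @$batch, $batch->[0], $batch->[-1];
}
```

Each batch would then be handed to whatever does the actual retrieval,
one request per batch.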

CJF

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Wed May  3 17:44:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 04 May 2006 07:44:48 +1000
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au>

Marco,

> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

-- 
Torsten Seemann <torsten.seemann at infotech.monash.edu.au>
Victorian Bioinformatics Consortium


From cjfields at uiuc.edu  Wed May  3 18:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 17:08:37 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented
	inBio::DB::GenBank/GenPept
In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine>
Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, May 03, 2006 4:10 PM
> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Batch retrieval partially implemented
> inBio::DB::GenBank/GenPept
> 
> Just wanted to let you guys know I have added a few bits and pieces to
> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
                     ^^^^^^^^^^^^^^^^^^^
                     Bio::DB::NCBIHelper
Fat fingers!

> epost/efetch.  I didn't want to break anything too severely so you can
> only
> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
> methods yet).  I also added tests to DB.t, a few each for protein and
> nucleotide retrieval using batch mode and so far they all pass fine.
> 
> I haven't tested the upper sequence limit for this yet to see if it's at
> all
> comparable to just using efetch but it seems a bit faster.  The eutils
> coursebook states that one should only post ~500 at a time (I think you
> can
> get a bit higher though).
> 
> Also, at the moment it only works for GIs (NOT accessions,
> which apparently epost does not accept).  If we want to continue using
> this
> method for retrieval then we may need a workaround for accs.
> 
> CJF
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 


From arareko at campus.iztacala.unam.mx  Wed May  3 18:24:23 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 03 May 2006 17:24:23 -0500
Subject: [Bioperl-l] Batch retrieval partially
	implemented	inBio::DB::GenBank/GenPept
In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine>
References: <000001c66efe$21dbcf80$15327e82@pyrimidine>
Message-ID: <44592D97.6090906@campus.iztacala.unam.mx>

hehehe :)

Chris Fields wrote:
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Wednesday, May 03, 2006 4:10 PM
>> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Batch retrieval partially implemented
>> inBio::DB::GenBank/GenPept
>>
>> Just wanted to let you guys know I have added a few bits and pieces to
>> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
>                      ^^^^^^^^^^^^^^^^^^^
>                      Bio::DB::NCBIHelper
> Fat fingers!
> 
>> epost/efetch.  I didn't want to break anything too severely so you can
>> only
>> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
>> methods yet).  I also added tests to DB.t, a few each for protein and
>> nucleotide retrieval using batch mode and so far they all pass fine.
>>
>> I haven't tested the upper sequence limit for this yet to see if it's at
>> all
>> comparable to just using efetch but it seems a bit faster.  The eutils
>> coursebook states that one should only post ~500 at a time (I think you
>> can
>> get a bit higher though).
>>
>> Also, at the moment it only works for GIs (NOT accessions,
>> which apparently epost does not accept).  If we want to continue using
>> this
>> method for retrieval then we may need a workaround for accs.
>>
>> CJF
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From fernan at iib.unsam.edu.ar  Wed May  3 20:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run 
| packages (stable & developer releases) for the FreeBSD operating system.
| 
| For instructions on how to install BioPerl ports in FreeBSD, please take 
| a look into the Getting Bioperl section of the BioPerl Wiki.
| 
+----]

Great job Mauricio,

thanks for contributing this!

Fernan

From miker at biotiquesystems.com  Tue May  2 23:31:59 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Tue, 2 May 2006 20:31:59 -0700
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
Message-ID: <007b01c66e62$23161d20$c100a8c0@mike>


I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank
files that contain CONTIG entries with gaps.  One such record is NW_925173.

When I try to parse this file using Bio::SeqIO::genbank, it will enter an
infinite loop and spin until it runs out of memory.  

I'm pretty certain it relates to this bug:
http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that
genbank records with CONTIG gaps are not valid and can't be parsed.  But this
bug actually claims to be fixed, which is strange, since looking at the code for
FTLocationFactory (where the loop is) it's still right there.  I assume that
this may be fixed in other contexts but is still not fixed in
Bio::SeqIO::genbank?  Or am I doing something wrong?

I think that this should probably be filed as an open bug.  I would think that
even if bioperl isn't interested in parsing this type of file via SeqIO,
certainly you'd want to ensure that no finite input file would send the parser
into an infinite loop.  Have others encountered this problem?  Is there any plan
to address it?

Thanks very much for any information or help!

-Mike

P.S.  I've played around with my version of FTLocationFactory and it seems to
actually work and parse the gaps.  I'm not sure if I've created other bugs or if
it works in all cases, but at least the parser doesn't die.  I also don't know
that my hacky code is appropriate for putting back in to BioPerl, but I'm happy
to provide it if someone wants to check it out and/or consider it for checkin.
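
To illustrate the "no finite input should loop forever" point, here is a
rough sketch of a tokenizer for the inside of a CONTIG join(...) that
dies if a pass consumes nothing. This is NOT the FTLocationFactory code
or my patch to it; split_join() and its regexes are made up for the
example and only handle plain acc:start..end pieces and gap(...):

```perl
use strict;
use warnings;

# Split the body of join(...) into its pieces. Every pass through the loop
# must consume at least one character; otherwise we die instead of spinning.
sub split_join {
    my ($str) = @_;
    $str =~ /^join\((.*)\)$/s or die "not a join(): $str\n";
    my $body = $1;
    my @parts;
    pos($body) = 0;
    while (pos($body) < length $body) {
        my $before = pos($body);
        # /gc keeps pos() where it was when a match fails.
        if    ($body =~ /\G(gap\((?:unk)?\d*\))/gc)  { push @parts, $1 }
        elsif ($body =~ /\G([\w.]+:\d+\.\.\d+)/gc)   { push @parts, $1 }
        elsif ($body =~ /\G,\s*/gc)                  { }
        die "cannot parse at offset $before: " . substr($body, $before, 20) . "\n"
            if pos($body) == $before;
    }
    return @parts;
}

# The second accession below is invented for the example.
print "$_\n"
    for split_join('join(AADB02014027.1:1..8541,gap(100),AADB02014028.1:1..2000)');
```

Anything the sketch does not recognise (complement(...), for instance)
makes it die with the offending offset rather than hang.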


From ULNJUJERYDIX at spammotel.com  Wed May  3 04:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with
	Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help!
I can't figure out the instructions in the docs.

I want to create an imagemap of short sequence matches with a longer one
with clickable imagemaps for the short sequences. I figure I can do this
easily enough using the example script for parsing blast output but I need
an example script to understand how to produce the html code for the
imagemap. I can find only rather cryptic references about how this can be
done (see below).

$boxes = $panel->boxes
    @boxes = $panel->boxes
    The boxes() method returns a list of arrayrefs containing the
coordinates of each glyph.  The method is useful for constructing an
image map.  In a scalar context, boxes() returns an arrayref.  In a
list context, the method returns the list directly.

    Each member of the list is an arrayref of the following format:

      [ $feature, $x1, $y1, $x2, $y2, $track ]

    The first element is the feature object; either an
Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl
Bio::SeqFeatureI object.  The coordinates are the topleft and
bottomright corners of the glyph, including any space allocated for
labels. The track is the Bio::Graphics::Glyph object corresponding to
the track that the feature is rendered inside.

    $position = $panel->track_position($track)
    After calling gd() or boxes(), you can learn the resulting Y
coordinate of a track by calling track_position() with the value
returned by add_track() or unshift_track().  This will return undef if
called before gd() or boxes() or with an invalid track.

    @pixel_coords = $panel->location2pixel(@feature_coords)
    Public routine to map feature coordinates (in base pairs) into pixel
coordinates relative to the left-hand edge of the picture. If you
define a -background callback, the callback may wish to invoke this
routine in order to translate base coordinates into pixel coordinates.

    $left = $panel->left
    $right = $panel->right
    $top   = $panel->top
    $bottom = $panel->bottom
    Return the pixel coordinates of the *drawing area* of the panel, that
is, exclusive of the padding.


got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html
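
Going only by the boxes() description quoted above, a sketch of turning
such a list into imagemap HTML might look like the following. The
imagemap_html() and escape_html() helpers, the callbacks, the sample
data, and the details.cgi URL scheme are all made up for the example;
nothing here is taken from Bio::Graphics itself:

```perl
use strict;
use warnings;

# Minimal HTML escaping for attribute values.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# $boxes: arrayref of [ $feature, $x1, $y1, $x2, $y2, $track ].
# $name_of / $url_for: callbacks, so the sketch needs no feature class.
sub imagemap_html {
    my ($map_name, $boxes, $name_of, $url_for) = @_;
    my @lines = (sprintf '<map name="%s">', escape_html($map_name));
    for my $box (@$boxes) {
        my ($feature, $x1, $y1, $x2, $y2) = @$box;
        push @lines, sprintf
            '  <area shape="rect" coords="%d,%d,%d,%d" href="%s" alt="%s" />',
            $x1, $y1, $x2, $y2,
            escape_html($url_for->($feature)),
            escape_html($name_of->($feature));
    }
    push @lines, '</map>';
    return join("\n", @lines) . "\n";
}

# Made-up data standing in for the arrayref returned by $panel->boxes.
my $boxes = [
    [ { name => 'hit1' }, 10, 20, 110, 30, undef ],
    [ { name => 'hit2' }, 50, 40, 150, 50, undef ],
];
print imagemap_html(
    'hits', $boxes,
    sub { $_[0]{name} },
    sub { "details.cgi?id=$_[0]{name}" },    # hypothetical URL scheme
);
```

With a real panel, the arrayref from boxes() in scalar context would
presumably replace $boxes, the name callback would ask the feature
object for its name, and the image tag would point at the map with
usemap="#hits".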


From s.johri at imperial.ac.uk  Thu May  4 08:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of
sequences.
The function fu_and_li_D(@ingroup,$extmutations) takes 2 args, the
first being the ingroup (population) and the second being the number of
external mutations,
which is calculated from an outgroup sequence.

My question is: which function do I use to calculate the number of
external mutations?
Would this be the singleton_count() function?
The singleton_count() function takes a PopGen object, which represents
a clustal alignment file.
Would I include the outgroup in a multiple fasta file for alignment with
clustal?

Any suggestions as to how to calculate the number of external mutations
would be much appreciated.
 
Thanks for your help!
 

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
From hlapp at gmx.net  Thu May  4 12:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <C9D4D0CB-8340-4157-A603-3935C8F581E6@gmx.net>

Infinite loop on a file you can download (i.e., as opposed to a file  
you tinkered with) is never ok. Could you file this as a bug report?  
And ideally attach your patch?

Thanks,

	-hilmar

On May 2, 2006, at 11:31 PM, Michael Rogoff wrote:

>
> I've encountered a pretty serious bug in Bio::SeqIO when parsing  
> certain genbank
> files that contain CONTIG entries with gaps.  One such record is  
> NW_925173.
>
> When I try to parse this file using Bio::SeqIO::genbank, it will  
> enter an
> infinite loop and spin until it runs out of memory.
>
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to  
> indicate that
> genbank records with CONTIG gaps are not valid and can't be  
> parsed.  But this
> bug actually claims to be fixed, which is strange, since looking at  
> the code for
> FTLocationFactory (where the loop is) it's still right there.  I  
> assume that
> this may be fixed in other contexts but is still not fixed in
> Bio::SeqIO::genbank?  Or am I doing something wrong?
>
> I think that this should probably be filed as an open bug.  I would  
> think that
> even if bioperl isn't interested in parsing this type of file via  
> SeqIO,
> certainly you'd want to ensure that no finite input file would send  
> the parser
> into an infinite loop.  Have others encountered this problem?  Is  
> there any plan
> to address it?
>
> Thanks very much for any information or help!
>
> -Mike
>
> P.S.  I've played around with my version of FTLocationFactory and  
> it seems to
> actually work and parse the gaps.  I'm not sure if I've created  
> other bugs or if
> it works in all cases, but at least the parser doesn't die.  I also  
> don't know
> that my hacky code is appropriate for putting back in to BioPerl,  
> but I'm happy
> to provide it if someone wants to check it out and/or consider it  
> for checkin.
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From saldroubi at yahoo.com  Thu May  4 13:03:00 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Thu, 4 May 2006 10:03:00 -0700 (PDT)
Subject: [Bioperl-l] Is webiste down?
Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>

All,
  
  Is the bioperl website down?  I can't get to http://www.bioperl.org 
  
  
  Thank you. 
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From arareko at campus.iztacala.unam.mx  Thu May  4 14:22:52 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 04 May 2006 13:22:52 -0500
Subject: [Bioperl-l] Is webiste down?
In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
Message-ID: <445A467C.4070700@campus.iztacala.unam.mx>

The website is OK; maybe your gateway can't look up the bioperl server at
the moment.

Regards,
Mauricio.

Sam Al-Droubi wrote:
> All,
>   
>   Is the bioperl website down?  I can't get to http://www.bioperl.org 
>   
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu May  4 14:40:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 13:40:32 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine>

Are you using the CONTIG record or the full GenBank file?  I see
problems with both (using bioperl-live) which seem unrelated to one another.
The full file seems to be running a bit slow b/c the full GenBank record is
huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
memory).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> Sent: Tuesday, May 02, 2006 10:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> 
> I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> genbank
> files that contain CONTIG entries with gaps.  One such record is
> NW_925173.
> 
> When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> infinite loop and spin until it runs out of memory.
> 
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> that
> genbank records with CONTIG gaps are not valid and can't be parsed.  But
> this
> bug actually claims to be fixed, which is strange, since looking at the
> code for
> FTLocationFactory (where the loop is) it's still right there.  I assume
> that
> this may be fixed in other contexts but is still not fixed in
> Bio::SeqIO::genbank?  Or am I doing something wrong?
> 
> I think that this should probably be filed as an open bug.  I would think
> that
> even if bioperl isn't interested in parsing this type of file via SeqIO,
> certainly you'd want to ensure that no finite input file would send the
> parser
> into an infinite loop.  Have others encountered this problem?  Is there
> any plan
> to address it?
> 
> Thanks very much for any information or help!
> 
> -Mike
> 
> P.S.  I've played around with my version of FTLocationFactory and it seems
> to
> actually work and parse the gaps.  I'm not sure if I've created other bugs
> or if
> it works in all cases, but at least the parser doesn't die.  I also don't
> know
> that my hacky code is appropriate for putting back in to BioPerl, but I'm
> happy
> to provide it if someone wants to check it out and/or consider it for
> checkin.
> 
> 
> 


From j.abbott at imperial.ac.uk  Thu May  4 11:44:44 2006
From: j.abbott at imperial.ac.uk (James Abbott)
Date: Thu, 04 May 2006 16:44:44 +0100
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or
	RC	lines
In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
	<7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
Message-ID: <445A216C.7090108@imperial.ac.uk>

Jason Stajich wrote:
> I don't know if any of this has been resolved really so hopefully  
> James will speak up if he's implemented anything.
Not as yet, I'm afraid - $job is keeping me overly busy at the moment, 
but it's on my todo list....

Cheers,
James

-- 
Dr. James Abbott <j.abbott at imperial.ac.uk>
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London


From hubert.prielinger at gmx.at  Thu May  4 15:35:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 13:35:42 -0600
Subject: [Bioperl-l] can't parse blast file anymore
Message-ID: <445A578E.8050207@gmx.at>

Hi,
the following perl script worked fine until a few days ago....

==============================================================
#!/usr/bin/perl -w

use Bio::SearchIO;
use strict;
use DBI;
use Net::MySQL;

#use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);

print "trying to connect to database \n";
my $database = 'antimicro_peptides';
my $host = 'ppc7.bio.ucalgary.ca';
my $user = 'Hubert';
my $password = 'Col00eng30';

my $mysql = Net::MySQL->new(
        hostname => $host,
        database => $database,
        user     => $user,
        password => $password,
    );
   

print "Connection established \n";

my $selectID = 0;
my $count = 0;


##output database results
#while (my @row = $sth->fetchrow_array)
#   { print "@row\n" }


print "start program\n";
my $directory = '/home/Hubert/test';
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
  if ($file =~ /txt$/)   {
      $count++;
    print "read file $file \n";
  

    $file = $directory . '/' . $file;

    my $search = new Bio::SearchIO (-format => 'blast',
                                       -file => $file);
    print "bioperl seems to work....\n";                           
    my $cutoff_len = 10;
                               
    #iterate over each query sequence
    print "try to enter while loop\n";
    while (my $result = $search->next_result) {
    print "entered 1st while loop\n";
   
      #iterate over each hit on the query sequence
      while (my $hit = $result->next_hit) {
      print "entered 2nd while loop\n";
       
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
        print "entered 3rd while loop\n";
           
          if ($hsp->length('sbjct') <= $cutoff_len) {
          #print $hsp->hit_string, "\n";
               
            for ($hsp->hit_string) {        #$hsp->hit_string
             print "count files....., $count ,\n";
.................

===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop


but it doesn't enter the first while loop; it gets stuck there. At first I 
thought it was a Linux problem, because I updated from FC4 to FC5, but it 
isn't, because perl is working fine, and it seems bioperl is working fine 
too, but it cannot parse the file anymore.....

regards
Hubert


From barry.moore at genetics.utah.edu  Thu May  4 17:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK]   can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <BD1D97AA-99BD-451C-9835-4F22A59BCFDD@genetics.utah.edu>

Hubert,

My first suggestion would be to log onto your calgary server and  
change your password real quick (unless you intended to post your  
password to the world).  Well, this isn't an answer, but it may help  
you find one.  Use perl -d your_script.pl to run your script under  
the debugger.  Type 'n' to step forward to the line where you start  
the while loop.  Type 'x $result' to see that an object exists (it  
should or you'd have gotten an error).  Type 's' to step into the  
next_result call, and then continue to type 'n' and 's' as needed to  
burrow down to see if you can find where you're hanging.
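
On the password point, one way to keep credentials out of a script you
might post is to read them from the environment. A minimal sketch, where
db_settings() and the BLASTDB_* variable names are made up for the
example:

```perl
use strict;
use warnings;

# Read connection settings from a hash of environment variables.
# The BLASTDB_* names are hypothetical; pick whatever suits your setup.
sub db_settings {
    my (%env) = @_;
    my %cfg;
    for my $key (qw(host database user password)) {
        my $var = 'BLASTDB_' . uc $key;
        defined $env{$var} && length $env{$var}
            or die "please set $var in the environment\n";
        $cfg{$key} = $env{$var};
    }
    return %cfg;
}

my %cfg = eval { db_settings(%ENV) };
print $@ ? "not configured: $@" : "connecting to $cfg{host} as $cfg{user}\n";
```

The values in %cfg could then be passed to the Net::MySQL->new call in
place of the literal strings.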

Barry

On May 4, 2006, at 1:35 PM, Hubert Prielinger wrote:

> Hi,
> the following perl script worked fine until a few days ago....
>
> ==============================================================
> #!/usr/bin/perl -w
>
> use Bio::SearchIO;
> use strict;
> use DBI;
> use Net::MySQL;
>
> #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);
>
> print "trying to connect to database \n";
> my $database = 'antimicro_peptides';
> my $host = 'ppc7.bio.ucalgary.ca';
> my $user = 'Hubert';
> my $password = 'Col00eng30';
>
> my $mysql = Net::MySQL->new(
>         hostname => $host,
>         database => $database,
>         user     => $user,
>         password => $password,
>     );
>
>
> print "Connection established \n";
>
> my $selectID = 0;
> my $count = 0;
>
>
>
> ##output database results
> #while (my @row = $sth->fetchrow_array)
> #   { print "@row\n" }
>
>
>
> print "start program\n";
> my $directory = '/home/Hubert/test';
> opendir(DIR, $directory) || die("Cannot open directory");
> print "opened directory\n";
>
> foreach my $file (readdir(DIR))  {
>   if ($file =~ /txt$/)   {
>       $count++;
>     print "read file $file \n";
>
>
>     $file = $directory . '/' . $file;
>
>     my $search = new Bio::SearchIO (-format => 'blast',
>                                        -file => $file);
>     print "bioperl seems to work....\n";
>     my $cutoff_len = 10;
>
>     #iterate over each query sequence
>     print "try to enter while loop\n";
>     while (my $result = $search->next_result) {
>     print "entered 1st while loop\n";
>
>       #iterate over each hit on the query sequence
>       while (my $hit = $result->next_hit) {
>       print "entered 2nd while loop\n";
>
>         #iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>         print "entered 3rd while loop\n";
>
>           if ($hsp->length('sbjct') <= $cutoff_len) {
>           #print $hsp->hit_string, "\n";
>
>             for ($hsp->hit_string) {        #$hsp->hit_string
>              print "count files....., $count ,\n";
> .................
>
> ===================================================================
>
> Output:
>
> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
> trying to connect to database
> Connection established
> start program
> opened directory
> read file 40026.txt
> bioperl seems to work....
> try to enter while loop
>
>
> but it doesn't enter the first while loop, it stuck there, first I
> thought it is a linux problem, because I updated from FC4 to FC5,  
> but it
> isn't because perl is working fine, and it seems bioperl is working  
> fine
> too, but it cannot parse the file anymore.....
>
> regards
> Hubert
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu May  4 18:27:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 17:27:57 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine>
Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine>

Here's another odd bit.  This is what I get for the CONTIG line when I
passed a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped out the 'join' here as well.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, May 04, 2006 1:41 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> Are you using the CONTIG record or the full GenBank file? 	I see
> problems with both (using bioperl-live) which seem unrelated to one
> another.
> The full file seems to be running a bit slow b/c the full GenBank record
> is
> huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
> memory).
> 
> Chris
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > Sent: Tuesday, May 02, 2006 10:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> >
> > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> > genbank
> > files that contain CONTIG entries with gaps.  One such record is
> > NW_925173.
> >
> > When I try to parse this file using Bio::SeqIO::genbank, it will enter
> an
> > infinite loop and spin until it runs out of memory.
> >
> > I'm pretty certain it relates to this bug:
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> > that
> > genbank records with CONTIG gaps are not valid and can't be parsed.  But
> > this
> > bug actually claims to be fixed, which is strange, since looking at the
> > code for
> > FTLocationFactory (where the loop is) it's still right there.  I assume
> > that
> > this may be fixed in other contexts but is still not fixed in
> > Bio::SeqIO::genbank?  Or am I doing something wrong?
> >
> > I think that this should probably be filed as an open bug.  I would
> think
> > that
> > even if bioperl isn't interested in parsing this type of file via SeqIO,
> > certainly you'd want to ensure that no finite input file would send the
> > parser
> > into an infinite loop.  Have others encountered this problem?  Is there
> > any plan
> > to address it?
> >
> > Thanks very much for any information or help!
> >
> > -Mike
> >
> > P.S.  I've played around with my version of FTLocationFactory and it
> seems
> > to
> > actually work and parse the gaps.  I'm not sure if I've created other
> bugs
> > or if
> > it works in all cases, but at least the parser doesn't die.  I also
> don't
> > know
> > that my hacky code is appropriate for putting back in to BioPerl, but
> I'm
> > happy
> > to provide it if someone wants to check it out and/or consider it for
> > checkin.


From hlapp at gmx.net  Thu May  4 18:39:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 18:39:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
References: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>

The two notations are equivalent and syntactically correct, or so I  
believe ... I don't think 100% verbatim preservation should be the  
goal. Or am I missing the point?
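
If the two notations are indeed treated as equivalent, a round-trip check could compare CONTIG strings after unwrapping a join() that encloses only one element. A minimal sketch, assuming that equivalence holds; normalize_contig is a hypothetical helper written for illustration, not a BioPerl function:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): unwrap join(...) when it
# encloses exactly one element, so that "join(X)" and "X" compare equal.
sub normalize_contig {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                         # ignore layout whitespace
    return $loc unless $loc =~ /^join\((.*)\)$/s;
    my $inner = $1;
    my $depth = 0;
    for my $ch (split //, $inner) {
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        return $loc if $depth < 0;                 # outer parens do not wrap everything
        return $loc if $ch eq ',' && $depth == 0;  # more than one element: keep join
    }
    return $inner;
}

my $original  = 'join(AADB02014027.1:1..8541)';   # as in the GenBank record
my $roundtrip = 'AADB02014027.1:1..8541';         # as written back by Bio::SeqIO
print normalize_contig($original) eq normalize_contig($roundtrip)
    ? "equivalent\n"
    : "different\n";
```

A join with several elements, or with gap(...) entries, is returned unchanged, so only the single-element case is considered equal to the bare location.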


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hubert.prielinger at gmx.at  Thu May  4 19:57:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 17:57:44 -0600
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A7449.1080607@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
Message-ID: <445A94F8.9000903@gmx.at>

Torsten Seemann wrote:
> Hubert
>
>> the following perl script worked fine until a few days ago....
>>
>>    #iterate over each query sequence
>>    print "try to enter while loop\n";
>>  
>>
> die "Bad BLAST report" if not defined $search;
>
>>    while (my $result = $search->next_result) {
>>    print "entered 1st while loop\n";
>>
>> Output:
>>
>> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>> try to enter while loop
>>
>> but it doesn't enter the first while loop, it stuck there, first I  
>>
> What is the value of $search before you start the WHILE loop ?
>
>


hi,
$search is defined, like

my $search = new Bio::SearchIO (-format => 'blast',
                                       -file => $file)


If I try it with the debugger, as Barry suggested, then I get the following:

 
DB<1> n
main::(Blast.pl:24):    print "Connection established \n";
  DB<1> n
Connection established
main::(Blast.pl:26):    my $selectID = 0;
  DB<1> n
main::(Blast.pl:27):    my $count = 0;
  DB<1> n
main::(Blast.pl:37):    print "start program\n";
  DB<1> n
start program
main::(Blast.pl:38):    my $directory = '/home/Hubert/test';
  DB<1> n
main::(Blast.pl:39):    opendir(DIR, $directory) || die("Cannot open 
directory");
  DB<1> n
main::(Blast.pl:40):    print "opened directory\n";
  DB<1> n
opened directory
main::(Blast.pl:42):    foreach my $file (readdir(DIR))  {
  DB<1> n
main::(Blast.pl:43):      if ($file =~ /txt$/)   {
  DB<1> n
main::(Blast.pl:44):            $count++;
  DB<1> n
main::(Blast.pl:45):        print "read file $file \n";
  DB<1> n
read file 40026.txt
main::(Blast.pl:48):        $file = $directory . '/' . $file;
  DB<1> n
main::(Blast.pl:50):        my $search = new Bio::SearchIO (-format => 
'blast',
main::(Blast.pl:51):                                                           
-file => $file);
  DB<1> n
main::(Blast.pl:52):            print "bioperl seems to work....\n";
  DB<1> s $search
main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $search;
  DB<<2>> n

  DB<2> n
bioperl seems to work....
main::(Blast.pl:53):        my $cutoff_len = 10;
  DB<2> n
main::(Blast.pl:56):        print "try to enter while loop\n";
  DB<2> n
try to enter while loop
main::(Blast.pl:57):        while (my $result = $search->next_result) {
  DB<2> s $result
main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $result;
  DB<<3>>


From torsten.seemann at infotech.monash.edu.au  Thu May  4 17:38:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 07:38:17 +1000
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <445A7449.1080607@infotech.monash.edu.au>

Hubert

>the following perl script worked fine until a few days ago....
>
>    #iterate over each query sequence
>    print "try to enter while loop\n";
>  
>
die "Bad BLAST report" if not defined $search;

>    while (my $result = $search->next_result) {
>    print "entered 1st while loop\n";
>
>Output:
>
>[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I 
>  
>
What is the value of $search before you start the WHILE loop ?


From barry.moore at genetics.utah.edu  Thu May  4 20:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result', and you should see the object dumped to
the screen.

Or use just 's' by itself on the while line, which will step you into
the next_result sub, where you can look around and watch what's
happening.

B

>   DB<2> s $result
> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
> 3:      $result;
>   DB<<3>>


From hubert.prielinger at gmx.at  Thu May  4 22:04:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 20:04:20 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
Message-ID: <445AB2A4.7020405@gmx.at>

if I do so it returns:
0 undef


Barry Moore wrote:
> That should be 'x $result', and you should see the object dumped to
> the screen.
>
> Or use just 's' by itself on the while line, which will step you into
> the next_result sub, where you can look around and watch what's
> happening.
>
> B


From torsten.seemann at infotech.monash.edu.au  Fri May  5 00:40:34 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 14:40:34 +1000
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AB2A4.7020405@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
	<445AB2A4.7020405@gmx.at>
Message-ID: <445AD742.4070408@infotech.monash.edu.au>

Hubert Prielinger wrote:
> if I do so it returns:
> 0 undef

That means the value of $search was undef.
That means that it could not parse or open the BLAST report.
I repeat the line that I put in my earlier email which you ignored.

# your line
my $search = Bio::SearchIO->new( ..... );

# then check if it was successful!
die "could not open blast report" if not defined $search;
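
Before the parser is even created, the report file itself can be checked. A rough sketch in plain Perl: blast_report_problem is a hypothetical helper, and the "first non-blank line mentions BLAST" test is only a heuristic assumption about what a text report looks like, not something BioPerl does:

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not a BioPerl function): returns an error
# message, or undef if the file looks like a plain-text BLAST report.
sub blast_report_problem {
    my ($path) = @_;
    return "no such file: $path"  unless -e $path;
    return "file is empty: $path" unless -s $path;
    open my $fh, '<', $path or return "cannot open $path: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;        # skip leading blank lines
        close $fh;
        # Heuristic: text reports usually start with the program name.
        return $line =~ /BLAST/
            ? undef
            : "first line does not mention BLAST: $path";
    }
    close $fh;
    return "file contains only blank lines: $path";
}

# Usage: perl check_report.pl /home/Hubert/test/40026.txt
my $file = shift @ARGV;
if (defined $file) {
    my $problem = blast_report_problem($file);
    die "$problem\n" if defined $problem;
    print "$file looks like a BLAST report\n";
}
```

If this passes and next_result still returns nothing, the report format itself is the next thing to look at.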

--Torsten

From jason.stajich at duke.edu  Fri May  5 09:21:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:21:38 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>

The space after the '>' is causing the problem, since we infer the ID as
everything after the '>' and BEFORE the first whitespace.  Get rid of the
space.
   $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE
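
To see why the space matters, here is a small sketch of the ID rule described above. split_header is a simplified stand-in written for illustration, not BioPerl's actual parsing code, and it assumes the header line really began with '> ' as diagnosed:

```perl
use strict;
use warnings;

# Simplified stand-in for the header parsing (not BioPerl's actual code):
# the ID is whatever follows '>' up to the first whitespace.
sub split_header {
    my ($header) = @_;
    my ($id, $desc) = $header =~ /^>(\S*)\s*(.*)$/;
    return ($id, $desc);
}

my $bad = '> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel';
my ($id_before) = split_header($bad);

(my $fixed = $bad) =~ s/^>\s+/>/;      # same substitution as the one-liner
my ($id_after, $desc_after) = split_header($fixed);

print "before: [$id_before]\n";
print "after:  [$id_after] desc: $desc_after\n";
```

With the space in place the ID comes out empty, which matches the "No sequence with name []" message in the exception.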

On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:

> contents of the input file has a single sequence:
>
>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS
> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
> ------------------------------------------
> this is the script that tries to parse it:
>
> use Bio::AlignIO;
> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>                            -file   => 'test.fasta');
> while( my $aln = $inseq->next_aln ) {
>      print "name: ", $aln->displayname;
>      print "length: ", $aln->length;
>      print "\n";
> }
>
> ------------------------------------------
> and this is the result of running that script on winxp
>
> D:\msa\NAK MUTANTS>perl parseFasta.pl
>
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name []
> STACK Bio::SimpleAlign::displayname
> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
> STACK toplevel parseFasta.pl:11
>
> --------------------------------------
> D:\msa\NAK MUTANTS>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From thoufek at pngg.org  Thu May  4 12:50:44 2006
From: thoufek at pngg.org (T.D. Houfek)
Date: Thu, 04 May 2006 12:50:44 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
Message-ID: <445A30E4.6070103@pngg.org>

Using Bioperl 1.5, having trouble with writing FASTA-style quality files 
using Bio::Seq::Quality.

I create the Bio::Seq::Quality object, giving its constructor an ID, a 
description, a nucleotide sequence, and a quality sequence. I then write 
the sequence FASTA and the quality FASTA. The description string will 
appear in the header line of the sequence FASTA, but not in the header 
line of the quality FASTA.

Can anybody help me figure out how to fix this? I've attached a sample 
script and output.

-T.D.

------------------- sample script follows 
---------------------------------------

#!/usr/bin/perl
use strict;
use Bio::Seq::Quality;
use Bio::SeqIO;

my $id = "bogus_id";
my $desc = "bogus description";
my $seq = "ATTATTATTATTATT";
my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";

my $sequal_obj = Bio::Seq::Quality->new(
-display_id => $id,
-desc => $desc,
-seq => $seq,
-qual => $qual
);

my $qualout = Bio::SeqIO->new(
-file => ">myfile.qual",
-format => 'qual'
);
my $seqout = Bio::SeqIO->new(
-file => ">myfile.seq",
-format => 'Fasta'
);

$seqout->write_seq($sequal_obj);
$qualout->write_seq($sequal_obj);


------------------ sample output follows 
---------------------------------------

tdhoufek@aether:~$ cat myfile.seq
>bogus_id bogus description
ATTATTATTATTATT
tdhoufek@aether:~$ cat myfile.qual
>bogus_id
10 20 30 10 20 30 10 20 30 10 20 30 10 20 30

--------------------------------------------------------------------------------------------------


-- 
T.D. Houfek
senior bioinformatics developer
plant nematode genetics group
north carolina state university
Email: thoufek at pngg.org
----------------------------------------------------------
use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/;
$u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom;
$t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_])
;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(-
$u=>(join"",@x))->translate()->$u;$z =~s/X/ /g;print"$z\n"


From jason.stajich at duke.edu  Fri May  5 09:27:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:27:51 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
	<B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu>

[replying to myself]

Although if you are just trying to read a sequence, not an alignment,
then you want to use Bio::SeqIO.

See the copious help on the HOWTOs page of the BioPerl website, including
a sequence and feature HOWTO and a beginner's guide.
  http://bioperl.org/wiki/HOWTOs

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From osborne1 at optonline.net  Fri May  5 10:04:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 05 May 2006 10:04:02 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
Message-ID: <C080D392.8567%osborne1@optonline.net>

T.D.,

According to the documentation,
http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks
right. What are you trying to create?

Brian O.




From cjfields at uiuc.edu  Fri May  5 10:24:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 09:24:05 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>
Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine>

I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
from the longer file Michael used as an example here (NW_925173). I believe
the CONTIG line is currently handled like a feature so I think it goes
through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is;
I think it's getting beaten up in there somehow. I may see what happens if
it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
and just glob the whole mess together as is.


Chris

...
FEATURES             Location/Qualifiers
     source          1..44976370
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
            gap(441),AADB02014318.1:1..173584,gap(676),
            AADB02014319.1:1..377558,gap(20),
            complement(AADB02014320.1:1..431263),gap(20),
            AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
            gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
            gap(4611),AADB02014325.1:1..383881,gap(20),
            complement(AADB02014326.1:1..381633),gap(1930),
            complement(AADB02014327.1:1..460053),gap(20),
            AADB02014328.1:1..4186,gap(1587),
...
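
As a rough illustration of gluing the CONTIG block together and seeing what is in it, here is a plain-Perl sketch. The sample is only the first few elements of the excerpt above, with a closing parenthesis added for illustration (the real record is much longer), and split_top_level is a hypothetical helper, not BioPerl code:

```perl
use strict;
use warnings;

# Shortened sample: the first elements of the NW_925173 CONTIG block, with a
# closing parenthesis added for illustration (the real record is much longer).
my @lines = (
    'CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,',
    '            gap(441),AADB02014318.1:1..173584,gap(676),',
    '            AADB02014319.1:1..377558)',
);

# Glue the block into one string: drop the CONTIG tag and leading whitespace.
my $contig = join '', map { my $l = $_; $l =~ s/^(?:CONTIG)?\s+//; $l } @lines;

# Hypothetical helper: split on commas that are not inside nested parentheses.
sub split_top_level {
    my ($s) = @_;
    my ($depth, $cur, @parts) = (0, '');
    for my $ch (split //, $s) {
        if ($ch eq ',' && $depth == 0) { push @parts, $cur; $cur = ''; next }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        $cur .= $ch;
    }
    push @parts, $cur if length $cur;
    return @parts;
}

my ($inner)  = $contig =~ /^join\((.*)\)$/;
my @elements = split_top_level($inner);
my @gaps     = grep { /^gap\(\d+\)$/ } @elements;
printf "%d elements, %d gaps\n", scalar @elements, scalar @gaps;
```

Because the loop visits each character once, it always terminates on a finite string, whatever mix of gap(...) and complement(...) entries it meets.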

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Thursday, May 04, 2006 5:39 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> The two notations are equivalent and syntactically correct, or so I
> believe ... I don't think 100% verbatim preservation should be the
> goal. Or am I missing the point?
> 
> On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> 
> > Here's another odd bit.  This is what I get for the CONTIG line when I
> > passed a simple contig file (NW_925062, with one join) through
> > Bio::SeqIO:
> >
> > -----------------------------------
> > ....
> > FEATURES             Location/Qualifiers
> >      source          1..8541
> >                      /db_xref="taxon:9606"
> >                      /mol_type="genomic DNA"
> >                      /chromosome="11"
> >                      /organism="Homo sapiens"
> > CONTIG      AADB02014027.1:1..8541
> >
> > //
> > -----------------------------------
> > Here's the original:
> > -----------------------------------
> > FEATURES             Location/Qualifiers
> >      source          1..8541
> >                      /organism="Homo sapiens"
> >                      /mol_type="genomic DNA"
> >                      /db_xref="taxon:9606"
> >                      /chromosome="11"
> > CONTIG      join(AADB02014027.1:1..8541)
> > //
> > -----------------------------------
> >
> > Looks like it lopped out the 'join' here as well.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Thursday, May 04, 2006 1:41 PM
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >>
> >> Are you using the CONTIG record or the full GenBank file? 	I see
> >> problems with both (using bioperl-live) which seem unrelated to one
> >> another.
> >> The full file seems to be running a bit slow b/c the full GenBank
> >> record
> >> is
> >> huge (~55 MB) but the CONTIG file does exactly what you said (runs
> >> out of
> >> memory).
> >>
> >> Chris
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> >>> Sent: Tuesday, May 02, 2006 10:32 PM
> >>> To: bioperl-l at lists.open-bio.org
> >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >>>
> >>>
> >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> >>> certain
> >>> genbank
> >>> files that contain CONTIG entries with gaps.  One such record is
> >>> NW_925173.
> >>>
> >>> When I try to parse this file using Bio::SeqIO::genbank, it will
> >>> enter
> >> an
> >>> infinite loop and spin until it runs out of memory.
> >>>
> >>> I'm pretty certain it relates to this bug:
> >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> >>> indicate
> >>> that
> >>> genbank records with CONTIG gaps are not valid and can't be
> >>> parsed.  But
> >>> this
> >>> bug actually claims to be fixed, which is strange, since looking
> >>> at the
> >>> code for
> >>> FTLocationFactory (where the loop is) it's still right there.  I
> >>> assume
> >>> that
> >>> this may be fixed in other contexts but is still not fixed in
> >>> Bio::SeqIO::genbank?  Or am I doing something wrong?
> >>>
> >>> I think that this should probably be filed as an open bug.  I would
> >> think
> >>> that
> >>> even if bioperl isn't interested in parsing this type of file via
> >>> SeqIO,
> >>> certainly you'd want to ensure that no finite input file would
> >>> send the
> >>> parser
> >>> into an infinite loop.  Have others encountered this problem?  Is
> >>> there
> >>> any plan
> >>> to address it?
> >>>
> >>> Thanks very much for any information or help!
> >>>
> >>> -Mike
> >>>
> >>> P.S.  I've played around with my version of FTLocationFactory and it
> >> seems
> >>> to
> >>> actually work and parse the gaps.  I'm not sure if I've created
> >>> other
> >> bugs
> >>> or if
> >>> it works in all cases, but at least the parser doesn't die.  I also
> >> don't
> >>> know
> >>> that my hacky code is appropriate for putting back in to BioPerl,
> >>> but
> >> I'm
> >>> happy
> >>> to provide it if someone wants to check it out and/or consider it
> >>> for
> >>> checkin.
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 


From hlapp at gmx.net  Fri May  5 10:47:50 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 5 May 2006 10:47:50 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <C080D392.8567%osborne1@optonline.net>
References: <C080D392.8567%osborne1@optonline.net>
Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net>

He wants the description on the description line, like for the  
sequence file.

Thomas, my guess is the code doesn't print the description to the  
line although I haven't made sure. Do you want to volunteer and  
check, add that print statement and post the patch?

	-hilmar

On May 5, 2006, at 10:04 AM, Brian Osborne wrote:

> T.D.,
>
> According to the documentation,
> http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file  
> looks
> right. What are you trying to create?
>
> Brian O.
>
>
> On 5/4/06 12:50 PM, "T.D. Houfek" <thoufek at pngg.org> wrote:
>
>> Using Bioperl 1.5, having trouble with writing FASTA-style quality  
>> files
>> using Bio::Seq::Quality.
>>
>> I create the Bio::Seq::Quality object, giving its constructor an  
>> ID, a
>> description, a nucleotide sequence, and a quality sequence. I then  
>> write
>> the sequence FASTA and the quality FASTA. The description string will
>> appear in the header line of the sequence FASTA, but not in the  
>> header
>> line of the quality FASTA.
>>
>> Can anybody help me figure out how to fix this? I've attached a  
>> sample
>> script and output.
>>
>> -T.D.
>>
>> ------------------- sample script follows
>> ---------------------------------------
>>
>> #!/usr/bin/perl
>> use strict;
>> use Bio::Seq::Quality;
>> use Bio::SeqIO;
>>
>> my $id = "bogus_id";
>> my $desc = "bogus description";
>> my $seq = "ATTATTATTATTATT";
>> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";
>>
>> my $sequal_obj = Bio::Seq::Quality->new(
>> -display_id => $id,
>> -desc => $desc,
>> -seq => $seq,
>> -qual => $qual
>> );
>>
>> my $qualout = Bio::SeqIO->new(
>> -file => ">myfile.qual",
>> -format => 'qual'
>> );
>> my $seqout = Bio::SeqIO->new(
>> -file => ">myfile.seq",
>> -format => 'Fasta'
>> );
>>
>> $seqout->write_seq($sequal_obj);
>> $qualout->write_seq($sequal_obj);
>>
>>
>> ------------------ sample output follows
>> ---------------------------------------
>>
>> tdhoufek at aether:~$ cat myfile.seq
>>> bogus_id bogus description
>> ATTATTATTATTATT
>> tdhoufek at aether:~$ cat myfile.qual
>>> bogus_id
>> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30
>>
>> --------------------------------------------------------------------- 
>> ---------
>> --------------------
>>
>>
>>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From dmessina at wustl.edu  Fri May  5 11:24:47 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 10:24:47 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct.

 From a cursory walk through the code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     $header .= " $desc";
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");
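
A minimal standalone sketch of the header logic proposed above, in plain
Perl so that it runs without BioPerl. The helper name build_qual_header is
hypothetical and is not part of Bio::SeqIO::qual; it only illustrates
appending a newline-stripped description when one is present.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SeqIO::qual: build a FASTA-style
# header line from an ID and an optional description.
sub build_qual_header {
    my ($id, $desc) = @_;
    my $header = $id;
    $desc = '' unless defined $desc;
    $desc =~ s/\n//g;                      # keep the header on a single line
    $header .= " $desc" if length $desc;   # append only a non-empty description
    return ">$header";
}

print build_qual_header('bogus_id', 'bogus description'), "\n";
print build_qual_header('bogus_id'), "\n";
```

Keeping the append inside the "description is present" branch avoids both
a trailing space and a warning when the object has no description.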

Dave


From dmessina at wustl.edu  Fri May  5 10:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <DCF490F7-46CC-47B7-81A7-229BCC819980@wustl.edu>

T.D.,

 From a cursory walk through your code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     $header .= " $desc";
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");

Dave


From hubert.prielinger at gmx.at  Fri May  5 14:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

hi,
I have done, as you suggested and I got the error message:

Can't call method "next_result" on an undefined value at....

Then I searched the internet and found a thread suggesting that adding
'use strict' solves the problem, but I'm already using 'use strict'.

thanks

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>   
>> if I do so it returns:
>> 0 undef
>>     
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search;
>
> --Torsten


From cjfields at uiuc.edu  Fri May  5 15:18:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:18:16 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine>

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From cjfields at uiuc.edu  Fri May  5 15:27:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:27:12 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine>

Sorry, mail got sent before I finished it!  Here I go again...

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);
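
Before reading the verbose output, it may help to rule out the simplest
cause: a path that does not point at a readable, non-empty file. A minimal
sketch, assuming the report path is passed as the first command-line
argument; report_problem is a hypothetical helper, and Bio::SearchIO is
only loaded when a path is actually given.

```perl
use strict;
use warnings;

# Hypothetical helper: describe what is wrong with the path, or return
# undef if it looks like a readable, non-empty plain file.
sub report_problem {
    my ($path) = @_;
    return "no file name given"          unless defined $path and length $path;
    return "'$path' does not exist"      unless -e $path;
    return "'$path' is not a plain file" unless -f _;
    return "'$path' is not readable"     unless -r _;
    return "'$path' is empty"            if -z _;
    return;
}

if (@ARGV) {
    my $file = shift @ARGV;
    if (my $problem = report_problem($file)) {
        die "cannot parse BLAST report: $problem\n";
    }
    require Bio::SearchIO;    # loaded lazily so the checks run without BioPerl
    my $search = Bio::SearchIO->new(-format => 'blast', -file => $file);
    die "could not open blast report '$file'\n" unless defined $search;
    while (my $result = $search->next_result) {
        print $result->query_name, "\n";
    }
}
```

If the file checks pass and the constructor still returns undef, the
problem is more likely in the script or the installation than in the path.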

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;

my @dirlist = ("/home/Hubert/test");

find (\&dir, @dirlist);

sub printdir {
    return unless /txt$/; 
    return if (-d);
    my $parser = Bio::SearchIO->new(-file => $_,
                                    -format => 'blast');	
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }
}

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From barry.moore at genetics.utah.edu  Fri May  5 15:39:37 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 5 May 2006 13:39:37 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at>
Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu>

Hubert-

If you want to send me your script and input file I'll try to have a  
look at it.

Barry

On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote:

> hi,
> I have done, as you suggested and I got the error message:
>
> Can't call method "next_result" on an undefined value at....
>
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
>
> thanks
>
> Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>
>>> if I do so it returns:
>>> 0 undef
>>>
>>
>> That means the value of $search was undef.
>> That means that it could not parse or open the BLAST report.
>> I repeat the line that I put in my earlier email which you ignored.
>>
>> # your line
>> my $search = Bio::SearchIO->new( ..... );
>>
>> # then check if it was successful!
>> die "could not open blast report" if not defined $search;
>>
>> --Torsten


From cjfields at uiuc.edu  Fri May  5 16:07:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 15:07:53 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine>
Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine>

Oops!  This is what happens when I copy and paste in a hurry.

> use File::Find;
> use Bio::SearchIO;
> 
> my @dirlist = ("/home/Hubert/test");
> 
> find (\&dir, @dirlist);
> 
> sub printdir {
  ^^^^^^^^^^^

Should be: sub dir {

>     return unless /txt$/;
>     return if (-d);
>     my $parser = Bio::SearchIO->new(-file => $_,
>                                     -format => 'blast');
>     while (my $result = $parser->next_result) {
>         while (my $hit = $result->next_hit) {
>             while (my $hsp = $hit->next_hsp) {
>                 # do stuff here
>             }
>         }
>     }
> }
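
With that correction applied, the File::Find part can be sketched on its
own. This version only collects the matching file names, so it runs
without BioPerl; the helper name find_reports is hypothetical, and taking
the directories from the command line instead of hard-coding them is an
assumption made for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the full paths of all plain files ending
# in .txt below the given directories.
sub find_reports {
    my @dirs = @_;
    my @found;
    find(
        sub {
            # File::Find has changed into the current directory, so $_ is
            # the bare file name and $File::Find::name is the full path.
            return unless -f $_;
            return unless /\.txt$/;
            push @found, $File::Find::name;
        },
        @dirs
    );
    my @sorted = sort @found;
    return @sorted;
}

if (@ARGV) {
    print "$_\n" for find_reports(@ARGV);
}
```

Each returned path could then be handed to Bio::SearchIO->new with
-format => 'blast', as in the callback quoted above.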

Hubert, if the file you are parsing looks fine (i.e. valid BLAST output),
post it and your script on Bugzilla and let us take a look.  Leave out your
password though ; >

Chris


From golharam at umdnj.edu  Fri May  5 15:58:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 05 May 2006 15:58:03 -0400
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine>
Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>

I'm not sure how applicable this is, but I've seen a problem with Perl
if the LANG environment variable contains UTF8 (e.g. LANG=en_US.UTF8).
I've changed mine to en_US and lots of perl string parsing problems went
away.

Also, what about running the bioperl tests on your installation (make
test).  What happens?


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Friday, May 05, 2006 3:18 PM
To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore


What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping
through your files and performing a task on each one, such as parsing
output.  It changes into the working directory each time; you should be
able to do something like this:

use File::Find;
use Bio::SearchIO;


Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to

> use strict and then the problem is solved.... but I'm already using 
> use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report. I 
> > repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From cjfields at uiuc.edu  Fri May  5 17:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 16:56:29 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine>
Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine>

Okay, I have changed the way the CONTIG line is handled in
Bio::SeqIO::genbank.  It was handling it as a feature; I just changed it
over to handling it as a Bio::Annotation::SimpleValue object with the value
being the entire contig section.  It seems to pass tests fine but I'm
operating off Windows and my wife's iBook went to the great desktop in the
sky (motherboard), so I can't test it there.  Pulling the file off using
Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out.
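
For anyone who still wants the individual pieces of a CONTIG line rather
than one opaque string, here is a minimal sketch of a tokenizer in plain
Perl. It is not the code in Bio::SeqIO::genbank or FTLocationFactory;
split_contig is a hypothetical name, and the grammar it accepts (accession
ranges, gap(N), complement(...), wrapped in an optional join(...)) is
assumed from the NW_925173 excerpt quoted below. The point is that every
loop iteration either consumes input or dies, so a malformed line cannot
spin forever.

```perl
use strict;
use warnings;

# Hypothetical helper: split a CONTIG value into segment and gap records.
sub split_contig {
    my ($text) = @_;
    $text =~ s/\s+//g;                  # drop line wraps and padding
    $text =~ s/^join\((.*)\)$/$1/;      # unwrap an outer join(...)
    my @parts;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\Ggap\((\d*)\)/gc) {
            push @parts, { type => 'gap', length => $1 };
        }
        elsif ($text =~ /\Gcomplement\(([\w.]+):(\d+)\.\.(\d+)\)/gc) {
            push @parts, { type => 'segment', acc => $1,
                           start => $2, end => $3, strand => -1 };
        }
        elsif ($text =~ /\G([\w.]+):(\d+)\.\.(\d+)/gc) {
            push @parts, { type => 'segment', acc => $1,
                           start => $2, end => $3, strand => 1 };
        }
        else {
            # No rule matched: stop instead of looping on the same offset.
            die "unparsable CONTIG text at offset " . pos($text) . "\n";
        }
        $text =~ /\G,/gc;               # skip an optional separator
    }
    return @parts;
}

my @parts = split_contig(
    'join(AADB02014316.1:1..1482320,gap(67),complement(AADB02014320.1:1..431263))'
);
for my $p (@parts) {
    my $line = $p->{type} eq 'gap'
        ? "gap $p->{length}"
        : "$p->{acc} $p->{start}..$p->{end} strand $p->{strand}";
    print "$line\n";
}
```

Anything the sketch does not recognise raises an error with the offset,
which is easier to report than a parser that runs out of memory.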

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 9:24 AM
> To: 'Hilmar Lapp'
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
> from the longer file Michael used as an example here (NW_925173). I
> believe
> the CONTIG line is currently handled like a feature so I think it goes
> through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix
> is;
> I think it's getting beaten up in there somehow. I may see what happens if
> it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
> and just glob the whole mess together as is.
> 
> 
> Chris
> 
> ...
> FEATURES             Location/Qualifiers
>      source          1..44976370
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
>             gap(441),AADB02014318.1:1..173584,gap(676),
>             AADB02014319.1:1..377558,gap(20),
>             complement(AADB02014320.1:1..431263),gap(20),
>             AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
>             gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
>             gap(4611),AADB02014325.1:1..383881,gap(20),
>             complement(AADB02014326.1:1..381633),gap(1930),
>             complement(AADB02014327.1:1..460053),gap(20),
>             AADB02014328.1:1..4186,gap(1587),
> ...
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> > Sent: Thursday, May 04, 2006 5:39 PM
> > To: Chris Fields
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> > The two notations are equivalent and syntactically correct, or so I
> > believe ... I don't think 100% verbatim preservation should be the
> > goal. Or am I missing the point?
> >
> > On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> >
> > > Here's another odd bit.  This is what I get for the CONTIG line when I
> > > passed a simple contig file (NW_925062, with one join) through
> > > Bio::SeqIO:
> > >
> > > -----------------------------------
> > > ....
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /db_xref="taxon:9606"
> > >                      /mol_type="genomic DNA"
> > >                      /chromosome="11"
> > >                      /organism="Homo sapiens"
> > > CONTIG      AADB02014027.1:1..8541
> > >
> > > //
> > > -----------------------------------
> > > Here's the original:
> > > -----------------------------------
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /organism="Homo sapiens"
> > >                      /mol_type="genomic DNA"
> > >                      /db_xref="taxon:9606"
> > >                      /chromosome="11"
> > > CONTIG      join(AADB02014027.1:1..8541)
> > > //
> > > -----------------------------------
> > >
> > > Looks like it lopped out the 'join' here as well.
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > >> Sent: Thursday, May 04, 2006 1:41 PM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>
> > >> Are you using the CONTIG record or the full GenBank file?  I see
> > >> problems with both (using bioperl-live) which seem unrelated to one
> > >> another.
> > >> The full file seems to be running a bit slow b/c the full GenBank
> > >> record
> > >> is
> > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs
> > >> out of
> > >> memory).
> > >>
> > >> Chris
> > >>
> > >>> -----Original Message-----
> > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > >>> Sent: Tuesday, May 02, 2006 10:32 PM
> > >>> To: bioperl-l at lists.open-bio.org
> > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>>
> > >>>
> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> > >>> certain
> > >>> genbank
> > >>> files that contain CONTIG entries with gaps.  One such record is
> > >>> NW_925173.
> > >>>
> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will
> > >>> enter
> > >> an
> > >>> infinite loop and spin until it runs out of memory.
> > >>>
> > >>> I'm pretty certain it relates to this bug:
> > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> > >>> indicate
> > >>> that
> > >>> genbank records with CONTIG gaps are not valid and can't be
> > >>> parsed.  But
> > >>> this
> > >>> bug actually claims to be fixed, which is strange, since looking
> > >>> at the
> > >>> code for
> > >>> FTLocationFactory (where the loop is) it's still right there.  I
> > >>> assume
> > >>> that
> > >>> this may be fixed in other contexts but is still not fixed in
> > >>> Bio::SeqIO::genbank?  Or am I doing something wrong?
> > >>>
> > >>> I think that this should probably be filed as an open bug.  I would
> > >> think
> > >>> that
> > >>> even if bioperl isn't interested in parsing this type of file via
> > >>> SeqIO,
> > >>> certainly you'd want to ensure that no finite input file would
> > >>> send the
> > >>> parser
> > >>> into an infinite loop.  Have others encountered this problem?  Is
> > >>> there
> > >>> any plan
> > >>> to address it?
> > >>>
> > >>> Thanks very much for any information or help!
> > >>>
> > >>> -Mike
> > >>>
> > >>> P.S.  I've played around with my version of FTLocationFactory and it
> > >> seems
> > >>> to
> > >>> actually work and parse the gaps.  I'm not sure if I've created
> > >>> other
> > >> bugs
> > >>> or if
> > >>> it works in all cases, but at least the parser doesn't die.  I also
> > >> don't
> > >>> know
> > >>> that my hacky code is appropriate for putting back in to BioPerl,
> > >>> but
> > >> I'm
> > >>> happy
> > >>> to provide it if someone wants to check it out and/or consider it
> > >>> for
> > >>> checkin.
> > >>>
> > >>>
> > >>>
> > >
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================


From hubert.prielinger at gmx.at  Fri May  5 19:54:55 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 17:54:55 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
Message-ID: <445BE5CF.2000007@gmx.at>

hi ryan,
nothing happened when I added the verbose flag.
How can I test my bioperl installation?


Ryan Golhar wrote:
> I'm not sure how applicable this is, but I've seen a problem with Perl
> if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8).
> I've changed mine to en_US and lots of perl string parsing problems went
> away.
>
> Also, what about running the bioperl tests on your installation (make
> test).  What happens?
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 3:18 PM
> To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>
>
> What happens if you add the verbose flag?
>
> my $search = new Bio::SearchIO (-verbose => 1,
>                                 -format => 'blast',
>                                 -file => $file);
>
> Added thought : you might want to look at File::Find for stepping
> through your files and performing a task on each one, such as parsing
> output.  It changes into the working directory each time; you should be
> able to do something like this:
>
> use File::Find;
> use Bio::SearchIO;
>
>
>
>
> Original Message-----
>   
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Friday, May 05, 2006 1:30 PM
>> To: Torsten Seemann; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>>
>> hi,
>> I have done, as you suggested and I got the error message:
>>
>> Can't call method "next_result" on an undefined value at....
>>
>> then I looked up at the internet and found a thread which suggested to
>>     
>
>   
>> use strict and then the problem is solved.... but I'm already using 
>> use strict..
>>
>> thanks
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>
>>>       
>>>> if I do so it returns:
>>>> 0 undef
>>>>
>>>>         
>>> That means the value of $search was undef.
>>> That means that it could not parse or open the BLAST report. I 
>>> repeat the line that I put in my earlier email which you ignored.
>>>
>>> # your line
>>> my $search = Bio::SearchIO->new( ..... );
>>>
>>> # then check if it was successful!
>>> die "could not open blast report" if not defined $search;
>>>
>>> --Torsten


From hubert.prielinger at gmx.at  Fri May  5 20:01:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 18:01:11 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <445BE747.5020202@gmx.at>

hi
I have posted my script and the blast file to bugzilla......

From hubert.prielinger at gmx.at  Fri May  5 21:21:33 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 19:21:33 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BE747.5020202@gmx.at>
References: <445BE747.5020202@gmx.at>
Message-ID: <445BFA1D.5060008@gmx.at>

The bugzilla posting didn't work. What is the exact email address for
bugzilla?

Hubert Prielinger wrote:
> hi
> I have posted my script and the blast file to bugzilla......


From cjfields at uiuc.edu  Fri May  5 21:38:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 20:38:47 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BFA1D.5060008@gmx.at>
Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine>

Hubert,

Calm down.  Breathe in, breathe out.  Relax.......

Okay, here is the place to start.  Read the instructions there first.

http://www.bioperl.org/wiki/Bugs

Bugs are reported at this site:

http://bugzilla.bioperl.org/

Again, follow the instructions.  You will have to create a user name and
password to submit.  Once that is set up, click the "Submit a new bug" link
on the main bugzilla page.  On that page, fill out all information first and
a description of the error and hit 'commit'.   Add the BLAST report and some
sample script by clicking on the "Create a New Attachment" link (you'll have
to do this for each file).  Once you go back to the bug page you should see
two attachments and the bug report.  Any commits get sent through the
bioperl-guts-l mail list which most developers subscribe to, so they'll know
there's a new bug out there.  

I will not be able to get to it personally; our home computer died a slow
painful death today (RIP 2002-2006) but I can get to it next week.  If you
post the bug, somebody might be able to get to it sooner!

Chris



From cjfields at uiuc.edu  Fri May  5 22:26:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 21:26:35 -0500
Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files)
Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine>

I committed a change to NCBIHelper that permits the downloading of CON
(contig) files and corrects an issue where no sequence features were saved
when rebuilding those files.  If you use Bio::DB::GenBank regularly to
download genome files, this likely will NOT affect your code unless you
explicitly set the format type to 'genbank', like so:

$factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank'

I believe most will not have that setting since the default was already
'gb'.  Now, the default is 'gbwithparts', which returns the full sequence
regardless.  If it is a file with a CONTIG line, the sequence is built on
NCBI's end and will include seq features if they are present.  As Brian
said, we'll let NCBI do the work for us!  

If you need the actual file w/o sequence, then you can set the format to
'genbank' (like above) and it will grab it for you.  There was an unrelated
problem with CONTIG line parsing that I also fixed, where I changed the
format over to a Bio::Annotation::SimpleValue as a workaround for now; for
some reason some CON files were misparsed and resulted in infinite loops or
missing 'join' statements.  

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Sat May  6 18:22:05 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 06 May 2006 16:22:05 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
Message-ID: <445D218D.2030504@gmx.at>

ok, thanks
I have submitted the bug
bug #1994




From torsten.seemann at infotech.monash.edu.au  Sat May  6 20:57:14 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 07 May 2006 10:57:14 +1000
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D218D.2030504@gmx.at>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at>
Message-ID: <445D45EA.8020804@infotech.monash.edu.au>

Hubert Prielinger wrote:
> ok, thanks
> I have submitted the bug
> bug #1994

This is a line from the script you sent to Bugzilla:

my $search = new Bio::SearchIO (
-verbose => 1,-format => 'blast', -file => $file)
or die "could not open blast report" if not defined my $search;

Although syntactically correct, I don't think it is doing what you want.
Please change it to this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die 
"could not open blast report";

or alternatively, this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
if (not defined $search) {
   die "could not open blast report";
}

and let us know what happens.

All the example output you have supplied still suggests that Bio::SearchIO 
cannot load or parse your blast report.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia

From mamillerpa at yahoo.com  Sat May  6 19:07:30 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Sat, 6 May 2006 16:07:30 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <C07E8961.84F2%osborne1@optonline.net>
Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com>

Thanks for your responses, Jason and Brian.

Brian, your suggestion works great.  I had really hoped that by parsing
the OS line as well, I could be sure I wasn't missing any sequences
from my organisms.  Well, I gave up on that and just obtained the NCBI
taxonomy values.  I find it pretty easy to work with them in bioperl. 
Unfortunately, walking through all of Trembl takes a while, and I'm
getting this error:

  Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line
55, <GEN0> line 3253682.

When I try to extract annotations, etc., from entries like:

  DHE4_UNKP

with:

  my $species_object = $seq->species;
  my $taxid_string = $species_object->ncbi_taxid;

I guess I have to write an error handler for incomplete taxonomy
values.
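
A handler like that could be as small as the sketch below: check that
$seq->species returned an object before calling ncbi_taxid on it.  The
helper name taxid_or_undef and the file name taken from the command line
are made up for illustration, and the 'swiss' format is an assumption
based on the TrEMBL input; this has not been run against the full file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the NCBI taxid for a sequence object, or
# undef when the entry carries no usable species/taxonomy information.
sub taxid_or_undef {
    my ($seq) = @_;
    my $species = $seq->species;          # apparently undef for some entries
    return undef unless defined $species;
    return $species->ncbi_taxid;          # may itself be undef
}

# The BioPerl part only runs when a file name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss');
    while (my $seq = $in->next_seq) {
        my $taxid = taxid_or_undef($seq);
        unless (defined $taxid) {
            warn "skipping ", $seq->display_id, ": no taxonomy id\n";
            next;
        }
        print $seq->display_id, "\t$taxid\n";
    }
}
```

Entries without a species object are reported with a warning and
skipped, so the walk through the file continues instead of dying.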

Bye for now,
Mark


--- Brian Osborne <osborne1 at optonline.net> wrote:

> Mark,
> 
> The RC line is part of the description of a reference, I'm guessing
> 'RC'
> stands for Reference Comment. In order to get the attributes of a
> reference
> you'll first do something like:
> 
> my $anno_collection = $seq->annotation;
> my @references = $anno_collection->get_Annotations('reference');
> 
> To get the comment field for a specific reference you can do:
> 
> $references[0]->comment;
> 
> See the Feature-Annotation HOWTO for more information on Annotations,
> the
> Reference object is a kind of Annotation object.
> 
> Brian O.
> 
> 
> On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:
> 
> > Yeah.  Do you have any experience with that?
> > 
> > Mark
> > 
> > --- Brian Osborne <osborne1 at optonline.net> wrote:
> > 
> >> Mark,
> >> 
> >> So you're trying to get the information in the RC line from a
> >> Swissprot
> >> format file?
> >> 
> >> Brian O.
> > 
> > 
> > ---   ---   ---   ---   ---   ---   ---   ---
> > 
> > Mark A. Miller
> > 
> 
> 
> 


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller


From cjfields at uiuc.edu  Sat May  6 23:33:40 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sat, 6 May 2006 22:33:40 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>

The -verbose flag was my suggestion; it should output a ton of debugging info 
from SearchIO::blast; if you see anything there, then it means that it's at least 
attempting to parse the report.  

Of course I can't test this myself at the moment since my wife's computer died 
(along with the bioperl setup); I'm using a loaner computer at the moment.

Chris



From chen_li3 at yahoo.com  Sun May  7 03:34:55 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 00:34:55 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>

Hi all,

I use Bio::Tools::Run::Primer3 to design PCR primers.
I want to change some default values, for example, to
increase the PCR product size to 490-510 bp instead of
using the default value of 100-300 bp. What should I
do ?  


Thanks,

Li


From jason.stajich at duke.edu  Sun May  7 16:49:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 16:49:29 -0400
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
Message-ID: <F69897D1-C65F-47F3-8324-EC2E52A2ACCD@duke.edu>

The problem is in how SearchIO was being initialized; the code  
basically looked like this:

  my $x = new Foo() or die if not defined my $x;

which is invalid for two reasons.
  1) if not defined my $x;
  The second 'my $x' declares a brand-new, undefined variable, so
  'defined my $x' will ALWAYS be false and the test tells you nothing
  about the object that was just created.

  2) my $x = new Foo() or die ;
  Will cast the new object as a boolean.

Whenever things aren't working, take a look at the code and try to  
walk through any shortcuts.  For clarity make it a two-step process:

  my $x = new Foo();
  die "no valid \$x" unless defined $x;

Please note that currently BioPerl WILL die (via throw) if you try  
and ask for an invalid file when you initialize a new IO object  --  
this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm)  
which all the IO objects use, so you don't really need to do a test  
on the object after all.
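
If you do want your own message rather than the stack trace from throw,
something along these lines should work.  This is only a sketch: Foo is
a stand-in class invented for the example (its constructor dies when the
file cannot be opened, roughly the way the IO objects behave), and
open_or_report is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Foo is a hypothetical stand-in for an IO class such as Bio::SearchIO:
# its constructor dies when the file cannot be opened.
package Foo;
sub new {
    my ($class, %args) = @_;
    my $file = $args{'-file'};
    open(my $fh, '<', $file) or die "Could not open $file: $!\n";
    return bless { fh => $fh }, $class;
}

package main;

# Step 1: construct inside eval so a thrown exception is caught.
# Step 2: test the one and only $x afterwards.
sub open_or_report {
    my ($file) = @_;
    my $x = eval { Foo->new(-file => $file) };
    unless (defined $x) {
        warn "no valid object for '$file': $@";
        return;
    }
    return $x;
}

# An existing (temporary) file and a path that should not exist.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
my $ok  = open_or_report($tmp_name);
my $bad = open_or_report('/no/such/blast.out');   # warns, returns nothing

print "first:  ", (defined $ok  ? "ok" : "failed"), "\n";
print "second: ", (defined $bad ? "ok" : "failed"), "\n";
```

The point is only that there is a single $x, assigned in one statement
and tested in the next, so the test looks at the object you just made.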

--jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sun May  7 17:01:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 17:01:29 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
Message-ID: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>

I put up some info on the wiki (and I encourage other people to do  
the same!)
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3

Set the command line parameters by just calling a function of the  
name of the parameter.  To get a list of the available options, this  
perl code will report it to you:

# what are the arguments, and what do they mean?
   my $args = $primer3->arguments;

   print "ARGUMENT\tMEANING\n";
   foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"}

The info for PRODUCT_SIZE_RANGE is:
   (size range list, default 100-300) space separated list of product  
sizes eg <a>-<b> <x>-<y>

I believe you can set the PCR product size with
   $primer3->primer_product_size_range("490-510");

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sun May  7 21:18:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 18:18:17 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>
Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>

Hi Jason,

I added the line
$primer3->primer_product_size_range("490-510");
to my script, but it has no effect, and primer3 does not
complain about it either.

Li


From hubert.prielinger at gmx.at  Sun May  7 21:41:14 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 07 May 2006 19:41:14 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au>
Message-ID: <445EA1BA.9050301@gmx.at>

hi,
I have corrected that, and now I finally got a few error messages:

blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new generation of
blast.pm: unrecognized line protein database search programs", Nucleic Acids Res. 25:3389-3402.
blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1

After that line it hangs without terminating.




From cjfields at uiuc.edu  Sun May  7 22:04:13 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 7 May 2006 21:04:13 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>

These are debugging lines (not errors); you still have the -verbose flag set.  

Did you follow Jason's advice?  I believe he's right on the money about the issue 
at hand...

Chris



From jason.stajich at duke.edu  Sun May  7 22:47:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 22:47:00 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu>

I'm not familiar with the module beyond what the documentation says.  
Did you try using the add_targets method to add arguments instead?  
I had thought the AUTOLOAD method took care of access to the cmd line  
arguments as it does for the other Run modules, but I am not really  
sure.  Perhaps folks on the list who use this module can provide  
better advice.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 10:49:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 10:49:22 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <C084D2B2.85D7%osborne1@optonline.net>

Li,

Read the documentation, Bio::Tools::Run::Primer3. It shows examples of the
correct syntax. Also look at bioperl-run/t/Primer3.t.

Brian O.




From roy at colibase.bham.ac.uk  Mon May  8 07:12:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 08 May 2006 12:12:49 +0100
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <445F27B1.40501@colibase.bham.ac.uk>

Hi Li,

I think the syntax you need is:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE.

Incidentally, such a restricted product size range may mean that Primer3 
is unable to design any suitable primers. If I recall correctly, this 
doesn't cause an error, you just get a Bio::Tools::Primer3 object with 
no primers in it. I have had some success with testing for this, and if 
necessary relaxing some constraints on primer design and re-running 
Primer3.
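
A rough sketch of that test-and-relax loop follows.  The helper
widen_range, the 20 bp step and the limit of three attempts are all
invented for illustration, and it assumes primer3_core can be found
without giving -path; I have not checked it against a real run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: widen a "min-max" product size range by $step bp
# on each side, e.g. "490-510" with a step of 20 becomes "470-530".
sub widen_range {
    my ($range, $step) = @_;
    my ($min, $max) = $range =~ /^(\d+)-(\d+)$/
        or die "bad size range '$range'\n";
    $min -= $step;
    $min = 1 if $min < 1;
    return $min . '-' . ($max + $step);
}

# The BioPerl part only runs when a FASTA file is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    require Bio::Tools::Run::Primer3;

    my $seq   = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta')->next_seq;
    my $range = '490-510';

    for my $attempt (1 .. 3) {
        # A fresh object per attempt, so no settings carry over.
        my $primer3 = Bio::Tools::Run::Primer3->new(-seq     => $seq,
                                                    -outfile => 'temp.out');
        $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE => $range);
        my $results = $primer3->run;
        if ($results->number_of_results > 0) {
            print "range $range: ", $results->number_of_results,
                  " primer pair(s)\n";
            last;
        }
        warn "no primers for range $range, relaxing\n";
        $range = widen_range($range, 20);
    }
}
```

Other constraints (PRIMER_MIN_TM, PRIMER_MAX_TM and so on) could be
relaxed in the same loop by passing them to add_targets as well.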

Hope this helps.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk


From chen_li3 at yahoo.com  Mon May  8 09:21:54 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 06:21:54 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk>
Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com>

I think Dr. Chaudhuri is correct. 

I added the following lines to my script (copied from
the documentation):

$primer3->add_targets(
PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

$primer3->add_targets('PRIMER_MIN_TM'=>60,
'PRIMER_MAX_TM'=>64);

to design primers with a product size of 490-510 bp
and a primer annealing Tm of 60 to 64 C.

Here is part of the output in the file called
temp.out:

.......... original sequence.....
GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT
.................

PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_PAIR_PENALTY=0.1544
PRIMER_LEFT_PENALTY=0.081468
PRIMER_RIGHT_PENALTY=0.072951
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
...............................
PRIMER_PRODUCT_SIZE=501

..............

This is what I want. If you don't set special
parameters such as the annealing Tm, the program uses
the default ones. If you set your own parameters, they
show up after the sequence (see the output example
above).

If you need to set more parameters and want to know
which ones are available, browse the BEGIN section of
the module code.

Now I have another question: the program always prints
the original sequence at the beginning of the output.
Is it possible to turn that off?

Thanks to all for joining this topic,

Li 


--- Roy Chaudhuri <roy at colibase.bham.ac.uk> wrote:

> Hi Li,
> 
> I think the syntax you need is:
> 
> $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
> 
> I guess you may also need to change the parameter
> PRIMER_PRODUCT_OPT_SIZE.
> 
> Incidentally, such a restricted product size range
> may mean that Primer3 
> is unable to design any suitable primers. If I
> recall correctly, this 
> doesn't cause an error, you just get a
> Bio::Tools::Primer3 object with 
> no primers in it. I have had some success with
> testing for this, and if 
> necessary relaxing some constraints on primer design
> and re-running 
> Primer3.
> 
> Hope this helps.
> Roy.
> 
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
> 
> http://xbase.bham.ac.uk
> 
> 



From hubert.prielinger at gmx.at  Mon May  8 15:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,
I have solved the problem. I am parsing BLAST 2.2.13 output, and I had 
installed an early BioPerl 1.5.1 in which bug 1934 wasn't fixed yet. 
I replaced the blast.pm file and now it works properly.

thank you very much
Hubert

Christopher Fields wrote:
> These are debugging lines (not errors); you still have the -verbose flag set.  
>
> Did you follow Jason's advice?  I believe he's right on the money about the issue 
> at hand...
>
> Chris
>
> ---- Original message ----
>   
>> Date: Sun, 07 May 2006 19:41:14 -0600
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>> Subject: Re: [Bioperl-l] [BULK]  can't parse blast file anymore  
>> To: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>, bioperl-
>>     
> l at bioperl.org, Chris Fields <cjfields at uiuc.edu>, Jason Stajich 
> <jason.stajich at duke.edu>
>   
>> hi,
>> I have corrected that and now I finally I got a few error messages:
>>
>> blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
>> Madden, Alejandro A. Schäffer,
>> blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
>> David J. Lipman
>> blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
>> generation of
>> blast.pm: unrecognized line protein database search programs", Nucleic 
>> Acids Res. 25:3389-3402.
>> blast.pm: unrecognized line RID: 
>>     
> 1137529800-24476-151611170370.BLASTQ1
>   
>> after that line it stops without terminating....
>>
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>       
>>>> ok, thanks
>>>> I have submitted the bug
>>>> bug #1994
>>>>         
>>> This is a line from the script you sent to Bugzilla:
>>>
>>> my $search = new Bio::SearchIO (
>>> -verbose => 1,-format => 'blast', -file => $file)
>>> or die "could not open blast report" if not defined my $search;
>>>
>>> Although syntactically correct, I don't think it is doing what you want.
>>> Please change it to this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or 
>>> die "could not open blast report";
>>>
>>> or alternatively, this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>>> if (not defined $search) {
>>>   die "could not open blast report";
>>> }
>>>
>>> and let us know what happens.
>>>
>>> all the example output you have supplied still suggests that 
>>> Bio::SearchIO can not load or parse your blast report.
>>>
>>>       
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From s.johri at imperial.ac.uk  Mon May  8 11:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,
 
I'm trying to use codeml from PAML to estimate Ka and Ks values for
sequences in a multi-FASTA file, using the code that has been posted
on the BioPerl wiki.
 
However, when I run the code, I get the errors shown below.
 
I did a Google search to see whether anyone had come across similar
problems. In those cases the problem seems to have been that the
sequence lengths were not a multiple of 3. In my code I check whether
each sequence length is a multiple of 3 and, if not, I alter the
sequence until it is, but I still get the same error messages.
 
Any suggestions as to why this could be happening?
 
Thanks!!!
 
Saurabh Johri
Tuberculosis Research Group
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------
 
------------- EXCEPTION Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output
STACK Bio::Tools::Phylo::PAML::_parse_summary
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
STACK Bio::Tools::Phylo::PAML::next_result
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
------------------------------------
 
>Rv3923c
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_cdc1551
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>Rv3923c_mtb_f11
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_c1
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
 
------------------------------------


From chen_li3 at yahoo.com  Mon May  8 20:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers
for one sequence:

#!/cygdrive/c/Perl/bin/perl.exe

use warnings;
use strict;

use Bio::Tools::Run::Primer3;
use Bio::SeqIO;

my $file_in  = 'piwil2.fa';
my $file_out = 'temp.out';
my $seqio    = Bio::SeqIO->new(-file => $file_in);

# design primers for the first sequence in the file only
my $seq     = $seqio->next_seq;
my $primer3 = Bio::Tools::Run::Primer3->new(
    -seq     => $seq,
    -outfile => $file_out,
    -path    => "c:/Perl/local/primer3_1.0.0/src/primer3.exe",
);

unless ($primer3->executable) {
    print "primer3 can not be found. Is it installed?\n";
    exit(-1);
}

# set your own parameters for the primers or product
$primer3->add_targets(
    'PRIMER_OPT_GC_PERCENT' => '50',
    'PRIMER_OPT_SIZE'       => '24',
    'PRIMER_OPT_TM'         => '60',
);

my $result = $primer3->run;

exit;

I tried to modify it for multiple sequences by using a
while loop as follows:

while (my $seq = $seqio->next_seq) {
    my $primer3 = Bio::Tools::Run::Primer3->new();
    # design the primer
    ....
}

I get primers only for the last sequence. It seems the
earlier ones are overwritten.

Any ideas will be highly appreciated.

Li



From jason.stajich at duke.edu  Mon May  8 20:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for differences in length, so
there aren't going to be any interesting values from PAML, but maybe
you are just providing an example?
b) I think you are missing the trailing gaps in the alignment of the
Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned
sequences as input.
c) The sequences, in the reading frame you have provided (and using
the standard translation table), have stop codons in them; this will
cause failure as well.
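
A rough pre-flight check for points b) and c), in plain Perl with no BioPerl calls. It is only a sketch: the sub name codeml_input_problems is invented here, it reads frame 1 only, and it assumes the standard stop codons TAA, TAG and TGA.

```perl
use strict;
use warnings;

# Return a list of problems likely to upset codeml.
# $seqs is a hash reference mapping sequence id => nucleotide string.
sub codeml_input_problems {
    my ($seqs) = @_;
    my @problems;

    # Aligned sequences must all have the same length.
    my %lengths;
    $lengths{ length($seqs->{$_}) }++ for keys %$seqs;
    push @problems, 'sequences differ in length (not aligned)'
        if keys(%lengths) > 1;

    for my $id (sort keys %$seqs) {
        my $dna = uc $seqs->{$id};
        push @problems, "$id: length is not a multiple of 3"
            if length($dna) % 3;

        # Walk through frame 1 codon by codon and report any stop codon.
        my $pos = 0;
        for my $codon ($dna =~ /(.{3})/g) {
            $pos++;
            push @problems, "$id: stop codon $codon at codon $pos"
                if $codon =~ /^(?:TAA|TAG|TGA)$/;
        }
    }
    return @problems;
}

# Toy input, not the sequences from this thread.
my @problems = codeml_input_problems({
    seq1 => 'atgaaatga',
    seq2 => 'atgaaa',
});
print "$_\n" for @problems;
```

An empty list means none of these three checks fired; it does not guarantee that codeml will accept the file.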

Which code from the wiki are you running, the 'running PAML' part of  
the HOWTO?

Try looking at the actual output from PAML to figure out what is wrong.
Add this when initializing the Run object:
-save_tempfiles => 1,
-verbose => 1,

then open up the tempdir that is reported and look at the output  
files (mlc file).

-jason

On May 8, 2006, at 11:38 AM, Johri, Saurabh wrote:

> Hi all,
>
> I'm trying to use codeml from PAML to estimate Ka, Ks values from
> sequences within a multi fasta file:
> i'm using the code which has been posted on the bioperl wiki...
>
> However, when I run the code, i get the following errors:
>
> I did a google search to see if anyone had come across similar
> problems.... in which case the problem seems to have been due to the
> sequences not being a multiple of 3,
> In my code I check if the sequence is a multiple of 3 and if  not, i
> alter the sequences until this is the case, although I still have the
> same error messages,
>
> Any suggestions as to why this could be happening?
>
> Thanks!!!
>
> Saurabh Johri
> Tuberculosis Research Group
> Centre for Molecular Microbiology & Infection
> Imperial College London
> SW7 2AZ
>
>
>
>
> -------------------- WARNING ---------------------
> MSG: There was an error - see error_string for the program output
> ---------------------------------------------------
>
> ------------- EXCEPTION Bio::Root::NotImplemented -------------
> MSG: Unknown format of PAML output
> STACK Bio::Tools::Phylo::PAML::_parse_summary
> /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
> STACK Bio::Tools::Phylo::PAML::next_result
> /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
> ------------------------------------
>
>> Rv3923c
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_cdc1551
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>> Rv3923c_mtb_f11
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_c1
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_210
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mbovis
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>
> ------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 21:17:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 21:17:22 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>
Message-ID: <C08565E2.85FD%osborne1@optonline.net>

Li,

If you're analyzing multiple input sequences you're going to have to create
multiple output files.

Brian O.


On 5/8/06 8:21 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> I get primers only for the last sequence. It seems the
> earlier ones are overwritten.


From WiersmaP at AGR.GC.CA  Mon May  8 21:28:27 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Mon, 8 May 2006 21:28:27 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca>

Hi Li,

 
When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it
opens -outfile=>"filename" for writing and then closes.  That's why
putting it in a loop will overwrite your output file each time so you
only see the last one.  I suppose you could read in each output file
before looping to the next seq and append it to another file.
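
A minimal sketch of that append step, in plain Perl. The sub name append_file is invented for illustration, and so is the combined file name mentioned below.

```perl
use strict;
use warnings;

# Append the whole of $src to the end of $dest, creating $dest if needed.
sub append_file {
    my ($src, $dest) = @_;
    open my $in,  '<',  $src  or die "Cannot read $src: $!";
    open my $out, '>>', $dest or die "Cannot append to $dest: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Cannot close $dest: $!";
    return;
}
```

Inside the loop, right after $primer3->run, something like append_file('temp.out', 'all_primers.out') would then keep every result, where all_primers.out is just a placeholder name.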

 
If you're doing a fair bit of work with this module it would be worth
looking at the Bio::Tools::Primer3 module.  The statement $result =
$primer3->run produces a Bio::Tools::Primer3 object which has all the
methods you need for customizing your output.

 
Paul

 
Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC

wiersmap at agr.gc.ca

 
From simon_sask at yahoo.com  Tue May  9 04:06:04 2006
From: simon_sask at yahoo.com (Simon K. Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is
supposed to show the raw alignments using
Bio::SearchIO.  The script is written to parse a
PSI-BLAST report.  I found an old email in the archive
from Jason stating that this should parse other
flavors of blast reports as well.  

What do I need to do to make this script parse non-PSI
blast reports?  I tried to just specify a file and
that the -format is 'blast', but I get an error
stating that the object method 'raw_hit_data' is not
defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because
I'd like to get the size of the gaps, not just the
number.

Any help will be much appreciated.
Many thanks



From cjfields at uiuc.edu  Tue May  9 08:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <fe65cab2.b5b5696a.81acb00@expms6.cites.uiuc.edu>

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Chris

---- Original message ----
>Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
>From: "Simon K. Chan" <simon_sask at yahoo.com>  
>Subject: [Bioperl-l] Raw Blast Alignment  
>To: bioperl-l at lists.open-bio.org
>
>Hi Fellow Bioperl-ers,
>
>bioperl-live/examples/searchio/rawwriter.pl is
>supposed to show the raw alignments using
>Bio::SearchIO.  The script is written to parse a
>PSI-BLAST report.  I found an old email in the archive
>from Jason stating that this should parse other
>flavors of blast reports as well.  
>
>What do I need to do to make this script parse non-PSI
>blast reports?  I tried to just specify a file and
>that the -format is 'blast', but I get an error
>stating that the object method 'raw_hit_data' is not
>defined in Bio::Search::Hit::BlastHit.
>
>Basically, I want to obtain the raw alignment because
>I'd like to get the size of the gaps, not just the
>number.
>
>Any help will be much appreciated.
>Many thanks
>
>
>__________________________________________________
>Do You Yahoo!?
>Tired of spam?  Yahoo! Mail has the best spam protection around 
>http://mail.yahoo.com 
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From peterm at bioinf.uni-leipzig.de  Tue May  9 08:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,
I am using the Bio::Graphics module to draw sequences and their features 
with Bio::SeqFeature::Generic.
The features I want to highlight are occurrences of transcription 
factor binding sites. I want to give every factor its own color, 
but I didn't see how to do it. I can only colorize complete tracks.
Is there a known workaround?

Thanks, Peter


From Marc.Logghe at DEVGEN.com  Tue May  9 10:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
> 
> Hi all,
> I am using the Bio::Graphics module to draw sequences and 
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of 
> transcription binding factors. Therefore I want to give every 
> factor its own color, but i didn't see how to manage it. I 
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine
to the option.
-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');

    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be
get_attributes()) but you get the idea.
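
To give each factor its own color rather than a red/green threshold, the callback could remember which color it handed to which name. A small sketch, assuming the factor name is available through the feature's display_name method (swap in whichever accessor actually holds it); make_color_picker and the stand-in feature class are made up so the example runs on its own.

```perl
use strict;
use warnings;

# Build a callback that hands out one colour per distinct factor name,
# cycling through the palette if there are more names than colours.
sub make_color_picker {
    my @palette = @_;
    my %color_for;
    my $next = 0;
    return sub {
        my $feat = shift;
        my $name = $feat->display_name;
        $color_for{$name} //= $palette[ $next++ % @palette ];
        return $color_for{$name};
    };
}

# Minimal stand-in for a feature object, only so the sketch runs alone.
package My::FakeFeature {
    sub new          { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub display_name { return $_[0]{name} }
}

my $pick = make_color_picker(qw(red green blue));
for my $name (qw(SP1 GATA1 SP1)) {
    print "$name => ", $pick->(My::FakeFeature->new($name)), "\n";
}
```

The returned code reference would go where the anonymous sub goes above, i.e. -bgcolor => make_color_picker(...) in the track options.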
Cheers,
Marc


From Marc.Logghe at DEVGEN.com  Tue May  9 10:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,
Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics

The examples show the principle I mentioned in my previous post (e.g.
Example 4), but for the -label or -description options.
As I said, you can apply this to (most of?) the other options as
well.
Regards,
ML 




From WiersmaP at AGR.GC.CA  Tue May  9 11:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted.  In the Bio::Tools::Primer3 module the author uses "$p3" for the object.  If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output the results before looping to the next sequence.

From Bio::Tools::Primer3.pm:

 # how many results were there?
 my $num=$p3->number_of_results;
 print "There were $num results\n";

 # get all the results
 my $all_results=$p3->all_results;
 print "ALL the results\n";
 foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

 # get specific results
 my $result1=$p3->primer_results(1);
 print "The first primer is\n";
 foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}
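
If you only want a few fields, you can pick them out of the result hash by key. A small sketch, assuming the keys carry the same tag names that appear in the temp.out excerpt quoted earlier in this thread (print the keys as above to check); summarize_primer is a made-up helper name.

```perl
use strict;
use warnings;

# Pick a few fields out of one primer result (a hash reference keyed by
# Primer3 tag names) and format them as a single tab-separated line.
# Missing fields are shown as NA.
sub summarize_primer {
    my ($result, @keys) = @_;
    return join "\t",
        map { defined $result->{$_} ? $result->{$_} : 'NA' } @keys;
}

# Example values copied from the temp.out excerpt in this thread.
my %example = (
    PRIMER_LEFT_SEQUENCE => 'CCAAAGTTGGCTTGGAAAAA',
    PRIMER_PRODUCT_SIZE  => 501,
);
print summarize_primer(\%example,
    qw(PRIMER_LEFT_SEQUENCE PRIMER_PRODUCT_SIZE)), "\n";
```

With a real run you would pass the hash reference returned by $p3->primer_results(...) instead of \%example.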

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

I read both documents. What I understand is that
Bio::Tools::Run::Primer3 is for designing primers and
Bio::Tools::Primer3 is for parsing the results. When I
read the documents I do not see the line
 $result = $primer3->run in Bio::Tools::Primer3. I
wonder how you got this information.

Thanks,

Li 

--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
>  
> 
> When you execute $primer3->run with a
> Bio::Tools::Run::Primer3 object it
> opens -outfile=>"filename" for writing and then
> closes.  That's why
> putting it in a loop will overwrite your output file
> each time so you
> only see the last one.  I suppose you could read in
> each output file
> before looping to the next seq and append it to
> another file.
> 
>  
> 
> If you're doing a fair bit of work with this module
> it would be worth
> looking at the Bio::Tools::Primer3 module.  The
> statement $result =
> $primer3->run produces a Bio::Tools::Primer3 object
> which has all the
> methods you need for customizing your output.
> 
>  
> 
> Paul
> 
>  
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> 
> wiersmap at agr.gc.ca
> 
>  
> 
>  
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From chen_li3 at yahoo.com  Tue May  9 13:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul it REALLY works.

I have other questions:
1) When I run the script I use this line on the
command prompt

perl primer.pl >test

When I check the default output file(temp.out) used by
the script I only see the information about the last
sequence which is different from what is in the test
file. In test file I can get all the information for
all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly
to print out selected information such as the primer
sequence and the size of the PCR product?  Or do I have
to parse the file myself?
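
A small sketch of why temp.out ends up holding only the last sequence, assuming (as described earlier in this thread) that the output file is reopened for writing on every pass through the loop. The helper write_in_loop and the record strings are placeholders for illustration, not primer3 output:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write each record to $path using the given open mode ('>' or '>>'),
# reopening the file on every iteration, then return the file's lines.
sub write_in_loop {
    my ($mode, $path, @records) = @_;
    for my $record (@records) {
        open my $out, $mode, $path or die "Cannot open $path: $!";
        print {$out} "$record\n";
        close $out or die "Cannot close $path: $!";
    }
    open my $in, '<', $path or die "Cannot read $path: $!";
    chomp(my @lines = <$in>);
    close $in;
    return @lines;
}

# Placeholder records standing in for per-sequence results.
my @records = ('seq1 primers', 'seq2 primers', 'seq3 primers');

my (undef, $overwrite_path) = tempfile(UNLINK => 1);
my (undef, $append_path)    = tempfile(UNLINK => 1);

my @overwritten = write_in_loop('>',  $overwrite_path, @records);
my @appended    = write_in_loop('>>', $append_path,    @records);

print "overwrite mode kept: @overwritten\n";
print "append mode kept:    @appended\n";
```

Opening with '>' truncates the file each time, so only the final record survives; '>>' keeps every record.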

After I get all this information I would like to post
the script for batch-designing PCR primers.


Thanks,

Li 


--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
> The line "my $result = $primer3->run" is already in
> the code you submitted.  In the Bio::Tools::Primer3
> module the author uses "$p3" for the object.  If you
> change your line to "my $p3 = $primer3->run" you
> should be able to run the examples below. Process
> the results for each sequence and output the results
> before looping to the next sequence.
> 
> From Bio::Tools::Primer3.pm:
> 
>  # how many results were there?
>  my $num=$p3->number_of_results;
>  print "There were $num results\n";
> 
>  # get all the results
>  my $all_results=$p3->all_results;
>  print "ALL the results\n";
>  foreach my $key (keys %{$all_results}) {
>    print "$key\t${$all_results}{$key}\n";
>  }
> 
>  # get specific results
>  my $result1=$p3->primer_results(1);
>  print "The first primer is\n";
>  foreach my $key (keys %{$result1}) {
>    print "$key\t${$result1}{$key}\n";
>  }
> 
> Paul
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca



From WiersmaP at AGR.GC.CA  Tue May  9 13:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of Primer3.pm modules.  Hopefully you can see how I've picked out parts of the results for printing.  You can modify it as you need to output only some results.

>>>>>>>>
  # design the primers. This runs primer3 and returns a 
  # Bio::Tools::Primer3 object with the results
my $results=$primer3->run;

  # see the Bio::Tools::Primer3 pod for
  # things that you can get from this. For example:

print "There were ", $results->number_of_results+1, " primers\n";

my @out_keys_part = qw(	START
			LENGTH
			TM
			GC_PERCENT
			SELF_ANY
			SELF_END
			SEQUENCE );

for (my $i=0;$i <= $results->number_of_results;$i++){
	
	# get specific results
	my $result1=$results->primer_results($i);
 
	print "\n",$i+1;	
	for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)) {
			
			my ($start, $length) = split /,/, ${$result1}{$key};
			${$result1}{$key."_START"} = $start;
			${$result1}{$key."_LENGTH"} = $length;
			foreach my $partkey (@out_keys_part) {
				print "\t", ${$result1}{$key."_".$partkey};
			} 
			print "\n";
	}
	print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'}, ", PAIR ANY COMPL: ",
				${$result1}{'PRIMER_PAIR_COMPL_ANY'};
	print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>
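
As a rough sketch of trimming the output further, the snippet below picks out only the primer sequences and the product size from a hash reference shaped like the one primer_results($i) returns. The helper summarize_pair is hypothetical, and the values in $mock are invented placeholders so the snippet runs without primer3; the key names are the ones used in the code above.

```perl
use strict;
use warnings;

# Return a tab-separated summary (left primer, right primer, product size)
# from a hash reference of primer3 result keys.
sub summarize_pair {
    my ($result) = @_;
    my @wanted = qw(PRIMER_LEFT_SEQUENCE PRIMER_RIGHT_SEQUENCE PRIMER_PRODUCT_SIZE);
    # Substitute 'NA' for any key that is missing or undefined.
    return join "\t", map { defined $result->{$_} ? $result->{$_} : 'NA' } @wanted;
}

# Placeholder values for illustration only, not real primer3 output.
my $mock = {
    PRIMER_LEFT           => '60,20',
    PRIMER_LEFT_SEQUENCE  => 'ACGTACGTACGTACGTACGT',
    PRIMER_RIGHT_SEQUENCE => 'TGCATGCATGCATGCATGCA',
    PRIMER_PRODUCT_SIZE   => 200,
};

print summarize_pair($mock), "\n";
```

In a real run, $mock would be replaced by the hash reference obtained for each primer pair inside the loop.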

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC
V0H 1Z0
wiersmap at agr.gc.ca
 




From cjfields at uiuc.edu  Tue May  9 17:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those
problematic bacterial tax names again).  I have a simple script that runs
output to STDOUT to generate a list of hits.  Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
         hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
         hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
         hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
         hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
         hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216  <--
         hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
         hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

...

Most (but not all) of the strain numbers get repeated (marked with arrows).
This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
(and thus passed through Bio::SeqIO).  Anyone seen this before?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Tue May  9 19:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again).  I have a simple script that runs
> output to STDOUT to generate a list of hits.  Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, 
because the 'subsp.' is gone?

Here are the pertinent parts of the GenBank file:

LOCUS       NC_002944            4829781 bp    DNA     circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
  ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium
            avium complex (MAC).

                     /organism="Mycobacterium avium subsp. paratuberculosis K-10"
                     /strain="K-10"
                     /sub_species="paratuberculosis"
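
To make the genus/species/subspecies/strain breakdown concrete, here is a minimal sketch that splits an ORGANISM-style name with a regular expression. The helper split_organism and its pattern are assumptions for illustration; they cover only the simple "Genus species [subsp. name] [strain]" shape and are not how Bio::SeqIO::genbank parses the line.

```perl
use strict;
use warnings;

# Split an organism name into genus, species, optional sub_species and
# optional strain. Illustrative only; real names have many more shapes.
sub split_organism {
    my ($name) = @_;
    my ($genus, $species, $rest) = $name =~ /^(\S+)\s+(\S+)\s*(.*)$/
        or return;
    my %parts = (genus => $genus, species => $species);
    # Peel off a leading "subsp. <name>" if present.
    if ($rest =~ s/^subsp\.\s+(\S+)\s*//) {
        $parts{sub_species} = $1;
    }
    # Whatever remains is treated as the strain designation.
    $parts{strain} = $rest if length $rest;
    return \%parts;
}

my $parts = split_organism('Mycobacterium avium subsp. paratuberculosis K-10');
print "$_ = $parts->{$_}\n" for sort keys %$parts;
```

Names such as "Frankia sp. CcI3" would come out with "sp." as the species, which is one reason a lookup by NCBI taxon id may be more robust than string parsing.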


> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO).  Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal mainly with bacteria, and should look into this too. I 
haven't been using the GenBank headers directly, only the features, so I 
never came across this.

Another thing which may crop up is when no species has been allocated 
yet but the genus is known (or something like that). In that case the 
name is written as "Genus spp.", e.g. Gallibacterium spp.

--Torsten


From chen_li3 at yahoo.com  Tue May  9 21:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much.

Just as you pointed out in your latest email, I have now
figured out that the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first pair of primers.  1) Since it
is a hash I should be able to get the value
for a given key by telling Perl which key
I want. 2) Since it is also a reference
I should dereference it to get the actual value.

I don't know much about OO and Perl, and your code looks
a little complicated to me. But I got the job done
by adding the following lines directly:

###############################################
#from Primer3 module to get all the information 
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


#get the value for the key in the hash reference
	   my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE'; 
	   print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";

There is one point I don't understand:

When I add these two lines to my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent
> earlier. 
> The line "my $result1=$results->primer_results($i);"
> gives you a
> reference to a hash that contains all of the
> information for a primer
> pair.
> To access the entries you dereference the hash, i.e.
> the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'}
> gives you the entry
> for product size.  The following are the available
> entries. All are
> single values or strings except PRIMER_RIGHT and
> PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20')
> which can be pulled out
> with split. 
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
> 
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 



From zhouyubio at gmail.com  Tue May  9 21:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID: <loom.20060510T032601-573@post.gmane.org>

Qunfeng <qfdong <at> iastate.edu> writes:

> 
> Hi there,
> 
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
> 
> I am not very familiar with BioPerl. I tried to follow the example showing 
> in the above page to retrieve pubmed ID under each Reference tag , i.e., 
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The 
> authors() works for me.  Appreciate any suggestions.
> 
> Qunfeng 
> 


Hi, 

I have the same problem as you. Here is what I have done: use a regular
expression to match the value of the 'location' tag, if there is one.

#------------------
my $ann = $seqobj->annotation(); # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title,"\n";
    print "Location: ", $ref->location, "\n";
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
	my $pmid = $1;
	print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------
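
The regular-expression step can be tried on its own, without a sequence object. This is only a sketch: pmid_from_location is a hypothetical helper, and the location strings are made up for illustration (the assumption being that the location text contains "PUBMED" followed by digits, as matched above).

```perl
use strict;
use warnings;

# Pull a PubMed ID out of a reference location string, if one is present.
sub pmid_from_location {
    my ($location) = @_;
    return unless defined $location;
    return $location =~ /PUBMED\s+(\d+)/ ? $1 : undef;
}

# Hypothetical location strings for illustration only.
for my $location ('Unpublished', 'Some Journal 1 (1), 1-10 (2000) PUBMED 12345678') {
    my $pmid = pmid_from_location($location);
    print defined $pmid ? "PMID: $pmid\n" : "No PubMed ID in '$location'\n";
}
```

Returning undef when nothing matches lets the caller distinguish "no PubMed ID" from an empty string.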


From osborne1 at optonline.net  Tue May  9 23:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To: <loom.20060510T032601-573@post.gmane.org>
Message-ID: <C086CFDD.865A%osborne1@optonline.net>

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id found in
the 56961711 entry using the pubmed() method. Note that there are 4 references,
only one of which has a PubMed id. Also, the authors() method prints out the
authors, not the PubMed id. If you have a problem please show your code and
tell us which version of Bioperl you're using.

Brian O.


use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
  print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}




From sb at mrc-dunn.cam.ac.uk  Wed May 10 05:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,
I'm a little confused as to how names are supposed to work in 
Bio::Taxonomy::Node.

In the bioperl versions that I've looked at, a Node doesn't seem to store 
the most important information about itself - its scientific name - in 
an obvious place. bioperl 1.5.1 puts it at the start of the 
classification list. I'd have thought sticking it in -name would make 
more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
   -object_id => 9606, # or -ncbi_taxid. Required tag
   -names => {
       'scientific' => ['sapiens'],
       'common_name' => ['human']
   },
   -rank => 'species'  # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 
'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics 
Bio::Taxonomy::Node->new takes the -name value and makes a common name 
out of it, whilst the name() method passes any 'scientific' name to the 
scientific_name() method which is unable to set any value (and warns 
about this), only get.

It seems like the need to have this classification array work the same 
way as Bio::Species is causing some unnecessary restrictions. Can't the 
more sensible idea of having a dedicated storage spot for the 
ScientificName and other parameters be used, with the classification 
array either being generated just-in-time from the hash-stored data, or 
indeed being generated from the Lineage field?


Also, why does a node store the complete hierarchy on itself in the 
classification array? If we're going that far, why don't the 
Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a 
get_taxonomy() method instead of a get_Taxonomy_Node() method. 
get_taxonomy() could, from a single efetch.fcgi lookup, create a 
complete Bio::Taxonomy with all the nodes. Whilst most nodes would only 
have a minimum of information, if you could simply ask a node what its 
rank and scientific name was you could easily build a classification 
array, or ask what Kingdom your species was in etc.
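
A minimal sketch of that idea, using plain hashes rather than Bio::Taxonomy::Node objects: each node carries only a rank and a scientific name, and the classification list is generated on demand. The data structure and the helpers classification() and name_at_rank() are hypothetical and are not part of the Bio::Taxonomy API.

```perl
use strict;
use warnings;

# Hypothetical minimal nodes, ordered from species up to kingdom.
my @lineage = (
    { rank => 'species', scientific_name => 'Homo sapiens' },
    { rank => 'genus',   scientific_name => 'Homo' },
    { rank => 'family',  scientific_name => 'Hominidae' },
    { rank => 'order',   scientific_name => 'Primates' },
    { rank => 'class',   scientific_name => 'Mammalia' },
    { rank => 'phylum',  scientific_name => 'Chordata' },
    { rank => 'kingdom', scientific_name => 'Metazoa' },
);

# Build the classification list just-in-time from the stored names.
sub classification {
    my (@nodes) = @_;
    return map { $_->{scientific_name} } @nodes;
}

# Answer "what is the name at this rank?" by scanning the lineage.
sub name_at_rank {
    my ($rank, @nodes) = @_;
    my ($node) = grep { $_->{rank} eq $rank } @nodes;
    return $node ? $node->{scientific_name} : undef;
}

print join('; ', classification(@lineage)), "\n";
print "Kingdom: ", name_at_rank('kingdom', @lineage), "\n";
```

With the names stored per node, the classification array becomes a derived view rather than the primary storage.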

Are there good reasons for Taxonomy working the way it does in 1.5.1, or 
would I not be wasting my time re-writing things to make more sense (to me)?


Cheers,
Sendu.

From osborne1 at optonline.net  Wed May 10 10:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID: <C08771EE.866A%osborne1@optonline.net>

Paul,

I took your code, added some "run" code and made it into a script and added
this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.


On 5/9/06 1:59 PM, "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> $results->number_of_results


From stoltzfu at umbi.umd.edu  Tue May  9 16:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative
	object
Message-ID: <D8EE6983-2123-45B3-967C-0E4982428CFA@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate
comparative analysis using evolutionary methods by 1) managing evolutionary
relationships (by linking data to trees) and 2) allowing coordinated analysis
of different types of data (by implementing a generic concept of
"character-state" data).

Bio::CDAT would take advantage of existing BioPerl objects and would include
the functionality of Rutger Vos's Bio::Phylo.  It would provide the framework
to develop interfaces to analysis tools (phylogeny inference, evolutionary
rate models, functional shift inference, etc.), as well as to file formats
and visualization methods appropriate for such analyses.

A proposal is attached.  We would like to hear your thoughts (e.g., see the
section on "Questions to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060509/48aeca4b/attachment-0001.pdf 


From zhouyubio at gmail.com  Wed May 10 04:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To: <C086CFDD.865A%osborne1@optonline.net>
References: <loom.20060510T032601-573@post.gmane.org>
	<C086CFDD.865A%osborne1@optonline.net>
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks!

I am using Bioperl-1.4, not bioperl-live. That may be the reason why
it does not work!




--
Best Wishes!

Yu


From cjfields at uiuc.edu  Wed May 10 11:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using
$seq->species->binomial chops some of the strain designations off, so really
neither one works optimally for bacterial genus-species-strain taxonomy.
Hilmar made the suggestion that it's probably best to grab the NCBI TaxID
and parse it out that way by looking it up in the taxonomy database (using
Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank
does.  

I wonder if we should be trying to shove most of this stuff into species
objects directly from the beginning; in other words, maybe we should try to
get the information in Bio::Annotation objects and then, after the
parsing/IO is finished, have a method to get the information into
Bio::Species objects when wanted/needed; a check could be added against the
NCBI Taxonomy database there.  

Anyway, I really haven't looked at how they are parsed out and don't have
the time at the moment.  I may look into this as well but not until I get
back from a conference (end of May).  Jason and Brian have been calling for a
refactoring of Bio::SeqIO::genbank for a while; maybe it's getting time to
do something about it...

Chris 



From cuiw at mail.nih.gov  Wed May 10 12:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F4999@nihcesmlbx10.nih.gov>


'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output
hash.

You can find all legal keys by "print keys %{$result1};"
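
A minimal sketch of that check in plain Perl, with a hand-made hash reference standing in for what primer_results() returns. The values are invented for illustration and only the key names come from this thread. Testing with exists before printing is one way to avoid the "uninitialized value" warning:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by
# $results->primer_results($i); the values are made up.
my $result1 = {
    PRIMER_LEFT          => '60,20',
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_PRODUCT_SIZE  => 500,
};

# Return the value for a key, or a placeholder when the key is absent.
sub value_or_missing {
    my ($hashref, $key) = @_;
    return exists $hashref->{$key} ? $hashref->{$key} : '(no such key)';
}

# List every key that is actually present.
print "$_\n" for sort keys %{$result1};

# Safe lookups: the second key is not in this hash.
for my $key (qw(PRIMER_LEFT_SEQUENCE PRIMER_SEQUENCE_ID)) {
    print "$key\t", value_or_missing($result1, $key), "\n";
}
```

value_or_missing() is a helper name made up for this sketch, not part of BioPerl.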


> There is one point I don't understand:
>
> When I add these two lines into my code (line 49 in my
> code)
>
>   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
>   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";
>
> I don't get the PRIMER_SEQUENCE_ID. Perl complains
> and says "Use of uninitialized value in concatenation
> (.) or string at primer3-3 line 49."
>
> Li

From WiersmaP at AGR.GC.CA  Wed May 10 12:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0).  If you try to access them using $results->primer_results(1) (or anything but 0) you will get an error.

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much.

As you pointed out in your latest email, I now
understand that the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first primer pair.  1) Since it
is a hash, I should be able to get a specific value
by giving Perl the corresponding key. 2) Since it is
also a reference, I should dereference it to get the
actual value.

I don't know much OO Perl, and your code looks
a little complicated to me. But I got the job done
by adding the following lines directly:

###############################################
# from the Primer3 module, to print all the information:
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


# get the value for one key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";
 

There is one point I don't understand:

When I add these two lines into my code (line 49 in my
code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent
> earlier. 
> The line "my $result1=$results->primer_results($i);"
> gives you a
> reference to a hash that contains all of the
> information for a primer
> pair.
> To access the entries you dereference the hash, i.e.
> the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'}
> gives you the entry
> for product size.  The following are the available
> entries. All are
> single values or strings except PRIMER_RIGHT and
> PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20')
> which can be pulled out
> with split. 
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


From cuiw at mail.nih.gov  Wed May 10 14:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences:
	bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F49A0@nihcesmlbx10.nih.gov>

Hope this works!

Bio::Tools::Primer3 line 264 should be:
 
$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID by

use Bio::Tools::Primer3;

# read the primer3 output file
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");

# print the sequence id
print $p3->seqobject->id;

Wenwu Cui, PhD
NIH/NCI




From cjfields at uiuc.edu  Wed May 10 14:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there.  I'm getting ready for a conference in a couple weeks
so I may be off the radar a bit here and there.  The next ListSummary won't
be posted until May 26.  Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From chen_li3 at yahoo.com  Wed May 10 20:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post
about primer3.

But I am still a little confused, even after reading the
documentation: what is the relationship between these two
modules? What is the correct/standard way to use them for
batch primer design? I use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (I don't
need all the information). But there is a small
problem: PRIMER_SEQUENCE_ID can't be accessed using
this method. Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual
results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way around this problem using
Bio::Tools::Run::Primer3, and others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But I find that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without having the line

use Bio::Tools::Primer3;

in the script.  How can this be explained? Is it because
of the following line?

my $result=$primer3->run;

One last question: which line of code invokes the
program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 

From jason.stajich at duke.edu  Wed May 10 20:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <B1D9C06A-09FF-4342-81E4-7D38AD66F4CA@duke.edu>

Bio::Tools::Run::XXX modules are for running applications...

On May 10, 2006, at 8:27 PM, chen li wrote:

> First thank you all for replying my previous post
> about primer3.
>
> But now I am a little confused even after I read the
> documents: What is the relationship between these two
> modules? What is correct/standard way to use them to
> do the batch-primer design? What I do is that I use
> Bio::Tools::Run::Primer3 to design primers. Based on
> Dr. Roy Chaudhuri's information I can set the
> parameters using the following syntax:
>
> $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
>
> Based on Paul A. Wiersma's explanation I can also
> print out part of the primer results(because I don't
> need all the information). But there is a little
> trouble: PRIMER_SEQUENCE_ID can't be accessed using
> this method. And Paul points out that
> "PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
> individual
> results but only end up by default with
> $results->primer_results(0)".  So it seems there is no
> way to get around this problem using
> Bio::Tools::Run::Primer3. And others suggest using
> Bio::Tools::Primer3 to parse the results. So is true
> that Bio::Tools::Run::Primer3 is for primer design and
> Bio::Tools::Primer3 is for parsing the results from
> Bio::Tools::Run::Primer3? But what I find is that I
> get almost all the results (except PRIMER_SEQUENCE_ID
> and SEQUENCE ) without providing a line code
>
> use Bio::Tools::Primer3
>
> in the script.  How to explain this? Is it because the
> following line code?
>
> my $result=$primer3->run;
>
> The last question: which line code is used to invoke
> program primer3.exe? How does Perl script call the
> primer3.exe?
>
> Once again thank you all very much,
>
> Li

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Wed May 10 20:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the
standard here.  Nodes are defined by the data from the taxonomy dump
dbs from NCBI.
The eutils are pretty worthless except for taxid->name or the reverse; you
can't get the full taxonomy (or couldn't when that implementation was
written).

The "name" method refers to the name of the node - each level in the  
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a  
Bio::Species and/or being able to read  a genbank file and the  
organism taxonomy data as a list and instantiating.  If we could rely  
on everything being in a DB of course this would be simpler.

Another problem is the depth of the taxonomy is not constant for  
every node so assuming that a fixed number of slots will be filled in  
to generate the taxonomy leads to problems.

Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code, as this is how I really wanted it to
work; the Bio::Species hacks are only there to shoehorn in data
retrieved from genbank files.  With the flatfile implementation
you have to walk all the way up the db hierarchy to get the kingdom
for a node, so you do have to build up the classification hierarchy, as
each node only stores data about itself.
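
The walk up the hierarchy can be sketched with a plain hash, which is not the Bio::DB::Taxonomy::flatfile API, just an illustration of nodes that know only their own name, rank and parent. The ids and names are taken from the NCBI record for Frankia sp. CcI3 posted in this thread; the parent links above the genus are an assumption read off the order of its lineage.

```perl
use strict;
use warnings;

# Each node stores only data about itself plus its parent id.
# Parent links above the genus are assumed from the lineage order.
my %node = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854 },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712 },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013 },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037 },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003 },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760 },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2 },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef },
);

# Walk from a node up to the root, collecting names (most specific first).
sub classification {
    my ($nodes, $id) = @_;
    my @names;
    while (defined $id && exists $nodes->{$id}) {
        push @names, $nodes->{$id}{name};
        $id = $nodes->{$id}{parent};
    }
    return @names;
}

# Return the name of the node, or its nearest ancestor, with the given rank.
sub ancestor_with_rank {
    my ($nodes, $id, $rank) = @_;
    while (defined $id && exists $nodes->{$id}) {
        return $nodes->{$id}{name} if $nodes->{$id}{rank} eq $rank;
        $id = $nodes->{$id}{parent};
    }
    return;
}

print join('; ', classification(\%node, 106370)), "\n";
print "superkingdom: ", ancestor_with_rank(\%node, 106370, 'superkingdom'), "\n";
```

Because the loop stops when a parent is missing, the depth of the lineage does not need to be known in advance, which sidesteps the fixed-number-of-slots problem mentioned above.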

I'm not exactly sure what you are proposing to do, but would  
definitely enjoy another pair of hands, I don't really have time to  
mess with it any time soon.

-jason
On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at, a Node doesn't seem to store
> the most important information about itself - its scientific name - in
> an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>    -object_id => 9606, # or -ncbi_taxid. Required tag
>    -names => {
>        'scientific' => ['sapiens'],
>        'common_name' => ['human']
>    },
>    -rank => 'species'  # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to  
> the
> scientific_name() method which is unable to set any value (and warns
> about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't  
> the
> more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored  
> data, or
> indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method.
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would  
> only
> have a minimum of information, if you could simply ask a node what its
> rank and scientific name was you could easily build a classification
> array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in  
> 1.5.1, or
> would I not be wasting my time re-writing things to make more sense  
> (to me)?
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cuiw at mail.nih.gov  Wed May 10 21:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B07D391@nihcesmlbx10.nih.gov>

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3 module so that you can parse the result file.
 
2. There is a bug in Bio::Tools::Primer3 (Primer3.pm, line 264), as I mentioned. Once fixed, it can output PRIMER_SEQUENCE_ID.
 
3. primer3.exe is called in the Bio::Tools::Run::Primer3  "run" function, please read the function definition.
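
For points 1 and 3, the general run-then-parse pattern can be sketched as below. This is not the code inside Bio::Tools::Run::Primer3 or Bio::Tools::Primer3; run_tool() and parse_record() are names made up for the sketch, and the running perl interpreter stands in for primer3.exe so the example works without primer3 installed. It relies only on primer3 writing KEY=VALUE lines with a lone "=" ending each record.

```perl
use strict;
use warnings;

# Run an external program and return its output lines.
# The list form of open() passes the arguments without going through a shell.
sub run_tool {
    my ($exe, @args) = @_;
    open(my $out, '-|', $exe, @args) or die "Cannot run $exe: $!";
    my @lines = <$out>;
    close($out) or die "$exe failed (status $?)";
    chomp @lines;
    return @lines;
}

# Parse KEY=VALUE lines into a hash reference; a lone "=" ends the record.
sub parse_record {
    my @lines = @_;
    my %record;
    for my $line (@lines) {
        last if $line eq '=';
        my ($key, $value) = split /=/, $line, 2;
        next unless defined $value;
        $record{$key} = $value;
    }
    return \%record;
}

# Stand-in for primer3: the running perl prints a fake two-line record.
my @fake = ($^X, '-e', 'print "PRIMER_SEQUENCE_ID=test1\nPRIMER_LEFT=60,20\n=\n"');
my $record = parse_record(run_tool(@fake));

# PRIMER_LEFT is a start,length pair, as described earlier in the thread.
my ($start, $length) = split /,/, $record->{PRIMER_LEFT};
print "id=$record->{PRIMER_SEQUENCE_ID} start=$start length=$length\n";
```

Keeping the two steps in separate subs mirrors the split between the Run module and the parser module discussed in this thread.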




From cjfields at uiuc.edu  Wed May 10 23:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely see
the use of a local database.  I ran a few tests, really unrelated to this,
using the powerscripting test page at NCBI for eutils (for the curious, at
http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
retrieve XML-formatted taxonomic information; here's the bacterium Frankia
sp. CcI3 TaxID info, which looks like they have everything set up by rank.
It gives quite a bit of information. 
 
<?xml version="1.0"?>
<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet>

<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <Division>Bacteria</Division>
  <GeneticCode>
    <GCId>11</GCId>
    <GCName>Bacterial and Plant Plastid</GCName>
  </GeneticCode>
  <MitoGeneticCode>
    <MGCId>0</MGCId>
    <MGCName>Unspecified</MGCName>
  </MitoGeneticCode>
  <Lineage>cellular organisms; Bacteria; Actinobacteria; Actinobacteria
(class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
Frankia</Lineage>
  <LineageEx>
    <Taxon>
      <TaxId>131567</TaxId>
      <ScientificName>cellular organisms</ScientificName>
      <Rank>no rank</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>201174</TaxId>
      <ScientificName>Actinobacteria</ScientificName>
      <Rank>phylum</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1760</TaxId>
      <ScientificName>Actinobacteria (class)</ScientificName>
      <Rank>class</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85003</TaxId>
      <ScientificName>Actinobacteridae</ScientificName>
      <Rank>subclass</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2037</TaxId>
      <ScientificName>Actinomycetales</ScientificName>
      <Rank>order</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85013</TaxId>
      <ScientificName>Frankineae</ScientificName>
      <Rank>suborder</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
  <CreateDate>1999/10/22</CreateDate>
  <UpdateDate>2005/01/19</UpdateDate>
  <PubDate>2000/02/02</PubDate>
</Taxon>
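
A rough sketch of how the LineageEx block could be turned into a rank lookup and a classification list from a single response. It uses a regular expression on a hand-copied excerpt (four of the nine entries above) purely for illustration; a real implementation would use a proper XML parser, and lineage_nodes() is a name invented here.

```perl
use strict;
use warnings;

# Excerpt of the <LineageEx> block from the record above.
my $xml = <<'XML';
<LineageEx>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>201174</TaxId>
      <ScientificName>Actinobacteria</ScientificName>
      <Rank>phylum</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
</LineageEx>
XML

# Collect { taxid, name, rank } records in document order (root first).
sub lineage_nodes {
    my ($text) = @_;
    my @nodes;
    while ($text =~ m{<TaxId>(\d+)</TaxId>\s*
                      <ScientificName>([^<]+)</ScientificName>\s*
                      <Rank>([^<]+)</Rank>}gx) {
        push @nodes, { taxid => $1, name => $2, rank => $3 };
    }
    return @nodes;
}

my @nodes = lineage_nodes($xml);

# Ask "what is the name at this rank?" without a fixed number of slots.
my %name_for_rank = map { $_->{rank} => $_->{name} } @nodes;
print "superkingdom: $name_for_rank{superkingdom}\n";

# Classification list in the same root-first order as the XML.
print join('; ', map { $_->{name} } @nodes), "\n";
```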


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Wednesday, May 10, 2006 7:54 PM
> To: Sendu Bala
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> 
> I would use the implementation that talks to the flatfile db as the
> standard here.  nodes are defined by the data in from taxonomy dump
> dbs from ncbi.
> the eutils is pretty worthless except for taxid->name or reverse, you
> can't get the full taxonomy (or couldn't when that implementation was
> written).
> 
> The "name" method refers to the name of the node - each level in the
> taxonomy can have a "name".
> 
> The bits of hackiness relate to wrapping the node object as a
> Bio::Species and/or being able to read  a genbank file and the
> organism taxonomy data as a list and instantiating.  If we could rely
> on everything being in a DB of course this would be simpler.
> 
> Another problem is the depth of the taxonomy is not constant for
> every node so assuming that a fixed number of slots will be filled in
> to generate the taxonomy leads to problems.
> 
> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
> best example of working code as this is how I really wanted it to
> work, the Bio::Species hacks are only there to shoehorn data
> retrieved from genbank files in.  With the flatfile implementation
> you have to walk all the way up the db hierarchy to get the kingdom
> for a node so you do have to build up the classification hierarchy as
> each node only stores data about itsself.
> 
> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.
> 
> -jason
> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
> 
> > Hi,
> > I'm a little confused as to how names are supposed to work in
> > Bio::Taxonomy::Node.
> >
> > In the bioperl versions that I've looked at, a Node doesn't seem to store
> > the most important information about itself - its scientific name - in
> > an obvious place. bioperl 1.5.1 puts it at the start of the
> > classification list. I'd have thought sticking it in -name would make
> > more sense, but this is used only for the GenBank common name.
> >
> > The Bio::Taxonomy docs still suggest:
> >
> > my $node_species_sapiens = Bio::Taxonomy::Node->new(
> >    -object_id => 9606, # or -ncbi_taxid. Required tag
> >    -names => {
> >        'scientific' => ['sapiens'],
> >        'common_name' => ['human']
> >    },
> >    -rank => 'species'  # Required tag
> > );
> >
> > and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> > 'name' method which claims to work like:
> >
> > $obj->name('scientific', 'sapiens');
> >
> > This kind of thing would be really nice, but afaics
> > Bio::Taxonomy::Node->new takes the -name value and makes a common name
> > out of it, whilst the name() method passes any 'scientific' name to
> > the
> > scientific_name() method which is unable to set any value (and warns
> > about this), only get.
> >
> > It seems like the need to have this classification array work the same
> > way as Bio::Species is causing some unnecessary restrictions. Can't
> > the
> > more sensible idea of having a dedicated storage spot for the
> > ScientificName and other parameters be used, with the classification
> > array either being generated just-in-time from the hash-stored
> > data, or
> > indeed being generated from the Lineage field?
> >
> >
> > Also, why does a node store the complete hierarchy on itself in the
> > classification array? If we're going that far, why don't the
> > Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> > get_taxonomy() method instead of a get_Taxonomy_Node() method.
> > get_taxonomy() could, from a single efetch.fcgi lookup, create a
> > complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> > only
> > have a minimum of information, if you could simply ask a node what its
> > rank and scientific name was you could easily build a classification
> > array, or ask what Kingdom your species was in etc.
> >
> > Are there good reasons for Taxonomy working the way it does in
> > 1.5.1, or
> > would I not be wasting my time re-writing things to make more sense
> > (to me)?
> >
> >
> > Cheers,
> > Sendu.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at duke.edu  Thu May 11 08:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>

Great - now we just need someone to volunteer to actually work on this.

The current code grabs most of this but I believe expects a different  
XML


On May 10, 2006, at 11:36 PM, Chris Fields wrote:

> I think you can get pretty much everything now, though I can  
> definitely see
> the use of a local database.  I ran a few tests, really unrelated  
> to this,
> using the powerscripting test page at NCBI for eutils (for the  
> curious, at
> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was  
> able to
> retrieve XML-formatted taxonomic information; here's the bacterium  
> Frankia
> sp. CcI3 TaxID info, which looks like they have everything set up  
> by rank.
> It gives quite a bit of information.
>
> <?xml version="1.0"?>
> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> <TaxaSet>
>
> <Taxon>
>   <TaxId>106370</TaxId>
>   <ScientificName>Frankia sp. CcI3</ScientificName>
>   <ParentTaxId>1854</ParentTaxId>
>   <Rank>species</Rank>
>   <Division>Bacteria</Division>
>   <GeneticCode>
>     <GCId>11</GCId>
>     <GCName>Bacterial and Plant Plastid</GCName>
>   </GeneticCode>
>   <MitoGeneticCode>
>     <MGCId>0</MGCId>
>     <MGCName>Unspecified</MGCName>
>   </MitoGeneticCode>
>   <Lineage>cellular organisms; Bacteria; Actinobacteria; Actinobacteria (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae; Frankia</Lineage>
>   <LineageEx>
>     <Taxon>
>       <TaxId>131567</TaxId>
>       <ScientificName>cellular organisms</ScientificName>
>       <Rank>no rank</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>2</TaxId>
>       <ScientificName>Bacteria</ScientificName>
>       <Rank>superkingdom</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>201174</TaxId>
>       <ScientificName>Actinobacteria</ScientificName>
>       <Rank>phylum</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>1760</TaxId>
>       <ScientificName>Actinobacteria (class)</ScientificName>
>       <Rank>class</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>85003</TaxId>
>       <ScientificName>Actinobacteridae</ScientificName>
>       <Rank>subclass</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>2037</TaxId>
>       <ScientificName>Actinomycetales</ScientificName>
>       <Rank>order</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>85013</TaxId>
>       <ScientificName>Frankineae</ScientificName>
>       <Rank>suborder</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>74712</TaxId>
>       <ScientificName>Frankiaceae</ScientificName>
>       <Rank>family</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>1854</TaxId>
>       <ScientificName>Frankia</ScientificName>
>       <Rank>genus</Rank>
>     </Taxon>
>   </LineageEx>
>   <CreateDate>1999/10/22</CreateDate>
>   <UpdateDate>2005/01/19</UpdateDate>
>   <PubDate>2000/02/02</PubDate>
> </Taxon>
> </TaxaSet>
>
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>> Sent: Wednesday, May 10, 2006 7:54 PM
>> To: Sendu Bala
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>
>> I would use the implementation that talks to the flatfile db as the
>> standard here.  Nodes are defined by the data in the taxonomy dump
>> dbs from NCBI.
>> The eutils are pretty worthless except for taxid->name or the reverse;
>> you can't get the full taxonomy (or couldn't when that implementation
>> was written).
>>
>> The "name" method refers to the name of the node - each level in the
>> taxonomy can have a "name".
>>
>> The bits of hackiness relate to wrapping the node object as a
>> Bio::Species and/or being able to read  a genbank file and the
>> organism taxonomy data as a list and instantiating.  If we could rely
>> on everything being in a DB of course this would be simpler.
>>
>> Another problem is the depth of the taxonomy is not constant for
>> every node so assuming that a fixed number of slots will be filled in
>> to generate the taxonomy leads to problems.
>>
>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
>> best example of working code, as this is how I really wanted it to
>> work; the Bio::Species hacks are only there to shoehorn in data
>> retrieved from genbank files.  With the flatfile implementation
>> you have to walk all the way up the db hierarchy to get the kingdom
>> for a node, so you do have to build up the classification hierarchy,
>> as each node only stores data about itself.
>>
>> I'm not exactly sure what you are proposing to do, but would
>> definitely enjoy another pair of hands, I don't really have time to
>> mess with it any time soon.
>>
>> -jason
>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>
>>> Hi,
>>> I'm a little confused as to how names are supposed to work in
>>> Bio::Taxonomy::Node.
>>>
>>> In the bioperl versions that I've looked at, a Node doesn't seem to
>>> store the most important information about itself - its scientific
>>> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
>>> classification list. I'd have thought sticking it in -name would make
>>> more sense, but this is used only for the GenBank common name.
>>>
>>> The Bio::Taxonomy docs still suggest:
>>>
>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>    -object_id => 9606, # or -ncbi_taxid. Required tag
>>>    -names => {
>>>        'scientific' => ['sapiens'],
>>>        'common_name' => ['human']
>>>    },
>>>    -rank => 'species'  # Required tag
>>> );
>>>
>>> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
>>> 'name' method which claims to work like:
>>>
>>> $obj->name('scientific', 'sapiens');
>>>
>>> This kind of thing would be really nice, but afaics
>>> Bio::Taxonomy::Node->new takes the -name value and makes a common name
>>> out of it, whilst the name() method passes any 'scientific' name to the
>>> scientific_name() method, which is unable to set any value (and warns
>>> about this), only get.
>>>
>>> It seems like the need to have this classification array work the same
>>> way as Bio::Species is causing some unnecessary restrictions. Can't the
>>> more sensible idea of having a dedicated storage spot for the
>>> ScientificName and other parameters be used, with the classification
>>> array either being generated just-in-time from the hash-stored data, or
>>> indeed being generated from the Lineage field?
>>>
>>>
>>> Also, why does a node store the complete hierarchy on itself in the
>>> classification array? If we're going that far, why don't the
>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
>>> get_taxonomy() method instead of a get_Taxonomy_Node() method?
>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would only
>>> have a minimum of information, if you could simply ask a node what its
>>> rank and scientific name were, you could easily build a classification
>>> array, or ask what Kingdom your species was in, etc.
>>>
>>> Are there good reasons for Taxonomy working the way it does in 1.5.1,
>>> or would I not be wasting my time re-writing things to make more sense
>>> (to me)?
>>>
>>>
>>> Cheers,
>>> Sendu.
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Thu May 11 07:51:44 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 12:51:44 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
	<655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> I would use the implementation that talks to the flatfile db as the
> standard here.  Nodes are defined by the data in the taxonomy dump
> dbs from NCBI. The eutils are pretty worthless except for taxid->name
> or the reverse; you can't get the full taxonomy (or couldn't when that
> implementation was written).

I'm not sure what you mean. In 1.5.1 you have access to the full
taxonomy because you're using efetch.fcgi. Indeed, you parse the full
taxonomy already to get the classification.


> The "name" method refers to the name of the node - each level in the
>  taxonomy can have a "name".

Yes, and to me the 'name of the node' is its scientific name (something
like 'sapiens'), not a 'common' name. So why is it stored as a
'common' name in the object? Why don't the DB::Taxonomy modules store
the actual common names (something like 'human')?


> The bits of hackiness relate to wrapping the node object as a 
> Bio::Species and/or being able to read  a genbank file and the
> organism taxonomy data as a list and instantiating.  If we could rely
> on everything being in a DB of course this would be simpler.

I think the Taxonomy code could be done in a 'pure' way, with a new
Bio::Species written as a wrapper around the appropriate Taxonomy
module(s). For genbank input the wrapper would cheat: make fake nodes
from the genbank list and then build a proper Bio::Taxonomy from them.


> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node, so you do have to build up
> the classification hierarchy, as each node only stores data about
> itself.

I'm actually still using bioperl 1.4, but I'm looking at 1.5.1 (assuming
it is the latest available), and I see that the flatfile implementation
works the same way as the entrez one. The requested node is fetched, but
then internally it walks the hierarchy purely so it can build a
classification list, which is then stored on the object. If you're
already retrieving every node above the requested node, why not just
return every node? Why not just return a whole Bio::Taxonomy?


> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch 
of modules that inherit from bioperl and then add/alter to suit my 
needs. In this regard they're a bit limited and kind of hard-coded to my 
way of thinking, but hopefully you can see my intent and perhaps use 
some of my implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single 
database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
supplied rank list); it has the ranks that its Nodes have and knows
(without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information
about themselves and have no classification array or anything like that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
the classification.
# Nodes have a scientific name and common names.
# You get parent and all children nodes without database lookups.
# There is a Bio::Species-like thing that wraps around this and gives
easy access to what I really want to do:

my $human = TFBS::Species->new(-common_name => 'human');
# returns the array you'd expect from a normally created,
# fully classified Bio::Species
my @classification = $human->classification;
my $kingdom = $human->kingdom; # returns 'Metazoa'

# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't 
make any difference!)

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down/ discuss.
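
To make the design points above a bit more concrete, here is a minimal
sketch in plain Perl hashes. It is not the code from the tarball and not
bioperl's API: the %node table and the lineage(), classification() and
name_at_rank() helpers are made-up names for illustration, and the toy
tree skips the intermediate ranks between family and kingdom. The point
is only that a node can hold nothing but its own data and a parent link,
with the classification built on demand by walking upwards.

```perl
use strict;
use warnings;

# Toy data: each node stores only facts about itself plus its parent's id.
# Parent links deliberately skip intermediate ranks to keep the table short.
my %node = (
    9606  => { scientific_name => 'Homo sapiens', common_names => ['human'],
               rank => 'species', parent => 9605 },
    9605  => { scientific_name => 'Homo',      rank => 'genus',   parent => 9604 },
    9604  => { scientific_name => 'Hominidae', rank => 'family',  parent => 33208 },
    33208 => { scientific_name => 'Metazoa',   rank => 'kingdom', parent => undef },
);

# Walk from a node up to the root, returning the nodes on the way.
sub lineage {
    my ($id) = @_;
    my @path;
    while (defined $id) {
        my $n = $node{$id} or die "unknown node $id";
        push @path, $n;
        $id = $n->{parent};
    }
    return @path;
}

# Classification generated just-in-time, most specific name first.
sub classification {
    my ($id) = @_;
    return map { $_->{scientific_name} } lineage($id);
}

# Answer "what kingdom is this in?" without a fixed list of rank slots.
sub name_at_rank {
    my ($id, $rank) = @_;
    for my $n (lineage($id)) {
        return $n->{scientific_name} if $n->{rank} eq $rank;
    }
    return undef;
}

print join('; ', classification(9606)), "\n";
print name_at_rank(9606, 'kingdom'), "\n";
```

Because nothing assumes a fixed number of ranks, branches of different
depth and 'no rank' nodes would need no special handling in this layout.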


Cheers,
Sendu.

From sb at mrc-dunn.cam.ac.uk  Thu May 11 08:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...


> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects 
that XML, and parses it as fully as flatfile.pm does. Nothing more to 
do. Weren't you the person that wrote that parser?

I parse the same XML in my version of entrez.pm (see my previous email); 
the main difference being I make Nodes out of each Taxon instead of just 
adding each Taxon's ScientificName to the classification array.
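
As a rough illustration of that difference, the sketch below turns every
Taxon under LineageEx into its own small node record instead of keeping
only the name. It is not the entrez.pm code from either version: it uses
regexes on an abbreviated copy of the XML quoted earlier in this thread
(a real implementation would use a proper XML parser), and the helper
names field() and nodes_from_xml() are invented for the example.

```perl
use strict;
use warnings;

# Abbreviated copy of the efetch XML quoted earlier in the thread.
my $xml = <<'XML';
<TaxaSet>
<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <LineageEx>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
</Taxon>
</TaxaSet>
XML

# Hypothetical helper: text of one simple child element in a fragment.
sub field {
    my ($fragment, $tag) = @_;
    return $fragment =~ m{<$tag>(.*?)</$tag>}s ? $1 : undef;
}

# Returns node hashes ordered root-most first, requested taxon last.
sub nodes_from_xml {
    my ($doc) = @_;
    my @nodes;

    # One node per <Taxon> inside <LineageEx>.
    my ($lineage) = $doc =~ m{<LineageEx>(.*?)</LineageEx>}s;
    $lineage = '' unless defined $lineage;
    while ($lineage =~ m{<Taxon>(.*?)</Taxon>}sg) {
        my $t = $1;
        push @nodes, {
            id              => field($t, 'TaxId'),
            scientific_name => field($t, 'ScientificName'),
            rank            => field($t, 'Rank'),
        };
    }

    # The requested taxon's own fields come before <LineageEx>.
    my ($head) = $doc =~ m{<Taxon>(.*?)(?:<LineageEx>|</Taxon>)}s;
    push @nodes, {
        id              => field($head, 'TaxId'),
        scientific_name => field($head, 'ScientificName'),
        rank            => field($head, 'Rank'),
    };

    # LineageEx is ordered, so each node's parent is the one before it.
    $nodes[$_]{parent_id} = $nodes[$_ - 1]{id} for 1 .. $#nodes;
    return @nodes;
}

for my $n (nodes_from_xml($xml)) {
    printf "%-13s %s (taxid %s)\n", $n->{rank}, $n->{scientific_name}, $n->{id};
}
```

The classification array can still be produced from these records on
request (reverse the list and take the scientific names), so nothing is
lost by keeping the rank and TaxId alongside each name.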

From jason.stajich at duke.edu  Thu May 11 09:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
	<44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID: <AAFFC5EC-8B54-4D87-BE38-CB90785AD4B5@duke.edu>

I guess so - I've long since forgotten what it supports, since I
don't regularly use it. Sorry.

On May 11, 2006, at 8:22 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>> Great - now we just need someone to volunteer to actually work on  
>> this.
>
> Now I'm really confused...
>
>
>> The current code grabs most of this but I believe expects a  
>> different XML
>
> No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez  
> expects
> that XML, and parses it as fully as flatfile.pm does. Nothing more to
> do. Weren't you the person that wrote that parser?
>
> I parse the same XML in my version of entrez.pm (see my previous  
> email);
> the main difference being I make Nodes out of each Taxon instead of  
> just
> adding each Taxon's ScientificName to the classification array.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Thu May 11 10:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh... 

To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
but I myself have seen issues with the way Bio::Species treats bacterial
strains (I guess this also involves Bio::Taxonomy::Node since that's what
Bio::Species delegates to).  Seems it likes to repeat some strain names when
using $seq->species->common_name.  Not a killer problem but annoying since
the correct name is in the source tag in the feature table!  I 'could' take
a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
previously, but it'll take a while to get going.  I'm really more interested
in getting epost-esearch-efetch sequence retrieval up and running first, with
the same API as Bio::DB::GenBank/GenPept and Bio::DB::Query::GenBank, and then
donating the code (late summer/fall???) after working out namespace issues so
it doesn't conflict with the current Bio::DB::WebDBSeqI inheritance.  I suppose
I could also look at Bio::DB::Taxonomy to see what's up in the next couple of
weeks (after conference), unless someone gets to it sooner.

Chris



From jason.stajich at duke.edu  Thu May 11 11:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>


I think you'll see it is different, and mostly a limitation of the
genbank format: the Bio::Species objects that you get from a genbank
parse do not represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus sub-species
versus strain when none of the members of the list are labeled.  These are
some of the same problems we have for swissprot as well.  I just don't
think we can do it right from the genbank file data alone, so I don't see
much point in expecting Bio::Species to provide more than a representation
of what is in the file and just return that array.


It has seemed like we need to special case things pretty heavily or  
do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?  What  
happens when there is a two part strain name (space separated) and a  
sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321

Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
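
To show why guessing is unreliable, here is a small sketch that applies
the obvious rule (first word is the genus, second is the species, the
rest is "something") to the four ORGANISM names above. It is not what the
genbank parser does; naive_split() is a made-up helper for illustration.

```perl
use strict;
use warnings;

# The four ORGANISM names from the examples above.
my @organisms = (
    'Staphylococcus haemolyticus JCSC1435',
    'Muntiacus muntjak vaginalis',
    'Aspergillus nidulans FGSC A4',
    'Cryptococcus neoformans var. grubii H99',
);

# Naive rule: word 1 = genus, word 2 = species, everything else unlabeled.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, @rest) = split ' ', $organism;
    return { genus => $genus, species => $species, rest => join(' ', @rest) };
}

for my $o (@organisms) {
    my $p = naive_split($o);
    printf "genus=%s species=%s rest=[%s]\n", @{$p}{qw(genus species rest)};
}
```

The leftover part comes out as 'JCSC1435', 'vaginalis', 'FGSC A4' and
'var. grubii H99': a strain, a sub-species, a two-word strain, and a
variety plus a strain.  Nothing in the string itself says which is which,
and the genus taken from the name (Aspergillus) does not match the last
element of the lineage (Emericella) in the third case.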


On May 11, 2006, at 10:57 AM, Chris Fields wrote:

> Heh...
>
> To tell the truth, I haven't looked at Bio::DB::Taxonomy in any  
> depth yet,
> but I myself have seen issues with the way Bio::Species treats  
> bacterial
> strains (I guess this also involves Bio::Taxonomy::Node since  
> that's what
> Bio::Species delegates to).  Seems it likes to repeat some strain  
> names when
> using $seq->species->common_name.  Not a killer problem but  
> annoying since
> the correct name is in the source tag in the feature table!  I  
> 'could' take
> a look at it but I can't guarantee quick results.
>
> Jason, I could add Taxonomy to the EUtilities overhaul I mentioned  
> to you
> previously but it'll take awhile to get going.  I'm really more  
> interested
> in getting epost-esearch-efetch sequence retrieval up and running  
> first with
> the same API as Bio::DB::GenBank/Genpept and  
> Bio::DB::Query::GenBank, donate
> the code (late summer/fall???) after working out namespace issues  
> so it
> doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I  
> suppose I
> could also look at Bio::DB:Taxonomy to see what's up in the next  
> couple of
> weeks (after conference), unless someone gets to it sooner.
>
> Chris
>
>>>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>>>
>>>>> Hi,
>>>>> I'm a little confused as to how names are supposed to work in
>>>>> Bio::Taxonomy::Node.
>>>>>
>>>>> In the bioperl versions that I've looked at, a Node doesn't seem to
>>>>> store the most important information about itself - its scientific
>>>>> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
>>>>> classification list. I'd have thought sticking it in -name would make
>>>>> more sense, but this is used only for the GenBank common name.
>>>>>
>>>>> The Bio::Taxonomy docs still suggest:
>>>>>
>>>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>>>    -object_id => 9606, # or -ncbi_taxid. Required tag
>>>>>    -names => {
>>>>>        'scientific' => ['sapiens'],
>>>>>        'common_name' => ['human']
>>>>>    },
>>>>>    -rank => 'species'  # Required tag
>>>>> );
>>>>>
>>>>> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
>>>>> 'name' method which claims to work like:
>>>>>
>>>>> $obj->name('scientific', 'sapiens');
>>>>>
>>>>> This kind of thing would be really nice, but afaics
>>>>> Bio::Taxonomy::Node->new takes the -name value and makes a common name
>>>>> out of it, whilst the name() method passes any 'scientific' name to
>>>>> the scientific_name() method, which is unable to set any value (and
>>>>> warns about this), only get.
>>>>>
>>>>> It seems like the need to have this classification array work the same
>>>>> way as Bio::Species is causing some unnecessary restrictions. Can't
>>>>> the more sensible idea of having a dedicated storage spot for the
>>>>> ScientificName and other parameters be used, with the classification
>>>>> array either being generated just-in-time from the hash-stored data,
>>>>> or indeed being generated from the Lineage field?
>>>>>
>>>>>
>>>>> Also, why does a node store the complete hierarchy on itself in the
>>>>> classification array? If we're going that far, why don't the
>>>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
>>>>> get_taxonomy() method instead of a get_Taxonomy_Node() method?
>>>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
>>>>> only have a minimum of information, if you could simply ask a node
>>>>> what its rank and scientific name was you could easily build a
>>>>> classification array, or ask what Kingdom your species was in etc.
>>>>>
>>>>> Are there good reasons for Taxonomy working the way it does in 1.5.1,
>>>>> or would I not be wasting my time re-writing things to make more sense
>>>>> (to me)?
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Sendu.
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Thu May 11 13:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3
	moduleandrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred to should only occur when reading a Primer3 output file; the Bio::Tools::Run::Primer3->run method takes the results and transfers them directly to a Bio::Tools::Primer3 object without an intermediate file.  A Data::Dumper look at the Bio::Tools::Primer3 object shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results' and then again in the 'results_by_number' hash, but only in the '0' hash.

All of this doesn't really matter for Li's original concern.  If you want to include the ID of the sequence along with the primer3 results, just take it from the seq object (i.e. $seq->display_id()).  Since you are in a loop taking one sequence at a time, this $seq will be the one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3 module so that you can parse the result file.
 
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once fixed, it can output 
 
3. primer3.exe is called in the Bio::Tools::Run::Primer3  "run" function, please read the function definition.


From cjfields at uiuc.edu  Thu May 11 13:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar said
so as well), especially for local databases.  I wonder, though, if there is
a way that RichSeqs like GenBank, when passed through SeqIO, can just be
'short-circuited' using the sequence builder to accept what's on the
SOURCE or ORGANISM line of a file as is, without forcing it into
Bio::Species/Bio::Taxonomy::Node.  Or maybe diminish the role of the
SOURCE/ORGANISM lines altogether to just simple Annotation objects and place
much greater emphasis on the TaxID itself, in effect decoupling the TaxID
(taxonomic information) from SOURCE/ORGANISM (annotation information).

In other words, have GenBank/EMBL classification lines and organism lines
essentially stay like they are in the input file (use simple objects).
Then, if one were really intent on getting the full name, classification,
etc., or one wanted to store their sequences in bioperl-db, they would be
required to either have a local db of NCBI Taxonomy or remote access to a
similar database (NCBI or something else) so a lookup could be accomplished
using the TaxID.  If they use BioSQL, then require them to preload their
BioSQL database with NCBI's taxonomy, something Hilmar already strongly
suggests.
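
To make the idea concrete, here is a rough sketch in plain Perl of what "keep the ORGANISM text as is, resolve everything else by TaxID" could look like.  It is only an illustration: %nodes and lineage_for() are made-up names, not BioPerl API, and the table is filled by hand from the Frankia sp. CcI3 record quoted in this thread (parent links above the genus are assumed from the order of the LineageEx entries).

```perl
use strict;
use warnings;

# Hypothetical local taxonomy table: TaxID => [name, rank, parent TaxID].
# Values are copied from the Frankia sp. CcI3 XML record in this thread;
# parent links above the genus are assumed from the LineageEx order.
my %nodes = (
    106370 => ['Frankia sp. CcI3',       'species',      1854],
    1854   => ['Frankia',                'genus',        74712],
    74712  => ['Frankiaceae',            'family',       85013],
    85013  => ['Frankineae',             'suborder',     2037],
    2037   => ['Actinomycetales',        'order',        85003],
    85003  => ['Actinobacteridae',       'subclass',     1760],
    1760   => ['Actinobacteria (class)', 'class',        201174],
    201174 => ['Actinobacteria',         'phylum',       2],
    2      => ['Bacteria',               'superkingdom', 131567],
    131567 => ['cellular organisms',     'no rank',      undef],
);

# Walk from a node up to the root; each node only knows its own parent.
sub lineage_for {
    my ($taxid) = @_;
    my @lineage;
    while (defined $taxid && exists $nodes{$taxid}) {
        my ($name, $rank, $parent) = @{ $nodes{$taxid} };
        unshift @lineage, { taxid => $taxid, name => $name, rank => $rank };
        $taxid = $parent;
    }
    return @lineage;
}

# The sequence record keeps the ORGANISM text verbatim plus the TaxID.
my %record = (organism => 'Frankia sp. CcI3', taxid => 106370);

for my $node (lineage_for($record{taxid})) {
    printf "%-14s %s\n", $node->{rank}, $node->{name};
}
```

The point of the design is that nothing is ever guessed from the ORGANISM string; ranks come only from the lookup, so a missing table simply means no classification rather than a wrong one.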

If anyone isn't interested in the taxonomic information or doesn't want to
bother grabbing the database or setting up remote access, tough luck; just
grab the Bio::Annotation/Bio::Species object and use that.  As the saying
goes, "you can't be all things to all people."  At some point you have to
throw your arms in the air, do the best you can, but give up trying to
please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus sub-
> species versus strain when none of the members of the list are
> labeled.  This is some of the same problems we have for swissprot as
> well.  I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
> 
> 
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
> 
> Can you guess what value is the strain versus sub-species?  What
> happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
> 
> SOURCE      Staphylococcus haemolyticus JCSC1435
>    ORGANISM  Staphylococcus haemolyticus JCSC1435
>              Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
> 
> versus
> SOURCE      Muntiacus muntjak vaginalis
>    ORGANISM  Muntiacus muntjak vaginalis
>              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
>              Euteleostomi; Mammalia; Eutheria; Laurasiatheria;
>              Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae;
>              Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
> 
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>    ORGANISM  Aspergillus nidulans FGSC A4
>              Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
>              Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
> 
> Genus should be Aspergillus or Emericella ?
> 
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>    ORGANISM  Cryptococcus neoformans var. grubii H99
>              Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>              Heterobasidiomycetes; Tremellomycetidae; Tremellales;
>              Tremellaceae; Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky!  This really points out the problem here.  It used to be
a problem for only a few cases but with so many bacterial and fungal genomes
that's changed.  
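
Just to illustrate how little a positional parse can tell you, here is a throwaway sketch (plain Perl, no BioPerl; naive_split() is a made-up helper, not what Bio::SeqIO::genbank actually does) run over the four ORGANISM lines above.  Whatever follows the second word comes back as an unlabeled lump, and nothing in the string says whether it is a strain, a sub-species, or a variety plus a strain.

```perl
use strict;
use warnings;

# Naive parse: first word is the genus, second the species, and whatever
# is left is "something else" that we cannot label from the text alone.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, $rest) = split /\s+/, $organism, 3;
    return { genus => $genus, species => $species, rest => $rest // '' };
}

for my $line (
    'Staphylococcus haemolyticus JCSC1435',      # rest is a strain
    'Muntiacus muntjak vaginalis',               # rest is a sub-species
    'Aspergillus nidulans FGSC A4',              # rest is a two-word strain
    'Cryptococcus neoformans var. grubii H99',   # rest is variety + strain
) {
    my $p = naive_split($line);
    print "$p->{genus} | $p->{species} | $p->{rest}\n";
}
```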

The Frankia XML example has the scientific name set to "Frankia sp. CcI3",
which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS
line in EMBL files.  It looks like the lines are parsed into and then built
from the ground-up in Bio::SeqIO::genbank using Bio::Species objects, which,
in my case with the strain designation, is where the problem lies.  They
could be placed in annotation objects with (-tagname => 'SOURCE',
-value => 'Frankia sp. CcI3') or similar settings.  Or simplify Bio::Species to only
represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or
EMBL OS/OC lines and nothing more complex than that (no complex taxonomy;
for that you use the TaxID and local database). 

Okay,  I need to lay off the coffee now...

Chris



From WiersmaP at AGR.GC.CA  Thu May 11 20:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3 modules then you are doing well.

To expand a little on Wenwu's explanations: a Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs in order to run are collected inside the object.  This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method). Then, when everything is set the way you want it, you 'run' the Primer3 program by using $primer3_object->run.  The "wrapper" collects all the run parameters and sends them off to the Primer3 executable.  Primer3 does the analysis and outputs the results to "stdout" in boulder-io format.

By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt.  Because out.txt is not closed between the sequences called in the script, you get all of the results concatenated in out.txt.  However, if you supplied an output filename (-outfile=>$file_out) in the "wrapper", each line of output from Primer3 will be written to $file_out, and at the end of the Primer3 output the file will be closed.  If your script then loops to another sequence, it will open the same outfile again and overwrite it.

One last important detail for the "wrapper" object.  When Primer3 is executed, the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run).  $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information.  This includes finding out how many primer sets were found and the means to access the primer set results one at a time.  It does work as advertised.  Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of once for each primer set.  That is why they only show up in $results_object as if they belonged to the first primer set (set '0') and are not available for the other primer sets.
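
If it helps to see why the ID ends up only with set '0', here is a small stand-alone sketch that sorts boulder-io tags by primer set number.  It is not the code inside Bio::Tools::Primer3, only an illustration; parse_boulder() is a made-up name, the sample values are invented, and the tag layout (numbered tags such as PRIMER_LEFT_1_SEQUENCE for the second set, unnumbered tags for the first) is my assumption about Primer3's output.

```perl
use strict;
use warnings;

# Parse one boulder-io record (TAG=VALUE lines, ended by a lone "=").
# Tags carrying a number go to that primer set; all others go to set 0,
# which is why SEQUENCE and PRIMER_SEQUENCE_ID appear only there.
sub parse_boulder {
    my (@lines) = @_;
    my %by_number;
    for my $line (@lines) {
        last if $line eq '=';
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;
        my $set = 0;
        # Strip an embedded set number, e.g. PRIMER_LEFT_1_SEQUENCE
        $set = $1 if $tag =~ s/_(\d+)(?=_|$)//;
        $by_number{$set}{$tag} = $value;
    }
    return \%by_number;
}

my @record = (
    'PRIMER_SEQUENCE_ID=example_seq',      # made-up sample data
    'SEQUENCE=ACGTACGTACGTACGTACGT',
    'PRIMER_LEFT_SEQUENCE=ACGTACGT',
    'PRIMER_LEFT_1_SEQUENCE=CGTACGTA',
    '=',
);
my $sets = parse_boulder(@record);

# Copy the shared ID onto every set so each one is self-describing.
my $id = $sets->{0}{PRIMER_SEQUENCE_ID};
$_->{PRIMER_SEQUENCE_ID} //= $id for values %$sets;

print "$_: $sets->{$_}{PRIMER_SEQUENCE_ID} $sets->{$_}{PRIMER_LEFT_SEQUENCE}\n"
    for sort { $a <=> $b } keys %$sets;
```

In a real script the simpler route is still to take the ID from the sequence object you passed in, since you already have it in hand inside the loop.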

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?

First, thank you all for replying to my previous post
about primer3.

But now I am a little confused even after reading the
documents: What is the relationship between these two
modules? What is the correct/standard way to use them to
do batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (because I don't
need all the information). But there is a little
trouble: PRIMER_SEQUENCE_ID can't be accessed using
this method. And Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without providing the line

use Bio::Tools::Primer3 

in the script.  How can this be explained? Is it because of the
following line?

my $result=$primer3->run; 

The last question: which line of code is used to invoke
the program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 


From torsten.seemann at infotech.monash.edu.au  Fri May 12 00:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for the gene predictions generally allow you to 
iterate through the predicted genes. Each prediction is usually returned 
as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() 
method to print them as GFF.

So something as simple as this *may* work:

use strict;
use warnings;
use Bio::Tools::Glimmer;

my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
while (my $gene = $parser->next_prediction) {
   # gff_string() returns the line without a trailing newline
   print $gene->gff_string, "\n";
}

If you want separate GFF lines for each exon, you'll have to do another 
loop over $gene->exons() etc., each of which is luckily also a 
Bio::SeqFeature!

Or if you want to modify some of the GFF columns first, e.g. the source tag, 
just do $gene->source_tag('mynewtag') before printing it.
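
In case the column layout is unfamiliar, here is a tiny plain-Perl sketch of what a GFF2-style line looks like when built by hand.  It is only meant to show the nine tab-separated columns and where the source tag sits; gff_line() and the hash keys are made-up names for illustration, not what gff_string() does internally, and the feature values are invented.

```perl
use strict;
use warnings;

# Build one GFF2-style line: nine tab-separated columns
# (seqname, source, feature, start, end, score, strand, frame, attributes).
sub gff_line {
    my (%f) = @_;
    return join "\t",
        $f{seq_id}, $f{source}, $f{primary},
        $f{start}, $f{end},
        defined $f{score} ? $f{score} : '.',
        $f{strand} > 0 ? '+' : $f{strand} < 0 ? '-' : '.',
        defined $f{frame} ? $f{frame} : '.',
        $f{attributes} // '';
}

# Made-up feature values standing in for a parsed gene prediction.
my %gene = (seq_id => 'contig1', source => 'glimmer', primary => 'gene',
            start => 100, end => 400, strand => 1);

# Same idea as calling $gene->source_tag('mynewtag') before printing.
$gene{source} = 'mynewtag';
print gff_line(%gene), "\n";
```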

Hope this helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Fri May 12 00:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making
	with	Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

	http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project.
It produces pretty clickable image maps from BLAST reports.

Hope it helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From brianjgilmartin at hotmail.com  Fri May 12 05:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <BAY107-F354AD036A551D290A1874CBCAC0@phx.gbl>

please remove me from the list



From sb at mrc-dunn.cam.ac.uk  Fri May 12 06:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes 
across a species rank it does:

if ($rank eq 'species') {
   # get rid of genus from species name
   (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}

However, even though the scientific name is usually 'Genus species' in 
the database, note the 'usually': sometimes the species is a multiword 
item that does not include the genus, so we can't just do a simple split 
and take the second word.
The same applies to levels below species, eg. 'Avian erythroblastosis 
virus' is a variant of the species 'Avian leukosis virus' but 'Avian 
erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current 
rank and the previous rank. Maybe even that's not so perfect, but it 
must be a lot better than turning the species 'Avian leukosis virus' 
into the species 'virus' (especially given that the genus here is 
'Alpharetrovirus')!

# we need to be going root(kingdom) -> leaf (species or lower) order
#
# we need to be storing untouched versions of the scientific name of
# the previous rank ($self->{_last_raw})
#
# probably only bother start doing this when we get to genus
my $last_raw = $self->{_last_raw} || undef;
$self->{_last_raw} = $sci_name;
if ($last_raw) {
   $sci_name =~ s/$last_raw//;
   $sci_name =~ s/^\s+//;
}
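
A note on the substitution above: $last_raw is interpolated as a regex, so a parent name containing metacharacters, such as the parentheses in '(strain ES4)', would not match literally. Below is a minimal standalone sketch of the same idea, assuming the match should be literal and anchored at the start of the name. Both are assumptions, and strip_parent_name is a hypothetical helper, not BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove the previous rank's raw
# scientific name from the start of the current name, as literal text.
sub strip_parent_name {
    my ($name, $parent) = @_;
    return $name unless defined $parent && length $parent;

    # \Q...\E quotes regex metacharacters such as '(' and ')'.
    (my $short = $name) =~ s/^\Q$parent\E\s*//;

    # If nothing would be left, keep the untouched name.
    return length $short ? $short : $name;
}

# [ raw name of previous rank, raw name of current rank ]
my @pairs = (
    [ 'Homo',                         'Homo sapiens' ],
    [ 'Alpharetrovirus',              'Avian leukosis virus' ],
    [ 'Avian erythroblastosis virus', 'Avian erythroblastosis virus (strain ES4)' ],
);

for my $pair (@pairs) {
    my ($parent, $name) = @$pair;
    printf "%-45s => %s\n", $name, strip_parent_name($name, $parent);
}
```

With no shared prefix (the Alpharetrovirus case) the name is left untouched, which is the behaviour argued for above.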

Are there even more strange species (and lower) names that would still 
not work well with the above solution?

Cheers,
Sendu.

From s_maheshwari84 at rediffmail.com  Fri May 12 09:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

  
hello
I am a student at Center for DNA Fingerprinting and Diagnostics (CDFD).
I am working on protein-protein interactions but I am unable to use the protein interaction module, i.e. ProteinGraph.pm.
Actually I am facing lots of problems in the program I have written. Please help me; for the last four months I have not been able to solve the same problem.
I am pasting my program here and also attaching it.

#!usr/bin/perl
use lib "/usr/local/bioxapps/bioperl/library/";
use strict;
use Bio::Graph::SimpleGraph;
use Bio::Graph::IO;
our @ISA=qw( Bio::SeqI);
use Bio::Graph::Edge;
use Bio::Graph::IO::dip;
use Bio::Graph::IO::psi_xml;
use Clone qw(clone);
use vars  qw(@ISA);
use Bio::AnnotatableI;
use Bio::IdentifiableI;
our @ISA = qw(Bio::Graph::SimpleGraph);
@ISA = qw(Bio::Graph::IO);
our @ISA=qw(Expoerter);
use Bio::Graph::ProteinGraph;
use Class::AutoClass;
use Bio::Graph::SimpleGraph::Traversal;

my $graphio = Bio::Graph::IO->new(-file   => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
print "$graphio";
my $graph   = $graphio->next_network();
print "$graph->nodes\t";
$graph->remove_dup_edges();
my @un=$graph->unconnected_nodes();
print "\nthe unconnected nodes are =@un";
my @n=$graph->subgraph();
print "\subgraph=@n\n";
#print "Please the protein-id whose clusering coefficient is to be detemined\n";
#my $v=<STDIN>;
my $density = $graph->density();
print "\ngraph density=$density\n";
my @graphs = $graph->components();
print "\nno of Connected components=$#graphs\n";
print "\nplease enter the protein-id whom you want to remove from the network\n";
my $no=<STDIN>;
$graph->remove_nodes($graph->nodes_by_id($no));
my $count = $graph->edge_count();
print "\nno of edges=$count\n ";
my $ncount = $graph->node_count();
print "\nno of nodes=$ncount\n ";

print"\nenter the protein  whose interactions is to be find "; 
my $x=<STDIN>;
my $node = $graph->nodes_by_id($x);
#print " this is $node\n";
my @neighbors = $graph->neighbors($node); 
print "to check";
print join",",map{$_->object_id()} @neighbors;
my @nodes = $graph->nodes();
print "\nno of nodes = @nodes\t\n";
my @hubs;
foreach my $nodi (@nodes) 
 {
  if ($graph->neighbor_count($node) > 10) 
      {
       push @hubs, $nodi;
      }
  }
  
foreach my $r(@hubs)
  {
     my @y=@$r;
      print "the following proteins have > 10 interactors=@y\n";
  }
  #siblingual protein

 my @edgeref = $graph->articulation_points();
 print "no of articulation points=$#edgeref\n";
 print "please enter the protein whom you want to check for articulation point \n ";
 my $nod=<STDIN>;
  # make pathgen graph
  my $grap = Bio::Graph::IO->new(-file   => 'org.txt',-format => 'dip');
  my $gra   = $grap->next_network();
  $graph->remove_dup_edges();
  $graph->union($gra);
  my @duplicates = $graph->dup_edges();
  print "these interactions exist in cere and c.elegan\n=@duplicates";
  print "please enter the first protein for identifiaction of shortest path\n";
  my $p1=<STDIN>;
  print "please enter the second  protein for identifiaction of shortest path\n";
  my $p2=<STDIN>;
  
    my @a=$graph->shortest_paths();
 print "shortest path=@a\t\n";
    
  
with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060512/fe287972/attachment.obj 

From chen_li3 at yahoo.com  Thu May 11 13:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you I finally
came up with a script for my personal needs:
1) batch primer design
2) set some of the parameters instead of using all the
default values
3) output only part of the information for the first
pair of primers but not all of them (but you can
choose)
4) the results can be exported into Excel for my
convenience.

Enclosed are the script and the tested results.  I
also include some lines about how I figured out which
keys/entries are available for change. If you don't 
want the sequence part just add # to comment it out.

Any comments are welcome.

BTW the solution suggested by Dr. Cui and Paul doesn't
work for me.

Once again thank you very much,

Li  

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment.txt 

From Marc.Logghe at DEVGEN.com  Fri May 12 11:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,
What is actually the problem ? Do you have errors ? Is the script not
behaving as you expect ?
You also might attach the input file sample1.txt so that people can try
it.
Regards,
Marc
  



From stoltzfu at umbi.umd.edu  Fri May 12 11:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID: <A52F256F-A851-4429-A5B1-D3162A344790@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to  
facilitate comparative analysis
using evolutionary methods by 1) managing evolutionary relationships  
(by linking data to trees)
and 2) allowing coordinated analysis of different types of data (by  
implementing a generic concept
of "character-state" data).  Bio::CDAT would leverage existing  
BioPerl objects and include the functionality
of Rutger Vos's Bio::Phylo.  It would provide the framework to  
develop interfaces to analysis tools
(phylogeny inference, evolutionary rate models, functional shift  
inference, etc), as well as to file
formats and visualization methods appropriate for such analyses.  A  
proposal is available at

   http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on  
"Questions to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel


From sdavis2 at mail.nih.gov  Fri May 12 11:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID: <C08A2811.B6B5%sdavis2@mail.nih.gov>


On 5/12/06 9:55 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> hello
> I am a student at Center for DNA Fingerprinting and Diagnostics (CDFD).
> I am working on protein-protein interactions but I am unable to use the protein
> interaction module, i.e. ProteinGraph.pm.
> Actually I am facing lots of problems in the program I have written. Please
> help me; for the last four months I have not been able to solve the same problem.
> I am pasting my program here and also attaching it.

You haven't really told us what you are trying to do or what problems you
are having.

Sean


From cjfields at uiuc.edu  Fri May 12 13:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
> 
> if ($rank eq 'species') {
>    # get rid of genus from species name
>    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have
everything in the classification, from superkingdom down to species (no
strain unfortunately, and I'm not sure about subspecies); if it's missing
the rank then the designation doesn't exist or is tagged as 'no rank'.  Like
I mentioned before I'm not intimately familiar with Bio::Taxonomy,
Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how
everything is parsed and plugged in to Bio::Taxonomy objects.  I do know
that XML::Twig is used for parsing through the data so it shouldn't be too
hard to change what you want.

I haven't tried using Bio::DB::Taxonomy directly yet, but I would have
thought that the binomial is just built from the XML twig 'LineageEx'
Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and
species from 'Species', and that the scientific name is from the tag
'ScientificName'.  Guess not. 

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
> 
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
> 
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>    $sci_name =~ s/$last_raw//;
>    $sci_name =~ s/^\s+//;
> }
> 
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name
(normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for
EMBL) is the best way to go about it since it's really a best guess using
regex; Jason pointed out several examples where this falls apart, and being
a bacterial man I have found many examples myself.  I'm also not sure that
forcing a lookup for every TaxID in every sequence every time it's passed
through SeqIO is the best way to go either, though I think it should be
required for storing sequences.  It's a tricky balance.  

I still think that maybe we should absolve ourselves from using
SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than
strictly annotation, or reconstruct Bio::Species to maybe a
Bio::Annotation::Species object to handle that annotation and either
deprecate Bio::Species or separate it completely from any Bio::Taxonomy
objects.  It would really simplify things.  Then, if anyone is interested in
taxonomy, either install a local database or use Entrez efetch, and then use
Bio::DB::Taxonomy (fixed of course) to grab the TaxID info.  Seems like
we're running more and more into exceptions to the rule as more genomes are
made available.

Anyway, using Bio::Species for GenBank is really screwy for bacterial names,
so currently I get around BioPerl issues with bacterial names by grabbing
the 'source' seqfeature and pulling the 'organism' tag out.  But it really
shouldn't be that obfuscated, right?

Chris

> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sdavis2 at mail.nih.gov  Sat May 13 08:19:21 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 May 2006 08:19:21 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com>
References: <20060513041853.16091.qmail@webmail31.rediffmail.com>
Message-ID: <4465CEC9.2010909@mail.nih.gov>

saurabh maheshwari wrote:
>  
> hello
> Thanks for your prompt reply.
> Actually I am trying to make a protein interaction graph from a dip 
> file. But I am not able to do so. In my last mail I have already attached 
> my program, which is giving some errors, and I am not able to troubleshoot 
> them. Please help
> Thanks

I meant that since we don't know what error(s) you are getting, it is 
really not possible to determine what the problem is.  Also, someone 
else on the list offered to look at your code if you were to provide the 
input file.  I find it helpful to look at this webpage every now and 
then to remind myself what constitutes a useful question to email lists:

http://www.catb.org/~esr/faqs/smart-questions.html

Sean



From s_maheshwari84 at rediffmail.com  Sat May 13 01:17:58 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 13 May 2006 05:17:58 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com>

  
hello
I am very happy to see the prompt reply from the group members.
As you all suggested, I am attaching the required files.
I have attached all three files: first the input file, second an error file in which I saved the errors I was getting, and third the program file.
Actually, there is something in the error file I want to understand.
I am putting one error line here:
## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
What does this stand for?
The second thing is that I want to get the connected graph from my data.
Let me explain by example which type of connected graph I mean.
Suppose there are five objects linked in this way:
A connected to B
A connected to C
B connected to C
D connected to C
E connected to A
I want to create a whole link between all five.
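
For illustration only, the five links listed above can be read as an undirected edge list. The plain-Perl sketch below groups the objects into connected components with a breadth-first search. It does not use Bio::Graph, and the components() helper is hypothetical.

```perl
use strict;
use warnings;

# The five links from the example, as undirected edges.
my @edges = ( [qw(A B)], [qw(A C)], [qw(B C)], [qw(D C)], [qw(E A)] );

# Build an adjacency hash: $adj{node}{neighbour} = 1 in both directions.
my %adj;
for my $edge (@edges) {
    my ($u, $v) = @$edge;
    $adj{$u}{$v} = 1;
    $adj{$v}{$u} = 1;
}

# Hypothetical helper: return a list of array refs, one per component.
sub components {
    my ($adj) = @_;
    my (%seen, @components);
    for my $start (sort keys %$adj) {
        next if $seen{$start}++;
        my @queue = ($start);
        my @members;
        while (defined(my $node = shift @queue)) {
            push @members, $node;
            for my $next (sort keys %{ $adj->{$node} }) {
                push @queue, $next unless $seen{$next}++;
            }
        }
        push @components, [ sort @members ];
    }
    return @components;
}

my @components = components(\%adj);
print scalar(@components), " component(s)\n";
print join(' ', @$_), "\n" for @components;
```

For these five links every object is reachable from every other one, so they all fall into a single component.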


Please help me I am not getting the result


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.dip
Type: application/octet-stream
Size: 5794 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment.obj 
-------------- next part --------------
bash-2.05b$ perl from.pl
Bio::Graph::ProteinGraph=HASH(0x1182e70)
Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes
the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160)

graph density=0.00826446280991736

no of Connected components=60

please enter the protein-id whom you want to remove from the network
XMECF2

no of edges=61

no of nodes=122

enter the protein  whose interactions is to be find XMECF2
XMECF2
 interacts with map{->object_id()}

no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850
) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq::
RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH
(0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40)
Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri
chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0
x1198250) Bio::Seq::RichSeq=HASH(0x11d0240) Bio::Seq::RichSeq=HASH(0x11c8f20) Bi
o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich
Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1
1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio:
:Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe
q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c
b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S
eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq=
HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e
60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq
::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA
SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700
) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq::
RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH
(0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0)
Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri
chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0
x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi
o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich
Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1
1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio:
:Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe
q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c
4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S
eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq=
HASH(0x11c8e30) Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4
20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq
::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA
SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530
) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq::
RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH
(0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40)
Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri
chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0
x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi
o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich
Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1
1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio:
:Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe
q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a
d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S
eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq=
HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6
90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq
::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA
SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0
)
Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib
rary//Bio/Graph/ProteinGraph.pm line 477, <STDIN> line 2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0001.obj 

From cjfields at uiuc.edu  Sat May 13 14:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally
honest.  I have not looked at any of the Bio::Graph modules and have no idea
how they are implemented, and I haven't looked at your input file, but I can
tell right off the bat your script has major logic problems.  I can also
pretty much  tell that you don't understand the object model we use here, at
all.  This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for

Did you cut and paste from several other scripts hoping that it would work?
I say that b/c you mix styles quite frequently here, using objects correctly
(deref'ing with '->') and incorrectly (print "$object").  You also declare
(and redeclare) @ISA four times for a script (not needed unless you're
declaring a class and inheriting methods from other modules).  You also use
@ISA once with a misspelled module name (I don't think there is a module
named 'Expoerter').  So, I'm actually stunned that the script doesn't crash
at all.  Yikes!

Okay, brutal honesty time over.  Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

means that what you are printing out is a reference to an object (it refers
to the object class and the location in memory) and is NOT what you want.
You should be doing something along the lines of $object->method, not 'print
$object', to get at the object data and methods.  You use this several times
in your script already; that should be a big hint as the areas where it
doesn't work do not use this syntax.  Read the documentation for the many
varied modules you use in your script.  Look at script examples.  Start
simply, then work your way up.  

Also, using the '->' dereferencing operator inside double quotes doesn't
work; you have to do something like:

print $graph->nodes,"\t";

not 

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.
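
To make the difference concrete, here is a small self-contained sketch. Toy::Graph is a made-up stand-in class, not Bio::Graph::ProteinGraph, so only the interpolation behaviour carries over.

```perl
use strict;
use warnings;

package Toy::Graph;    # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class, @ids) = @_;
    return bless { ids => [@ids] }, $class;
}

sub nodes { my $self = shift; return @{ $self->{ids} } }

package main;

my $graph = Toy::Graph->new(qw(A B C));

# Only $graph is interpolated; '->nodes' stays literal text.
my $wrong = "$graph->nodes";

# Call the method outside the quotes and join the result...
my $right = join(',', $graph->nodes);

# ...or interpolate an anonymous array holding the method's return values.
my $inline = "nodes: @{[ $graph->nodes ]}";

print "$wrong\n$right\n$inline\n";
```

The first line printed has the form Toy::Graph=HASH(0x...)->nodes, while the other two contain the actual node ids.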

If any of what I just said doesn't make any sense, you really need to pick
up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and
'Programming Perl' by Wall et al.  I don't know if anyone can really help at
this point w/o completely writing the script for you.  We will fix problems
to a point but we, for the most part, will not do your work for you.

Chris



From hubert.prielinger at gmx.at  Sat May 13 23:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than BLAST output files, 
like the text output files from the search tool mpSrch that is offered by
EBI? I ask because WU-BLAST output files can already be parsed with BioPerl.

thanks
Hubert

From arareko at campus.iztacala.unam.mx  Sun May 14 00:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at 
the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great 
contribution to the BioPerl project!

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Sun May 14 12:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them.  If you want you could create a new plugin
module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or
fasta to get an idea of how it works.  Judging by the mpsrch output I'm
pretty sure you would have to build a custom plugin for it.  

A viable alternative: looking through the mail list it looks like mpsrch is
a multiprocessor implementation of ssearch, itself an implementation of the
Smith-Waterman algorithm for local alignments in the FASTA package of
programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...

Chris



From chen_li3 at yahoo.com  Sun May 14 13:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get a reverse-complementary sequence out of a
fasta sequence file. The Synopsis of Bio::Seq
says I can do it this way:

$revcom=$seqobj->revcom();

I used the following script to try to get the job done,
but it doesn't work. Then I read the documentation of
Bio::Seq, and it looks like it doesn't contain a revcom
method.

Any idea will be appreciated.

Li 


###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq; 
use Bio::SeqIO;     
       
my $file='c:/perl/local/primer3_1.0.0/src/est.txt';   
 
    
my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                            -format=>'fasta' );
                            
    my $seqobj=$seqIO->next_seq();#create object  
    
  print "what attributes/keys are available:\n";    
  for my $key (sort keys %$seqobj){
           my $value=$seqobj->{$key};
	    print "$key\t=>\t$value\n"
	    }
# This is the output on the screen:
#primary_id =>      gi|54093|emb|X61809.1|
#primary_seq =>     Bio::PrimarySeq=HASH(0x10492848)

# Based on these results, primary_id can be
# accessed right away.
# As for primary_seq, it is a Bio::PrimarySeq
# object which, according to the documentation,
# provides the following methods:
                #new   
		#seq 
		#validate_seq 
		#subseq 
		#length 
		#display_id
		#accession_number 
		#primary_id 
		#alphabet 
		#desc 
		#can_call_new
		#id 
		#is_circular 
		#object_id
		#version 
		#authority 
		#namespace 
		#display_name 
		#description 
    
print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n"; 
print "revcom=",$seqobj->revcom,"\n\n";
                      
        my $now_time=localtime;
        print  $now_time, "\n\n";  
        exit;

 #This is the output on the screen
	#primary_id=gi|54093|emb|X61809.1|
	#id=gi|54093|emb|X61809.1
	#revcom=Bio::Seq=HASH(0x10493304)
	#Sun May 14 12:45:20 2006

      
__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

From cjfields at uiuc.edu  Sun May 14 13:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

	#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here.  The wiki gives the way to get the
reverse complement string as '$seq->revcom->seq', not '$seq->revcom'.

When I ran your script with that line changed to the wiki version, I got
(using my test seq):

what attributes/keys are available:
primary_id      =>      test,
primary_seq     =>      Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris



From chen_li3 at yahoo.com  Sun May 14 14:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the
link for this syntax: $seq->revcom->seq?

Li





From cjfields at uiuc.edu  Sun May 14 14:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <b3ef767e.b86a2fe8.820dd00@expms6.cites.uiuc.edu>

I think the confusion lies in what revcom returns.  This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this 
page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom.  '$seq_obj->revcom' returns a 
sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string.

Hence, '$seq_obj->revcom->seq'.
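
To make the object-versus-string distinction concrete, here is a minimal sketch in plain Perl. MiniSeq is a hypothetical stand-in class written only for this illustration (it is not BioPerl code); it merely mimics the behaviour described above, where revcom hands back a new object and seq hands back the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# MiniSeq is a hypothetical stand-in for a sequence object, not BioPerl code.
package MiniSeq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}

# Return the sequence string held by the object.
sub seq { return $_[0]{seq} }

# Return a NEW object holding the reverse complement (an object, not a string).
sub revcom {
    my ($self) = @_;
    my $rc = scalar reverse $self->{seq};
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return MiniSeq->new(seq => $rc);
}

package main;

my $seqobj = MiniSeq->new(seq => 'ATGCCG');

my $as_object = '' . $seqobj->revcom;    # stringified reference
my $as_string = $seqobj->revcom->seq;    # the actual sequence string

print "revcom=$as_object\n";    # looks like MiniSeq=HASH(0x...)
print "revcom=$as_string\n";    # CGGCAT
```

With a real Bio::Seq object the same call chain applies: print $seqobj->revcom->seq rather than $seqobj->revcom.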

Chris


From Marc.Logghe at DEVGEN.com  Sun May 14 16:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,
> doesn't work. Then I read documentation of Bio::Seq and it 
> looks like it doesn't contain revcom method.
Here, the Deobfuscator interface that Mauricio announced earlier, comes
in handy.
http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=
If you look in the methods table, you will find out that the revcom
method is inherited from, and implemented by Bio::PrimarySeqI.
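
If the web interface is not at hand, the same question (which class actually implements an inherited method) can be asked from Perl itself. The sketch below uses only core Perl; My::SeqI and My::Seq are hypothetical packages standing in for the Bio::PrimarySeqI / Bio::Seq relationship, and implemented_in is a helper name made up for this example.

```perl
use strict;
use warnings;
use B ();

# Hypothetical two-level hierarchy standing in for Bio::PrimarySeqI -> Bio::Seq.
package My::SeqI;
sub revcom { return 'revcom called' }

package My::Seq;
our @ISA = ('My::SeqI');
sub new { return bless {}, shift }

package main;

# Report the package in which the method resolved for $class was compiled.
sub implemented_in {
    my ($class, $method) = @_;
    my $code = $class->can($method) or return;
    return B::svref_2object($code)->GV->STASH->NAME;
}

my $owner = implemented_in('My::Seq', 'revcom');
print "My::Seq->revcom is implemented in $owner\n";
```

The same helper should work on a loaded BioPerl class, since it relies only on can() and the core B module.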
HTH,
Marc 


From sb at mrc-dunn.cam.ac.uk  Mon May 15 04:18:11 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 09:18:11 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine>
References: <000f01c675e6$a61bde90$15327e82@pyrimidine>
Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu Bala wrote:
>> In bioperl up to at least 1.5.1, when one of the database modules 
>> comes across a species rank it does:
>> 
>> if ($rank eq 'species') { # get rid of genus from species name 
>> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); }
> 
> The XML example from NCBI Taxonomy I mentioned previously seems to 
> have everything in the classification, from superkingdom down to 
> species (no strain unfortunately, and I'm not sure about subspecies);
> if it's missing the rank then the designation doesn't exist or is 
> tagged as 'no rank'.  Like I mentioned before I'm not intimately 
> familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I 
> don't have a clue as to how everything is parsed and plugged in to 
> Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> through the data so it shouldn't be too hard to change what you
> want.

Yes, that's all true, but I'm not sure what it has to do with what I was
saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
own implementation I change the rank of all 'no rank' Nodes below
species to 'variant'.


> I haven't tried using Bio::DB::Taxonomy directly yet, but I would 
> have thought that the binomial is just built from the XML twig 
> 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> tag 'Genus' and species from 'Species', and that the scientific name
> is from the tag 'ScientificName'.  Guess not.

No. See above for what it actually does. That is a copy/paste from the
code (there, $taxon_name == ScientificName). When it finds a species
rank it does that split because in the
ncbi taxonomy database the 'genus' rank for a human has a ScientificName
of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
sapiens', and the bioperl model (quite rightly, I think) wants the
'species' node to not have information of other nodes (well, except for
the classification array). So it removes the 'Homo' from 'Homo sapiens'
giving a species name of 'sapiens'. This then allows the binomial method
to return 'Homo sapiens' instead of 'Homo Homo sapiens'.

(though in a bizarre twist, and this is one of my problems with how
names are currently represented in the Taxonomy modules, 'Scientific
Name' and 'binomial' are synonymous)


[snip]
>> My solution is to just remove whatever is the same between the 
>> current rank and the previous rank. Maybe even that's not so 
>> perfect, but it must be a lot better than turning the species 
>> 'Avian leukosis virus' into the species 'virus' (especially given 
>> that the genus here is 'Alpharetrovirus')!
> 
> I don't think taking Genus/Species directly from the scientific 
> name (normally what is in the SOURCE or ORGANISM annotation for 
> GenBank or OS for EMBL) is the best way to go about it [snip]

Perhaps, but again I'm not sure what this has to do with what I was
saying. If you don't want your species name to contain your genus name
you have to do some kind of parsing. My post merely pointed out that the
parsing currently in bioperl does not work for viruses and possibly
other species. I'd like to think that someone cares about this error and
would do the simple fix I offered, or that they already know about the
problem and have done their own fix.
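
As a rough sketch of the kind of fix described above, here is one possible reading of "remove whatever is the same between the current rank and the previous rank": strip the genus from the species name only when the species name really begins with it. The helper name strip_genus is hypothetical and this is not the code in any BioPerl module.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the genus from a species name only when the
# species name actually starts with that genus; otherwise leave it alone.
sub strip_genus {
    my ($genus, $species_name) = @_;
    my $stripped = $species_name;
    $stripped =~ s/^\Q$genus\E\s+//;
    return $stripped;
}

print strip_genus('Homo', 'Homo sapiens'), "\n";                       # sapiens
print strip_genus('Alpharetrovirus', 'Avian leukosis virus'), "\n";    # Avian leukosis virus
```

Unlike an unconditional split on whitespace, this leaves a virus name intact when the genus does not appear in it.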


> I'm also not sure that forcing a lookup for every TaxID in every 
> sequence every time it's passed through SeqIO is the best way to go 
> either, though I think it should be required for storing sequences. 
> It's a tricky balance.

In my own implementation any database lookups are cached, and you have
the option of not doing any database lookup at all and 'faking' a
taxonomy from the supplied list of names (so it works just like normal
Bio::Seq).
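
The caching idea can be illustrated with a plain hash. This is only a sketch of the general technique, not the implementation referred to above; slow_lookup is a hypothetical stand-in for a real database query.

```perl
use strict;
use warnings;

my %cache;
my $lookups = 0;

# Hypothetical stand-in for an expensive taxonomy database query.
sub slow_lookup {
    my ($taxid) = @_;
    $lookups++;
    return "node for taxid $taxid";
}

# Query the database only the first time a given TaxID is seen.
sub cached_lookup {
    my ($taxid) = @_;
    $cache{$taxid} = slow_lookup($taxid) unless exists $cache{$taxid};
    return $cache{$taxid};
}

cached_lookup(9606) for 1 .. 3;
print "lookups performed: $lookups\n";    # 1
```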


> I still think that maybe we should absolve ourselves from using 
> SOURCE/ORGANISM or OS/OC information in GenBank files as anything 
> more than strictly annotation, or reconstruct Bio::Species to maybe a
>  Bio::Annotation::Species object to handle that annotation and either
>  deprecate Bio::Species or separate it completely from any 
> Bio::Taxonomy objects.  It would really simplify things.  Then, if 
> anyone is interested in taxonomy, either install a local database or
>  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
>  to grab the TaxID info.

My personal view is that having it as an annotation would serve no real
purpose. For me the whole point of any kind of species representation in
bioperl is to allow you to compare species in a biologically meaningful
way. If it's just some annotation then that means it's basically
free-form text and you have no guarantee that two sequences from the
same species are annotated exactly the same - no guarantee that your
code would identify that those sequences are from the same species.
The only other useful thing that a species object needs to do is let you
know how related two different species are - you need to be able to ask
what a species' class, kingdom etc. are. Again, not viable with an
annotation - you need something strict like a properly constructed Taxonomy.

I guess it comes down to the philosophy of parsing a file. Do you try
and reflect exactly what the file contains, letter for letter, so that
your resulting object can recreate that file letter for letter, or do
you parse the file and extract the correct /meaning/ in order to be more
useful?
I think there can be a choice by the user, and this is best done by
making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
as in my own implementation.

From s_maheshwari84 at rediffmail.com  Mon May 15 04:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

  
Hello All
I sent this problem to the list earlier, but it is still unsolved, so I have restated it. Could anybody please give me code to make a graph between some items that are listed in a text file in the following format?
Example:
item1 interacts with item2. I want to build the graph and then, given any item as input, ask for all interactions of that item.

item 1      item 2 
A            B
A            C
C            B
D            B
D            E
A            F
G            A     

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

From sdavis2 at mail.nih.gov  Mon May 15 06:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>


On 5/15/06 4:15 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> Hello All
> I sent this problem to the list earlier, but it is still unsolved, so I
> have restated it. Could anybody please give me code to make a graph
> between some items that are listed in a text file in the following
> format?
> Example:
> item1 interacts with item2. I want to build the graph and then, given any
> item as input, ask for all interactions of that item.
> 
> item 1      item 2
> A            B
> A            C
> C            B
> D            B
> D            E
> A            F
> G            A   

Not a bioperl answer, but in your case, I would suggest looking at using
cytoscape to do this.  Look here for details:

http://www.cytoscape.org/

Sean


From sdavis2 at mail.nih.gov  Mon May 15 07:03:28 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 07:03:28 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>
Message-ID: <C08DD840.B7DE%sdavis2@mail.nih.gov>


On 5/15/06 6:26 AM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:

> Not a bioperl answer, but in your case, I would suggest looking at using
> cytoscape to do this.  Look here for details:
> 
> http://www.cytoscape.org/

I forgot to mention, if you are looking for a perl solution, I would look at
the Graph module.

http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod

You can create the graph according to the docs and then use the neighbors()
method (if I remember correctly) to get the nodes connected to the query
node.
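
For a data set this small, the same idea can also be sketched without any CPAN module, using a plain hash of hashes as an undirected adjacency list. This is a swapped-in technique for illustration, not the Graph module's API; the name interactions_of is made up here, and the pairs are the ones from the example table in this thread.

```perl
use strict;
use warnings;

# Interaction pairs from the example table in this thread.
my @pairs = (
    [qw(A B)], [qw(A C)], [qw(C B)], [qw(D B)],
    [qw(D E)], [qw(A F)], [qw(G A)],
);

# Undirected adjacency list: each item maps to the set of its partners.
my %adjacent;
for my $pair (@pairs) {
    my ($x, $y) = @$pair;
    $adjacent{$x}{$y} = 1;
    $adjacent{$y}{$x} = 1;
}

# Return the sorted list of items that interact with $item.
sub interactions_of {
    my ($item) = @_;
    my @partners = sort(keys %{ $adjacent{$item} || {} });
    return @partners;
}

print "A interacts with: ", join(', ', interactions_of('A')), "\n";
```

Reading the pairs from the text file instead would only mean replacing @pairs with a loop that splits each line on whitespace.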

Sean


From akarger at CGR.Harvard.edu  Mon May 15 08:20:11 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 15 May 2006 08:20:11 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>

This tool is quite nice, and may save me a lot of perldoc'ing.

A couple of minor interface thoughts. 

1) There are quite a lot of methods for many of the classes. As such, I
think I'll often want to browse through what's available in a class. But
60% or so of the screen real estate is used for "Enter a search
string... OR select a class from the list". IMO, it would be better to
have two pages, a search page and a result page.   It only takes a click
on Back (or a "new search" button) to get to a new search, and now you
can use your whole screen for reading your results.

2) Please sort the "select a class from the list" alphabetically. I
guess I can enter a search term to get the right classes, but it would
be nice to be able to browse.
2a) if you want to be really fancy, make a javascript nested menu with
expandable submenus. OK, maybe not.

3) Minimalist is nice, but documentation is even nicer. It wasn't clear
to me that the search searches within class names rather than function
names. What I really want to know sometimes is which module has, say,
the revcom method in it. So, if it's not easy to include that within
this search, then at least tell me what my search space is.

4) When I search for something that's not found, I get a screen that
looks pretty familiar, with the extra text "No match to string found"
down at the bottom. It took me a while to even notice it. (Studies show
that most users don't read most of the text on a page.) Bold might be
nice here. Or put the error at the top of the screen. Or both.

5) I'll save my stupidest comment for last - please make the page title
"Bioperl Deobfuscator", so that when I bookmark it I'll know what the
bookmark stands for.

Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From sb at mrc-dunn.cam.ac.uk  Mon May 15 09:08:32 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 14:08:32 +0100
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
References: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk>

Amir Karger wrote:
> This tool is quite nice, and may save me a lot of perldoc'ing.

Yes, many thanks to everyone involved.


> A couple of minor interface thoughts. 
> 
> 1) There are quite a lot of methods for many of the classes. As such, I
> think I'll often want to browse through what's available in a class. But
> 60% or so of the screen real estate is used for "Enter a search
> string... OR select a class from the list". IMO, it would be better to
> have two pages, a search page and a result page.   It only takes a click
> on Back (or a "new search" button) to get to a new search, and now you
> can use your whole screen for reading your results.

As the compromise it must be, I like the way it behaves. I don't like 
lots of windows. I especially don't like pop up windows. Right now when 
I'm using the bioperl docs I tend to have a whole bunch of tabs open to 
different class pages at once, so being able to see an overview all on 
one page in Deobfuscator is very nice.

Further to that, I'd love it if clicking on a method name caused an 
in-place css(&|javascript) reveal (similar to how a well implemented 
drop down menu works in a website) rather than a new window opened. 
Alternatively, just have more columns in the results table, ie. usage, 
function, returns, args columns. I feel that opening a window for each 
method you want to understand is far too slow.

I'd also really like a link to the code for the method as well. The 
bioperl docs are rarely complete enough that you can really understand 
what every method is supposed to do without looking at the code.


> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> to me that the search searches within class names rather than function
> names. What I really want to know sometimes is which module has, say,
> the revcom method in it.

This would be a great feature to add.


Another minor interface thought:
6) Have a little more cell padding in all the tables. Things are just a 
little too cramped and things start to look messy/ run into each other.

From cjfields at uiuc.edu  Mon May 15 09:59:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 08:59:57 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk>
Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 8:09 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Amir Karger wrote:
> > This tool is quite nice, and may save me a lot of perdoc'ing.
> 
> Yes, many thanks to everyone involved.

The Deobfuscator currently indexes bioperl-1.4, so it's not completely
up-to-date.  I believe Mauricio and Dave may be working on updating to the
newer versions and maybe bioperl-live, as well as getting the other bioperl
packages up and running.

For modules added after v1.4 I use the script in the FAQ question mentioned
on the Deobfuscator wiki page to get up-to-date methods, then grab the
ActiveState HTML'd perldocs pumped out when installing using PPM (I make a
custom PPM/PPD file and install it myself every once in a while):

#!/usr/bin/perl -w
use Class::Inspector;
$class = shift || die "Usage: methods perl_class_name\n";
eval "require $class";
print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}),
"\n";

> > A couple of minor interface thoughts.
> >
> > 1)There's quite a lot of methods for many of the classes. As such, I
> > think I'll often want to browse through what's available in a class. But
> > 60% or so of the screen real estate is used for "Enter a search
> > string... OR select a class from the list". IMO, it would be better to
> > have two pages, a search page and a result page.   It only takes a click
> > on Back (or a "new search" button) to get to a new search, and now you
> > can use your whole screen for reading your results.
> 
> As the compromise it must be, I like the way it behaves. I don't like
> lots of windows. I especially don't like pop up windows. Right now when
> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
> different class pages at once, so being able to see an overview all on
> one page in Deobfuscator is very nice.
>
> Further to that, I'd love it if clicking on a method name caused an
> in-place css(&|javascript) reveal (similar to how a well implemented
> drop down menu works in a website) rather than a new window opened.
> Alternatively, just have more columns in the results table, ie. usage,
> function, returns, args columns. I feel that opening a window for each
> method you want to understand is far too slow.

Agreed.

> I'd also really like a link to the code for the method as well. The
> bioperl docs are rarely complete enough that you can really understand
> what every method is supposed to do without looking at the code.

The methods that pop up are in columns along with the class module that
implements the method.  


If you click on that link you get PDOC documentation for the module which
includes most of the code (strangely, though Deobfuscator indexes bioperl
1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
something a bit more detailed?

> > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> > to me that the search searches within class names rather than function
> > names. What I really want to know sometimes is which module has, say,
> > the revcom method in it.

That's listed in the method results table (the next column has the module
with a link to the module's online docs).
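
If you ever need the same answer outside the web interface, a minimal sketch along these lines would do it. The Demo::* packages and the classes_with_method name are made up for illustration; with BioPerl installed you would 'require' the real modules instead. Note that can() also reports inherited methods, so this tells you which classes respond to a method, not which file implements it.

```perl
use strict;
use warnings;

# Hypothetical stand-in classes, only here so the sketch runs on its own.
{
    package Demo::Seq;
    sub new    { return bless {}, shift }
    sub revcom { return 'revcom called' }

    package Demo::Tree;
    sub new    { return bless {}, shift }
}

# Return the classes from the list that can respond to $method.
sub classes_with_method {
    my ($method, @classes) = @_;
    return grep { $_->can($method) } @classes;
}

my @hits = classes_with_method('revcom', 'Demo::Seq', 'Demo::Tree');
print join("\n", @hits), "\n";
```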


Chris


> This would be a great feature to add.
> 
> 
> Another minor interface thought:
> 6) Have a little more cell padding in all the tables. Things are just a
> little too cramped and things start to look messy/ run into each other.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon May 15 12:08:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 11:08:30 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk>
Message-ID: <001601c67839$cf289490$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 3:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
> subspecies/variant names
> 
> Chris Fields wrote:
> > Sendu Bala wrote:
> >> In bioperl up to at least 1.5.1, when one of the database modules
> >> comes across a species rank it does:
> >>
> >> if ($rank eq 'species') { # get rid of genus from species name
> >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); }
> >
> > The XML example from NCBI Taxonomy I mentioned previously seems to
> > have everything in the classification, from superkingdom down to
> > species (no strain unfortunately, and I'm not sure about subspecies);
> > if it's missing the rank then the designation doesn't exist or is
> > tagged as 'no rank'.  Like I mentioned before I'm not intimately
> > familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> > don't have a clue as to how everything is parsed and plugged in to
> > Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> > through the data so it shouldn't be too hard to change what you
> > want.
> 
> Yes, that's all true, but I'm not sure what it has to do with what I was
> saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
> own implementation I change the rank of all 'no rank' Nodes below
> species to 'variant'.

Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'.  Guess not.
> 
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
> 
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)
 
Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
deal with it.  I also noticed that subspecies also contains the entire
string:

    <Taxon>
      <TaxId>135461</TaxId>
      <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
      <Rank>subspecies</Rank>
    </Taxon>
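
To make the effect of that split concrete, here is a small stand-alone sketch. It only reuses the split from the snippet quoted above; the strip_first_word wrapper is my own name, and I am simply feeding it the two ScientificName strings by hand rather than going through the database modules. The split removes exactly one leading word, so a species name comes out as intended, but a subspecies-style string would still carry the species epithet.

```perl
use strict;
use warnings;

# Same split as in the quoted snippet: drop the first
# whitespace-delimited word and keep everything after it.
sub strip_first_word {
    my ($taxon_name) = @_;
    (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    return $taxon_name;
}

for my $name ('Homo sapiens', 'Bacillus subtilis subsp. subtilis') {
    my $stripped = strip_first_word($name);
    print "'$name' => '$stripped'\n";
}
```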

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
I almost never get the actual scientific name for the node (the one on the
GenBank ORGANISM line); I get the name with the strain chopped off instead,
and a number of times the names get mangled.  The regexes below only grab
from the topmost tags:

Script:
---------------------------------
#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Taxonomy;

# NCBI taxonomy XML file, given on the command line
my $file = shift @ARGV or die "Usage: $0 tax.xml\n";

print "\nNCBI XML output ScientificName tag for each node:\n";
my @taxid = ();
open (my $taxfile, '<', $file) or die "Can't open file $file: $!\n";
while (<$taxfile>){
	# two leading spaces = tags of the topmost Taxon, not nested ones
	if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
		print "$1\t";
		push @taxid, $1;
	}
	print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close $taxfile;

print "\nBio::DB::Taxonomy scientific_name:\n";
# one factory is enough; no need to rebuild it for every id
my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
for my $id (@taxid){
	my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
	print $node->ncbi_taxid,"\t",$node->scientific_name,"\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------
Note Bacillus subtilis in the Bio::Tax output above.  Not one of those is
the scientific name as defined by NCBI (and most taxonomists for that
matter).

So, in a nutshell, there's a problem here.  I don't know if your fix works
for that, but I definitely don't think the 'scientific name' should be
assembled ad hoc but should be taken from the tagname for that node.  I am
currently reduced to grabbing the feature primary_tagged 'source' and
getting the 'organism' tagname from that.  I cannot stress enough that it
should NOT be that way.

As for 'binomial' == 'scientific_name', I agree; I see it as well and that
should be fixed.
 
...
> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my
frustrations with Bio::Species (not Bio::DB::Taxonomy).  My point here was,
since there is no real way to surmise from a GenBank flatfile what the
taxonomic ranks are w/o guessing (which seems to break more often than not
when dealing with complex names), there shouldn't be any tie to Bio::Tax
objects, at least directly.  I guess methods could be incorporated into
Bio::Species for those who want to give it a try, but I would like to get a
GenBank file, for once, in which the scientific name/binomial name isn't
mangled by Bio::Species.

Back to Bio::DB::Taxonomy; I don't have a problem with implementing your
methods here; on the contrary, if they fix my problem above then I'll be
more than glad to.  I can't get to it immediately but maybe later
today/tomorrow.
 
> > I'm also not sure that forcing a lookup for every TaxID in every
> > sequence every time it's passed through SeqIO is the best way to go
> > either, though I think it should be required for storing sequences.
> > It's a tricky balance.
> 
> In my own implementation any database lookups are cached, and you have
> the option of not doing any database lookup at all and 'faking' a
> taxonomy from the supplied list of names (so it works just like normal
> Bio::Seq).
>
> 
> > I still think that maybe we should absolve ourselves from using
> > SOURCE/ORGANISM or OS/OC information in GenBank files as anything
> > more than strictly annotation, or reconstruct Bio::Species to maybe a
> >  Bio::Annotation::Species object to handle that annotation and either
> >  deprecate Bio::Species or separate it completely from any
> > Bio::Taxonomy objects.  It would really simplify things.  Then, if
> > anyone is interested in taxonomy, either install a local database or
> >  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
> >  to grab the TaxID info.
> 
> My personal view is that having it as an annotation would serve no real
> purpose. For me the whole point of any kind of species representation in
> bioperl is to allow you to compare species in a biologically meaningful
> way. If it's just some annotation then that means it's basically
> free-form text and you have no guarantee that two sequences from the
> same species are annotated exactly the same - no guarantee that your
> code would identify that those sequences are from the same species.
> The only other useful thing that a species object needs to do is let you
> know how related two different species are - you need to be able to ask
> what a species' class, kingdom etc. are. Again, not viable with an
> annotation - you need something strict like a properly constructed
> Taxonomy.

My point is, a large number of users do NOT use, nor care about, taxonomic
information to the degree they need to know the entire classification of the
organism; many are just as happy about getting the scientific name only,
which is in the GenBank/EMBL file itself.  To take one extreme, it is not
productive to force every user to download the NCBI tax database and use
lookups just to convert sequences from EMBL format to GenBank format.  It's
not productive to allow users to spam the NCBI tax database remotely either,
so hardcoding lookups is, IMHO, a big mistake.  
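
For anyone who does want lookups, something like the following is what I have in mind by "optional and not spamming". It is only a sketch: make_cached_lookup is a made-up name, and the code ref passed in is a placeholder for whatever local or remote query you actually use (here a fake in-memory table, so nothing touches NCBI).

```perl
use strict;
use warnings;

# Wrap a (possibly slow, remote) lookup in a cache. $lookup is any
# code ref that takes a taxid and returns a name.
sub make_cached_lookup {
    my ($lookup) = @_;
    my %cache;
    return sub {
        my ($taxid) = @_;
        $cache{$taxid} = $lookup->($taxid) unless exists $cache{$taxid};
        return $cache{$taxid};
    };
}

# Stand-in for a database query; counts how often it is really called.
my $calls   = 0;
my %fake_db = (9600 => 'Pongo pygmaeus');
my $cached  = make_cached_lookup(sub { $calls++; return $fake_db{ $_[0] } });

for my $i (1 .. 3) {
    my $name = $cached->(9600);
    print "$name\n";
}
print "lookups performed: $calls\n";
```

Because the lookup is just a code ref, a caller who does not care about taxonomy never has to supply one at all.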

> I guess it comes down to the philosophy of parsing a file. Do you try
> and reflect exactly what the file contains, letter for letter, so that
> your resulting object can recreate that file letter for letter, or do
> you parse the file and extract the correct /meaning/ in order to be more
> useful?
> I think there can be a choice by the user, and this is best done by
> making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
> as in my own implementation.

I understand both philosophies, but the latter implies that you know the
intention of the ones submitting the sequence.  99.9% of the time that's
fine, something I can live with.  However, when we mess up something as
simple as getting the scientific name for an organism when the information
is directly in the flat file (ORGANISM line) by trying to 'imply' what the
classification is, yes, I get frustrated.  Even more frustrating to me is
that Bio::DB::Taxonomy, which should return accurate information directly
from the Taxonomy database, still manages to screw up the scientific name.  
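
Just to show how little is needed to get the name as written in the file, here is a rough sketch that pulls the ORGANISM line out of a flat-file record untouched. This is plain Perl, not how Bio::SeqIO or Bio::Species do it; organism_line is a made-up name, the sample record is a hand-typed fragment, and names long enough to wrap onto a second line are not handled.

```perl
use strict;
use warnings;

# Return the text of the ORGANISM line of a GenBank flat-file record,
# unmodified, or undef if the record has no such line.
sub organism_line {
    my ($record) = @_;
    return $record =~ /^[ \t]+ORGANISM[ \t]+(.+?)[ \t]*$/m ? $1 : undef;
}

# Hand-typed sample fragment, for illustration only.
my $record = <<'END';
SOURCE      Bacillus subtilis subsp. subtilis str. 168
  ORGANISM  Bacillus subtilis subsp. subtilis str. 168
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
END

my $organism = organism_line($record);
print "$organism\n";
```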

The NCBI definition in the sample record:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

states that the ORGANISM line contains the formal scientific name and its
lineage (no ranking).  If the lineage is very long it is abbreviated, so you
don't get the same thing as you would by using the TaxID.

So, in essence, I believe you are correct, that Bio::Species can be used as
a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with
caveats or warnings for possible inaccuracies.  I also believe that lookups
should be allowed but optional, not required (i.e. left up to the user, as
you state).  

I just feel that it's somewhat misleading to imply, by delegating to
Bio::Taxonomy, that Bio::Species contains accurate taxonomic information
when NCBI themselves state that the GenBank flatfile classification can be
incomplete and does not supply rankings (genus, species) in the file.  It's
our best guess in most cases, and a best guess by definition is not very
accurate.  If you want taxonomic accuracy, use the TaxID and a local tax
database.  I feel that we shouldn't punish those who don't worry/care about
taxonomy by implementing Bio::Species with methods that mangle data that's
directly in the flat file they're parsing.

Okay, not to cut short this discussion, but I have to get back to $job.
I'll try adding your fixes in a bit later today/tomorrow; if they pass tests
I'll commit them in.

Chris


From hlapp at gmx.net  Mon May 15 12:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID: <C78E4724-CC95-483E-876B-69AF7C1CC6AF@gmx.net>

You found the right instance. Unfortunately with the way the bioperl  
swissprot parser works the group (RG) isn't promoted to author if  
there is no author in addition (in fact you may debate whether that  
would even be the best way of doing things), so it doesn't find it on  
second occurrence by unique key.

If you can live without this entry, or any other entry that causes a  
hiccup, just supply the flag --safe and it will gracefully move on to  
the next entry.

Fixing the issue would require either fixing the bioperl swissprot  
parser (or Bio::Annotation::Reference) to stick the RG group into the  
author slot if there is no author, or fixing Bioperl  
Bio::Annotation::Reference to also feature a group and biosql to use  
it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)  
should just use that in place of a missing author?

The downside is that upon round-tripping an entry, the RG annotation  
line will become an RA annotation line. How bad would that be?
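
A minimal sketch of the fallback I mean, under the assumption that a reference is just a plain hash (that is only to keep the example self-contained; with a real Bio::Annotation::Reference you would go through its accessors, and author_or_group is a made-up name, not anything in Bioperl-db):

```perl
use strict;
use warnings;

# Return the authors if there are any, otherwise fall back to the
# group (RG). The result may still be undef if both are missing.
sub author_or_group {
    my ($ref) = @_;
    my $authors = $ref->{authors};
    return $authors if defined $authors && length $authors;
    return $ref->{rg};
}

my $ref = {
    authors  => undef,
    rg       => 'The German cDNA consortium',
    location => 'Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.',
};

my $who = author_or_group($ref);
print "author slot: $who\n";
```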

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for  
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   
> <======  Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.    
> <======  Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL  
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first  
> sequence
> ------+-------+---------+----------------------+
> |          139 |      NULL | Submitted (NOV-2004) to the EMBL/ 
> GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
> +--------------+----------- 
> +----------------------------------------------------
>
> and the error when the script choked was
>
>  MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,  
> values were
>  ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
>  databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
>  Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing i'm not the first person to encounter this, but dont  
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon May 15 13:01:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 13:01:14 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx>
References: <4466AD7F.6050700@campus.iztacala.unam.mx>
Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>

Hey, thanks to Laura & David for this interface.

Any idea why most of the Bio::Ontology::* modules show up without  
their leading Bio::Ontology? And clicking on those hyperlinks doesn't  
go anywhere either ... Anything different with those modules that I  
can fix?

	-hilmar

On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:

> I'm glad to announce the availability of the Deobfuscator interface at
> the BioPerl website. You can use it at the following URL:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Many thanks to Laura Kavanaugh and David Messina for this great
> contribution to the BioPerl project!
>
> Mauricio.
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 13:22:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 12:22:13 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>
Message-ID: <000301c67844$1b506280$15327e82@pyrimidine>

That's strange.  Clicking on the list gives me the results for that module.
When I click on the hyperlinks in the results section they open fine; the
method column links open a new page containing usage-function-returns-args
and the class column links open pdoc (same page) for bioperl-live.  I'm
using Firefox 1.5 on WinXP.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 12:01 PM
> To: Mauricio Herrera Cuadra
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Hey, thanks to Laura & David for this interface.
> 
> Any idea why most of the Bio::Ontology::* modules show up without
> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
> go anywhere either ... Anything different with those modules that I
> can fix?
> 
> 	-hilmar
> 
> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> 
> > I'm glad to announce the availability of the Deobfuscator interface at
> > the BioPerl website. You can use it at the following URL:
> >
> > http://bioperl.org/cgi-bin/deob_interface.cgi
> >
> > Many thanks to Laura Kavanaugh and David Messina for this great
> > contribution to the BioPerl project!
> >
> > Mauricio.
> >
> > --
> > MAURICIO HERRERA CUADRA
> > arareko at campus.iztacala.unam.mx
> > Laboratorio de Genética
> > Unidad de Morfofisiología y Función
> > Facultad de Estudios Superiores Iztacala, UNAM
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sb at mrc-dunn.cam.ac.uk  Mon May 15 14:00:15 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 19:00:15 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine>
References: <001601c67839$cf289490$15327e82@pyrimidine>
Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
> deal with it.  I also noticed that subspecies also contains the entire
> string:
> 
>     <Taxon>
>       <TaxId>135461</TaxId>
>       <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
>       <Rank>subspecies</Rank>
>     </Taxon>

Yes, this is one of the problems I mentioned in the first post to this
thread.


> As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
> I don't get the actual scientific name for the node (from the GenBank
> ORGANISM line) almost every time; I get the name with the strain chopped off
> instead and a number of times the names get mangled.

[snip, should be:]
> 224308  Bacillus subtilis subsp. subtilis str. 168
> 281309  Bacillus thuringiensis serovar konkukian str. 97-27

[snip, but Bio::DB::Taxonomy gives:]
> 224308  subtilis Bacillus subtilis subsp. subtilis
> 281309  Bacillus cereus group thuringiensis

[snip]
> So, in a nutshell, there's a problem here.  I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite as
you say. My solution was to munge the actual ScientificName but 'ensure'
that the binomial would give you back the actual binomial name you
wanted - which is the intent of current Bio::DB::Taxonomy code.

my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
my $leaf_node = $species0->taxonomy->get_leaves();
print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
print "Species0 subspecies = '", $species0->subspecies, "'\n";
print "Species0 variants = '", scalar($species0->variant), "'\n";
print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:
sci_name of Node = 'str. 168'
Species0 subspecies = 'subsp. subtilis'
Species0 variants = 'str. 168'
Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

sci_name of Node = 'str. 97-27'
Species0 subspecies = ''
Species0 variants = 'serovar konkukian str. 97-27'
Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the
ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp.
subtilis str. 168', when I ask for the variant I don't want that whole
string. I just want the bit that will be different when comparing other
strains of this subspecies of this species of Bacillus. I want 'str.
168'. Note that my objects never store the original ScientificName; it
is due to 'luck' (or as I like to think, a good implementation) that the
binomial method is able to reconstruct a string that is identical to
what the original ScientificName was.
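
Roughly, the idea can be sketched like this. To be clear, this is not the actual TFBS::Species code, just a stand-alone illustration with a made-up function name and a hand-picked, simplified lineage: each node keeps only the part of its ScientificName that its parent does not already supply, and joining the parts gives back the full string.

```perl
use strict;
use warnings;

# Given ScientificNames ordered from genus downwards, keep for each
# node only the part that its parent's name does not already supply.
# If the parent's name is not a prefix, the name is left whole.
sub distinguishing_parts {
    my @names = @_;
    my @parts;
    my $parent = '';
    for my $name (@names) {
        my $part = $name;
        $part =~ s/^\Q$parent\E\s+// if length $parent;
        push @parts, $part;
        $parent = $name;
    }
    return @parts;
}

my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);

my @parts = distinguishing_parts(@lineage);
print "node name: '$_'\n" for @parts;
print "rejoined:  '", join(' ', @parts), "'\n";
```

The simple prefix test is the weak point: when a parent's name is not a prefix of the child's, the child's name stays whole, so joining the parts would then repeat words.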

If you'd like to see my code let me know. You can't just drop the code
snippet I posted in this thread into existing bioperl modules; quite a
bit else has to change as well. I'll have to make an updated
taxonomy_the_tfbs_way.tar.gz file available if you want an example
implementation; the current version of that file is now out of date - it
doesn't do any of what I describe above.


From hlapp at gmx.net  Mon May 15 14:08:49 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 14:08:49 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine>
References: <000301c67844$1b506280$15327e82@pyrimidine>
Message-ID: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>

Safari or Firefox on MacOSX don't do this. Note that the appearance  
in the browsable list is already different (the prefix is missing),  
and the JavaScript link also lacks the prefix in the module name in  
contrast to others, e.g., Bio::Ontology::Ontology (which is one of  
the few Bio::Ontology exceptions that do work and do display correctly).

I suppose there is something peculiar about the code formatting of  
those modules? Some of the modules under Bio::OntologyIO are also  
affected BTW.

What happens is that after you click on the link the page appears to  
reload (i.e., gets submitted) but the second table that is supposed to  
open underneath the first doesn't appear. However, the sort-by drop  
down selector does appear.

	-hilmar

On May 15, 2006, at 1:22 PM, Chris Fields wrote:

> That's strange.  Clicking on the list gives me the results for that  
> module.
> When I click on the hyperlinks in the results section they open  
> fine; the
> method column links opens a new page containing usage-function- 
> returns-args
> and the class column links opens pdoc (same page) for bioperl- 
> live.  I'm
> using Firefox 1.5 on WinXP.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Monday, May 15, 2006 12:01 PM
>> To: Mauricio Herrera Cuadra
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Hey, thanks to Laura & David for this interface.
>>
>> Any idea why most of the Bio::Ontology::* modules show up without
>> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
>> go anywhere either ... Anything different with those modules that I
>> can fix?
>>
>> 	-hilmar
>>
>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
>>
>>> I'm glad to announce the availability of the Deobfuscator  
>>> interface at
>>> the BioPerl website. You can use it at the following URL:
>>>
>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>
>>> Many thanks to Laura Kavanaugh and David Messina for this great
>>> contribution to the BioPerl project!
>>>
>>> Mauricio.
>>>
>>> --
>>> MAURICIO HERRERA CUADRA
>>> arareko at campus.iztacala.unam.mx
>>> Laboratorio de Genética
>>> Unidad de Morfofisiología y Función
>>> Facultad de Estudios Superiores Iztacala, UNAM
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 15:07:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:07:59 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>
Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine>

I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
which I can try it on).  I'll let you know what I find.  

This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
all the classes have links that work (I added newline and tab to make it a
bit more readable) :

Bio::OntologyIO	
	Parser factory for Ontology formats
Bio::OntologyIO::Handlers::BaseSAXHandler	
	no short description available
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
	no short description available
Bio::Ontology::OntologyI
	Interface for an ontology implementation
Bio::Ontology::TermFactory
	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
Bio::Ontology::OntologyStore
	A repository of ontologies
Bio::Ontology::RelationshipFactory
	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
Bio::Ontology::Ontology
	standard implementation of an Ontology

So the names seem fine here.

When I click on a class (Bio::Ontology::Ontology) I get in the results
section:

Method                 Class                           Returns         Usage
add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI

....and so on

Where each method is clickable and opens a new page containing a table:

Bio::Ontology::Ontology::add_relationship
Usage	add_relationship(RelationshipI relationship): RelationshipI
Function	Adds a relationship object to the ontology engine.
Returns	Its argument.
Args	A RelationshipI object.


Each class is also linked to the bioperl-live PDOC.  Clicking on class
Bio::Ontology::Ontology in the results table gets me this page (no new
page):

http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html


Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Monday, May 15, 2006 1:09 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Safari or Firefox on MacOSX don't do this. Note that the appearance
> in the browsable list is already different (the prefix is missing),
> and the JavaScript link also lacks the prefix in the module name in
> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> the few Bio::Ontology exceptions that do work and do display correctly).
> 
> I suppose there is something peculiar about the code formatting of
> those modules? Some of the modules under Bio::OntologyIO are also
> affected BTW.
> 
> What happens is that after you click on the link the page appears to
> reload (i.e., gets submitted) but the second table that is supposed to
> open underneath the first doesn't appear. However, the sort-by drop-down
> selector does appear.
> 
> 	-hilmar


From cjfields at uiuc.edu  Mon May 15 15:12:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:12:34 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine>

I just tried the same thing (links, search, etc) with Mac OS X v 10.3.9 and
Safari (no Firefox sorry) and it worked fine as well (all links, no missing
Bio::Ontology, etc).  Not sure what it could be...

Chris

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Monday, May 15, 2006 2:08 PM
> To: 'Hilmar Lapp'
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: RE: [Bioperl-l] Deobfuscator interface now available
> 
> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
> which I can try it on).  I'll let you know what I find.
> 
> This is what I get when I do a search for 'Bio::Ont*' using Firefox on
> WinXP and this Deobfuscator link
> (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have
> links that work (I added newline and tab to make it a bit more readable) :
> 
> Bio::OntologyIO
> 	Parser factory for Ontology formats
> Bio::OntologyIO::Handlers::BaseSAXHandler
> 	no short description available
> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> 	no short description available
> Bio::Ontology::OntologyI
> 	Interface for an ontology implementation
> Bio::Ontology::TermFactory
> 	Instantiates a new Bio::Ontology::TermI (or derived class) through a
> factory
> Bio::Ontology::OntologyStore
> 	A repository of ontologies
> Bio::Ontology::RelationshipFactory
> 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> through a factory
> Bio::Ontology::Ontology
> 	standard implementation of an Ontology
> 
> So the names seem fine here.
> 
> When I click on a class (Bio::Ontology::Ontology) I get in the results
> section:
> 
> Method                 Class                           Returns         Usage
> add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
> 
> ....and so on
> 
> Where each method is clickable and opens a new page containing a table:
> 
> Bio::Ontology::Ontology::add_relationship
> Usage	add_relationship(RelationshipI relationship): RelationshipI
> Function	Adds a relationship object to the ontology engine.
> Returns	Its argument.
> Args	A RelationshipI object.
> 
> 
> Each class is also linked to the bioperl-live PDOC.  Clicking on class
> Bio::Ontology::Ontology in the results table gets me this page (no new
> page):
> 
> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> 
> 
> Chris


From arareko at campus.iztacala.unam.mx  Mon May 15 15:20:10 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 15 May 2006 14:20:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx>

Laura and Dave would be very happy to see all of your 
comments/suggestions/enhancements/complaints summarized in the 
appropriate wiki page. Just be sure to sign them properly with your name 
and date:

http://bioperl.org/wiki/Deobfuscator

I think they'll have to discuss which features would be nice to implement
and which wouldn't, depending on the direction they want their project to
go. But don't worry, they're extremely nice people who are open to all
kinds of ideas. Best of all, the Deobfuscator is open source, so
everyone is invited to contribute to it; just ask them for the code :)

On my side, I'm tweaking the code so it will be able to browse the
different BioPerl packages (core, run, ext) and their
respective releases (stable, developer, cvs).

Regards,
Mauricio.

Chris Fields wrote:
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>> Sent: Monday, May 15, 2006 8:09 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Amir Karger wrote:
>>> This tool is quite nice, and may save me a lot of perldoc'ing.
>> Yes, many thanks to everyone involved.
> 
> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.
> 
> For modules added after v1.4 I use the script from the FAQ entry mentioned
> on the Deobfuscator wiki page to get up-to-date methods, then grab the
> ActiveState HTML'd perldocs generated when installing with PPM (I make a
> custom PPM/PPD file and install it myself every once in a while):
> 
> #!/usr/bin/perl -w
> use Class::Inspector;
> $class = shift || die "Usage: methods perl_class_name\n";
> eval "require $class";
> print join ("\n", sort @{Class::Inspector-
> 
>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.   It only takes a click
>>> on Back (or a "new search" button) to get to a new search, and now you
>>> can use your whole screen for reading your results.
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>> different class pages at once, so being able to see an overview all on
>> one page in Deobfuscator is very nice.
>>
>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie. usage,
>> function, returns, args columns. I feel that opening a window for each
>> method you want to understand is far too slow.
> 
> Agreed.
> 
>> I'd also really like a link to the code for the method as well. The
>> bioperl docs are rarely complete enough that you can really understand
>> what every method is supposed to do without looking at the code.
> 
> The methods that pop up are in columns along with the class module that
> implements the method.  
> 
> 
> If you click on that link you get PDOC documentation for the module which
> includes most of the code (strangely, though Deobfuscator indexes bioperl
> 1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
> something a bit more detailed?
> 
>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
>>> to me that the search searches within class names rather than function
>>> names. What I really want to know sometimes is which module has, say,
>>> the revcom method in it.
> 
> That's listed in the method results table (the next column has the module
> with a link to the module's online docs).
> 
> 
> Chris
> 
> 
>> This would be a great feature to add.
>>
>>
>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Mon May 15 15:23:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 15:23:55 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine>
References: <000501c67852$e1bb55c0$15327e82@pyrimidine>
Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>

I wasn't using the search. It's in the scrollable table for browsing.  
-hilmar

On May 15, 2006, at 3:07 PM, Chris Fields wrote:

> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
> which I can try it on).  I'll let you know what I find.
>
> This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
> and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
> all the classes have links that work (I added newline and tab to make it a
> bit more readable) :
>
> Bio::OntologyIO	
> 	Parser factory for Ontology formats
> Bio::OntologyIO::Handlers::BaseSAXHandler	
> 	no short description available
> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> 	no short description available
> Bio::Ontology::OntologyI
> 	Interface for an ontology implementation
> Bio::Ontology::TermFactory
> 	Instantiates a new Bio::Ontology::TermI (or derived class) through a
> factory
> Bio::Ontology::OntologyStore
> 	A repository of ontologies
> Bio::Ontology::RelationshipFactory
> 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> through a factory
> Bio::Ontology::Ontology
> 	standard implementation of an Ontology
>
> So the names seem fine here.
>
> When I click on a class (Bio::Ontology::Ontology) I get in the results
> section:
>
> Method                 Class                           Returns         Usage
> add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
>
> ....and so on
>
> Where each method is clickable and opens a new page containing a  
> table:
>
> Bio::Ontology::Ontology::add_relationship
> Usage	add_relationship(RelationshipI relationship): RelationshipI
> Function	Adds a relationship object to the ontology engine.
> Returns	Its argument.
> Args	A RelationshipI object.
>
>
> Each class is also linked to the bioperl-live PDOC.  Clicking on class
> Bio::Ontology::Ontology in the results table gets me this page (no new
> page):
>
> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
>
>
> Chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ClarkeW at AGR.GC.CA  Mon May 15 15:40:15 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 15 May 2006 15:40:15 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>

Hey everyone, 

 
I have been developing some code to download and parse BLAST reports
from a remote server using SOAP::Lite and to insert the results into
a MySQL database. The problem I am having is that my program seems to be
taking up a huge amount of RAM. For a single job of 10000 queries it
can consume as much as a couple hundred MB inside an hour. I realize
that a lot of work is being done but this seems like way too much. This
leads me to the subject of my post. I think I may have traced the source
of the memory leak to Bio::SearchIO. I have used Devel::Size to track
the size of my variables and done other debugging steps and have had no
luck with resolving this very frustrating problem. My code is as
follows:

 
    my $result = $connector->getQueryResult($query_id);

    my $FH;
    open $FH, "<", \$result;

    my $searchio = new Bio::SearchIO(-format => "blast",
                                     -fh     => $FH);

    while (my $o_blast = $searchio->next_result()) {
        my $clone_id = $o_blast->query_name();

        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);

 
This is just the leading and trailing code surrounding the use of
Bio::SearchIO, since there is quite a lot of it. I am mostly just wondering if
anyone has ever had problems with SearchIO and its memory usage. I
looked at the source code for it but am afraid it is out of my league.
Any help/suggestions/questions would be great. Thanks
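
One way to narrow this down, offered as a rough sketch and not as a confirmed fix: handle each report inside a subroutine, so that the in-memory filehandle and the parser are lexicals that go out of scope after every report. If memory still grows with this structure, the growth is more likely inside the parser or the objects it returns than in the calling code. The name process_report and the $make_parser hook are made up for illustration; the Bio::SearchIO calls are the same ones used in the code above.

```perl
use strict;
use warnings;

# Parse one report held in memory and pass each result to $handler.
# $make_parser is a hypothetical hook (not from the original post): by
# default it builds a Bio::SearchIO BLAST parser on the given filehandle.
sub process_report {
    my ($report_ref, $handler, $make_parser) = @_;

    $make_parser ||= sub {
        my ($fh) = @_;
        require Bio::SearchIO;    # loaded only when the default is used
        return Bio::SearchIO->new(-format => 'blast', -fh => $fh);
    };

    # Read-only in-memory filehandle on the report text
    open my $fh, '<', $report_ref
        or die "Cannot open in-memory filehandle: $!";
    my $parser = $make_parser->($fh);

    my $count = 0;
    while (my $result = $parser->next_result) {
        $handler->($result);      # e.g. build and run the SQL insert here
        $count++;
    }
    close $fh;

    # $fh and $parser are lexicals, so they are released on return
    return $count;
}

# Usage, assuming $connector and $query_id as in the post above:
#   my $report = $connector->getQueryResult($query_id);
#   process_report(\$report, sub { print $_[0]->query_name, "\n" });
```

Logging the process size after every few hundred calls would show whether the growth follows the number of reports parsed.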


From dmessina at wustl.edu  Mon May 15 15:34:10 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 14:34:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>

Responding to:
>>> Amir Karger
>> Sendu Bala
>  Chris Fields


> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.

That's correct -- Mauricio is currently working on a version that  
will allow you to search 1.4, 1.5.1, or bioperl-live. The  
Deobfuscator indexes will be updated (daily?) to keep them in sync  
with the CVS repository.


>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.   It only takes a click
>>> on Back (or a "new search" button) to get to a new search, and now you
>>> can use your whole screen for reading your results.
>>
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>> different class pages at once, so being able to see an overview all on
>> one page in Deobfuscator is very nice.

I think the current behavior makes sense as the default, but I like  
the idea of being able to view the search results in a separate  
window for easier browsing. Thanks for the suggestion; I'll add it to  
the list.


>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie. usage,
>> function, returns, args columns. I feel that opening a window for each
>> method you want to understand is far too slow.
>
> Agreed.

Yeah, the way it currently works is admittedly lame, and was done as  
a placeholder until we figured out a better way to do it. An in-place  
reveal sounds like a good solution.


>>> 2) Please sort the "select a class from the list" alphabetically. I
>>> guess I can enter a search term to get the right classes, but it would
>>> be nice to be able to browse.

Agreed. I think we were doing this in an earlier test version, but I  
must have left it out of the release I handed off to Mauricio.


>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
>>> to me that the search searches within class names rather than function
>>> names. What I really want to know sometimes is which module has, say,
>>> the revcom method in it.
>>
>> This would be a great feature to add.

That's a great idea.
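
In the meantime, for a class that is already loaded, plain Perl can answer "which class actually defines this method". The following is only a minimal sketch using the core mro module (Perl 5.10 or later); it is not how the Deobfuscator works, and the My::* packages are toy stand-ins invented for the example. For a real module you would `require` it first and pass its name.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Return the first class in $class's method resolution order that
# defines $method itself, or undef if no class in the chain does.
sub defining_class {
    my ($class, $method) = @_;
    no strict 'refs';    # needed for the symbolic sub lookup below
    for my $candidate (@{ mro::get_linear_isa($class) }) {
        return $candidate if defined &{"${candidate}::${method}"};
    }
    return;
}

# Toy classes standing in for real modules (hypothetical names).
{
    package My::PrimarySeq;
    sub new    { return bless {}, shift }
    sub revcom { return 'revcom called' }

    package My::Seq;
    our @ISA = ('My::PrimarySeq');
    sub species { return 'unknown' }
}

print defining_class('My::Seq', 'revcom'),  "\n";   # prints My::PrimarySeq
print defining_class('My::Seq', 'species'), "\n";   # prints My::Seq
```

Note that this also reports a class for subs it merely imported, so it is a quick check and not a substitute for the documentation.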


>>> 4) When I search for something that's not found, I get a screen that
>>> looks pretty familiar, with the extra text "No match to string found"
>>> down at the bottom. It took me a while to even notice it. (Studies show
>>> that most users don't read most of the text on a page.) Bold might be
>>> nice here. Or put the error at the top of the screen. Or both.

Added to the list.


>>> 5) I'll save my stupidest comment for last - please make the page title
>>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what the
>>> bookmark stands for.

Added to the list. Not stupid, by the way -- much to my surprise,  
there are at least 2 or 3 other (obviously inferior :) )  
deobfuscators floating around out there.


>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.

Added to the list.


Thanks to all of you for taking the time to give such detailed  
feedback -- it's really helpful.

There is a wiki page on the BioPerl site for this project
(http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments
there for tracking and further discussion. Please feel free to add to it.


Dave


-- 
Dave Messina
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1825


From faruque at ebi.ac.uk  Mon May 15 15:47:27 2006
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Mon, 15 May 2006 20:47:27 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>

>> My personal view is that having it as an annotation would serve no real
>> purpose. For me the whole point of any kind of species representation in
>> bioperl is to allow you to compare species in a biologically meaningful
>> way. If it's just some annotation then that means it's basically

I understand the need to find the species name of entries, especially
now that so many complete genomes have been given their own
strain-specific tax nodes, and I also think it is a shame that the NCBI tax
dump does not give a rank to entries such as these (they cannot
easily be distinguished from unofficial ranks higher in the tree
without ascending the tree).
Would it be useful for the species name to be included within EMBL
file headers, e.g. in a line called OB (OB is a terrible suggestion
based on 'Organism Binomial' since OS is already in use)?

eg two examples of the species 'Apple stem grooving virus', where the  
second one would appear to be a different species without delving  
into the tax tree or the inclusion of an OB line.

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
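
To make the comparison concrete, here is a minimal sketch of how a script could use such a line. It assumes the hypothetical OB line proposed above (it is not part of the EMBL format), and the helper names parse_embl_header and species_key are invented for illustration only.

```perl
use strict;
use warnings;

# Collect EMBL header lines into a hash of line code => joined text.
# "OB" is the hypothetical line proposed above, not a real EMBL line code.
sub parse_embl_header {
    my ($text) = @_;
    my %field;
    for my $line ( split /\n/, $text ) {
        # EMBL lines: two-letter code, three spaces, then the value.
        next unless $line =~ /^([A-Z]{2})\s{3}(.*\S)\s*$/;
        my ( $code, $value ) = ( $1, $2 );
        # Repeated codes (e.g. two OC lines) are joined with a space.
        $field{$code} = defined $field{$code} ? "$field{$code} $value" : $value;
    }
    return \%field;
}

# Prefer the proposed OB line; fall back to OS when it is absent.
sub species_key {
    my ($field) = @_;
    return defined $field->{OB} ? $field->{OB} : $field->{OS};
}

my $asgv = parse_embl_header(<<'EMBL');
AC   D14995; S47260;
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
EMBL

my $ctlv = parse_embl_header(<<'EMBL');
AC   AY646511;
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
EMBL

print "OS match: ", ( $asgv->{OS} eq $ctlv->{OS} ? "yes" : "no" ), "\n";
print "OB match: ", ( species_key($asgv) eq species_key($ctlv) ? "yes" : "no" ), "\n";
```

With only OS the two entries look like different species; with the proposed OB line they compare equal without touching the tax tree.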


> My point is, a large number of users do NOT use, nor care about, taxonomic
> information to the degree they need to know the entire classification of the
> organism; many are just as happy about getting the scientific name only,
> which is in the GenBank/EMBL file itself.  To take one extreme, it is not
> productive to force every user to download the NCBI tax database and use
> lookups just to convert sequences from EMBL format to GenBank format.  It's
> not productive to allow users to spam the NCBI tax database remotely either,
> so hardcoding lookups is, IMHO, a big mistake.

I don't think you need to add any information to turn an embl-format  
file into a Genbank flatfile, but maybe I'm missing something obvious.

Nadeem


--
Dr S.M. Nadeem N. Faruque
9 Barley Court
Saffron Walden
Essex  CB11 3HG
01799 500 120


From dmessina at wustl.edu  Mon May 15 16:12:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 15:12:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu>

On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote:

> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar

I'm seeing this too on OS X with Safari 2.0.3.

If you type 'goflat' (without the quotes) into the search box, you'll  
see the behavior. Chris, can you try it again this way just to  
confirm it's an OS/browser-specific thing?

Not sure what's going on, Hilmar -- I'll take a look.

Dave


From cjfields at uiuc.edu  Mon May 15 16:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 15:56:29 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>
Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine>

Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
X), and those links are broken like you said.  Could be something to do with
indexing.  

Using the methods script in the FAQ
(http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 2:24 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar
> 
> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> 
> > I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
> > which I can try it on).  I'll let you know what I find.
> >
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
> > and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
> > all the classes have links that work (I added newline and tab to make it a
> > bit more readable) :
> >
> > Bio::OntologyIO
> > 	Parser factory for Ontology formats
> > Bio::OntologyIO::Handlers::BaseSAXHandler
> > 	no short description available
> > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> > 	no short description available
> > Bio::Ontology::OntologyI
> > 	Interface for an ontology implementation
> > Bio::Ontology::TermFactory
> > 	Instantiates a new Bio::Ontology::TermI (or derived class) through a
> > factory
> > Bio::Ontology::OntologyStore
> > 	A repository of ontologies
> > Bio::Ontology::RelationshipFactory
> > 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> > through a factory
> > Bio::Ontology::Ontology
> > 	standard implementation of an Ontology
> >
> > So the names seem fine here.
> >
> > When I click on a class (Bio::Ontology::Ontology) I get in the results
> > section:
> >
> > Method                  Class                            Returns          Usage
> > add_relationship        Bio::Ontology::Ontology          Its argument.    add_relationship(RelationshipI relationship): RelationshipI
> > add_relationship_type   Bio::Ontology::OntologyEngineI   not documented   not documented
> > add_term                Bio::Ontology::Ontology          its argument.    add_term(TermI term): TermI
> > ....and so on
> >
> > Where each method is clickable and opens a new page containing a
> > table:
> >
> > Bio::Ontology::Ontology::add_relationship
> > Usage	add_relationship(RelationshipI relationship): RelationshipI
> > Function	Adds a relationship object to the ontology engine.
> > Returns	Its argument.
> > Args	A RelationshipI object.
> >
> >
> > Each class is also linked to the bioperl-live PDOC.  Clicking on class
> > Bio::Ontology::Ontology in the results table gets me this page (no new
> > page):
> >
> > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Monday, May 15, 2006 1:09 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >> in the browsable list is already different (the prefix is missing),
> >> and the JavaScript link also lacks the prefix in the module name in
> >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >> the few Bio::Ontology exceptions that do work and do display
> >> correctly).
> >>
> >> I suppose there is something peculiar about the code formatting of
> >> those modules? Some of the modules under Bio::OntologyIO are also
> >> affected BTW.
> >>
> >> What happens is after you click on the link the page appears to
> >> reload (i.e., gets submitted) but the second table that is supposed
> >> to open underneath the first doesn't appear. However, the sort-by drop
> >> down selector does appear.
> >>
> >> 	-hilmar
> >>
> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>
> >>> That's strange.  Clicking on the list gives me the results for that
> >>> module.
> >>> When I click on the hyperlinks in the results section they open
> >>> fine; the
> >>> method column links opens a new page containing usage-function-
> >>> returns-args
> >>> and the class column links opens pdoc (same page) for bioperl-
> >>> live.  I'm
> >>> using Firefox 1.5 on WinXP.
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>> To: Mauricio Herrera Cuadra
> >>>> Cc: bioperl-l
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Hey, thanks to Laura & David for this interface.
> >>>>
> >>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>> doesn't
> >>>> go anywhere either ... Anything different with those modules that I
> >>>> can fix?
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>
> >>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>> interface at
> >>>>> the BioPerl website. You can use it at the following URL:
> >>>>>
> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>
> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>> contribution to the BioPerl project!
> >>>>>
> >>>>> Mauricio.
> >>>>>
> >>>>> --
> >>>>> MAURICIO HERRERA CUADRA
> >>>>> arareko at campus.iztacala.unam.mx
> >>>>> Laboratorio de Genética
> >>>>> Unidad de Morfofisiología y Función
> >>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 


From cjfields at uiuc.edu  Mon May 15 17:29:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 16:29:14 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>
Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque
> Sent: Monday, May 15, 2006 2:47 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> >> My personal view is that having it as an annotation would serve no
> >> real
> >> purpose. For me the whole point of any kind of species
> >> representation in
> >> bioperl is to allow you to compare species in a biologically
> >> meaningful
> >> way. If it's just some annotation then that means it's basically
> 
> I understand the need to find the species name of entries, especially
> now that so many complete genomes have been given their own strain-
> specific tax nodes, and I also think it is a shame that the ncbi tax
> dump does not give a rank to entries such as these (they cannot
> easily be distinguished from unofficial ranks higher in the tree
> without ascending the tree).
> Would it be useful for the species name to be included within EMBL
> file headers, eg in a line called OB (OB is a terrible suggestion
> based on 'Organism Binomial' since OS is already in use)?
> 
> eg two examples of the species 'Apple stem grooving virus', where the
> second one would appear to be a different species without delving
> into the tax tree or the inclusion of an OB line.
> 
> AC   D14995; S47260;
> DE   Apple stem grooving virus genome, complete sequence.
> OS   Apple stem grooving virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.
> 
> AC   AY646511;
> DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
> OS   Citrus tatter leaf virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.

Jason also mentions a few examples (see below).  The problem lies in the
fact that EMBL and GenBank flatfiles do not give hierarchy ranking for
taxonomy, so it's a best guess.  What I'm seeing is that the guess is wrong
more often than not when it comes to complex scientific names (viruses,
bacteria, etc).  Notice the doubling of the strain in the following GenBank
files passed through SeqIO (genbank->genbank conversion, BTW; haven't tried
EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales;
            Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium;
            tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
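
A quick way to flag this symptom in a round-trip test is to look for a repeated run of words at the end of the SOURCE line. The following is only a sketch: doubled_suffix is a made-up helper, not part of BioPerl, and it checks the text alone, so it says nothing about where in SeqIO the doubling happens.

```perl
use strict;
use warnings;

# Return the repeated trailing phrase if the last N words of $source
# duplicate the N words immediately before them; return nothing otherwise.
sub doubled_suffix {
    my ($source) = @_;
    my @w = split ' ', $source;
    # Try the longest candidate first so "subtilis str. 168" wins over "168".
    for ( my $n = int( @w / 2 ); $n >= 1; $n-- ) {
        my @tail = @w[ -$n .. -1 ];
        my @prev = @w[ -2 * $n .. -$n - 1 ];
        return "@tail" if "@tail" eq "@prev";
    }
    return;
}

for my $source (
    'Azoarcus sp. EbN1 EbN1',
    'Mycobacterium tuberculosis C C',
    'Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168',
    'Staphylococcus haemolyticus JCSC1435',
) {
    my $dup = doubled_suffix($source);
    printf "%-62s %s\n", $source, defined $dup ? "doubled: '$dup'" : 'ok';
}
```

Note that a legitimate name with a repeated final word (a tautonym with an identical subspecies epithet, for example) would also be flagged, so this is a hint for a human to look at rather than a validator.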

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species?  What happens
when there is a two part strain name (space separated) and a sub-species or
variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
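
To show why this is guesswork, here is a rough sketch of a text-only splitter run over the four ORGANISM strings above. It is not the Bio::Species parsing code; guess_name_parts is a hypothetical helper, and the only markers it knows are the ones that appear in this thread (subsp., var., str.). Anything it cannot label lands in 'unmarked'.

```perl
use strict;
use warnings;

# Split an ORGANISM string into labelled parts using only the text itself.
# Words that follow an explicit marker (subsp., var., str.) are labelled;
# everything else after the binomial is kept as 'unmarked', because its
# rank cannot be known from the flat file alone.
sub guess_name_parts {
    my ($organism) = @_;
    my @words   = split ' ', $organism;
    my $genus   = shift @words;
    my $species = shift @words;
    my %part    = ( genus => $genus, species => $species );
    my @unmarked;
    while ( defined( my $word = shift @words ) ) {
        if ( $word =~ /^(subsp|var|str)\.$/ && @words ) {
            $part{$1} = shift @words;    # marker followed by its value
        }
        else {
            push @unmarked, $word;
        }
    }
    $part{unmarked} = "@unmarked" if @unmarked;
    return \%part;
}

for my $name (
    'Staphylococcus haemolyticus JCSC1435',
    'Muntiacus muntjak vaginalis',
    'Aspergillus nidulans FGSC A4',
    'Cryptococcus neoformans var. grubii H99',
) {
    my $p = guess_name_parts($name);
    print join( '; ', map {"$_=$p->{$_}"} sort keys %$p ), "\n";
}
```

JCSC1435 (a strain) and vaginalis (a sub-species) both end up as 'unmarked', which is exactly the ambiguity in question; only the var. in the last entry can be labelled from the text.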


> > My point is, a large number of users do NOT use, nor care about,
> > taxonomic
> > information to the degree they need to know the entire
> > classification of the
> > organism; many are just as happy about getting the scientific name
> > only,
> > which is in the GenBank/EMBL file itself.  To take one extreme, it
> > is not
> > productive to force every user to download the NCBI tax database
> > and use
> > lookups just to convert sequences from EMBL format to GenBank
> > format.  It's
> > not productive to allow users to spam the NCBI tax database
> > remotely either,
> > so hardcoding lookups is, IMHO, a big mistake.
> 
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines
in EMBL, I believe), which is using a Bio::Species object.  The problem is,
like I mentioned above, no hierarchical ranking is in the flat file, just the
order of the ranking.  We can try to make a best guess based on that but
it's obviously very tricky, particularly when dealing with subspecies,
strains, etc.  

NCBI also states that the classification is often too long for a flat file
and so may be incomplete (I think they leave out nodes which have 'no rank'
tags, but I can't be completely sure), so there's another issue.

Anyway, this is where the lookup would come in, which would require a local
taxonomy  database (we can't spam the NCBI remote database, that would just
be rude) which would give the complete taxonomic classification if it worked
properly.  

So now we have three possible situations:

1) One extreme : We require a lookup to get it right (which, BTW, it
currently doesn't); this by default requires a local database.  
2) Middle of the road : we try and guess the information as best as we can
with the information given (the current situation); this is breaking more
and more often now, so is becoming more unreliable.
3) Other extreme : we punt and absolve ourselves of even trying to parse the
data and just have a strict tagname->value or similar simple construct to
handle the data.

#3 as default with option to do #1 is probably best (least error prone with
option for most information), with caching to speed up lookups as Sendu Bala
does now.
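
Roughly what I mean by "#3 as default with option to do #1", as a sketch only. The package name Sketch::OrganismInfo, the -tags/-lookup arguments and the fake lookup are all invented for illustration; none of this is existing BioPerl API, and a real lookup would sit on top of a local taxonomy database.

```perl
use strict;
use warnings;

# Option 3: keep what the flat file says as plain tag => value pairs.
# Option 1 is opt-in: pass a lookup callback (for example one backed by a
# local taxonomy dump); its results are cached per organism name.
package Sketch::OrganismInfo;    # hypothetical name, not a BioPerl class

my %cache;    # shared cache: organism name => lineage array ref

sub new {
    my ( $class, %arg ) = @_;
    my $self = {
        tags   => { %{ $arg{-tags} || {} } },
        lookup => $arg{-lookup},
    };
    return bless $self, $class;
}

# Return the stored value for a tag exactly as it was given.
sub tag {
    my ( $self, $name ) = @_;
    return $self->{tags}{$name};
}

# Return the full lineage only when a lookup was supplied; never guess.
sub lineage {
    my ($self) = @_;
    my $name = $self->{tags}{ORGANISM};
    return unless defined $name && $self->{lookup};
    $cache{$name} = $self->{lookup}->($name) unless exists $cache{$name};
    return $cache{$name};
}

package main;

my $calls   = 0;
my $fake_db = sub { $calls++; return [ 'Bacteria', 'Firmicutes' ] };    # stand-in for a local DB

my $plain = Sketch::OrganismInfo->new( -tags => { ORGANISM => 'Bacillus subtilis' } );
my $rich  = Sketch::OrganismInfo->new(
    -tags   => { ORGANISM => 'Bacillus subtilis' },
    -lookup => $fake_db,
);

print "plain organism: ", $plain->tag('ORGANISM'), "\n";
print "rich lineage:   ", join( '; ', @{ $rich->lineage } ), "\n";
$rich->lineage;    # second call is served from the cache
print "lookups made:   $calls\n";
```

The default object never touches a database and never guesses ranks; the caller who wants the full classification pays for the lookup once per organism.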

Chris

 
> Nadeem
> 
> 
> --
> Dr S.M. Nadeem N. Faruque
> 9 Barley Court
> Saffron Walden
> Essex  CB11 3HG
> 01799 500 120
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at gmx.net  Mon May 15 17:37:56 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 17:37:56 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine>
References: <000a01c67862$0a00cab0$15327e82@pyrimidine>
Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>

It does have the following line though (and a 'use' statement for  
OntologyIO);

@ISA = qw( Bio::OntologyIO );

So what is it doing 'wrong' (there aren't any tests or so in which  
anything erroneous would show)?

	-hilmar

On May 15, 2006, at 4:56 PM, Chris Fields wrote:

> Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
> why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
> X), and those links are broken like you said.  Could be something to do with
> indexing.
>
> Using the methods script in the FAQ
> (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> I get this:
>
> C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> Bio::OntologyIO::simplehierarchy::Dumper
> Bio::OntologyIO::simplehierarchy::basename
> Bio::OntologyIO::simplehierarchy::dirname
> Bio::OntologyIO::simplehierarchy::fileparse
> Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
>
> Chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 18:03:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 17:03:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat


C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat


I get nada.  It could be related to the way the methods are parsed using
Class::Inspector :

print join("\n", sort @{ Class::Inspector->methods($class, 'full', 'public') }), "\n";
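
One thing worth ruling out is that the class simply never loaded, in which case there is nothing to list. Below is a core-Perl sketch that makes a load failure loud instead of silent. It is a stand-in for the Class::Inspector call above, not a copy of it: list_methods and the Demo:: classes are made up so the script runs without BioPerl, and whether a failed load is what actually happens with goflat/dagflat is an untested assumption.

```perl
use strict;
use warnings;
use mro;    # core module; provides mro::get_linear_isa

# Tiny stand-in classes so the sketch runs without BioPerl installed.
{ package Demo::Base;  sub new { return bless {}, shift } sub unescape { return 1 } }
{ package Demo::Child; our @ISA = ('Demo::Base'); sub parse { return 1 } }

# Return fully qualified public method names visible from $class,
# or die with the compile/load error if the class cannot be loaded.
sub list_methods {
    my ($class) = @_;
    die "Bad class name: $class\n" unless $class =~ /\A\w+(?:::\w+)*\z/;
    no strict 'refs';
    # Treat the class as loaded only if its package already defines a sub.
    my $has_subs = grep { !/::\z/ && defined &{"${class}::$_"} } keys %{"${class}::"};
    unless ($has_subs) {
        eval "require $class; 1" or die "Could not load $class: $@";
    }
    my ( %seen, @methods );
    # Walk the class and its ancestors in method-resolution order.
    for my $pkg ( @{ mro::get_linear_isa($class) } ) {
        for my $name ( sort keys %{"${pkg}::"} ) {
            next if $name =~ /::\z/;                     # nested package, not a sub
            next unless defined &{"${pkg}::${name}"};    # only real subs
            next if $name =~ /^_/;                       # private by convention
            next if $seen{$name}++;                      # nearest definition wins
            push @methods, "${pkg}::${name}";
        }
    }
    my @sorted = sort @methods;
    return @sorted;
}

print "$_\n" for list_methods('Demo::Child');
```

Because this walks the symbol table, functions imported into a package (Dumper, basename and friends) show up as "methods" too, which matches what the simplehierarchy output looks like.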

I haven't tried it on all the weird Bio::Ontology-missing modules (don't
have time today).  It's not common to all of those modules though:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser
Bio::OntologyIO::DESTROY
Bio::OntologyIO::InterProParser::next_ontology
Bio::OntologyIO::InterProParser::parse
Bio::OntologyIO::InterProParser::secondary_accessions_map
Bio::OntologyIO::new
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented


Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 4:38 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> It does have the following line though (and a 'use' statement for
> OntologyIO);
> 
> @ISA = qw( Bio::OntologyIO );
> 
> So what is it doing 'wrong' (there aren't any tests or so in which
> anything erroneous would show)?
> 
> 	-hilmar
> 
> On May 15, 2006, at 4:56 PM, Chris Fields wrote:
> 
> > Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
> > why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
> > X), and those links are broken like you said.  Could be something to do with
> > indexing.
> >
> > Using the methods script in the FAQ
> > (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> > I get this:
> >
> > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> > Bio::OntologyIO::simplehierarchy::Dumper
> > Bio::OntologyIO::simplehierarchy::basename
> > Bio::OntologyIO::simplehierarchy::dirname
> > Bio::OntologyIO::simplehierarchy::fileparse
> > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >> Sent: Monday, May 15, 2006 2:24 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> I wasn't using the search. It's in the scrollable table for browsing.
> >> -hilmar
> >>
> >> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> >>
> >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
> >>> which I can try it on).  I'll let you know what I find.
> >>>
> >>> This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
> >>> and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
> >>> all the classes have links that work (I added newline and tab to make it a
> >>> bit more readable) :
> >>>
> >>> Bio::OntologyIO
> >>> 	Parser factory for Ontology formats
> >>> Bio::OntologyIO::Handlers::BaseSAXHandler
> >>> 	no short description available
> >>> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> >>> 	no short description available
> >>> Bio::Ontology::OntologyI
> >>> 	Interface for an ontology implementation
> >>> Bio::Ontology::TermFactory
> >>> 	Instantiates a new Bio::Ontology::TermI (or derived class)
> >>> through a
> >>> factory
> >>> Bio::Ontology::OntologyStore
> >>> 	A repository of ontologies
> >>> Bio::Ontology::RelationshipFactory
> >>> 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> >>> through a factory
> >>> Bio::Ontology::Ontology
> >>> 	standard implementation of an Ontology
> >>>
> >>> So the names seem fine here.
> >>>
> >>> When I click on a class (Bio::Ontology::Ontology) I get in the
> >>> results
> >>> section:
> >>>
> >>> Method                  Class                            Returns          Usage
> >>> add_relationship        Bio::Ontology::Ontology          Its argument.    add_relationship(RelationshipI relationship): RelationshipI
> >>> add_relationship_type   Bio::Ontology::OntologyEngineI   not documented   not documented
> >>> add_term                Bio::Ontology::Ontology          its argument.    add_term(TermI term): TermI
> >>>
> >>> ....and so on
> >>>
> >>> Where each method is clickable and opens a new page containing a
> >>> table:
> >>>
> >>> Bio::Ontology::Ontology::add_relationship
> >>> Usage	add_relationship(RelationshipI relationship): RelationshipI
> >>> Function	Adds a relationship object to the ontology engine.
> >>> Returns	Its argument.
> >>> Args	A RelationshipI object.
> >>>
> >>>
> >>> Each class is also linked to the bioperl-live PDOC.  Clicking on
> >>> class
> >>> Bio::Ontology::Ontology in the results table gets me this page
> >>> (no new
> >>> page):
> >>>
> >>> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >>>
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >>>> Sent: Monday, May 15, 2006 1:09 PM
> >>>> To: Chris Fields
> >>>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >>>> in the browsable list is already different (the prefix is missing),
> >>>> and the JavaScript link also lacks the prefix in the module name in
> >>>> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >>>> the few Bio::Ontology exceptions that do work and do display
> >>>> correctly).
> >>>>
> >>>> I suppose there is something peculiar about the code formatting of
> >>>> those modules? Some of the modules under Bio::OntologyIO are also
> >>>> affected BTW.
> >>>>
> >>>> What happens is after you click on the link the page appears to
> >>>> reload (i.e., gets submitted) but the second table that is supposed
> >>>> to open underneath the first doesn't appear. However, the sort-by drop
> >>>> down selector does appear.
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>>>
> >>>>> That's strange.  Clicking on the list gives me the results for
> >>>>> that
> >>>>> module.
> >>>>> When I click on the hyperlinks in the results section they open
> >>>>> fine; the
> >>>>> method column links opens a new page containing usage-function-
> >>>>> returns-args
> >>>>> and the class column links opens pdoc (same page) for bioperl-
> >>>>> live.  I'm
> >>>>> using Firefox 1.5 on WinXP.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>>>> To: Mauricio Herrera Cuadra
> >>>>>> Cc: bioperl-l
> >>>>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>>>
> >>>>>> Hey, thanks to Laura & David for this interface.
> >>>>>>
> >>>>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>>>> doesn't
> >>>>>> go anywhere either ... Anything different with those modules
> >>>>>> that I
> >>>>>> can fix?
> >>>>>>
> >>>>>> 	-hilmar
> >>>>>>
> >>>>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>>>
> >>>>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>>>> interface at
> >>>>>>> the BioPerl website. You can use it at the following URL:
> >>>>>>>
> >>>>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>>>
> >>>>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>>>> contribution to the BioPerl project!
> >>>>>>>
> >>>>>>> Mauricio.
> >>>>>>>
> >>>>>>> --
> >>>>>>> MAURICIO HERRERA CUADRA
> >>>>>>> arareko at campus.iztacala.unam.mx
> >>>>>>> Laboratorio de Genética
> >>>>>>> Unidad de Morfofisiología y Función
> >>>>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> ===========================================================
> >>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>>>> ===========================================================
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From cjfields at uiuc.edu  Mon May 15 20:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <a7d26051.b90db78f.81ac600@expms6.cites.uiuc.edu>

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne" <ClarkeW at agr.gc.ca>  
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO  
>To: <bioperl-l at lists.open-bio.org>
>
>Hey everyone, 
>
> 
>
>I have been developing some code to download and parse BLAST reports
>from a remote server using SOAP::Lite as well as insert the results into
>a MySQL database. The problem I am having is that my program seems to be
>taking up a huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred MB inside an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?) then it's
not necessarily a memory leak so much as object creation.  Each report
generates hit objects, which in turn generate HSP objects.  I think Jason
recommends using the tabular output option (-m8 or -m9) for huge reports, as
it cuts down considerably on this.  If you are cycling through the reports one
at a time it shouldn't be much of a problem unless your BLAST reports are
really huge.  Have you tried parsing a single report to see if the problem persists?

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely run
into an infinite loop caused by a change in NCBI's text output.  Either way,
you can try updating bioperl from CVS to see if that helps.  Tabular output
and XML output, AFAIK, are the same regardless of version; this bug only
affects text output of BLAST reports.
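
To illustrate the tabular route, here is a minimal sketch in plain Perl. It assumes the reports were written with -m8 (twelve tab-separated columns per HSP; -m9 adds '#' comment lines around the same rows). The sub name first_rows_per_query and the column labels are invented for this example and are not part of Bio::SearchIO. Note that it counts rows (HSPs), not distinct hits.

```perl
use strict;
use warnings;

# Labels for the 12 columns of a tabular (-m8) BLAST row.
# These names are this sketch's own choice, not BLAST's or BioPerl's.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Read tabular rows from a filehandle and keep at most $max_rows rows
# per query, in file order. Each kept row is one small hash reference.
sub first_rows_per_query {
    my ($fh, $max_rows) = @_;
    my (%seen, @rows);
    while (my $line = <$fh>) {
        # Skip -m9 comment lines and blank lines.
        next if $line =~ /^#/ || $line !~ /\S/;
        chomp $line;
        my %row;
        @row{@FIELDS} = split /\t/, $line;
        # Drop rows beyond the per-query limit.
        next if ++$seen{ $row{query} } > $max_rows;
        push @rows, \%row;
    }
    return \@rows;
}
```

The filehandle can be an in-memory one opened on the downloaded string, as in the original code (open $FH, "<", \$result), so nothing has to be written to disk.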

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO. I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
> 
>
> my $result = $connector->getQueryResult($query_id);
>
> my $FH;
> open $FH, "<", \$result;
>
> my $searchio = new Bio::SearchIO(-format => "blast",
>                                  -fh     => $FH);
>
> while (my $o_blast = $searchio->next_result()) {
>     my $clone_id = $o_blast->query_name();
>     my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
>
> 
>
>this is just the leading and tailing code surrounding the use of
>Bio::SearchIO since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks
>
>

From torsten.seemann at infotech.monash.edu.au  Mon May 15 20:18:44 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 16 May 2006 10:18:44 +1000
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
Message-ID: <44691A64.8040607@infotech.monash.edu.au>

> taking up and huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

>  my $result = $connector->getQueryResult($query_id);
>                 my $searchio = new Bio::SearchIO(-format => "blast",
>                 while (my $o_blast = $searchio->next_result()) {
>                         my $clone_id = $o_blast->query_name();
>                         my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is 
causing the problem? ie. is it keeping a reference to $o_blast around?
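
The question above can be checked without BioPerl at all. A small self-contained sketch: the packages Report, LeakyLoader and CleanLoader are invented stand-ins for a result object and for whatever class $bdbi belongs to, and the counter in DESTROY is only there to make object lifetime visible. It is an illustration of the mechanism, not a diagnosis of the actual code.

```perl
use strict;
use warnings;

package Report;    # hypothetical stand-in for a parsed result object
my $live = 0;      # number of Report objects currently alive
sub new     { my ($class, %args) = @_; $live++; return bless {%args}, $class }
sub DESTROY { $live-- }
sub live    { return $live }

package LeakyLoader;    # keeps a reference to every report it is given
sub new { my $class = shift; return bless { kept => [] }, $class }
sub push_sql {
    my ($self, $report) = @_;
    push @{ $self->{kept} }, $report;    # this reference keeps the report alive
    return 'INSERT ...';
}

package CleanLoader;    # uses the report and lets it go
sub new { my $class = shift; return bless {}, $class }
sub push_sql { my ($self, $report) = @_; return 'INSERT ...' }

package main;

# Run a parse-style loop and report how many Report objects survive it.
sub live_after_loop {
    my ($loader, $n) = @_;
    for my $i (1 .. $n) {
        my $report = Report->new(id => $i);
        $loader->push_sql($report);
    }    # $report goes out of scope here on every iteration
    return Report::live();
}
```

With CleanLoader the count drops back to zero after each iteration; with LeakyLoader every report stays alive until the loader itself is destroyed, which is the pattern to look for in the database class.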

Are you aware that Perl garbage collection does not necessarily return 
freed memory back to the OS? This may affect how you were measuring 
"memory usage".

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From kmdaily at indiana.edu  Mon May 15 17:00:12 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Mon, 15 May 2006 17:00:12 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu>

I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


From letondal at pasteur.fr  Tue May 16 02:06:19 2006
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 May 2006 08:06:19 +0200
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
	<C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr>


On May 15, 2006, at 9:34 PM, David Messina wrote:

>>>> A couple of minor interface thoughts.
>>>>
>>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>>> think I'll often want to browse through what's available in a
>>>> class. But
>>>> 60% or so of the screen real estate is used for "Enter a search
>>>> string... OR select a class from the list". IMO, it would be
>>>> better to
>>>> have two pages, a search page and a result page.   It only takes
>>>> a click
>>>> on Back (or a "new search" button) to get to a new search, and
>>>> now you
>>>> can use your whole screen for reading your results.
>>>
>>> As the compromise it must be, I like the way it behaves. I don't like
>>> lots of windows. I especially don't like pop up windows. Right now
>>> when
>>> I'm using the bioperl docs I tend to have a whole bunch of tabs
>>> open to
>>> different class pages at once, so being able to see an overview
>>> all on
>>> one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite ajaxian ones) that reach a compromise between
using several windows to browse large results easily and composing
everything in one window to get an overview. The two examples that come
to mind right now (not biology related) are:
	- http://montreal.mspace.fm/chi/sched/
	- http://www.live.com/
		(see the slider at the top right, which lets you squeeze or enlarge
		the results area)


--
Catherine Letondal -- Institut Pasteur


From cjfields at uiuc.edu  Tue May 16 07:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS.  I believe Brian added entrezgene.pm after the last
developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
>Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>

From bernd.web at gmail.com  Tue May 16 07:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour wrt the pod.
Do these look familiar ?

Some example code:
my $query = Bio::DB::Query::GenBank->new
       (-query   =>'Lassa Virus[ORGN]',
        -reldate => '30',
        -db      => 'protein',
        -ids => [195052,2981014,11127914],
        -maxids => 30 );

my $gb = Bio::DB::GenBank->new(-format => 'fasta');
my $seqio = $gb->get_Stream_by_query($query);
while (my $seq = $seqio->next_seq) {
       print $seq->desc,"\n"; }

The module states that if we provide -ids that:
       If you provide an array reference of IDs in -ids, the query will be
       ignored and the list of IDs will be used when the query is passed to a
       Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case the query ('Lassa Virus[ORGN]') is actually what gets
passed, not the IDs. Also, $query->query shows the original query. Am I
doing something wrong, or is the pod not reflecting the current behaviour
of this module?

I was also surprised that if the internet connection is down no warning is
thrown for $query->query or $query->count at all. Only the
get_Stream_by_query call above will warn us if the site is unreachable
(500 Internal Server Error).

$query->ids or $query->count will not throw a warning, and
@ids=$query->ids will just be an empty array. (I realize $query->count
is not initialized in that case, so I am using this now to check for
success, but a warning from WebDBSeqI would be more appropriate I think.)

Last, the example from the pod is not working, but no warnings are raised:
          # initialize the list yourself
          my $query =
Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);

$query->count returns zero w/o any warning. Of course this query did
not specify a DB. Only if we specify -db=>'nucleotide' $query->count
is 3.
However, why is there no warning if we set -db=>'protein' or do not set it at all?

On the NCBI website, searching the Protein DB for 195052 returns:
      See Details. No items found.
      The following term(s) refer to a different DB:195052

But this is not reflected via Bio::DB::Query::GenBank.

Can I check for this situation in the code apart from checking on
$query->count == 0 ? Or would it indeed be better to check for these
situations in the module?
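
One way to do the checking on the caller's side can be sketched as follows. This is only a sketch under assumptions: checked_ids is an invented helper, it relies on nothing but the count() and ids() methods used above, and it assumes (as observed above) that count() comes back undefined when the server could not be reached.

```perl
use strict;
use warnings;

# Works on any object offering count() and ids(), for example a
# Bio::DB::Query::GenBank instance. Returns the list of IDs, or dies
# with a message saying which check failed.
sub checked_ids {
    my ($query, $expected) = @_;

    my $count;
    my $ok = eval { $count = $query->count; 1 };
    die "query raised an exception: $@" unless $ok;

    # Assumption from the observations above: an unreachable server
    # leaves the count undefined rather than raising an error.
    die "no count returned (server unreachable?)\n" unless defined $count;
    die "query matched nothing in this database\n"  unless $count > 0;

    my @ids = $query->ids;
    if (defined $expected && @ids != $expected) {
        warn sprintf "expected %d IDs, got %d\n", $expected, scalar @ids;
    }
    return @ids;
}
```

Passing the number of IDs handed to -ids as the second argument makes a silent mismatch (for example IDs that belong to a different database) show up as a warning instead of an empty loop.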

Regards,
Bernd


From chen_li3 at yahoo.com  Tue May 16 10:55:51 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 07:55:51 -0700 (PDT)
Subject: [Bioperl-l] module for 6 reading frames
Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I wonder which module is available for translating DNA
sequence into 6 reading frames.

Thank you,

Li

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

From smarkel at scitegic.com  Tue May 16 11:10:35 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 08:10:35 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <OF41BF3DF8.D7365B03-ON88257170.00534209-88257170.00535904@scitegic.com>

Li,

Use the translate() function in Bio::Tools::CodonTable.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:

> Hi all,
> 
> I wonder which module is available for translating DNA
> sequence into 6 reading frames.
> 
> Thank you,
> 
> Li


From golharam at umdnj.edu  Tue May 16 12:18:19 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:18:19 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>

I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From golharam at umdnj.edu  Tue May 16 12:24:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:24:03 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1>

Never mind.  I see it's on CPAN.

-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu] 
Sent: Tuesday, May 16, 2006 12:18 PM
To: 'bioperl-l at bioperl.org'
Subject: Where is Bio::ASN1::EntrezGene?


I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From cjfields at uiuc.edu  Tue May 16 13:27:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 12:27:32 -0500
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, May 16, 2006 11:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
> 
> I just updated my local copy of bioperl from cvs.  When I ran the
> configure script, it says I need the external module
> Bio::ASN1::EntrezGene.  Which package contains this module?
> 
> --
> Ryan Golhar  -  golharam at umdnj.edu
> The Informatics Institute of UMDNJ
> 


From ClarkeW at AGR.GC.CA  Tue May 16 16:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>


Thank you for the suggestions and comments. I think I should clear a few
things up, however. I am running bioperl v1.4, I am cycling through the
BLAST reports, which should not be of absurd size since they only contain
the top 5 hits, and I am using top to track the memory usage (although I
realize this is fairly inaccurate). I have looked through the code for both
AAFCBLAST and BEAST_UPDATE but do not believe the leak/problem is
contained within them, since they almost exclusively use method calls
and those variables should be destroyed upon leaving the scope of the
method. I have used Devel::Size to check the size of the variables $bdbi,
$searchio and $connector, and on each iteration these variables have the
same size. Any other suggestions would be greatly appreciated, as I have
nearly gone insane trying to track this problem down.

Thanks, Wayne 


-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] 
Sent: Monday, May 15, 2006 6:19 PM
To: Clarke, Wayne
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO

> taking up and huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

>  my $result = $connector->getQueryResult($query_id);
>                 my $searchio = new Bio::SearchIO(-format => "blast",
>                 while (my $o_blast = $searchio->next_result()) {
>                         my $clone_id = $o_blast->query_name();
>                         my $statement = $bdbi->form_push_SQL
($o_blast, $clone_id, 5); }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is 
causing the problem? ie. is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return 
freed memory back to the OS? This may affect how you were measuring 
"memory usage".

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From smarkel at scitegic.com  Tue May 16 16:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>

Li,

You can either do the substring, and reverse complement, yourself
or you can use the translate() function in Bio::PrimarySeq.  It
inherits from Bio::PrimarySeqI, so check there for the documentation.
That translate() function takes a "-frame" argument.
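
For the do-it-yourself route, a minimal sketch in plain Perl follows. It does not use BioPerl; the sub names (revcom_dna, translate_frame, six_frames) are made up for this example, the table is the standard genetic code only, and any codon containing a character other than A, C, G or T is reported as 'X'.

```perl
use strict;
use warnings;

# Build the standard genetic code table. The amino acid string lists
# the 64 codons in TCAG order for each of the three positions.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $index = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aas, $index++, 1);
        }
    }
}

# Reverse complement of an unambiguous DNA string.
sub revcom_dna {
    my ($dna) = @_;
    my $rc = reverse uc($dna);
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Translate one frame: start at $offset (0, 1 or 2) and read whole codons.
sub translate_frame {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# Return a hash ref with keys +1, +2, +3 (forward) and -1, -2, -3
# (reverse complement), each holding the translated string.
sub six_frames {
    my ($dna) = @_;
    $dna = uc $dna;
    my $rc = revcom_dna($dna);
    my %frames;
    for my $offset (0 .. 2) {
        $frames{ '+' . ($offset + 1) } = translate_frame($dna, $offset);
        $frames{ '-' . ($offset + 1) } = translate_frame($rc,  $offset);
    }
    return \%frames;
}
```

For example, six_frames('ATGGCCTAA') gives 'MA*' for frame +1 and 'LGH' for frame -1. The -1 to -3 labels here simply mean offsets 0 to 2 on the reverse complement, which is one common convention but not the only one.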

Scott

PS In future, please respond to the list.  That way others see
the questions and answers.

chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:

> Dear Dr. Markel,
> 
>     I browsed through the documentation of
> Bio::Tools::CodonTable and found this line:
> 
> my $translation= $CodonTable->translate($seq);
> 
> I think this line is to do the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2,3, and -1, -2, -3? 
> 
> 
> Thank you,
> 
> Li
> 
> --- smarkel at scitegic.com wrote:
> 
> > Li,
> > 
> > Use the translate() function in
> > Bio::Tools::CodonTable.
> > 
> > Scott
> > 
> > Scott Markel, Ph.D.
> > Principal Bioinformatics Architect  email: 
> > smarkel at scitegic.com
> > SciTegic Inc.                       mobile: +1 858
> > 205 3653
> > 10188 Telesis Court, Suite 100      voice:  +1 858
> > 799 5603
> > San Diego, CA 92121                 fax:    +1 858
> > 279 8804
> > USA                                 web: 
> > http://www.scitegic.com
> > 
> > 
> > bioperl-l-bounces at lists.open-bio.org wrote on
> > 16.05.2006 07:55:51:
> > 
> > > Hi all,
> > > 
> > > I wonder which module is available for translating
> > DNA
> > > sequence into 6 reading frames.
> > > 
> > > Thank you,
> > > 
> > > Li
> > 
> > 
> 
> 


From cjfields at uiuc.edu  Tue May 16 17:15:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:15:10 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>
Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine>

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which SearchIO
is broken.  You seem to fit #2.

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you have
the same problem (a CPU spike and increasing memory usage) then it may be
fixed in CVS.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 3:57 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> With regards to the suggestions/comments made thank you. However I think
> I should clear a few things up. I am running bioperl v1.4, I am cycling
> through the blast reports which should not be of absurd size since they
> only contain the top 5 hits, and I am using top to track(although I
> realize fairly inacuately) the memory usage. I have looked through the
> code for both AAFCBLAST and BEAST_UPDATE but do not believe the
> leak/problem to be contained within them since they are almost
> exclusively using method calls and those variables should be destroyed
> upon leaving the scope of the method. I have used Devel::Size to check
> the size of the variables $bdbi and $searchio and $connector and on each
> iteration these variables have the same size. Any other suggestions
> would be greatly appreciated as I have nearly gone insane trying to
> track this problem down.
> 
> Thanks, Wayne
> 
> 
> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Monday, May 15, 2006 6:19 PM
> To: Clarke, Wayne
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> > taking up and huge amount of RAM. For a single job of 10000 queries it
> > can consume as much as a couple hundred Mb inside an hour. I realize
> 
> >  my $result = $connector->getQueryResult($query_id);
> >                 my $searchio = new Bio::SearchIO(-format => "blast",
> >                 while (my $o_blast = $searchio->next_result()) {
> >                         my $clone_id = $o_blast->query_name();
> >                         my $statement = $bdbi->form_push_SQL
> ($o_blast, $clone_id, 5); }
> 
> Some comments:
> 
> Have you considered that whatever class/module $bdbi belongs to is
> causing the problem? ie. is it keeping a reference to $o_blast around?
> 
> Are you aware that Perl garbage collection does not necessarily return
> freed memory back to the OS? This may affect how you were measuring
> "memory usage".
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia
> 
> 


From ClarkeW at AGR.GC.CA  Tue May 16 17:24:51 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 17:24:51 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>


Thanks Chris, 

I did forget to mention, however, that I did parse one single report and
found no problems; it finished fast and with no noticeable memory usage.
I will consider getting my SA to update bioperl from CVS as a precaution,
but he has already stated he prefers to wait for the release of v1.5.
Even a single job of 10000 will finish, but the problem is that I am
trying to loop through many jobs of 10000 and the memory usage seems to
be additive for reasons I cannot determine. During testing I noticed that
the RSS in top decreased at around 80% MEM usage, but then the shared
memory increased. I am wondering if this is due to the perl garbage
collector freeing up memory but keeping it in its pool for reuse; if so,
that is fine as long as it does not then want to reach into swapped memory.

Thanks again, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Tuesday, May 16, 2006 3:15 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Memory Leak in Bio::SearchIO

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which
SearchIO is broken.  You seem to fit #2.

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you have
the same problem (a CPU spike and increasing memory usage) then it may be
fixed in CVS.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 3:57 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> Thank you for the suggestions/comments made. However, I think
> I should clear a few things up. I am running bioperl v1.4, I am cycling
> through the blast reports, which should not be of absurd size since they
> only contain the top 5 hits, and I am using top to track (although, I
> realize, fairly inaccurately) the memory usage. I have looked through the
> code for both AAFCBLAST and BEAST_UPDATE but do not believe the
> leak/problem to be contained within them, since they are almost
> exclusively using method calls and those variables should be destroyed
> upon leaving the scope of the method. I have used Devel::Size to check
> the size of the variables $bdbi, $searchio and $connector, and on each
> iteration these variables have the same size. Any other suggestions
> would be greatly appreciated, as I have nearly gone insane trying to
> track this problem down.
> 
> Thanks, Wayne
> 
> 
> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Monday, May 15, 2006 6:19 PM
> To: Clarke, Wayne
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> > taking up a huge amount of RAM. For a single job of 10000 queries it
> > can consume as much as a couple hundred Mb inside an hour. I realize
> 
> >  my $result = $connector->getQueryResult($query_id);
> >                 my $searchio = new Bio::SearchIO(-format => "blast",
> >                 while (my $o_blast = $searchio->next_result()) {
> >                         my $clone_id = $o_blast->query_name();
> >                         my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5); }
> 
> Some comments:
> 
> Have you considered that whatever class/module $bdbi belongs to is
> causing the problem? ie. is it keeping a reference to $o_blast around?
> 
> Are you aware that Perl garbage collection does not necessarily return
> freed memory back to the OS? This may affect how you were measuring
> "memory usage".
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia
> 
> 


From cjfields at uiuc.edu  Tue May 16 17:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> Thanks Chris,
> 
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October).  It's
considered a developer release but is pretty stable; well, except for that
whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I can not determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased. I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use, if so that is fine
> as long as the it does not then want to reach into swapped mem.
> 
> Thanks again, Wayne
> ...


From cjfields at uiuc.edu  Tue May 16 18:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
> 
> Hi all,
> 
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar ?
> 
> Some example code:
> my $query = Bio::DB::Query::GenBank->new
>        (-query   =>'Lassa Virus[ORGN]',
>         -reldate => '30',
>         -db      => 'protein',
>         -ids => [195052,2981014,11127914],
>         -maxids => 30 );
> 
> $gb = new Bio::DB::GenBank(format=>'fasta');
> my $seqio = $gb->get_Stream_by_query($query);
> while (my $seq = $seqio->next_seq) {
>        print $seq->desc,"\n"; }
> 
> The module states that if we provide -ids that:
>        If you provide an array reference of IDs in -ids, the query will be
>        ignored and the list of IDs will be used when the query is passed
> to a
>        Bio::DB::GenBank object's get_Stream_by_query() method.
> 
> In the above case actually the query is passed ('Lassa Virus[ORGN]),
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting current behaviour of this
> module?
> 
> I was also surprised that if internet is down no warning is thrown for
> $query->query or $query->count at all. Only the get_Stream_by_query
> above will warn us if the site is unreachable (500 Internal Server
> Error).

I believe this has to do with the difference between the objects and the way
they retrieve request data.  Bio::DB::GenBank and Bio::DB::Query::GenBank use
different methods to retrieve ids; Bio::DB::GenBank's get_Stream_by_query
method just makes it a bit easier to retrieve a list of UIDs directly
instead of saving them as an array and then reposting them using
get_Stream_by_id.  Not foolproof, but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it is supposed to be
an interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
>           # initialize the list yourself
>           my $query =
> Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
> 
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' $query->count
> is 3.
> However, why not any warning if we set -db->'protein' or if we did not set
> this?
>
>
> On the NCBI website searching Protein DB returns for 19505:
>       See Details. No items found.
>       The following term(s) refer to a different DB:195052
> 
> But this is not reflected via Bio::DB::Query::GenBank.
> 
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
> 
> Regards,
> Bernd

I can probably play around with adding a few things tomorrow and clean up
the POD somewhat.  I'm planning a rewrite for EUtilities-based searches, but
that's still a ways off...  Can't promise much; I'm pretty busy until next
week.

Chris


From chen_li3 at yahoo.com  Tue May 16 20:53:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 17:53:17 -0700 (PDT)
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com>

Hi all,

Thank you very much for the help.

I have some DNA sequences printed on the screen, but
the default output lines are longer than I expect.  I need 50
nucleotides/line. I searched CPAN but could not find the
right module.  Which bioperl module can do this job?

Li


From kmdaily at indiana.edu  Tue May 16 09:57:52 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Tue, 16 May 2006 09:57:52 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>

OK, got that installed. But I still get an error:

Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.

I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


-----Original Message-----
From: Christopher Fields [mailto:cjfields at uiuc.edu]
Sent: Tue 5/16/2006 7:38 AM
To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
 
You'll have to install from CVS.  I believe Brian added Entrezgene.pm after the last 
developer release  (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in 
Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>


From skirov at utk.edu  Wed May 17 07:48:29 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Wed, 17 May 2006 07:48:29 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
	<20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
Message-ID: <446B0D8D.40901@utk.edu>

You are using an old Bio::Annotation::DBLink module. Did you download 
only entrezgene.pm or the whole bioperl? If the whole distribution, what 
do the tests tell you?
Stefan
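
[Editorial sketch, not part of the original post.] A quick way to confirm
which copy of a module Perl actually loaded is to look it up in %INC and
ask the class whether it has the method. This is a minimal sketch in plain
Perl; module_info is a hypothetical helper name, and the check simply
reports "not found" if BioPerl is not on @INC.

```perl
use strict;
use warnings;

# Report which file a module was loaded from and whether it has a method.
# Returns (path, has_method); path is undef if the module cannot be loaded.
sub module_info {
    my ($module, $method) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;      # Bio::Foo -> Bio/Foo.pm
    return (undef, 0) unless eval { require $file; 1 };
    return ($INC{$file}, $module->can($method) ? 1 : 0);
}

my ($path, $has_url) = module_info('Bio::Annotation::DBLink', 'url');
if (defined $path) {
    print "DBLink loaded from: $path\n";
    print "url() method: ", ($has_url ? "present" : "missing (old copy?)"), "\n";
} else {
    print "Bio::Annotation::DBLink not found in \@INC\n";
}
```

If the printed path points at the admin-installed tree rather than the CVS
checkout, the search order in @INC is not what was expected for that
script.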
 


From osborne1 at optonline.net  Tue May 16 20:46:00 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 16 May 2006 20:46:00 -0400
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>
Message-ID: <C08FEA88.877A%osborne1@optonline.net>

Chen Li,

There's some documentation on translate() in bptutorial:

http://bioperl.org/Core/Latest/bptutorial.html

You could also use the translate_6frames() method of Bio::SeqUtils.


Brian O.
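
[Editorial sketch, not part of the original post.] For readers who want to
see what the do-it-yourself route mentioned in this thread (substring plus
reverse complement) amounts to, here is a minimal sketch in plain Perl. It
only cuts out the six frame substrings; it does not translate them. The
helper names revcom and six_frames and the frame numbering (1..3, -1..-3)
are this sketch's own conventions, not BioPerl's.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (plain A/C/G/T, either case).
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return the six frame substrings keyed 1,2,3 (forward strand) and
# -1,-2,-3 (reverse strand), each trimmed to a whole number of codons.
sub six_frames {
    my ($dna) = @_;
    my %frames;
    for my $strand (1, -1) {
        my $s = $strand == 1 ? $dna : revcom($dna);
        for my $offset (0 .. 2) {
            my $sub = $offset <= length($s) ? substr($s, $offset) : '';
            $sub = substr($sub, 0, length($sub) - length($sub) % 3);
            $frames{ $strand * ($offset + 1) } = $sub;
        }
    }
    return %frames;
}

my %f = six_frames('ATGGCCTAA');
print "$_\t$f{$_}\n" for sort { $b <=> $a } keys %f;
```

Each substring could then be handed to a translation routine such as the
translate() methods discussed above; translate_6frames() in Bio::SeqUtils
is the packaged alternative.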


On 5/16/06 4:52 PM, "smarkel at scitegic.com" <smarkel at scitegic.com> wrote:

> Li,
> 
> You can either do the substring, and reverse complement, yourself
> or you can use the translate() function in Bio::PrimarySeq.  It
> inherits from Bio::PrimarySeqI, so check there for the documentation.
> That translate() function takes a "-frame" argument.
> 
> Scott
> 
> PS In future, please respond to the list.  That way others see
> the questions and answers.
> 
> chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:
> 
>> Dear Dr. Markel,
>> 
>>     I browsed through the documentation of
>> Bio::Tools::CodonTable and found this line:
>> 
>> my $translation= $CodonTable->translate($seq);
>> 
>> I think this line is to do the translation. Here is my
>> question: which line in the doc says how to translate
>> the remaining frames 2,3, and -1, -2, -3?
>> 
>> 
>> Thank you,
>> 
>> Li
>> 
>> --- smarkel at scitegic.com wrote:
>> 
>>> Li,
>>> 
>>> Use the translate() function in
>>> Bio::Tools::CodonTable.
>>> 
>>> Scott
>>> 
>>> Scott Markel, Ph.D.
>>> Principal Bioinformatics Architect  email:
>>> smarkel at scitegic.com
>>> SciTegic Inc.                       mobile: +1 858
>>> 205 3653
>>> 10188 Telesis Court, Suite 100      voice:  +1 858
>>> 799 5603
>>> San Diego, CA 92121                 fax:    +1 858
>>> 279 8804
>>> USA                                 web:
>>> http://www.scitegic.com
>>> 
>>> 
>>> bioperl-l-bounces at lists.open-bio.org wrote on
>>> 16.05.2006 07:55:51:
>>> 
>>>> Hi all,
>>>> 
>>>> I wonder which module is available for translating
>>> DNA
>>>> sequence into 6 reading frames.
>>>> 
>>>> Thank you,
>>>> 
>>>> Li
>>> 
>>> 
>> 
>> 
> 


From e-just at northwestern.edu  Wed May 17 11:03:41 2006
From: e-just at northwestern.edu (Eric Just)
Date: Wed, 17 May 2006 10:03:41 -0500
Subject: [Bioperl-l] Modware: a BioPerl based API for Chado
Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu>

Hi Everyone,

We are announcing a new Sourceforge Project called Modware that may be of 
interest to you.   It is an object-oriented API written in Perl that 
creates BioPerl object representations of biological features stored in a 
Chado database. It basically creates a Bio::Seq object for chromosomes in 
Chado and creates Bio::SeqFeature::Gene objects for protein coding 
transcripts stored in Chado.  Things like contigs are represented as 
Bio::SeqFeature::Generic objects.  We also provide many methods for 
manipulating these objects once they are in memory.

For download please visit our Sourceforge project page:
http://sourceforge.net/projects/gmod-ware

For API documentation and some short examples of selected use cases visit 
our project home page:
http://gmod-ware.sourceforge.net/

This software is adapted from the production middleware code that dictyBase 
uses.  Modware 0.1 requires that the latest stable GMOD release (0.003) be 
installed.  We are currently calling it a release candidate; once we get 
some feedback, we will call it an official release if there are no major 
install bugs (we've installed it on only two different machines).  If you 
would like a version that works on the latest CVS version of GMOD, let me 
know and I'll expedite getting that out the door.

Lastly, please use the direct download version, we have not fully recovered 
from the recent Sourceforge CVS issues.

Please try the software out and let us know what you think!


Sincerely,
Eric Just and Sohel Merchant

e-just at northwestern.edu
s-merchant at northwestern.edu


============================================

Eric Just
e-just at northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 


From sb at mrc-dunn.cam.ac.uk  Wed May 17 13:46:45 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 17 May 2006 18:46:45 +0100
Subject: [Bioperl-l] Bio::Map:: enhancements
Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk>

I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998

I'm interested in what people have to say about the secondary 
enhancement I talk about there. Is it a sane thing to do? What are the 
better ways of doing that?
If it /is/ ok, I suppose I'd have to go back and alter 
Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker.


Oh, on a side note, you'll see I had to override RangeI's intersection 
method to work on multiple ranges. Why is RangeI limited to an 
intersection of only two ranges?

Cheers,
Sendu.

From David_Waner/San_Diego/Accelrys at scitegic.com  Thu May 18 15:30:46 2006
From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com)
Date: Thu, 18 May 2006 12:30:46 -0700
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
	Windows
Message-ID: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>

BioPerl Users/Developers,

In our testing we have found severe performance problems using BioPerl 
with Perl 5.8 on Windows (but not on Linux). They show up especially in 
SeqIO when reading or writing Fasta files containing large (~16 MB) 
sequences.  The same files that can be read in 1 or 2 seconds with Windows 
Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.

Although the fault is clearly with Perl, not with BioPerl, I have 
identified a couple of places where BioPerl could be modified in order to 
save Windows Perl 5.8 users a lot of time, while not affecting other 
users. 

For example, in my testing the following excerpt from 
Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 
16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }
 
whereas the following replacement code should be equivalent: 

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
    }
 
but executes in less than 1 second.

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
 
to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
 
in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.

There are also problems in reading files with the <> operator when $/ is 
redefined to "\n>", where reading the first line of Fasta files containing 
large sequences takes ~50 seconds, but reading subsequent lines or files 
takes about 1 second. I don't have a work-around for this.

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes? 

I plan to submit a bug report to perlbug, but don't know when or if the 
problem will be fixed. 

- David
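
[Editorial sketch, not part of the original post.] The proposed
replacements can be checked for equivalent output with a few lines of plain
Perl. This is a minimal sketch that compares results only; it does not
reproduce the timings reported above, and it covers just the non-Mac branch
(the $ONMAC test is left out). The sub names are hypothetical.

```perl
use strict;
use warnings;

# Line-ending normalisation as in the original excerpt (non-Mac branch).
sub normalise_regex {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed replacement: one substitution for CR/LF pairs, tr/// for lone CRs.
sub normalise_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;
    $line =~ tr/\015/\n/;
    return $line;
}

# Whitespace stripping: \s versus an explicit tr/// character list.
sub strip_regex { my ($s) = @_; $s =~ s/\s//g;       return $s }
sub strip_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d; return $s }

my @cases = ("a\015\012b", "a\015b", "a\012b", "a\015\015\012b", "");
my $mismatches = grep { normalise_regex($_) ne normalise_tr($_) } @cases;
print "line-ending mismatches: $mismatches of ", scalar(@cases), "\n";

# A form feed is matched by \s but is not in the tr/// list.
my $ff = "AC\fGT";
print "form feed handled differently: ",
      (strip_tr($ff) ne strip_regex($ff) ? "yes" : "no"), "\n";
```

One caveat on the whitespace change: \s also matches a form feed, which
the tr/// list as written does not, so the two are identical only for input
limited to spaces, tabs, CRs and LFs. Adding \f to the tr/// list would
close that gap.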


From cjfields at uiuc.edu  Thu May 18 16:07:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 May 2006 15:07:14 -0500
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
	onWindows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine>

David,

I have seen some slowdowns with Bio::SeqIO associated with GenBank files,
which this could be related to.  I can't do anything about it (test or
commit changes) until next week but someone else using Windows might (though
we are few and far between, and I'm switching to Mac OS X in fall).  Would
be nice to try the changes and test it out on a few platforms.  

Chris



From osborne1 at optonline.net  Thu May 18 16:27:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 16:27:57 -0400
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
 Windows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <C092510D.87EB%osborne1@optonline.net>

David,

What are the results from the relevant t/*.t files before and after these
patches?

Brian O.




From hubert.prielinger at gmx.at  Thu May 18 16:41:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 18 May 2006 14:41:27 -0600
Subject: [Bioperl-l] parsing xml output
Message-ID: <446CDBF7.10908@gmx.at>

hi,
What is the best way to parse NCBI and WU-BLAST XML output?
Is it possible to parse both with the same parser, or does their 
XML output differ?

thanks

From staffa at niehs.nih.gov  Thu May 18 16:49:15 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Thu, 18 May 2006 16:49:15 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>

Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
Namely the six D.melanogaster sequences.  
Specifically to find gene entries and learn the gene name, begin and end and CDS.
Please point me to appropriate modules and documentation.


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From adamnkraut at gmail.com  Thu May 18 17:07:42 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Thu, 18 May 2006 17:07:42 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>

I am currently using a pairwise alignment algorithm written in C (not by
me).  The program consists of a library of routines, structures, and
definitions which I do not want to spend a lot of time abstracting.  I
already have a hack method of writing the parameters and inputs I want from
perl, calling the c program with system( ), and then parsing the output in
Perl.  Any good programmer would probably smack me but I'm just an undergrad
and I needed to show my boss that this works in order to spend more time on
it.

So on to my question, what is the preferred method of extending Bioperl to
use this algorithm?  I have just read the XS tutorial and a bit about Inline
C.  Can I put the main function in my script using Inline, and then just
point Inline at the rest of the C library?  The program has several
C-structures that are semantically equivalent to Bioperl objects, so I just
need somewhere to start.  I will spend some more time so that I have a more
specific question, I just wanted a little feedback, this is my first post to
the bioperl list.

Thanks,
Adam


From osborne1 at optonline.net  Thu May 18 17:54:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 17:54:01 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <C0926539.87F5%osborne1@optonline.net>

Nick,

Have you read the Feature-Annotation HOWTO? This would be a good starting
point...

Brian O.


On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Would like a fairly simple way to extract certain information from Genbank
> Genomic File Annotations.
> Namely the six D.melanogaster sequences.
> Specifically to find gene entries and learn the gene name, begin and end and
> CDS.
> Please point me to appropriate modules and documentation.
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at duke.edu  Thu May 18 18:22:32 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 18 May 2006 18:22:32 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>

We don't parse WU-BLAST XML at this time.  We'd welcome someone
contributing this.

NCBI XML is parsed with the blastxml format.

-jason
On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:

> hi,
> what is the best way to parse NCBI- and WU- Blast XML output....
> and is it possible to parse both with the same parser, or differ their
> XML output...
>
> thanks

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From MEC at stowers-institute.org  Thu May 18 18:39:15 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 18 May 2006 17:39:15 -0500
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <CED81D34E37D5043A1211565277A51E50563F496@exchkc02.stowers-institute.org>

Li,

Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat
fasta on standard input to 50 char wide fasta on standard output.

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' 

You can call it like this:

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' < inputfile.fasta > outputfile.fasta

Does this help?

--Malcolm Cook


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
>Sent: Tuesday, May 16, 2006 7:53 PM
>To: bioperl-l at bioperl.org
>Subject: [Bioperl-l] module for formating sequence output on the screen
>
>Hi all,
>
>Thank you very much for the help.
>
>I have some DNA sequences printed on the screen. But
>the default output is longer than I expect.  I need 50
>nucleotides/line. I searched CPAN but cannot find the
>right module.  Which bioperl module can do this job?
>
>Li


From gish at watson.wustl.edu  Thu May 18 19:57:03 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Thu, 18 May 2006 18:57:03 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>
Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM>

Just to clarify, the XML output from WU-BLAST conforms to the standard
NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
incompatible, but care was taken to ensure compatibility.  If someone
identifies a difference that prevents parsing or proper interpretation of
the WU-BLAST output, please let me know.
Regards,
--Warren 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Thursday, May 18, 2006 5:23 PM
> To: Hubert Prielinger
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] parsing xml output
> 
> we don't parse WU-BLAST XML at this time.  We'd welcome someone  
> contributing this.
> 
> ncbi XML is parsed with blastxml format.
> 
> -jason
> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
> 
> > hi,
> > what is the best way to parse NCBI- and WU- Blast XML output....
> > and is it possible to parse both with the same parser, or 
> differ their
> > XML output...
> >
> > thanks
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Thu May 18 21:10:50 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Thu, 18 May 2006 20:10:50 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <ca6b1c14.ba9e5f4f.81c0100@expms6.cites.uiuc.edu>

Just to make sure everybody knows, if you use bioperl v1.5.1, 
SearchIO::blastxml uses XML::Parser which should come with most recent perl 
distributions.   The bioperl-live version has switched over to XML::SAX for SAX2 
parsing and it is recommended that you install XML::SAX::ExpatXS as well for 
faster parsing. 

Chris

---- Original message ----
>Date: Thu, 18 May 2006 18:57:03 -0500
>From: "Warren Gish" <gish at watson.wustl.edu>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: "'Hubert Prielinger'" <hubert.prielinger at gmx.at>
>Cc: bioperl-l at bioperl.org
>
>Just to clarify, the XML output from WU-BLAST conforms to the standard
>NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
>incompatible, but care was taken to ensure compatibility.  If someone
>identifies a difference that prevents parsing or proper interpretation of
>the WU-BLAST output, please let me know.
>Regards,
>--Warren 
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> Jason Stajich
>> Sent: Thursday, May 18, 2006 5:23 PM
>> To: Hubert Prielinger
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] parsing xml output
>> 
>> we don't parse WU-BLAST XML at this time.  We'd welcome someone  
>> contributing this.
>> 
>> ncbi XML is parsed with blastxml format.
>> 
>> -jason
>> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>> 
>> > hi,
>> > what is the best way to parse NCBI- and WU- Blast XML output....
>> > and is it possible to parse both with the same parser, or 
>> differ their
>> > XML output...
>> >
>> > thanks
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12

From jason.stajich at duke.edu  Fri May 19 08:52:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 08:52:13 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>

Whoops - sorry Warren - for some reason I had it in my mind that it  
was different.  So the blastxml parser should work fine.  The WUBLAST  
tab-delimited output is different than NCBI's -m8/9 though, right?

-jason


On May 18, 2006, at 7:57 PM, Warren Gish wrote:

> Just to clarify, the XML output from WU-BLAST conforms to the standard
> NCBI_BlastOutput.dtd.  Technically, contents of data fields could  
> still be
> incompatible, but care was taken to ensure compatibility.  If someone
> identifies a difference that prevents parsing or proper  
> interpretation of
> the WU-BLAST output, please let me know.
> Regards,
> --Warren
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Jason Stajich
>> Sent: Thursday, May 18, 2006 5:23 PM
>> To: Hubert Prielinger
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] parsing xml output
>>
>> we don't parse WU-BLAST XML at this time.  We'd welcome someone
>> contributing this.
>>
>> ncbi XML is parsed with blastxml format.
>>
>> -jason
>> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>
>>> hi,
>>> what is the best way to parse NCBI- and WU- Blast XML output....
>>> and is it possible to parse both with the same parser, or
>> differ their
>>> XML output...
>>>
>>> thanks
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu May 18 18:42:05 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:42:05 +1000
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <446CF83D.60207@infotech.monash.edu.au>

> what is the best way to parse NCBI- and WU- Blast XML output....
> and is it possible to parse both with the same parser, or differ their 
> XML output...


For NCBI BLAST XML format, use
	Bio::SearchIO->new(-format=>'blastxml', ...)

I don't know if 'blastxml' will load WU-BLAST XML format.
http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it.

Why not try it, and report back the results to the bioperl list?
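
For anyone who wants to try it, here is a minimal sketch (not checked against
real WU-BLAST output). It assumes BioPerl is installed and that the report file
name is given on the command line; hsp_rows and the script name are made up for
this example, only the Bio::SearchIO calls are BioPerl's own.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Collect one [query, hit, e-value] row per HSP from a Bio::SearchIO parser.
# hsp_rows is a hypothetical helper name, not part of BioPerl.
sub hsp_rows {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                push @rows, [ $result->query_name, $hit->name, $hsp->evalue ];
            }
        }
    }
    return @rows;
}

# Usage (hypothetical script name): perl parse_blastxml.pl report.xml
if ( my $file = shift @ARGV ) {
    my $in = Bio::SearchIO->new( -format => 'blastxml', -file => $file );
    print join( "\t", @$_ ), "\n" for hsp_rows($in);
}
```

If the same script runs cleanly on a WU-BLAST XML report, that would be a
useful data point for the list.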

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Thu May 18 18:37:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:37:17 +1000
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <446CF71D.2070207@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
> Namely the six D.melanogaster sequences.  
> Specifically to find gene entries and learn the gene name, begin and end and CDS.
> Please point me to appropriate modules and documentation.

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/HOWTOs
-> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/FAQ
-> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From gish at watson.wustl.edu  Fri May 19 10:50:08 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Fri, 19 May 2006 09:50:08 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
Message-ID: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>

Right, the WU-BLAST tabbed output contains more fields.  (See
http://blast.wustl.edu/blast/tabular.html).
--Warren

> Whoops - sorry Warren - for some reason I had it in my mind that it  
> was different.  So the blastxml parser should work fine.  The  
> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
> right?
>
> -jason


From adamnkraut at gmail.com  Fri May 19 11:04:01 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Fri, 19 May 2006 11:04:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
In-Reply-To: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
	<OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com>

The program generates an ensemble of weighted suboptimal alignments by use
of a partition function and stochastic backtracking.  The algorithm is quite
novel and it's really only part of a larger multi-scale comparative modeling
project. The documentation is here:

http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html

While I think this would be useful to the bioperl community if it were fully
abstracted/extended, I would at the least like to be able to pass in any two
sequences and get back SimpleAlign objects for our internal uses first.  I
have a good idea on how to get started.  I will be sure to post when I get
into trouble.
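
As a starting point for the "get back SimpleAlign objects" part, here is a rough
sketch. It assumes the C program hands back the two aligned rows as gapped
strings of equal length with '-' as the gap character (that is an assumption
about the output, not something taken from the library's documentation);
pair_to_simplealign and the sequence ids are made-up names.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;

# Wrap two already-aligned, gapped strings in a Bio::SimpleAlign.
# pair_to_simplealign is a hypothetical helper name.
sub pair_to_simplealign {
    my ( $id1, $row1, $id2, $row2 ) = @_;
    die "aligned rows differ in length\n"
        unless length($row1) == length($row2);

    my $aln = Bio::SimpleAlign->new;
    for my $pair ( [ $id1, $row1 ], [ $id2, $row2 ] ) {
        my ( $id, $row ) = @$pair;
        my $residues = ( $row =~ tr/-//c );    # number of non-gap characters
        $aln->add_seq(
            Bio::LocatableSeq->new(
                -id    => $id,
                -seq   => $row,
                -start => 1,
                -end   => $residues,
            )
        );
    }
    return $aln;
}

my $aln = pair_to_simplealign( seqA => 'ACG-TTGA', seqB => 'ACGATT-A' );
printf "%s\t%s\n", $_->display_id, $_->seq for $aln->each_seq;
```

Each suboptimal alignment in the ensemble could be wrapped the same way, with
its weight kept alongside the SimpleAlign object.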


On 5/19/06, aaron.j.mackey at gsk.com <aaron.j.mackey at gsk.com> wrote:
>
> bioperl-ext is the package in which alignment algorithms and/or BioPerl
> "wrapped" external C libraries live.  Subprojects in bioperl-ext use both
> XS and Inline::C, that's up to you.
>
> You'll need to get your C code compiled to a dynamically loaded library
> (.so) to use either XS or Inline::C; this precludes any reuse of the C
> main() function (although your Inline::C wrapper might recapitulate/copy
> the main() function code).
>
> Out of curiosity, what pairwise alignment algorithm are you using?  This
> is a heavily beaten path, you might want to dig around first to see if
> someone else already has what you need.
>
> -Aaron
>
>


From slenk at emich.edu  Fri May 19 10:42:41 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Fri, 19 May 2006 10:42:41 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
Message-ID: <f141831f144a37.f144a37f141831@emich.edu>

There is nothing wrong with a reasonable way that works - better not 
to put yourself down.

Inline is good if you can get it to work for you - I have had issues 
with linking Inline to dynamic libraries. I believe Inline makes a 
file that has linkage characteristics specified. Try it and see, then 
tell people how you did it. My two cents.

Another way to use external executables is open3 (from IPC::Open3), then 
reading and writing to the pipes. I use it (primer3 and local lab automation 
code) - snippet follows:

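use IPC::Open3;    # provides open3()
use IO::Handle;    # provides autoflush() on filehandles

my $pid     = 0;
my $cancmd  = 'cancmd.exe';    # external program to drive
my $write   = 0;               # will hold the handle to the child's STDIN
my $read    = 0;               # will hold the handle to the child's STDOUT/STDERR

sub new {

    my $c = {};

    # The child's STDOUT and STDERR both arrive on RDRFH.
    $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);

    $write = *WTRFH;
    $read  = *RDRFH;

    # Send each request to the child immediately instead of buffering it.
    $write->autoflush();

    bless $c;
    return $c;
}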

Just write your request, then read the reply back - I make sure that each 
request/reply pair is a newline-terminated text line - be sure you reap the 
child process (waitpid on its pid) when you are done.
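
A self-contained sketch of that write/read/reap cycle, using only core modules.
The child here is a stand-in (a second perl that upper-cases each line), not
cancmd.exe or primer3, and round_trip is a made-up name:

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Handle;

# Hypothetical stand-in for the external program: a child perl that
# upper-cases every line it reads and flushes its output immediately.
my @cmd = ( $^X, '-e', '$| = 1; while (<STDIN>) { print uc($_) }' );

# Send each request as one line, read one line back, then reap the child.
sub round_trip {
    my @requests = @_;

    # A false third argument puts the child's STDERR on the same handle as STDOUT.
    my $pid = open3( my $to_child, my $from_child, undef, @cmd );
    $to_child->autoflush(1);

    my @replies;
    for my $request (@requests) {
        print {$to_child} "$request\n";    # newline-terminated request
        my $reply = <$from_child>;         # newline-terminated reply
        last unless defined $reply;
        chomp $reply;
        push @replies, $reply;
    }

    close $to_child;                       # EOF on STDIN lets the child exit
    close $from_child;
    waitpid( $pid, 0 );                    # reap the child process
    return @replies;
}

print join( ",", round_trip( "acgt", "ttga" ) ), "\n";
```

This only works without deadlock if the child flushes after every reply; a
program that buffers its output when writing to a pipe needs a different
approach.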


----- Original Message -----
From: Adam Kraut <adamnkraut at gmail.com>
Date: Thursday, May 18, 2006 5:07 pm
Subject: [Bioperl-l] writing a pairwise alignment module: XS and 
Inline C?

> I am currently using a pairwise alignment algorithm written in C 
> (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time 
> abstracting.  I
> already have a hack method of writing the parameters and inputs I 
> want from
> perl, calling the c program with system( ), and then parsing the 
> output in
> Perl.  Any good programmer would probably smack me but I'm just an 
> undergrad and I needed to show my boss that this works in order to 
> spend more time on
> it.
> 
> So on to my question, what is the preferred method of extending 
> Bioperl to
> use this algorithm?  I have just read the XS tutorial and a bit 
> about Inline
> C.  Can I put the main function in my script using Inline, and 
> then just
> point Inline at the rest of the C library?  The program has several
> C-structures that are semantically equivalent to Bioperl objects, 
> so just
> need somewhere to start.  I will spend some more time so that I 
> have a more
> specific question, I just wanted a little feedback, this is my 
> first post to
> the bioperl list.
> 
> Thanks,
> Adam

From hubert.prielinger at gmx.at  Fri May 19 12:52:28 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 10:52:28 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
Message-ID: <446DF7CC.5060509@gmx.at>

hi,
I wondered whether it is also possible in the xml output (either WU or 
NCBI - Blast) to get the species (taxonomy) for every hit, if I do a 
general search.
regards

Warren Gish wrote:
> Right, the WU-BLAST tabbed output contains more fields.  (See
> http://blast.wustl.edu/blast/tabular.html).
> --Warren
>
>   
>> Whoops - sorry Warren - for some reason I had it in my mind that it  
>> was different.  So the blastxml parser should work fine.  The  
>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
>> right?
>>
>> -jason
>>     


From staffa at niehs.nih.gov  Fri May 19 14:12:47 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Fri, 19 May 2006 14:12:47 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <C0926539.87F5%osborne1@optonline.net>
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>

Specifically: 
I have the document to which you refer,
but have not seen this one thing I need in the printout of tags etc.:
the values in this line;
     mRNA            join(380..509,578..1913,7784..8649,9439..10200)
Is that a  location object?


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


> ----------
> From: 	Brian Osborne
> Sent: 	Thursday, May 18, 2006 5:54 PM
> To: 	Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> Subject: 	Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> 
> Nick,
> 
> Have you read the Feature-Annotation HOWTO? This would be a good starting
> point...
> 
> Brian O.
> 
> 
> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
> wrote:
> 
> > Would like a fairly simple way to extract certain information from Genbank
> > Genomic File Annotations.
> > Namely the six D.melanogaster sequences.
> > Specifically to find gene entries and learn the gene name, begin and end and
> > CDS.
> > Please point me to appropriate modules and documentation.
> > 
> > 
> > Nick Staffa
> > Telephone: 919-316-4569  (NIEHS: 6-4569)
> > Scientific Computing Support Group
> > NIEHS Information Technology Support Services Contract
> > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> > National Institute of Environmental Health Sciences
> > National Institutes of Health
> > Research Triangle Park, North Carolina


From chandan.kr.singh at gmail.com  Fri May 19 14:37:26 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sat, 20 May 2006 00:07:26 +0530
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
References: <C0926539.87F5%osborne1@optonline.net>
	<7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com>

On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] <staffa at niehs.nih.gov> wrote:
>
> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?


Yes, it is a location object. If you want that as a string (which is
what it seems from your mail), you just have to do this:

$loc = $fet->location();

$loc_str = $loc->to_FTstring() ;
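
To see what comes back without needing a GenBank file, here is a small sketch
that builds the same kind of split location by hand from the coordinates in the
question. split_location is a made-up helper, and the forward strand is an
assumption; the location classes and methods are BioPerl's.

```perl
use strict;
use warnings;
use Bio::Location::Simple;
use Bio::Location::Split;

# Build a split (join) location from [start, end] pairs on the forward strand.
# split_location is a hypothetical helper name.
sub split_location {
    my @exons = @_;
    my $split = Bio::Location::Split->new;
    for my $exon (@exons) {
        $split->add_sub_Location(
            Bio::Location::Simple->new(
                -start  => $exon->[0],
                -end    => $exon->[1],
                -strand => 1,
            )
        );
    }
    return $split;
}

my $loc = split_location( [ 380, 509 ], [ 578, 1913 ], [ 7784, 8649 ], [ 9439, 10200 ] );

print $loc->to_FTstring, "\n";                                 # feature-table string
print $_->start, "..", $_->end, "\n" for $loc->sub_Location;   # one line per exon
print $loc->start, "-", $loc->end, "\n";                       # overall extent
```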

Hope it helps.
Chandan

> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
>
>
> > ----------
> > From:         Brian Osborne
> > Sent:         Thursday, May 18, 2006 5:54 PM
> > To:   Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> > Subject:      Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> >
> > Nick,
> >
> > Have you read the Feature-Annotation HOWTO? This would be a good starting
> > point...
> >
> > Brian O.
> >
> >
> > On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
> > wrote:
> >
> > > Would like a fairly simple way to extract certain information from Genbank
> > > Genomic File Annotations.
> > > Namely the six D.melanogaster sequences.
> > > Specifically to find gene entries and learn the gene name, begin and end and
> > > CDS.
> > > Please point me to appropriate modules and documentation.
> > >
> > >
> > > Nick Staffa
> > > Telephone: 919-316-4569  (NIEHS: 6-4569)
> > > Scientific Computing Support Group
> > > NIEHS Information Technology Support Services Contract
> > > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> > > National Institute of Environmental Health Sciences
> > > National Institutes of Health
> > > Research Triangle Park, North Carolina


From osborne1 at optonline.net  Fri May 19 15:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <C0939738.8849%osborne1@optonline.net>

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and
end positions. These could be called "split" locations, and a very common
example is the join statement in the CDS feature found in Genbank entries
(e.g. join(45..122,233..267)). This calls for a specialized object,
Bio::Location::SplitLocationI, which is a container for Location objects:

      for my $feature ( $seqobj->top_SeqFeatures ) {
          if ( $feature->location->isa('Bio::Location::SplitLocationI')
               && $feature->primary_tag eq 'CDS' ) {
              for my $location ( $feature->location->sub_Location ) {
                  print $location->start . ".." . $location->end . "\n";
              }
          }
      }
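
Going back to the original question (gene name, begin, end and CDS), a rough
sketch along the same lines follows. It assumes the records are GenBank flat
files named on the command line and that the features carry a /gene qualifier;
feature_rows and the script name are made up for the example.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# One tab-separated row per gene, mRNA or CDS feature:
# tag, gene name, start, end, location string.
# feature_rows is a hypothetical helper name.
sub feature_rows {
    my ($seq) = @_;
    my @rows;
    for my $feat ( $seq->get_SeqFeatures ) {
        my $tag = $feat->primary_tag;
        next unless $tag eq 'gene' || $tag eq 'mRNA' || $tag eq 'CDS';
        my ($name) = $feat->has_tag('gene') ? $feat->get_tag_values('gene') : ('-');
        push @rows, join "\t", $tag, $name, $feat->start, $feat->end,
            $feat->location->to_FTstring;
    }
    return @rows;
}

# Usage (hypothetical script name): perl list_genes.pl chromosome.gb
if ( my $file = shift @ARGV ) {
    my $in = Bio::SeqIO->new( -file => $file, -format => 'genbank' );
    while ( my $seq = $in->next_seq ) {
        print "$_\n" for feature_rows($seq);
    }
}
```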


Brian O.


On 5/19/06 2:12 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Specifically: 
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?
> 
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
>> ----------
>> From:  Brian Osborne
>> Sent:  Thursday, May 18, 2006 5:54 PM
>> To:  Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
>> Subject:  Re: [Bioperl-l] Reading GenBank Genomic File Annotation
>> 
>> Nick,
>> 
>> Have you read the Feature-Annotation HOWTO? This would be a good starting
>> point...
>> 
>> Brian O.
>> 
>> 
>> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
>> wrote:
>> 
>>> Would like a fairly simple way to extract certain information from Genbank
>>> Genomic File Annotations.
>>> Namely the six D.melanogaster sequences.
>>> Specifically to find gene entries and learn the gene name, begin and end and
>>> CDS.
>>> Please point me to appropriate modules and documentation.
>>> 
>>> 
>>> Nick Staffa
>>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>>> Scientific Computing Support Group
>>> NIEHS Information Technology Support Services Contract
>>> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
>>> National Institute of Environmental Health Sciences
>>> National Institutes of Health
>>> Research Triangle Park, North Carolina


From hubert.prielinger at gmx.at  Fri May 19 16:42:09 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 14:42:09 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
	<446DF7CC.5060509@gmx.at>
	<F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
Message-ID: <446E2DA1.1050503@gmx.at>

hi warren,
that means if I alter the DTD (if that is possible) by adding the 
taxonomic id to the DTD..... then I should have the taxonomic id tag in 
the xml file (theoretically)
but I guess this is only possible with a local search (blastall) but not 
with an online search.

greetings

Warren Gish wrote:
>
> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>
>> hi,
>> I wondered whether is it also possible in the xml output (either WU 
>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>> do a general search.
>> regards
>>
> The taxonomic id is not an entity in the NCBI XML DTD.  If the 
> information was embedded in deflines, one could conceivably parse for 
> it, but I believe the NCBI only distributes taxids in their ASN.1 data 
> and in their pre-formated BLAST databases, and NCBI BLAST only reports 
> taxids in its ASN.1 output format, where taxid is available as an entity.
>
> --Warren
>
>


From cjfields at uiuc.edu  Fri May 19 16:56:56 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:56:56 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>

You'll have to pull the GI or accession from each hit and do a lookup, either by 
fetching the sequence and reading its Bio::Species object or by using 
Bio::DB::Taxonomy; there isn't any tax information directly incorporated into 
BLAST reports AFAIK.
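
A minimal sketch of the first route, assuming protein hits, a working network
connection, and accessions given on the command line. species_for_accession is
a made-up helper; for nucleotide hits Bio::DB::GenBank would take the place of
Bio::DB::GenPept.

```perl
use strict;
use warnings;

# Look up the species name for one accession.
# $db is any object offering get_Seq_by_acc(), e.g. a Bio::DB::GenPept.
# species_for_accession is a hypothetical helper name.
sub species_for_accession {
    my ( $db, $acc ) = @_;
    my $seq = eval { $db->get_Seq_by_acc($acc) };    # failed fetch leaves $seq undefined
    return unless $seq;
    my $species = $seq->species;                     # a Bio::Species object, if annotated
    return unless $species;
    return $species->binomial;
}

# Usage (hypothetical script name): perl hit_species.pl ACCESSION [ACCESSION ...]
if (@ARGV) {
    require Bio::DB::GenPept;                        # needs BioPerl and network access
    my $db = Bio::DB::GenPept->new;
    for my $acc (@ARGV) {
        my $name = species_for_accession( $db, $acc );
        print "$acc\t", ( defined $name ? $name : 'unknown' ), "\n";
    }
}
```

The accessions themselves can come from $hit->accession while looping over the
BLAST report with Bio::SearchIO.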

Chris

---- Original message ----
>Date: Fri, 19 May 2006 10:52:28 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>
>hi,
>I wondered whether is it also possible in the xml output (either WU or 
>NCBI - Blast) to get the species (taxononmy) for every hit, if I do a 
>general search.
>regards
>
>Warren Gish wrote:
>> Right, the WU-BLAST tabbed output contains more fields.  (See
>> http://blast.wustl.edu/blast/tabular.html).
>> --Warren
>>
>>   
>>> Whoops - sorry Warren - for some reason I had it in my mind that it  
>>> was different.  So the blastxml parser should work fine.  The  
>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
>>> right?
>>>
>>> -jason
>>>     

From cjfields at uiuc.edu  Fri May 19 16:59:35 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:59:35 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu>

Um, I don't think it works that way.  I'm pretty sure the XML is generated from 
the ASN.1 output.  I don't think (as Warren says) that you can directly get to the 
tax information.  Indirectly is another matter...

Chris

---- Original message ----
>Date: Fri, 19 May 2006 14:42:09 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>
>hi warren,
>that means if I alter the DTD (if that is possible) by adding the 
>taxonomic id to the DTD..... then I should have the taxonomic id tag in 
>the xml file (theoretically)
>but I guess this is only possible with a local search (blastall) but not 
>with an online search.
>
>greetings
>
>Warren Gish wrote:
>>
>> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>>
>>> hi,
>>> I wondered whether is it also possible in the xml output (either WU 
>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>>> do a general search.
>>> regards
>>>
>> The taxonomic id is not an entity in the NCBI XML DTD.  If the 
>> information was embedded in deflines, one could conceivably parse for 
>> it, but I believe the NCBI only distributes taxids in their ASN.1 data 
>> and in their pre-formated BLAST databases, and NCBI BLAST only reports 
>> taxids in its ASN.1 output format, where taxid is available as an entity.
>>
>> --Warren
>>

From hubert.prielinger at gmx.at  Fri May 19 17:30:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 15:30:20 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E3854.5010708@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at>
Message-ID: <446E38EC.9020100@gmx.at>

ok, thanks,
it appears that I only need the species the protein is derived 
from, so I guess Bio::Species would satisfy me?
Would it work to pull the accession from the blast output file, 
pass in just that accession code, and get the species name back 
as the return value?
Is it possible to supply only the accession code? The documentation 
I looked up always talks about the entire file.

regards
>
>
> Christopher Fields wrote:
>> You'll have to pull the GI or accession from each hit and do a lookup 
>> by either grabbing the sequence and using Bio::Species or use 
>> Bio::DB::Taxonomy; there isn't any tax information directly 
>> incorporated into BLAST reports AFAIK.
>>
>> Chris
>>
>> ---- Original message ----
>>  
>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re: 
>>> [Bioperl-l] parsing xml output  To: Warren Gish 
>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>
>>> hi,
>>> I wondered whether is it also possible in the xml output (either WU 
>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>>> do a general search.
>>> regards
>>>
>>> Warren Gish wrote:
>>>    
>>>> Right, the WU-BLAST tabbed output contains more fields.  (See 
>>>> http:// blast.wustl.edu/blast/tabular.html).
>>>> --Warren
>>>>
>>>>        
>>>>> Whoops - sorry Warren - for some reason I had it in my mind that 
>>>>> it  was different.  So the blastxml parser should work fine.  The  
>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 
>>>>> though,  right?
>>>>>
>>>>> -jason
>>>>>             


From jason.stajich at duke.edu  Fri May 19 18:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID: <FAE3151B-301F-4A42-9EFD-D1F8D3CBE752@duke.edu>

There is a gi2taxid table in the /pub/taxonomy part of the NCBI FTP site  
(ftp.ncbi.nih.gov) -- I have used this to take GI numbers from a report  
and get the taxonomy for overall classification. I think something like  
this exists in the scripts or examples directory in the bioperl  
distro. I know I posted to the list about it a while ago.
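
For reference, a minimal sketch of the lookup described above, assuming the
table is a plain two-column, tab-separated dump (GI, then taxid), one pair
per line. The file name gi_taxid_prot.dmp and the helper name
taxids_for_gis are assumptions for illustration, not something stated in
this thread:

```perl
use strict;
use warnings;

# Scan a GI -> taxid table once and keep only the GIs we care about.
# Assumes each line is "<gi>\t<taxid>"; adjust the split if the real
# dump is laid out differently.
sub taxids_for_gis {
    my ($fh, @gis) = @_;
    my %wanted = map { $_ => 1 } @gis;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Usage (hypothetical file name and GI numbers):
# open my $fh, '<', 'gi_taxid_prot.dmp' or die "Cannot open table: $!";
# my $taxid_for = taxids_for_gis($fh, 12345, 67890);
# print "$_\t$taxid_for->{$_}\n" for sort { $a <=> $b } keys %$taxid_for;
```

Reading the table as a stream and keeping only the requested GIs avoids
loading the whole dump into memory; the resulting taxids could then be
handed to something like Bio::DB::Taxonomy for names and lineage.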

-jason
On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote:

> ok, thanks,
> it appears that I only need the species where the Protein is derived
> from, so I guess Bio:Species would satisfy me, or?
> and it would work that I just pull off the accession from the blast
> output file and then assign the accession code and get as return value
> the  species name.
> is it possible to just assign the accession code, because I looked up
> but they were always talking of the entire file.
>
> regards
>>
>>
>> Christopher Fields wrote:
>>> You'll have to pull the GI or accession from each hit and do a  
>>> lookup
>>> by either grabbing the sequence and using Bio::Species or use
>>> Bio::DB::Taxonomy; there isn't any tax information directly
>>> incorporated into BLAST reports AFAIK.
>>>
>>> Chris
>>>
>>> ---- Original message ----
>>>
>>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re:
>>>> [Bioperl-l] parsing xml output  To: Warren Gish
>>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>>
>>>> hi,
>>>> I wondered whether is it also possible in the xml output (either WU
>>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I
>>>> do a general search.
>>>> regards
>>>>
>>>> Warren Gish wrote:
>>>>
>>>>> Right, the WU-BLAST tabbed output contains more fields.  (See
>>>>> http:// blast.wustl.edu/blast/tabular.html).
>>>>> --Warren
>>>>>
>>>>>
>>>>>> Whoops - sorry Warren - for some reason I had it in my mind that
>>>>>> it  was different.  So the blastxml parser should work fine.  The
>>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9
>>>>>> though,  right?
>>>>>>
>>>>>> -jason
>>>>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From ewijaya at i2r.a-star.edu.sg  Sat May 20 08:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>


Dear expert,

Is there any Bioperl method that allows
you to verify the sequence format of a file?

For example, given a file we wish
to check (return true or false) whether
it is in FASTA format, GENBANK format, etc.

Such a method would be useful in a web application
as part of a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE



From aaron.j.mackey at gsk.com  Fri May 19 09:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
 C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>

bioperl-ext is the package in which alignment algorithms and/or BioPerl 
"wrapped" external C libraries live.  Subprojects in bioperl-ext use both 
XS and Inline::C; which one you use is up to you.

You'll need to get your C code compiled to a dynamically loaded library 
(.so) to use either XS or Inline::C; this precludes any reuse of the C 
main() function (although your Inline::C wrapper might recapitulate/copy 
the main() function code).
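
As a rough illustration of the Inline::C route, assuming the Inline and
Inline::C modules and a C compiler are installed: the C function below
(count_matches, a made-up stand-in, not the poster's algorithm) is compiled
on first run and becomes callable from Perl like an ordinary sub. A real
wrapper would call into the existing library's routines rather than carry
its own main().

```perl
use strict;
use warnings;

# Inline::C compiles the C source below into a loadable library and
# binds each function with a supported signature as a Perl sub.
use Inline C => <<'END_C';
/* Hypothetical stand-in for a library routine: count identical
   positions between two strings, up to the shorter length. */
int count_matches(char* a, char* b) {
    int n = 0;
    while (*a && *b) {
        if (*a == *b) n++;
        a++;
        b++;
    }
    return n;
}
END_C

# Call the compiled C function as if it were a Perl sub.
print count_matches("ACGTAC", "ACCTAG"), "\n";
```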

Out of curiosity, what pairwise alignment algorithm are you using?  This 
is a heavily beaten path, you might want to dig around first to see if 
someone else already has what you need.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/18/2006 05:07:42 PM:

> I am currently using a pairwise alignment algorithm written in C (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time abstracting.  I
> already have a hack method of writing the parameters and inputs I want from
> perl, calling the c program with system( ), and then parsing the output in
> Perl.  Any good programmer would probably smack me but I'm just an undergrad
> and I needed to show my boss that this works in order to spend more time on
> it.
> 
> So on to my question, what is the preferred method of extending Bioperl to
> use this algorithm?  I have just read the XS tutorial and a bit about Inline
> C.  Can I put the main function in my script using Inline, and then just
> point Inline at the rest of the C library?  The program has several
> C-structures that are semantically equivalent to Bioperl objects, so just
> need somewhere to start.  I will spend some more time so that I have a more
> specific question, I just wanted a little feedback, this is my first post to
> the bioperl list.
> 
> Thanks,
> Adam
> 


From jason.stajich at duke.edu  Sat May 20 10:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID: <F42D42CC-B609-48DF-B291-E0CE803D527C@duke.edu>

Try Bio::Tools::GuessSeqFormat
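
A minimal sketch of how that might look, assuming Bioperl is installed. The
wrapper name looks_like and the file name are made up for illustration;
guess() returns a format name such as 'fasta' or 'genbank', or undef when
it cannot decide, so a true/false check is a string comparison on top of it:

```perl
use strict;
use warnings;
use Bio::Tools::GuessSeqFormat;

# Return true if the guessed format of $file matches $expected
# (e.g. 'fasta', 'genbank'); false if it differs or cannot be guessed.
sub looks_like {
    my ($file, $expected) = @_;
    my $format = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
    return defined $format && $format eq lc $expected;
}

# Usage (hypothetical upload path):
# print "ok\n" if looks_like('upload.dat', 'fasta');
```

The guess is a heuristic based on the first lines of the input, so it is
probably best treated as a first filter rather than the whole taint check.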

On May 20, 2006, at 8:36 AM, Wijaya Edward wrote:

>
> Dear expert,
>
> Is there any Bioperl method that allows
> you to check verify sequence type in a file?
>
> For example, given a file we wish
> to check (return true  or false) whether
> it is in FASTA format, GENBANK format, etc.
>
> This method is useful in web application
> as taint checking procedure.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sat May 20 20:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,


I tried one script from GraphicsHowTo under a Cygwin
environment (GD and libpng already installed). I typed
this line in a Cygwin X window:


$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format
`/tmp/magick-qKiRPDRS'.

Any idea?


Thank you very much,

Li



From osborne1 at optonline.net  Sat May 20 20:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <C095339A.886C%osborne1@optonline.net>

Chen,

Not sure. However, whenever I see a new or incomprehensible error message
like "display: no decode delegate for this image format" I Google it.

Brian O.


On 5/20/06 8:15 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> Dear all,
> 
> 
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> 
> 
> $ perl render_blast1.pl data1.txt | display -
> 
> And here is the result:
> 
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
> 
> Any idea?
> 
> 
> Thank you very much,
> 
> Li
> 
> 


From n.saunders at uq.edu.au  Sun May 21 18:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0 
RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and for selected proteins, 
get the Uniprot accession of interacting partners (this is outlined in the 
documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test script 
and ran it on a selection of XML files.  The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
		  		  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI 
XML versions 1 and 2.5):

1.  MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5. DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name 
/usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------


Looking at the module code, it seems that the first 2 errors relate to a 
parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5. 
  Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single 
species seems OK, but it seems there are species names in the complete dataset 
that cause problems (error 5).


Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there 
plans to get it to work with version 2.5 files from all sources (MINT and 
IntAct) ?  Googling and checking the list archives didn't give a lot of hits 
which made me think it's not a widely-used module.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://psychro.bioinformatics.unsw.edu.au/neil

From torsten.seemann at infotech.monash.edu.au  Sun May 21 21:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) 
into the input of a program called "display". This program is part of 
the ImageMagick toolkit, standard on most Linux installations. Because 
you are using Windows you probably don't have it installed! Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.
	
-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From darin.london at duke.edu  Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil.  In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006.  We look forward to a great conference this
year. Please consult the Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB
calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.


From osborne1 at optonline.net  Mon May 22 17:37:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 22 May 2006 17:37:50 -0400
Subject: [Bioperl-l] problems with Bio::Graph
In-Reply-To: <4470E708.3070402@uq.edu.au>
Message-ID: <C097A76E.88A9%osborne1@optonline.net>

Neil,

Let me propose an alternative. In the past few months I've been working on a
Bioperl package for handling protein interaction networks, it is called
bioperl-network. It's similar to the Bio::Graph modules, except for the
following:

- It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The
advantage is that we are not responsible for maintaining the algorithm code,
the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been
working on these and has fixed some significant ones recently.

- It uses names and concepts from Graph. It also has separate notions of
edge and interaction, where one edge can have one or more interactions.

- It uses more method names and conventions borrowed from interaction
databases and PSI MI. For example, a node can be a protein complex composed
of multiple Seq objects, not just a protein.

This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard
Adams are major contributors to it. It's also worth mentioning that it's not
complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think
it should be able to handle the code you've shown (and if it cannot then
I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm
not mistaken there's a version 1 -> version 2 converter.

I'm about to put this into CVS so you can take a look, should you choose to.

Brian O.


On 5/21/06 6:17 PM, "Neil Saunders" <n.saunders at uq.edu.au> wrote:

> dear all,
> 
> I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0
> RC1 with Ubuntu 5.10 i686.
> 
> I would like to parse files in PSI MI XML 2.5 format and for selected
> proteins, 
> get the Uniprot accession of interacting partners (this is outlined in the
> documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test
> script 
> and ran it on a selection of XML files.  The script is simply:
> 
> ----------------------------------------------------------------
> use strict;
> use Bio::Graph::IO;
> 
> my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
> my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
>  '-format' => 'psi_xml');
> my $gr = $graphio->next_network;
> ----------------------------------------------------------------
> 
> Here's a summary of the error messages with some sample files (I tried PSI MI
> XML versions 1 and 2.5):
> 
> 1.  MINT database 9707552_small.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 2. IntAct database yeast_small-11.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 3. IntAct database yeast_small-11.xml (PSI 1)
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.
> 
> 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
> These give no errors
> 
> 5. DIP file dip20060402.mif (PSI 1, complete dataset)
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
> STACK: Bio::Species::validate_species_name
> /usr/local/share/perl/5.8.7/Bio/Species.pm:340
> STACK: Bio::Species::classification
> /usr/local/share/perl/5.8.7/Bio/Species.pm:170
> STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
> STACK: Bio::Graph::IO::psi_xml::_proteinInteractor
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
> STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
> STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
> STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
> STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
> STACK: Bio::Graph::IO::psi_xml::next_network
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
> STACK: ./biograph.pl:18
> -----------------------------------------------------------
> 
> 
> Looking at the module code, it seems that the first 2 errors relate to a
> parameter "proteinInteractorRef", found in PSI MI version 1 but not version
> 2.5. 
>   Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single
> species seems OK, but it seems there are species names in the complete dataset
> that cause problems (error 5).
> 
> 
> Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there
> plans to get it to work with version 2.5 files from all sources (MINT and
> IntAct) ?  Googling and checking the list archives didn't give a lot of hits
> which made me think it's not a widely-used module.
> 
> thanks,
> Neil


From torsten.seemann at infotech.monash.edu.au  Mon May 22 17:53:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 23 May 2006 07:53:02 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <447232BE.1080001@infotech.monash.edu.au>

Chen Li

>  perl render_blast1.pl data1.txt >im.png

Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example 
script is creating a PNG image. The last line is:
print $panel->png;

> and Perl runs without any problem. I use adobe
> photoshop to open them and Adobe can't recognize them.
> If I use ACDSee to open them I only get a black
> background. If I issue this line under Cygwin X window
> display im.png  or display im.gif
> Cygwin says:
> display: Improper image header `im.png'.
> It seems Perl can't produce an image with right
> format.

Are you sure Perl is producing a PNG file at all?
How many bytes does im.png use? Zero?

Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ?

It says: "If you are on a Windows platform, you need to put STDOUT into 
binary mode so that the PNG file does not go through Window's carriage 
return/linefeed transformations. Before the final print statement, put 
the statement binmode(STDOUT)."

ie. your script should have

binmode(STDOUT);
print $panel->png;

as the last 2 lines.
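
To check whether the file on disk really is a PNG, one option is to look at
its first eight bytes, which for any valid PNG are the fixed signature
89 50 4E 47 0D 0A 1A 0A. A small sketch in plain Perl (the helper name
is_png is made up here); if a CR/LF translation has touched the stream, the
signature normally no longer matches:

```perl
use strict;
use warnings;

# Return true if $path starts with the 8-byte PNG signature.
sub is_png {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                      # read raw bytes, no CRLF translation
    my $got = read $fh, my $head, 8;  # number of bytes actually read
    close $fh;
    return defined $got && $got == 8 && $head eq "\x89PNG\r\n\x1a\n";
}

# Usage:
# print is_png('im.png') ? "looks like a PNG\n" : "not a PNG\n";
```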

> Do you experience the same problem before?

No.

--Torsten

From chen_li3 at yahoo.com  Mon May 22 09:25:53 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 06:25:53 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <4471148C.5090404@infotech.monash.edu.au>
Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>

Dear Dr. Seemann,


Thank you very much for the reply.

I issue this line:
 perl render_blast1.pl data1.txt >im.gif
or 
 perl render_blast1.pl data1.txt >im.png

and Perl runs without any problem. I use adobe
photoshop to open them and Adobe can't recognize them.
If I use ACDSee to open them I only get a black
background. If I issue this line under Cygwin X window

display im.png  or display im.gif

Cygwin says:

display: Improper image header `im.png'.

or display: Improper image header `im.gif'.

It seems Perl can't produce an image in the right
format.


Have you experienced the same problem before?

Li


--- Torsten Seemann
<torsten.seemann at infotech.monash.edu.au> wrote:

> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> > $ perl render_blast1.pl data1.txt | display -
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> 
> You are piping the output of the Perl script (which
> is a GIF/PNG image) 
> into the input of a program called "display". This
> program is part of 
> the ImageMagick toolkit, standard on most Linux
> installations. Because 
> you are using Windows you probably don't have it
> installed! Try this:
> 
> $ perl render_blast1.pl data1.txt > image.gif
> 
> Then load 'image.gif' into whatever your favourite
> image viewer is.
> 	
> -- 
> Dr Torsten Seemann              
> http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash
> University, Australia
> 
> 



From chen_li3 at yahoo.com  Mon May 22 18:57:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 15:57:42 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <447232BE.1080001@infotech.monash.edu.au>
Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com>

Hi,

I tried both, with and without the statement 
binmode(STDOUT) before the last line, print
$panel->png; but there is no difference. I get a file
of 2432 bytes.

Li


> Chen Li
> 
> >  perl render_blast1.pl data1.txt >im.png
> 
> Based on http://bioperl.org/wiki/HOWTO:Graphics I
> believe the example 
> script is creating a PNG image. The last line is:
> print $panel->png;
> 
> > and Perl runs without any problem. I use adobe
> > photoshop to open them and Adobe can't recognize
> them.
> > If I use ACDSee to open them I only get a black
> > background. If I issue this line under Cygwin X
> window
> > display im.png  or display im.gif
> > Cygwin says:
> > display: Improper image header `im.png'.
> > It seems Perl can't produce an image with right
> > format.
> 
> Are you sure Perl is producing a PNG file at all?
> How many bytes does im.png use? Zero?
> 
> Did you notice this in
> http://bioperl.org/wiki/HOWTO:Graphics ?
> 
> It says: "If you are on a Windows platform, you need
> to put STDOUT into 
> binary mode so that the PNG file does not go through
> Window's carriage 
> return/linefeed transformations. Before the final
> print statement, put 
> the statement binmode(STDOUT)."
> 
> ie. your script should have
> 
> binmode(STDOUT);
> print $panel->png;
> 
> as the last 2 lines.
> 
> > Do you experience the same problem before?
> 
> No.
> 
> --Torsten
> 



From barry.moore at genetics.utah.edu  Mon May 22 21:00:06 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 22 May 2006 19:00:06 -0600
Subject: [Bioperl-l] Problems with Unflattener.pm
Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>

Hi All,

NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into  
an infinite recursive loop.  The trouble occurs in the method  
find_best_matches between lines 2258 and 2281, and in particular the  
loop is perpetuated by line 2273.   NT_113910 has a fairly complex  
features table, but I have as yet been unable to figure out why  
this loop is not exiting properly.  This has been submitted to  
bugzilla, but I'll post here so it gets documented on the list also.   
Any suggestions from Chris or others would be greatly appreciated.

This problem can be recreated as follows:

Grab NT_113910 from genbank.
bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk

Pass NT_113910.gbk on the command line to the attached script.


#!/usr/bin/perl;

use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio =
     Bio::SeqIO->new(-file   => $file,
                     -format => 'GenBank');
my $out =
     Bio::SeqIO->new(-format => 'asciitree');
while (my $seq = $seqio->next_seq()) {

         # get top level unflattened SeqFeatureI objects
         $unflattener->unflatten_seq(-seq       => $seq,
                                     -use_magic => 1);
         $out->write_seq($seq);
}
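
While the cause is being tracked down, one way to keep a batch run from
hanging on a record like this is to put a time limit around the call. The
following is only a sketch using Perl's built-in alarm and eval:
run_with_timeout is a hypothetical helper name (it is not part of BioPerl),
the one-second limit is arbitrary, and alarm is not fully supported on
every platform.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run a code ref, but give up
# after $seconds. Returns 1 if the code finished, 0 if it timed out.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm is left pending
    return 1 if $finished;
    die $@ unless $@ eq "timeout\n";    # re-throw unrelated errors
    return 0;
}

# Stand-in for a call that never returns
my $ok = run_with_timeout(1, sub { my $i = 0; $i++ while 1 });
print $ok ? "finished\n" : "timed out\n";
```

In the script above, the code ref would wrap the unflatten_seq call, i.e.
sub { $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1) }.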


From miker at biotiquesystems.com  Mon May 22 19:56:52 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 16:56:52 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike>


As best as I can tell, using Bio::SeqIO to parse a uniprot file ignores the
sequence version, and calling seq_version() on the resulting RichSeq object
returns undef.

It looks like swiss.pm is trying to parse the version out of the SV line, which
apparently doesn't exist any more?  The sequence version(s) are now specified as
part of the Date (DT) lines.  

Is this not a bug?  Is swiss.pm not designed to parse uniprot files?

Thanks for any help ...


From jason.stajich at duke.edu  Mon May 22 21:37:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 21:37:13 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike>
References: <002a01c67dfb$663cc600$c100a8c0@mike>
Message-ID: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>

Sounds like a "missing feature" =)

AFAIK the module was only written for swissprot files.  It is  
possible there have been changes in the format that have not been  
tracked to the current code.  We'd certainly appreciate someone  
testing it out as versions evolve.  If you submit a bug to bugzilla  
with version of bioperl and example files you can track when a fix is  
in.  We of course appreciate anyone's efforts to provide a patch as  
most bugs get fixed of late when someone gets "itchy" enough to fix  
them.

-jason

On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:

>
> As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> ignores the
> sequence version, and calling seq_version() on the resulting  
> RichSeq object
> returns undef.
>
> It looks like swiss.pm is trying to parse the version out of the SV  
> line, which
> apparently doesn't exist any more?  The sequence version(s) are now  
> specified as
> part of the Date (DT) lines.
>
> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>
> Thanks for any help ...
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Mon May 22 22:04:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 22:04:17 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>

We ask that people post patches to bugzilla as an attachment to the
bug report, so we can track what the bug was and why the patch
fixes it.

I am not totally sure this patch works because it seems like we need  
to strip out more information now from the DT line if the $date  
actually contains more information than just the date.

If you would go ahead and create a bug in bugzilla for this
(http://bugzilla.open-bio.org), this sort of conversation can be tracked
to the bug.

If any of this is unclear please let us know - I thought we had put
some pages up about this sort of thing on the wiki but maybe they
need to be expanded.

-jason
On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Monday, May 22, 2006 6:37 PM
>> To: Michael Rogoff
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>>
>>
>> Sounds like a "missing feature" =)
>>
>> AFAIK the module was only written for swissprot files.  It is
>> possible there have been changes in the format that have not been
>> tracked to the current code.  We'd certainly appreciate someone
>> testing it out as versions evolve.  If you submit a bug to bugzilla
>> with version of bioperl and example files you can track when
>> a fix is
>> in.  We of course appreciate anyone's efforts to provide a patch as
>> most bugs get fixed of late when someone gets "itchy" enough to fix
>> them.
>>
>> -jason
>>
>> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
>>
>>>
>>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
>>> ignores the
>>> sequence version, and calling seq_version() on the resulting
>>> RichSeq object
>>> returns undef.
>>>
>>> It looks like swiss.pm is trying to parse the version out
>> of the SV
>>> line, which
>>> apparently doesn't exist any more?  The sequence version(s)
>> are now
>>> specified as
>>> part of the Date (DT) lines.
>>>
>>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>>>
>>> Thanks for any help ...
>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Marc.Logghe at DEVGEN.com  Tue May 23 03:08:37 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 23 May 2006 09:08:37 +0200
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>

Hi Li,
Did you check your script for any other print statements (to STDOUT,
that is) that potentially could contaminate your png stream ?
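
A quick way to test for this kind of contamination is to look at the first
eight bytes of the output file, since every PNG file starts with the same
fixed signature. The sketch below assumes nothing about Bio::Graphics; the
two function names are made up for illustration.

```perl
use strict;
use warnings;

# The 8-byte signature every PNG file starts with
my $PNG_SIGNATURE = "\x89PNG\r\n\x1a\n";

# Return 1 if the given bytes begin with the PNG signature, else 0.
sub looks_like_png {
    my ($bytes) = @_;
    return substr($bytes, 0, 8) eq $PNG_SIGNATURE ? 1 : 0;
}

# Read the first 8 bytes of a file in binary mode and test them.
sub file_looks_like_png {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $read = read($fh, my $head, 8);
    close $fh;
    return defined $read && $read == 8 && looks_like_png($head);
}
```

Calling file_looks_like_png('im.png') on the redirected output would show
whether something was printed ahead of the image data or the header bytes
were altered on the way out.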

Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
> Sent: Tuesday, May 23, 2006 12:58 AM
> To: Torsten Seemann
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] problems iwth Bio::graphics module
> 
> Hi,
> 
> I try both: either with or without this statement
>  binmode(STDOUT) before the last line print $panel->png; But 
> there are no differenes. I get a file of 2432 bytes.
> 
> Li
> 
> 
> 
> > Chen Li
> > 
> > >  perl render_blast1.pl data1.txt >im.png
> > 
> > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe 
> the example 
> > script is creating a PNG image. The last line is:
> > print $panel->png;
> > 
> > > and Perl runs without any problem. I use adobe photoshop to open 
> > > them and Adobe can't recognize
> > them.
> > > If I use ACDSee to open them I only get a black background. If I 
> > > issue this line under Cygwin X
> > window
> > > display im.png  or display im.gif
> > > Cygwin says:
> > > display: Improper image header `im.png'.
> > > It seems Perl can't produce an image with right format.
> > 
> > Are you sure Perl is producing a PNG file at all?
> > How many bytes does im.png use? Zero?
> > 
> > Did you notice this in
> > http://bioperl.org/wiki/HOWTO:Graphics ?
> > 
> > It says: "If you are on a Windows platform, you need to put STDOUT 
> > into binary mode so that the PNG file does not go through Window's 
> > carriage return/linefeed transformations. Before the final print 
> > statement, put the statement binmode(STDOUT)."
> > 
> > ie. your script should have
> > 
> > binmode(STDOUT);
> > print $panel->png;
> > 
> > as the last 2 lines.
> > 
> > > Do you experience the same problem before?
> > 
> > No.
> > 
> > --Torsten
> > 
> 
> 
> 


From chen_li3 at yahoo.com  Tue May 23 09:27:06 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 06:27:06 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>
Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>

Dear Dr. Logghe,

Thank you so much. After following your suggestion, I got the
script working under Cygwin. Here are the last two lines:

either binmode (STDOUT);
       print STDOUT $panel->png;

or only print STDOUT $panel->png;

Both work for me. I know that print writes to the screen (STDOUT)
by default, so I don't understand why it works once STDOUT is added
after print. Could you explain it?

BTW, I copied this script from the Graphics HOWTO on the Bioperl
website, and only one line contains a print statement,
which is 'print $panel->png'.

Once again thank you so much,

Li

--- Marc Logghe <Marc.Logghe at devgen.com> wrote:

> Hi Li,
> Did you check your script for any other print
> statements (to STDOUT,
> that is) that potentially could contaminate your png
> stream ?
> 
> Marc
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org 
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On
> Behalf Of chen li
> > Sent: Tuesday, May 23, 2006 12:58 AM
> > To: Torsten Seemann
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] problems iwth
> Bio::graphics module
> > 
> > Hi,
> > 
> > I try both: either with or without this statement
> >  binmode(STDOUT) before the last line print
> $panel->png; But 
> > there are no differenes. I get a file of 2432
> bytes.
> > 
> > Li
> > 
> > 
> > 
> > > Chen Li
> > > 
> > > >  perl render_blast1.pl data1.txt >im.png
> > > 
> > > Based on http://bioperl.org/wiki/HOWTO:Graphics
> I believe 
> > the example 
> > > script is creating a PNG image. The last line
> is:
> > > print $panel->png;
> > > 
> > > > and Perl runs without any problem. I use adobe
> photoshop to open 
> > > > them and Adobe can't recognize
> > > them.
> > > > If I use ACDSee to open them I only get a
> black background. If I 
> > > > issue this line under Cygwin X
> > > window
> > > > display im.png  or display im.gif
> > > > Cygwin says:
> > > > display: Improper image header `im.png'.
> > > > It seems Perl can't produce an image with
> right format.
> > > 
> > > Are you sure Perl is producing a PNG file at
> all?
> > > How many bytes does im.png use? Zero?
> > > 
> > > Did you notice this in
> > > http://bioperl.org/wiki/HOWTO:Graphics ?
> > > 
> > > It says: "If you are on a Windows platform, you
> need to put STDOUT 
> > > into binary mode so that the PNG file does not
> go through Window's 
> > > carriage return/linefeed transformations. Before
> the final print 
> > > statement, put the statement binmode(STDOUT)."
> > > 
> > > ie. your script should have
> > > 
> > > binmode(STDOUT);
> > > print $panel->png;
> > > 
> > > as the last 2 lines.
> > > 
> > > > Do you experience the same problem before?
> > > 
> > > No.
> > > 
> > > --Torsten
> > > 
> > 
> > 
> 



From lstein at cshl.edu  Tue May 23 10:06:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 10:06:27 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <200605231006.28392.lstein@cshl.edu>

Hi,

It is possible that your version of display can't handle PNG images. Try 
saving the output as a file and then opening it in another image program:

	perl render_blast1.pl data1.txt > data1.png

Another thing to watch out for is that, depending on what version of Perl 
you're using, you may have to insert this statement into the render_blast1.pl 
script (somewhere near the top):

	binmode STDOUT;
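
The same idea applies to any filehandle, not just STDOUT. As a minimal
sketch (write_binary is a made-up helper name, not something from the
HOWTO), this writes raw bytes to a file with newline translation switched
off and reports the resulting size on disk:

```perl
use strict;
use warnings;

# Write raw bytes to a file with newline translation switched off.
# On Unix-like systems binmode changes nothing here; on Windows it stops
# "\n" from being written as "\r\n".
sub write_binary {
    my ($path, $bytes) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    binmode $out;
    print {$out} $bytes;
    close $out or die "Cannot close $path: $!";
    return -s $path;    # size on disk in bytes
}
```

If the size returned differs from the length of the data passed in, some
layer is still translating line endings.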

Lincoln


On Saturday 20 May 2006 20:15, chen li wrote:
> Dear all,
>
>
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
>
>
> $ perl render_blast1.pl data1.txt | display -
>
> And here is the result:
>
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
>
> Any idea?
>
>
> Thank you very much,
>
> Li
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From Derek.Fairley at bll.n-i.nhs.uk  Tue May 23 10:39:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Tue, 23 May 2006 15:39:16 +0100
Subject: [Bioperl-l] Bio::Restriction::IO query
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C04019F@bllmail.bll.n-i.nhs.uk>

Hi folks,

I'm new to BioPerl, and struggling to make the Bio::Restriction::*
modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically,
I'm having some trouble understanding the behaviour of the
Bio::Restriction::IO module. I'm trying to use this to create a
Bio::Restriction::EnzymeCollection object from a local REBASE file
(which is in bairoch-format); this will in turn be passed to a
Bio::Restriction::Analysis object.

The following test script (derived from the Bio::Restriction::IO
perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new(	-file => "REBASE_file",
						-format =>'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar
$collection->each_enzyme, "\n";

Note that using -format=>'bairoch' without capitalisation (as shown in the
perldoc synopsis) throws an exception: Failed to load module
Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of
enzymes in the default enzyme set - regardless of the number of enzymes
in the file. A default Bio::Restriction::EnzymeCollection object has
presumably been created (as the 'read()' and 'each_enzyme' methods are
available) but it didn't come from the local file. The result is the
same if the Bio::Restriction::IO->new() method is called with no
arguments - a default EnzymeCollection object is created. It's not clear
to me where this has come from.

My (mis?)understanding was that the default set of enzymes would be
loaded on creation of a new Bio::Restriction::Analysis object (in the
absence of a -enzymes=>... argument). Presumably this is down to my poor
understanding of the BioPerl object model... ;-)

So: how should I create an EnzymeCollection object from file?

Any help or advice would be gratefully received.

PS. Congratulations to the development team for creating a very
impressive and useful open source toolkit.

Derek.


-----------------------------------------
Derek Fairley, Ph.D.
Regional Virus Laboratory,
Kelvin Building,
Royal Victoria Hospital, 
Grosvenor Road,
Belfast,
N. Ireland.
BT12 6BA

Tel. +44 (0)2890 635303


From rowan.mitchell at bbsrc.ac.uk  Tue May 23 10:53:42 2006
From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth))
Date: Tue, 23 May 2006 15:53:42 +0100
Subject: [Bioperl-l] Assembly::IO ace output
Message-ID: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>

Hi 

I am very interested in writing ace format files and had assumed that I
would be able to do this with Assembly::IO until I tried it! I see there
has been some correspondence last year on this, but as far as I can see
this is still not implemented in 1.5.1. Is this correct ? Is it planned
to be included; are there modules under development available ?

many thanks

Rowan

===============================================
Dr Rowan Mitchell
Rothamsted Research
Harpenden
Herts AL5 2JQ UK

Tel: +44 (0)1582 763133 x2469
Fax: +44 (0)1582 763010
E-mail: rowan.mitchell at bbsrc.ac.uk
WWW: http://www.rothamsted.bbsrc.ac.uk/
===============================================
Rothamsted Research is a company limited by guarantee, registered in
England under the registration number 2393175 and a not for profit
charity number 802038.


From rfsouza at cecm.usp.br  Tue May 23 16:17:36 2006
From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S})
Date: Tue, 23 May 2006 17:17:36 -0300
Subject: [Bioperl-l] Assembly::IO ace output
In-Reply-To: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
References: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
Message-ID: <20060523201736.GA28401@cecm.usp.br>

Hi Rowan,

On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote:
> Hi 
> 
> I am very interested in writing ace format files and had assumed that I
> would be able to do this with Assembly::IO until I tried it! I see there
> has been some correspondence last year on this, but as far as I can see
> this is still not implemented in 1.5.1. Is this correct ? Is it planned
> to be included; are there modules under development available ?

As far as I know, there are no plans to add write support to
Bio::Assembly::IO. When I wrote the original modules there was no need
for this so I left it aside.

Best regards,
Robson

> many thanks
> 
> Rowan
> 
> ===============================================
> Dr Rowan Mitchell
> Rothamsted Research
> Harpenden
> Herts AL5 2JQ UK
> 
> Tel: +44 (0)1582 763133 x2469
> Fax: +44 (0)1582 763010
> E-mail: rowan.mitchell at bbsrc.ac.uk
> WWW: http://www.rothamsted.bbsrc.ac.uk/
> ===============================================
> Rothamsted Research is a company limited by guarantee, registered in
> England under the registration number 2393175 and a not for profit
> charity number 802038.
> 
> 
> 

From lstein at cshl.edu  Tue May 23 16:53:34 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 16:53:34 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
	<200605231006.28392.lstein@cshl.edu>
Message-ID: <200605231653.36087.lstein@cshl.edu>

Hi Chen,

It looks to me like you cut and pasted the data1.txt file from the web site, 
consequently replacing the tabs with spaces. Please get data1.txt from the 
BioPerl distribution, as instructed in the tutorial.
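
One way to check a data file for this problem is to look for data lines
that contain no tab at all. This is only a sketch: lines_without_tabs is a
made-up name, and it assumes the file uses tab-separated columns with
optional "#" comment lines.

```perl
use strict;
use warnings;

# Return the line numbers of non-blank, non-comment lines that contain
# no tab character.
sub lines_without_tabs {
    my ($text) = @_;
    my @bad;
    my $n = 0;
    for my $line (split /\n/, $text) {
        $n++;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        push @bad, $n unless $line =~ /\t/;
    }
    return @bad;
}
```

An empty result means every data line has at least one tab; any line
numbers returned point at rows where the tabs were probably turned into
spaces by the copy and paste.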

Best,

Lincoln

On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> Hi,
>
> It is possible that your version of display can't handle PNG images. Try
> saving the output as a file and then opening it in another image program:
>
> 	perl render_blast1.pl data1.txt > data1.png
>
> Another thing to watch out for is that, depending on what version of Perl
> you're using, you may have to insert this statement into the
> render_blast1.pl script (somewhere near the top):
>
> 	binmode STDOUT;
>
> Lincoln
>
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From chen_li3 at yahoo.com  Tue May 23 17:46:16 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 14:46:16 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231653.36087.lstein@cshl.edu>
Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com>

Dear Dr. Stein,

Thank you so much. I followed your suggestions and
downloaded the code from the Bioperl CVS website. Now
everything is working.


Li 


--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi Chen,
> 
> It looks to me like you cut and paste the data1.txt
> file from the web site, 
> consequently replacing the tabs with spaces. Please
> get table1.txt from the 
> BioPerl distribution, as instructed in the tutorial.
> 
> Best,
> 
> Lincoln
> 
> On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> > Hi,
> >
> > It is possible that your version of display can't
> handle PNG images. Try
> > saving the output as a file and then opening it in
> another image program:
> >
> > 	perl render_blast1.pl data1.txt > data1.png
> >
> > Another thing to watch out for is that, depending
> on what version of Perl
> > you're using, you may have to insert this
> statement into the
> > render_blast1.pl script (somewhere near the top):
> >
> > 	binmode STDOUT;
> >
> > Lincoln
> >
> > On Saturday 20 May 2006 20:15, chen li wrote:
> > > Dear all,
> > >
> > >
> > > I try one script from GraphicsHowTo under Cygwin
> > > environment(GD and libpng already installed). I
> type
> > > this line in Cygwin X window:
> > >
> > >
> > > $ perl render_blast1.pl data1.txt | display -
> > >
> > > And here is the result:
> > >
> > > display: no decode delegate for this image
> format
> > > `/tmp/magick-qKiRPDRS'.
> > >
> > > Any idea?
> > >
> > >
> > > Thank you very much,
> > >
> > > Li
> > >
> > >
> > >
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 



From chen_li3 at yahoo.com  Tue May 23 18:59:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 15:59:46 -0700 (PDT)
Subject: [Bioperl-l] How to download sequence files either in EMBL format
Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com>

Hi all,

I need to download the sequence for a gene. I went to the
NCBI website, found the gene of interest, and downloaded the
file in GenBank format (saved as sequence.genbank). But
to my surprise this GenBank-format file doesn't contain
many features, such as exons, compared to the one in Ensembl.

My question: where can I download this sequence file
in EMBL format? It looks like the EMBL one might
contain other information, such as exons.

Thank you very much,

Li


From osborne1 at optonline.net  Wed May 24 10:33:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 24 May 2006 10:33:16 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>
Message-ID: <C099E6EC.88F0%osborne1@optonline.net>

Li,

The Graphics HOWTO talks about this Windows workaround in _four_ different
places, it's impossible to miss if you read it from start to finish. This is
what one should do if one wants to use these modules and one is a novice.
Example:

Important! Remember that if you are on a Windows platform, you need to put
STDOUT into binary mode so that the PNG file does not go through Window's
carriage return/linefeed transformations. Before the final print statement,
write binmode(STDOUT).

Brian O.


On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com> wrote:

> BTW I copy  this script from GraphicsHowTo on Bioperl
> website and only one line contains print statement,
> which is 'print $panel->png'. 


From chen_li3 at yahoo.com  Wed May 24 12:17:15 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 24 May 2006 09:17:15 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <C099E6EC.88F0%osborne1@optonline.net>
Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com>

Thanks, but Dr. Stein has already helped me figure out
what was going on: I should have copied the source
code for the examples from CVS instead of cutting and
pasting from the HOWTO tutorial. Sorry for any
inconvenience.

Li

--- Brian Osborne <osborne1 at optonline.net> wrote:

> Li,
> 
> The Graphics HOWTO talks about this Windows
> workaround in _four_ different
> places, it's impossible to miss if you read it from
> start to finish. This is
> what one should do if one wants to use these modules
> and one is a novice.
> Example:
> 
> Important! Remember that if you are on a Windows
> platform, you need to put
> STDOUT into binary mode so that the PNG file does
> not go through Window's
> carriage return/linefeed transformations. Before the
> final print statement,
> write binmode(STDOUT).
> 
> Brian O.
> 
> 
> On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com>
> wrote:
> 
> > BTW I copy  this script from GraphicsHowTo on
> Bioperl
> > website and only one line contains print
> statement,
> > which is 'print $panel->png'. 
> 
> 
> 



From ULNJUJERYDIX at spammotel.com  Wed May 24 21:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi,
Thanks for the help offered thus far!
I am trying to annotate TFBS on a -1000 to +50 bp promoter sequence using
BioPerl, so I was asked to make the ruler numbering start at -1000. Is
there any way at all to do this in BioPerl without changing the .pm file?

Thanks guys,
Kevin


From cjfields at uiuc.edu  Thu May 25 12:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live.  It seems to get
caught up in the loop during recursion: debugging shows it is unable to get
past 'find_best_matches: (/15)'.  There are lots of unmatched pairs
with this sequence, so could that be the problem?  I'm not terribly familiar
with Unflattener...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Barry Moore
> Sent: Monday, May 22, 2006 8:00 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Problems with Unflattener.pm
> 
> Hi All,
> 
> NT_113910 appears to throw Bio::SeqFeatures::Tools::Unflattener into
> an infinite recursive loop.  The trouble occurs in the method
> find_best_matches between lines 2258 and 2281, and in particular the
> loop is perpetuated by line 2273.   NT_113910 has a fairly complex
> features table, and but I have as yet been unable to figure out why
> this loop is not exiting properly.  This has been submitted to
> bugzilla, but I'll post here so it gets documented on the list also.
> Any suggestions from Chris or others would be greatly appreciated.
> 
> This problem can be recreated as follows:
> 
> Grab NT_113910 from genbank.
> bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
> 
> Pass NT_113910.gbk on the command line to the attached script.
> 
> 
> 
> #!/usr/bin/perl;
> 
> use strict;
> use warnings;
> 
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
> 
> my $file = shift;
> 
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
> 
> # first fetch a genbank SeqI object
> my $seqio =
>      Bio::SeqIO->new(-file   => $file,
>                      -format => 'GenBank');
> my $out =
>      Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
> 
>          # get top level unflattended SeqFeatureI objects
>          $unflattener->unflatten_seq(-seq       => $seq,
>                                      -use_magic => 1);
>          $out->write_seq($seq);
> }
> 
> 


From cjfields at uiuc.edu  Thu May 25 15:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this).  

From UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) is
this tidbit:
----------------------------------------------------------
 UniProtKB release 7.0 of 07-Feb-2006

    Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.

----------------------------------------------------------

Probably should explain on the swissprot wiki page that the format is in a
state of flux at the moment.  I've added this tidbit to the bug page (#2003)
as well.
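
For anyone who wants to poke at the new DT layout outside of swiss.pm, here
is a minimal sketch in plain Perl.  It assumes only the three line shapes
quoted in the release note above; parse_dt_line() is a hypothetical helper
made up for this example, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): parse one new-style DT line
# into a hash reference.  Returns undef for anything that does not look like
# a DT line in the layout described in the UniProtKB 7.0 release note.
sub parse_dt_line {
    my ($line) = @_;
    return unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4}),\s+(.+?)\.\s*$/;
    my ($date, $rest) = ($1, $2);

    my %dt = (date => $date);
    if ($rest =~ /^integrated into (\S+)$/) {
        @dt{qw(kind database)} = ('integrated', $1);
    }
    elsif ($rest =~ /^(sequence|entry) version (\d+)$/) {
        @dt{qw(kind version)} = ($1, $2);
    }
    else {
        $dt{kind} = 'unknown';    # some other annotation after the date
    }
    return \%dt;
}

# The Swiss-Prot example lines from the release note.
my @lines = (
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
);

for my $line (@lines) {
    my $dt = parse_dt_line($line) or next;
    printf "%-11s %s %s\n", $dt->{kind}, $dt->{date},
        $dt->{version} // $dt->{database};
}
```

Keeping the date, the kind of event and the version number as separate
fields makes it easy to pick out the sequence version without leaving the
version text attached to the date.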

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Monday, May 22, 2006 9:04 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> We ask that people post patches to the bugzilla as an attachment to
> the bugzilla so we can track what and why the bug was that the patch
> fixes.
> 
> I am not totally sure this patch works because it seems like we need
> to strip out more information now from the DT line if the $date
> actually contains more information than just the date.
> 
> If you would go ahead and create a bug in bugzilla for  this (http://
> bugzilla.open-bio.org) this sort of conversation can be tracked to
> the bug.
> 
> If any of this is unclear please let us know - I thought we had put
> some pages up about this sort of thing on the wiki but maybe they
> need to be expanded.
> 
> -jason
> On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:
> 
> > I have a patch that seems to work but I'm not familiar with the
> > proper method to
> > "provide" it.  How do I go about that?
> >
> > The patch is pretty simple, it just parses the sequence version out
> > of the date
> > line where it now hides:
> >
> >          #date
> >          elsif( /^DT\s+(.*)/ ) {
> >            my $date = $1;
> > +
> > +          if ($date =~ /sequence version (\d+)/i) {
> > +              $params{'-seq_version'} ||= $1;
> > +          }
> > +
> >            $date =~ s/\;//;
> >            $date =~ s/\s+$//;
> >            push @{$params{'-dates'}}, $date;
> >          }
> >
> > By the way, what is the difference between Bio::Seq::version and
> > Bio::Seq::RichSeq::seq_version?
> >
> >
> >> -----Original Message-----
> >> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >> Sent: Monday, May 22, 2006 6:37 PM
> >> To: Michael Rogoff
> >> Cc: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> >>
> >>
> >> Sounds like a "missing feature" =)
> >>
> >> AFAIK the module was only written for swissprot files.  It is
> >> possible there have been changes in the format that have not been
> >> tracked to the current code.  We'd certainly appreciate someone
> >> testing it out as versions evolve.  If you submit a bug to bugzilla
> >> with version of bioperl and example files you can track when
> >> a fix is
> >> in.  We of course appreciate anyone's efforts to provide a patch as
> >> most bugs get fixed of late when someone gets "itchy" enough to fix
> >> them.
> >>
> >> -jason
> >>
> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> >>
> >>>
> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
> >>> ignores the
> >>> sequence version, and calling seq_version() on the resulting
> >>> RichSeq object
> >>> returns undef.
> >>>
> >>> It looks like swiss.pm is trying to parse the version out
> >> of the SV
> >>> line, which
> >>> apparently doesn't exist any more?  The sequence version(s)
> >> are now
> >>> specified as
> >>> part of the Date (DT) lines.
> >>>
> >>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >>>
> >>> Thanks for any help ...
> >>>
> >>>
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


From miker at biotiquesystems.com  Mon May 22 21:51:10 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 18:51:10 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>
Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike>

I have a patch that seems to work but I'm not familiar with the proper method to
"provide" it.  How do I go about that?

The patch is pretty simple, it just parses the sequence version out of the date
line where it now hides:

         #date
         elsif( /^DT\s+(.*)/ ) {
           my $date = $1;
+
+          if ($date =~ /sequence version (\d+)/i) {
+              $params{'-seq_version'} ||= $1;
+          }
+
           $date =~ s/\;//;
           $date =~ s/\s+$//;
           push @{$params{'-dates'}}, $date;
         }

By the way, what is the difference between Bio::Seq::version and
Bio::Seq::RichSeq::seq_version?


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Monday, May 22, 2006 6:37 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> 
> Sounds like a "missing feature" =)
> 
> AFAIK the module was only written for swissprot files.  It is  
> possible there have been changes in the format that have not been  
> tracked to the current code.  We'd certainly appreciate someone  
> testing it out as versions evolve.  If you submit a bug to bugzilla  
> with version of bioperl and example files you can track when 
> a fix is  
> in.  We of course appreciate anyone's efforts to provide a patch as  
> most bugs get fixed of late when someone gets "itchy" enough to fix  
> them.
> 
> -jason
> 
> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> 
> >
> > As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> > ignores the
> > sequence version, and calling seq_version() on the resulting  
> > RichSeq object
> > returns undef.
> >
> > It looks like swiss.pm is trying to parse the version out 
> of the SV  
> > line, which
> > apparently doesn't exist any more?  The sequence version(s) 
> are now  
> > specified as
> > part of the Date (DT) lines.
> >
> > Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >
> > Thanks for any help ...
> >
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 


From chen_li3 at yahoo.com  Tue May 23 11:48:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 08:48:46 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com>

Dear Dr. Stein,

I have the job partially done by adding this line
(under Cygwin)

print STDOUT $panel->png;

It is done because I can produce the image to be
viewed by other programs but it is only partially done
because I don't get exactly the same image as that
shown on the website. Enclosed is the image I get.
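
In case it helps others on Cygwin or Windows, here is a minimal sketch of
writing image bytes with binmode set, so that no newline translation can
corrupt the PNG.  The 8-byte PNG signature stands in for $panel->png, and
write_binary() is a hypothetical helper, not something from Bio::Graphics.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write raw bytes to a filehandle with newline
# translation switched off.  On Unix binmode is a no-op; on Windows and
# Cygwin text-mode handles it stops "\n" from being rewritten.
sub write_binary {
    my ($fh, $bytes) = @_;
    binmode $fh;
    print {$fh} $bytes or die "write failed: $!";
    return length $bytes;
}

# Stand-in for $panel->png: the PNG file signature, which deliberately
# contains both "\r\n" and a bare "\n" so that translation is detectable.
my $png_bytes = "\x89PNG\r\n\x1a\n";

my ($out, $path) = tempfile(SUFFIX => '.png', UNLINK => 1);
write_binary($out, $png_bytes);
close $out or die "close failed: $!";

print "wrote ", (-s $path), " bytes to $path\n";
```

In the real script the equivalent call would be
write_binary(\*STDOUT, $panel->png), which has the same effect as putting
"binmode STDOUT;" near the top.  If the byte count on disk differs from the
length of the data, something translated the newlines.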

Thank you,

Li

--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi,
> 
> It is possible that your version of display can't
> handle PNG images. Try 
> saving the output as a file and then opening it in
> another image program:
> 
> 	perl render_blast1.pl data1.txt > data1.png
> 
> Another thing to watch out for is that, depending on
> what version of Perl 
> you're using, you may have to insert this statement
> into the render_blast1.pl 
> script (somewhere near the top):
> 
> 	binmode STDOUT;
> 
> Lincoln
> 
> 
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: im1
Type: image/x-png
Size: 2423 bytes
Desc: 2615755531-im1
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060523/6870f840/attachment.bin 

From cjfields at uiuc.edu  Thu May 25 21:28:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 20:28:14 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <D422B7D5-C92D-436A-8385-01CFD306DFA8@uiuc.edu>

This patch works only for the recent change in swissprot seq format  
for sequence versions on the DT line.  I checked it out vs the test  
data provided with bioperl (t\data\swiss.dat).  I did manage to get  
it working for both old and new using a modification to your patch  
but there's another issue; using $seq->get_dates, which should only  
show dates, shows the entire line (date and version info).  Jason  
mentioned that there needs to be a better way to address this which  
I'm looking into.
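
To illustrate the get_dates issue, here is a rough sketch (plain Perl, not
the actual swiss.pm change) that keeps only the date in the dates list and
pulls the sequence version out separately.  split_dt_line() is a
hypothetical helper, and the old-style sample line is written from my
recollection of the pre-7.0 "(Rel. NN, Created)" layout, so treat that part
as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split a DT line into (date, sequence version).
# The version is undef when the line does not carry one.  Returns an
# empty list for lines that are not DT lines.
sub split_dt_line {
    my ($line) = @_;
    return unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4})\b(.*)$/;
    my ($date, $rest) = ($1, $2);
    my $seq_version;
    $seq_version = $1 if $rest =~ /sequence version (\d+)/i;
    return ($date, $seq_version);
}

my @sample = (
    'DT   01-NOV-1997 (Rel. 35, Created)',      # old style (assumed layout)
    'DT   15-OCT-2001, sequence version 3.',    # new style
    'DT   01-APR-2004, entry version 14.',      # new style
);

my (@dates, $seq_version);
for my $line (@sample) {
    my ($date, $version) = split_dt_line($line);
    next unless defined $date;
    push @dates, $date;            # date only, no trailing annotation
    $seq_version //= $version;     # first sequence version seen wins
}

print "dates: @dates\n";
print "seq_version: ", (defined $seq_version ? $seq_version : 'undef'), "\n";
```

With old-style input the version simply stays undef, so one code path could
serve both formats.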

Chris

On May 22, 2006, at 8:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Monday, May 22, 2006 6:37 PM
>> To: Michael Rogoff
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>>
>>
>> Sounds like a "missing feature" =)
>>
>> AFAIK the module was only written for swissprot files.  It is
>> possible there have been changes in the format that have not been
>> tracked to the current code.  We'd certainly appreciate someone
>> testing it out as versions evolve.  If you submit a bug to bugzilla
>> with version of bioperl and example files you can track when
>> a fix is
>> in.  We of course appreciate anyone's efforts to provide a patch as
>> most bugs get fixed of late when someone gets "itchy" enough to fix
>> them.
>>
>> -jason
>>
>> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
>>
>>>
>>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
>>> ignores the
>>> sequence version, and calling seq_version() on the resulting
>>> RichSeq object
>>> returns undef.
>>>
>>> It looks like swiss.pm is trying to parse the version out
>> of the SV
>>> line, which
>>> apparently doesn't exist any more?  The sequence version(s)
>> are now
>>> specified as
>>> part of the Date (DT) lines.
>>>
>>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>>>
>>> Thanks for any help ...
>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri May 26 10:38:29 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 26 May 2006 10:38:29 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <200605261038.30380.lstein@cshl.edu>

Hi,

For some reason I didn't see the first posting on this. In current bioperl 
live, the ruler can have negative numberings - I use this routinely. You need 
to create a feature that starts in negative coordinates. What is happening to 
you when you try this?

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From jelenaob at gmail.com  Fri May 26 12:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading enzyme list from a file REBASE bairoch.605 using
Bio::Restriction::IO;

1. But for some reason the number of enzymes in the list is always 532
which is a default set of enzymes in enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
|| die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is when trying to use format that is lower-case
it throws an exception, but when "B" is capitalized it is ok.
I assume it cannot load a file and does not initialize enzyme
collection properly.

Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?


Thanks in advance,


Jelena Obradovic
jelenaob at gmail.com


From cjfields at uiuc.edu  Fri May 26 15:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Hi there,
> 
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
> 
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
> 
> Is there any known issue with this module or a workaround?
> 
> And here is the code I have been using:
> 
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
>                                     -format=>"Bairoch")
> || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
 
my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "Bairoch");

should be

my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
requires a specific format, which I believe is case-sensitive).  Judging by
the modules in Bio/Restriction/IO directory, looks like the
Bio::Restriction::IO format should match one of the following formats:
bairoch, itype2, withrefm, and you can also build your own if needed using
the previous as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initialize enzyme
> collection properly.
> 
> Can't call method "each_enzyme" on an undefined value at
> .../cgi-bin/seq-load.pl line 51.

My guess?  The reason it works with an uppercase ('Bairoch') is that it
can't find the module and uses the default set of enzymes as a fallback.
The exception that you reported when you use lowercase ('bairoch') is real
and I reported it as a bug (there are a few I found in that module).

You might want to try using one of the other formats if you can get the
files in the right format from REBASE.  I'm looking into the bugs
specifically associated with Bio::Restriction::IO::bairoch.

> Any thoughts?
> 
> 
> Thanks in advance,
> 
> 
> Jelena Obradovic
> jelenaob at gmail.com
> 


From osborne1 at optonline.net  Fri May 26 15:43:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 26 May 2006 15:43:18 -0400
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <C09CD296.8961%osborne1@optonline.net>

Chris,

SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
should work). This is what the documentation says and what the code seems to
suggest. This is probably what the Restriction modules should do as well.
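
As a rough illustration of what case-insensitive handling could look like,
here is a sketch that lower-cases the format name before mapping it to a
driver module.  The format names are the ones mentioned in this thread;
driver_for_format() is a hypothetical helper, not the code in
Bio::Restriction::IO or Bio::SeqIO.

```perl
use strict;
use warnings;

# Formats named in this thread as drivers under Bio/Restriction/IO.
my %KNOWN = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper: map a user-supplied format name to a driver module
# name, ignoring case, and fail loudly on anything unknown instead of
# silently falling back to a default.
sub driver_for_format {
    my ($format) = @_;
    my $key = lc $format;
    die "unknown format '$format'\n" unless $KNOWN{$key};
    return "Bio::Restriction::IO::$key";
}

print driver_for_format($_), "\n" for qw(bairoch Bairoch BAIROCH);
```

Dying on an unknown name is a design choice for this sketch: it makes a
typo such as "withref" show up immediately rather than as an unexpected
enzyme count later on.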

Brian O.


From cjfields at uiuc.edu  Fri May 26 16:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <C09CD296.8961%osborne1@optonline.net>
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad.  Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment.  None have implemented write
methods though POD implies they work:

SYNOPSIS

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet.  In fact, the
tests are pretty confusing; when did we allow this syntax: '-format => 8'?
Anyway, I'm muddling my way through this and will probably write something
up for the project priority list if I can't work this bug out.  

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Chris,
> 
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to
> suggest. This is probably what the Restriction modules should do as well.
> 
> Brian O.
> 
> 


From andreas.bender at complife.org  Fri May 26 10:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID: <e83118520605260750w3e66286bmbd6a14be3d2299d6@mail.gmail.com>

Dear All,

Has anyone here implemented some cool programs/tools using Bioperl? Or
is there someone from the Bioperl core team who wants to present
Bioperl itself at our conference? We are holding a "free software"
session (free at least as in free beer, ideally also open source, some
GNU-type license) at our "Computational Life Sciences" Conference in
Cambridge/UK later this year and you are warmly welcome to present
your software there. Please contact me directly or visit the website
in case of any questions.

Enjoy the weekend,
Andreas


                                  Call for Contributions
==================================================
               LIFE SCIENCE FREE SOFTWARE SESSION

          held at CompLife 2006 (http://www.complife.org)
     in Cambridge, United Kingdom, on September 27 - 29, 2006
==================================================
In recent years more and more free and open source software has been
developed for chemo- and bioinformatics, molecular modelling or other
Life Science applications, but many of the programs are not well
known. During the CompLife 2006 conference we will organize a special
session dedicated to this type of free software. The demo session will
be preceded by a short session having room for brief introductory
presentations whereas the demo session itself will allow attendees to
see the tools in action. Authors of free software will have the
opportunity to present their program to the CompLife audience which
will consist of researchers and users from computer science, biology,
chemistry and everything in between.

In case you are interested in the free software session, send us an
email at fss at complife.org and briefly describe your program and how
you intend to present it at the conference (1-2 pages max - please
include URL to downloadable version where available). The only
restrictions are that the program must be freely available for
everyone or even open source and that it must be related to Life
Science applications. The deadline for these proposals is June 16th,
2006. In mid-July we will notify you if your software demo was
accepted.
************************

-- 
Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006:
Visit http://www.complife.org for more information!

Andreas Kieron Patrick Bender - http://www.andreasbender.de
Novartis Institutes for BioMedical Research, Cambridge/MA


From cjfields at uiuc.edu  Fri May 26 17:19:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 16:19:08 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine>

The POD documentation is a bit misleading for Bio::Restriction::IO.  Brian's
right, there needs to be more flexibility with the case for the formats
used.  I found a few other odd things as well which I may file bug reports
for.  Looks like another post for the project priority list.

 
Chris

 
  _____  

From: Jelena Obradovic [mailto:jobradovic at gmail.com] 
Sent: Friday, May 26, 2006 3:56 PM
To: Chris Fields
Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file

 
Hi guys, I tried with the other formats, and it works fine with "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena

On 5/26/06, Chris Fields <cjfields at uiuc.edu> wrote:


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file 
>
> Hi there,
>
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
>
> 1. But for some reason the number of enzymes in the list is always 532 
> which is a default set of enzymes in enzyme collection.
>
> Is there any known issue with this module or a workaround?
>
> And here is the code I have been using:
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
>                                     -format=>"Bairoch")
> || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; 

my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "Bairoch");

should be

my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
requires a specific format, which I believe is case-sensitive).  Judging by
the modules in Bio/Restriction/IO directory, looks like the
Bio::Restriction::IO format should match one of the following formats:
bairoch, itype2, withrefm, and you can also build your own if needed using
the previous as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case 
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initialize enzyme
> collection properly.
>
> Can't call method "each_enzyme" on an undefined value at 
> .../cgi-bin/seq-load.pl line 51.

My guess?  The reason it works with an uppercase ('Bairoch') is that it
can't find the module and uses the default set of enzymes as a fallback.
The exception that you reported when you use lowercase ('bairoch') is real 
and I reported it as a bug (there are a few I found in that module).

You might want to try using one of the other formats if you can get the
files in the right format from REBASE.  I'm looking into the bugs
specifically associated with Bio::Restriction::IO::bairoch.

> Any thoughts?
>
>
> Thanks in advance,
>
>
> Jelena Obradovic
> jelenaob at gmail.com
>


-- 
Jelena Obradovic
Email: jobradovic at gmail.com


From jay at jays.net  Sat May 27 12:47:27 2006
From: jay at jays.net (Jay Hannah)
Date: Sat, 27 May 2006 11:47:27 -0500
Subject: [Bioperl-l] "Project OpenLab" (working title)
Message-ID: <4478829F.5030508@jays.net>

Hola --

We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)

   "Project OpenLab":
   http://omaha.pm.org/kwiki/?BioPerl

- Does any such project already exist? 
- If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, MySQL (BioPerl DBs). 
- I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
- I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
- I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
- I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.

Thanks for your time,

j


From fernan at iib.unsam.edu.ar  Sat May 27 18:30:44 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sat, 27 May 2006 19:30:44 -0300
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar>

+----[ Jay Hannah <jay at jays.net> (27.May.2006 15:15):
|
| Hola --

Hola!

| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
| 
|    "Project OpenLab":
|    http://omaha.pm.org/kwiki/?BioPerl
| 
| - Does any such project already exist? 

mmm ... maybe ... both GUS (Genomics Unified Schema:
gusdb.org, though not developed around bioperl) and GMOD
(Generic Model Organism Database: gmod.org) provide you with 
i) RDBMS storage
ii) a Perl object layer
iii) a web app framework

Though certainly overkill for the needs you describe
in the wiki, they can be customized to work in the way you
describe or at least serve as a guide.

| - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 

Have you considered Perl Catalyst? It has the benefits of
allowing you to work with bioperl modules naturally (it's
Perl!) a choice of templating toolkits (Template Toolkit, Mason,
among others) and will provide you with an almost ready to
go controller/url dispatcher.

| - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
| - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
| - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
| - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
| 
| Thanks for your time,
| 
| j
|
+----]

Good luck,

Fernan

From epsteinj at mail.nih.gov  Fri May 26 14:46:32 2006
From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E])
Date: Fri, 26 May 2006 14:46:32 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler
	havenegative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov>

While this is being discussed and we have Lincoln's attention; in example 4 on the Biographics Howto:
   http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html
how can one assign directional arrows to the graded segments which represent the BLAST hits?  I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'?  What other techniques do you recommend for associating directionality with these hits?

Thanks & regards,

Jonathan


From jobradovic at gmail.com  Fri May 26 16:55:35 2006
From: jobradovic at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 13:55:35 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>

Hi guys, I tried the other formats: it works fine with the "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena

On 5/26/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> > Sent: Friday, May 26, 2006 11:47 AM
> > To: Bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> >
> > Hi there,
> >
> > I have tried loading enzyme list from a file REBASE bairoch.605 using
> > Bio::Restriction::IO;
> >
> > 1. But for some reason the number of enzymes in the list is always 532
> > which is a default set of enzymes in enzyme collection.
> >
> > Is there any known issue with this module or a workaround?
> >
> > And here is the code I have been using:
> >
> > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
> > format=>"Bairoch")
> > || die "can't load the file bairoch.605: $!";
> > my $enzymes = $re_in->read;
> > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
>                                    format=>"Bairoch");
>
> should be
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
>                                    format=>"bairoch");
>
> Note the case change for the format; this is noted in the bug report you
> submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (
> i.e.
> requires a specific format, which I believe is case-sensitive).  Judging
> by
> the modules in Bio/Restriction/IO directory, looks like the
> Bio::Restriction::IO format should match one of the following formats:
> bairoch, itype2, withrefm, and you can also build your own if needed using
> the previous as examples and implementing Bio::Restriction::IO::base.
>
> > 2. The other problem is when trying to use format that is lower-case
> > it throws an exception, but when "B" is capitalized it is ok.
> > I assume it cannot load a file and does not initilize enzyme
> > collection properly.
> >
> > Can't call method "each_enzyme" on an undefined value at
> > .../cgi-bin/seq-load.pl line 51.
>
> My guess?  The reason it works with an uppercase ('Bairoch') is that it
> can't find the module and uses the default set of enzymes as a fallback.
> The exception that you reported when you use lowercase ('bairoch') is real
> and I reported it as a bug (there are a few I found in that module).
>
> You might want to try using one of the other formats if you can get the
> files in the right format from REBASE.  I'm looking into the bugs
> specifically associated with Bio::Restriction::IO::bairoch.
>
> > Any thoughts?
> >
> >
> > Thanks in advance,
> >
> >
> > Jelena Obradovic
> > jelenaob at gmail.com
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Jelena Obradovic
Email: jobradovic at gmail.com

From gad14 at cornell.edu  Fri May 26 16:02:33 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Fri, 26 May 2006 16:02:33 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
Message-ID: <44775ED9.4020208@cornell.edu>

Hi,

I'm running local blast with Bio::Tools::Run::StandAloneBlast. 
Everything seems to work ok up to the point of accessing the results. I 
am able to print the results but when I try to do more than one thing 
with the result, nothing is returned for the second activity..

I'd like to first sort the results into groups of results that hit the 
db seq once, twice, three times, etc - where the results are stored as 
SeqFeature objects in temporary arrays whose contents are printed 
sequentially to stdout when the whole sort is complete.

Secondly, I need to print the results in Hit Table (i.e. -m 8) format to 
stdout.

If I've sorted the results, the sorted results print to the screen; 
however, when I then try to print the Hit Table results nothing is returned, 
as if the blast results have evaporated. And vice versa: if I comment 
out the part where I point my sorting subroutine to the blast results 
reference, my hit table results suddenly print to the screen. It's almost 
as if the reference to the SearchIO obj that holds the StandAloneBlast 
results is lost after one use. (I'm beginning to think there is 
something naive about the way I'm using references.)


Here's an abbreviated version of my code:


my $ref_seq_objs; # ref to array of Sequence obj's
my $genome_seq; # fasta containing 1 genomic sequence

my @params = ('program' => 'blastn',
	       'database' => $genome_seq,
                 );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $blast_report = $factory->blastall($ref_seq_objs); #OK

#######
### the following 2 actions seem to be mutually exclusive.
# 1) sort results into 1-hitter, 2-hitter, etc. groups of
# SeqFeature objs stored in arrays. arrays are then printed
# to stdout
&sort_results($blast_report);

# 2) print blast results
&print_blast_results($blast_report);
#######


sub print_blast_results{
   my $report = shift;
   while(my $result = $report->next_result()){
     # query name taken from the result ($hsp_q_seq_obj was never defined)
     my $q_name = $result->query_name;
     while(my $hit = $result->next_hit()){
       while(my $hsp = $hit->next_hsp()){
         print join(", ",$q_name,$hit->name,$hsp->bits)."\n";
       }
     }
   }
}


I'm about to lose my mind on this... any assistance appreciated!

Thanks,
Genevieve


From rvosa at sfu.ca  Sun May 28 03:43:23 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sun, 28 May 2006 00:43:23 -0700
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <4479549B.5030202@sfu.ca>

The TreeBaseII team (part of the cipres project: http://www.phylo.org) 
are working on a lab database system for storage of intermediate 
calculation results and data (sequence alignments, trees, taxon sets). I 
think what you're discussing is a bit more molecular and less 
phylogenetic, but it does sound similar in spirit.

Rutger

Jay Hannah wrote:
> Hola --
>
> We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
>
>    "Project OpenLab":
>    http://omaha.pm.org/kwiki/?BioPerl
>
> - Does any such project already exist? 
> - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
> - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
> - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
> - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
> - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
>
> Thanks for your time,
>
> j

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Sun May 28 09:55:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 28 May 2006 08:55:47 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
	<286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <EA78F27A-074E-4C9D-AC70-27D4CC20F8C4@uiuc.edu>

Again, it's b/c 'withrefm' is a valid Restriction::IO module and  
'withref' is not.  Similar to the case issue you saw before with  
'bairoch.'  Making this more lenient would help but there are more  
serious issues with these modules that need to be addressed...

http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes
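
In the meantime, a small guard in your own script can catch a misspelled or mis-cased format name before it ever reaches Bio::Restriction::IO. This is only a sketch under assumptions: check_format is a made-up helper, and the list of names is the one given earlier in this thread (bairoch, itype2, withrefm), not something read from the installed modules.

```perl
use strict;
use warnings;

# Format names taken from the earlier message in this thread;
# adjust to whatever is in your Bio/Restriction/IO directory.
my %known_format = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper: lower-case the name and insist on a known format.
sub check_format {
    my ($name) = @_;
    my $format = lc $name;
    die "Unknown Bio::Restriction::IO format '$name'\n"
        unless $known_format{$format};
    return $format;
}

# A mis-cased name is normalised to lower case.
print check_format('Bairoch'), "\n";

# A name with no matching module is rejected up front.
my $ok = eval { check_format('withref'); 1 };
print "withref rejected: $@" unless $ok;
```

The checked name would then be passed as the -format argument to Bio::Restriction::IO->new, so a typo fails loudly instead of quietly falling back to the default enzyme set.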

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From osborne1 at optonline.net  Sun May 28 11:03:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 28 May 2006 11:03:37 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
Message-ID: <C09F3409.8992%osborne1@optonline.net>

Genevieve,

Does this simplified code, without the &sort_results($blast_report) line,
work?

By the way, no one can really help you here because you haven't shown us all
of the code. The code you are showing certainly looks OK.


Brian O.


On 5/26/06 4:02 PM, "Genevieve DeClerck" <gad14 at cornell.edu> wrote:

> &sort_results($blast_report);


From simon.rayner.mlist at gmail.com  Mon May 29 03:37:24 2006
From: simon.rayner.mlist at gmail.com (mailing lists)
Date: Mon, 29 May 2006 15:37:24 +0800
Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64
	running SuSE linux
Message-ID: <f73437f70605290037q3c7637e4h29faa3aed16ec77a@mail.gmail.com>

Hello,

I'm having a problem trying to install the bioperl-ext package on my
system.

biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL
Writing Makefile for Bio::Ext::Align
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make
cc -c  -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -g -Wall -pipe   -DVERSION=\"0.1\" -DXS_VERSION=
\"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE"
-DPOSIX -DNOERROR Align.c
In file included from Align.xs:12:
./libs/sw.h:1360:1: warning: "/*" within comment
.
.
.
Running Mkbootstrap for Bio::Ext::Align ()
chmod 644 Align.bs
rm -f blib/arch/auto/Bio/Ext/Align/Align.so
LD_RUN_PATH="" cc  -shared -L/usr/local/lib64 Align.o  -o
blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a  -lm
/usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld:
libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not
be used when making a shared object; recompile with -fPIC
libs/libsw.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #

the -fPIC flag is already set in the makefile.

I found a similar problem in an earlier posting with the following
suggestions....


  From: Aaron J. Mackey <amackey <at> pcbi.upenn.edu>
  Subject: Re: compiling bioperl-ext
  Newsgroups: gmane.comp.lang.perl.bio.general
  Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50
  minutes ago)

  1) Are you starting with a clean build directory?

  2) Does installing other compiled Perl modules work for you (e.g.
  Data::Dumper or Storable)?

  That's a pretty arcane error, and if the answer to #2 is "no", then I
  don't think we can help you.

  -Aaron


....In my case, the answer to both 1) and 2) is yes.  I installed Data::Dumper without
any problems.


I've found plenty of similar reports for other software, and it seems to
relate to 32/64-bit issues.

Does anyone have any suggestions about how to get around this?

thanks

Simon Rayner


From ULNJUJERYDIX at spammotel.com  Mon May 29 05:46:21 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Mon, 29 May 2006 17:46:21 +0800
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <200605261038.30380.lstein@cshl.edu>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>

Hi!
Oh, it was under a slightly different subject line, asking about the create
image map feature.
I am using the stable version 1.4 of bioperl now. In any case, I have not
added the sequence as a feature-annotated seq, as I already have the bp
positions where the TF binds (in 1-1050 numbering), so what I did was just add
graded segments based on position.
I saw that there is a scale function for the arrow glyph; however, it is a
multiply function. Can it be hacked to take an offset value (i.e. subtract
1000 from the scale)?

cheers
kevin


> Hi,
>
> For some reason I didn't see the first posting on this. In current bioperl
> live, the ruler can have negative numberings - I use this routinely. You need
> to create a feature that starts in negative coordinates. What is happening to
> you when you try this?
>
> Lincoln
>
> On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > Hi
> > thanks for the help offered thus far!
> > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> > bioperl. therefore i was asked to make the numberings as such (-1000) is
> > there any way at all to do this in bioperl without changing the .pm file?
> >
> > thanks guys..
> > kevin
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From shameer at ncbs.res.in  Mon May 29 06:07:17 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 29 May 2006 15:37:17 +0530 (IST)
Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple
	Servers
Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1>

Dear All,

My query may not be directly related to BioPerl, but I am sure I will get
some ideas to move on. Some possibilities may be available from Pise or
related modules.

Query :
---------
We have several public servers (say a, b, c). Each of them takes a
PDB file as input, processes it and displays the result. Now I need to create
a web page (a meta-server/integrated web server) with three radio
buttons (a, b, c) and a single input form to accept a PDB file from the user
(passing a file as an argument seems somewhat impossible to
me). I need the output as 3 links on the next page.

Is there any BioPerl module / CGI / Perl trick to do it?

Thanks in advance,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675
W - http://caps.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."


From torsten.seemann at infotech.monash.edu.au  Tue May 30 02:41:31 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 16:41:31 +1000
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BE91B.30001@infotech.monash.edu.au>

> my $ref_seq_objs; # ref to array of Sequence obj's
> my $genome_seq; # fasta containing 1 genomic sequence
> my @params = ('program' => 'blastn',
>               'database' => $genome_seq,
>              );

The database parameter needs to be the same thing you would pass to the 
"-d" option in "blastall". I don't think you can pass a perl string 
here, i.e. there needs to be a properly formatted set of blast indices 
for your genome sequence on the disk in the appropriate place.
See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html

> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $blast_report = $factory->blastall($ref_seq_objs); #OK

But I could be wrong, and $blast_report here contains a valid BLAST report.

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sb at mrc-dunn.cam.ac.uk  Tue May 30 03:59:28 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 30 May 2006 08:59:28 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Hi,
[snip]
> If I've sorted the results the sorted-results will print to screen, 
> however when I try to print the Hit Table results nothing is returned, 
> as if the blast results have evaporated.... and vice versa, if I comment 
> out the part where I point my sorting subroutine to the blast results 
> reference,  my hit table results suddenly print to screen.
[snip]
> Here's an abbreviated version of my code:
[snip]
> #######
> ### the following 2 actions seem to be mutually exclusive.
> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
> # SeqFeature objs stored in arrays. arrays are then printed
> # to stdout
> &sort_results($blast_report);
> 
> # 2) print blast results
> &print_blast_results($blast_report);

> sub print_blast_results{
>    my $report = shift;
>    while(my $result = $report->next_result()){
[snip]

You didn't give us your sort_results subroutine, but is it as simple as
they both use $report->next_result (and/or $result->next_hit), but you
don't reset the internal counter back to the start, so the second
subroutine tries to get the next_result and finds the first subroutine
has already looked at the last result and so next_result returns false?

 From a quick look it wasn't obvious how to reset the counter. Hopefully
this can be done and someone else knows how.
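
To illustrate the idea with a stand-in (plain Perl, not Bio::SearchIO; make_iterator and count_items are made-up helpers): a one-pass iterator has nothing left the second time round, so one way out is to drain it into an array once and hand that array to both subroutines.

```perl
use strict;
use warnings;

# Stand-in for a one-pass report object: each call returns the next
# item, then undef once the items run out. Not a BioPerl API.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items; };
}

my $next_result = make_iterator(qw(result1 result2 result3));

# Drain the iterator once and keep the items around.
my @results;
while (defined(my $r = $next_result->())) {
    push @results, $r;
}

# A second pass over the iterator itself finds nothing left...
my $leftover = 0;
$leftover++ while defined $next_result->();

# ...but the saved array can be walked as often as needed.
sub count_items { my ($list) = @_; return scalar @$list; }
my $first_pass  = count_items(\@results);
my $second_pass = count_items(\@results);

print "left in iterator: $leftover\n";
print "first pass: $first_pass, second pass: $second_pass\n";
```

With the real report, the same pattern would presumably be to push each object returned by $report->next_result onto an array before calling either subroutine. I haven't tested that against StandAloneBlast, and the hits and HSPs inside each result have their own iterators, so the same caveat may apply one level down.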


From torsten.seemann at infotech.monash.edu.au  Tue May 30 04:18:45 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 18:18:45 +1000
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return
	undef"
Message-ID: <447BFFE5.8010508@infotech.monash.edu.au>

FYI Bioperl developers:

I just audited the bioperl-live CVS and found about 450 occurrences of 
"return undef".

Page 199 of "Perl Best Practices" by Damian Conway, and this URL
http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:

"Use return; instead of return undef; if you want to return nothing. If 
someone assigns the return value to an array, the latter creates an 
array of one value (undef), which evaluates to true. The former will 
correctly handle all contexts."

So I'm guessing at least some of these 450 occurrences *could* result in 
bugs and should probably be changed.
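
A minimal sketch of the pitfall (the sub names are made up for illustration, they are not from bioperl-live):

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub find_with_undef { return undef; }   # one-element list (undef) in list context
sub find_with_bare  { return; }         # empty list in list context, undef in scalar context

my @a = find_with_undef();
my @b = find_with_bare();

print "return undef: ", scalar(@a), " element(s), array is ", (@a ? "true" : "false"), "\n";
print "return;     : ", scalar(@b), " element(s), array is ", (@b ? "true" : "false"), "\n";

# In scalar context both give undef, so scalar callers are unaffected.
my $s = find_with_bare();
print "scalar context: ", (defined $s ? "defined" : "undef"), "\n";
```

One thing to watch when converting: a bare return inside a list, for example a hash constructor like (key => some_sub()), contributes no element at all, so each occurrence needs a look at how its callers use the value.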

Your opinion may differ :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From cjfields at uiuc.edu  Tue May 30 10:07:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:07:45 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with
	"returnundef"
In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au>
Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine>

Torsten,

Any way you can post a list of some/all of the offending lines or modules?
Sounds like something to consider, but if the list is as large as you say we
may need something (bugzilla? wiki?) to track the changes and make sure
they pass tests; I'm sure the large majority will.

I'm guessing Jason would want this somewhere on the project priority list or
bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
page on the wiki for proposed code changes?

Chris



From lstein at cshl.edu  Tue May 30 10:47:48 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 30 May 2006 10:47:48 -0400
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
	<5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
Message-ID: <200605301047.49127.lstein@cshl.edu>

Hi Kevin,

I'm afraid that there is no offset value. You'll need the 1.5.1 version of 
bioperl to handle negative numbers properly. I understand your reluctance to 
upgrade just to get the Bio::Graphics functionality. You might consider 
checking out just the Bio/Graphics subtree and installing that. It should 
work on top of 1.4.

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Tue May 30 10:50:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:50:06 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine>

Jason, Brian, et al,

I found several major issues with Bio::Restriction::IO (this popped up while
bug squashing).  In particular, the POD is pretty misleading.  It states
(directly from perldoc):

SYNOPSIS
        use Bio::Restriction::IO;

        $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                         -format => 'withrefm');
        $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                         -format => 'bairoch');
        my $res = $in->read; # a Bio::Restriction::EnzymeCollection
        $out->write($res);

      # or

      #    use Bio::Restriction::IO;
      #
      #    #input file format can be read from the file extension (dat|xml)
      #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
      #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
      #
      #    # World's shortest flat<->xml format converter:
      #    print $out $_ while <$in>;

So, I have found several problems with these modules.  I really hate to
criticize code here, as my own is pretty hacky, but I think these are things
to seriously mull over: 

1)	Though some of the lines above are commented out, they are still in
the POD and thus show up in perldoc/pod2html etc.  Judging from the
synopsis, the script above should read in one format and write out
another (like SeqIO).  However, NONE of the current write() methods are
implemented for any of the IO modules (withrefm, base, itype2, bairoch),
so this does not happen as expected.  You get the nasty thrown 'method
not implemented' error instead when writing.
2)	The commented statements in the POD above also suggest that REBASE
XML format is supported, when there is no XML module.
3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
made it unusable until I added a few small changes; it still can't handle
multisite/multicut enzymes properly, so in essence it is useless until that
is addressed.
4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
up its own methods?
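
Until the write() methods exist, calling code can at least trap the thrown
error instead of dying.  The following is only a minimal sketch in plain
Perl: ToyIO is a hypothetical stand-in class, not one of the
Bio::Restriction::IO drivers, and its enzyme list is made up for
illustration.

```perl
use strict;
use warnings;

# ToyIO is a hypothetical stand-in for an IO driver whose write() is a stub.
# It is NOT part of BioPerl; it only mimics the "read works, write throws" case.
package ToyIO;

sub new   { my ($class) = @_; return bless {}, $class }
sub read  { return [ 'EcoRI', 'BamHI' ] }          # pretend enzyme collection
sub write { die "method not implemented\n" }       # stub that always throws

package main;

my $io      = ToyIO->new;
my $enzymes = $io->read;

# can() is true even for a stub, so it cannot tell us whether write() works.
my $has_write = $io->can('write') ? 1 : 0;

# Wrapping the call in eval traps the exception; $ok stays undef on failure.
my $ok      = eval { $io->write($enzymes); 1 };
my $message = $ok ? "written\n" : "write failed: $@";
print $message;
```

Note that can('write') reports the method as present either way, so the
eval is the only reliable check in this sketch.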

I'm working on at least getting the 'bairoch' input format up and running
(so at least it gets the enzymes into a Bio::Restriction::EnzymeCollection).
From this point I'm not sure where to proceed.  The POD obviously needs to
be corrected to reflect that writing formats is not implemented (and the
bit about XML should be taken out completely); that's the easy part, which
I am working on and plan on committing today.  However, these modules don't
seem to be used too frequently, so I'm not sure whether it's worth spending
too much time getting them up to speed at the moment (adding write methods,
switching to Bio::Root::Root, etc); I have other priorities at the moment
(including a way overdue ListSummary).  I'm also not sure who else is
(using|working on) these, so I don't want to (make too many changes|step on
someone else's toes), but these are, IMHO, pretty serious problems.

Any thoughts?

Chris


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Tue May 30 12:34:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 11:34:18 -0500
Subject: [Bioperl-l] Bio::Restriction::IO changes
Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine>

Jason, Brian, et al:

I have made changes to the Bio::Restriction::IO POD to remove any reference
to write functions, since almost none have been implemented yet and
including them in the POD is a bit misleading.  At the moment, you can't
write to any REBASE format except 'base', which I found is the only one
that works.  And, upon further checking, even that one has issues: it looks
like there are problems with multicut/multisite enzymes when writing in
'base' format, which I'm not delving into ('TaqII' displays only one site
when written, though it has two cut sites).  I'll add this to the wiki and
file a bug report (enhancement) for this module.

I have also removed mention of the XML and 'bairoch' formats (the former
isn't present and the latter is broken at the moment) and added a few things
to the POD TO DO section.

Rob (if you're out there somewhere in the ether), have you made any more
changes to these modules that need to be committed?  Didn't know if any of
these issues have already been addressed/changed etc.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From jelenaob at gmail.com  Tue May 30 00:58:35 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Mon, 29 May 2006 21:58:35 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>

Hello everybody,

Does anybody know how to remove the background color of the Panel?
Currently I am not adding anything to it, so that I can troubleshoot the
problem, and I have tried setting all the color attributes I could find on
the panel, but no luck.  Whatever I do, I get the BLUE border of the panel.

Has anybody faced the same problem?

Thanks in advance,

Jelena

And here is the code I am currently using:

-----------------------------------------------------------------------------------------------------------
my $panel =
    Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
                              -width => 800,
                              -pad_left => 10,
                              -pad_right => 10,
                              -key_color => 'white',
                              -bgcolor => 'white',
                              -gridcolor=>'black',
                              -fgcolor => 'black',
                              -grid => 0,
                              );
   my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
     -url  => '/tmpimages');
   #make clickable image
   print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
   print $map;

-----------------------------------------------------------------------------------------------------------

From luciap at sas.upenn.edu  Tue May 30 14:49:48 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 14:49:48 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
Message-ID: <1149014988.447c93cc01761@128.91.55.38>

Hi

I am here again. I finally got to write the "collapse nodes" function and have a
couple of questions.

In order to collapse any node $node, I first have to get the parent,
which I can do as $parent=$node->ancestor

and then the children as:
@children=$node->get_all_Descendents (or should I use each_Descendent?)

Then, before deleting $node, I have to assign all its children to $parent,
and here is where I am kind of confused.
Can I use the add_Descendent function for this?
I've been trying to write something like this:
foreach $child (@children){
         $parent=add_Descendent->$child;
}
but this doesn't work, and I think it is because I don't have any idea of what I
am doing.
Any suggestions?

thanks


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From rvosa at sfu.ca  Tue May 30 14:52:52 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 11:52:52 -0700
Subject: [Bioperl-l] For CVS developers - potential pitfall
	with	"returnundef"
In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine>
References: <000c01c683f2$6ca62570$15327e82@pyrimidine>
Message-ID: <447C9484.9030102@sfu.ca>

Although I agree with the sentiment of following PBP, I'm not so sure 
changing 'return undef' to 'return' *now* will fix any bugs without 
introducing new, subtle ones.
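
To make both sides of this concrete, here is a small sketch in plain Perl.
The two subs are hypothetical and not taken from bioperl-live; they only
show how the two forms behave in list context, and how a bare return can
silently change a list or hash constructor that used to work.

```perl
use strict;
use warnings;

# Hypothetical routines, not taken from BioPerl.
sub find_undef { return undef }   # one-element list (undef) in list context
sub find_bare  { return }         # empty list in list context, undef in scalar

# Conway's point: an array filled from 'return undef' is not empty.
my @with_undef = find_undef();
my @with_bare  = find_bare();

printf "return undef: %d element(s), array is %s\n",
    scalar(@with_undef), (@with_undef ? 'true' : 'false');
printf "bare return:  %d element(s), array is %s\n",
    scalar(@with_bare), (@with_bare ? 'true' : 'false');

# The flip side: inside a list constructor a bare return simply vanishes,
# so the key/value pairs shift and the list has an odd number of elements.
my %hash_with_undef = (name => find_undef(), size => 42);
my @pairs_with_bare = (name => find_bare(),  size => 42);

printf "hash built with 'return undef' has %d keys\n",
    scalar(keys %hash_with_undef);
printf "list built with bare return has %d elements\n",
    scalar(@pairs_with_bare);
```

The second half is the kind of subtle breakage I have in mind: callers that
embed the call in a list would need auditing before any mass change.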

Chris Fields wrote:
> Torsten,
>
> Any way you can post a list of some/all of the offending lines or modules?
> Sounds like something to consider, but if the list is as large as you say we
> made need something (bugzilla? wiki?) to track the changes and make sure
> they pass tests; I'm sure a large majority will.  
>
> I'm guessing Jason would want this somewhere on the project priority list or
> bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
> page on the wiki for proposed code changes?
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>> Sent: Tuesday, May 30, 2006 3:19 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>> "returnundef"
>>
>> FYI Bioperl developers:
>>
>> I just audited the bioperl-live CVS and found about 450 occurrences of
>> "return undef".
>>
>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:
>>
>> "Use return; instead of return undef; if you want to return nothing. If
>> someone assigns the return value to an array, the latter creates an
>> array of one value (undef), which evaluates to true. The former will
>> correctly handle all contexts."
>>
>> So I'm guessing at least some of these 450 occurrences *could* result in
>> bugs and should probably be changed.
>>
>> Your opinion may differ :-)
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>   

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From luciap at sas.upenn.edu  Tue May 30 16:11:52 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 16:11:52 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
Message-ID: <1149019912.447ca7085124e@128.91.55.38>

Hi
OK, that was silly, but what I have in my code is what you just wrote.
The problem is that if I write

$parent->add_Descendent($child)

it tells me that I am calling the method "add_Descendent" on an undefined value
(but I did define $parent before??)

So here is the code so far:

use Bio::TreeIO;
 my $in = new Bio::TreeIO(-file => 'Test2.tre',
                          -format => 'newick');
 my $out = new Bio::TreeIO(-file => '>mytree.out',
                           -format => 'newick');
 while( my $tree = $in->next_tree ) {
    foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
    my $bootstrap=$node->_creation_id;

    if ($bootstrap < 70 ){
            my $parent = $node->ancestor;
            my @children=$node->get_all_Descendents;
            foreach my $child (@children){
                $parent->add_Descendent($child);
            }

........

eventually I'll add (once I have assigned the children to the parent successfully):
$tree->remove_Node($node);

        }
    }
    $out->write_tree($tree);
}

Quoting aaron.j.mackey at gsk.com:

> > foreach $child (@children){
> >          $parent=add_Descendent->$child;
> > }
>
> I think what you want is $parent->add_Descendent($child)
>
> -Aaron
>


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From jason.stajich at duke.edu  Tue May 30 16:30:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 30 May 2006 16:30:56 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149019912.447ca7085124e@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>

You need to special-case the root: it won't have an ancestor.  Just
protect the my $parent = $node->ancestor with an if statement, as I
did below.

On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:

> Hi
> OK that was silly, but what I have in my code is what you just wrote
> But the problem is that if I write
>
> $parent->add_Descendent($child)
>
> it tells me that I am calling  the method "add_Descendent" on an
> undefined value
> (but I did define $parent before??)
>
> So here it goes the code so far:
>
> use Bio::TreeIO;
>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>                           -format => 'newick');
>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>                            -format => 'newick');
>  while( my $tree = $in->next_tree ) {
>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>     my $bootstrap=$node->_creation_id;
>
>     if ($bootstrap < 70 ){
>    >>> if(        my $parent = $node->ancestor ) {
>               my @children=$node->get_all_Descendents;
>               foreach my $child (@children){
>                  $parent->add_Descendent($child);
>               }
         }
>
> ........
>
> eventually I'll add (once I assigned the children to the parent  
> succesfully):
> $tree->remove_Node($node);
>
>         }
>     }
>     $out->write_tree($tree);
> }
>
> Quoting aaron.j.mackey at gsk.com:
>
>>> foreach $child (@children){
>>>          $parent=add_Descendent->$child;
>>> }
>>
>> I think what you want is $parent->add_Descendent($child)
>>
>> -Aaron
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
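
The collapse logic itself can be sketched without BioPerl.  In the sketch
below, ToyNode is a hypothetical stand-in class, NOT Bio::Tree::Node; it only
borrows the method names from this thread so the example runs on its own, and
the 'support' field and the cutoff of 70 are assumptions for illustration.
It reattaches only the immediate children, since (as far as I know)
each_Descendent returns direct children while get_all_Descendents walks the
whole subtree.

```perl
use strict;
use warnings;

# ToyNode is a hypothetical stand-in, NOT Bio::Tree::Node.
package ToyNode;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, support => $args{support},
                   kids => [], parent => undef }, $class;
}
sub id              { $_[0]{id} }
sub support         { $_[0]{support} }
sub ancestor        { $_[0]{parent} }             # undef for the root
sub is_Leaf         { !@{ $_[0]{kids} } }
sub each_Descendent { @{ $_[0]{kids} } }          # immediate children only
sub add_Descendent {
    my ($self, $child) = @_;
    $child->{parent} = $self;
    push @{ $self->{kids} }, $child;
    return scalar @{ $self->{kids} };
}
sub remove_Descendent {
    my ($self, $child) = @_;
    $self->{kids} = [ grep { $_ != $child } @{ $self->{kids} } ];
    return;
}

package main;

# Move a node's children up to its parent and detach the node.
# Returns 0 (and does nothing) for the root or for a leaf.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->ancestor or return 0;     # the root has no ancestor
    return 0 if $node->is_Leaf;
    my @children = $node->each_Descendent;
    $parent->remove_Descendent($node);
    $parent->add_Descendent($_) for @children;
    return 1;
}

# Tiny tree: root -> (A, inner -> (B, C)), with weak support on 'inner'.
my $root  = ToyNode->new(id => 'root');
my $inner = ToyNode->new(id => 'inner', support => 40);
my ($leaf_a, $leaf_b, $leaf_c) = map { ToyNode->new(id => $_) } qw(A B C);
$root->add_Descendent($leaf_a);
$root->add_Descendent($inner);
$inner->add_Descendent($leaf_b);
$inner->add_Descendent($leaf_c);

collapse_node($inner) if $inner->support < 70;
my $root_collapsed = collapse_node($root);        # 0: the root is skipped

print join(',', map { $_->id } $root->each_Descendent), "\n";
```

Collecting the nodes to collapse first and modifying the tree afterwards is
probably safer than changing it while looping over get_nodes().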

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Tue May 30 17:40:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 16:40:18 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447C9484.9030102@sfu.ca>
Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine>

Agreed, though I think these changes should be implemented at some point
(Conway's argument here makes sense, and it was nice of Torsten to check this
out).  If proper tests are written, then any changes resulting in errors
should be picked up by running the appropriate test suite, though I know that
doesn't absolutely guarantee it.  ; P

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 1:53 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> "returnundef"
> 
> Although I agree with the sentiment of following PBP, I'm not so sure
> changing 'return undef' to 'return' *now* will fix any bugs without
> introducing new, subtle ones.


From rvosa at sfu.ca  Tue May 30 17:58:25 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 14:58:25 -0700
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine>
References: <001901c68433$026b1ad0$15327e82@pyrimidine>
Message-ID: <447CC001.4050000@sfu.ca>

I've been following the perl6 mailing lists for a while now. I think 
this time around it won't really take that long (one year?) for 
pugs/perl6 stacks to become more than just toys. I think especially 
large projects, like bioperl, will really benefit from the improved OO 
implementation in perl6, so it might be of interest to at least 
fantasize about it.

Chris Fields wrote:
> Ha!  Or may be the 'nonexistent' bioperl-experimental.  Wonder what'll
> happen once Perl6 comes to term?
>
> -CJF
>
>   
>> -----Original Message-----
>> From: Rutger Vos [mailto:rvosa at sfu.ca]
>> Sent: Tuesday, May 30, 2006 4:48 PM
>> To: Chris Fields
>> Subject: Re: [Bioperl-l] For CVS developers - potential
>> pitfallwith"returnundef"
>>
>> Surely this will all sort itself out in bioperl6 ;-)

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Tue May 30 18:08:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 17:08:26 -0500
Subject: [Bioperl-l] For CVS developers -
	potentialpitfallwith"returnundef"
In-Reply-To: <447CC001.4050000@sfu.ca>
Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine>

Agreed.  I would say that in probably 6-12 months' time it might be a good
idea to try getting something actually started, maybe under the
'bioperl-experimental' title Jason has mentioned.  One could always try
getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up
from there, with emphasis on key areas (seq. parsing and so on).

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 4:58 PM
> To: bioperl list
> Subject: Re: [Bioperl-l] For CVS developers -
> potentialpitfallwith"returnundef"
> 
> I've been following the perl6 mailing lists for a while now. I think
> this time around it won't really take that long (one year?) for
> pugs/perl6 stacks to become more than just toys. I think especially
> large projects, like bioperl, will really benefit from the improved OO
> implementation in perl6, so it might be of interest to at least
> fantasize about it.
> 
> Chris Fields wrote:
> > Ha!  Or may be the 'nonexistent' bioperl-experimental.  Wonder what'll
> > happen once Perl6 comes to term?
> >
> > -CJF
> >
> >
> >> -----Original Message-----
> >> From: Rutger Vos [mailto:rvosa at sfu.ca]
> >> Sent: Tuesday, May 30, 2006 4:48 PM
> >> To: Chris Fields
> >> Subject: Re: [Bioperl-l] For CVS developers - potential
> >> pitfallwith"returnundef"
> >>
> >> Surely this will all sort itself out in bioperl6 ;-)


From ULNJUJERYDIX at spammotel.com  Tue May 30 23:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.
If anyone is interested, I have opted to change line 161 of arrow.pm
(Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it is
/usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from


      $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

      $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just  for this one-off use.


Strangely, at line 112 of arrow.pm (bioperl ver 1.51) I found what looks
like a hidden option for a coordinate offset:
    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset    = 1 unless defined $relative_coords_offset;
but passing the option -relative_coords_offset=>1000 to the arrow glyph
didn't do anything...


Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyp however, it is a
> multiply function, can it be hacked to take in a offset value (ie minus
> the
> scale by 1000?)
>
> cheers
> kevin
>
>
> Hi,
> >
> > For some reason I didn't see the first posting on this. In current
> bioperl
> > live, the ruler can have negative numberings - I use this routinely. You
> > need
> > to create a feature that starts in negative coordinates. What is
> happening
> > to
> > you when you try this?
> >
> > Lincoln
> >
> > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > Hi
> > > thanks for the help offered thus far!
> > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq
> > using
> > > bioperl. therefore i was asked to make the numberings as such (-1000)
> is
> > > there any way at all to do this in bioperl without changing the .pm
> > file?
> > >
> > > thanks guys..
> > > kevin
> > >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu

From sb at mrc-dunn.cam.ac.uk  Wed May 31 04:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be 
> what's going on.. I am using $blast_report->next_result in both 
> subroutines. It appears that analyzing the blast results first w/ my 
> sort subroutine empties (?) the $blast_result object so that when I try 
> to print, there is nothing left to print. (and visa-versa when I print 
> first then try to sort).
> So, from the looks of things, using next_result has the effect of 
> popping the Bio::Search::Result::ResultI objects off of the SearchIO 
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying 
to read it all twice like this:
open(FILE, "file");
while (<FILE>) {
     print # prints each line in the file
}
while (<FILE>) {
     print # never happens, we never enter this while loop
}

To get the second while loop to print anything we need to say seek(FILE, 
0, 0) before it. Or in the first while loop store each line in an array, 
and then make the second loop a foreach through that array.
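
A minimal sketch of both options, using an in-memory filehandle so it
runs without an external file. The sample text and the helper name
read_all are only illustrative assumptions, not anything from BioPerl:

```perl
use strict;
use warnings;

# Read every remaining line from a filehandle into a list.
sub read_all {
    my ($fh) = @_;
    my @lines;
    while ( my $line = <$fh> ) {
        push @lines, $line;
    }
    return @lines;
}

my $text = "line 1\nline 2\nline 3\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";

my @first  = read_all($fh);    # all three lines
my @second = read_all($fh);    # nothing: the handle is already at EOF

seek( $fh, 0, 0 ) or die "Cannot seek: $!";    # rewind to the start
my @third = read_all($fh);     # all three lines again

printf "first=%d second=%d third=%d\n",
    scalar @first, scalar @second, scalar @third;

# Alternative: keep the lines from the first pass and loop over the copy.
print "cached: $_" for @first;
close $fh;
```

The second pass only sees data after the explicit seek; the cached array
can be walked as often as needed without touching the handle.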


> It seems I could get around this by making a copy of the blast report by 
> setting it to another new variable...(not the most elegant solution) but 
> I'm having trouble with this...
> 
> If I do:
> 
>     my $blast_report_copy = $blast_report;
> 
> I'm just copying the reference to the SearchIO blast result, so it 
> doesn't help me. How can I make another physical copy of this blast 
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object 
contains a filehandle. But for a simple object you might recursively 
loop through the data structure and copy each element out into a similar 
data structure.
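
As a rough sketch of that recursive copy for plain nested hashes and
arrays (deep_copy and the sample data are hypothetical names; blessed
objects, code refs and filehandles are deliberately passed through
unchanged, so this would not duplicate an object that holds a filehandle):

```perl
use strict;
use warnings;

# Recursively copy nested hashes and arrays. Anything else (plain
# scalars, code refs, filehandles, blessed objects) is returned as-is.
sub deep_copy {
    my ($data) = @_;
    my $type = ref $data;
    if ( $type eq 'HASH' ) {
        my %copy;
        $copy{$_} = deep_copy( $data->{$_} ) for keys %$data;
        return \%copy;
    }
    if ( $type eq 'ARRAY' ) {
        my @copy = map { deep_copy($_) } @$data;
        return \@copy;
    }
    return $data;
}

my $original = { hits => [ { name => 'hit1', scores => [ 10, 20 ] } ] };
my $copy     = deep_copy($original);

$copy->{hits}[0]{scores}[0] = 99;               # change the copy only
print $original->{hits}[0]{scores}[0], "\n";    # original still holds 10
```

For simple data the core Storable module's dclone does a similar job,
with the same caveat about filehandles.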


> But better yet, the way to go is to 'reset the counter,' or to find a 
> way to look at/print/sort the results without removing data from the 
> blast result object. How is this done though??

It would be rather nice if this worked:
my $blast_report = $factory->blastall($ref_seq_objs);
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
     # $_ is a ResultI object, use as normal
}
seek($blast_fh, 0, 0); # this would be great, but does it work?
while (<$blast_fh>) {
     # go through the results again in your second subroutine
}

An alternative hacky way of doing it, which may also not work, would be 
to go through your $blast_report as normal, but then before going 
through it a second time, say
my $fh = $blast_report->_fh;
seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of 
its own for this) of solving the problem is, the first time you go 
through each next_result, next_hit and next_hsp, just store the returned 
objects in an array of arrays of arrays. Then the second time get the 
objects from your array structure instead of with the method calls.
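
A sketch of that caching approach. MockIter is a hypothetical stand-in
for the real report, result and hit objects, written only so the example
runs without BioPerl; the only assumption carried over is that the
next_result, next_hit and next_hsp calls each hand out one object per
call until none are left. It uses small hashes rather than a pure array
of arrays of arrays so that the result and hit objects are kept too:

```perl
use strict;
use warnings;

# Hypothetical one-pass iterator: each next_* call hands out one child
# and returns undef once the children have run out.
package MockIter;
sub new {
    my ( $class, $name, @children ) = @_;
    return bless { name => $name, children => [@children] }, $class;
}
sub name        { return $_[0]{name} }
sub next_child  { return shift @{ $_[0]{children} } }
sub next_result { return $_[0]->next_child }
sub next_hit    { return $_[0]->next_child }
sub next_hsp    { return $_[0]->next_child }

package main;

# Walk the report once and keep every object in a nested structure:
# one entry per result, each holding its hits, each holding its HSPs.
sub cache_report {
    my ($report) = @_;
    my @results;
    while ( my $result = $report->next_result ) {
        my @hits;
        while ( my $hit = $result->next_hit ) {
            my @hsps;
            while ( my $hsp = $hit->next_hsp ) {
                push @hsps, $hsp;
            }
            push @hits, { hit => $hit, hsps => \@hsps };
        }
        push @results, { result => $result, hits => \@hits };
    }
    return \@results;
}

my $report = MockIter->new(
    'report',
    MockIter->new(
        'query1',
        MockIter->new( 'hitA', MockIter->new('hsp1'), MockIter->new('hsp2') ),
        MockIter->new( 'hitB', MockIter->new('hsp3') ),
    ),
);

my $cached = cache_report($report);

# Both passes read from the cache, so neither one exhausts the other.
for my $pass ( 1, 2 ) {
    for my $r (@$cached) {
        for my $h ( @{ $r->{hits} } ) {
            printf "pass %d: %s %s has %d HSP(s)\n",
                $pass, $r->{result}->name, $h->{hit}->name,
                scalar @{ $h->{hsps} };
        }
    }
}
```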

From heikki at sanbi.ac.za  Wed May 31 06:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l]
	=?iso-8859-1?q?For_CVS_developers_-_potential_pitfall?=
	=?iso-8859-1?q?with_=22returnundef=22?=
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more 
likely that there is a well hidden bug caused by accidentally assigning undef 
into a one-element array than someone intentionally writing code that 
expects that behaviour!
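
For anyone who has not run into it, here is a small sketch of the pitfall
being discussed (the two sub names are made up for the illustration):

```perl
use strict;
use warnings;

sub find_with_undef { return undef; }    # explicit undef on failure
sub find_with_bare  { return; }          # bare return on failure

my @a = find_with_undef();    # one element: (undef)
my @b = find_with_bare();     # empty list

print "return undef: ", scalar(@a), " element(s), ",
    ( @a ? "true" : "false" ), "\n";
print "bare return:  ", scalar(@b), " element(s), ",
    ( @b ? "true" : "false" ), "\n";

# In scalar context both calls yield undef, so scalar callers see no
# difference between the two styles.
my $x = find_with_undef();
my $y = find_with_bare();
print "scalar context: both undefined\n" if !defined $x && !defined $y;
```

Only the list-context caller is affected: the array filled from
"return undef" has one element and so tests as true.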

I removed (but did not commit yet) all undefs from my old Bio::Variation code 
and could not see any differences in the test output. 

Let's remove them!

	-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out).  If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it.  ; P
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > Sent: Tuesday, May 30, 2006 1:53 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > "returnundef"
> >
> > Although I agree with the sentiment of following PBP, I'm not so sure
> > changing 'return undef' to 'return' *now* will fix any bugs without
> > introducing new, subtle ones.
> >
> > Chris Fields wrote:
> > > Torsten,
> > >
> > > Any way you can post a list of some/all of the offending lines or
> >
> > modules?
> >
> > > Sounds like something to consider, but if the list is as large as you
> >
> > say we
> >
> > > made need something (bugzilla? wiki?) to track the changes and make
> > > sure they pass tests; I'm sure a large majority will.
> > >
> > > I'm guessing Jason would want this somewhere on the project priority
> >
> > list or
> >
> > > bugzilla, with a link to the actual list, but I'm not sure.  Maybe
> > > start
> >
> > a
> >
> > > page on the wiki for proposed code changes?
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > >> "returnundef"
> > >>
> > >> FYI Bioperl developers:
> > >>
> > >> I just audited the bioperl-live CVS and found about 450 occurrences of
> > >> "return undef".
> > >>
> > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > >> suggest:
> > >>
> > >> "Use return; instead of return undef; if you want to return nothing.
> > >> If someone assigns the return value to an array, the latter creates an
> > >> array of one value (undef), which evaluates to true. The former will
> > >> correctly handle all contexts."
> > >>
> > >> So I'm guessing at least some of these 450 occurrences *could* result
> >
> > in
> >
> > >> bugs and should probably be changed.
> > >>
> > >> Your opinion may differ :-)
> > >>
> > >> --
> > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > >>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za  Wed May 31 06:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <000f01c683f8$5771ed50$15327e82@pyrimidine>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
Message-ID: <200605311244.29187.heikki@sanbi.ac.za>


Chris,

Thanks for stepping in. I feel partly responsible here because I originally 
changed some of Rob's code but have not followed up since.

There has not been active development on these modules, so do not worry about 
stepping on anyone's toes.

   -Heikki

On Tuesday 30 May 2006 16:50, Chris Fields wrote:
> Jason, Brian, et al,
>
> I found several major issues with Bio::Restriction::IO (this popped up
> while bug squashing).  In particular, the POD is pretty misleading.  It
> states (directly from perldoc):
>
> SYNOPSIS
>         use Bio::Restriction::IO;
>
>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                          -format => 'withrefm');
>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                          -format => 'bairoch');
>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>         $out->write($res);
>
>       # or
>
>       #    use Bio::Restriction::IO;
>       #
>       #    #input file format can be read from the file extension (dat|xml)
>       #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>       #
>       #    # World's shortest flat<->xml format converter:
>       #    print $out $_ while <$in>;
>
> So, I have found several problems with these modules.  I really hate to
> criticize code here, as my own is pretty hacky, but I think these are
> things to seriously mull over:
>
> 1)	Note that, though some of the lines above are commented they are
> still there in POD and thus present in perldoc/pod2html etc.  So, judging
> from the above, it suggests using the script above should read in from one
> format and write out to another (like SeqIO).  However, NONE of the current
> write() methods are implemented for any of the IO modules (withref, base,
> itype2, bairoch), so this does not happen as expected.  You get the nasty
> thrown 'method not implemented error' instead when writing.
> 2)	The commented statements in POD above also suggest that REBASE XML
> format is supported when there is no XML module.
> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
> made it unusable until I added a few small changes; it still can't handle
> multisite/multicut enzymes properly, so in essence it is useless until that
> is addressed.
> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
> up it's own methods?
>
> I'm working on at least getting the 'bairoch' input format up and running
> (so at least it gets the enzymes into a
> Bio::Restriction::Enzyme::Collection).  From this point I'm not sure where
> to proceed.  The POD obviously needs to be corrected to reflect that
> writing formats is not implemented (and the bit about XML should be taken
> out completely); that's the easy part which I am working on and plan
> committing today.  However, these modules don't seem to be used too
> frequently so I'm not sure whether it's worth spending too much time
> getting these up to speed at the moment (adding write methods, switching to
> Bio::Root::Root, etc); I have other priorities at the moment (including a
> way overdue ListSummary). I'm also not sure who else is (using|working) on
> these so I don't want to (make too many changes|step on someone else's
> toes), but these are, IMHO, pretty serious problems.
>
> Any thoughts?
>
> Chris
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From cjfields at uiuc.edu  Wed May 31 09:10:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 08:10:00 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
	<200605311244.29187.heikki@sanbi.ac.za>
Message-ID: <C8B60E1D-D5A5-4661-AA2B-CEE1E5B5D758@uiuc.edu>

Heikki,

I mainly just changed a few things so no one would get the wrong
idea from the POD (that these modules write formats as well as read them)
and added a few things to the TO DO.  I also added a warning to
Bio::Restriction::IO::bairoch for the multisite/multicut issue.
Besides that I haven't done much to them.  I also added a bit to the
Project Priority List in case someone wants to take it up.  I may
tinker with it but it's not really high on my priority list.  I've
been pretty busy getting the ListSummaries back up to speed (very
busy mail lists since the last one) and am writing/testing a new
interface to NCBI EUtilities which I may donate at some point in the
next few months or so.

Chris


On May 31, 2006, at 5:44 AM, Heikki Lehvaslaiho wrote:

>
> Chris,
>
> Thanks for stepping in. I feel partly responsible here because I  
> originally
> changed some of Rob's code but have not followed up since.
>
> There have not been active development on these modules so do not  
> worry about
> stepping on anyone's toes.
>
>    -Heikki
>
> On Tuesday 30 May 2006 16:50, Chris Fields wrote:
>> Jason, Brian, et al,
>>
>> I found several major issues with Bio::Restriction::IO (this  
>> popped up
>> while bug squashing).  In particular, the POD is pretty  
>> misleading.  It
>> states (directly from perldoc):
>>
>> SYNOPSIS
>>         use Bio::Restriction::IO;
>>
>>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>>                                          -format => 'withrefm');
>>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>>                                          -format => 'bairoch');
>>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>>         $out->write($res);
>>
>>       # or
>>
>>       #    use Bio::Restriction::IO;
>>       #
>>       #    #input file format can be read from the file extension  
>> (dat|xml)
>>       #    $in  = Bio::Restriction::IO->newFh(-file =>  
>> "inputfilename");
>>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>>       #
>>       #    # World's shortest flat<->xml format converter:
>>       #    print $out $_ while <$in>;
>>
>> So, I have found several problems with these modules.  I really  
>> hate to
>> criticize code here, as my own is pretty hacky, but I think these are
>> things to seriously mull over:
>>
>> 1)	Note that, though some of the lines above are commented they are
>> still there in POD and thus present in perldoc/pod2html etc.  So,  
>> judging
>> from the above, it suggests using the script above should read in  
>> from one
>> format and write out to another (like SeqIO).  However, NONE of  
>> the current
>> write() methods are implemented for any of the IO modules  
>> (withref, base,
>> itype2, bairoch), so this does not happen as expected.  You get  
>> the nasty
>> thrown 'method not implemented error' instead when writing.
>> 2)	The commented statements in POD above also suggest that REBASE XML
>> format is supported when there is no XML module.
>> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
>> made it unusable until I added a few small changes; it still can't  
>> handle
>> multisite/multicut enzymes properly, so in essence it is useless  
>> until that
>> is addressed.
>> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
>> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO  
>> and make
>> up it's own methods?
>>
>> I'm working on at least getting the 'bairoch' input format up and  
>> running
>> (so at least it gets the enzymes into a
>> Bio::Restriction::Enzyme::Collection).  From this point I'm not  
>> sure where
>> to proceed.  The POD obviously needs to be corrected to reflect that
>> writing formats is not implemented (and the bit about XML should  
>> be taken
>> out completely); that's the easy part which I am working on and plan
>> committing today.  However, these modules don't seem to be used too
>> frequently so I'm not sure whether it's worth spending too much time
>> getting these up to speed at the moment (adding write methods,  
>> switching to
>> Bio::Root::Root, etc); I have other priorities at the moment  
>> (including a
>> way overdue ListSummary). I'm also not sure who else is (using| 
>> working) on
>> these so I don't want to (make too many changes|step on someone  
>> else's
>> toes), but these are, IMHO, pretty serious problems.
>>
>> Any thoughts?
>>
>> Chris
>>
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of the Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Wed May 31 09:07:10 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 08:07:10 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

  TODO: check if the POD is in the Wiki yet, and if not, put it here? 

I used Pod::Simple::Wiki (format 'mediawiki') to burn the bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"? 

It's an amazing document. IMHO it should be listed prominently on bioperl.org.

HTH,

j

From osborne1 at optonline.net  Wed May 31 09:58:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 09:58:01 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447D94FE.8090305@jays.net>
Message-ID: <C0A31929.89F9%osborne1@optonline.net>

Jay,

Excellent! Now we need to answer a few more questions for ourselves:

- Do we remove the file bptutorial.pl from the package now? I'd say yes, we
don't want to have to maintain two bptutorials.

- What do we do with the script part of bptutorial.pl? It certainly could be
excised and put into the examples/ directory, for example, but this would
break a few of the paths that are being used.

- A link to bptutorial? Or a link to the existing tutorials page?
http://www.bioperl.org/wiki/Tutorials.

Any thoughts on these?


Brian O.


On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:

> http://www.bioperl.org/wiki/Bptutorial.pl
> 
> I think I just partially fulfilled this TODO:
> 
>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> 
> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the
> wiki page via my web browser. (Is that proper procedure? Is the plan to just
> do that manually from time to time as the document changes?)
> 
> Now what?
> 
> Should there be a new link on the far left of bioperl.org called "Tutorial"?
> 
> It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> 
> HTH,
> 
> j


From luciap at sas.upenn.edu  Wed May 31 10:06:13 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Wed, 31 May 2006 10:06:13 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
Message-ID: <1149084373.447da2d5c5339@128.91.55.38>

Hi,
Thanks.
A couple more questions:
Why is the bootstrap value stored as the node id? Is that right?

Also, in the add_Descendent method, how do you set the $ignoreoverwrite
parameter to true?

Lucia
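
For reference, the collapse step quoted below can be sketched on plain
hashes. This is deliberately NOT the Bio::Tree API: the node layout
(name, support, children) and both sub names are assumptions made up for
the illustration. Only children are ever collapsed, so the root needs no
special case, and only the direct children of a collapsed node are moved
up to its parent:

```perl
use strict;
use warnings;

# Plain-hash stand-in for a tree node:
#   { name => ..., support => ..., children => [ ... ] }
# Internal nodes whose support is below $min_support are removed and
# their direct children are attached to the removed node's parent.
sub collapse_low_support {
    my ( $node, $min_support ) = @_;
    my @kept;
    for my $child ( @{ $node->{children} || [] } ) {
        collapse_low_support( $child, $min_support );    # subtree first
        my $is_internal = @{ $child->{children} || [] };
        if (   $is_internal
            && defined $child->{support}
            && $child->{support} < $min_support )
        {
            push @kept, @{ $child->{children} };    # promote grandchildren
        }
        else {
            push @kept, $child;
        }
    }
    $node->{children} = \@kept;
    return $node;
}

# Minimal Newick-style string, with support printed after internal nodes.
sub to_newick {
    my ($node) = @_;
    my @kids = @{ $node->{children} || [] };
    return $node->{name} unless @kids;
    my $label = defined $node->{support} ? $node->{support} : '';
    return '(' . join( ',', map { to_newick($_) } @kids ) . ')' . $label;
}

my $tree = {
    children => [
        { name => 'A' },
        {
            support  => 55,
            children => [
                { name => 'B' },
                { support => 90, children => [ { name => 'C' }, { name => 'D' } ] },
            ],
        },
    ],
};

print to_newick($tree), ";\n";                                  # (A,(B,(C,D)90)55);
print to_newick( collapse_low_support( $tree, 70 ) ), ";\n";    # (A,B,(C,D)90);
```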

Quoting Jason Stajich <jason.stajich at duke.edu>:

> you need to special case the root - it won't have an ancestor.  just
> protect the my $parent = $node->ancestor with an if statement as I
> did below
>
> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>
> > Hi
> > OK that was silly, but what I have in my code is what you just wrote
> > But the problem is that if I write
> >
> > $parent->add_Descendent($child)
> >
> > it tells me that I am calling  the method "ass_Descendent" on an
> > undefined value
> > (but I did define $parent before??)
> >
> > So here it goes the code so far:
> >
> > use Bio::TreeIO;
> >  my $in = new Bio::TreeIO(-file => 'Test2.tre',
> >                           -format => 'newick');
> >  my $out = new Bio::TreeIO(-file => '>mytree.out',
> >                            -format => 'newick');
> >  while( my $tree = $in->next_tree ) {
> >     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
> >     my $bootstrap=$node->_creation_id;
> >
> >     if ($bootstrap < 70 ){
> >    >>> if(        my $parent = $node->ancestor ) {
> >               my @children=$node->get_all_Descendents;
> >               foreach my $child (@children){
> >                  $parent->add_Descendent($child);
> >               }
>          }
> >
> > ........
> >
> > eventually I'll add (once I assigned the children to the parent
> > succesfully):
> > $tree->remove_Node($node);
> >
> >         }
> >     }
> >     $out->write_tree($tree);
> > }
> >
> > Quoting aaron.j.mackey at gsk.com:
> >
> >>> foreach $child (@children){
> >>>          $parent=add_Descendent->$child;
> >>> }
> >>
> >> I think what you want is $parent->add_Descendent($child)
> >>
> >> -Aaron
> >>
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania

From sb at mrc-dunn.cam.ac.uk  Wed May 31 10:56:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 15:56:49 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>

Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more 
> likely that there is a well hidden bug caused by assigning accidentally undef 
> into an one element array that someone intentionally writing code that 
> expects that behaviour!
> 
> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
> and could not see any differences in the test output. 
> 
> Let's remove them!

Just looking for all return undef;s isn't enough. It's entirely possible 
to do something like:

my $return_value;
{
   # do something that assigns to return_value on success
   # on failure, just do nothing
}
return $return_value;

The bioperl docs will typically explicitly state that undef is returned, 
and under what circumstance. If a user suffers from the 
undef-into-array-problem, yes it can be slightly unexpected, but lots of 
unexpected things will happen when you don't use a method correctly, as 
per the docs!

Fixing the return of undef is either a job that shouldn't be done, or a 
much harder job than expected.
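
To make the pattern concrete, here is a sketch with a made-up lookup sub.
It never contains the text "return undef", so a search for that string
would miss it, yet a list-context caller still gets a one-element list
on failure:

```perl
use strict;
use warnings;

# Hypothetical lookup: returns the stored value, or undef implicitly
# through $return_value when the key is missing.
sub lookup {
    my ( $table, $key ) = @_;
    my $return_value;
    if ( exists $table->{$key} ) {
        $return_value = $table->{$key};
    }
    return $return_value;
}

my %table = ( EcoRI => 'GAATTC' );
my @hit   = lookup( \%table, 'EcoRI' );
my @miss  = lookup( \%table, 'NoSuchEnzyme' );

printf "hit:  %d element(s)\n", scalar @hit;
printf "miss: %d element(s), defined: %s\n",
    scalar @miss, ( defined $miss[0] ? 'yes' : 'no' );
```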

From bernd.web at gmail.com  Wed May 31 10:30:30 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 31 May 2006 16:30:30 +0200
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <447D94FE.8090305@jays.net> <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com>

Hi,

I am not sure to what extent bptutorial will be removed, but
I actually like having bptutorial.pl in my BioPerl base for reference.

regards,
Bernd

On 5/31/06, Brian Osborne <osborne1 at optonline.net> wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>
> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.
>
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
>
> Any thoughts on these?
>
>
> Brian O.
>
>
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the
> > wiki page via my web browser. (Is that proper procedure? Is the plan to just
> > do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> >
> > HTH,
> >
> > j

From lstein at cshl.edu  Wed May 31 12:03:13 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:03:13 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is 
documented to return a single scalar, then returning undef is appropriate. If 
the subroutine is documented to return "false" on failure, then one must call 
return (or "return ()" ).
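
A small sketch of why the single-scalar case matters (sub names are
invented for the example). When a call sits inside a list, such as a
hash initialiser, a bare return contributes nothing and the remaining
items shift position:

```perl
use strict;
use warnings;

sub name_or_undef { return undef; }    # documented to return one scalar
sub name_or_empty { return; }          # bare return

# Each call below is evaluated in list context inside the list.
my %good = ( name => name_or_undef(), id => 42 );
my @flat = ( name => name_or_empty(), id => 42 );

print join( ',', sort keys %good ), "\n";    # both keys survive
print scalar(@flat), "\n";                   # only 3 items: 'name', 'id', 42

# Assigning @flat to a hash would pair 'name' with 'id' and leave 42
# without a value (Perl warns about an odd number of elements).
```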

Changing all the return undefs to return is going to expose hidden bugs in the 
code written by people who are using BioPerl. While I agree wholeheartedly 
with the proposed audit, I think we need to expect that people are going to 
complain.

Lincoln


On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well hidden bug caused by accidentally assigning
> undef into a one-element array than someone intentionally writing code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > Agreed, though I think these changes should be implemented at some point
> > (Conway's argument here makes sense and it is nice for Torsten to check
> > this out).  If proper tests are written then any changes resulting in
> > errors should be picked up by checking the appropriate test suite, though
> > I know it doesn't absolutely guarantee it.  ; P
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > "returnundef"
> > >
> > > Although I agree with the sentiment of following PBP, I'm not so sure
> > > changing 'return undef' to 'return' *now* will fix any bugs without
> > > introducing new, subtle ones.
> > >
> > > Chris Fields wrote:
> > > > Torsten,
> > > >
> > > > Any way you can post a list of some/all of the offending lines or
> > > > modules?
> > > >
> > > > Sounds like something to consider, but if the list is as large as you
> > > > say we may need something (bugzilla? wiki?) to track the changes and
> > > > make sure they pass tests; I'm sure a large majority will.
> > > >
> > > > I'm guessing Jason would want this somewhere on the project priority
> > > > list or bugzilla, with a link to the actual list, but I'm not sure.
> > > > Maybe start a page on the wiki for proposed code changes?
> > > >
> > > > Chris
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > >> To: bioperl-l at lists.open-bio.org
> > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > >> "returnundef"
> > > >>
> > > >> FYI Bioperl developers:
> > > >>
> > > >> I just audited the bioperl-live CVS and found about 450 occurrences
> > > >> of "return undef".
> > > >>
> > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > >> suggest:
> > > >>
> > > >> "Use return; instead of return undef; if you want to return nothing.
> > > >> If someone assigns the return value to an array, the latter creates
> > > >> an array of one value (undef), which evaluates to true. The former
> > > >> will correctly handle all contexts."
> > > >>
> > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > >> result in bugs and should probably be changed.
> > > >>
> > > >> Your opinion may differ :-)
> > > >>
> > > >> --
> > > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > > >>
> > >
> > > --
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Rutger Vos, PhD. candidate
> > > Department of Biological Sciences
> > > Simon Fraser University
> > > 8888 University Drive
> > > Burnaby, BC, V5A1S6
> > > Phone: 604-291-5625
> > > Fax: 604-291-3496
> > > Personal site: http://www.sfu.ca/~rvosa
> > > FAB* lab: http://www.sfu.ca/~fabstar
> > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Wed May 31 12:34:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:34:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine>

Brian, Jay,

I think it would be nice to have the tutorial prominently displayed somehow
(Jay's suggestion), with a link provided via the tutorials page.  Hopefully
this will help with the bioperl newbies.

Jay, looks like there are still some weird formatting issues with the
bptutorial wiki page, something which I ran into before when getting the
Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
spaces preceding a line denotes code for some reason).  Not much you can do
in these cases except remove the extra spaces in those spots.  Looking good
though!  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Wednesday, May 31, 2006 8:58 AM
> To: Jay Hannah; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Jay,
> 
> Excellent! Now we need to answer a few more questions for ourselves:
> 
> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
> we don't want to have to maintain two bptutorials.
> 
> - What do we do with the script part of bptutorial.pl? It certainly could
> be excised and put into the examples/ directory, for example, but this
> would break a few of the paths that are being used.
> 
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
> 
> Any thoughts on these?
> 
> 
> Brian O.
> 
> 
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> 
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> > into the wiki page via my web browser. (Is that proper procedure? Is the
> > plan to just do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called
> > "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on
> > bioperl.org.
> >
> > HTH,
> >
> > j


From cjfields at uiuc.edu  Wed May 31 12:44:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:44:31 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine>

My feeling is the test suite 'should' pick up a large majority of problems
if changes are made to these lines, the quotes there indicating the utopian
idea that the tests are all written well (I believe 99% of the tests are,
BTW).  You can always try the changes (wholesale or on smaller chunks of
code), see if they pass tests on different OS's using 'make/nmake test',
revert the ones that didn't pass, etc.  It's a matter of someone willing to
try it out.

I think the original argument proposed here (originating from Damian Conway
and 'Perl Best Practices') is that using 'return undef' is something we
shouldn't be doing, since it can itself lead to subtle errors.  Not that
everything we do is considered 'a good practice' by any means.  If I
remember correctly from 'OOPerl', Conway doesn't like combined get/setters
either (he prefers separate getters and setters); we use the 'bad' combined
version predominantly in Bioperl.
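
For anyone who hasn't seen the two accessor styles side by side, here is a
rough sketch. The class and method names are made up for illustration and
are not taken from any Bioperl module:

```perl
use strict;
use warnings;

package Demo::Feature;   # hypothetical class, not part of Bioperl

sub new { my ($class) = @_; return bless {}, $class }

# Combined get/setter: sets when given an argument, always returns the value.
sub name {
    my $self = shift;
    $self->{name} = shift if @_;
    return $self->{name};
}

# Separate getter and setter: each method does exactly one thing.
sub get_score { my ($self) = @_; return $self->{score} }
sub set_score { my ($self, $value) = @_; $self->{score} = $value; return }

package main;

my $f = Demo::Feature->new;
$f->name('exon1');       # combined style: same method sets...
$f->set_score(42);       # ...separate style: dedicated setter
print $f->name, " ", $f->get_score, "\n";
```

With the combined style a call like $f->name(undef) is ambiguous (set to
undef, or just get?), which is one argument for keeping the two apart.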

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 11:03 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> I'm afraid that everything depends on the context. If the subroutine is
> documented to return a single scalar, then returning undef is appropriate.
> If the subroutine is documented to return "false" on failure, then one must
> call return (or "return ()" ).
> 
> Changing all the return undefs to return is going to expose hidden bugs in
> the code written by people who are using BioPerl. While I agree
> wholeheartedly with the proposed audit, I think we need to expect that
> people are going to complain.
> 
> Lincoln
> 
> 
> On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much more
> > likely that there is a well hidden bug caused by accidentally assigning
> > undef into a one-element array than someone intentionally writing code
> > that expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> > code and could not see any differences in the test output.
> >
> > Let's remove them!
> >
> > 	-Heikki
> >
> > On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > > Agreed, though I think these changes should be implemented at some point
> > > (Conway's argument here makes sense and it is nice for Torsten to check
> > > this out).  If proper tests are written then any changes resulting in
> > > errors should be picked up by checking the appropriate test suite, though
> > > I know it doesn't absolutely guarantee it.  ; P
> > >
> > > Chris
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > > "returnundef"
> > > >
> > > > Although I agree with the sentiment of following PBP, I'm not so sure
> > > > changing 'return undef' to 'return' *now* will fix any bugs without
> > > > introducing new, subtle ones.
> > > >
> > > > Chris Fields wrote:
> > > > > Torsten,
> > > > >
> > > > > Any way you can post a list of some/all of the offending lines or
> > > > > modules?
> > > > >
> > > > > Sounds like something to consider, but if the list is as large as
> > > > > you say we may need something (bugzilla? wiki?) to track the changes
> > > > > and make sure they pass tests; I'm sure a large majority will.
> > > > >
> > > > > I'm guessing Jason would want this somewhere on the project priority
> > > > > list or bugzilla, with a link to the actual list, but I'm not sure.
> > > > > Maybe start a page on the wiki for proposed code changes?
> > > > >
> > > > > Chris
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > > >> To: bioperl-l at lists.open-bio.org
> > > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > > >> "returnundef"
> > > > >>
> > > > >> FYI Bioperl developers:
> > > > >>
> > > > >> I just audited the bioperl-live CVS and found about 450 occurrences
> > > > >> of "return undef".
> > > > >>
> > > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > > >> suggest:
> > > > >>
> > > > >> "Use return; instead of return undef; if you want to return nothing.
> > > > >> If someone assigns the return value to an array, the latter creates
> > > > >> an array of one value (undef), which evaluates to true. The former
> > > > >> will correctly handle all contexts."
> > > > >>
> > > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > > >> result in bugs and should probably be changed.
> > > > >>
> > > > >> Your opinion may differ :-)
> > > > >>
> > > > >> --
> > > > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > > > >>
> > > >
> > > > --
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Rutger Vos, PhD. candidate
> > > > Department of Biological Sciences
> > > > Simon Fraser University
> > > > 8888 University Drive
> > > > Burnaby, BC, V5A1S6
> > > > Phone: 604-291-5625
> > > > Fax: 604-291-3496
> > > > Personal site: http://www.sfu.ca/~rvosa
> > > > FAB* lab: http://www.sfu.ca/~fabstar
> > > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >
> > > >
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 10:59:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 10:59:53 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net>

I agree. Thanks to Torsten for the audit and Chris for stepping up.

	-hilmar

On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote:

> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well hidden bug caused by accidentally assigning
> undef into a one-element array than someone intentionally writing code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
>> Agreed, though I think these changes should be implemented at some point
>> (Conway's argument here makes sense and it is nice for Torsten to check
>> this out).  If proper tests are written then any changes resulting in
>> errors should be picked up by checking the appropriate test suite, though
>> I know it doesn't absolutely guarantee it.  ; P
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
>>> Sent: Tuesday, May 30, 2006 1:53 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
>>> "returnundef"
>>>
>>> Although I agree with the sentiment of following PBP, I'm not so sure
>>> changing 'return undef' to 'return' *now* will fix any bugs without
>>> introducing new, subtle ones.
>>>
>>> Chris Fields wrote:
>>>> Torsten,
>>>>
>>>> Any way you can post a list of some/all of the offending lines or
>>>> modules?
>>>>
>>>> Sounds like something to consider, but if the list is as large as you
>>>> say we may need something (bugzilla? wiki?) to track the changes and make
>>>> sure they pass tests; I'm sure a large majority will.
>>>>
>>>> I'm guessing Jason would want this somewhere on the project priority
>>>> list or bugzilla, with a link to the actual list, but I'm not sure.  Maybe
>>>> start a page on the wiki for proposed code changes?
>>>>
>>>> Chris
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>>>>> Sent: Tuesday, May 30, 2006 3:19 AM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>>>>> "returnundef"
>>>>>
>>>>> FYI Bioperl developers:
>>>>>
>>>>> I just audited the bioperl-live CVS and found about 450 occurrences of
>>>>> "return undef".
>>>>>
>>>>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>>>>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
>>>>> suggest:
>>>>>
>>>>> "Use return; instead of return undef; if you want to return nothing.
>>>>> If someone assigns the return value to an array, the latter creates an
>>>>> array of one value (undef), which evaluates to true. The former will
>>>>> correctly handle all contexts."
>>>>>
>>>>> So I'm guessing at least some of these 450 occurrences *could* result
>>>>> in bugs and should probably be changed.
>>>>>
>>>>> Your opinion may differ :-)
>>>>>
>>>>> --
>>>>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>>>>> Victorian Bioinformatics Consortium, Monash University, Australia
>>>>>
>>>
>>> --
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Rutger Vos, PhD. candidate
>>> Department of Biological Sciences
>>> Simon Fraser University
>>> 8888 University Drive
>>> Burnaby, BC, V5A1S6
>>> Phone: 604-291-5625
>>> Fax: 604-291-3496
>>> Personal site: http://www.sfu.ca/~rvosa
>>> FAB* lab: http://www.sfu.ca/~fabstar
>>> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of the Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 14:08:43 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:08:43 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
	<200605311203.13922.lstein@cshl.edu>
Message-ID: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>


On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:

> If the subroutine is documented to return "false" on failure, then  
> one must call
> return (or "return ()" ).

The problem seems to be that 'a value that evaluates to either true
or false' and 'a [meaningful] value or undef' and 'a value or
false' ('a value or no value') are not the same in perl. And what
would/should one expect if the doc states 'true on success and false
otherwise'?

Maybe the documentation should also be fixed to avoid any ambiguity.  
I.e., avoid documenting 'a value or false' because it may be  
ambiguous (not only) to the less proficient. 'True or false' should  
imply a value being returned.

Comments?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lstein at cshl.edu  Wed May 31 14:14:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:14:59 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
Message-ID: <200605311415.00414.lstein@cshl.edu>

If the documentation says "returns false" then I expect to be able to do this:

	@result = foo();
	die "foo() failed" unless @result;

If the documentation says "returns undef" then I expect this:

	@result = foo();
	die "foo() failed" unless $result[0];

Lincoln


On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > If the subroutine is documented to return "false" on failure, then
> > one must call
> > return (or "return ()" ).
>
> The problem seems to be that 'a value that evaluates to either true
> or false' and 'a [meaningful] value or undef' and 'a value or
> false' ('a value or no value') are not the same in perl. And what
> would/should one expect if the doc states 'true on success and false
> otherwise'?
>
> Maybe the documentation should also be fixed to avoid any ambiguity.
> I.e., avoid documenting 'a value or false' because it may be
> ambiguous (not only) to the less proficient. 'True or false' should
> imply a value being returned.
>
> Comments?
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From hlapp at gmx.net  Wed May 31 14:31:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:31:21 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
	<200605311415.00414.lstein@cshl.edu>
Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net>


On May 31, 2006, at 2:14 PM, Lincoln Stein wrote:

> If the documentation says "returns false" then I expect to be able  
> to do this:
>
> 	@result = foo();
> 	die "foo() failed" unless @result;

Except if the alternative to 'false' would be a scalar, you normally  
wouldn't assign it to an array, would you?

I.e., I wouldn't expect this strict a behavior from an open-source
package written largely by people whose job is biological science,
not programming perl while knowing and following DC to the letter ... I'd
rather be on the safe side and assign to a scalar.

Just my $0.02 ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 14:50:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 13:50:30 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, May 31, 2006 9:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much more
> > likely that there is a well hidden bug caused by accidentally assigning
> > undef into a one-element array than someone intentionally writing code
> > that expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> > code and could not see any differences in the test output.
> >
> > Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Agreed, though looking for these is obviously much harder.  

The way to get around those is:

return $return_value if $return_value;
return;

which I've seen used in a number of get/set methods. 
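
One caveat with that idiom, sketched below with a made-up helper name:
testing for truth also throws away legitimate false values such as 0 or the
empty string. Where those can occur, testing with defined may be the safer
variant:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns the value if it is
# defined, otherwise an empty list (undef in scalar context).
sub value_or_nothing {
    my ($return_value) = @_;
    return $return_value if defined $return_value;
    return;
}

my @zero = value_or_nothing(0);       # 0 is false but still a real value
my @none = value_or_nothing(undef);   # nothing to return

print "0 gives ",     scalar(@zero), " element(s)\n";
print "undef gives ", scalar(@none), " element(s)\n";
```

Whether a false value counts as "something" depends on the method, so this
is a judgment call per accessor rather than a blanket replacement.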

> The bioperl docs will typically explicitly state that undef is returned,
> and under what circumstance. If a user suffers from the
> undef-into-array-problem, yes it can be slightly unexpected, but lots of
> unexpected things will happen when you don't use a method correctly, as
> per the docs!

Right, but the argument you make is that code will always work as expected
from the perldoc examples.  My recent experiences with the
Bio::Restriction::IO and Bio::Species classes show that the docs are not
always up-to-date and may indicate the unimplemented intent of the author
more than the actual implementation.  Again, I believe a large majority of
the docs are fine, but it's those few errors that made a devil's advocate of
me...

> Fixing the return of undef is either a job that shouldn't be done, or a
> much harder job than expected.

I don't think ignoring the problem is the best answer here, though I agree
the problem is more complicated than it appears at first glance.  Judging from
code I've trawled through a bit lately, I've seen a lot of methods (mainly
get/setters) that are essentially copied multiple times in the same or across
similar modules to save time.  You could see a scenario where, in those
instances, so-called 'bad code' would spread quite quickly.

I think adding a wiki page to address some of these issues would be nice,
something separate from the Project Priority List.

Chris


From forward at hongyu.org  Wed May 31 14:03:46 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Wed, 31 May 2006 11:03:46 -0700
Subject: [Bioperl-l] New functions for SimpleAlign.pm
Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org>

Greetings,

I am a new member in this mailing list. Nice to be here.

I wrote two more functions for the alignment module SimpleAlign.pm  
that calculate the percentage of identity based on the shortest and  
longest sequence length, respectively. I also found an error in the
no_residues() function that calculates the number of residues in the
alignment.
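
To make the idea concrete, here is a rough pairwise sketch in plain Perl.
The sub names, the '-' gap character and the exact definition (identical
aligned positions divided by the ungapped length of the shorter or the
longer sequence) are assumptions for illustration; this is not the code
proposed for SimpleAlign.pm:

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::SimpleAlign.

# Length of a sequence once '-' gap characters are removed.
sub ungapped_length {
    my ($seq) = @_;
    (my $copy = $seq) =~ s/-//g;
    return length($copy);
}

# Takes two aligned strings of equal length; returns percent identity
# relative to the shorter and to the longer ungapped sequence.
sub pairwise_identity {
    my ($s1, $s2) = @_;
    die "aligned sequences must have equal length\n"
        unless length($s1) == length($s2);

    my $identical = 0;
    for my $i (0 .. length($s1) - 1) {
        my $c1 = uc(substr($s1, $i, 1));
        my $c2 = uc(substr($s2, $i, 1));
        $identical++ if $c1 ne '-' && $c1 eq $c2;   # gap columns never count
    }

    my ($short, $long) = sort { $a <=> $b }
        (ungapped_length($s1), ungapped_length($s2));
    die "cannot compute identity for an empty sequence\n" unless $short;

    return (100 * $identical / $short, 100 * $identical / $long);
}

my ($by_short, $by_long) = pairwise_identity('ACGT--ACG', 'ACGTTTACG');
printf "%.1f%% (shortest), %.1f%% (longest)\n", $by_short, $by_long;
```

The two numbers differ whenever one sequence is shorter than the other,
which is the case the alignment-length-based figure hides.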

I am wondering whether they can be added to the official bioperl  
package. I've contacted the original author of this module, Heikki  
Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.

Thanks.

-- 
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.


From cjfields at uiuc.edu  Wed May 31 15:39:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 14:39:26 -0500
Subject: [Bioperl-l] New functions for SimpleAlign.pm
In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org>
Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine>

I added a bit to the FAQ about this:

http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F

and the HOWTO explains things a bit more directly:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

In brief, these need to be submitted to Bugzilla as either code enhancements
(for your added methods) or bugs with the patch to the relevant code.  Code
enhancements probably should include some code and test cases to demonstrate
usage.  Patches to buggy code are checked to make sure they pass relevant
tests by the core developers.  Submitting it to the mail list is definitely
the first step, though, so you're on the right path.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hongyu Zhang
> Sent: Wednesday, May 31, 2006 1:04 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] New functions for SimpleAlign.pm
> 
> Greetings,
> 
> I am a new member in this mailing list. Nice to be here.
> 
> I wrote two more functions for the alignment module SimpleAlign.pm
> that calculate the percentage of identity based on the shortest and
> longest sequence length, respectively. I also found an error in the
> no_residues() function that calculates the number of residues in the
> alignment.
> 
> I am wondering whether they can be added to the official bioperl
> package. I've contacted the original author of this module, Heikki
> Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.
> 
> Thanks.
> 
> --
> Hongyu Zhang, Ph.D.
> Computational biologist
> Ceres Inc.
> 


From cjfields at uiuc.edu  Wed May 31 16:40:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 15:40:19 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine>

What about modules that have 'throw_not_implemented' statements present?
Here's a list with the total for each.  Some of these are interfaces (I got
rid of a number that ended in 'I' or 'IO' to remove the I/IO interfaces, but
that filter misses a few).  There are a number here that are implementations,
though (Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically
incomplete:

Instances: 1	Module : Bio::AlignIO::maf
Instances: 25	Module : Bio::Assembly::Contig
Instances: 2	Module : Bio::Assembly::ContigAnalysis
Instances: 2	Module : Bio::Biblio::BiblioBase
Instances: 4	Module : Bio::DB::Expression
Instances: 2	Module : Bio::DB::Expression::geo
Instances: 5	Module : Bio::DB::Flat
Instances: 2	Module : Bio::DB::Query::WebQuery
Instances: 17	Module : Bio::DB::SeqFeature::Store
Instances: 2	Module : Bio::DB::SeqVersion
Instances: 3	Module : Bio::DB::Taxonomy
Instances: 1	Module : Bio::FeatureIO::bed
Instances: 1	Module : Bio::Map::Marker
Instances: 1	Module : Bio::MapIO::fpc
Instances: 1	Module : Bio::MapIO::mapmaker
Instances: 1	Module : Bio::Restriction::IO::bairoch
Instances: 1	Module : Bio::Restriction::IO::itype2
Instances: 1	Module : Bio::Restriction::IO::withrefm
Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances: 3	Module : Bio::Tools::Run::WrapperBase
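
For anyone who wants to reproduce a tally like this, here is a rough sketch
of how such a count could be done. It is not the script that produced the
list above; the function name count_not_implemented and the crude "skip
files ending in I.pm or IO.pm" filter are assumptions made for illustration,
and it counts matching lines rather than individual statements.

```perl
use strict;
use warnings;
use File::Find;

# Count lines mentioning 'throw_not_implemented' per module under $dir.
# Files named *I.pm or *IO.pm are skipped as probable interfaces
# (a crude filter that will miss some interfaces).
sub count_not_implemented {
    my ($dir) = @_;
    my %count;
    find(
        sub {
            return unless /\.pm$/ && !/(?:I|IO)\.pm$/;
            open my $fh, '<', $_ or return;
            my $n = grep { /throw_not_implemented/ } <$fh>;
            close $fh;
            return unless $n;
            # Turn the path below $dir into a module name.
            (my $module = $File::Find::name) =~ s{^\Q$dir\E/?}{};
            $module =~ s{/}{::}g;
            $module =~ s{\.pm$}{};
            $count{$module} = $n;
        },
        $dir
    );
    return \%count;
}

# Only scan when a directory is given on the command line.
if (@ARGV) {
    my $count = count_not_implemented($ARGV[0]);
    printf "Instances: %d\tModule : %s\n", $count->{$_}, $_
        for sort keys %$count;
}
```

Run it with the directory that contains Bio/ as its only argument, so the
path below that directory maps directly onto the module name.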

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 1:15 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> If the documentation says "returns false" then I expect to be able to do
> this:
> 
> 	@result = foo();
> 	die "foo() failed" unless @result;
> 
> If the documentation says "returns undef" then I expect this:
> 
> 	@result = foo();
> 	die "foo() failed" unless $result[0];
> 
> Lincoln
> 
> 
> On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > If the subroutine is documented to return "false" on failure, then
> > > one must call
> > > return (or "return ()" ).
> >
> > The problem seems to be that 'a value that evaluates to either true
> > or false' and 'a [meaningful] value or undef' and 'a value or
> > false' ('a value or no value) are not the same in perl. And what
> > would/should one expect if the doc states 'true on success and false
> > otherwise'?
> >
> > Maybe the documentation should also be fixed to avoid any ambiguity.
> > I.e., avoid documenting 'a value or false' because it may be
> > ambiguous (not only) to the less proficient. 'True or false' should
> > imply a value being returned.
> >
> > Comments?
> >
> > 	-hilmar
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lstein at cshl.edu  Wed May 31 17:07:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 17:07:06 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <200605311707.08196.lstein@cshl.edu>


> Instances: 17	Module : Bio::DB::SeqFeature::Store

This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual 
base class. The throw_not_implemented() calls are there to force developers 
to override the needed interface methods.
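
A minimal sketch of that pattern in plain Perl, for readers who have not seen
it. The package names (My::Store, My::Store::Memory) and the local
throw_not_implemented are stand-ins invented for illustration; this is not
the Bio::DB::SeqFeature::Store code, which gets its throw_not_implemented
from the BioPerl root classes.

```perl
use strict;
use warnings;

package My::Store;    # hypothetical abstract base class

sub new { my $class = shift; return bless {}, $class }

# Stand-in for BioPerl's throw_not_implemented: die, naming the
# method that the subclass failed to override.
sub throw_not_implemented {
    my $self   = shift;
    my $method = (caller(1))[3];
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface method: subclasses are expected to override this.
sub fetch { shift->throw_not_implemented }

package My::Store::Memory;    # hypothetical concrete subclass
our @ISA = ('My::Store');

sub fetch { my ($self, $id) = @_; return "feature $id" }

package main;

my $ok  = My::Store::Memory->new->fetch(42);    # overridden, so it works
my $err = eval { My::Store->new->fetch(42); 1 } ? '' : $@;
print "concrete: $ok\n";
print "abstract: $err";
```

The base class stays usable as documentation of the interface, while any
subclass that forgets a method fails loudly at the first call instead of
silently returning nothing.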

If this is not the right way to do it, let me know and I'll fix it.

Lincoln


> Instances: 2	Module : Bio::DB::SeqVersion
> Instances: 3	Module : Bio::DB::Taxonomy
> Instances: 1	Module : Bio::FeatureIO::bed
> Instances: 1	Module : Bio::Map::Marker
> Instances: 1	Module : Bio::MapIO::fpc
> Instances: 1	Module : Bio::MapIO::mapmaker
> Instances: 1	Module : Bio::Restriction::IO::bairoch
> Instances: 1	Module : Bio::Restriction::IO::itype2
> Instances: 1	Module : Bio::Restriction::IO::withrefm
> Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
> Instances: 3	Module : Bio::Tools::Run::WrapperBase
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> > Sent: Wednesday, May 31, 2006 1:15 PM
> > To: Hilmar Lapp
> > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> > Subject: Re: [Bioperl-l] For CVS developers - potential
> > pitfallwith"returnundef"
> >
> > If the documentation says "returns false" then I expect to be able to do
> > this:
> >
> > 	@result = foo();
> > 	die "foo() failed" unless @result;
> >
> > If the documentation says "returns undef" then I expect this:
> >
> > 	@result = foo();
> > 	die "foo() failed" unless $result[0];
> >
> > Lincoln
> >
> > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > > If the subroutine is documented to return "false" on failure, then
> > > > one must call
> > > > return (or "return ()" ).
> > >
> > > The problem seems to be that 'a value that evaluates to either true
> > > or false' and 'a [meaningful] value or undef' and 'a value or
> > > false' ('a value or no value) are not the same in perl. And what
> > > would/should one expect if the doc states 'true on success and false
> > > otherwise'?
> > >
> > > Maybe the documentation should also be fixed to avoid any ambiguity.
> > > I.e., avoid documenting 'a value or false' because it may be
> > > ambiguous (not only) to the less proficient. 'True or false' should
> > > imply a value being returned.
> > >
> > > Comments?
> > >
> > > 	-hilmar
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From hlapp at gmx.net  Wed May 31 17:21:57 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:21:57 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>


On May 31, 2006, at 4:40 PM, Chris Fields wrote:

> What about modules that have 'throw_not_implemented' statements  
> present?

Those are often if not always legitimate - the problem is those that
don't have them but fail to override an inherited interface or
abstract method.

If something is not implemented, what better way is there to express
this than throwing an exception? (and if it's not an interface
or abstract base class, saying so in the documentation)
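
One rough way to find classes that fail to override an inherited method,
sketched in plain Perl: compare what can() resolves to in the subclass and in
the base class. The package names and the helper unimplemented_methods below
are invented for illustration and are not BioPerl modules.

```perl
use strict;
use warnings;

package My::IO;    # hypothetical base class with abstract methods
sub read  { die "read() not implemented\n" }
sub write { die "write() not implemented\n" }

package My::IO::readonly;    # hypothetical plugin: overrides read() only
our @ISA = ('My::IO');
sub read { return 'parsed data' }

package main;

# A method counts as overridden when the subclass resolves it to a
# different code reference than the base class does.
sub unimplemented_methods {
    my ($class, $base, @methods) = @_;
    return grep { $class->can($_) == $base->can($_) } @methods;
}

my @missing = unimplemented_methods('My::IO::readonly', 'My::IO', qw(read write));
print "not overridden: @missing\n";    # not overridden: write
```

This only works when the list of interface methods is known up front, which
is the same limitation as not knowing which classes are meant to be abstract.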

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 17:25:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:25:48 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net>


On May 31, 2006, at 2:50 PM, Chris Fields wrote:

> I've seen a lot of methods (mainly get/setters)
> that are essentially copied multiple times in the same or across  
> similar
> modules to save time.  You could see a scenario where, in those  
> instances,
> so-called 'bad code' would spread quite quickly.

This will usually be code generated by macros, e.g. the emacs macros  
for getter/setter generation for properties.

If the macro generates wrong code, that's indeed pretty bad. (We've  
had that.) OTOH it should be spotted quickly as well. And macro  
changes or new macros should probably be scrutinized by all eyes  
watching ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 17:40:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 16:40:22 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>
Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine>

I think, as long as it's reflected in the docs that something doesn't work
(hasn't been implemented) then there's no problem.  It's when the docs are
misleading that we run into problems.  

The sticking point lies with some classes, such as IO classes (like SeqIO,
or Restriction::IO, with read and write methods) where the IO base class
specifies that it is possible to read and write a particular format but the
actual implementation varies according to whether or not the derived class
overrides the base or interface method (in other words, 'doesn't work as
advertised' only in specific circumstances).  I don't know how to solve this
issue except to add in the docs that specific formats don't implement
write() methods.  

Personally, I haven't had an issue with it and it probably makes no
difference, but I think it needs to be pointed out.  The most extreme I ran
into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that
didn't implement the write() method but left this in the synopsis in POD:

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

  # or

  #    use Bio::Restriction::IO;
  #
  #    #input file format can be read from the file extension (dat|xml)
  #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
  #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
  #
  #    # World's shortest flat<->xml format converter:
  #    print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO
classes!  Bio::AlignIO has a few such cases as well (the maf and Stockholm
formats don't write).

Chris


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, May 31, 2006 4:22 PM
> To: Chris Fields
> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> 
> > What about modules that have 'throw_not_implemented' statements
> > present?
> 
> Those are often if not always legitimate - the problem are those that
> don't have them but fail to override an inherited interface or
> abstract method.
> 
> If something is not implemented what is the better way to express
> this other than throwing an exception? (and if it's not an interface
> or abstract base class, saying so in the documentation)
> 
> 	-hilmar
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From hlapp at gmx.net  Wed May 31 17:55:37 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:55:37 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine>
References: <002401c684fa$d28e7640$15327e82@pyrimidine>
Message-ID: <CB29173C-0BFC-43CA-A620-519084AFEE04@gmx.net>

This is documentation cruft resulting from copy&paste w/o later  
fixing it. (which isn't a justification)

Note that not implementing the write is as legitimate as not  
implementing the read method ... It should be pointed out in the  
documentation though that it will depend on the actual implementation  
of the format whether it supports reading or writing or both.

	-hilmar

On May 31, 2006, at 5:40 PM, Chris Fields wrote:

> I think, as long as it's reflected in the docs that something  
> doesn't work
> (hasn't been implemented) then there's no problem.  It's when the  
> docs are
> misleading that we run into problems.
>
> The sticking point lies with some classes, such as IO classes (like  
> SeqIO,
> or Restrict::IO, with read and write methods) where the IO base class
> specifies that it is possible to read and write a particular format  
> but the
> actual implementation varies according to whether or not the  
> derived class
> overrides the base or interface method (in other words, 'doesn't  
> work as
> advertised' only in specific circumstances).  I don't know how to  
> solve this
> issue except to add in the docs that specific formats don't implement
> write() methods.
>
> Personally, I haven't had an issue with it and it probably makes no
> difference, but I think it needs to be pointed out.  The most  
> extreme I ran
> into was Bio::Restriction::IO, which had 3 out of 4 plugin modules  
> that
> didn't implement the write() method but left this in the synopsis  
> in POD:
>
>     use Bio::Restriction::IO;
>
>     $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                      -format => 'withrefm');
>     $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                      -format => 'bairoch');
>     my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>     $out->write($res);
>
>   # or
>
>   #    use Bio::Restriction::IO;
>   #
>   #    #input file format can be read from the file extension (dat|xml)
>   #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>   #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>   #
>   #    # World's shortest flat<->xml format converter:
>   #    print $out $_ while <$in>;
>
> None of this code works; in fact, no XML parser even exists for  
> these IO
> classes!  Bio::AlignIO also has a few as well (maf and Stockholm  
> formats
> don't write).
>
> Chris
>
>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Wednesday, May 31, 2006 4:22 PM
>> To: Chris Fields
>> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki  
>> Lehvaslaiho'
>> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>>
>>
>> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
>>
>>> What about modules that have 'throw_not_implemented' statements
>>> present?
>>
>> Those are often if not always legitimate - the problem are those that
>> don't have them but fail to override an inherited interface or
>> abstract method.
>>
>> If something is not implemented what is the better way to express
>> this other than throwing an exception? (and if it's not an interface
>> or abstract base class, saying so in the documentation)
>>
>> 	-hilmar
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From slenk at emich.edu  Wed May 31 17:52:13 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed, 31 May 2006 17:52:13 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
Message-ID: <100682f110067a83.10067a83100682f1@emich.edu>


Isn't it fairly standard in OO schemes/languages to have an exception thrown
if a method can't be found at the end of a search up the class hierarchy? I
recall being very mad at Smalltalk because "method not found" kept biting me.
C++ has pure virtual base classes that do not allow objects to be
instantiated directly; they are meant to be inherited and then implemented.

Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be?
Do the Bioperl people feed their needs into Perl 6 so that all the code
effort to make Bio::Root is handled for them in the next effort by Perl 6
itself? Make the Perl 6 people solve these issues with your input, then you
will not have to deal with implementing it yourselves. I'll just bet that you
are not the only potential users of Perl 6 who will have to solve these
issues eventually.


----- Original Message -----
From: Hilmar Lapp <hlapp at gmx.net>
Date: Wednesday, May 31, 2006 5:21 pm
Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented

> 
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> 
> > What about modules that have 'throw_not_implemented' statements  
> > present?
> 
> Those are often if not always legitimate - the problem are those 
> that  
> don't have them but fail to override an inherited interface or  
> abstract method.
> 
> If something is not implemented what is the better way to express  
> this other than throwing an exception? (and if it's not an 
> interface  
> or abstract base class, saying so in the documentation)
> 
> 	-hilmar
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

From arareko at campus.iztacala.unam.mx  Wed May 31 18:49:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 31 May 2006 17:49:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine>
References: <001201c684d0$263c5530$15327e82@pyrimidine>
Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx>

Brian, Jay, Chris,

I agree with what Bernd Web said in another reply. For some people it will 
be nice to still be able to run the script from the codebase and 
interact with it.

I don't think it should be much of a problem to maintain both tutorials, 
as long as the 'main' one is the one in the CVS tree. By reading what 
Jay did in order to convert it into mediawiki format, I suppose this can 
be easily done again for each new change to the script (again, this is 
just my guess). Besides, as far as I've seen, there aren't frequent 
commits to the script at all.
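
As a rough idea of what repeating that conversion might look like, here is a
small sketch built on Pod::Simple::Wiki with the 'mediawiki' format. It is
not Jay's actual procedure; the helper name pod_to_mediawiki and the sample
POD string are made up for illustration, and the module has to be installed
from CPAN first.

```perl
use strict;
use warnings;

# Convert a string of POD to MediaWiki markup.
# Needs the CPAN module Pod::Simple::Wiki (not part of core Perl).
sub pod_to_mediawiki {
    my ($pod) = @_;
    require Pod::Simple::Wiki;
    my $parser = Pod::Simple::Wiki->new('mediawiki');
    my $wiki   = '';
    $parser->output_string(\$wiki);          # collect the output in $wiki
    $parser->parse_string_document($pod);
    return $wiki;
}

# Illustrative input only; a real run would read the POD of bptutorial.pl.
my $pod = "=head1 NAME\n\nbptutorial - a BioPerl tutorial\n\n=cut\n";

# Only attempt the conversion if the module is actually available.
print pod_to_mediawiki($pod) if eval { require Pod::Simple::Wiki; 1 };
```

For a whole file, parse_file() can be used in place of
parse_string_document(). The result would still need to be pasted into the
wiki page by hand, and any formatting glitches checked by eye.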

I've added a link in the left menu of the wiki. If you think it should 
point to the Tutorials page instead of the Bptutorial.pl page please let 
me know.

Regards,
Mauricio.

Chris Fields wrote:
> Brian, Jay,
> 
> I think it would be nice to have the tutorial prominently displayed somehow
> (Jay's suggestion), with a link provided via the tutorials page.  Hopefully
> this will help with the bioperl newbies.
> 
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Wednesday, May 31, 2006 8:58 AM
>> To: Jay Hannah; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>>
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
>> we
>> don't want to have to maintain two bptutorials.
>>
>> - What do we do with the script part of bptutorial.pl? It certainly could
>> be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
>>
>> - A link to bptutorial? Or a link to the existing tutorials page?
>> http://www.bioperl.org/wiki/Tutorials.
>>
>> Any thoughts on these?
>>
>>
>> Brian O.
>>
>>
>> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>>
>>> http://www.bioperl.org/wiki/Bptutorial.pl
>>>
>>> I think I just partially fulfilled this TODO:
>>>
>>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
>>>
>>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
>>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
>> the
>>> wiki page via my web browser. (Is that proper procedure? Is the plan to
>> just
>>> do that manually from time to time as the document changes?)
>>>
>>> Now what?
>>>
>>> Should there be a new link on the far left of bioperl.org called
>> "Tutorial"?
>>> It's an amazing document. IMHO it should be listed prominently on
>> bioperl.org.
>>> HTH,
>>>
>>> j
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed May 31 20:43:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:43:48 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311707.08196.lstein@cshl.edu>
Message-ID: <002801c68514$72f11480$15327e82@pyrimidine>


> -----Original Message-----
> From: Lincoln Stein [mailto:lstein at cshl.edu]
> Sent: Wednesday, May 31, 2006 4:07 PM
> To: Chris Fields
> Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> 
> > Instances: 17	Module : Bio::DB::SeqFeature::Store
> 
> This is intentional. Bio::DB::SeqFeature::Store is intended to be a
> virtual
> base class. The throw_not_implemented() calls are there to force
> developers
> to override the needed interface methods.
> 
> If this is not the right way to do it, let me know and I'll fix it.

That looks like the right way to me, though I don't really know what the
'right way' is.  Sorry Lincoln, I didn't mean to direct anything at you
specifically; I responded to your last post to stay in the thread, so to
speak.  It was meant to be a general statement that some classes haven't
implemented methods specified by their abstract base or interface class.
This is just output from a quickie script I wrote up to check on this and
see how many of these statements are out there, and since there isn't a
foolproof method to know what an abstract base class is, it pulls in a few
abstract classes (such as yours) along with all the others.  At least there
aren't as many hits as Torsten's ~400-500 for 'return undef'! 
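
For anyone joining the thread late, the 'return undef' pitfall behind those
hits can be shown in a few lines of plain Perl. The two subroutine names are
invented for this sketch; the point is only how each form behaves when the
caller assigns the result to an array.

```perl
use strict;
use warnings;

sub fail_with_undef { return undef }    # one-element list (undef) in list context
sub fail_with_empty { return }          # empty list in list context, undef in scalar

my @a = fail_with_undef();
my @b = fail_with_empty();

# An array holding a single undef is still true in boolean context.
printf "return undef: %d element(s), looks like %s\n",
    scalar(@a), (@a ? 'success' : 'failure');
printf "bare return:  %d element(s), looks like %s\n",
    scalar(@b), (@b ? 'success' : 'failure');

# In scalar context both forms yield undef, so scalar callers see no difference.
my $s1 = fail_with_undef();
my $s2 = fail_with_empty();
```

So a caller that tests the array itself only sees the failure with the bare
return, while a caller that tests the first element sees it in both cases.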

Anyway, I'm not sure what would be the best place to address code problems
or issues like the unimplemented methods issue or Torsten's audits (list,
wiki, etc); it's a delicate issue b/c it's bordering on code critiquing and
what constitutes good vs. bad code.  I remember some pretty heated arguments
about the 'proper' way to do things a while back involving AUTOLOAD'ing
methods, which I think is summarized somewhere in the wiki.  Myself, I'm a
microbiologist and not a programmer, so I'm prone to bouts of hackery, but I
try to have the code at least do what the docs state.

Chris

> Lincoln
> 
> 
> > Instances: 2	Module : Bio::DB::SeqVersion
> > Instances: 3	Module : Bio::DB::Taxonomy
> > Instances: 1	Module : Bio::FeatureIO::bed
> > Instances: 1	Module : Bio::Map::Marker
> > Instances: 1	Module : Bio::MapIO::fpc
> > Instances: 1	Module : Bio::MapIO::mapmaker
> > Instances: 1	Module : Bio::Restriction::IO::bairoch
> > Instances: 1	Module : Bio::Restriction::IO::itype2
> > Instances: 1	Module : Bio::Restriction::IO::withrefm
> > Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
> > Instances: 3	Module : Bio::Tools::Run::WrapperBase
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> > > Sent: Wednesday, May 31, 2006 1:15 PM
> > > To: Hilmar Lapp
> > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> > > Subject: Re: [Bioperl-l] For CVS developers - potential
> > > pitfallwith"returnundef"
> > >
> > > If the documentation says "returns false" then I expect to be able to
> do
> > > this:
> > >
> > > 	@result = foo();
> > > 	die "foo() failed" unless @result;
> > >
> > > If the documentation says "returns undef" then I expect this:
> > >
> > > 	@result = foo();
> > > 	die "foo() failed" unless $result[0];
> > >
> > > Lincoln
> > >
> > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > > > If the subroutine is documented to return "false" on failure, then
> > > > > one must call
> > > > > return (or "return ()" ).
> > > >
> > > > The problem seems to be that 'a value that evaluates to either true
> > > > or false' and 'a [meaningful] value or undef' and 'a value or
> > > > false' ('a value or no value) are not the same in perl. And what
> > > > would/should one expect if the doc states 'true on success and false
> > > > otherwise'?
> > > >
> > > > Maybe the documentation should also be fixed to avoid any ambiguity.
> > > > I.e., avoid documenting 'a value or false' because it may be
> > > > ambiguous (not only) to the less proficient. 'True or false' should
> > > > imply a value being returned.
> > > >
> > > > Comments?
> > > >
> > > > 	-hilmar
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed May 31 20:56:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:56:12 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine>

Mauricio et al,

Sounds good, except that there are a few issues with the formatting done by
Pod::Simple::Wiki, such as changing some things to <code> tags when they
obviously aren't code; I don't know if there is a workaround for that
(Jay?).  It may not be anything too serious though.  

There was a similar issue with the INSTALL doc conversion to wiki that I ran
into, in that I don't think it will be easy converting one way or the other
(POD->wiki or wiki->POD or text), so syncing updates with wiki and CVS docs
could be an issue we'll have to face in the future.

We could strip the POD out of the script and have the docs on the wiki
(Brian's idea), or have minimal POD in the tutorial and keep the wiki
updated, just to simplify things, but this may not appeal to those who use
perldoc frequently (I personally use browsable prettified HTML).

cjf

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Wednesday, May 31, 2006 5:49 PM
> To: Chris Fields
> Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Brian, Jay, Chris,
> 
> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.
> 
> I don't think it should be a lot of problem to maintain both tutorials,
> as long as the 'main' one is the one in the CVS tree. By reading what
> Jay did in order to convert it into mediawiki format, I suppose this can
> be easily done again for each new change to the script (again, this is
> just my guessing). Besides, as far as I've seen, there aren't frequent
> commits to the script at all.
> 
> I've added a link in the left menu of the wiki. If you think it should
> point to the Tutorials page instead of the Bptutorial.pl page please let
> me know.
> 
> Regards,
> Mauricio.
> 
> Chris Fields wrote:
> > Brian, Jay,
> >
> > I think it would be nice to have the tutorial prominently displayed
> somehow
> > (Jay's suggestion), with a link provided via the tutorials page.
> Hopefully
> > this will help with the bioperl newbies.
> >
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> >> Sent: Wednesday, May 31, 2006 8:58 AM
> >> To: Jay Hannah; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> >>
> >> Jay,
> >>
> >> Excellent! Now we need to answer a few more questions for ourselves:
> >>
> >> - Do we remove the file bptutorial.pl from the package now? I'd say
> yes,
> >> we
> >> don't want to have to maintain two bptutorials.
> >>
> >> - What do we do with the script part of bptutorial.pl? It certainly
> could
> >> be
> >> excised and put into the examples/ directory, for example, but this
> would
> >> break a few of the paths that are being used.
> >>
> >> - A link to bptutorial? Or a link to the existing tutorials page?
> >> http://www.bioperl.org/wiki/Tutorials.
> >>
> >> Any thoughts on these?
> >>
> >>
> >> Brian O.
> >>
> >>
> >> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> >>
> >>> http://www.bioperl.org/wiki/Bptutorial.pl
> >>>
> >>> I think I just partially fulfilled this TODO:
> >>>
> >>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >>>
> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> >> the
> >>> wiki page via my web browser. (Is that proper procedure? Is the plan
> to
> >> just
> >>> do that manually from time to time as the document changes?)
> >>>
> >>> Now what?
> >>>
> >>> Should there be a new link on the far left of bioperl.org called
> >> "Tutorial"?
> >>> It's an amazing document. IMHO it should be listed prominently on
> >> bioperl.org.
> >>> HTH,
> >>>
> >>> j
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From osborne1 at optonline.net  Wed May 31 21:37:15 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 21:37:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <C0A3BD0B.8A2C%osborne1@optonline.net>

Mauricio,

Bernd didn't say he wanted the _script_ in the package, he said he wanted
bptutorial.pl in the package, not indicating whether it was the
documentation or the script that was important. It's my suspicion that the
documentation is more important than the script, and this is what my last
letter was asking, in part: is the script important? Or can we focus on the
text/POD part?

Brian O.


On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
<arareko at campus.iztacala.unam.mx> wrote:

> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.


From cjfields at uiuc.edu  Wed May 31 21:42:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 20:42:54 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu>
Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk
> Sent: Wednesday, May 31, 2006 4:52 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> Isn't it fairly standard in OO schemes/languages to have an exception
> thrown if a method
> can't be found at the
> end of a search up the class hierarchy? I recall being very mad at
> Smalltalk because "method
> not found" kept
> biting me. C++ has pure virtual base classes that do not allow objects to
> be instantiated
> directly; they are
> meant to be inherited and then implemented.

Perl will throw an error if it can't find a method in a class hierarchy.
It will do a few things first before dying, like looking for AUTOLOAD, etc.
AUTOLOAD has its supporters and detractors; I try to stay away from it as
much as possible.

Not sure about C++ like pure virtual classes in Perl5, i.e. not allowing
direct object instantiation, but Perl6 is supposed to have them, at least
according to Apocalypse 12.  From what Mr. Wall says about OOP in Perl5,
it's essentially 'bolted on' but works with caveats (is 'private' really
'private'?).  Perl6 is rebuilt from scratch (internals are OO).
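
A minimal sketch of how an "abstract" method is usually emulated in Perl 5,
in the spirit of throw_not_implemented: the base class supplies a stub that
throws, and a subclass that forgets to override it fails loudly at call
time. The package names (My::ShapeI, My::Square, My::Blob) are made up for
illustration, and plain Carp::croak stands in for the Bio::Root exception
machinery.

```perl
use strict;
use warnings;

# Hypothetical interface-style base class: it only declares the method.
package My::ShapeI;
use Carp qw(croak);

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# "Abstract" method: calling it on a class that did not override it
# raises an exception that names the offending class.
sub area {
    my ($self) = @_;
    croak "Abstract method 'area' is not implemented by " . ref($self);
}

# Concrete subclass that overrides the abstract method.
package My::Square;
our @ISA = ('My::ShapeI');
sub area { my ($self) = @_; return $self->{side} ** 2 }

# Subclass that forgot to override: it inherits the throwing stub.
package My::Blob;
our @ISA = ('My::ShapeI');

package main;

print My::Square->new(side => 3)->area, "\n";

# The inherited stub dies, so wrap the call in eval to inspect the error.
my $ok = eval { My::Blob->new->area; 1 };
print "Caught: $@" unless $ok;
```

Note that nothing here prevents My::ShapeI itself from being instantiated;
the error only shows up when the unimplemented method is actually called.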

> Perl 6 was mentioned a bit back. Is this issue addressed there? Should it
> be? Do the Bioperl
> people feed their
> needs into Perl 6 so that all the code effort to make Bio::Root is handled
> for them in the next
> effort by Perl 6
> itself. Make the Perl 6 people solve these issues with your input, then
> you will not have to
> deal with
> implementing it yourselves. I'll just bet that you are not the only
> potential users of Perl 6 who
> will have to solve
> these issues eventually.

I think Perl6 will solve most (if not all) these problems since it's a
complete rebuild.  In fact, it's pretty much a new language altogether from
what I have seen (and the little I have played around with using Pugs).
Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be
necessary to immediately convert all of bioperl to Perl6.  Though I have
also heard of a Perl5->6 converter in the works as well...  

From an OO standpoint, I believe everything is considered an object in
Perl6, though it's not supposed to force you into using objects according to
the Apocalypses that I have read.  I actually see a lot there that reminds
me of C++ (but in a Perl-ish way, of course).  Apocalypse 12 is a good
primer, though you may want to go through the others first, they're heavy
slogging:

http://dev.perl.org/perl6/doc/design/apo/A12.html

Not sure what you mean by 'feeding our needs into Perl6'.  I have
periodically checked on perl6 progress and they seem to have everything well
under control.

Chris
 
> ----- Original Message -----
> From: Hilmar Lapp <hlapp at gmx.net>
> Date: Wednesday, May 31, 2006 5:21 pm
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> >
> > On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> >
> > > What about modules that have 'throw_not_implemented' statements
> > > present?
> >
> > Those are often if not always legitimate - the problem are those
> > that
> > don't have them but fail to override an inherited interface or
> > abstract method.
> >
> > If something is not implemented what is the better way to express
> > this other than throwing an exception? (and if it's not an
> > interface
> > or abstract base class, saying so in the documentation)
> >
> > 	-hilmar
> >
> > --
> >
> =========================================================
> ==
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >
> =========================================================
> ==
> >


From jay at jays.net  Wed May 31 21:54:01 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 20:54:01 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <447E48B9.4080503@jays.net>

Brian Osborne wrote:
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.

We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki."

Just a thought.

> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.

/README says this:

 scripts/    - Useful production-quality scripts with POD documentation
 examples/   - Scripts demonstrating the many uses of Bioperl

I'm personally not clear on the difference. Little stuff should start in examples/ and graduate to scripts/ once they've matured? 

Is the doc/ tree being abandoned?

doc/faq        (empty?)
doc/howto      
doc/howto/examples
doc/howto/figs (empty?)
doc/howto/html (empty?)
doc/howto/pdf  (empty?)
doc/howto/sgml (empty?)
doc/howto/txt  (empty?)
doc/howto/xml  (empty?)

Does all that stuff officially live in and is being changed in the wiki, never to return to the distro?

Any reason those empty dirs aren't nuked out of CVS?

Chris Fields wrote:
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  

Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :)
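
For anyone wanting to repeat the conversion, here is a rough sketch of what
it might look like with Pod::Simple::Wiki, assuming the module is installed
from CPAN. The sub name pod_to_mediawiki and the file names in the usage
comment are invented for this example; the exact steps used for the wiki
page are not recorded in this thread.

```perl
use strict;
use warnings;

# Convert the POD in a file to MediaWiki markup and return it as a string.
# Pod::Simple::Wiki is a CPAN module (not core), so it is loaded lazily.
sub pod_to_mediawiki {
    my ($pod_file) = @_;
    require Pod::Simple::Wiki;

    my $wiki_text = '';
    open my $out, '>', \$wiki_text or die "Cannot open in-memory handle: $!";

    my $parser = Pod::Simple::Wiki->new('mediawiki');
    $parser->output_fh($out);        # send the converted markup to $out
    $parser->parse_file($pod_file);  # non-POD code in the file is ignored
    close $out;

    return $wiki_text;
}

# Example run: perl pod2wiki_sketch.pl bptutorial.pl > bptutorial.wiki
print pod_to_mediawiki($ARGV[0]) if @ARGV;
```

The output would still need to be pasted into the wiki by hand (or pushed
by some other tool), and the result checked for stray indentation.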

Mauricio Herrera Cuadra wrote:
> I've added a link in the left menu of the wiki. If you think it should 
> point to the Tutorials page instead of the Bptutorial.pl page please let 
> me know.

Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?

Documentation  (linked on the left menu)
- Quick start
- FAQ
- HOWTOs
- Tutorials

(What's the conceptual difference between a HOWTO and a tutorial?)

It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?)  Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static?

Why isn't the short "Current events" just listed on the top of the "News" page?

Sick of my endless questions yet? -grin-

j


From cjfields at uiuc.edu  Wed May 31 23:09:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 22:09:38 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine>


...

> We certainly wouldn't want to try to maintain two copies, one POD one in
> wiki. That would be the worst of all options. One option that hasn't been
> mentioned yet is to keep maintenance of that in POD in the distro (leaving
> the cool runability alone), and then flag that document as unchangeable in
> the wiki with a note on top "Maintenance of this document is done in POD
> in the distro. Submit POD patches to bioperl-l and we'll re-post an
> updated copy to this wiki."
> 
> Just a thought.

There are probably three schools of thought on docs: those who like nice
docs with links within and beyond BioPerl (hence the wiki), those who like
including docs with the distribution, and those who would like both.  The
last would be nice but isn't realistic unless we can come up with a way to
sync, without too much trouble, the docs we want to include with the
distribution between the wiki and CVS.  I'm in the first school of thought
since rich text with links is better and more informative than plain text
any day.  It might be a very small school though...

> > - What do we do with the script part of bptutorial.pl? It certainly
> could be
> > excised and put into the examples/ directory, for example, but this
> would
> > break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?

Most docs have been moved over to the wiki, which generates nicely formatted
docs for printing.
...

> Does all that stuff officially live in and is being changed in the wiki,
> never to return to the distro?

It's easier to add changes in the wiki and add markup, links, etc.  Much
richer text, so on.
 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know
> nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing
> then ran off to work. :)

No big deal.  

> Mauricio Herrera Cuadra wrote:
> > I've added a link in the left menu of the wiki. If you think it should
> > point to the Tutorials page instead of the Bptutorial.pl page please let
> > me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Okay, though Mauricio may know a bit more on how/if this can be done.
Mauricio?

> (What's the conceptual difference between a HOWTO and a tutorial?)

I believe the reasoning is along these lines: HOWTOs focus on specific
areas (graphics, trees, BLAST report parsing, etc) and thus usually have
greater detail. The tutorials are more broadly based (sort of a general
bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
has additional information over the tutorial (at least it did the last time
I looked at the tutorial, which has been a while).

> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on
> the wiki but static?

Hence the problem we face now.  It is something we need to really look into
before adding too much more to the wiki.  IMHO, I think we should have very
little information directly in the distribution itself since it's already
quite large.  It's almost as easy to have a bare-bones INSTALL file, which
would point to the wiki for additional information.  But I may be very much
alone in that train of thought ; >

> Why isn't the short "Current events" just listed on the top of the "News"
> page?

Don't know.
 
> Sick of my endless questions yet? -grin-

Not really.

cjf

> j


From gad14 at cornell.edu  Tue May 30 12:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful. I think this must be 
what's going on: I am using $blast_report->next_result in both 
subroutines. It appears that analyzing the blast results first with my 
sort subroutine empties (?) the $blast_report object, so that when I try 
to print, there is nothing left to print (and vice versa when I print 
first and then try to sort).
So, from the looks of things, using next_result has the effect of 
popping the Bio::Search::Result::ResultI objects off of the SearchIO 
blast report object??

It seems I could get around this by making a copy of the blast report by 
setting it to another new variable...(not the most elegant solution) but 
I'm having trouble with this...

If I do:

	my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it 
doesn't help me. How can I make another physical copy of this blast 
result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a 
way to look at/print/sort the results without removing data from the 
blast result object. How is this done though??
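
One way around this, sketched under the assumption that the report is a
one-pass iterator, is to drain it once into an array and hand that array to
both subroutines. The helper name collect_results is invented here, and the
bodies of sort_results and print_blast_results below are only placeholders
for the real ones, which were not posted; query_name is used simply as an
example accessor.

```perl
use strict;
use warnings;

# Pull every remaining result out of a one-pass iterator exactly once.
# $report is expected to behave like a Bio::SearchIO object, i.e. its
# next_result method returns an object, or a false value when exhausted.
sub collect_results {
    my ($report) = @_;
    my @results;
    while (my $result = $report->next_result) {
        push @results, $result;
    }
    return \@results;
}

# Placeholder stand-ins for the two subroutines in the thread; both take
# the cached array reference instead of the SearchIO object itself.
sub sort_results {
    my ($results) = @_;
    return [ sort { $a->query_name cmp $b->query_name } @$results ];
}

sub print_blast_results {
    my ($results) = @_;
    print $_->query_name, "\n" for @$results;
}

# Usage, assuming $blast_report came from StandAloneBlast's blastall():
#   my $results = collect_results($blast_report);
#   sort_results($results);
#   print_blast_results($results);   # still sees every result
```

The same one-pass behaviour may apply to next_hit and next_hsp inside each
result, so it is worth checking whether the result and hit classes offer a
rewind method or list accessors before looping over them twice.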

Sendu and Brian, I didn't post the sort_results subroutine because it is 
sprawling, as is a lot of my code. The code I provided was more of an 
aid for my explanation of the problem; it doesn't actually run. Sorry 
for the confusion, I should have been clearer about that.  The important 
thing to know perhaps is that both sort_results and print_blast_results 
contain a foreach loop where I am using the 'next_result' method to 
view blast results. (And to clarify for Torsten, the blastall() is 
working just fine; the analysis/viewing of the results object is where 
I am encountering the problem.)


Any other ideas would be greatly appreciated...

Thank you,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>> Hi,
> 
> [snip]
> 
>> If I've sorted the results the sorted-results will print to screen, 
>> however when I try to print the Hit Table results nothing is returned, 
>> as if the blast results have evaporated.... and visa versa, if i 
>> comment out the part where i point my sorting subroutine to the blast 
>> results reference,  my hit table results suddenly prints to screen.
> 
> [snip]
> 
>> Here's an abbreviated version of my code:
> 
> [snip]
> 
>> #######
>> ### the following 2 actions seem to be mutually exclusive.
>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>> # SeqFeature objs stored in arrays. arrays are then printed
>> # to stdout
>> &sort_results($blast_report);
>>
>> # 2) print blast results
>> &print_blast_results($blast_report);
> 
> 
>> sub print_blast_results{
>>    my $report = shift;
>>    while(my $result = $report->next_result()){
> 
> [snip]
> 
> You didn't give us your sort_results subroutine, but is it as simple as 
> they both use $report->next_result (and/or $result->next_hit), but you 
> don't reset the internal counter back to the start, so the second 
> subroutine tries to get the next_result and finds the first subroutine 
> has already looked at the last result and so next_result returns false?
> 
>  From a quick look it wasn't obvious how to reset the counter. Hopefully 
> this can be done and someone else knows how.
> 


From lstein at cshl.edu  Wed May 31 11:17:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 11:17:39 -0400
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg
	values
In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
Message-ID: <200605311117.41479.lstein@cshl.edu>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead 
and use the current Bio::Graphics development tree? Since 1.5.1 it supports 
negative coordinates. Here's an illustration:

 #!/usr/bin/perl

 use strict;

 use Bio::Graphics;
 use Bio::Graphics::Feature;

 my $whole   = Bio::Graphics::Feature->new(-start => -200, -end => +200);
 my $feature = Bio::Graphics::Feature->new(-start  => -100,
                                           -end    => +100,
                                           -strand => +1);
 my $panel   = Bio::Graphics::Panel->new(-start     => -200,
                                         -end       => +200,
                                         -width     => 800,
                                         -pad_left  => 10,
                                         -pad_right => 10);
 $panel->add_track($whole,
                   -glyph  => 'arrow',
                   -double => 1,
                   -tick   => 2);
 $panel->add_track($feature,
                   -glyph    => 'box',
                   -stranded => 1);
 print $panel->png;

 exit 0;

The resulting image is attached.
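
If the binding sites are stored in 1-1050 numbering, they need converting
before features can be created in negative coordinates. A small sketch of
that arithmetic follows; the sub name tss_relative is invented, and it
assumes the common convention with no position 0 (the base before the start
site is -1, the first base after it is +1), which may not be what every
project wants.

```perl
use strict;
use warnings;

# Convert a 1-based position within the promoter sequence into a coordinate
# relative to the transcription start site (TSS).
# $upstream is the number of bases that lie before the TSS (1000 here).
# Assumption: no position 0, so the base just before the TSS is -1 and
# the first base after it is +1.
sub tss_relative {
    my ($pos, $upstream) = @_;
    return $pos <= $upstream
        ? $pos - $upstream - 1    # 1 .. $upstream  ->  -$upstream .. -1
        : $pos - $upstream;       # beyond that     ->  +1, +2, ...
}

printf "%5d -> %+d\n", $_, tss_relative($_, 1000) for (1, 1000, 1001, 1050);
```

A linear ruler still draws a tick at 0, so for plotting it may be simpler
to shift every position by one constant and accept a 0 on the scale.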

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
>       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
>       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just  for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
>     my $relative_coords_offset = $self->option('relative_coords_offset');
>     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
>
> Hi!
>
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyp however, it is a
> > multiply function, can it be hacked to take in a offset value (ie minus
> > the
> > scale by 1000?)
> >
> > cheers
> > kevin
> >
> >
> > Hi,
> >
> > > For some reason I didn't see the first posting on this. In current
> >
> > bioperl
> >
> > > live, the ruler can have negative numberings - I use this routinely.
> > > You need
> > > to create a feature that starts in negative coordinates. What is
> >
> > happening
> >
> > > to
> > > you when you try this?
> > >
> > > Lincoln
> > >
> > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > Hi
> > > > thanks for the help offered thus far!
> > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq
> > >
> > > using
> > >
> > > > bioperl. therefore i was asked to make the numberings as such (-1000)
> >
> > is
> >
> > > > there any way at all to do this in bioperl without changing the .pm
> > >
> > > file?
> > >
> > > > thanks guys..
> > > > kevin
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/eaeb5e28/attachment.png 

From lstein at cshl.edu  Wed May 31 12:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have
	neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious 
header". I took out Kevin's email address in case it is the "spammotel" 
header that is bothering it.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/6c5f4137/attachment.png 

From rvosa at sfu.ca  Tue May 30 15:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

the open bioinformatics foundation has been kind enough to host a 
mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, 
the cpan distribution for phylogenetic analysis using perl).

The scope of this list is at present fairly broad as it is both meant 
for user questions and development discussion on deeper integration with 
bioperl.

You are invited to sign up at: 
http://lists.open-bio.org/mailman/listinfo/bio-phylo-l

Best wishes,

Rutger Vos

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From bioperlanand at yahoo.com  Mon May  1 14:36:20 2006
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 1 May 2006 11:36:20 -0700 (PDT)
Subject: [Bioperl-l] how to obtain GIs from clone_ids
Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>


Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry) 
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id? Any suggestions..

Thanks in advance.

Anand

		


From cuiw at mail.nih.gov  Mon May  1 15:39:01 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Mon, 1 May 2006 15:39:01 -0400
Subject: [Bioperl-l] how to obtain GIs from clone_ids
In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F48B0@nihcesmlbx10.nih.gov>

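use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Search the Entrez nucleotide database for the clone ID
my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string);

# Number of hits, then the matching GI numbers
my $count = $query->count;
my @ids   = $query->ids;

# Print one GI per line
print "$_\n" for @ids;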



From s.ryazansky at gmail.com  Mon May  1 17:55:13 2006
From: s.ryazansky at gmail.com (Sergei Ryazansky)
Date: Mon, 1 May 2006 21:55:13 +0000 (UTC)
Subject: [Bioperl-l] blast program to run locally on windows
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID: <loom.20060501T235327-11@post.gmane.org>

Hi,
Can you post your formatdb.log file here?


From cjfields at uiuc.edu  Tue May  2 00:15:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 May 2006 23:15:19 -0500
Subject: [Bioperl-l] blast program to run locally on windows
In-Reply-To: <loom.20060501T235327-11@post.gmane.org>
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
	<loom.20060501T235327-11@post.gmane.org>
Message-ID: <D54C8321-6A9C-4674-8C7E-5452DEF84599@uiuc.edu>

We managed to work our way through it.  He hadn't set ncbi.ini to the  
correct directories; the database was formatted correctly.

Chris

On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote:

> Hi,
> Can you post your formatdb.log file here?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue May  2 12:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness with using extra parameters ('seq_start',
'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
gone through, fixed, and committed.  I also have added a few tests to DB.t
for everything (all changes were in Bio::DB::WebDBSeqI and
Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
manage to get it added as well (with tests).  This is how NCBI defines
complexity:

complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest

Here's my quandary: when complexity is set to '0', you get a blob back (the
main sequence as well as any subsequences, such as CDS); this is in essence
a sequence stream with multiple alphabet types.  So, I now have it set up to
do this:

my $factory = Bio::DB::GenBank->new(-format => 'fasta',
                                    -complexity => 0
                                   );

my $seqin = $factory->get_Seq_by_acc($acc);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}

since I thought returning an array would be horrendously expensive on
memory, esp. with larger sequences.  Currently this is only set up for
sequences which are retrieved when complexity is set to '0' so it's a pretty
unique case.  Regardless, I'm worried that, since users expect a Bio::Seq
object instead of a Bio::SeqIO object here, it will cause a lot of confusion
with the API.  Any suggestions/gripes?

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From mamillerpa at yahoo.com  Tue May  2 07:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
make FASTA subset files for some bacterial strains.  I haven't been
able to parse out the strain information from the OS or RC lines. 
These lines typically look like:

OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki.  In addition to the wiki, I have been making use of 
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant.  I have also tried to walk through all of the feature,
annotation and reference objects but I still can't seem to parse out
the information I need.  (For brevity, the example I'm including below
only lists the code I used for the annotation objects.)  Also, this
code only prints the information...  I know that I'll have to write a
FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

---   ---   ---


#!/usr/bin/perl

use Bio::SeqIO;

my $usage = "getaccs.pl file format\n";
my $file = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

  my $species_object = $seq->species;
  my $species_string = $species_object->species;
  my $variant_string = $species_object->variant;
  my $common_string = $species_object->common_name;
  my $sub_string = $species_object->sub_species;
  my $binomial = $species_object->binomial('FULL');

  print "display   ",$seq->display_id,"\n";
  print "accession ",$seq->accession_number,"\n";
  print "desc      ",$seq->desc,"\n";

  print "species   ",$species_string,"\n";
  print "variant   ",$variant_string,"\n";
  print "common    ",$common_string,"\n";
  print "sub       ",$sub_string,"\n";
  print "binomial  ",$binomial,"\n";

  print $seq->seq,"\n";

  my $anno_collection = $seq->annotation;
  for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
      print "tagname : ", $value->tagname, "\n";
      # $value is an Bio::Annotation, and has an "as_text" method
      print "  annotation value: ", $value->as_text, "\n";

      if ($value->tagname eq "reference") {
        my $hash_ref = $value->hash_tree;
        for my $key (keys %{$hash_ref}) {
          print $key,": ",$hash_ref->{$key},"\n";
        }
      }
    }
  }
  print "\n";
}

exit;


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller



From cjfields at uiuc.edu  Tue May  2 14:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post!  Just wanted to add that I'm adding a
warning to the get_Seq* methods, telling users to call the appropriate
get_Stream* method when complexity == 0, before returning the Bio::SeqIO object.

CJF



From jason.stajich at duke.edu  Tue May  2 14:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format.

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html

or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has been resolved really so hopefully  
James will speak up if he's implemented anything.
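
In the meantime, one possible workaround is to pull the strain straight
from the raw OS and RC lines of each entry.  Below is a rough sketch in
plain Perl, assuming the lines look like the two examples in Mark's post.
The helper name strain_from_lines is made up for illustration and is not
part of Bioperl, and the patterns assume the strain name itself contains
no '.' or ';'.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl method): scan the raw lines of one
# flat-file entry and return the strain name, or undef if none is found.
sub strain_from_lines {
    my @lines = @_;

    # Prefer the explicit token on RC lines, e.g. "RC STRAIN=ABC123."
    for my $line (@lines) {
        if ($line =~ /^RC\s+.*?\bSTRAIN=([^;.]+)/) {
            (my $strain = $1) =~ s/\s+$//;    # trim trailing whitespace
            return $strain;
        }
    }

    # Fall back to the OS line, e.g. "OS Genus species ... strain ABC123."
    for my $line (@lines) {
        if ($line =~ /^OS\s+.*\bstrain\s+([^;.(]+)/i) {
            (my $strain = $1) =~ s/\s+$//;
            return $strain;
        }
    }
    return;
}

my @entry = (
    'OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
    'RC STRAIN=ABC123.',
);
my $strain = strain_from_lines(@entry);
print defined $strain ? "strain: $strain\n" : "no strain found\n";
```

The strain found this way could then be used to decide which entries go
into which FASTA subset file.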

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From mblanche at berkeley.edu  Tue May  2 15:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <C07D0179.2183%mblanche@berkeley.edu>

Dear all--

I have been trying to use the intersection function to extract overlapping
regions from alternatively spliced exons, as in the following script. The
object returned by 'my $overlap = $exon1->intersection($exon2);' is
actually losing the strand of $exon1 if $exon1 is from the negative strand.
Is this behavior expected? Should I check the strand of $exon1 before
working on the object returned by any Bio::RangeI function?

Many thanks 

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features(    -type =>'processed_transcript',
                                    -attributes =>{Gene => $gene->group});
                   
    for my $tc (@tcs){
        my @exons = $tc->features (-type => 'exon',
                                     -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 16:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D0179.2183%mblanche@berkeley.edu>
Message-ID: <C07D3699.84BC%osborne1@optonline.net>

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result but if they aren't on the same strand then no strand
information is returned.
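
To make the rule concrete, here is a rough stand-alone sketch of that
logic in plain Perl.  It is not the actual Bio::RangeI source: ranges are
modelled as plain hashes with start, end and strand keys, the function
name range_intersection is only for illustration, and a strand of 0 stands
in for "no strand information".

```perl
use strict;
use warnings;

# Illustrative only: ranges are plain hash refs, not Bio::Range objects.
sub range_intersection {
    my ($r1, $r2) = @_;

    # The overlap runs from the larger start to the smaller end.
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;    # the two ranges do not overlap

    # Keep the strand only when both inputs agree; otherwise report 0.
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;

    return { start => $start, end => $end, strand => $strand };
}

my $same  = range_intersection({ start => 10, end => 50, strand => -1 },
                               { start => 30, end => 80, strand => -1 });
my $mixed = range_intersection({ start => 10, end => 50, strand =>  1 },
                               { start => 30, end => 80, strand => -1 });

print "same strand:  $same->{start}..$same->{end} strand $same->{strand}\n";
print "mixed strand: $mixed->{start}..$mixed->{end} strand $mixed->{strand}\n";
```

Under this rule, two exons that are both on the negative strand should
give an overlap that is also on the negative strand.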

Brian O.




From mblanche at berkeley.edu  Tue May  2 16:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D3699.84BC%osborne1@optonline.net>
Message-ID: <C07D100A.218A%mblanche@berkeley.edu>

Brian--

Even when both elements of intersection() are from the negative strand, the
returned object is on the positive strand, and the sequence of $overlap is
actually the reverse complement of the intersection between the 2 exons. Here
is part of the output from the script below:

===
ex1     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the returned object is positive, as in:

===
ex1     Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2     Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output:

Many thanks all...

Marco


use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features(    -type =>'processed_transcript',
                                    -attributes =>{Gene => $gene->group});
                   
    for my $tc (@tcs){
        my @exons = $tc->features (-type => 'exon',
                                     -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";
                print     "ex1\tStrand: ", $exon1->strand, "\n",
                        $exon1->seq, "\n";
                print     "ex2\tStrand: ", $exon2->strand, "\n",
                        $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n",
                        $overlap->seq, "\n";
            }
        }
    }
}


______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 17:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D100A.218A%mblanche@berkeley.edu>
Message-ID: <C07D4C3D.84C4%osborne1@optonline.net>

Marco,

Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...
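
Until we track this down, a possible stopgap is to compare the strand of
the input exons with the strand of the returned object and
reverse-complement the overlap sequence yourself when they disagree.  A
rough sketch in plain Perl, working on strings rather than on Bioperl
objects; revcom and fix_overlap_seq are names made up for this example:

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (plain ACGT plus N, either case).
sub revcom {
    my ($seq) = @_;
    $seq = reverse $seq;                  # scalar context: reverses the string
    $seq =~ tr/ACGTNacgtn/TGCANtgcan/;    # complement each base
    return $seq;
}

# Return the overlap sequence as read on the exon's own strand.
sub fix_overlap_seq {
    my ($exon_strand, $overlap_strand, $overlap_seq) = @_;
    return $exon_strand == $overlap_strand
        ? $overlap_seq
        : revcom($overlap_seq);
}

# First six bases of the overlap in Marco's output (exons on -1, overlap on 1)
print fix_overlap_seq(-1, 1, 'CAGTCC'), "\n";
```

This prints GGACTG, which matches the last six bases of ex1 in the output
above, so the overlap really does look like the reverse complement of the
exon sequence.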

Brian O.


On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:

> Brian--
> 
> Even when both elements of intersection() are from the negative strand, the
> return object is from the positive strand and $overlap is actually the
> revervese complement of the intersection between the 2 exons. Here is part
> of the output from the script below:
> 
> ===
> ex1     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
> ex2     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
> CAAATCG
> overlap Strand: 1
> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
> TGCCGACTGCCATGTTCAACTAATAAACCGG
> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
> ...
> 
> If both are from the positive strand, the return object is positive as in:
> 
> ===
> ex1     Strand: 1
> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
> TTTGTGCCTGTTTCAGTATAAATTAATTATG
> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
> AAATATACATATATGCAACATATATAACTTC
> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
> ex2     Strand: 1
> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
> overlap Strand: 1
> CAACGCAGACGTG
> 
> Is there something I am missing? Here is the script generating the output
> 
> Many thanks all...
> 
> Marco
> 
> 
> use strict;
> use warnings;
> use Bio::DB::GFF;
> 
> MAIN:{
> 
>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                 -dsn =>
> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                 -user => 'guest');
>     my $test_db = $db->segment('4');
>     
>     # Load up the exons into $exons_p
>     for my $gene ($test_db->features(-types => 'gene')){
> 
>         my $exons_p = extractExons($gene);
> 
>         cluster($exons_p) unless ($#{$exons_p} == -1);
> 
>     }
> }
> 
> sub extractExons {
>     my $gene = shift;
>     my %ex_list;
>     my @tcs = $gene->features(    -type =>'processed_transcript',
>                                     -attributes =>{Gene => $gene->group});
>                  
>     for my $tc (@tcs){
>         my @exons = $tc->features (-type => 'exon',
>                                      -attributes => {Parent => $tc->group}
> );        
>     
>         for (@exons){
>             my $ex_id    = $_->id;
>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
> 
>         }
>     
>     }
>     my @values = values %ex_list;
>     return(\@values);
> }
> 
> sub cluster {
>     my $exons_p = shift;
>     
>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>             my $exon1 = $exons_p->[$s];
>             my $exon2 = $exons_p->[$t];
>             
>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>             
>                 my $overlap = $exon1->intersection($exon2);
>                 
>                 print "===\n";;
>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>                         $exon1->seq, "\n";
>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>                         $exon2->seq, "\n";
>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>                         $overlap->seq, "\n";
>             }
>         }
>     }
> }
> 
> ______________________________
> Marco Blanchette, Ph.D.
> 
> mblanche at uclink.berkeley.edu
> 
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
> 
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062


From mblanche at berkeley.edu  Tue May  2 18:31:44 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 15:31:44 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D4C3D.84C4%osborne1@optonline.net>
Message-ID: <C07D2BE0.2196%mblanche@berkeley.edu>

Brian--

I checked out last week's version from CVS.

Silly question: How do I get the version of BioPerl I am using... Never had
to check a module/bundle version number before...

Marco


On 5/2/06 14:49, "Brian Osborne" <osborne1 at optonline.net> wrote:

> Marco,
> 
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
> 
> Brian O.
> 
> 
> On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> 
>> Brian--
>> 
>> Even when both elements of intersection() are from the negative strand, the
>> return object is from the positive strand and $overlap is actually the
>> revervese complement of the intersection between the 2 exons. Here is part
>> of the output from the script below:
>> 
>> ===
>> ex1     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
>> ex2     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
>> CAAATCG
>> overlap Strand: 1
>> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
>> TGCCGACTGCCATGTTCAACTAATAAACCGG
>> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
>> ...
>> 
>> If both are from the positive strand, the return object is positive as in:
>> 
>> ===
>> ex1     Strand: 1
>> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
>> TTTGTGCCTGTTTCAGTATAAATTAATTATG
>> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
>> AAATATACATATATGCAACATATATAACTTC
>> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
>> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
>> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
>> ex2     Strand: 1
>> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
>> overlap Strand: 1
>> CAACGCAGACGTG
>> 
>> Is there something I am missing? Here is the script generating the output
>> 
>> Many thanks all...
>> 
>> Marco
>> 
>> 
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> 
>> MAIN:{
>> 
>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>                                 -dsn =>
>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>                                 -user => 'guest');
>>     my $test_db = $db->segment('4');
>>     
>>     # Load up the exons into $exons_p
>>     for my $gene ($test_db->features(-types => 'gene')){
>> 
>>         my $exons_p = extractExons($gene);
>> 
>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>> 
>>     }
>> }
>> 
>> sub extractExons {
>>     my $gene = shift;
>>     my %ex_list;
>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>                                     -attributes =>{Gene => $gene->group});
>>                 
>>     for my $tc (@tcs){
>>         my @exons = $tc->features (-type => 'exon',
>>                                      -attributes => {Parent => $tc->group}
>> );        
>>     
>>         for (@exons){
>>             my $ex_id    = $_->id;
>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>> 
>>         }
>>     
>>     }
>>     my @values = values %ex_list;
>>     return(\@values);
>> }
>> 
>> sub cluster {
>>     my $exons_p = shift;
>>     
>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>             my $exon1 = $exons_p->[$s];
>>             my $exon2 = $exons_p->[$t];
>>             
>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>             
>>                 my $overlap = $exon1->intersection($exon2);
>>                 
>>                 print "===\n";;
>>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>>                         $exon1->seq, "\n";
>>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>>                         $exon2->seq, "\n";
>>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>>                         $overlap->seq, "\n";
>>             }
>>         }
>>     }
>> }
>> 
>> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
>> 
>>> Marco,
>>> 
>>> Yes, this is how intersection() is supposed to work. If both of the Range
>>> objects have the same strand then the strand information is returned as part
>>> of the result but if they aren't on the same strand then no strand
>>> information is returned.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
>>> 
>>>> Dear all--
>>>> 
>>>> I have been trying to use the intersection function to extract overlapping
>>>> region from alternatively spliced exons as in the following script. The
>>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
>>>> actually loosing the strand of $exon1 if $exon1 is from the negative
>>>> strand.
>>>> Is this behavior expected? Should I check the strand of $exon1 before
>>>> working on the object return by any Bio::RangeI function?
>>>> 
>>>> Many thanks 
>>>> 
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use Bio::DB::GFF;
>>>> 
>>>> MAIN:{
>>>> 
>>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>>                                 -dsn =>
>>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>>                                 -user => 'guest');
>>>>     my $test_db = $db->segment('4');
>>>>     
>>>>     # Load up the exons into $exons_p
>>>>     for my $gene ($test_db->features(-types => 'gene')){
>>>> 
>>>>         my $exons_p = extractExons($gene);
>>>> 
>>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>>> 
>>>>     }
>>>> }
>>>> 
>>>> sub extractExons {
>>>>     my $gene = shift;
>>>>     my %ex_list;
>>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>>>                                     -attributes =>{Gene => $gene->group});
>>>>               
>>>>     for my $tc (@tcs){
>>>>         my @exons = $tc->features (-type => 'exon',
>>>>                                      -attributes => {Parent => $tc->group}
>>>> );        
>>>>     
>>>>         for (@exons){
>>>>             my $ex_id    = $_->id;
>>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>>> 
>>>>         }
>>>>     
>>>>     }
>>>>     my @values = values %ex_list;
>>>>     return(\@values);
>>>> }
>>>> 
>>>> sub cluster {
>>>>     my $exons_p = shift;
>>>>     
>>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>>             my $exon1 = $exons_p->[$s];
>>>>             my $exon2 = $exons_p->[$t];
>>>>             
>>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>>             
>>>>                 my $overlap = $exon1->intersection($exon2);
>>>>               
>>>>                 print "===\n";;
>>>>                 print "ex1\n", $exon1->seq, "\n";
>>>>                 print "ex2\n", $exon2->seq, "\n";
>>>>                 print "overlap\n", $overlap->seq, "\n";
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>> ______________________________
>>>> Marco Blanchette, Ph.D.
>>>> 
>>>> mblanche at uclink.berkeley.edu
>>>> 
>>>> Donald C. Rio's lab
>>>> Department of Molecular and Cell Biology
>>>> 16 Barker Hall
>>>> University of California
>>>> Berkeley, CA 94720-3204
>>>> 
>>>> Tel: (510) 642-1084
>>>> Cell: (510) 847-0996
>>>> Fax: (510) 642-6062
>>> 
>>> 
>> 
>> ______________________________
>> Marco Blanchette, Ph.D.
>> 
>> mblanche at uclink.berkeley.edu
>> 
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>> 
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
> 
> 

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From arareko at campus.iztacala.unam.mx  Tue May  2 18:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run 
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take 
a look at the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.
-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From heikki at sanbi.ac.za  Wed May  3 02:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

	perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
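
The same VERSION method also works from inside a script, which can be handy
for logging what a run actually used. A minimal sketch: it uses the core
module List::Util as a stand-in so that it runs without BioPerl installed
(swap in Bio::Perl or any other loaded BioPerl module), and report_version
is a made-up helper name, not part of BioPerl.

```perl
use strict;
use warnings;
use List::Util ();    # stand-in for Bio::Perl; any loaded module will do

# Hypothetical helper: return "Module version" for a module that is loaded.
sub report_version {
    my ($module) = @_;
    # VERSION is inherited from UNIVERSAL, so every package answers it
    my $version = $module->VERSION;
    return defined $version
        ? "$module $version"
        : "$module (no \$VERSION set)";
}

print report_version('List::Util'), "\n";
```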

     -Heikki

> Marco
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From nuclearn at gmail.com  Wed May  3 02:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi everybody,

    I am parsing a blastx report with the BioPerl modules
(Bio::SearchIO). The blastx result was created by NCBI-BLAST. How can I
obtain the strand (+ or -) of the query sequence relative to the hit
protein? I tried the strand function, but nothing was reported. I also
tried the frame function, but the result is usually 0, 1 or 2, so it
gives no information about the query strand (+ or -).
  How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************


From cjfields at uiuc.edu  Wed May  3 09:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

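Use $hsp->strand():

use strict;
use warnings;
use Bio::SearchIO;

# Open the BLAST report named on the command line
my $parser = Bio::SearchIO->new (-file => shift @ARGV,
                                 -format => 'blast');

# Walk every result, hit and HSP; with no argument, strand()
# reports the strand of the query sequence
while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print $hsp->strand,"\n";
        }
    }
}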

This will give 1 or -1.
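
On the 0, 1, 2 values from frame(): as far as I understand it, the parser
reports the frame GFF-style and carries the sign separately as the strand,
so the "Frame = -2" of the raw report ends up as strand -1, frame 1. A
plain-Perl sketch of that assumed convention (split_blast_frame is a
made-up helper for illustration, not a BioPerl method):

```perl
use strict;
use warnings;

# Hypothetical helper: split a BLAST-style frame (+1..+3, -1..-3) into a
# strand (1 or -1) and a GFF-style frame (0, 1 or 2).
sub split_blast_frame {
    my ($blast_frame) = @_;
    die "frame must be one of +1..+3 or -1..-3\n"
        unless $blast_frame =~ /^[+-]?[123]$/;
    my $strand = $blast_frame < 0 ? -1 : 1;    # the sign gives the strand
    my $frame  = abs($blast_frame) - 1;        # the magnitude gives the offset
    return ($strand, $frame);
}

for my $blast_frame (qw(+1 +3 -2)) {
    my ($strand, $frame) = split_blast_frame($blast_frame);
    print "BLAST frame $blast_frame: strand $strand, frame $frame\n";
}
```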

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Li Xiao
> Sent: Wednesday, May 03, 2006 1:06 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] about the frame and strand of a blastx report
> 
> Hi, anybody,
> 
>     I am working to parse a blastx report by using BioPerl modules
> (Bio::SearchIO).
> The blastx result was created by NCBI-BLAST. How i can obtain the strand (
> +
> or -)
> of query sequence against the hited protein? I tried to use the strand
> function, but
> nothing were reported. And i used the frame funtion, the result usually
> display 0,1,2,
> so, the result can not give any information about the query strand( + o r-
> ).
>   How i obtain the strand of a query squence?
> --
> *********************************************************************
> Li Xiao
> Sichuan Key Laboratory of Molecular Biology and Biotechnology
> College of Life Science, Sichuan University
> Chengdu, SiChuan, P.R.China
> TEL:86-28-85470083 FAX:86-28-85412738
> E-MAIL: nuclearn at gmail.com
> URL: http://scbi.scu.edu.cn
> **********************************************************************
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Wed May  3 11:22:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 11:22:27 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <C07E42F3.84E3%osborne1@optonline.net>

Mark,

So you're trying to get the information in the RC line from a Swissprot
format file?
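
Independent of how Bio::SeqIO exposes these lines, a plain-text pass over
the flat file can pull the strain out directly. This is only a sketch built
on the two example lines from your message: extract_strain is a made-up
helper, and the patterns assume the strain name itself contains no '.' or
';', which will not hold for every entry.

```perl
use strict;
use warnings;

# Hypothetical helper: return the strain named on an RC or OS line,
# or nothing if the line carries no strain.
sub extract_strain {
    my ($line) = @_;
    # RC lines hold TOKEN=value pairs; take the value after STRAIN=
    return $1 if $line =~ /^RC\s+.*?\bSTRAIN=([^;.]+)/;
    # OS lines may end in "... strain NAME."
    return $1 if $line =~ /^OS\s+.*\bstrain\s+([^.;]+)/;
    return;
}

my @lines = (
    'OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
    'RC STRAIN=ABC123.',
);

for my $line (@lines) {
    my $strain = extract_strain($line);
    print "$line => ", (defined $strain ? $strain : 'no strain found'), "\n";
}
```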

Brian O.


On 5/2/06 7:41 AM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Hello all.
> 
> I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
> make FASTA subset files for some bacterial strains.  I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
> 
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
> 
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
> 
> I have included some code I pasted together from various pages on the
> bioperl wiki.  In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
> 
> The code I have so far reports the species but not the subspecies or
> variant.  I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need.  (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)  Also, this
> code only prints the information...  I know that I'll have to write a
> FASTA sequence object separately.
> 
> Any suggestions?
> 
> Thanks,
> Mark
> 
> ---   ---   ---
> 
> 
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> 
> my $usage = "getaccs.pl file format\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file",
>                             -format => $format );
> 
> while (my $seq = $inseq->next_seq) {
> 
>   my $species_object = $seq->species;
>   my $species_string = $species_object->species;
>   my $variant_string = $species_object->variant;
>   my $common_string = $species_object->common_name;
>   my $sub_string = $species_object->sub_species;
>   my $binomial = $species_object->binomial('FULL');
> 
>   print "display   ",$seq->display_id,"\n";
>   print "accession ",$seq->accession_number,"\n";
>   print "desc      ",$seq->desc,"\n";
> 
>   print "species   ",$species_string,"\n";
>   print "variant   ",$variant_string,"\n";
>   print "common    ",$common_string,"\n";
>   print "sub       ",$sub_string,"\n";
>   print "binomial  ",$binomial,"\n";
> 
>   print $seq->seq,"\n";
> 
>   my $anno_collection = $seq->annotation;
>   for my $key ( $anno_collection->get_all_annotation_keys ) {
>     my @annotations = $anno_collection->get_Annotations($key);
>     for my $value ( @annotations ) {
>       print "tagname : ", $value->tagname, "\n";
>       # $value is an Bio::Annotation, and has an "as_text" method
>       print "  annotation value: ", $value->as_text, "\n";
> 
>       if ($value->tagname eq "reference") {
>         my $hash_ref = $value->hash_tree;
>         for my $key (keys %{$hash_ref}) {
>           print $key,": ",$hash_ref->{$key},"\n";
>         }
>       }
>     }
>   }
>   print "\n";
> }
> exit;
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 


From MEC at stowers-institute.org  Wed May  3 11:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>

Marco,

It appears that your code assumes that the exons returned from the call
to Bio::DB::GFF's features() are sorted by start; I don't think that is
guaranteed (at least not in the documentation I'm reading).  Also, I
think your code will not report the overlap between two exons that have
an intervening overlapping exon.  Depending on what your application is,
you may care.  For example, e1, e2, e3 all intersect pairwise, but your
code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------
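
One way to make sure no pair is missed, whatever order the exons arrive in,
is to test every pair explicitly. A minimal plain-Perl sketch, without
BioPerl: the coordinates are simply the columns of the '*' runs in the
diagram above (an assumption for illustration), and pairwise_overlaps is a
made-up helper name.

```perl
use strict;
use warnings;

# Exons from the diagram; start/end are the 1-based columns of the '*' runs.
my @exons = (
    { id => 'e1', start => 4, end => 10 },
    { id => 'e2', start => 6, end => 11 },
    { id => 'e3', start => 7, end => 9  },
);

# Hypothetical helper: compare every pair once and describe each overlap.
sub pairwise_overlaps {
    my @ex = @_;
    my @found;
    for my $s (0 .. $#ex - 1) {
        for my $t ($s + 1 .. $#ex) {
            my ($x, $y) = @ex[$s, $t];
            # Disjoint if one exon ends before the other starts
            next if $x->{end} < $y->{start} or $y->{end} < $x->{start};
            # The shared region runs from the later start to the earlier end
            my $start = $x->{start} > $y->{start} ? $x->{start} : $y->{start};
            my $end   = $x->{end}   < $y->{end}   ? $x->{end}   : $y->{end};
            push @found, "$x->{id}/$y->{id}: $start..$end";
        }
    }
    return @found;
}

print "$_\n" for pairwise_overlaps(@exons);
```

Because the test is symmetric, the order of the input list does not change
which pairs are found, only the order in which they are printed.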

Out of curiosity, what is your application?  Designing primers for gene
resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract 
>overlapping
>region from alternatively spliced exons as in the following script. The
>returned object from the 'my $overlap = 
>$exon1->intersection($exon2);' is
>actually loosing the strand of $exon1 if $exon1 is from the 
>negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object return by any Bio::RangeI function?
>
>Many thanks 
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use Bio::DB::GFF;
>
>MAIN:{
>
>    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                -user    => 'guest');
>    my $test_db = $db->segment('4');
>    
>    # Load up the exons into $exons_p
>    for my $gene ($test_db->features(-types => 'gene')){
>
>        my $exons_p = extractExons($gene);
>
>        cluster($exons_p) unless ($#{$exons_p} == -1);
>
>    }
>}
>
>sub extractExons {
>    my $gene = shift;
>    my %ex_list;
>    my @tcs = $gene->features(-type       => 'processed_transcript',
>                              -attributes => {Gene => $gene->group});
>
>    for my $tc (@tcs){
>        my @exons = $tc->features(-type       => 'exon',
>                                  -attributes => {Parent => $tc->group});
>    
>        for (@exons){
>            my $ex_id    = $_->id;
>            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>        }
>    
>    }
>    my @values = values %ex_list;
>    return(\@values);
>}
>
>sub cluster {
>    my $exons_p = shift;
>    
>    for (my $s = 0; $s <= $#{$exons_p}; $s++){
>        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>            my $exon1 = $exons_p->[$s];
>            my $exon2 = $exons_p->[$t];
>            
>            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>            
>                my $overlap = $exon1->intersection($exon2);
>                
>                print "===\n";
>                print "ex1\n", $exon1->seq, "\n";
>                print "ex2\n", $exon2->seq, "\n";
>                print "overlap\n", $overlap->seq, "\n";
>            }
>        }
>    }
>}
>______________________________
>Marco Blanchette, Ph.D.
>
>mblanche at uclink.berkeley.edu
>
>Donald C. Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>-- 
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From sdavis2 at mail.nih.gov  Wed May  3 12:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>
Message-ID: <C07E5028.AF8A%sdavis2@mail.nih.gov>


On 5/3/06 11:09 AM, "Cook, Malcolm" <MEC at stowers-institute.org> wrote:

> Marco,
> 
> It appears that your code assumes that the exons returned from a call
> to Bio::DB::GFF::features are sorted by start; I don't think that is
> guaranteed (at least not in the documentation I'm reading).  Also, I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon.  Depending on what your application is,
> you may care.  For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
> 
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC Table
Browser or via Penn State University's Galaxy server (written in Python and
downloadable), in case you want a quick solution to what I think is your
problem....

Sean


From osborne1 at optonline.net  Wed May  3 16:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID: <C07E8961.84F2%osborne1@optonline.net>

Mark,

The RC line is part of the description of a reference; I'm guessing 'RC'
stands for Reference Comment. In order to get the attributes of a reference
you'll first do something like:

my $anno_collection = $seq->annotation;
my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

$references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the
Reference object is a kind of Annotation object.

Brian O.


On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Yeah.  Do you have any experience with that?
> 
> Mark
> 
> --- Brian Osborne <osborne1 at optonline.net> wrote:
> 
>> Mark,
>> 
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>> 
>> Brian O.
> 
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller


From cjfields at uiuc.edu  Wed May  3 17:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in
	Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to
Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
epost/efetch.  I didn't want to break anything too severely so you can only
use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
methods yet).  I also added tests to DB.t, a few each for protein and
nucleotide retrieval using batch mode and so far they all pass fine.  

I haven't tested the upper sequence limit for this yet to see if it's at all
comparable to just using efetch but it seems a bit faster.  The eutils
coursebook states that one should only post ~500 at a time (I think you can
get a bit higher though).
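
To illustrate the ~500-per-post guideline, here is a minimal sketch of
splitting a GI list into batches before handing each batch to whatever
retrieval call is used.  The helper name batches and the made-up GI numbers
are assumptions for illustration only, not part of the committed code:

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into array refs of at most $size.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    # splice removes up to $size IDs from the front on each pass.
    push @batches, [ splice @ids, 0, $size ] while @ids;
    return @batches;
}

# Placeholder GI numbers; each batch would be posted separately.
my @gis = (1 .. 1200);
for my $batch (batches(500, @gis)) {
    printf "would post %d GIs (%d..%d)\n",
        scalar @$batch, $batch->[0], $batch->[-1];
}
```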

Also, at the moment it only works for GIs (NOT accessions,
which apparently epost does not accept).  If we want to continue using this
method for retrieval then we may need a workaround for accs.

CJF

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Wed May  3 17:44:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 04 May 2006 07:44:48 +1000
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au>

Marco,

> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

-- 
Torsten Seemann <torsten.seemann at infotech.monash.edu.au>
Victorian Bioinformatics Consortium


From cjfields at uiuc.edu  Wed May  3 18:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 17:08:37 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented
	inBio::DB::GenBank/GenPept
In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine>
Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, May 03, 2006 4:10 PM
> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Batch retrieval partially implemented
> inBio::DB::GenBank/GenPept
> 
> Just wanted to let you guys know I have added a few bits and pieces to
> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
                     ^^^^^^^^^^^^^^^^^^^
                     Bio::DB::NCBIHelper
Fat fingers!

> epost/efetch.  I didn't want to break anything too severely so you can
> only
> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
> methods yet).  I also added tests to DB.t, a few each for protein and
> nucleotide retrieval using batch mode and so far they all pass fine.
> 
> I haven't tested the upper sequence limit for this yet to see if it's at
> all
> comparable to just using efetch but it seems a bit faster.  The eutils
> coursebook states that one should only post ~500 at a time (I think you
> can
> get a bit higher though).
> 
> Also, at the moment it only works at the moment for GI's (NOT accessions,
> which apparently epost does not accept).  If we want to continue using
> this
> method for retrieval then we may need a workaround for accs.
> 
> CJF
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign


From arareko at campus.iztacala.unam.mx  Wed May  3 18:24:23 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 03 May 2006 17:24:23 -0500
Subject: [Bioperl-l] Batch retrieval partially
	implemented	inBio::DB::GenBank/GenPept
In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine>
References: <000001c66efe$21dbcf80$15327e82@pyrimidine>
Message-ID: <44592D97.6090906@campus.iztacala.unam.mx>

hehehe :)

Chris Fields wrote:
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Wednesday, May 03, 2006 4:10 PM
>> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Batch retrieval partially implemented
>> inBio::DB::GenBank/GenPept
>>
>> Just wanted to let you guys know I have added a few bits and pieces to
>> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
>                      ^^^^^^^^^^^^^^^^^^^
>                      Bio::DB::NCBIHelper
> Fat fingers!

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From fernan at iib.unsam.edu.ar  Wed May  3 20:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run 
| packages (stable & developer releases) for the FreeBSD operating system.
| 
| For instructions on how to install BioPerl ports in FreeBSD, please take 
| a look into the Getting Bioperl section of the BioPerl Wiki.
| 
+----]

Great job Mauricio,

thanks for contributing this!

Fernan


From miker at biotiquesystems.com  Tue May  2 23:31:59 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Tue, 2 May 2006 20:31:59 -0700
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
Message-ID: <007b01c66e62$23161d20$c100a8c0@mike>


I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank
files that contain CONTIG entries with gaps.  One such record is NW_925173.

When I try to parse this file using Bio::SeqIO::genbank, it will enter an
infinite loop and spin until it runs out of memory.  

I'm pretty certain it relates to this bug:
http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that
genbank records with CONTIG gaps are not valid and can't be parsed.  But this
bug actually claims to be fixed, which is strange, since looking at the code for
FTLocationFactory (where the loop is) it's still right there.  I assume that
this may be fixed in other contexts but is still not fixed in
Bio::SeqIO::genbank?  Or am I doing something wrong?

I think that this should probably be filed as an open bug.  I would think that
even if bioperl isn't interested in parsing this type of file via SeqIO,
certainly you'd want to ensure that no finite input file would send the parser
into an infinite loop.  Have others encountered this problem?  Is there any plan
to address it?

Thanks very much for any information or help!

-Mike

P.S.  I've played around with my version of FTLocationFactory and it seems to
actually work and parse the gaps.  I'm not sure if I've created other bugs or if
it works in all cases, but at least the parser doesn't die.  I also don't know
that my hacky code is appropriate for putting back in to BioPerl, but I'm happy
to provide it if someone wants to check it out and/or consider it for checkin.


From ULNJUJERYDIX at spammotel.com  Wed May  3 04:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with
	Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help!
I can't figure out the docs instructions

I want to create an imagemap of short sequence matches with a longer one
with clickable imagemaps for the short sequences. I figure I can do this
easily enough using the example script for parsing blast output but I need
an example script to understand how to produce the html code for the
imagemap. I can find only rather cryptic references about how this can be
done (see below).

    $boxes = $panel->boxes
    @boxes = $panel->boxes
    The boxes() method returns a list of arrayrefs containing the
coordinates of each glyph.  The method is useful for constructing an
image map.  In a scalar context, boxes() returns an arrayref.  In a
list context, the method returns the list directly.

    Each member of the list is an arrayref of the following format:

      [ $feature, $x1, $y1, $x2, $y2, $track ]

    The first element is the feature object; either an
Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl
Bio::SeqFeatureI object.  The coordinates are the topleft and
bottomright corners of the glyph, including any space allocated for
labels. The track is the Bio::Graphics::Glyph object corresponding to
the track that the feature is rendered inside.

    $position = $panel->track_position($track)
    After calling gd() or boxes(), you can learn the resulting Y
coordinate of a track by calling track_position() with the value
returned by add_track() or unshift_track().  This will return undef if
called before gd() or boxes() or with an invalid track.

    @pixel_coords = $panel->location2pixel(@feature_coords)
    Public routine to map feature coordinates (in base pairs) into pixel
coordinates relative to the left-hand edge of the picture. If you
define a -background callback, the callback may wish to invoke this
routine in order to translate base coordinates into pixel coordinates.

    $left   = $panel->left
    $right  = $panel->right
    $top    = $panel->top
    $bottom = $panel->bottom
    Return the pixel coordinates of the *drawing area* of the panel, that
is, exclusive of the padding.


got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html
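
For what it's worth, this is roughly the kind of thing being asked for: a
minimal sketch that turns a list shaped like the boxes() return value
described above into HTML <map>/<area> tags.  It is untested against a real
panel; the features here are plain hash refs with a 'name' key standing in
for Bio::SeqFeatureI objects, and the helper names and the details.cgi URL
are hypothetical:

```perl
use strict;
use warnings;

# Escape the characters that are unsafe inside HTML attribute values.
sub escape_html {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: @boxes has the documented layout
# [ $feature, $x1, $y1, $x2, $y2, $track ]; $url_for is a callback that
# builds the link target for one feature.
sub image_map {
    my ($map_name, $url_for, @boxes) = @_;
    my @areas;
    for my $box (@boxes) {
        my ($feature, $x1, $y1, $x2, $y2) = @$box;
        my $name = escape_html($feature->{name});    # stand-in for a real feature name
        my $href = escape_html($url_for->($feature));
        push @areas,
            qq{  <area shape="rect" coords="$x1,$y1,$x2,$y2" href="$href" alt="$name" title="$name">};
    }
    return join "\n", qq{<map name="$map_name">}, @areas, '</map>';
}

# Made-up box; with a real panel the list would presumably come from $panel->boxes.
my @boxes = ( [ { name => 'hit1' }, 10, 20, 110, 30, undef ] );
print image_map('hits', sub { "details.cgi?id=$_[0]{name}" }, @boxes), "\n";
```

The image itself would then be referenced with a usemap="#hits" attribute on
the <img> tag so the browser ties the rectangles to the picture.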


From s.johri at imperial.ac.uk  Thu May  4 08:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of
sequences. The function fu_and_li_D(@ingroup,$extmutations) takes two
arguments: the first is the ingroup (population) and the second is the
number of external mutations, which is calculated from an outgroup
sequence.

My question is: which function do I use to calculate the number of
external mutations? Would this be the singleton_count() function?
The singleton_count() function takes a PopGen object, which represents
a clustal alignment file. Would I include the outgroup in a multiple
FASTA file for alignment with clustal?

Any suggestions as to how to calculate the number of external mutations
would be much appreciated.
 
Thanks for your help!
 

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
From hlapp at gmx.net  Thu May  4 12:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <C9D4D0CB-8340-4157-A603-3935C8F581E6@gmx.net>

Infinite loop on a file you can download (i.e., as opposed to a file  
you tinkered with) is never ok. Could you file this as a bug report?  
And ideally attach your patch?

Thanks,

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From saldroubi at yahoo.com  Thu May  4 13:03:00 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Thu, 4 May 2006 10:03:00 -0700 (PDT)
Subject: [Bioperl-l] Is webiste down?
Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>

All,
  
  Is the bioperl website down?  I can't get to http://www.bioperl.org 
  
  
  Thank you. 
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From arareko at campus.iztacala.unam.mx  Thu May  4 14:22:52 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 04 May 2006 13:22:52 -0500
Subject: [Bioperl-l] Is webiste down?
In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
Message-ID: <445A467C.4070700@campus.iztacala.unam.mx>

Website is ok, maybe your gateway can't lookup the bioperl server at the 
moment.

Regards,
Mauricio.

Sam Al-Droubi wrote:
> All,
>   
>   Is the bioperl website down?  I can't get to http://www.bioperl.org 
>   
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu May  4 14:40:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 13:40:32 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine>

Are you using the CONTIG record or the full GenBank file?  I see
problems with both (using bioperl-live) which seem unrelated to one another.
The full file seems to be running a bit slow b/c the full GenBank record is
huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
memory).

Chris


From j.abbott at imperial.ac.uk  Thu May  4 11:44:44 2006
From: j.abbott at imperial.ac.uk (James Abbott)
Date: Thu, 04 May 2006 16:44:44 +0100
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or
	RC	lines
In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
	<7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
Message-ID: <445A216C.7090108@imperial.ac.uk>

Jason Stajich wrote:
> I don't know if any of this has been resolved really so hopefully  
> James will speak up if he's implemented anything.
Not as yet, I'm afraid - $job is keeping me overly busy at the moment, 
but it's on my todo list....

Cheers,
James

-- 
Dr. James Abbott <j.abbott at imperial.ac.uk>
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London


From hubert.prielinger at gmx.at  Thu May  4 15:35:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 13:35:42 -0600
Subject: [Bioperl-l] can't parse blast file anymore
Message-ID: <445A578E.8050207@gmx.at>

Hi,
the following perl script worked fine until a few days ago....

==============================================================
#!/usr/bin/perl -w

use Bio::SearchIO;
use strict;
use DBI;
use Net::MySQL;

#use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);

print "trying to connect to database \n";
my $database = 'antimicro_peptides';
my $host = 'ppc7.bio.ucalgary.ca';
my $user = 'Hubert';
my $password = 'Col00eng30';

my $mysql = Net::MySQL->new(
        hostname => $host,
        database => $database,
        user     => $user,
        password => $password,
    );
   

print "Connection established \n";

my $selectID = 0;
my $count = 0;


##output database results
#while (my @row = $sth->fetchrow_array)
#   { print "@row\n" }


print "start program\n";
my $directory = '/home/Hubert/test';
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
  if ($file =~ /txt$/)   {
      $count++;
    print "read file $file \n";
  

    $file = $directory . '/' . $file;

    my $search = new Bio::SearchIO (-format => 'blast',
                                       -file => $file);
    print "bioperl seems to work....\n";                           
    my $cutoff_len = 10;
                               
    #iterate over each query sequence
    print "try to enter while loop\n";
    while (my $result = $search->next_result) {
    print "entered 1st while loop\n";
   
      #iterate over each hit on the query sequence
      while (my $hit = $result->next_hit) {
      print "entered 2nd while loop\n";
       
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
        print "entered 3rd while loop\n";
           
          if ($hsp->length('sbjct') <= $cutoff_len) {
          #print $hsp->hit_string, "\n";
               
            for ($hsp->hit_string) {        #$hsp->hit_string
             print "count files....., $count ,\n";
.................

===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop


but it doesn't enter the first while loop; it gets stuck there. At first I
thought it was a Linux problem, because I upgraded from FC4 to FC5, but it
isn't, because perl is working fine, and it seems bioperl is working fine
too, but it cannot parse the file anymore.....

regards
Hubert


From barry.moore at genetics.utah.edu  Thu May  4 17:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK]   can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <BD1D97AA-99BD-451C-9835-4F22A59BCFDD@genetics.utah.edu>

Hubert,

My first suggestion would be to log onto your Calgary server and
change your password real quick (unless you intended to post your
password to the world).  Well, this isn't an answer, but it may help
you find one.  Use perl -d your_script.pl to run your script under
the debugger.  Type 'n' to step forward to the line where you start
the while loop.  Type 'x $result' to see that an object exists (it
should or you'd have gotten an error).  Type 's' to step into the
next_result call, and then continue to type 'n' and 's' as needed to
burrow down to see if you can find where you're hanging.

Barry



From cjfields at uiuc.edu  Thu May  4 18:27:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 17:27:57 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine>
Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine>

Here's another odd bit.  This is what I get for the CONTIG line when I
passed a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped out the 'join' here as well.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, May 04, 2006 1:41 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> Are you using the CONTIG record or the full GenBank file? 	I see
> problems with both (using bioperl-live) which seem unrelated to one
> another.
> The full file seems to be running a bit slow b/c the full GenBank record
> is
> huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
> memory).
> 
> Chris
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > Sent: Tuesday, May 02, 2006 10:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> >
> > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> > genbank
> > files that contain CONTIG entries with gaps.  One such record is
> > NW_925173.
> >
> > When I try to parse this file using Bio::SeqIO::genbank, it will enter
> an
> > infinite loop and spin until it runs out of memory.
> >
> > I'm pretty certain it relates to this bug:
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> > that
> > genbank records with CONTIG gaps are not valid and can't be parsed.  But
> > this
> > bug actually claims to be fixed, which is strange, since looking at the
> > code for
> > FTLocationFactory (where the loop is) it's still right there.  I assume
> > that
> > this may be fixed in other contexts but is still not fixed in
> > Bio::SeqIO::genbank?  Or am I doing something wrong?
> >
> > I think that this should probably be filed as an open bug.  I would
> think
> > that
> > even if bioperl isn't interested in parsing this type of file via SeqIO,
> > certainly you'd want to ensure that no finite input file would send the
> > parser
> > into an infinite loop.  Have others encountered this problem?  Is there
> > any plan
> > to address it?
> >
> > Thanks very much for any information or help!
> >
> > -Mike
> >
> > P.S.  I've played around with my version of FTLocationFactory and it
> seems
> > to
> > actually work and parse the gaps.  I'm not sure if I've created other
> bugs
> > or if
> > it works in all cases, but at least the parser doesn't die.  I also
> don't
> > know
> > that my hacky code is appropriate for putting back in to BioPerl, but
> I'm
> > happy
> > to provide it if someone wants to check it out and/or consider it for
> > checkin.
> >
> >
> >


From hlapp at gmx.net  Thu May  4 18:39:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 18:39:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
References: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>

The two notations are equivalent and syntactically correct, or so I  
believe ... I don't think 100% verbatim preservation should be the  
goal. Or am I missing the point?

On May 4, 2006, at 6:27 PM, Chris Fields wrote:

> Here's another odd bit.  This is what I get for the CONTIG line when I
> passed a simple contig file (NW_925062, with one join) through  
> Bio::SeqIO:
>
> -----------------------------------
> ....
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /db_xref="taxon:9606"
>                      /mol_type="genomic DNA"
>                      /chromosome="11"
>                      /organism="Homo sapiens"
> CONTIG      AADB02014027.1:1..8541
>
> //
> -----------------------------------
> Here's the original:
> -----------------------------------
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014027.1:1..8541)
> //
> -----------------------------------
>
> Looks like it lopped out the 'join' here as well.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, May 04, 2006 1:41 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>>
>> Are you using the CONTIG record or the full GenBank file? 	I see
>> problems with both (using bioperl-live) which seem unrelated to one
>> another.
>> The full file seems to be running a bit slow b/c the full GenBank  
>> record
>> is
>> huge (~55 MB) but the CONTIG file does exactly what you said (runs  
>> out of
>> memory).
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
>>> Sent: Tuesday, May 02, 2006 10:32 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
>>>
>>>
>>> I've encountered a pretty serious bug in Bio::SeqIO when parsing  
>>> certain
>>> genbank
>>> files that contain CONTIG entries with gaps.  One such record is
>>> NW_925173.
>>>
>>> When I try to parse this file using Bio::SeqIO::genbank, it will  
>>> enter
>> an
>>> infinite loop and spin until it runs out of memory.
>>>
>>> I'm pretty certain it relates to this bug:
>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to  
>>> indicate
>>> that
>>> genbank records with CONTIG gaps are not valid and can't be  
>>> parsed.  But
>>> this
>>> bug actually claims to be fixed, which is strange, since looking  
>>> at the
>>> code for
>>> FTLocationFactory (where the loop is) it's still right there.  I  
>>> assume
>>> that
>>> this may be fixed in other contexts but is still not fixed in
>>> Bio::SeqIO::genbank?  Or am I doing something wrong?
>>>
>>> I think that this should probably be filed as an open bug.  I would
>> think
>>> that
>>> even if bioperl isn't interested in parsing this type of file via  
>>> SeqIO,
>>> certainly you'd want to ensure that no finite input file would  
>>> send the
>>> parser
>>> into an infinite loop.  Have others encountered this problem?  Is  
>>> there
>>> any plan
>>> to address it?
>>>
>>> Thanks very much for any information or help!
>>>
>>> -Mike
>>>
>>> P.S.  I've played around with my version of FTLocationFactory and it
>> seems
>>> to
>>> actually work and parse the gaps.  I'm not sure if I've created  
>>> other
>> bugs
>>> or if
>>> it works in all cases, but at least the parser doesn't die.  I also
>> don't
>>> know
>>> that my hacky code is appropriate for putting back in to BioPerl,  
>>> but
>> I'm
>>> happy
>>> to provide it if someone wants to check it out and/or consider it  
>>> for
>>> checkin.
>>>
>>>
>>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hubert.prielinger at gmx.at  Thu May  4 19:57:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 17:57:44 -0600
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A7449.1080607@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
Message-ID: <445A94F8.9000903@gmx.at>

Torsten Seemann wrote:
> Hubert
>
>> the following perl script worked fine until a few days ago....
>>
>>    #iterate over each query sequence
>>    print "try to enter while loop\n";
>>  
>>
> die "Bad BLAST report" if not defined $search;
>
>>    while (my $result = $search->next_result) {
>>    print "entered 1st while loop\n";
>>
>> Output:
>>
>> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>> try to enter while loop
>>
>> but it doesn't enter the first while loop, it stuck there, first I  
>>
> What is the value of $search before you start the WHILE loop ?
>
>


hi,
$search is defined, like

my $search = new Bio::SearchIO (-format => 'blast',
                                       -file => $file)


if I try it with the debugger as Barry has suggested, then I get the following:

 
DB<1> n
main::(Blast.pl:24):    print "Connection established \n";
  DB<1> n
Connection established
main::(Blast.pl:26):    my $selectID = 0;
  DB<1> n
main::(Blast.pl:27):    my $count = 0;
  DB<1> n
main::(Blast.pl:37):    print "start program\n";
  DB<1> n
start program
main::(Blast.pl:38):    my $directory = '/home/Hubert/test';
  DB<1> n
main::(Blast.pl:39):    opendir(DIR, $directory) || die("Cannot open directory");
  DB<1> n
main::(Blast.pl:40):    print "opened directory\n";
  DB<1> n
opened directory
main::(Blast.pl:42):    foreach my $file (readdir(DIR))  {
  DB<1> n
main::(Blast.pl:43):      if ($file =~ /txt$/)   {
  DB<1> n
main::(Blast.pl:44):            $count++;
  DB<1> n
main::(Blast.pl:45):        print "read file $file \n";
  DB<1> n
read file 40026.txt
main::(Blast.pl:48):        $file = $directory . '/' . $file;
  DB<1> n
main::(Blast.pl:50):        my $search = new Bio::SearchIO (-format => 'blast',
main::(Blast.pl:51):                                        -file => $file);
  DB<1> n
main::(Blast.pl:52):            print "bioperl seems to work....\n";
  DB<1> s $search
main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $search;
  DB<<2>> n

  DB<2> n
bioperl seems to work....
main::(Blast.pl:53):        my $cutoff_len = 10;
  DB<2> n
main::(Blast.pl:56):        print "try to enter while loop\n";
  DB<2> n
try to enter while loop
main::(Blast.pl:57):        while (my $result = $search->next_result) {
  DB<2> s $result
main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $result;
  DB<<3>>


From torsten.seemann at infotech.monash.edu.au  Thu May  4 17:38:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 07:38:17 +1000
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <445A7449.1080607@infotech.monash.edu.au>

Hubert

>the following perl script worked fine until a few days ago....
>
>    #iterate over each query sequence
>    print "try to enter while loop\n";
>  
>
die "Bad BLAST report" if not defined $search;

>    while (my $result = $search->next_result) {
>    print "entered 1st while loop\n";
>
>Output:
>
>[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I 
>  
>
What is the value of $search before you start the WHILE loop ?


From barry.moore at genetics.utah.edu  Thu May  4 20:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result' and you should see the object dumped to
the screen.

Or use just 's' by itself on the while line, which will step you into
the next_result sub, where you can look around and watch what's
happening.

B

>   DB<2> s $result
> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
> 3:      $result;
>   DB<<3>>
>
>


From hubert.prielinger at gmx.at  Thu May  4 22:04:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 20:04:20 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
Message-ID: <445AB2A4.7020405@gmx.at>

if I do so it returns:
0 undef


Barry Moore wrote:
> That should be 'x $result' and you should see the object dumped to
> the screen.
>
> Or use just 's' by itself on the while line, which will step you into
> the next_result sub, where you can look around and watch what's
> happening.
>
> B
>
>   
>>   DB<2> s $result
>> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
>> 3:      $result;
>>   DB<<3>>
>>
>>


From torsten.seemann at infotech.monash.edu.au  Fri May  5 00:40:34 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 14:40:34 +1000
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AB2A4.7020405@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
	<445AB2A4.7020405@gmx.at>
Message-ID: <445AD742.4070408@infotech.monash.edu.au>

Hubert Prielinger wrote:
> if I do so it returns:
> 0 undef

That means the value of $search was undef.
That means that it could not parse or open the BLAST report.
I repeat the line that I put in my earlier email which you ignored.

# your line
my $search = Bio::SearchIO->new( ..... );

# then check if it was successful!
die "could not open blast report" if not defined $search;
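
A minimal sketch of a complementary pre-flight check, written in plain Perl
with no BioPerl calls. The helper name report_problem is hypothetical; it
only tests that the file exists, is readable and is non-empty, so a report
that passes can still fail to parse.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a short description of the problem, or
# nothing when the file at least looks usable. It does not validate
# BLAST syntax.
sub report_problem {
    my ($path) = @_;
    return "no such file: $path"  unless -e $path;
    return "not readable: $path"  unless -r $path;
    return "file is empty: $path" unless -s $path;
    return;
}

# Only act when a path was given on the command line.
my $file = shift @ARGV;
if (defined $file) {
    my $problem = report_problem($file);
    die "could not open blast report ($problem)\n" if defined $problem;
    print "$file looks readable; hand it to the parser\n";
}
```

Checking the file first separates "the report is missing or empty" from
"the parser cannot make sense of the report", which are different problems
to chase.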

--Torsten


From jason.stajich at duke.edu  Fri May  5 09:21:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:21:38 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>

The space after the '>' is causing the problem, since we infer the ID as
everything after the '>' and BEFORE the first whitespace.  Get rid of the
space.
   $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE
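
The rule above can be sketched in a few lines of plain Perl. This is an
illustration only, not BioPerl's actual parsing code; fasta_id and
fix_header are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl's own code: take the ID to be
# everything after '>' up to the first whitespace character.
sub fasta_id {
    my ($header) = @_;
    my ($id) = $header =~ /^>(\S*)/;
    return $id;
}

# Hypothetical helper: the same repair as the one-liner, for one line.
sub fix_header {
    my ($header) = @_;
    $header =~ s/^>\s+/>/;
    return $header;
}

my $bad = '> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel';
printf "before: [%s]\n", fasta_id($bad);              # empty ID
printf "after:  [%s]\n", fasta_id(fix_header($bad));  # gi|90108701|pdb|2AHZ|B
```

With the leading space in place the captured ID is the empty string, which
matches the "No sequence with name []" message in the report.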

On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:

> contents of the input file has a single sequence:
>
>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS
> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
> ------------------------------------------
> this is the script that tries to parse it:
>
> use Bio::AlignIO;
> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>                            -file   => 'test.fasta');
> while( my $aln = $inseq->next_aln ) {
>      print "name: ", $aln->displayname;
>      print "length: ", $aln->length;
>      print "\n";
> }
>
> ------------------------------------------
> and this is the result of running that script on winxp
>
> D:\msa\NAK MUTANTS>perl parseFasta.pl
>
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name []
> STACK Bio::SimpleAlign::displayname
> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
> STACK toplevel parseFasta.pl:11
>
> --------------------------------------
> D:\msa\NAK MUTANTS>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From thoufek at pngg.org  Thu May  4 12:50:44 2006
From: thoufek at pngg.org (T.D. Houfek)
Date: Thu, 04 May 2006 12:50:44 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
Message-ID: <445A30E4.6070103@pngg.org>

Using Bioperl 1.5, having trouble with writing FASTA-style quality files 
using Bio::Seq::Quality.

I create the Bio::Seq::Quality object, giving its constructor an ID, a 
description, a nucleotide sequence, and a quality sequence. I then write 
the sequence FASTA and the quality FASTA. The description string will 
appear in the header line of the sequence FASTA, but not in the header 
line of the quality FASTA.

Can anybody help me figure out how to fix this? I've attached a sample 
script and output.

-T.D.

------------------- sample script follows 
---------------------------------------

#!/usr/bin/perl
use strict;
use Bio::Seq::Quality;
use Bio::SeqIO;

my $id = "bogus_id";
my $desc = "bogus description";
my $seq = "ATTATTATTATTATT";
my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";

my $sequal_obj = Bio::Seq::Quality->new(
-display_id => $id,
-desc => $desc,
-seq => $seq,
-qual => $qual
);

my $qualout = Bio::SeqIO->new(
-file => ">myfile.qual",
-format => 'qual'
);
my $seqout = Bio::SeqIO->new(
-file => ">myfile.seq",
-format => 'Fasta'
);

$seqout->write_seq($sequal_obj);
$qualout->write_seq($sequal_obj);


------------------ sample output follows 
---------------------------------------

tdhoufek@aether:~$ cat myfile.seq
>bogus_id bogus description
ATTATTATTATTATT
tdhoufek@aether:~$ cat myfile.qual
>bogus_id
10 20 30 10 20 30 10 20 30 10 20 30 10 20 30

--------------------------------------------------------------------------------------------------


-- 
T.D. Houfek
senior bioinformatics developer
plant nematode genetics group
north carolina state university
Email: thoufek at pngg.org
----------------------------------------------------------
use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/;
$u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom;
$t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_])
;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(-
$u=>(join"",@x))->translate()->$u;$z =~s/X/ /g;print"$z\n"


From jason.stajich at duke.edu  Fri May  5 09:27:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:27:51 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
	<B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu>

[replying to myself]

Although if you are just trying to read a sequence, not an alignment,
then you want to use Bio::SeqIO.

See the copious help on the HOWTOs page at the BioPerl website, including a
sequence and feature HOWTO and a beginner's guide.
  http://bioperl.org/wiki/HOWTOs

-jason
On May 5, 2006, at 9:21 AM, Jason Stajich wrote:

> The space after the '>' is causing the problem, since we infer the ID as
> everything after the '>' and BEFORE the first whitespace.  Get rid of the
> space.
>    $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE
>
> On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:
>
>> contents of the input file has a single sequence:
>>
>>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
>> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNF 
>> S
>> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
>> ------------------------------------------
>> this is the script that tries to parse it:
>>
>> use Bio::AlignIO;
>> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>>                            -file   => 'test.fasta');
>> while( my $aln = $inseq->next_aln ) {
>>      print "name: ", $aln->displayname;
>>      print "length: ", $aln->length;
>>      print "\n";
>> }
>>
>> ------------------------------------------
>> and this is the result of running that script on winxp
>>
>> D:\msa\NAK MUTANTS>perl parseFasta.pl
>>
>>
>> ------------- EXCEPTION  -------------
>> MSG: No sequence with name []
>> STACK Bio::SimpleAlign::displayname
>> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
>> STACK toplevel parseFasta.pl:11
>>
>> --------------------------------------
>> D:\msa\NAK MUTANTS>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12/
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From osborne1 at optonline.net  Fri May  5 10:04:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 05 May 2006 10:04:02 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
Message-ID: <C080D392.8567%osborne1@optonline.net>

T.D.,

According to the documentation,
http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks
right. What are you trying to create?

Brian O.


On 5/4/06 12:50 PM, "T.D. Houfek" <thoufek at pngg.org> wrote:

> Using Bioperl 1.5, having trouble with writing FASTA-style quality files
> using Bio::Seq::Quality.
> 
> I create the Bio::Seq::Quality object, giving its constructor an ID, a
> description, a nucleotide sequence, and a quality sequence. I then write
> the sequence FASTA and the quality FASTA. The description string will
> appear in the header line of the sequence FASTA, but not in the header
> line of the quality FASTA.
> 
> Can anybody help me figure out how to fix this? I've attached a sample
> script and output.
> 
> -T.D.
> 
> ------------------- sample script follows
> ---------------------------------------
> 
> #!/usr/bin/perl
> use strict;
> use Bio::Seq::Quality;
> use Bio::SeqIO;
> 
> my $id = "bogus_id";
> my $desc = "bogus description";
> my $seq = "ATTATTATTATTATT";
> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";
> 
> my $sequal_obj = Bio::Seq::Quality->new(
> -display_id => $id,
> -desc => $desc,
> -seq => $seq,
> -qual => $qual
> );
> 
> my $qualout = Bio::SeqIO->new(
> -file => ">myfile.qual",
> -format => 'qual'
> );
> my $seqout = Bio::SeqIO->new(
> -file => ">myfile.seq",
> -format => 'Fasta'
> );
> 
> $seqout->write_seq($sequal_obj);
> $qualout->write_seq($sequal_obj);
> 
> 
> ------------------ sample output follows
> ---------------------------------------
> 
> tdhoufek at aether:~$ cat myfile.seq
>> bogus_id bogus description
> ATTATTATTATTATT
> tdhoufek at aether:~$ cat myfile.qual
>> bogus_id
> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30
> 
> ------------------------------------------------------------------------------
> --------------------
> 
> 
> 


From cjfields at uiuc.edu  Fri May  5 10:24:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 09:24:05 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>
Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine>

I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
from the longer file Michael used as an example here (NW_925173). I believe
the CONTIG line is currently handled like a feature so I think it goes
through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is;
I think it's getting beaten up in there somehow. I may see what happens if
it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
and just glob the whole mess together as is.


Chris

...
FEATURES             Location/Qualifiers
     source          1..44976370
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
            gap(441),AADB02014318.1:1..173584,gap(676),
            AADB02014319.1:1..377558,gap(20),
            complement(AADB02014320.1:1..431263),gap(20),
            AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
            gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
            gap(4611),AADB02014325.1:1..383881,gap(20),
            complement(AADB02014326.1:1..381633),gap(1930),
            complement(AADB02014327.1:1..460053),gap(20),
            AADB02014328.1:1..4186,gap(1587),
...
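
A rough sketch of "taking the line as is" and only splitting it into its
top-level pieces, in plain Perl. This is not the FTLocationFactory or
FTHelper code, and contig_parts is a hypothetical name. The loop consumes
one character per iteration, so it terminates on any finite input.

```perl
use strict;
use warnings;

# Hypothetical helper: split a CONTIG location on top-level commas,
# after removing an optional outer join(...). Nested parentheses such
# as gap(67) or complement(...) are kept intact.
sub contig_parts {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                       # drop line-wrap whitespace
    $loc = $1 if $loc =~ /^join\((.*)\)$/;  # unwrap the outer join
    my ($depth, $current, @parts) = (0, '');
    for my $ch (split //, $loc) {
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        if ($ch eq ',' && $depth == 0) {
            push @parts, $current;          # a top-level comma ends a part
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    push @parts, $current if length $current;
    return @parts;
}

my @parts = contig_parts(
    'join(AADB02014316.1:1..1482320,gap(67),'
  . 'complement(AADB02014320.1:1..431263),gap(20))'
);
print "$_\n" for @parts;
```

Under this sketch join(AADB02014027.1:1..8541) and the bare
AADB02014027.1:1..8541 both reduce to the same single part, so it treats
the two notations as equivalent.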

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Thursday, May 04, 2006 5:39 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> The two notations are equivalent and syntactically correct, or so I
> believe ... I don't think 100% verbatim preservation should be the
> goal. Or am I missing the point?
> 
> On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> 
> > Here's another odd bit.  This is what I get for the CONTIG line when I
> > passed a simple contig file (NW_925062, with one join) through
> > Bio::SeqIO:
> >
> > -----------------------------------
> > ....
> > FEATURES             Location/Qualifiers
> >      source          1..8541
> >                      /db_xref="taxon:9606"
> >                      /mol_type="genomic DNA"
> >                      /chromosome="11"
> >                      /organism="Homo sapiens"
> > CONTIG      AADB02014027.1:1..8541
> >
> > //
> > -----------------------------------
> > Here's the original:
> > -----------------------------------
> > FEATURES             Location/Qualifiers
> >      source          1..8541
> >                      /organism="Homo sapiens"
> >                      /mol_type="genomic DNA"
> >                      /db_xref="taxon:9606"
> >                      /chromosome="11"
> > CONTIG      join(AADB02014027.1:1..8541)
> > //
> > -----------------------------------
> >
> > Looks like it lopped out the 'join' here as well.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Thursday, May 04, 2006 1:41 PM
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >>
> >> Are you using the CONTIG record or the full GenBank file? 	I see
> >> problems with both (using bioperl-live) which seem unrelated to one
> >> another.
> >> The full file seems to be running a bit slow b/c the full GenBank
> >> record
> >> is
> >> huge (~55 MB) but the CONTIG file does exactly what you said (runs
> >> out of
> >> memory).
> >>
> >> Chris
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> >>> Sent: Tuesday, May 02, 2006 10:32 PM
> >>> To: bioperl-l at lists.open-bio.org
> >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >>>
> >>>
> >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> >>> certain
> >>> genbank
> >>> files that contain CONTIG entries with gaps.  One such record is
> >>> NW_925173.
> >>>
> >>> When I try to parse this file using Bio::SeqIO::genbank, it will
> >>> enter
> >> an
> >>> infinite loop and spin until it runs out of memory.
> >>>
> >>> I'm pretty certain it relates to this bug:
> >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> >>> indicate
> >>> that
> >>> genbank records with CONTIG gaps are not valid and can't be
> >>> parsed.  But
> >>> this
> >>> bug actually claims to be fixed, which is strange, since looking
> >>> at the
> >>> code for
> >>> FTLocationFactory (where the loop is) it's still right there.  I
> >>> assume
> >>> that
> >>> this may be fixed in other contexts but is still not fixed in
> >>> Bio::SeqIO::genbank?  Or am I doing something wrong?
> >>>
> >>> I think that this should probably be filed as an open bug.  I would
> >> think
> >>> that
> >>> even if bioperl isn't interested in parsing this type of file via
> >>> SeqIO,
> >>> certainly you'd want to ensure that no finite input file would
> >>> send the
> >>> parser
> >>> into an infinite loop.  Have others encountered this problem?  Is
> >>> there
> >>> any plan
> >>> to address it?
> >>>
> >>> Thanks very much for any information or help!
> >>>
> >>> -Mike
> >>>
> >>> P.S.  I've played around with my version of FTLocationFactory and it
> >> seems
> >>> to
> >>> actually work and parse the gaps.  I'm not sure if I've created
> >>> other
> >> bugs
> >>> or if
> >>> it works in all cases, but at least the parser doesn't die.  I also
> >> don't
> >>> know
> >>> that my hacky code is appropriate for putting back in to BioPerl,
> >>> but
> >> I'm
> >>> happy
> >>> to provide it if someone wants to check it out and/or consider it
> >>> for
> >>> checkin.
> >>>
> >>>
> >>>
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 


From hlapp at gmx.net  Fri May  5 10:47:50 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 5 May 2006 10:47:50 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <C080D392.8567%osborne1@optonline.net>
References: <C080D392.8567%osborne1@optonline.net>
Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net>

He wants the description on the description line, like for the  
sequence file.

Thomas, my guess is the code doesn't print the description to the  
line although I haven't made sure. Do you want to volunteer and  
check, add that print statement and post the patch?

	-hilmar

On May 5, 2006, at 10:04 AM, Brian Osborne wrote:

> T.D.,
>
> According to the documentation,
> http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file  
> looks
> right. What are you trying to create?
>
> Brian O.
>
>
> On 5/4/06 12:50 PM, "T.D. Houfek" <thoufek at pngg.org> wrote:
>
>> Using Bioperl 1.5, having trouble with writing FASTA-style quality  
>> files
>> using Bio::Seq::Quality.
>>
>> I create the Bio::Seq::Quality object, giving its constructor an  
>> ID, a
>> description, a nucleotide sequence, and a quality sequence. I then  
>> write
>> the sequence FASTA and the quality FASTA. The description string will
>> appear in the header line of the sequence FASTA, but not in the  
>> header
>> line of the quality FASTA.
>>
>> Can anybody help me figure out how to fix this? I've attached a  
>> sample
>> script and output.
>>
>> -T.D.
>>
>> ------------------- sample script follows -------------------
>>
>> #!/usr/bin/perl
>> use strict;
>> use Bio::Seq::Quality;
>> use Bio::SeqIO;
>>
>> my $id = "bogus_id";
>> my $desc = "bogus description";
>> my $seq = "ATTATTATTATTATT";
>> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";
>>
>> my $sequal_obj = Bio::Seq::Quality->new(
>> -display_id => $id,
>> -desc => $desc,
>> -seq => $seq,
>> -qual => $qual
>> );
>>
>> my $qualout = Bio::SeqIO->new(
>> -file => ">myfile.qual",
>> -format => 'qual'
>> );
>> my $seqout = Bio::SeqIO->new(
>> -file => ">myfile.seq",
>> -format => 'Fasta'
>> );
>>
>> $seqout->write_seq($sequal_obj);
>> $qualout->write_seq($sequal_obj);
>>
>>
>> ------------------ sample output follows ------------------
>>
>> tdhoufek at aether:~$ cat myfile.seq
>> >bogus_id bogus description
>> ATTATTATTATTATT
>> tdhoufek at aether:~$ cat myfile.qual
>> >bogus_id
>> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30
>>
>> ------------------------------------------------------------
>>
>>
>>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From dmessina at wustl.edu  Fri May  5 11:24:47 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 10:24:47 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct.

 From a cursory walk through the code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     $header .= " $desc";   # append inside the block, where $desc is in scope
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");
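
The header logic above can be tried outside BioPerl. This is a minimal sketch, assuming a hypothetical helper named build_qual_header (it is not part of Bio::SeqIO::qual): it appends the description to the ID only when one is defined and non-empty, after stripping newlines.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only; not the actual BioPerl code.
# Builds a FASTA-style header line from an ID and an optional description.
sub build_qual_header {
    my ($id, $desc) = @_;
    my $header = $id;
    if (defined $desc and length $desc) {
        $desc =~ s/\n//g;        # the description must stay on one line
        $header .= " $desc";
    }
    return ">$header";
}

print build_qual_header('bogus_id', 'bogus description'), "\n";
print build_qual_header('bogus_id'), "\n";
```

Keeping the append inside the conditional means a sequence without a description gets a bare ID rather than a trailing space or an "uninitialized value" warning.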

Dave


From dmessina at wustl.edu  Fri May  5 10:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <DCF490F7-46CC-47B7-81A7-229BCC819980@wustl.edu>

T.D.,

 From a cursory walk through your code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     $header .= " $desc";   # append inside the block, where $desc is in scope
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");

Dave


From hubert.prielinger at gmx.at  Fri May  5 14:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

hi,
I have done, as you suggested and I got the error message:

Can't call method "next_result" on an undefined value at....

then I searched the internet and found a thread which suggested that 
adding use strict solves the problem....
but I'm already using use strict..

thanks

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>   
>> if I do so it returns:
>> 0 undef
>>     
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search;
>
> --Torsten


From cjfields at uiuc.edu  Fri May  5 15:18:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:18:16 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine>

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From cjfields at uiuc.edu  Fri May  5 15:27:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:27:12 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine>

Sorry, mail got sent before I finished it!  Here I go again...

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);
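
Alongside the verbose flag, a few plain file tests can rule out the simplest causes before the parser is constructed. A minimal sketch, assuming a hypothetical helper named report_problem (not part of BioPerl) and a report path passed as the first command-line argument:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a short reason if the report cannot be
# read, or undef if these basic checks pass. It does not validate the
# BLAST format itself.
sub report_problem {
    my ($file) = @_;
    return "no file name given"    unless defined $file and length $file;
    return "$file does not exist"  unless -e $file;
    return "$file is a directory"  if -d $file;
    return "$file is not readable" unless -r $file;
    return "$file is empty"        if -z $file;
    return;
}

my $file = shift @ARGV;
if (my $why = report_problem($file)) {
    warn "cannot parse BLAST report: $why\n";
}
```

If these checks pass and the parser object is still undefined, the file name or its permissions are probably not the cause.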

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;

my @dirlist = ("/home/Hubert/test");

find (\&dir, @dirlist);

sub printdir {
    return unless /txt$/; 
    return if (-d);
    my $parser = Bio::SearchIO->new(-file => $_,
                                    -format => 'blast');	
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }
}
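
The directory walk can be tried on its own with core modules only. This is a sketch under stated assumptions: the temporary files and the callback name wanted are made up for the demonstration, and the BLAST parsing step is left out.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway directory tree so the sketch runs anywhere.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir failed: $!";
for my $path ("$top/a.txt", "$top/sub/b.txt", "$top/sub/c.dat") {
    open my $fh, '>', $path or die "cannot write $path: $!";
    close $fh;
}

my @reports;

# find() chdirs into each directory by default: $_ holds the bare file
# name, while $File::Find::name holds the path including the start dir.
sub wanted {
    return unless /\.txt$/;
    return if -d;
    push @reports, $File::Find::name;
}

find(\&wanted, $top);
print "$_\n" for sort @reports;
```

Because of the chdir, passing $_ as the -file argument works inside the callback, but any path stored for later use should come from $File::Find::name.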

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From barry.moore at genetics.utah.edu  Fri May  5 15:39:37 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 5 May 2006 13:39:37 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at>
Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu>

Hubert-

If you want to send me your script and input file I'll try to have a  
look at it.

Barry

On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote:

> hi,
> I have done, as you suggested and I got the error message:
>
> Can't call method "next_result" on an undefined value at....
>
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
>
> thanks
>
> Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>
>>> if I do so it returns:
>>> 0 undef
>>>
>>
>> That means the value of $search was undef.
>> That means that it could not parse or open the BLAST report.
>> I repeat the line that I put in my earlier email which you ignored.
>>
>> # your line
>> my $search = Bio::SearchIO->new( ..... );
>>
>> # then check if it was successful!
>> die "could not open blast report" if not defined $search;
>>
>> --Torsten


From cjfields at uiuc.edu  Fri May  5 16:07:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 15:07:53 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine>
Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine>

Oops!  This is what happens when I copy and paste in a hurry.

> use File::Find;
> use Bio::SearchIO;
> 
> my @dirlist = ("/home/Hubert/test");
> 
> find (\&dir, @dirlist);
> 
> sub printdir {
  ^^^^^^^^^^^

Should be: sub dir {

>     return unless /txt$/;
>     return if (-d);
>     my $parser = Bio::SearchIO->new(-file => $_,
>                                     -format => 'blast');
>     while (my $result = $parser->next_result) {
>         while (my $hit = $result->next_hit) {
>             while (my $hsp = $hit->next_hsp) {
>                 # do stuff here
>             }
>         }
>     }
> }

Hubert, if the file you are parsing looks fine (i.e. valid BLAST output),
post it and your script on Bugzilla and let us take a look.  Leave out your
password though ; >

Chris


From golharam at umdnj.edu  Fri May  5 15:58:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 05 May 2006 15:58:03 -0400
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine>
Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>

I'm not sure how applicable this is, but I've seen a problem with Perl
if the LANG environment variable contains UTF8 (ex LANG=en_US.UTF8).
I've changed mine to en_US and lots of perl string parsing problems went
away.
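
To see whether this applies on a given machine, a script can inspect LANG at startup. A minimal sketch, assuming a hypothetical helper named is_utf8_locale; it only reports the setting and does not change how Perl handles strings:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a locale string names a UTF-8 locale,
# e.g. "en_US.UTF8" or "en_US.utf-8".
sub is_utf8_locale {
    my ($locale) = @_;
    return 0 unless defined $locale;
    return $locale =~ /\.utf-?8\b/i ? 1 : 0;
}

my $lang = $ENV{LANG};
if (is_utf8_locale($lang)) {
    warn "LANG=$lang is a UTF-8 locale; try LANG=en_US if parsing misbehaves\n";
}
```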

Also, what about running the bioperl tests on your installation (make
test).  What happens?


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Friday, May 05, 2006 3:18 PM
To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore


What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping
through your files and performing a task on each one, such as parsing
output.  It changes into the working directory each time; you should be
able to do something like this:

use File::Find;
use Bio::SearchIO;


Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to

> use strict and then the problem is solved.... but I'm already using 
> use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report. I 
> > repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From cjfields at uiuc.edu  Fri May  5 17:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 16:56:29 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine>
Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine>

Okay, I have changed the way the CONTIG line is handled in
Bio::SeqIO::genbank.  It was handling it as a feature; I just changed it
over to handling it as a Bio::Annotation::SimpleValue object with the value
being the entire contig section.  It seems to pass tests fine but I'm
operating off Windows and my wife's iBook went to the great desktop in the
sky (motherboard), so I can't test it there.  Pulling the file off using
Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out.
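
For anyone who needs the individual segments and gaps out of a CONTIG string kept as plain text, the structure is simple enough to split with regular expressions. This is a sketch only, assuming a hypothetical helper named parse_contig; it is not the code in Bio::SeqIO::genbank or FTLocationFactory, and it handles only the element forms that appear in this thread: accession:start..end, complement(...), and gap(N).

```perl
use strict;
use warnings;

# Hypothetical helper: split a CONTIG string into a list of hashes,
# one per segment or gap. Plain regexes, no BioPerl involved.
sub parse_contig {
    my ($contig) = @_;
    $contig =~ s/\s+//g;                  # undo GenBank line wrapping
    $contig =~ s/^join\((.*)\)$/$1/;      # the join(...) wrapper is optional
    my @parts;
    for my $item (split /,/, $contig) {
        if ($item =~ /^gap\((\d+)\)$/) {
            push @parts, { type => 'gap', length => $1 };
        }
        elsif ($item =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            push @parts, {
                type   => 'segment',
                acc    => $2,
                start  => $3,
                end    => $4,
                strand => $1 ? -1 : 1,    # complement(...) means minus strand
            };
        }
        else {
            die "unrecognized CONTIG element: $item\n";
        }
    }
    return @parts;
}

my @parts = parse_contig(
    'join(AADB02014316.1:1..1482320,gap(67),'
  . 'complement(AADB02014320.1:1..431263),gap(20))'
);
printf "%-8s %s\n", $_->{type},
    ($_->{type} eq 'gap' ? $_->{length} : $_->{acc}) for @parts;
```

Each element is consumed exactly once by the split, so the loop always terminates on finite input; anything unrecognized raises an error instead of being retried.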

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 9:24 AM
> To: 'Hilmar Lapp'
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
> from the longer file Michael used as an example here (NW_925173). I
> believe
> the CONTIG line is currently handled like a feature so I think it goes
> through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix
> is;
> I think it's getting beaten up in there somehow. I may see what happens if
> it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
> and just glob the whole mess together as is.
> 
> 
> Chris
> 
> ...
> FEATURES             Location/Qualifiers
>      source          1..44976370
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
>             gap(441),AADB02014318.1:1..173584,gap(676),
>             AADB02014319.1:1..377558,gap(20),
>             complement(AADB02014320.1:1..431263),gap(20),
>             AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
>             gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
>             gap(4611),AADB02014325.1:1..383881,gap(20),
>             complement(AADB02014326.1:1..381633),gap(1930),
>             complement(AADB02014327.1:1..460053),gap(20),
>             AADB02014328.1:1..4186,gap(1587),
> ...
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> > Sent: Thursday, May 04, 2006 5:39 PM
> > To: Chris Fields
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> > The two notations are equivalent and syntactically correct, or so I
> > believe ... I don't think 100% verbatim preservation should be the
> > goal. Or am I missing the point?
> >
> > On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> >
> > > Here's another odd bit.  This is what I get for the CONTIG line when I
> > > passed a simple contig file (NW_925062, with one join) through
> > > Bio::SeqIO:
> > >
> > > -----------------------------------
> > > ....
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /db_xref="taxon:9606"
> > >                      /mol_type="genomic DNA"
> > >                      /chromosome="11"
> > >                      /organism="Homo sapiens"
> > > CONTIG      AADB02014027.1:1..8541
> > >
> > > //
> > > -----------------------------------
> > > Here's the original:
> > > -----------------------------------
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /organism="Homo sapiens"
> > >                      /mol_type="genomic DNA"
> > >                      /db_xref="taxon:9606"
> > >                      /chromosome="11"
> > > CONTIG      join(AADB02014027.1:1..8541)
> > > //
> > > -----------------------------------
> > >
> > > Looks like it lopped out the 'join' here as well.
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > >> Sent: Thursday, May 04, 2006 1:41 PM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>
> > >> Are you using the CONTIG record or the full GenBank file?  I see
> > >> problems with both (using bioperl-live) which seem unrelated to one
> > >> another.
> > >> The full file seems to be running a bit slow b/c the full GenBank
> > >> record
> > >> is
> > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs
> > >> out of
> > >> memory).
> > >>
> > >> Chris
> > >>
> > >>> -----Original Message-----
> > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > >>> Sent: Tuesday, May 02, 2006 10:32 PM
> > >>> To: bioperl-l at lists.open-bio.org
> > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>>
> > >>>
> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> > >>> certain
> > >>> genbank
> > >>> files that contain CONTIG entries with gaps.  One such record is
> > >>> NW_925173.
> > >>>
> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will
> > >>> enter
> > >> an
> > >>> infinite loop and spin until it runs out of memory.
> > >>>
> > >>> I'm pretty certain it relates to this bug:
> > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> > >>> indicate
> > >>> that
> > >>> genbank records with CONTIG gaps are not valid and can't be
> > >>> parsed.  But
> > >>> this
> > >>> bug actually claims to be fixed, which is strange, since looking
> > >>> at the
> > >>> code for
> > >>> FTLocationFactory (where the loop is) it's still right there.  I
> > >>> assume
> > >>> that
> > >>> this may be fixed in other contexts but is still not fixed in
> > >>> Bio::SeqIO::genbank?  Or am I doing something wrong?
> > >>>
> > >>> I think that this should probably be filed as an open bug.  I would
> > >> think
> > >>> that
> > >>> even if bioperl isn't interested in parsing this type of file via
> > >>> SeqIO,
> > >>> certainly you'd want to ensure that no finite input file would
> > >>> send the
> > >>> parser
> > >>> into an infinite loop.  Have others encountered this problem?  Is
> > >>> there
> > >>> any plan
> > >>> to address it?
> > >>>
> > >>> Thanks very much for any information or help!
> > >>>
> > >>> -Mike
> > >>>
> > >>> P.S.  I've played around with my version of FTLocationFactory and it
> > >> seems
> > >>> to
> > >>> actually work and parse the gaps.  I'm not sure if I've created
> > >>> other
> > >> bugs
> > >>> or if
> > >>> it works in all cases, but at least the parser doesn't die.  I also
> > >> don't
> > >>> know
> > >>> that my hacky code is appropriate for putting back in to BioPerl,
> > >>> but
> > >> I'm
> > >>> happy
> > >>> to provide it if someone wants to check it out and/or consider it
> > >>> for
> > >>> checkin.
> > >>>
> > >>>
> > >>>
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >


From hubert.prielinger at gmx.at  Fri May  5 19:54:55 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 17:54:55 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
Message-ID: <445BE5CF.2000007@gmx.at>

hi ryan,
nothing happened when I added the verbose flag.
And how can I test my bioperl installation?


Ryan Golhar wrote:
> I'm not sure how applicable this is, but I've seen a problem with Perl
> if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8).
> I've changed mine to en_US and lots of perl string parsing problems went
> away.
>
> Also, what about running the bioperl tests on your installation (make
> test).  What happens?
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 3:18 PM
> To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>
>
> What happens if you add the verbose flag?
>
> my $search = new Bio::SearchIO (-verbose => 1,
>                                 -format => 'blast',
>                                 -file => $file);
>
> Added thought : you might want to look at File::Find for stepping
> through your files and performing a task on each one, such as parsing
> output.  It changes into the working directory each time; you should be
> able to do something like this:
>
> use File::Find;
> use Bio::SearchIO;
>
>
>
>
> Original Message-----
>   
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Friday, May 05, 2006 1:30 PM
>> To: Torsten Seemann; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>>
>> hi,
>> I have done, as you suggested and I got the error message:
>>
>> Can't call method "next_result" on an undefined value at....
>>
>> then I looked up at the internet and found a thread which suggested to
>>     
>
>   
>> use strict and then the problem is solved.... but I'm already using 
>> use strict..
>>
>> thanks
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>
>>>       
>>>> if I do so it returns:
>>>> 0 undef
>>>>
>>>>         
>>> That means the value of $search was undef.
>>> That means that it could not parse or open the BLAST report. I 
>>> repeat the line that I put in my earlier email which you ignored.
>>>
>>> # your line
>>> my $search = Bio::SearchIO->new( ..... );
>>>
>>> # then check if it was successful!
>>> die "could not open blast report" if not defined $search;
>>>
>>> --Torsten


From hubert.prielinger at gmx.at  Fri May  5 20:01:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 18:01:11 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <445BE747.5020202@gmx.at>

hi
I have posted my script and the blast file to bugzilla......


From hubert.prielinger at gmx.at  Fri May  5 21:21:33 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 19:21:33 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BE747.5020202@gmx.at>
References: <445BE747.5020202@gmx.at>
Message-ID: <445BFA1D.5060008@gmx.at>

the bugzilla posting didn't work; what is the exact email address for 
bugzilla?

Hubert Prielinger wrote:
> hi
> I have posted my script and the blast file to bugzilla......


From cjfields at uiuc.edu  Fri May  5 21:38:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 20:38:47 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BFA1D.5060008@gmx.at>
Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine>

Hubert,

Calm down.  Breathe in, breathe out.  Relax.......

Okay, here is the place to start.  Read the instructions there first.

http://www.bioperl.org/wiki/Bugs

Bugs are reported at this site:

http://bugzilla.bioperl.org/

Again, follow the instructions.  You will have to create a user name and
password to submit.  Once that is set up, click the "Submit a new bug" link
on the main bugzilla page.  On that page, fill out all information first and
a description of the error and hit 'commit'.   Add the BLAST report and some
sample script by clicking on the "Create a New Attachment" link (you'll have
to do this for each file).  Once you go back to the bug page you should see
two attachments and the bug report.  Any commits get sent through the
bioperl-guts-l mail list which most developers subscribe to, so they'll know
there's a new bug out there.  

I will not be able to get to it personally; our home computer died a slow
painful death today (RIP 2002-2006) but I can get to it next week.  If you
post the bug, somebody might be able to get to it sooner!

Chris



From cjfields at uiuc.edu  Fri May  5 22:26:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 21:26:35 -0500
Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files)
Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine>

I committed a change to NCBIHelper that permits the downloading of CON
(contig) files and corrects an issue where no sequence features were saved
when rebuilding those files.  If you use Bio::DB::GenBank regularly to
download genome files, this likely will NOT affect your code unless you
explicitly set the format type to 'genbank', like so:

$factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank'

I believe most will not have that setting since the default was already
'gb'.  Now, the default is 'gbwithparts', which returns the full sequence
regardless.  If it is a file with a CONTIG line, the sequence is built on
NCBI's end and will include seq features if they are present.  As Brian
said, we'll let NCBI do the work for us!

If you need the actual file w/o sequence, then you can set the format to
'genbank' (like above) and it will grab it for you.  There was an unrelated
problem with CONTIG line parsing that I also fixed, where I changed the
format over to a Bio::Annotation::SimpleValue as a workaround for now; for
some reason some CON files were misparsed and resulted in infinite loops or
missing 'join' statements.  

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Sat May  6 18:22:05 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 06 May 2006 16:22:05 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
Message-ID: <445D218D.2030504@gmx.at>

ok, thanks
I have submitted the bug
bug #1994




From torsten.seemann at infotech.monash.edu.au  Sat May  6 20:57:14 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 07 May 2006 10:57:14 +1000
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D218D.2030504@gmx.at>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at>
Message-ID: <445D45EA.8020804@infotech.monash.edu.au>

Hubert Prielinger wrote:
> ok, thanks
> I have submitted the bug
> bug #1994

This is a line from the script you sent to Bugzilla:

my $search = new Bio::SearchIO (
-verbose => 1,-format => 'blast', -file => $file)
or die "could not open blast report" if not defined my $search;

Although syntactically correct, I don't think it is doing what you want.
Please change it to this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die 
"could not open blast report";

or alternatively, this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
if (not defined $search) {
   die "could not open blast report";
}

and let us know what happens.

All the example output you have supplied still suggests that Bio::SearchIO 
cannot load or parse your blast report.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia


From mamillerpa at yahoo.com  Sat May  6 19:07:30 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Sat, 6 May 2006 16:07:30 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <C07E8961.84F2%osborne1@optonline.net>
Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com>

Thanks for your responses, Jason and Brian.

Brian, your suggestion works great.  I had really hoped that by parsing
the OS line as well, I could be sure I wasn't missing any sequences
from my organisms.  Well, I gave up on that and just obtained the NCBI
taxonomy values.  I find it pretty easy to work with them in bioperl. 
Unfortunately, walking through all of Trembl takes a while, and I'm
getting this error:

  Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line
55, <GEN0> line 3253682.

When I try to extract annotations, etc., from entries like:

  DHE4_UNKP

with:

  my $species_object = $seq->species;
  my $taxid_string = $species_object->ncbi_taxid;

I guess I have to write an error handler for incomplete taxonomy
values.
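
A minimal sketch of such a handler, assuming only what the error message
shows: $seq->species can come back undefined for entries without usable
taxonomy. The Stub::* packages are hypothetical stand-ins so the snippet
runs without BioPerl, and taxid_or_undef is an illustrative name, not a
BioPerl function. The taxid 562 is just sample data.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the sequence and species objects, so this
# sketch runs without BioPerl installed.
package Stub::Species {
    sub new        { my ($class, $taxid) = @_; return bless { taxid => $taxid }, $class }
    sub ncbi_taxid { return $_[0]{taxid} }
}
package Stub::Seq {
    sub new     { my ($class, %args) = @_; return bless { %args }, $class }
    sub species { return $_[0]{species} }
    sub id      { return $_[0]{id} }
}

package main;

# Return the NCBI taxid, or undef when the entry carries no species object.
sub taxid_or_undef {
    my ($seq) = @_;
    my $species = $seq->species;
    return undef unless defined $species;
    return $species->ncbi_taxid;
}

my @seqs = (
    Stub::Seq->new(id => 'WITH_TAXID', species => Stub::Species->new(562)),
    Stub::Seq->new(id => 'DHE4_UNKP'),    # no species, like the failing entry
);

for my $seq (@seqs) {
    my $taxid = taxid_or_undef($seq);
    if (defined $taxid) {
        print $seq->id, "\t$taxid\n";
    }
    else {
        warn "skipping ", $seq->id, ": no taxonomy\n";
    }
}
```

With real BioPerl objects the same guard would wrap the two lines quoted
above, so that a long walk through Trembl logs the incomplete entry and
moves on instead of dying.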

Bye for now,
Mark


--- Brian Osborne <osborne1 at optonline.net> wrote:

> Mark,
> 
> The RC line is part of the description of a reference, I'm guessing 'RC'
> stands for Reference Comment. In order to get the attributes of a
> reference you'll first do something like:
> 
> my $anno_collection = $seq->annotation;
> my @references = $anno_collection->get_Annotations('reference');
> 
> To get the comment field for a specific reference you can do:
> 
> $references[0]->comment;
> 
> See the Feature-Annotation HOWTO for more information on Annotations; the
> Reference object is a kind of Annotation object.
> 
> Brian O.
> 
> 
> On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:
> 
> > Yeah.  Do you have any experience with that?
> > 
> > Mark
> > 
> > --- Brian Osborne <osborne1 at optonline.net> wrote:
> > 
> >> Mark,
> >> 
> >> So you're trying to get the information in the RC line from a
> >> Swissprot format file?
> >> 
> >> Brian O.
> > 
> > 
> > ---   ---   ---   ---   ---   ---   ---   ---
> > 
> > Mark A. Miller
> > 
> 
> 
> 


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller



From cjfields at uiuc.edu  Sat May  6 23:33:40 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sat, 6 May 2006 22:33:40 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>

The -verbose flag was my suggestion; it should output a ton of debugging info 
from SearchIO::blast; if you see anything there, then it means that it's at least 
attempting to parse the report.  

Of course I can't test this myself at the moment since my wife's computer died 
(along with the bioperl setup); I'm using a loaner computer at the moment.

Chris



From chen_li3 at yahoo.com  Sun May  7 03:34:55 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 00:34:55 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>

Hi all,

I use Bio::Tools::Run::Primer3 to design PCR primers.
I want to change some default values, for example, to
increase the PCR product size to 490-510 bp instead of
using the default value of 100-300 bp. What should I
do ?  


Thanks,

Li



From jason.stajich at duke.edu  Sun May  7 16:49:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 16:49:29 -0400
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
Message-ID: <F69897D1-C65F-47F3-8324-EC2E52A2ACCD@duke.edu>

The problem is in how SearchIO was being initialized; the code
basically looked like this:

  my $x = new Foo() or die if not defined my $x;

which is invalid for two reasons.
  1) if not defined my $x;
  Declares a second, brand-new $x that is ALWAYS undefined, so the
  test never looks at the object you just created.

  2) my $x = new Foo() or die ;
  Will cast the new object as a boolean.

Whenever things aren't working, take a look at the code and try to
walk through any shortcuts.  For clarity, make it a two-step process:
  my $x = new Foo();
  die "no valid object" unless defined $x;

Please note that currently BioPerl WILL die (via throw) if you try  
and ask for an invalid file when you initialize a new IO object  --  
this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm)  
which all the IO objects use, so you don't really need to do a test  
on the object after all.
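
A runnable sketch of the two-step form, under stated assumptions: Foo is
a hypothetical stand-in class (not a BioPerl module) whose constructor
dies on a missing file, mimicking the throw described above, and
open_report is an illustrative helper name. The eval is only there to
show how the exception can be caught.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for an IO-style class whose constructor throws
# on a missing file instead of returning undef.
package Foo {
    sub new {
        my ($class, %args) = @_;
        my $file = $args{-file};
        die "Could not open $file\n" unless -e $file;
        return bless { %args }, $class;
    }
}

package main;

# Step 1: construct. Step 2: test. One 'my', one variable.
sub open_report {
    my ($file) = @_;
    my $x = eval { Foo->new(-file => $file) };
    die "no valid object for '$file': $@" unless defined $x;
    return $x;
}

# A temporary file that exists, and a path that should not.
my ($fh, $existing) = tempfile(UNLINK => 1);

my $ok  = open_report($existing);
my $bad = eval { open_report('/no/such/file.bls') };

print "existing file: ", ref($ok), "\n";
print "missing file:  ", (defined $bad ? "object\n" : "caught: $@");
```

Because the variable is declared exactly once, the test in step 2 is
guaranteed to look at the object created in step 1.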

--jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sun May  7 17:01:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 17:01:29 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
Message-ID: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>

I put up some info on the wiki (and I encourage other people to do  
the same!)
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3

Set the command line parameters by just calling a function of the  
name of the parameter.  To get a list of the available options, this  
perl code will report it to you:

# what are the arguments, and what do they mean?
   my $args = $primer3->arguments;

   print "ARGUMENT\tMEANING\n";
   foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"}

The info for PRODUCT_SIZE_RANGE is:
   (size range list, default 100-300) space separated list of product  
sizes eg <a>-<b> <x>-<y>

I believe you can set the PCR product size with
   $primer3->primer_product_size_range("490-510");

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sun May  7 21:18:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 18:18:17 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>
Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>

Hi Jason,

I added the line
$primer3->primer_product_size_range("490-510");
to my script, but it doesn't work, and primer3 doesn't
complain about it either.

Li



From hubert.prielinger at gmx.at  Sun May  7 21:41:14 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 07 May 2006 19:41:14 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au>
Message-ID: <445EA1BA.9050301@gmx.at>

hi,
I have corrected that, and now I finally got a few error messages:

blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
Madden, Alejandro A. Schäffer,
blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
David J. Lipman
blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
generation of
blast.pm: unrecognized line protein database search programs", Nucleic 
Acids Res. 25:3389-3402.
blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1

After that line it stops without terminating.




From cjfields at uiuc.edu  Sun May  7 22:04:13 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 7 May 2006 21:04:13 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>

These are debugging lines (not errors); you still have the -verbose flag set.  

Did you follow Jason's advice?  I believe he's right on the money about the issue 
at hand...

Chris



From jason.stajich at duke.edu  Sun May  7 22:47:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 22:47:00 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu>

I'm not really familiar with the module beyond what the
documentation says.  Did you try using the add_targets method to
add arguments instead?  I had thought the AUTOLOAD method took care
of access to the cmd line arguments as it does for the other Run
modules, but I am not really sure.  Perhaps folks on the list who use
this module can provide better advice.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 10:49:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 10:49:22 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <C084D2B2.85D7%osborne1@optonline.net>

Li,

Read the documentation, Bio::Tools::Run::Primer3. It shows examples of the
correct syntax. Also look at bioperl-run/t/Primer3.t.

Brian O.




From roy at colibase.bham.ac.uk  Mon May  8 07:12:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 08 May 2006 12:12:49 +0100
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <445F27B1.40501@colibase.bham.ac.uk>

Hi Li,

I think the syntax you need is:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE.

Incidentally, such a restricted product size range may mean that Primer3 
is unable to design any suitable primers. If I recall correctly, this 
doesn't cause an error, you just get a Bio::Tools::Primer3 object with 
no primers in it. I have had some success with testing for this, and if 
necessary relaxing some constraints on primer design and re-running 
Primer3.
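
A rough sketch of that test-and-relax loop, with assumptions labelled:
first_range_with_primers and $fake_design are illustrative names, and
the fake design routine only pretends to be Primer3 so the snippet runs
on its own. The commented lines show how a real run might plug in,
assuming the result object reports its hit count via number_of_results.

```perl
use strict;
use warnings;

# Try each product size range in turn and stop at the first that yields
# primers. $design is a callback returning the number of primer pairs found.
sub first_range_with_primers {
    my ($design, @ranges) = @_;
    for my $range (@ranges) {
        my $count = $design->($range);
        return ($range, $count) if $count > 0;
    }
    return;    # nothing worked, even with the most relaxed range
}

# Hypothetical stand-in for a real Primer3 run. With BioPerl it might
# look like this (assumption, not checked against the module):
#   $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE => $range);
#   return $primer3->run->number_of_results;
my $fake_design = sub {
    my ($range) = @_;
    my ($min, $max) = split /-/, $range;
    return $max - $min >= 100 ? 3 : 0;    # pretend narrow windows fail
};

my ($range, $count) =
    first_range_with_primers($fake_design, '490-510', '450-550', '400-600');

if (defined $range) {
    print "found $count primer pair(s) with range $range\n";
}
else {
    print "no primers found for any range\n";
}
```

Listing the ranges from strictest to loosest keeps the first hit as
close as possible to the size originally wanted.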

Hope this helps.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk



From chen_li3 at yahoo.com  Mon May  8 09:21:54 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 06:21:54 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk>
Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com>

I think Dr. Chaudhuri is correct. 

I added the following lines to my script (actually
copied from the documentation):

$primer3->add_targets(
PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

$primer3->add_targets('PRIMER_MIN_TM'=>60,
'PRIMER_MAX_TM'=>64);

to design primers with a product size of 490-510
bp and a primer annealing Tm of 60 to 64 C.

Here is part of the output in the file called
temp.out:

.......... original sequence.....
GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT
.................

PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_PAIR_PENALTY=0.1544
PRIMER_LEFT_PENALTY=0.081468
PRIMER_RIGHT_PENALTY=0.072951
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
...............................
PRIMER_PRODUCT_SIZE=501

..............

This is what I want. If you don't set particular
parameters, such as the annealing Tm, the program will
use the default ones. If you set your own parameters,
they show up after the sequence (see this output example).

If you need to set more parameters and want to know
which parameters are available, just browse the BEGIN
section of the code.

Now I have another question: the program always prints
the original sequence at the beginning of the output.
Is it possible not to do that?

Thanks to all for joining this topic,

Li 


--- Roy Chaudhuri <roy at colibase.bham.ac.uk> wrote:

> Hi Li,
> 
> I think the syntax you need is:
> 
> $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
> 
> I guess you may also need to change the parameter
> PRIMER_PRODUCT_OPT_SIZE.
> 
> Incidentally, such a restricted product size range
> may mean that Primer3 
> is unable to design any suitable primers. If I
> recall correctly, this 
> doesn't cause an error, you just get a
> Bio::Tools::Primer3 object with 
> no primers in it. I have had some success with
> testing for this, and if 
> necessary relaxing some constraints on primer design
> and re-running 
> Primer3.
> 
> Hope this helps.
> Roy.
> 
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
> 
> http://xbase.bham.ac.uk
> 


From hubert.prielinger at gmx.at  Mon May  8 15:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,
I have solved the problem. I am parsing BLAST 2.2.13 output with an
early BioPerl 1.5.1 installation in which bug 1934 was not yet fixed,
so I had to replace the blast.pm file. Now it works properly.

thank you very much
Hubert

Christopher Fields wrote:
> These are debugging lines (not errors); you still have the -verbose flag set.  
>
> Did you follow Jason's advice?  I believe he's right on the money about the issue 
> at hand...
>
> Chris
>
> ---- Original message ----
>> Date: Sun, 07 May 2006 19:41:14 -0600
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>
>> Subject: Re: [Bioperl-l] [BULK]  can't parse blast file anymore
>> To: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>,
>>     bioperl-l at bioperl.org, Chris Fields <cjfields at uiuc.edu>,
>>     Jason Stajich <jason.stajich at duke.edu>
>>
>> hi,
>> I have corrected that and now I finally got a few error messages:
>>
>> blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
>> blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
>> blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>> blast.pm: unrecognized line protein database search programs", Nucleic Acids Res. 25:3389-3402.
>> blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1
>>
>> after that line it stops without terminating....
>>
>>
>> Torsten Seemann wrote:
>>> Hubert Prielinger wrote:
>>>> ok, thanks
>>>> I have submitted the bug
>>>> bug #1994
>>>
>>> This is a line from the script you sent to Bugzilla:
>>>
>>> my $search = new Bio::SearchIO (
>>> -verbose => 1,-format => 'blast', -file => $file)
>>> or die "could not open blast report" if not defined my $search;
>>>
>>> Although syntactically correct, I don't think it is doing what you want.
>>> Please change it to this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or 
>>> die "could not open blast report";
>>>
>>> or alternatively, this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>>> if (not defined $search) {
>>>   die "could not open blast report";
>>> }
>>>
>>> and let us know what happens.
>>>
>>> all the example output you have supplied still suggests that 
>>> Bio::SearchIO can not load or parse your blast report.
>>>


From s.johri at imperial.ac.uk  Mon May  8 11:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,
 
I'm trying to use codeml from PAML to estimate Ka and Ks values from
sequences in a multi-FASTA file, using the code that has been posted
on the BioPerl wiki.

However, when I run the code I get the errors shown below my signature.

I did a Google search to see if anyone had come across similar
problems. In those cases the problem seems to have been that the
sequence lengths were not a multiple of 3. In my code I check whether
each sequence length is a multiple of 3 and, if not, I alter the
sequence until it is, but I still get the same error messages.

Any suggestions as to why this could be happening?
 
Thanks!!!
 
Saurabh Johri
Tuberculosis Research Group
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------
 
------------- EXCEPTION Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output
STACK Bio::Tools::Phylo::PAML::_parse_summary
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
STACK Bio::Tools::Phylo::PAML::next_result
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
------------------------------------
 
>Rv3923c
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_cdc1551
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>Rv3923c_mtb_f11
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_c1
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
 
------------------------------------


From chen_li3 at yahoo.com  Mon May  8 20:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers
for one sequence:

#!/cygdrive/c/Perl/bin/perl.exe

use warnings;
use strict;

use Bio::Tools::Run::Primer3;
use Bio::SeqIO;

my $file_in  = 'piwil2.fa';
my $file_out = 'temp.out';
my $seqio    = Bio::SeqIO->new(-file => $file_in);

# take the first sequence in the file
my $seq     = $seqio->next_seq;
my $primer3 = Bio::Tools::Run::Primer3->new(
    -seq     => $seq,
    -outfile => $file_out,
    -path    => "c:/Perl/local/primer3_1.0.0/src/primer3.exe",
);

unless ($primer3->executable) {
    print "primer3 can not be found. Is it installed?\n";
    exit(-1);
}

# set your own parameters for the primers or product
$primer3->add_targets(
    'PRIMER_OPT_GC_PERCENT' => '50',
    'PRIMER_OPT_SIZE'       => '24',
    'PRIMER_OPT_TM'         => '60',
);

my $result = $primer3->run;

exit;

I tried to modify it for multiple sequences by using a
while loop, as follows:

while ($seq = $seqio->next_seq) {
    my $primer3 = Bio::Tools::Run::Primer3->new(...);
    # design the primers
    ...
}

I get primers only for the last sequence. It seems the
earlier ones are overwritten.

Any ideas would be highly appreciated.

Li




From jason.stajich at duke.edu  Mon May  8 20:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for differences in length, so
there aren't going to be any interesting values from PAML, but maybe
you are just providing an example?
b) I think you are missing the trailing gaps in the alignment of the
Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned
sequences as input.
c) The sequences, in the reading frame you have provided (and using
the standard translation table), have stop codons in them; this will
cause failure as well.

Which code from the wiki are you running, the 'running PAML' part of  
the HOWTO?

Try looking at the actual output from PAML to figure out what is wrong.
Add this when initializing the Run object:
-save_tempfiles => 1,
-verbose => 1,

then open up the tempdir that is reported and look at the output  
files (mlc file).
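
Points (b) and (c) can also be checked before the sequences are handed to
codeml. What follows is only a rough sketch in plain Perl, not part of
BioPerl: the helper name check_codon_alignment and the demo sequences are
made up, it assumes the sequences are already in a hash of id => string,
and it only looks at frame 1 with the standard stop codons (TAA, TAG, TGA).

```perl
use strict;
use warnings;

# Hypothetical helper: list problems that would likely make codeml fail.
# Takes a hashref of id => aligned sequence string.
sub check_codon_alignment {
    my ($seqs) = @_;
    my @problems;

    # aligned sequences must all have the same length
    my %lengths = map { length($seqs->{$_}) => 1 } keys %$seqs;
    push @problems, "sequences differ in length (not aligned?)"
        if keys(%lengths) > 1;

    for my $id (sort keys %$seqs) {
        my $s = uc $seqs->{$id};
        push @problems, "$id: length not a multiple of 3" if length($s) % 3;

        # walk the codons in frame 1, skipping the final (terminal) codon
        my $ncodons = int(length($s) / 3);
        for my $i (0 .. $ncodons - 2) {
            my $codon = substr($s, 3 * $i, 3);
            if ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                push @problems,
                    "$id: in-frame stop $codon at codon " . ($i + 1);
                last;
            }
        }
    }
    return @problems;
}

# invented demo data
my %demo = (
    a => 'ATGAAATGA',    # stop only at the end
    b => 'ATGTAAAAA',    # internal stop at codon 2
    c => 'ATGAAA',       # shorter than the others
);
print "$_\n" for check_codon_alignment(\%demo);
```

An empty return list means none of these three checks fired; it does not
guarantee that codeml will accept the input.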

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 21:17:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 21:17:22 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>
Message-ID: <C08565E2.85FD%osborne1@optonline.net>

Li,

If you're analyzing multiple input sequences you're going to have to create
multiple output sequences.

Brian O.


On 5/8/06 8:21 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> I get primers only for the last sequence. It seems the
> earlier ones are overwritten.


From WiersmaP at AGR.GC.CA  Mon May  8 21:28:27 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Mon, 8 May 2006 21:28:27 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca>

Hi Li,

 
When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it
opens -outfile=>"filename" for writing and then closes.  That's why
putting it in a loop will overwrite your output file each time so you
only see the last one.  I suppose you could read in each output file
before looping to the next seq and append it to another file.

 
If you're doing a fair bit of work with this module it would be worth
looking at the Bio::Tools::Primer3 module.  The statement $result =
$primer3->run produces a Bio::Tools::Primer3 object which has all the
methods you need for customizing your output.
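
One way around the overwriting might be to give each run its own -outfile
name. The sketch below covers only the naming part, in plain Perl; the
helper outfile_for and the example ids are invented for illustration. The
returned name would be passed as -outfile when the
Bio::Tools::Run::Primer3 object is created inside the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: build a per-sequence output file name so that
# successive runs do not overwrite one another.
sub outfile_for {
    my ($id) = @_;
    # replace anything that is awkward in a file name with '_'
    (my $safe = $id) =~ s/[^A-Za-z0-9_.-]/_/g;
    return "$safe.primer3.out";
}

# invented example ids
print outfile_for($_), "\n" for ('piwil2', 'gi|12345|ref', 'seq 3');
```

Two ids that differ only in replaced characters would still collide, so
this assumes the sequence ids are reasonably distinct.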

 
Paul

 
Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC

wiersmap at agr.gc.ca

 
From simon_sask at yahoo.com  Tue May  9 04:06:04 2006
From: simon_sask at yahoo.com (Simon K. Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is
supposed to show the raw alignments using
Bio::SearchIO.  The script is written to parse a
PSI-BLAST report.  I found an old email in the archive
from Jason stating that this should parse other
flavors of blast reports as well.  

What do I need to do to make this script parse non-PSI
blast reports?  I tried to just specify a file and
that the -format is 'blast', but I get an error
stating that the object method 'raw_hit_data' is not
defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because
I'd like to get the size of the gaps, not just the
number.

Any help will be much appreciated.
Many thanks




From cjfields at uiuc.edu  Tue May  9 08:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <fe65cab2.b5b5696a.81acb00@expms6.cites.uiuc.edu>

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO
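
Once you have an aligned string in hand (for example the query or hit
string of an HSP), the gap sizes can be read off it directly. This is
just a sketch in plain Perl; the helper name gap_runs and the alignment
string are invented, and gaps are assumed to be written as '-'.

```perl
use strict;
use warnings;

# Return a [start_column, length] pair for every run of '-' in an
# aligned string. Columns are 1-based positions in the alignment.
sub gap_runs {
    my ($aligned) = @_;
    my @runs;
    while ($aligned =~ /(-+)/g) {
        # $-[1] is the 0-based offset where the captured run starts
        push @runs, [ $-[1] + 1, length $1 ];
    }
    return @runs;
}

my $aligned = 'ACGT--ACG-----TTA-C';    # invented example
printf "gap at column %d, length %d\n", @$_ for gap_runs($aligned);
```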

Chris



From peterm at bioinf.uni-leipzig.de  Tue May  9 08:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,
I am using the Bio::Graphics module to draw sequences and their features 
with Bio::SeqFeature::Generic.
The features I want to highlight are occurrences of transcription 
binding factors. Therefore I want to give every factor its own color, 
but I don't see how to manage it. I can only colorize complete tracks.

Thanks, Peter


From Marc.Logghe at DEVGEN.com  Tue May  9 10:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
> 
> Hi all,
> I am using the Bio::Graphics module to draw sequences and 
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of 
> transcription binding factors. Therefore I want to give every 
> factor its own color, but i didn't see how to manage it. I 
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine
to the option.
-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');

    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be
get_attributes()) but you get the idea.
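
To give every factor its own color, the callback could look the color up
in a table. A rough sketch follows, under these assumptions: each feature
carries a 'factor' tag naming its binding factor (the tag name and the
factor names are made up), and MockFeature is only a stand-in so the
snippet runs without BioPerl. With real Bio::SeqFeature::Generic objects
the same has_tag/get_tag_values calls should apply.

```perl
use strict;
use warnings;

# Stand-in for a feature object; it only mimics has_tag() and
# get_tag_values() so that the sketch runs without BioPerl.
package MockFeature;
sub new            { my ($class, %tags) = @_; return bless {%tags}, $class }
sub has_tag        { my ($self, $tag) = @_; return exists $self->{$tag} }
sub get_tag_values { my ($self, $tag) = @_; return @{ $self->{$tag} } }

package main;

# Assumed mapping from factor name to color; names are illustrative.
my %color_for = (SP1 => 'red', AP1 => 'blue', GATA1 => 'green');

# This code reference is what would be passed as -bgcolor => $bgcolor.
my $bgcolor = sub {
    my $feat = shift;
    return 'gray' unless $feat->has_tag('factor');    # untagged feature
    my ($factor) = $feat->get_tag_values('factor');
    return $color_for{$factor} || 'gray';             # unknown factor
};

for my $name (qw(SP1 GATA1 NFKB)) {
    my $feat = MockFeature->new(factor => [$name]);
    print "$name => ", $bgcolor->($feat), "\n";
}
```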
Cheers,
Marc


From Marc.Logghe at DEVGEN.com  Tue May  9 10:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,
Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics

The examples show the principle I mentioned in my previous post (e.g.
Example 4), but then for the -label or -description options.
But as said, you can apply this to (most of?) the other
options as well.
Regards,
ML 




From WiersmaP at AGR.GC.CA  Tue May  9 11:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted.  In the Bio::Tools::Primer3 module the author uses "$p3" for the object.  If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output the results before looping to the next sequence.

From Bio::Tools::Primer3.pm:

 # how many results were there?
 my $num=$p3->number_of_results;
 print "There were $num results\n";

 # get all the results
 my $all_results=$p3->all_results;
 print "ALL the results\n";
 foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

 # get specific results
 my $result1=$p3->primer_results(1);
 print "The first primer is\n";
 foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}
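
If only a few fields are wanted rather than every key, the hash reference
can be filtered first. A small sketch in plain Perl: the mock hash stands
in for what primer_results(1) returns, its key names are copied from the
Primer3 output chen li posted on the list, and whether primer_results()
uses exactly these key names is an assumption. pick_fields is a made-up
helper.

```perl
use strict;
use warnings;

# Mock of one primer result (assumed key names, see the note above).
my $result1 = {
    PRIMER_LEFT_SEQUENCE => 'CCAAAGTTGGCTTGGAAAAA',
    PRIMER_LEFT_PENALTY  => 0.081468,
    PRIMER_PRODUCT_SIZE  => 501,
};

# Hypothetical helper: return "key<TAB>value" lines for the requested
# keys, in the order asked for, skipping keys that are not present.
sub pick_fields {
    my ($result, @wanted) = @_;
    return map  { "$_\t$result->{$_}" }
           grep { exists $result->{$_} } @wanted;
}

print "$_\n"
    for pick_fields($result1, qw(PRIMER_LEFT_SEQUENCE PRIMER_PRODUCT_SIZE));
```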

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

I read both documents. What I understand is that
Bio::Tools::Run::Primer3 is for designing primers and
Bio::Tools::Primer3 is for parsing the results. When I
read the documents I do not see this line
 $result = $primer3->run in Bio::Tools::Primer3. I
wonder how you got this information.

Thanks,

Li 



From chen_li3 at yahoo.com  Tue May  9 13:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul, it REALLY works.

I have some other questions:
1) When I run the script I use this line at the
command prompt:

perl primer.pl >test

When I check the default output file (temp.out) used by
the script, I only see the information for the last
sequence, which differs from what is in the test
file. The test file contains the information for
all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly
to print out selected information, such as the primer
sequence and the size of the PCR product? Or do I have
to parse the file myself?

Once I have all this information I would like to post
the script for batch-designing PCR primers.


Thanks,

Li 


--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
> The line "my $result = $primer3->run" is already in
> the code you submitted.  In the Bio::Tools::Primer3
> module the author uses "$p3" for the object.  If you
> change your line to "my $p3 = $primer3->run" you
> should be able to run the examples below. Process
> the results for each sequence and output the results
> before looping to the next sequence.
> 
> From Bio::Tools::Primer3.pm:
> 
>  # how many results were there?
>  my $num=$p3->number_of_results;
>  print "There were $num results\n";
> 
>  # get all the results
>  my $all_results=$p3->all_results;
>  print "ALL the results\n";
>  foreach my $key (keys %{$all_results}) {
>      print "$key\t${$all_results}{$key}\n";
>  }
> 
>  # get specific results
>  my $result1=$p3->primer_results(1);
>  print "The first primer is\n";
>  foreach my $key (keys %{$result1}) {
>      print "$key\t${$result1}{$key}\n";
>  }
> 
> Paul
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
> 
> -----Original Message-----
> From: chen li [mailto:chen_li3 at yahoo.com] 
> Sent: Monday, May 08, 2006 8:32 PM
> To: Wiersma, Paul
> Subject: Re: [Bioperl-l] use primer3 to design
> primers with multiple sequences
> 
> Hi Paul,
> 
> I read both documents. What I understand is that
> Bio::Tools::Run::Primer3 is for designing primers and
> Bio::Tools::Primer3 is for parsing the results. When I
> read the documents I do not see this line
>  $result = $primer3->run in Bio::Tools::Primer3. I
> wonder how you got this information.
> 
> Thanks,
> 
> Li 
> 
> --- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:
> 
> > Hi Li,
> > 
> >  
> > 
> > When you execute $primer3->run with a Bio::Tools::Run::Primer3
> > object it opens -outfile=>"filename" for writing and then
> > closes.  That's why putting it in a loop will overwrite your
> > output file each time so you only see the last one.  I suppose
> > you could read in each output file before looping to the next
> > seq and append it to another file.
> > 
> > If you're doing a fair bit of work with this module it would be
> > worth looking at the Bio::Tools::Primer3 module.  The statement
> > $result = $primer3->run produces a Bio::Tools::Primer3 object
> > which has all the methods you need for customizing your output.
> > 
> >  
> > 
> > Paul
> > 
> >  
> > 
> > Paul A. Wiersma
> > Agriculture and Agri-Food Canada/Agriculture et
> > Agroalimentaire Canada
> > Summerland, BC
> > 
> > wiersmap at agr.gc.ca
> > 
> >  
> > 
> >  
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From WiersmaP at AGR.GC.CA  Tue May  9 13:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of Primer3.pm modules.  Hopefully you can see how I've picked out parts of the results for printing.  You can modify it as you need to output only some results.

>>>>>>>>
  # design the primers. This runs primer3 and returns a 
  # Bio::Tools::Primer3 object with the results
my $results=$primer3->run;

  # see the Bio::Tools::Primer3 pod for
  # things that you can get from this. For example:

print "There were ", $results->number_of_results+1, " primers\n";

my @out_keys_part = qw(	START
			LENGTH
			TM
			GC_PERCENT
			SELF_ANY
			SELF_END
			SEQUENCE );

for (my $i=0;$i <= $results->number_of_results;$i++){
	
	# get specific results
	my $result1=$results->primer_results($i);
 
	print "\n",$i+1;	
	for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)){
			
			# PRIMER_LEFT and PRIMER_RIGHT hold 'start,length' pairs
			my ($start, $length) = split /,/, ${$result1}{$key};
			${$result1}{$key."_START"} = $start;
			${$result1}{$key."_LENGTH"} = $length;
			foreach my $partkey (@out_keys_part) {
				print "\t", ${$result1}{$key."_".$partkey};
			} 
			print "\n";
	}
	print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'}, ", PAIR ANY COMPL: ",
				${$result1}{'PRIMER_PAIR_COMPL_ANY'};
	print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC
V0H 1Z0
wiersmap at agr.gc.ca
 


From cjfields at uiuc.edu  Tue May  9 17:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those
problematic bacterial tax names again).  I have a simple script that runs
output to STDOUT to generate a list of hits.  Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
         hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
         hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
         hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
         hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
         hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216  <--
         hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
         hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

...

Most (but not all) of the strain numbers get repeated (marked with arrows).
This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
(and thus passed through Bio::SeqIO).  Anyone seen this before?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Tue May  9 19:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again).  I have a simple script that runs
> output to STDOUT to generate a list of hits.  Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, 
because the 'subsp.' is gone?

Here are the pertinent parts of the GenBank file:

LOCUS       NC_002944            4829781 bp    DNA     circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
  ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium
            avium complex (MAC).

                     /organism="Mycobacterium avium subsp. paratuberculosis K-10"
                     /strain="K-10"
                     /sub_species="paratuberculosis"


> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO).  Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal with Bacteria mainly, and should also look into this. I 
haven't been using the genbank headers directly, only the features, so i 
never came across this.

Another thing which may crop up is when no Species has been allocated 
yet but the genus is known (or something like that). In that case the 
name is written as "Genus spp." eg.  	 Gallibacterium spp.
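
To make the genus/species/subspecies/strain breakdown above concrete, here is a rough sketch in plain Perl. It is not what Bio::SeqIO::genbank does; split_organism() is a made-up helper, and it assumes the /strain and /sub_species qualifier values are already in hand and appear at the end of the organism string.

```perl
use strict;
use warnings;

# Naive sketch: peel the strain and subspecies off the end of an organism
# string, using the /strain and /sub_species qualifier values as hints.
sub split_organism {
    my ($organism, $sub_species, $strain) = @_;
    my $rest = $organism;

    # Drop a trailing strain designation such as "K-10".
    if (defined($strain) && length($strain)) {
        $rest =~ s/\s+\Q$strain\E\s*$//;
    }
    # Drop a trailing "subsp. <name>" part.
    if (defined($sub_species) && length($sub_species)) {
        $rest =~ s/\s+subsp\.\s+\Q$sub_species\E\s*$//;
    }

    # Whatever is left is treated as "Genus species".
    my ($genus, $species) = split /\s+/, $rest, 2;
    return {
        genus       => $genus,
        species     => $species,
        sub_species => $sub_species,
        strain      => $strain,
    };
}

my $parts = split_organism(
    'Mycobacterium avium subsp. paratuberculosis K-10',
    'paratuberculosis', 'K-10',
);
for my $field (qw(genus species sub_species strain)) {
    print "$field = $parts->{$field}\n";
}
```

A name like "Gallibacterium spp." with no qualifiers would simply come back with "spp." in the species slot, so a real parser would need to treat that case separately.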

--Torsten


From chen_li3 at yahoo.com  Tue May  9 21:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much.

Just as you pointed out in your latest email, I have now
figured out that the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first pair of primers.  1) Since it
is a hash, I should be able to get a specific value
by giving Perl the corresponding key. 2) Since it is a
reference, I should dereference it to get the actual value.

I don't know much about OO and Perl, and your code looks
a little bit complicated to me. But I got the job done
by adding the following lines directly:

###############################################
#from Primer3 module to get all the information 
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


#get the value for the key in the hash reference
	   my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE'; 
	   print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";
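
The dereferencing described above can be tried without primer3 installed. This is only a sketch: the hash below is a made-up stand-in for what primer_results() returns, and left_primer_summary() is a hypothetical helper name, not part of Bio::Tools::Primer3.

```perl
use strict;
use warnings;

# Pull a few fields out of one primer-pair hash reference.
sub left_primer_summary {
    my ($result) = @_;

    # These two forms dereference the same hash entry.
    my $seq_long  = ${$result}{PRIMER_LEFT_SEQUENCE};
    my $seq_arrow = $result->{PRIMER_LEFT_SEQUENCE};

    # PRIMER_LEFT is a "start,length" pair, so split it on the comma.
    my ($start, $length) = split /,/, $result->{PRIMER_LEFT};

    return ($seq_arrow, $start, $length, $seq_long eq $seq_arrow);
}

# Hypothetical stand-in for one primer_results($i) hash; values are made up.
my $result1 = {
    PRIMER_LEFT          => '60,20',
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_PRODUCT_SIZE  => 200,
};

my ($seq, $start, $length) = left_primer_summary($result1);
print "PRIMER_LEFT_SEQUENCE\t$seq\n";
print "start\t$start\nlength\t$length\n";
print "PRIMER_PRODUCT_SIZE\t$result1->{PRIMER_PRODUCT_SIZE}\n";
```

The arrow form ($result->{KEY}) and the block form (${$result}{KEY}) are interchangeable; the arrow form is usually easier to read.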
 

There is one point I don't understand:

When I add these two lines to my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent
> earlier. 
> The line "my $result1=$results->primer_results($i);"
> gives you a reference to a hash that contains all of the
> information for a primer pair.
> To access the entries you dereference the hash, i.e.
> the hash is %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'}
> gives you the entry for product size.  The following are the
> available entries. All are single values or strings except
> PRIMER_RIGHT and PRIMER_LEFT, which are start,length pairs
> (e.g. PRIMER_LEFT => '60,20') that can be pulled out
> with split:
> 
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
> 
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 




From zhouyubio at gmail.com  Tue May  9 21:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID: <loom.20060510T032601-573@post.gmane.org>

Qunfeng <qfdong <at> iastate.edu> writes:

> 
> Hi there,
> 
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
> 
> I am not very familiar with BioPerl. I tried to follow the example showing 
> in the above page to retrieve pubmed ID under each Reference tag , i.e., 
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The 
> authors() works for me.  Appreciate any suggestions.
> 
> Qunfeng 
> 


Hi, 

I have the same problem as you. Here is what I did: I use a regular
expression to match the value of the 'location' tag, if there is one.

#------------------
my $ann = $seqobj->annotation(); # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title,"\n";
    print "Location: ", $ref->location, "\n";
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
	my $pmid = $1;
	print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------
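
The regular-expression step can be checked on its own, without a sequence object. A small sketch: pmid_from_location() is a hypothetical helper name, and the sample location string is invented for illustration, not taken from a real GenBank record.

```perl
use strict;
use warnings;

# Return the PubMed id embedded in a reference 'location' string,
# or undef when the string carries no "PUBMED <digits>" part.
sub pmid_from_location {
    my ($location) = @_;
    return undef unless defined $location;
    return ($location =~ /PUBMED\s+(\d+)/) ? $1 : undef;
}

# Invented sample strings, for illustration only.
my @locations = (
    'J. Example 1 (2), 3-4 (2005) PUBMED 12345678',
    'Unpublished',
);

for my $location (@locations) {
    my $pmid = pmid_from_location($location);
    print "Location: $location\n";
    print defined $pmid ? "PMID: $pmid\n" : "PMID: (none found)\n";
}
```

Checking the return value with defined keeps references without a PubMed id from triggering warnings.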


From osborne1 at optonline.net  Tue May  9 23:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To: <loom.20060510T032601-573@post.gmane.org>
Message-ID: <C086CFDD.865A%osborne1@optonline.net>

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id found in
the 56961711 entry using the pubmed() method. Note that there are 4 references,
only one of which has a PubMed id. Also, the authors() method prints out the
authors, not the PubMed id. If you have a problem please show your code and
tell us which version of Bioperl you're using.

Brian O.


use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
  print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}




From sb at mrc-dunn.cam.ac.uk  Wed May 10 05:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,
I'm a little confused as to how names are supposed to work in 
Bio::Taxonomy::Node.

In the bioperl versions that I've looked at, a Node doesn't seem to store 
the most important information about itself - its scientific name - in 
an obvious place. bioperl 1.5.1 puts it at the start of the 
classification list. I'd have thought sticking it in -name would make 
more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
   -object_id => 9606, # or -ncbi_taxid. Required tag
   -names => {
       'scientific' => ['sapiens'],
       'common_name' => ['human']
   },
   -rank => 'species'  # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 
'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics 
Bio::Taxonomy::Node->new takes the -name value and makes a common name 
out of it, whilst the name() method passes any 'scientific' name to the 
scientific_name() method which is unable to set any value (and warns 
about this), only get.

It seems like the need to have this classification array work the same 
way as Bio::Species is causing some unnecessary restrictions. Can't the 
more sensible idea of having a dedicated storage spot for the 
ScientificName and other parameters be used, with the classification 
array either being generated just-in-time from the hash-stored data, or 
indeed being generated from the Lineage field?


Also, why does a node store the complete hierarchy on itself in the 
classification array? If we're going that far, why don't the 
Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a 
get_taxonomy() method instead of a get_Taxonomy_Node() method? 
get_taxonomy() could, from a single efetch.fcgi lookup, create a 
complete Bio::Taxonomy with all the nodes. Whilst most nodes would only 
have a minimum of information, if you could simply ask a node what its 
rank and scientific name were you could easily build a classification 
array, or ask what Kingdom your species was in, etc.
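
As a rough sketch of that idea in plain Perl (not the Bio::Taxonomy API): assume each node is just a hash holding a rank and a scientific name, listed root first. The classification() and name_for_rank() helpers are made-up names, and the lineage is only illustrative.

```perl
use strict;
use warnings;

# Build a classification array (most specific name first) on demand
# from a root-first list of minimal nodes.
sub classification {
    my @nodes = @_;
    return map { $_->{scientific_name} } reverse @nodes;
}

# Ask the lineage for the name held at a given rank, e.g. 'kingdom'.
sub name_for_rank {
    my ($rank, @nodes) = @_;
    for my $node (@nodes) {
        return $node->{scientific_name} if $node->{rank} eq $rank;
    }
    return undef;
}

# Illustrative lineage; each node knows only its rank and scientific name.
my @lineage = (
    { rank => 'superkingdom', scientific_name => 'Eukaryota' },
    { rank => 'kingdom',      scientific_name => 'Metazoa' },
    { rank => 'phylum',       scientific_name => 'Chordata' },
    { rank => 'class',        scientific_name => 'Mammalia' },
    { rank => 'order',        scientific_name => 'Primates' },
    { rank => 'family',       scientific_name => 'Hominidae' },
    { rank => 'genus',        scientific_name => 'Homo' },
    { rank => 'species',      scientific_name => 'Homo sapiens' },
);

print join('; ', classification(@lineage)), "\n";
print "Kingdom: ", name_for_rank('kingdom', @lineage), "\n";
```

With this layout the classification array is derived just-in-time rather than stored on every node.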

Are there good reasons for Taxonomy working the way it does in 1.5.1, or 
would I not be wasting my time re-writing things to make more sense (to me)?


Cheers,
Sendu.


From osborne1 at optonline.net  Wed May 10 10:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID: <C08771EE.866A%osborne1@optonline.net>

Paul,

I took your code, added some "run" code and made it into a script and added
this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.


On 5/9/06 1:59 PM, "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> $results->number_of_results


From stoltzfu at umbi.umd.edu  Tue May  9 16:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative
	object
Message-ID: <D8EE6983-2123-45B3-967C-0E4982428CFA@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to
facilitate comparative analysis using evolutionary methods by
1) managing evolutionary relationships (by linking data to trees)
and 2) allowing coordinated analysis of different types of data
(by implementing a generic concept of "character-state" data).

Bio::CDAT would take advantage of existing BioPerl objects and would
include the functionality of Rutger Vos's Bio::Phylo.  It would
provide the framework to develop interfaces to analysis tools
(phylogeny inference, evolutionary rate models, functional shift
inference, etc), as well as to file formats and visualization
methods appropriate for such analyses.

A proposal is attached.  We would like to hear your thoughts (e.g.,
see the section on "Questions to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060509/48aeca4b/attachment-0002.pdf>
-------------- next part --------------


From zhouyubio at gmail.com  Wed May 10 04:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To: <C086CFDD.865A%osborne1@optonline.net>
References: <loom.20060510T032601-573@post.gmane.org>
	<C086CFDD.865A%osborne1@optonline.net>
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks!

I am using Bioperl-1.4, not bioperl-live. That may be the reason why
it does not work!




--
Best Wishes!

Yu


From cjfields at uiuc.edu  Wed May 10 11:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using
$seq->species->binomial chops some of the strain designations off, so really
neither one works optimally for bacterial genus-species-strain taxonomy.
Hilmar made the suggestion that it's probably best to grab the NCBI TaxID
and parse it out that way by looking it up in the taxonomy database (using
Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank
does.  

I wonder if we should be trying to shove most of this stuff into species
objects directly from the beginning; in other words, maybe we should try to
get the information in Bio::Annotation objects and then, after the
parsing/IO is finished, have a method to get the information into
Bio::Species objects when wanted/needed; a check could be added against the
NCBI Taxonomy database there.  

Anyway, I really haven't looked at how they are parsed out and don't have
the time at the moment.  I may look into this as well but not until I get
back from conference (end of May).  Jason and Brian have been calling for a
refactoring of Bio::SeqIO::genbank for a while; maybe it's getting time to
do something about it...

Chris 



From cuiw at mail.nih.gov  Wed May 10 12:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F4999@nihcesmlbx10.nih.gov>


'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output
hash.

You can find all legal keys by "print keys %{$result1};"
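
One way to avoid the "uninitialized value" warning is to check for the key before printing. A minimal sketch with a made-up hash standing in for $result1; value_or_note() is a hypothetical helper, not a Bio::Tools::Primer3 method.

```perl
use strict;
use warnings;

# Return the value stored under $key, or a short note when the key is
# missing or undefined, so printing never warns.
sub value_or_note {
    my ($result, $key) = @_;
    if (exists($result->{$key}) && defined($result->{$key})) {
        return $result->{$key};
    }
    return "(no $key in this result)";
}

# Made-up stand-in for one primer result hash.
my $result1 = { PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT' };

for my $key (qw(PRIMER_LEFT_SEQUENCE PRIMER_SEQUENCE_ID)) {
    print "$key\t", value_or_note($result1, $key), "\n";
}
```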


	   

From WiersmaP at AGR.GC.CA  Wed May 10 12:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0).  If you try to access them using $results->primer_results(1) (or anything but 0) you will get an error.
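
One way to cope with this, sketched with plain Perl data rather than a real Bio::Tools::Primer3 object: fall back to entry 0 for keys that are stored only once per sequence. The array @primer_results, its values, and the helper result_value() are all hypothetical stand-ins for what primer_results($i) returns.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what $results->primer_results($i) returns:
# one hash reference per primer pair, with the sequence-level keys
# present only in entry 0. All values are invented.
my @primer_results = (
    {
        PRIMER_SEQUENCE_ID   => 'test_seq',
        SEQUENCE             => 'ACGT' x 10,
        PRIMER_LEFT          => '60,20',
        PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    },
    {
        PRIMER_LEFT          => '75,21',
        PRIMER_LEFT_SEQUENCE => 'CGTACGTACGTACGTACGTAC',
    },
);

# Look up $key for primer pair $i, falling back to entry 0 for keys
# that are stored only once per sequence.
sub result_value {
    my ($results, $i, $key) = @_;
    my $entry = $results->[$i] or return;
    return $entry->{$key} if exists $entry->{$key};
    return $results->[0]{$key};
}

for my $i (0 .. $#primer_results) {
    my $id = result_value(\@primer_results, $i, 'PRIMER_SEQUENCE_ID');
    # PRIMER_LEFT is a "start,length" pair, so split it on the comma.
    my ($start, $length) =
        split /,/, result_value(\@primer_results, $i, 'PRIMER_LEFT');
    print "$id\tpair $i\tstart=$start\tlength=$length\n";
}
```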

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much.

As you pointed out in your latest email, I have now
figured out that the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first primer pair.  1) Since it
is a hash, I should be able to get the value
for a given key by telling Perl which key
to look up. 2) Since it is a reference,
I have to dereference it to get the actual value.

I don't know too much OO and Perl and your code looks
a little bit complicated to me. But I get the job done
by adding the following lines directly:

###############################################
# from the Primer3 module, to get all the information
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


# get the value for the key in the hash reference
my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";
 

There is one point I don't understand:

When I add these two lines to my code (line 49 in my
code)

my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size.  The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT, which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') that can be pulled out
> with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


From cuiw at mail.nih.gov  Wed May 10 14:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences:
	bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F49A0@nihcesmlbx10.nih.gov>

Hope this works!

Bio::Tools::Primer3 line 264 should be:
 
$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID with:

use Bio::Tools::Primer3;

# read the primer3 output file
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");

# print the sequence id
print $p3->seqobject->id;

Wenwu Cui, PhD
NIH/NCI




From cjfields at uiuc.edu  Wed May 10 14:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there.  I'm getting ready for a conference in a couple weeks
so I may be off the radar a bit here and there.  The next ListSummary won't
be posted until May 26.  Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From chen_li3 at yahoo.com  Wed May 10 20:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post
about primer3.

But I am still a little confused, even after reading the
documentation: What is the relationship between these two
modules? What is the correct/standard way to use them for
batch primer design? At the moment I use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (because I don't
need all the information). But there is a small
problem: PRIMER_SEQUENCE_ID can't be accessed using
this method. Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3, and others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? What I find, though, is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without having the line

use Bio::Tools::Primer3;

in the script.  How can this be explained? Is it because
of the following line?

my $result=$primer3->run;

The last question: which line of code invokes the
program primer3.exe? How does the Perl script call
primer3.exe?

Once again, thank you all very much,

Li

 


From jason.stajich at duke.edu  Wed May 10 20:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <B1D9C06A-09FF-4342-81E4-7D38AD66F4CA@duke.edu>

Bio::Tools::Run::XXX modules are for running applications...


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Wed May 10 20:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the
standard here.  Nodes are defined by the data from the taxonomy dump
dbs from NCBI.
The eutils interface is pretty worthless except for taxid->name or the
reverse; you can't get the full taxonomy (or couldn't when that
implementation was written).

The "name" method refers to the name of the node - each level in the
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a
Bio::Species and/or being able to read a genbank file, take the
organism taxonomy data as a list, and instantiate from that.  If we
could rely on everything being in a DB, of course this would be simpler.

Another problem is that the depth of the taxonomy is not constant for
every node, so assuming that a fixed number of slots will be filled in
to generate the taxonomy leads to problems.

Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code, as this is how I really wanted it to
work; the Bio::Species hacks are only there to shoehorn in data
retrieved from genbank files.  With the flatfile implementation
you have to walk all the way up the db hierarchy to get the kingdom
for a node, so you do have to build up the classification hierarchy, as
each node only stores data about itself.
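
To illustrate the "walk all the way up" idea, here is a rough sketch in plain Perl. It does not use Bio::DB::Taxonomy::flatfile; the %node table and the helpers classification() and name_at_rank() are hypothetical. The taxids, names and ranks are taken from the Frankia record quoted later in this thread, and the parent links are inferred from the order of its lineage.

```perl
use strict;
use warnings;

# Hypothetical in-memory node table: taxid => { parent, name, rank }.
# Each node knows only about itself and its parent.
my %node = (
    106370 => { parent => 1854,   name => 'Frankia sp. CcI3',       rank => 'species' },
    1854   => { parent => 74712,  name => 'Frankia',                rank => 'genus' },
    74712  => { parent => 85013,  name => 'Frankiaceae',            rank => 'family' },
    85013  => { parent => 2037,   name => 'Frankineae',             rank => 'suborder' },
    2037   => { parent => 85003,  name => 'Actinomycetales',        rank => 'order' },
    85003  => { parent => 1760,   name => 'Actinobacteridae',       rank => 'subclass' },
    1760   => { parent => 201174, name => 'Actinobacteria (class)', rank => 'class' },
    201174 => { parent => 2,      name => 'Actinobacteria',         rank => 'phylum' },
    2      => { parent => 131567, name => 'Bacteria',               rank => 'superkingdom' },
    131567 => { parent => undef,  name => 'cellular organisms',     rank => 'no rank' },
);

# Follow parent links from $taxid to the top, collecting nodes on the way.
# The depth is whatever the data says; no fixed number of slots is assumed.
sub classification {
    my ($nodes, $taxid) = @_;
    my (@lineage, %seen);
    while (defined $taxid && exists $nodes->{$taxid}) {
        last if $seen{$taxid}++;    # guard against cycles in bad data
        push @lineage, $nodes->{$taxid};
        $taxid = $nodes->{$taxid}{parent};
    }
    return @lineage;
}

# Find the name of the ancestor (or the node itself) with a given rank.
sub name_at_rank {
    my ($nodes, $taxid, $rank) = @_;
    for my $n (classification($nodes, $taxid)) {
        return $n->{name} if $n->{rank} eq $rank;
    }
    return;
}

print join('; ', map { $_->{name} } classification(\%node, 106370)), "\n";
print "superkingdom: ", name_at_rank(\%node, 106370, 'superkingdom'), "\n";
```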

I'm not exactly sure what you are proposing to do, but I would
definitely enjoy another pair of hands; I don't really have time to
mess with it any time soon.

-jason
On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at, a Node doesn't seem to
> store the most important information about itself - its scientific
> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>    -object_id => 9606, # or -ncbi_taxid. Required tag
>    -names => {
>        'scientific' => ['sapiens'],
>        'common_name' => ['human']
>    },
>    -rank => 'species'  # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to
> the scientific_name() method which is unable to set any value (and
> warns about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't
> the more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored data,
> or indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method?
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> only have a minimum of information, if you could simply ask a node
> what its rank and scientific name was you could easily build a
> classification array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in
> 1.5.1, or would I not be wasting my time re-writing things to make
> more sense (to me)?
>
>
> Cheers,
> Sendu.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cuiw at mail.nih.gov  Wed May 10 21:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B07D391@nihcesmlbx10.nih.gov>

1. Bio::Tools::Primer3 is already included in the Bio::Tools::Run::Primer3 module, so you can parse the result file without a separate "use" line.

2. There is a bug at line 264 of Bio::Tools::Primer3 (Primer3.pm), as I mentioned. Once it is fixed, the module can output PRIMER_SEQUENCE_ID.

3. primer3.exe is called in the "run" function of Bio::Tools::Run::Primer3; please read the function definition.
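
For anyone wondering what "calling" an external program from Perl looks like in general, here is a small sketch. It is not the module's actual run code, which is not reproduced here. A child perl process stands in for primer3 (the @command list is a made-up placeholder), and parse_boulder() is a hypothetical helper that reads the KEY=VALUE lines, ended by a lone "=", that primer3 writes.

```perl
use strict;
use warnings;

# Stand-in for the external program: a child perl that prints a short
# Boulder-IO style record. A real script would name the primer3
# executable here instead; this command is only a placeholder.
my @command = ($^X, '-e',
    'print "PRIMER_SEQUENCE_ID=test_seq\nPRIMER_LEFT=60,20\n=\n"');

# Hypothetical helper: read KEY=VALUE lines from a filehandle into a
# hash reference, stopping at the "=" line that ends the record.
sub parse_boulder {
    my ($fh) = @_;
    my %record;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line eq '=';
        my ($key, $value) = split /=/, $line, 2;
        $record{$key} = $value;
    }
    return \%record;
}

# Run the command and read its standard output through a pipe.
open(my $out, '-|', @command) or die "cannot run @command: $!";
my $record = parse_boulder($out);
close $out;

printf "%s => %s\n", $_, $record->{$_} for sort keys %{$record};
```

The list form of open avoids passing the command through a shell, so nothing in the arguments needs shell quoting.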




From cjfields at uiuc.edu  Wed May 10 23:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely see
the use of a local database.  I ran a few tests, really unrelated to this,
using the powerscripting test page at NCBI for eutils (for the curious, at
http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
retrieve XML-formatted taxonomic information; here's the bacterium Frankia
sp. CcI3 TaxID info, which looks like they have everything set up by rank.
It gives quite a bit of information. 
 
<?xml version="1.0"?>
<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet>

<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <Division>Bacteria</Division>
  <GeneticCode>
    <GCId>11</GCId>
    <GCName>Bacterial and Plant Plastid</GCName>
  </GeneticCode>
  <MitoGeneticCode>
    <MGCId>0</MGCId>
    <MGCName>Unspecified</MGCName>
  </MitoGeneticCode>
  <Lineage>cellular organisms; Bacteria; Actinobacteria; Actinobacteria
(class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
Frankia</Lineage>
  <LineageEx>
    <Taxon>
      <TaxId>131567</TaxId>
      <ScientificName>cellular organisms</ScientificName>
      <Rank>no rank</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>201174</TaxId>
      <ScientificName>Actinobacteria</ScientificName>
      <Rank>phylum</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1760</TaxId>
      <ScientificName>Actinobacteria (class)</ScientificName>
      <Rank>class</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85003</TaxId>
      <ScientificName>Actinobacteridae</ScientificName>
      <Rank>subclass</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2037</TaxId>
      <ScientificName>Actinomycetales</ScientificName>
      <Rank>order</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85013</TaxId>
      <ScientificName>Frankineae</ScientificName>
      <Rank>suborder</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
  <CreateDate>1999/10/22</CreateDate>
  <UpdateDate>2005/01/19</UpdateDate>
  <PubDate>2000/02/02</PubDate>
</Taxon>
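
As a rough illustration of how the rank information could be pulled out of a record like this, here is a sketch in plain Perl. It uses a regular expression on an abbreviated copy of the XML above (three of the nine LineageEx entries) rather than a proper XML parser, which would be the safer choice for real use. The helper lineage_entries() is hypothetical and is not part of Bio::DB::Taxonomy.

```perl
use strict;
use warnings;

# Abbreviated copy of the record above: three of the nine LineageEx entries.
my $xml = <<'XML';
<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <LineageEx>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
</Taxon>
XML

# Hypothetical helper: return one { taxid, name, rank } hash reference
# per <Taxon> inside <LineageEx>, in document order.
sub lineage_entries {
    my ($text) = @_;
    my ($block) = $text =~ m{<LineageEx>(.*?)</LineageEx>}s or return;
    my @entries;
    while ($block =~ m{<Taxon>\s*
                       <TaxId>(\d+)</TaxId>\s*
                       <ScientificName>([^<]*)</ScientificName>\s*
                       <Rank>([^<]*)</Rank>\s*
                       </Taxon>}gsx) {
        push @entries, { taxid => $1, name => $2, rank => $3 };
    }
    return @entries;
}

# Build a rank => name lookup, so the depth of the lineage does not matter.
my %name_at = map { ($_->{rank}, $_->{name}) } lineage_entries($xml);
print "$_: $name_at{$_}\n" for qw(superkingdom family genus);
```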


Chris



From jason.stajich at duke.edu  Thu May 11 08:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>

Great - now we just need someone to volunteer to actually work on this.

The current code grabs most of this, but I believe it expects a
different XML format.


On May 10, 2006, at 11:36 PM, Chris Fields wrote:

> I think you can get pretty much everything now, though I can  
> definitely see
> the use of a local database.  I ran a few tests, really unrelated  
> to this,
> using the powerscripting test page at NCBI for eutils (for the  
> curious, at
> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was  
> able to
> retrieve XML-formatted taxonomic information; here's the bacterium  
> Frankia
> sp. CcI3 TaxID info, which looks like they have everything set up  
> by rank.
> It gives quite a bit of information.
>
> <?xml version="1.0"?>
> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> <TaxaSet>
>
> <Taxon>
>   <TaxId>106370</TaxId>
>   <ScientificName>Frankia sp. CcI3</ScientificName>
>   <ParentTaxId>1854</ParentTaxId>
>   <Rank>species</Rank>
>   <Division>Bacteria</Division>
>   <GeneticCode>
>     <GCId>11</GCId>
>     <GCName>Bacterial and Plant Plastid</GCName>
>   </GeneticCode>
>   <MitoGeneticCode>
>     <MGCId>0</MGCId>
>     <MGCName>Unspecified</MGCName>
>   </MitoGeneticCode>
>   <Lineage>cellular organisms; Bacteria; Actinobacteria;  
> Actinobacteria
> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
> Frankia</Lineage>
>   <LineageEx>
>     <Taxon>
>       <TaxId>131567</TaxId>
>       <ScientificName>cellular organisms</ScientificName>
>       <Rank>no rank</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>2</TaxId>
>       <ScientificName>Bacteria</ScientificName>
>       <Rank>superkingdom</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>201174</TaxId>
>       <ScientificName>Actinobacteria</ScientificName>
>       <Rank>phylum</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>1760</TaxId>
>       <ScientificName>Actinobacteria (class)</ScientificName>
>       <Rank>class</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>85003</TaxId>
>       <ScientificName>Actinobacteridae</ScientificName>
>       <Rank>subclass</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>2037</TaxId>
>       <ScientificName>Actinomycetales</ScientificName>
>       <Rank>order</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>85013</TaxId>
>       <ScientificName>Frankineae</ScientificName>
>       <Rank>suborder</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>74712</TaxId>
>       <ScientificName>Frankiaceae</ScientificName>
>       <Rank>family</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>1854</TaxId>
>       <ScientificName>Frankia</ScientificName>
>       <Rank>genus</Rank>
>     </Taxon>
>   </LineageEx>
>   <CreateDate>1999/10/22</CreateDate>
>   <UpdateDate>2005/01/19</UpdateDate>
>   <PubDate>2000/02/02</PubDate>
> </Taxon>
>
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>> Sent: Wednesday, May 10, 2006 7:54 PM
>> To: Sendu Bala
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>
>> I would use the implementation that talks to the flatfile db as the
>> standard here.  nodes are defined by the data in from taxonomy dump
>> dbs from ncbi.
>> the eutils is pretty worthless except for taxid->name or reverse, you
>> can't get the full taxonomy (or couldn't when that implementation was
>> written).
>>
>> The "name" method refers to the name of the node - each level in the
>> taxonomy can have a "name".
>>
>> The bits of hackiness relate to wrapping the node object as a
>> Bio::Species and/or being able to read  a genbank file and the
>> organism taxonomy data as a list and instantiating.  If we could rely
>> on everything being in a DB of course this would be simpler.
>>
>> Another problem is the depth of the taxonomy is not constant for
>> every node so assuming that a fixed number of slots will be filled in
>> to generate the taxonomy leads to problems.
>>
>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
>> best example of working code as this is how I really wanted it to
>> work, the Bio::Species hacks are only there to shoehorn data
>> retrieved from genbank files in.  With the flatfile implementation
>> you have to walk all the way up the db hierarchy to get the kingdom
>> for a node so you do have to build up the classification hierarchy as
>> each node only stores data about itsself.
>>
>> I'm not exactly sure what you are proposing to do, but would
>> definitely enjoy another pair of hands, I don't really have time to
>> mess with it any time soon.
>>
>> -jason
>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>
>>> Hi,
>>> I'm a little confused as to how names are supposed to work in
>>> Bio::Taxonomy::Node.
>>>
>>> In the bioperl versions that I've looked at a Node doesn't seem to
>>> store
>>> the most important information about itself - it's scientific name
>>> - in
>>> an obvious place. bioperl 1.5.1 puts it at the start of the
>>> classification list. I'd have thought sticking it in -name would  
>>> make
>>> more sense, but this is used only for the GenBank common name.
>>>
>>> The Bio::Taxonomy docs still suggests:
>>>
>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>    -object_id => 9606, # or -ncbi_taxid. Requird tag
>>>    -names => {
>>>        'scientific' => ['sapiens'],
>>>        'common_name' => ['human']
>>>    },
>>>    -rank => 'species'  # Required tag
>>> );
>>>
>>> and whilst Bio::Taxonomy::Node does not accept -names, it does  
>>> have a
>>> 'name' method which claims to work like:
>>>
>>> $obj->name('scientific', 'sapiens');
>>>
>>> This kind of thing would be really nice, but afaics
>>> Bio::Taxonomy::Node->new takes the -name value and makes a common  
>>> name
>>> out of it, whilst the name() method passes any 'scientific' name to
>>> the
>>> scientific_name() method which is unable to set any value (and warns
>>> about this), only get.
>>>
>>> It seems like the need to have this classification array work the  
>>> same
>>> way as Bio::Species is causing some unnecessary restrictions. Can't
>>> the
>>> more sensible idea of having a dedicated storage spot for the
>>> ScientificName and other parameters be used, with the classification
>>> array either being generated just-in-time from the hash-stored
>>> data, or
>>> indeed being generated from the Lineage field?
>>>
>>>
>>> Also, why does a node store the complete hierarchy on itself in the
>>> classification array? If we're going that far, why don't the
>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
>>> get_taxonomy() method instead of a get_Taxonomy_Node() method.
>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
>>> only
>>> have a minimum of information, if you could simply ask a node  
>>> what its
>>> rank and scientific name was you could easily build a classification
>>> array, or ask what Kingdom your species was in etc.
>>>
>>> Are there good reasons for Taxonomy working the way it does in
>>> 1.5.1, or
>>> would I not be wasting my time re-writing things to make more sense
>>> (to me)?
>>>
>>>
>>> Cheers,
>>> Sendu.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Thu May 11 07:51:44 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 12:51:44 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
	<655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> I would use the implementation that talks to the flatfile db as the 
> standard here.  nodes are defined by the data in from taxonomy dump
> dbs from ncbi. the eutils is pretty worthless except for taxid->name
> or reverse, you can't get the full taxonomy (or couldn't when that
> implementation was written).

I'm not sure what you mean. In 1.5.1 you have access to the full
taxonomy because you're using efetch.fcgi. Indeed, you parse the full
taxonomy already to get the classification.


> The "name" method refers to the name of the node - each level in the
>  taxonomy can have a "name".

Yes, and to me the 'name of the node' is its scientific name (something
like 'sapiens'), not a 'common' name. So why is it stored as a
'common' name in the object? Why don't the DB::Taxonomy modules store
the actual common names (something like 'human')?


> The bits of hackiness relate to wrapping the node object as a 
> Bio::Species and/or being able to read  a genbank file and the
> organism taxonomy data as a list and instantiating.  If we could rely
> on everything being in a DB of course this would be simpler.

I think that Taxonomy stuff could be done in a 'pure' way, with a new
Bio::Species made as a wrapper around an appropriate Taxonomy module(s)
that cheated and made fake nodes from a genbank list and then made a
proper Bio::Taxonomy.


> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itsself.

I'm still actually using bioperl 1.4 but I'm looking at 1.5.1 assuming
it is the latest available and I see that the flatfile implementation
works the same way as the entrez one. The requested node is fetched, but
then internally it walks the hierarchy purely so it can build a
classification list which is then stored on the object. If you're
already retrieving every node above the requested node, why not just
return every node? Why not just return a whole Bio::Taxonomy?


> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch 
of modules that inherit from bioperl and then add/alter to suit my 
needs. In this regard they're a bit limited and kind of hard-coded to my 
way of thinking, but hopefully you can see my intent and perhaps use 
some of my implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single 
database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
supplied rank list); it has the ranks that its Nodes have and knows
(without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information
about themselves and have no classification array or anything like that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
the classification.
# Nodes have a scientific name and common names
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives
easy access to what I really want to do:

my $human = TFBS::Species->new(-common_name => 'human');
# returns the array you'd expect from a normally created, fully
# classified Bio::Species
my @classification = $human->classification;
my $kingdom = $human->kingdom; # returns 'Metazoa'

# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't 
make any difference!)

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down/ discuss.
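
The design points above can be illustrated with a minimal sketch in plain
Perl. It is not the code in the tarball and does not use any bioperl class:
the %nodes hash and the classification() and name_at_rank() helpers are
hypothetical names made up for illustration. The tax ids, names and ranks are
copied from the Frankia XML quoted in this thread, and the parent links
between lineage entries are assumed from the order of that LineageEx block.

```perl
use strict;
use warnings;

# Each node stores only its own data plus its parent's id (no classification
# array). Data copied from the Frankia sp. CcI3 XML quoted in this thread.
my %nodes = (
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2 },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760 },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003 },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037 },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013 },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712 },
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854 },
);

# Build the classification just-in-time by walking parent links to the root.
# The requested node comes first, the root last.
sub classification {
    my ($nodes, $id) = @_;
    my @names;
    while (defined $id) {
        my $node = $nodes->{$id} or die "unknown node $id";
        push @names, $node->{name};
        $id = $node->{parent};
    }
    return @names;
}

# Return the name of the node itself or its nearest ancestor with this rank.
sub name_at_rank {
    my ($nodes, $id, $rank) = @_;
    while (defined $id) {
        my $node = $nodes->{$id} or die "unknown node $id";
        return $node->{name} if $node->{rank} eq $rank;
        $id = $node->{parent};
    }
    return undef;
}

print join('; ', classification(\%nodes, 106370)), "\n";
print name_at_rank(\%nodes, 106370, 'superkingdom'), "\n";
```

Because a 'no rank' node is just another link in the parent chain, it ends up
in the right place in the classification without any supplied rank list.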


Cheers,
Sendu.


From sb at mrc-dunn.cam.ac.uk  Thu May 11 08:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...


> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects 
that XML, and parses it as fully as flatfile.pm does. Nothing more to 
do. Weren't you the person that wrote that parser?

I parse the same XML in my version of entrez.pm (see my previous email); 
the main difference being I make Nodes out of each Taxon instead of just 
adding each Taxon's ScientificName to the classification array.
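
As a rough sketch of that difference (not the actual entrez.pm code, which I
am not reproducing here), the following turns each Taxon in the LineageEx
block into its own node record. The XML is an abbreviated copy of the Frankia
record quoted in this thread; the field() helper and the hash layout are
hypothetical, the regex extraction is only for keeping the example free of
non-core modules, and it assumes LineageEx is ordered root-first.

```perl
use strict;
use warnings;

# Abbreviated copy of the efetch XML quoted in this thread; only the last
# two lineage entries are kept for brevity.
my $xml = <<'XML';
<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <LineageEx>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
</Taxon>
XML

# Text of the first <$tag> element in a fragment, or undef if absent.
sub field {
    my ($fragment, $tag) = @_;
    return $fragment =~ m{<$tag>([^<]*)</$tag>} ? $1 : undef;
}

# Separate the lineage from the requested taxon's own fields.
my ($lineage) = $xml =~ m{<LineageEx>(.*?)</LineageEx>}s;
(my $own = $xml) =~ s{<LineageEx>.*?</LineageEx>}{}s;

my @nodes;
my $parent;    # undef for the first entry only because the copy is truncated
while ($lineage =~ m{<Taxon>(.*?)</Taxon>}sg) {
    my $taxon = $1;
    my $id    = field($taxon, 'TaxId');
    push @nodes, {
        id     => $id,
        name   => field($taxon, 'ScientificName'),
        rank   => field($taxon, 'Rank'),
        parent => $parent,    # assumed: previous lineage entry is the parent
    };
    $parent = $id;
}

# The requested taxon itself carries an explicit ParentTaxId.
push @nodes, {
    id     => field($own, 'TaxId'),
    name   => field($own, 'ScientificName'),
    rank   => field($own, 'Rank'),
    parent => field($own, 'ParentTaxId'),
};

printf "%s\t%s\t%s\n", $_->{id}, $_->{rank}, $_->{name} for @nodes;
```

A real implementation would use a proper XML parser; the point is only that
each Taxon already carries enough (id, name, rank) to become a node.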


From jason.stajich at duke.edu  Thu May 11 09:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
	<44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID: <AAFFC5EC-8B54-4D87-BE38-CB90785AD4B5@duke.edu>

I guess so. I have long since forgotten what it supports, though, since I
don't regularly use it. Sorry.

On May 11, 2006, at 8:22 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>> Great - now we just need someone to volunteer to actually work on  
>> this.
>
> Now I'm really confused...
>
>
>> The current code grabs most of this but I believe expects a  
>> different XML
>
> No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez  
> expects
> that XML, and parses it as fully as flatfile.pm does. Nothing more to
> do. Weren't you the person that wrote that parser?
>
> I parse the same XML in my version of entrez.pm (see my previous  
> email);
> the main difference being I make Nodes out of each Taxon instead of  
> just
> adding each Taxon's ScientificName to the classification array.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Thu May 11 10:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh... 

To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
but I myself have seen issues with the way Bio::Species treats bacterial
strains (I guess this also involves Bio::Taxonomy::Node since that's what
Bio::Species delegates to).  Seems it likes to repeat some strain names when
using $seq->species->common_name.  Not a killer problem but annoying since
the correct name is in the source tag in the feature table!  I 'could' take
a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
previously, but it'll take a while to get going.  I'm really more interested
in getting epost-esearch-efetch sequence retrieval up and running first with
the same API as Bio::DB::GenBank/GenPept and Bio::DB::Query::GenBank, then
donating the code (late summer/fall???) after working out namespace issues so
it doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I suppose I
could also look at Bio::DB::Taxonomy to see what's up in the next couple of
weeks (after conference), unless someone gets to it sooner.

Chris



From jason.stajich at duke.edu  Thu May 11 11:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>


I think you'll see it is different and mostly a limitation of the  
genbank format and the Bio::Species objects that you get from a  
genbank parse do represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus
sub-species versus strain when none of the members of the list are
labeled.  These are some of the same problems we have for swissprot as
well.  I just don't think we can do it right from the genbank file
data alone, so I don't see a lot of point in expecting Bio::Species to
provide more than a representation of what is in the file and just
return that array.


It has seemed like we need to special case things pretty heavily or  
do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?  What  
happens when there is a two part strain name (space separated) and a  
sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321

Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
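
To make the ambiguity concrete, here is a small sketch in plain Perl. It is
not what Bio::Species or the genbank parser does; naive_split() is a
hypothetical helper that simply treats the first word as genus, the second as
species, and keeps everything else as one unlabeled remainder. The ORGANISM
strings are copied from the four records above.

```perl
use strict;
use warnings;

# ORGANISM names copied from the four GenBank records above.
my @organisms = (
    'Staphylococcus haemolyticus JCSC1435',
    'Muntiacus muntjak vaginalis',
    'Aspergillus nidulans FGSC A4',
    'Cryptococcus neoformans var. grubii H99',
);

# Naive split: word 1 is the genus, word 2 the species, and whatever is left
# is returned as a single string with no label attached.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, $rest) = split ' ', $organism, 3;
    return { genus => $genus, species => $species, rest => $rest };
}

for my $organism (@organisms) {
    my $parts = naive_split($organism);
    printf "%-15s %-13s [%s]\n", @{$parts}{qw(genus species rest)};
}
```

The remainder is a strain in the first case, a sub-species in the second, a
two-word strain in the third, and a variety plus a strain in the fourth, and
nothing in the string itself says which. The genus question for Aspergillus
versus Emericella is not touched by this at all.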


On May 11, 2006, at 10:57 AM, Chris Fields wrote:

> Heh...
>
> To tell the truth, I haven't looked at Bio::DB::Taxonomy in any  
> depth yet,
> but I myself have seen issues with the way Bio::Species treats  
> bacterial
> strains (I guess this also involves Bio::Taxonomy::Node since  
> that's what
> Bio::Species delegates to).  Seems it likes to repeat some strain  
> names when
> using $seq->species->common_name.  Not a killer problem but  
> annoying since
> the correct name is in the source tag in the feature table!  I  
> 'could' take
> a look at it but I can't guarantee quick results.
>
> Jason, I could add Taxonomy to the EUtilities overhaul I mentioned  
> to you
> previously but it'll take awhile to get going.  I'm really more  
> interested
> in getting epost-esearch-efetch sequence retrieval up and running  
> first with
> the same API as Bio::DB::GenBank/Genpept and  
> Bio::DB::Query::GenBank, donate
> the code (late summer/fall???) after working out namespace issues  
> so it
> doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I  
> suppose I
> could also look at Bio::DB:Taxonomy to see what's up in the next  
> couple of
> weeks (after conference), unless someone gets to it sooner.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>> Sent: Thursday, May 11, 2006 7:05 AM
>> To: Chris Fields
>> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala'
>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>
>> Great - now we just need someone to volunteer to actually work on  
>> this.
>>
>> The current code grabs most of this but I believe expects a different
>> XML
>>
>>
>> On May 10, 2006, at 11:36 PM, Chris Fields wrote:
>>
>>> I think you can get pretty much everything now, though I can
>>> definitely see
>>> the use of a local database.  I ran a few tests, really unrelated
>>> to this,
>>> using the powerscripting test page at NCBI for eutils (for the
>>> curious, at
>>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was
>>> able to
>>> retrieve XML-formatted taxonomic information; here's the bacterium
>>> Frankia
>>> sp. CcI3 TaxID info, which looks like they have everything set up
>>> by rank.
>>> It gives quite a bit of information.
>>>
>>> <?xml version="1.0"?>
>>> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
>>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
>>> <TaxaSet>
>>>
>>> <Taxon>
>>>   <TaxId>106370</TaxId>
>>>   <ScientificName>Frankia sp. CcI3</ScientificName>
>>>   <ParentTaxId>1854</ParentTaxId>
>>>   <Rank>species</Rank>
>>>   <Division>Bacteria</Division>
>>>   <GeneticCode>
>>>     <GCId>11</GCId>
>>>     <GCName>Bacterial and Plant Plastid</GCName>
>>>   </GeneticCode>
>>>   <MitoGeneticCode>
>>>     <MGCId>0</MGCId>
>>>     <MGCName>Unspecified</MGCName>
>>>   </MitoGeneticCode>
>>>   <Lineage>cellular organisms; Bacteria; Actinobacteria;
>>> Actinobacteria
>>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
>>> Frankia</Lineage>
>>>   <LineageEx>
>>>     <Taxon>
>>>       <TaxId>131567</TaxId>
>>>       <ScientificName>cellular organisms</ScientificName>
>>>       <Rank>no rank</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>2</TaxId>
>>>       <ScientificName>Bacteria</ScientificName>
>>>       <Rank>superkingdom</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>201174</TaxId>
>>>       <ScientificName>Actinobacteria</ScientificName>
>>>       <Rank>phylum</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>1760</TaxId>
>>>       <ScientificName>Actinobacteria (class)</ScientificName>
>>>       <Rank>class</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>85003</TaxId>
>>>       <ScientificName>Actinobacteridae</ScientificName>
>>>       <Rank>subclass</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>2037</TaxId>
>>>       <ScientificName>Actinomycetales</ScientificName>
>>>       <Rank>order</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>85013</TaxId>
>>>       <ScientificName>Frankineae</ScientificName>
>>>       <Rank>suborder</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>74712</TaxId>
>>>       <ScientificName>Frankiaceae</ScientificName>
>>>       <Rank>family</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>1854</TaxId>
>>>       <ScientificName>Frankia</ScientificName>
>>>       <Rank>genus</Rank>
>>>     </Taxon>
>>>   </LineageEx>
>>>   <CreateDate>1999/10/22</CreateDate>
>>>   <UpdateDate>2005/01/19</UpdateDate>
>>>   <PubDate>2000/02/02</PubDate>
>>> </Taxon>
>>>
>>>
>>> Chris
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>>>> Sent: Wednesday, May 10, 2006 7:54 PM
>>>> To: Sendu Bala
>>>> Cc: bioperl-l at lists.open-bio.org
>>>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>>>
>>>> I would use the implementation that talks to the flatfile db as the
>>>> standard here.  nodes are defined by the data in from taxonomy dump
>>>> dbs from ncbi.
>>>> the eutils is pretty worthless except for taxid->name or  
>>>> reverse, you
>>>> can't get the full taxonomy (or couldn't when that  
>>>> implementation was
>>>> written).
>>>>
>>>> The "name" method refers to the name of the node - each level in  
>>>> the
>>>> taxonomy can have a "name".
>>>>
>>>> The bits of hackiness relate to wrapping the node object as a
>>>> Bio::Species and/or being able to read  a genbank file and the
>>>> organism taxonomy data as a list and instantiating.  If we could  
>>>> rely
>>>> on everything being in a DB of course this would be simpler.
>>>>
>>>> Another problem is the depth of the taxonomy is not constant for
>>>> every node so assuming that a fixed number of slots will be  
>>>> filled in
>>>> to generate the taxonomy leads to problems.
>>>>
>>>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as  
>>>> the
>>>> best example of working code as this is how I really wanted it to
>>>> work, the Bio::Species hacks are only there to shoehorn data
>>>> retrieved from genbank files in.  With the flatfile implementation
>>>> you have to walk all the way up the db hierarchy to get the kingdom
>>>> for a node so you do have to build up the classification  
>>>> hierarchy as
>>>> each node only stores data about itsself.
>>>>
>>>> I'm not exactly sure what you are proposing to do, but would
>>>> definitely enjoy another pair of hands, I don't really have time to
>>>> mess with it any time soon.
>>>>
>>>> -jason
>>>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>>>
>>>>> Hi,
>>>>> I'm a little confused as to how names are supposed to work in
>>>>> Bio::Taxonomy::Node.
>>>>>
>>>>> In the bioperl versions that I've looked at a Node doesn't seem to
>>>>> store
>>>>> the most important information about itself - its scientific name
>>>>> - in
>>>>> an obvious place. bioperl 1.5.1 puts it at the start of the
>>>>> classification list. I'd have thought sticking it in -name would
>>>>> make
>>>>> more sense, but this is used only for the GenBank common name.
>>>>>
>>>>> The Bio::Taxonomy docs still suggest:
>>>>>
>>>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>>>    -object_id => 9606, # or -ncbi_taxid. Required tag
>>>>>    -names => {
>>>>>        'scientific' => ['sapiens'],
>>>>>        'common_name' => ['human']
>>>>>    },
>>>>>    -rank => 'species'  # Required tag
>>>>> );
>>>>>
>>>>> and whilst Bio::Taxonomy::Node does not accept -names, it does
>>>>> have a
>>>>> 'name' method which claims to work like:
>>>>>
>>>>> $obj->name('scientific', 'sapiens');
>>>>>
>>>>> This kind of thing would be really nice, but afaics
>>>>> Bio::Taxonomy::Node->new takes the -name value and makes a common
>>>>> name
>>>>> out of it, whilst the name() method passes any 'scientific'  
>>>>> name to
>>>>> the
>>>>> scientific_name() method which is unable to set any value (and  
>>>>> warns
>>>>> about this), only get.
>>>>>
>>>>> It seems like the need to have this classification array work the
>>>>> same
>>>>> way as Bio::Species is causing some unnecessary restrictions.  
>>>>> Can't
>>>>> the
>>>>> more sensible idea of having a dedicated storage spot for the
>>>>> ScientificName and other parameters be used, with the  
>>>>> classification
>>>>> array either being generated just-in-time from the hash-stored
>>>>> data, or
>>>>> indeed being generated from the Lineage field?
>>>>>
>>>>>
>>>>> Also, why does a node store the complete hierarchy on itself in  
>>>>> the
>>>>> classification array? If we're going that far, why don't the
>>>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just  
>>>>> have a
>>>>> get_taxonomy() method instead of a get_Taxonomy_Node() method.
>>>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
>>>>> only
>>>>> have a minimum of information, if you could simply ask a node
>>>>> what its
>>>>> rank and scientific name was you could easily build a  
>>>>> classification
>>>>> array, or ask what Kingdom your species was in etc.
>>>>>
>>>>> Are there good reasons for Taxonomy working the way it does in
>>>>> 1.5.1, or
>>>>> would I not be wasting my time re-writing things to make more  
>>>>> sense
>>>>> (to me)?
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Sendu.
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Thu May 11 13:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3
	moduleandrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred to should only occur when reading a Primer3 output file; the Bio::Tools::Run::Primer3->run method takes the results and transfers them directly to a Bio::Tools::Primer3 object without an intermediate file.  A Data::Dumper look at the Bio::Tools::Primer3 object shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results' and then again in the 'results_by_number' hash, but only in the '0' hash.

All of this doesn't really matter for Li's original concern.  If you want to include the ID of the sequence along with the primer3 results, just take it from the seq object (i.e. $seq->display_id() ).  Since you are in a loop taking one sequence at a time, this $seq will be the one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3 module so that you can parse the result file.
 
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once fixed, it can output 
 
3. primer3.exe is called in the Bio::Tools::Run::Primer3  "run" function, please read the function definition.


From cjfields at uiuc.edu  Thu May 11 13:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar said
so as well), especially for local databases.  I wonder, though, if there is
a way that RichSeqs like GenBank, when passed through SeqIO, can just be
'short-circuited' using the sequence builder to accept what's on the
SOURCE or ORGANISM line of a file as is, without forcing it into
Bio::Species/Bio::Taxonomy::Node.  Or maybe diminish the role of the
SOURCE/ORGANISM lines altogether to simple Annotation objects and place
much greater emphasis on the TaxID itself, in effect decoupling the TaxID
(taxonomic information) from SOURCE/ORGANISM (annotation information).

In other words, have GenBank/EMBL classification lines and organism lines
essentially stay like they are in the input file (use simple objects).
Then, if one were really intent on getting the full name, classification,
etc., or one wanted to store their sequences in bioperl-db, they would be
required to either have a local db of NCBI Taxonomy or remote access to a
similar database (NCBI or something else) so a lookup could be accomplished
using the TaxID.  If they use BioSQL, then require them to preload their
BioSQL database with NCBI's taxonomy, something Hilmar already strongly
suggests.
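
A minimal sketch of the TaxID lookup idea, assuming a local node table in
which each node stores only its own name, rank and parent.  The %node hash
and the lineage_for / name_at_rank helpers are hypothetical names, not
BioPerl API; the TaxIDs, names and ranks are taken from the Frankia sp. CcI3
record quoted in this thread, and the parent links are inferred from the
order of its LineageEx entries.

```perl
use strict;
use warnings;

# Hypothetical local node table keyed by TaxID.  Each node knows only
# about itself, like a row loaded from a taxonomy dump.  Parent links are
# an assumption inferred from the order of the LineageEx entries.
my %node = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
);

# Walk from a TaxID up to the root; returns nodes in leaf-to-root order.
# %seen guards against a cycle in a damaged table.
sub lineage_for {
    my ($taxid) = @_;
    my (@lineage, %seen);
    while (defined $taxid && exists $node{$taxid} && !$seen{$taxid}++) {
        push @lineage, $node{$taxid};
        $taxid = $node{$taxid}{parent};
    }
    return @lineage;
}

# Name at a given rank, or nothing if the lineage has no such rank
# (the depth of the taxonomy is not the same for every node).
sub name_at_rank {
    my ($taxid, $rank) = @_;
    for my $n (lineage_for($taxid)) {
        return $n->{name} if $n->{rank} eq $rank;
    }
    return;
}

# Build the classification root-to-leaf only when it is asked for.
print join('; ', map { $_->{name} } reverse lineage_for(106370)), "\n";
print "genus: ", name_at_rank(106370, 'genus'), "\n";
```

Nothing here guesses at genus or species from a free-text name; the rank
comes from the node itself, which is the point of keying everything on the
TaxID.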

If anyone isn't interested in the taxonomic information or doesn't want to
bother grabbing the database or setting up remote access, tough luck; just
grab the Bio::Annotation/Bio::Species object and use that.  As the saying
goes, "you can't be all things to all people."  At some point you have to
throw your arms in the air, do the best you can, but give up trying to
please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus sub-
> species versus strain when none of the members of the list are
> labeled.  This is some of the same problems we have for swissprot as
> well.  I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
> 
> 
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
> 
> Can you guess what value is the strain versus sub-species?  What
> happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
> 
> SOURCE      Staphylococcus haemolyticus JCSC1435
>    ORGANISM  Staphylococcus haemolyticus JCSC1435
>              Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
> 
> versus
> SOURCE      Muntiacus muntjak vaginalis
>    ORGANISM  Muntiacus muntjak vaginalis
>              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
> Euteleostomi;
>              Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla;
> Ruminantia;
>              Pecora; Cervidae; Muntiacinae; Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
> 
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>    ORGANISM  Aspergillus nidulans FGSC A4
>              Eukaryota; Fungi; Ascomycota; Pezizomycotina;
> Eurotiomycetes;
>              Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
> 
> Genus should be Aspergillus or Emericella ?
> 
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>    ORGANISM  Cryptococcus neoformans var. grubii H99
>              Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>              Heterobasidiomycetes; Tremellomycetidae; Tremellales;
> Tremellaceae;
>              Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky!  This really points out the problem here.  It used to be
a problem for only a few cases but with so many bacterial and fungal genomes
that's changed.  

The Frankia XML example has the scientific name set to "Frankia sp. CcI3",
which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS
line in EMBL files.  It looks like the lines are parsed and then rebuilt from
the ground up in Bio::SeqIO::genbank using Bio::Species objects, which, in my
case with the strain designation, is where the problem lies.  They could
instead be placed in annotation objects with (-tagname => 'SOURCE',
-value => 'Frankia sp. CcI3') or similar settings.  Or simplify Bio::Species to only
represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or
EMBL OS/OC lines and nothing more complex than that (no complex taxonomy;
for that you use the TaxID and local database). 
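
To make the "keep the lines as they are" option concrete, here is a rough
sketch in plain Perl.  parse_source_block is a hypothetical helper, not part
of Bio::SeqIO::genbank, and it assumes the organism name fits on a single
ORGANISM line.  It stores the SOURCE and ORGANISM strings untouched and
splits the classification on semicolons, without trying to decide which
word is the genus, species or strain.

```perl
use strict;
use warnings;

# Keep the SOURCE / ORGANISM / classification lines of a GenBank header
# as plain data.  Assumes the organism name does not wrap onto a second line.
sub parse_source_block {
    my ($text) = @_;
    my %info = (source => undef, organism => undef, classification => []);
    my $class_text  = '';
    my $in_organism = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /^SOURCE\s+(.+?)\s*$/) {
            $info{source} = $1;
            $in_organism  = 0;
        }
        elsif ($line =~ /^\s+ORGANISM\s+(.+?)\s*$/) {
            $info{organism} = $1;
            $in_organism    = 1;
        }
        elsif ($in_organism && $line =~ /^\s+(\S.*?)\s*$/) {
            # indented continuation lines hold the semicolon-separated lineage
            $class_text .= " $1";
        }
        else {
            $in_organism = 0;
        }
    }
    $class_text =~ s/\.\s*$//;    # drop the trailing full stop
    for my $item (split /;/, $class_text) {
        $item =~ s/^\s+|\s+$//g;
        push @{ $info{classification} }, $item if length $item;
    }
    return \%info;
}

my $header = <<'END';
SOURCE      Staphylococcus haemolyticus JCSC1435
  ORGANISM  Staphylococcus haemolyticus JCSC1435
            Bacteria; Firmicutes; Bacillales; Staphylococcus.
END

my $info = parse_source_block($header);
print "organism: $info->{organism}\n";
print "lineage:  ", join(' > ', @{ $info->{classification} }), "\n";
```

Anything beyond this (strain versus sub-species, the "right" genus) would
then be a question for a TaxID lookup rather than for the flat-file parser.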

Okay,  I need to lay off the coffee now...

Chris

> On May 11, 2006, at 10:57 AM, Chris Fields wrote:
> 
> > Heh...
> >
> > To tell the truth, I haven't looked at Bio::DB::Taxonomy in any
> > depth yet,
> > but I myself have seen issues with the way Bio::Species treats
> > bacterial
> > strains (I guess this also involves Bio::Taxonomy::Node since
> > that's what
> > Bio::Species delegates to).  Seems it likes to repeat some strain
> > names when
> > using $seq->species->common_name.  Not a killer problem but
> > annoying since
> > the correct name is in the source tag in the feature table!  I
> > 'could' take
> > a look at it but I can't guarantee quick results.
> >
> > Jason, I could add Taxonomy to the EUtilities overhaul I mentioned
> > to you
> > previously but it'll take awhile to get going.  I'm really more
> > interested
> > in getting epost-esearch-efetch sequence retrieval up and running
> > first with
> > the same API as Bio::DB::GenBank/Genpept and
> > Bio::DB::Query::GenBank, donate
> > the code (late summer/fall???) after working out namespace issues
> > so it
> > doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I
> > suppose I
> > could also look at Bio::DB:Taxonomy to see what's up in the next
> > couple of
> > weeks (after conference), unless someone gets to it sooner.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> >> Sent: Thursday, May 11, 2006 7:05 AM
> >> To: Chris Fields
> >> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala'
> >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> >>
> >> Great - now we just need someone to volunteer to actually work on
> >> this.
> >>
> >> The current code grabs most of this but I believe expects a different
> >> XML
> >>
> >>
> >> On May 10, 2006, at 11:36 PM, Chris Fields wrote:
> >>
> >>> I think you can get pretty much everything now, though I can
> >>> definitely see
> >>> the use of a local database.  I ran a few tests, really unrelated
> >>> to this,
> >>> using the powerscripting test page at NCBI for eutils (for the
> >>> curious, at
> >>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was
> >>> able to
> >>> retrieve XML-formatted taxonomic information; here's the bacterium
> >>> Frankia
> >>> sp. CcI3 TaxID info, which looks like they have everything set up
> >>> by rank.
> >>> It gives quite a bit of information.
> >>>
> >>> <?xml version="1.0"?>
> >>> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
> >>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> >>> <TaxaSet>
> >>>
> >>> <Taxon>
> >>>   <TaxId>106370</TaxId>
> >>>   <ScientificName>Frankia sp. CcI3</ScientificName>
> >>>   <ParentTaxId>1854</ParentTaxId>
> >>>   <Rank>species</Rank>
> >>>   <Division>Bacteria</Division>
> >>>   <GeneticCode>
> >>>     <GCId>11</GCId>
> >>>     <GCName>Bacterial and Plant Plastid</GCName>
> >>>   </GeneticCode>
> >>>   <MitoGeneticCode>
> >>>     <MGCId>0</MGCId>
> >>>     <MGCName>Unspecified</MGCName>
> >>>   </MitoGeneticCode>
> >>>   <Lineage>cellular organisms; Bacteria; Actinobacteria;
> >>> Actinobacteria
> >>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
> >>> Frankia</Lineage>
> >>>   <LineageEx>
> >>>     <Taxon>
> >>>       <TaxId>131567</TaxId>
> >>>       <ScientificName>cellular organisms</ScientificName>
> >>>       <Rank>no rank</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>2</TaxId>
> >>>       <ScientificName>Bacteria</ScientificName>
> >>>       <Rank>superkingdom</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>201174</TaxId>
> >>>       <ScientificName>Actinobacteria</ScientificName>
> >>>       <Rank>phylum</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>1760</TaxId>
> >>>       <ScientificName>Actinobacteria (class)</ScientificName>
> >>>       <Rank>class</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>85003</TaxId>
> >>>       <ScientificName>Actinobacteridae</ScientificName>
> >>>       <Rank>subclass</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>2037</TaxId>
> >>>       <ScientificName>Actinomycetales</ScientificName>
> >>>       <Rank>order</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>85013</TaxId>
> >>>       <ScientificName>Frankineae</ScientificName>
> >>>       <Rank>suborder</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>74712</TaxId>
> >>>       <ScientificName>Frankiaceae</ScientificName>
> >>>       <Rank>family</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>1854</TaxId>
> >>>       <ScientificName>Frankia</ScientificName>
> >>>       <Rank>genus</Rank>
> >>>     </Taxon>
> >>>   </LineageEx>
> >>>   <CreateDate>1999/10/22</CreateDate>
> >>>   <UpdateDate>2005/01/19</UpdateDate>
> >>>   <PubDate>2000/02/02</PubDate>
> >>> </Taxon>
> >>>
> >>>
> >>> Chris
> >>>
> >>>
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Thu May 11 20:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3 modules then you are doing well.

To expand a little on Wenwu's explanations: a Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs to run are collected inside the object.  This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method). Then, when everything is set the way you want it, you 'run' the Primer3 program by using $primer3_object->run.  The "wrapper" collects all the run parameters and sends them off to the Primer3 executable.

Primer3 does the analysis and outputs the results to "stdout" in boulder-io format.  By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt.  Because out.txt is not being closed between each sequence called in the script, you get all of the results concatenated in out.txt.  However, if you supplied an output filename (-outfile=>$file_out) in the "wrapper", each line of output from Primer3 will be written to $file_out and at the end of the Primer3 output the file will be closed.  Now if your script loops to another sequence it will open the same outfile again and overwrite it.

One last important detail for the "wrapper" object.  When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run).  $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information.  This includes finding out how many primer sets were found and the means to access the primer set results one at a time.  It does work as advertised.  Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of for each primer set.  That is why they only show up in $results_object as if they belonged with the first primer set (set '0') and they are not available for the other primer sets.
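
For anyone who ends up with a concatenated out.txt, the boulder-io layout
described above can be read back with a few lines of plain Perl.  This is
only a sketch: parse_boulder is a hypothetical helper (not the
Bio::Tools::Primer3 parser), the sample sequences are made up, and the
numbered tag names are an assumption about how older Primer3 versions label
the second and later primer sets.  It relies on each record ending with a
line containing a single "=".

```perl
use strict;
use warnings;

# Split boulder-io text (TAG=VALUE lines, each record closed by a lone "=")
# into a list of hash references, one per record.
sub parse_boulder {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;                    # tolerate Windows line endings
        if ($line eq '=') {
            push @records, {%current};
            %current = ();
        }
        elsif ($line =~ /^([^=]+)=(.*)$/) {
            $current{$1} = $2;               # split on the first "=" only
        }
    }
    push @records, {%current} if %current;   # tolerate a missing final "="
    return @records;
}

# Made-up sample: two sequences, the first with two primer sets.
my $out = <<'END';
PRIMER_SEQUENCE_ID=seq1
SEQUENCE=ACGTACGTACGTACGTACGT
PRIMER_LEFT_SEQUENCE=ACGTACGT
PRIMER_LEFT_1_SEQUENCE=CGTACGTA
=
PRIMER_SEQUENCE_ID=seq2
SEQUENCE=TTGACCATGGTTGACCATGG
PRIMER_LEFT_SEQUENCE=TTGACCAT
=
END

# The ID appears once per record, so attach it to every primer set yourself.
for my $rec (parse_boulder($out)) {
    my @left = grep { /^PRIMER_LEFT(?:_\d+)?_SEQUENCE$/ } sort keys %$rec;
    print "$rec->{PRIMER_SEQUENCE_ID}: $rec->{$_}\n" for @left;
}
```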

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?

First, thank you all for replying to my previous post
about primer3.

But now I am a little confused even after I read the
documents: What is the relationship between these two
modules? What is correct/standard way to use them to
do the batch-primer design? What I do is that I use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (because I don't
need all the information). But there is a little
trouble: PRIMER_SEQUENCE_ID can't be accessed using
this method. And Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without providing the line of code

use Bio::Tools::Primer3 

in the script.  How can this be explained? Is it because of the
following line of code?

my $result=$primer3->run; 

The last question: which line of code is used to invoke
the program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 


From torsten.seemann at infotech.monash.edu.au  Fri May 12 00:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for the gene predictions generally allow you to 
iterate through the predicted genes. Each prediction is usually returned 
as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() 
method to print them as GFF.

So something as simple as this *may* work:

use strict;
use warnings;
use Bio::Tools::Glimmer;

my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
while (my $gene = $parser->next_prediction) {
   # gff_string() returns the line without a trailing newline
   print $gene->gff_string, "\n";
}

If you want separate GFF lines for each exon, you'll have to do another 
loop over $gene->exons() etc., each of which is luckily also a 
Bio::SeqFeature!

Or if you want to modify some of the GFF columns first, e.g. the source tag, 
just do $gene->source_tag('mynewtag') before printing it.
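
A sketch of the per-exon loop and the source tag change together, under the
same "this *may* work" caveat.  gene_to_gff_lines is a hypothetical helper
name; it only assumes that the gene object and its exons respond to
source_tag() and gff_string(), and that the gene has an exons() method.
Whether a given predictor's output actually contains exons, and whether
Bio::Tools::Glimmer needs an explicit format argument for your file, is
something to check against the module documentation.

```perl
use strict;
use warnings;

# Return GFF lines for one predicted gene followed by its exons.
# $source, if given, replaces the GFF source column on every feature.
sub gene_to_gff_lines {
    my ($gene, $source) = @_;
    my @lines;
    for my $feat ($gene, $gene->exons) {
        $feat->source_tag($source) if defined $source;
        push @lines, $feat->gff_string;      # no trailing newline included
    }
    return @lines;
}

# Usage: perl predictions2gff.pl glimmer.out [source_tag]
# BioPerl is only loaded when a file name is actually given.
if (@ARGV) {
    require Bio::Tools::Glimmer;
    my ($file, $source) = @ARGV;
    my $parser = Bio::Tools::Glimmer->new(-file => $file);
    while (my $gene = $parser->next_prediction) {
        print "$_\n" for gene_to_gff_lines($gene, $source);
    }
}
```

Swapping in another parser (for example one for genscan output) should
only mean changing the module name, as long as its predictions offer the
same three methods.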

Hope this helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Fri May 12 00:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making
	with	Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

	http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project.
It produces pretty clickable image maps from BLAST reports.

Hope it helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From brianjgilmartin at hotmail.com  Fri May 12 05:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <BAY107-F354AD036A551D290A1874CBCAC0@phx.gbl>

please remove me from the list



From sb at mrc-dunn.cam.ac.uk  Fri May 12 06:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes 
across a species rank it does:

if ($rank eq 'species') {
   # get rid of genus from species name
   (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}

However, even though the true scientific name is usually 'Genus species' in 
the database, note the 'usually': sometimes the species is a multiword 
item that does not include the genus, so we can't do a simple split 
and take the second word.
The same applies to levels below species, eg. 'Avian erythroblastosis 
virus' is a variant of the species 'Avian leukosis virus' but 'Avian 
erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current 
rank and the previous rank. Maybe even that's not so perfect, but it 
must be a lot better than turning the species 'Avian leukosis virus' 
into the species 'virus' (especially given that the genus here is 
'Alpharetrovirus')!

# we need to be going root(kingdom) -> leaf (species or lower) order
#
# we need to be storing untouched versions of the scientific name of
# the previous rank ($self->{_last_raw})
#
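# probably only bother to start doing this when we get to genus
my $last_raw = $self->{_last_raw} || undef;
$self->{_last_raw} = $sci_name;
if ($last_raw) {
   # \Q...\E quotes regex metacharacters, e.g. the parentheses in
   # 'Avian erythroblastosis virus (strain ES4)'
   $sci_name =~ s/\Q$last_raw\E//;
   $sci_name =~ s/^\s+//;
}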

Are there even more strange species (and lower) names that would still 
not work well with the above solution?
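
To try the idea outside the database modules, here is a small stand-alone
sketch.  strip_parent_names is a hypothetical helper that takes the raw
scientific names in root-to-leaf order and removes from each one whatever
matches the raw name of the rank before it.  It uses \Q...\E so that names
containing parentheses are matched literally.  Note that a name identical
to its parent's would come out empty, which is one of the cases still to be
decided.

```perl
use strict;
use warnings;

# Given raw scientific names in root-to-leaf order, return the names with
# the previous rank's raw name removed from each.
sub strip_parent_names {
    my @raw = @_;
    my (@short, $last_raw);
    for my $name (@raw) {
        my $short = $name;
        if (defined $last_raw) {
            $short =~ s/\Q$last_raw\E//;   # literal match, not a regex
            $short =~ s/^\s+//;
        }
        push @short, $short;
        $last_raw = $name;                 # always remember the untouched name
    }
    return @short;
}

my @virus = (
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
);
print join(' | ', strip_parent_names(@virus)), "\n";
print join(' | ', strip_parent_names('Homo', 'Homo sapiens')), "\n";
```

With this input 'Avian leukosis virus' is left alone, since it shares
nothing with 'Alpharetrovirus', while 'Homo sapiens' is reduced to
'sapiens'.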

Cheers,
Sendu.


From s_maheshwari84 at rediffmail.com  Fri May 12 09:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

  
hello
I am a student at the Center for DNA Finger Printing and Diagnostics (CDFD).
I am working on protein-protein interactions, but I am unable to use the protein interaction module, i.e. ProteinGraph.pm.
Actually I am facing lots of problems in the program I have written. Please help me; for the last four months I have not been able to solve the same problem.
I am pasting my program here and I am also attaching it.

#!usr/bin/perl
use lib "/usr/local/bioxapps/bioperl/library/";
use strict;
use Bio::Graph::SimpleGraph;
use Bio::Graph::IO;
our @ISA=qw( Bio::SeqI);
use Bio::Graph::Edge;
use Bio::Graph::IO::dip;
use Bio::Graph::IO::psi_xml;
use Clone qw(clone);
use vars  qw(@ISA);
use Bio::AnnotatableI;
use Bio::IdentifiableI;
our @ISA = qw(Bio::Graph::SimpleGraph);
@ISA = qw(Bio::Graph::IO);
our @ISA=qw(Expoerter);
use Bio::Graph::ProteinGraph;
use Class::AutoClass;
use Bio::Graph::SimpleGraph::Traversal;

my $graphio = Bio::Graph::IO->new(-file   => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
print "$graphio";
my $graph   = $graphio->next_network();
print "$graph->nodes\t";
$graph->remove_dup_edges();
my @un=$graph->unconnected_nodes();
print "\nthe unconnected nodes are =@un";
my @n=$graph->subgraph();
print "\subgraph=@n\n";
#print "Please the protein-id whose clusering coefficient is to be detemined\n";
#my $v=<STDIN>;
my $density = $graph->density();
print "\ngraph density=$density\n";
my @graphs = $graph->components();
print "\nno of Connected components=$#graphs\n";
print "\nplease enter the protein-id whom you want to remove from the network\n";
my $no=<STDIN>;
$graph->remove_nodes($graph->nodes_by_id($no));
my $count = $graph->edge_count();
print "\nno of edges=$count\n ";
my $ncount = $graph->node_count();
print "\nno of nodes=$ncount\n ";

print"\nenter the protein  whose interactions is to be find "; 
my $x=<STDIN>;
my $node = $graph->nodes_by_id($x);
#print " this is $node\n";
my @neighbors = $graph->neighbors($node); 
print "to check";
print join",",map{$_->object_id()} @neighbors;
my @nodes = $graph->nodes();
print "\nno of nodes = @nodes\t\n";
my @hubs;
foreach my $nodi (@nodes) 
 {
  if ($graph->neighbor_count($node) > 10) 
      {
       push @hubs, $nodi;
      }
  }
  
foreach my $r(@hubs)
  {
     my @y=@$r;
      print "the following proteins have > 10 interactors=@y\n";
  }
  #siblingual protein

 my @edgeref = $graph->articulation_points();
 print "no of articulation points=$#edgeref\n";
 print "please enter the protein whom you want to check for articulation point \n ";
 my $nod=<STDIN>;
  # make pathgen graph
  my $grap = Bio::Graph::IO->new(-file   => 'org.txt',-format => 'dip');
  my $gra   = $grap->next_network();
  $graph->remove_dup_edges();
  $graph->union($gra);
  my @duplicates = $graph->dup_edges();
  print "these interactions exist in cere and c.elegan\n=@duplicates";
  print "please enter the first protein for identifiaction of shortest path\n";
  my $p1=<STDIN>;
  print "please enter the second  protein for identifiaction of shortest path\n";
  my $p2=<STDIN>;
  
    my @a=$graph->shortest_paths();
 print "shortest path=@a\t\n";
    
  
with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060512/fe287972/attachment-0002.obj>

From chen_li3 at yahoo.com  Thu May 11 13:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you I have finally
come up with a script for my personal needs:
1) batch primer design
2) set some of the parameters instead of using all the
default values
3) output only part of the information, for the first
pair of primers rather than all of them (but you can
choose)
4) the results can be exported into Excel for my
convenience.

Enclosed are the script and the results of a test run.  I
also include some lines about how I figured out which
keys/entries are available for change. If you don't 
want the sequence part, just add # to comment it out.

Any comments are welcome.

BTW the solution suggested by Dr. Cui and Paul doesn't
work for me.

Once again thank you very much,

Li  

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment-0002.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment-0002.txt>

From Marc.Logghe at DEVGEN.com  Fri May 12 11:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,
What is actually the problem ? Do you have errors ? Is the script not
behaving as you expect ?
You also might attach the input file sample1.txt so that people can try
it.
Regards,
Marc
  



From stoltzfu at umbi.umd.edu  Fri May 12 11:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID: <A52F256F-A851-4429-A5B1-D3162A344790@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate
comparative analysis using evolutionary methods by 1) managing
evolutionary relationships (by linking data to trees) and 2) allowing
coordinated analysis of different types of data (by implementing a
generic concept of "character-state" data).  Bio::CDAT would leverage
existing BioPerl objects and include the functionality of Rutger Vos's
Bio::Phylo.  It would provide the framework to develop interfaces to
analysis tools (phylogeny inference, evolutionary rate models,
functional shift inference, etc), as well as to file formats and
visualization methods appropriate for such analyses.  A proposal is
available at

   http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on  
"Questions to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel


From sdavis2 at mail.nih.gov  Fri May 12 11:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID: <C08A2811.B6B5%sdavis2@mail.nih.gov>


On 5/12/06 9:55 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> hello
> I am a student at the Center for DNA Fingerprinting and Diagnostics (CDFD).
> I am working on protein-protein interactions, but I am unable to use the
> protein interaction module, i.e. ProteinGraph.pm.
> Actually, I am facing a lot of problems in the program I have written. Please
> help me; for the last four months I have not been able to solve this problem.
> I am pasting my program here and I am also attaching it.

You haven't really told us what you are trying to do or what problems you
are having.

Sean


From cjfields at uiuc.edu  Fri May 12 13:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
> 
> if ($rank eq 'species') {
>    # get rid of genus from species name
>    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have
everything in the classification, from superkingdom down to species (no
strain unfortunately, and I'm not sure about subspecies); if it's missing
the rank then the designation doesn't exist or is tagged as 'no rank'.  Like
I mentioned before I'm not intimately familiar with Bio::Taxonomy,
Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how
everything is parsed and plugged in to Bio::Taxonomy objects.  I do know
that XML::Twig is used for parsing through the data so it shouldn't be too
hard to change what you want.

I haven't tried using Bio::DB::Taxonomy directly yet, but I would have
thought that the binomial is just built from the XML twig 'LineageEx'
Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and
species from 'Species', and that the scientific name is from the tag
'ScientificName'.  Guess not. 

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
> 
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
> 
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>    $sci_name =~ s/$last_raw//;
>    $sci_name =~ s/^\s+//;
> }
> 
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name
(normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for
EMBL) is the best way to go about it since it's really a best guess using
regex; Jason pointed out several examples where this falls apart, and being
a bacterial man I have found many examples myself.  I'm also not sure that
forcing a lookup for every TaxID in every sequence every time it's passed
through SeqIO is the best way to go either, though I think it should be
required for storing sequences.  It's a tricky balance.  

I still think that maybe we should absolve ourselves from using
SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than
strictly annotation, or reconstruct Bio::Species to maybe a
Bio::Annotation::Species object to handle that annotation and either
deprecate Bio::Species or separate it completely from any Bio::Taxonomy
objects.  It would really simplify things.  Then, if anyone is interested in
taxonomy, either install a local database or use Entrez efetch, and then use
Bio::DB::Taxonomy (fixed of course) to grab the TaxID info.  Seems like
we're running more and more into exceptions to the rule as more genomes are
made available.

Anyway, using Bio::Species for GenBank is really screwy for bacterial names,
so currently I get around BioPerl issues with bacterial names by grabbing
the 'source' seqfeature and pulling the 'organism' tag out.  But it really
shouldn't be that obfuscated, right?
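
The workaround described above might look roughly like the sketch below. It is only an illustration under assumptions: the helper name organism_from_source and the command-line handling are invented here, and only the first 'source' feature carrying an 'organism' tag is used. The BioPerl calls involved (get_SeqFeatures, primary_tag, has_tag, get_tag_values) are the standard feature accessors.

```perl
use strict;
use warnings;

# Hypothetical helper: return the /organism value of the first 'source'
# feature of a sequence object, or undef if there is none.
sub organism_from_source {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';
        next unless $feat->has_tag('organism');
        my ($organism) = $feat->get_tag_values('organism');
        return $organism;
    }
    return;
}

# Usage: perl organism.pl record.gb   (script and file names are placeholders)
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        my $organism = organism_from_source($seq);
        print $seq->display_id, "\t",
              (defined $organism ? $organism : 'unknown'), "\n";
    }
}
```

This reads the organism name exactly as annotated in the feature table and does not try to split it into genus and species.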

Chris

> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sdavis2 at mail.nih.gov  Sat May 13 08:19:21 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 May 2006 08:19:21 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com>
References: <20060513041853.16091.qmail@webmail31.rediffmail.com>
Message-ID: <4465CEC9.2010909@mail.nih.gov>

saurabh maheshwari wrote:
>  
> hello
> Thanks for your prompt reply.
> Actually I am trying to make a protein interaction graph from a dip 
> file, but I am not able to do so. In my last mail I already attached 
> my program, which is giving some errors that I am not able to 
> troubleshoot. Please help.
> Thanks

I meant that since we don't know what error(s) you are getting, it is 
really not possible to determine what the problem is.  Also, someone 
else on the list offered to look at your code if you were to provide the 
input file.  I find it helpful to look at this webpage every now and 
then to remind myself what constitutes a useful question to email lists:

http://www.catb.org/~esr/faqs/smart-questions.html

Sean




From s_maheshwari84 at rediffmail.com  Sat May 13 01:17:58 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 13 May 2006 05:17:58 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com>

  
hello
I am very happy to see the prompt replies from the group members.
As you all suggested, I have attached the required files:
first the input file, second an error file in which I saved the errors I was getting, and third the program file.
There is something in the error file I want to understand.
Here is one line from it:
## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
What does this stand for?
Second, I want to get the connected graph for my data.
Let me explain by example which type of connected graph I mean.
Suppose there are five objects linked as follows:
A connected to B
A connected to C
B connected to C
D connected to C
E connected to A
I want to create a whole linkage between all five.
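
For reference, the five links listed above all belong to one connected component. The sketch below shows this in plain Perl, independent of Bio::Graph; the function names build_adjacency and connected_components are invented for this illustration and are not BioPerl methods.

```perl
use strict;
use warnings;

# Build an undirected adjacency hash from [node, node] pairs.
sub build_adjacency {
    my @edges = @_;
    my %adj;
    for my $edge (@edges) {
        my ($u, $v) = @$edge;
        $adj{$u}{$v} = 1;
        $adj{$v}{$u} = 1;
    }
    return \%adj;
}

# Return the connected components as array refs of sorted node names,
# found with a breadth-first search from every not-yet-visited node.
sub connected_components {
    my ($adj) = @_;
    my (%seen, @components);
    for my $start (sort keys %$adj) {
        next if $seen{$start};
        $seen{$start} = 1;
        my @queue = ($start);
        my @members;
        while (@queue) {
            my $node = shift @queue;
            push @members, $node;
            for my $next (sort keys %{ $adj->{$node} }) {
                next if $seen{$next}++;    # skip nodes already queued
                push @queue, $next;
            }
        }
        push @components, [ sort @members ];
    }
    return @components;
}

my $adj = build_adjacency(
    [qw(A B)], [qw(A C)], [qw(B C)], [qw(D C)], [qw(E A)],
);
for my $component (connected_components($adj)) {
    print join(' ', @$component), "\n";
}
```

Each printed line is one component, so a network that is fully linked prints a single line containing every node.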


Please help me I am not getting the result


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.dip
Type: application/octet-stream
Size: 5794 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0004.obj>
-------------- next part --------------
bash-2.05b$ perl from.pl
Bio::Graph::ProteinGraph=HASH(0x1182e70)
Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes
the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160)

graph density=0.00826446280991736

no of Connected components=60

please enter the protein-id whom you want to remove from the network
XMECF2

no of edges=61

no of nodes=122

enter the protein  whose interactions is to be find XMECF2
XMECF2
 interacts with map{->object_id()}

no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850
) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq::
RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH
(0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40)
Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri
chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0
x1198250) Bio::Seq::RichSeq=HASH(0x11d0240) Bio::Seq::RichSeq=HASH(0x11c8f20) Bi
o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich
Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1
1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio:
:Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe
q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c
b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S
eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq=
HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e
60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq
::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA
SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700
) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq::
RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH
(0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0)
Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri
chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0
x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi
o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich
Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1
1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio:
:Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe
q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c
4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S
eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq=
HASH(0x11c8e30) Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4
20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq
::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA
SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530
) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq::
RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH
(0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40)
Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri
chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0
x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi
o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich
Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1
1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio:
:Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe
q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a
d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S
eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq=
HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6
90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq
::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA
SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0
)
Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib
rary//Bio/Graph/ProteinGraph.pm line 477, <STDIN> line 2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0005.obj>

From cjfields at uiuc.edu  Sat May 13 14:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally
honest.  I have not looked at any of the Bio::Graph modules and have no idea
how they are implemented, and I haven't looked at your input file, but I can
tell right off the bat your script has major logic problems.  I can also
pretty much  tell that you don't understand the object model we use here, at
all.  This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for

Did you cut and paste from several other scripts hoping that it would work?
I say that b/c you mix styles quite frequently here, using objects correctly
(deref'ing with '->') and incorrectly (print "$object").  You also declare
(and redeclare) @ISA four times for a script (not needed unless you're
declaring a class and inheriting methods from other modules).  You also use
@ISA once with a misspelled module name (I don't think there is a module
named 'Expoerter').  So, I'm actually stunned that the script doesn't crash
at all.  Yikes!
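
To illustrate when @ISA is actually needed, here is a small sketch with two invented classes (Animal and Dog are placeholders, nothing to do with BioPerl). @ISA belongs in the package that defines a subclass; a script that merely calls methods on objects does not need it.

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class, %args) = @_; return bless {%args}, $class; }
sub name  { return $_[0]{name}; }
sub speak { my $self = shift; return $self->name . ' makes a sound'; }

package Dog;
our @ISA = ('Animal');    # Dog inherits new() and name() from Animal
sub speak { my $self = shift; return $self->name . ' barks'; }

package main;

# The script itself only uses the objects; no @ISA is declared here.
my $dog = Dog->new(name => 'Rex');
print $dog->speak, "\n";
```

In module files the same relationship is usually declared with 'use base' or 'use parent' rather than by assigning to @ISA directly.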

Okay, brutal honesty time over.  Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

it means that what you are printing out is a reference to an object (it shows
the object's class and its location in memory) and is NOT what you want.
You should be doing something along the lines of $object->method, not 'print
$object', to get at the object data and methods.  You use this several times
in your script already; that should be a big hint as the areas where it
doesn't work do not use this syntax.  Read the documentation for the many
varied modules you use in your script.  Look at script examples.  Start
simply, then work your way up.  

Also, using the '->' dereferencing operator inside double quotes doesn't
work; you have to do something like:

print $graph->nodes,"\t";

not 

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.
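
A small self-contained demonstration of the difference, using a toy class (Node and its id method are invented here and merely stand in for a real BioPerl object):

```perl
use strict;
use warnings;

package Node;
sub new { my ($class, $id) = @_; return bless { id => $id }, $class; }
sub id  { return $_[0]{id}; }

package main;

my $node = Node->new('XMECF2');

# Inside double quotes only $node is interpolated; "->id" stays literal text.
my $wrong = "$node->id";

# Call the method outside the quotes instead ...
my $right = 'id=' . $node->id;

# ... and for a list of objects, map each one to a printable value first.
my @nodes = map { Node->new($_) } qw(P1 P2 P3);
my $ids   = join ',', map { $_->id } @nodes;

print "$wrong\n";    # something like Node=HASH(0x...)->id
print "$right\n";
print "$ids\n";
```

The same applies to a list of nodes returned by a graph object: printing the array directly shows one reference per element, while mapping each element through an accessor shows the data.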

If any of what I just said doesn't make any sense, you really need to pick
up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and
'Programming Perl' by Wall et al.  I don't know if anyone can really help at
this point w/o completely writing the script for you.  We will fix problems
to a point but we, for the most part, will not do your work for you.

Chris



From hubert.prielinger at gmx.at  Sat May 13 23:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than BLAST output files, 
such as the text output files from the search tool mpSrch that is offered by
EBI? I ask because WU-BLAST output files can already be parsed with bioperl.

thanks
Hubert


From arareko at campus.iztacala.unam.mx  Sun May 14 00:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at 
the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great 
contribution to the BioPerl project!

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Sun May 14 12:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them.  If you want you could create a new plugin
module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or
fasta to get an idea of how it works.  Judging by the mpsrch output I'm
pretty sure you would have to build a custom plugin for it.  

A viable alternative: looking through the mail list it looks like mpsrch is
a multiprocessor implementation of ssearch, itself an implementation of the
Smith-Waterman algorithm for local alignments in the FASTA package of
programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...
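
If the report does turn out to be close enough to FASTA/SSEARCH output, the usual SearchIO loop would apply. The sketch below is untested against real mpsrch reports; the helper name best_hsps and the command-line handling are invented for illustration, while next_result, next_hit, next_hsp, query_name, name and evalue are the standard SearchIO accessors.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a Bio::SearchIO-style parser and collect
# [query name, hit name, e-value of the first HSP] for every hit.
sub best_hsps {
    my ($searchio) = @_;
    my @rows;
    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            my $hsp = $hit->next_hsp or next;    # skip hits without HSPs
            push @rows, [ $result->query_name, $hit->name, $hsp->evalue ];
        }
    }
    return @rows;
}

# Usage: perl parse_report.pl report.txt   (names are placeholders)
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'fasta', -file => $ARGV[0]);
    print join("\t", @$_), "\n" for best_hsps($in);
}
```

If the fasta parser rejects the report, the same loop would still work unchanged with a custom SearchIO plugin, since only the -format argument differs.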

Chris



From chen_li3 at yahoo.com  Sun May 14 13:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get the reverse-complementary sequence of a
sequence in a fasta file. The Synopsis of Bio::Seq
says I can do it this way:

$revcom=$seqobj->revcom();

I used the following script to try to get the job done,
but it doesn't work. Then I read the documentation of
Bio::Seq, and it looks like it doesn't contain a revcom
method.

Any ideas will be appreciated.

Li 


###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq; 
use Bio::SeqIO;     
       
my $file='c:/perl/local/primer3_1.0.0/src/est.txt';   
 
    
my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                            -format=>'fasta' );
                            
    my $seqobj=$seqIO->next_seq();#create object  
    
  print "what attributes/keys are available:\n";    
  for my $key (sort keys %$seqobj){
           my $value=$seqobj->{$key};
	    print "$key\t=>\t$value\n"
	    }
# These are the output on the screen	    
#primary_id =>      gi|54093|emb|X61809.1|
#primary_seq =>     Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can be
#accessed right away
# as to primary_seq, it is a Bio::PrimarySeq
#object, which provides the following
#methods according to the documentation:
                #new   
		#seq 
		#validate_seq 
		#subseq 
		#length 
		#display_id
		#accession_number 
		#primary_id 
		#alphabet 
		#desc 
		#can_call_new
		#id 
		#is_circular 
		#object_id
		#version 
		#authority 
		#namespace 
		#display_name 
		#description 
    
print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n"; 
print "revcom=",$seqobj->revcom,"\n\n";
                      
        my $now_time=localtime;
        print  $now_time, "\n\n";  
        exit;

 #These are the output on the screen 
	#primary_id=gi|54093|emb|X61809.1|
	#id=gi|54093|emb|X61809.1
	#revcom=Bio::Seq=HASH(0x10493304)
	#Sun May 14 12:45:20 2006

      


From cjfields at uiuc.edu  Sun May 14 13:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

	#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here.  The wiki gives '$seq->revcom->seq' as the
way to get the reverse complement string, not '$seq->revcom'.

When I ran your script with that line changed to the wiki version I got (using
my test seq):

what attributes/keys are available:
primary_id      =>      test,
primary_seq     =>      Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris



From chen_li3 at yahoo.com  Sun May 14 14:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the
link for this syntax: $seq->revcom->seq?

Li






From cjfields at uiuc.edu  Sun May 14 14:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <b3ef767e.b86a2fe8.820dd00@expms6.cites.uiuc.edu>

I think the confusion lies in what revcom returns.  This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this 
page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom.  '$seq_obj->revcom' returns a 
sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string.

Hence, '$seq_obj->revcom->seq'.
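
A minimal sketch of the difference, using a made-up test sequence rather than
the EST from the original script (the id 'test' and the sequence string are
assumptions for illustration):

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical test sequence
my $seq_obj = Bio::Seq->new(-id       => 'test',
                            -seq      => 'AAGCTTGC',
                            -alphabet => 'dna');

my $rc_obj = $seq_obj->revcom;   # a new sequence object, not a string
my $rc_str = $rc_obj->seq;       # the reverse-complemented string

# Printing the object itself gives something like Bio::Seq=HASH(0x...)
print "object: $rc_obj\n";
print "revcom=$rc_str\n";        # revcom=GCAAGCTT
```

Because revcom returns an object, the result can also be handed straight to
Bio::SeqIO's write_seq if you want the reverse complement written back out as
fasta.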

Chris



From Marc.Logghe at DEVGEN.com  Sun May 14 16:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,
> doesn't work. Then I read documentation of Bio::Seq and it 
> looks like it doesn't contain revcom method.
This is where the Deobfuscator interface that Mauricio announced earlier
comes in handy:
http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=
If you look in the methods table, you will find that the revcom
method is inherited from, and implemented by, Bio::PrimarySeqI.
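
The same question can be asked from Perl itself. This is a minimal sketch that
uses a toy class hierarchy (My::Seq and My::PrimarySeqI are hypothetical
stand-ins, and defining_package is an illustrative helper, not a BioPerl
function). Listing the keys of the object's hash only shows its data, not its
methods, so 'can' is the right tool:

```perl
use strict;
use warnings;
use B ();

# Return the package in which $class's $method was compiled,
# or nothing if the class cannot do it.
sub defining_package {
    my ($class, $method) = @_;
    my $code = $class->can($method) or return;
    return B::svref_2object($code)->GV->STASH->NAME;
}

# Toy hierarchy standing in for Bio::Seq / Bio::PrimarySeqI
{
    package My::PrimarySeqI;
    sub revcom { return 'revcom called' }

    package My::Seq;
    our @ISA = ('My::PrimarySeqI');
    sub new { return bless {}, shift }
}

print defining_package('My::Seq', 'revcom'), "\n";   # My::PrimarySeqI
```

With BioPerl loaded, calling the helper with 'Bio::Seq' and 'revcom' should
name the implementing package in the same way.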
HTH,
Marc 


From sb at mrc-dunn.cam.ac.uk  Mon May 15 04:18:11 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 09:18:11 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine>
References: <000f01c675e6$a61bde90$15327e82@pyrimidine>
Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu Bala wrote:
>> In bioperl up to at least 1.5.1, when one of the database modules 
>> comes across a species rank it does:
>> 
>> if ($rank eq 'species') {
>>     # get rid of genus from species name
>>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
>> }
> 
> The XML example from NCBI Taxonomy I mentioned previously seems to 
> have everything in the classification, from superkingdom down to 
> species (no strain unfortunately, and I'm not sure about subspecies);
> if it's missing the rank then the designation doesn't exist or is 
> tagged as 'no rank'.  Like I mentioned before I'm not intimately 
> familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I 
> don't have a clue as to how everything is parsed and plugged in to 
> Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> through the data so it shouldn't be too hard to change what you
> want.

Yes, that's all true, but I'm not sure what it has to do with what I was
saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
own implementation I change the rank of all 'no rank' Nodes below
species to 'variant'.


> I haven't tried using Bio::DB::Taxonomy directly yet, but I would 
> have thought that the binomial is just built from the XML twig 
> 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> tag 'Genus' and species from 'Species', and that the scientific name
> is from the tag 'ScientificName'.  Guess not.

No. See above for what it actually does. That is a copy/paste from the
code (there, $taxon_name holds the ScientificName). When it finds a
species rank it does that split because in the NCBI taxonomy database
the 'genus' rank for a human has a ScientificName of 'Homo', whilst the
'species' rank has a ScientificName of 'Homo sapiens', and the bioperl
model (quite rightly, I think) wants the 'species' node not to carry
information from other nodes (well, except for the classification
array). So it removes the 'Homo' from 'Homo sapiens', giving a species
name of 'sapiens'. This then allows the binomial method to return
'Homo sapiens' instead of 'Homo Homo sapiens'.
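
To make the effect of that split concrete, here is a minimal standalone sketch.
The sub name strip_first_word is just for illustration, and whatever
processing bioperl does afterwards is not modelled:

```perl
use strict;
use warnings;

# The quoted logic in isolation: drop everything up to the first
# run of whitespace and keep the rest.
sub strip_first_word {
    my ($taxon_name) = @_;
    (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    return $taxon_name;
}

print strip_first_word('Homo sapiens'), "\n";           # sapiens
print strip_first_word('Avian leukosis virus'), "\n";   # leukosis virus
```

The first word is thrown away whether or not it is the genus, which is fine for
'Homo sapiens' but not for a virus whose genus is 'Alpharetrovirus'.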

(though in a bizarre twist, and this is one of my problems with how
names are currently represented in the Taxonomy modules, 'Scientific
Name' and 'binomial' are synonymous)


[snip]
>> My solution is to just remove whatever is the same between the 
>> current rank and the previous rank. Maybe even that's not so 
>> perfect, but it must be a lot better than turning the species 
>> 'Avian leukosis virus' into the species 'virus' (especially given 
>> that the genus here is 'Alpharetrovirus')!
> 
> I don't think taking Genus/Species directly from the scientific 
> name (normally what is in the SOURCE or ORGANISM annotation for 
> GenBank or OS for EMBL) is the best way to go about it [snip]

Perhaps, but again I'm not sure what this has to do with what I was
saying. If you don't want your species name to contain your genus name
you have to do some kind of parsing. My post merely pointed out that the
parsing currently in bioperl does not work for viruses and possibly
other species. I'd like to think that someone cares about this error and
would do the simple fix I offered, or that they already know about the
problem and have done their own fix.
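
For the record, a minimal sketch of one way to read "remove whatever is the
same between the current rank and the previous rank". This is an illustration
of the idea only, not the code from my implementation, and the sub name
strip_parent_prefix is made up:

```perl
use strict;
use warnings;

# Strip the parent rank's name from the front of the child's name,
# but only when the child name really starts with it.
sub strip_parent_prefix {
    my ($parent_name, $name) = @_;
    if ($name =~ /^\Q$parent_name\E\s+(.+)$/) {
        return $1;
    }
    return $name;    # nothing in common at the start: leave it alone
}

print strip_parent_prefix('Homo', 'Homo sapiens'), "\n";
# sapiens
print strip_parent_prefix('Alpharetrovirus', 'Avian leukosis virus'), "\n";
# Avian leukosis virus
```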


> I'm also not sure that forcing a lookup for every TaxID in every 
> sequence every time it's passed through SeqIO is the best way to go 
> either, though I think it should be required for storing sequences. 
> It's a tricky balance.

In my own implementation any database lookups are cached, and you have
the option of not doing any database lookup at all and 'faking' a
taxonomy from the supplied list of names (so it works just like normal
Bio::Seq).


> I still think that maybe we should absolve ourselves from using 
> SOURCE/ORGANISM or OS/OC information in GenBank files as anything 
> more than strictly annotation, or reconstruct Bio::Species to maybe a
>  Bio::Annotation::Species object to handle that annotation and either
>  deprecate Bio::Species or separate it completely from any 
> Bio::Taxonomy objects.  It would really simplify things.  Then, if 
> anyone is interested in taxonomy, either install a local database or
>  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
>  to grab the TaxID info.

My personal view is that having it as an annotation would serve no real
purpose. For me the whole point of any kind of species representation in
bioperl is to allow you to compare species in a biologically meaningful
way. If it's just some annotation then that means it's basically
free-form text and you have no guarantee that two sequences from the
same species are annotated exactly the same - no guarantee that your
code would identify that those sequences are from the same species.
The only other useful thing that a species object needs to do is let you
know how related two different species are - you need to be able to ask
what a species' class, kingdom etc. are. Again, not viable with an
annotation - you need something strict like a properly constructed Taxonomy.

I guess it comes down to the philosophy of parsing a file. Do you try
and reflect exactly what the file contains, letter for letter, so that
your resulting object can recreate that file letter for letter, or do
you parse the file and extract the correct /meaning/ in order to be more
useful?
I think there can be a choice by the user, and this is best done by
making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
as in my own implementation.


From s_maheshwari84 at rediffmail.com  Mon May 15 04:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

  
Hello All
I sent this problem to the list earlier, but it is still unsolved, so I have rephrased it. Can anybody please give me code to make a graph of the interactions between items that are listed in a text file in the following format?
Example
item1 interacts with item2, and I want to build the graph and then, given any item as input, ask for all interactions of that item.

item 1      item 2 
A            B
A            C
C            B
D            B
D            E
A            F
G            A     

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From sdavis2 at mail.nih.gov  Mon May 15 06:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>


On 5/15/06 4:15 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> Hello All
> I have sent a problem to the earlier also but my problem is still unsolve so i
> have modified the problem in another way please can any body give me code to
> make a graph between some items which are in a text file in the following
> formate:
> Example
> item1 interacts with item2 and i want to make graph by giving any item as
> input and asking all interactions of that item.
> 
> item 1      item 2
> A            B
> A            C
> C            B
> D            B
> D            E
> A            F
> G            A   

Not a bioperl answer, but in your case, I would suggest looking at using
cytoscape to do this.  Look here for details:

http://www.cytoscape.org/

Sean


From sdavis2 at mail.nih.gov  Mon May 15 07:03:28 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 07:03:28 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>
Message-ID: <C08DD840.B7DE%sdavis2@mail.nih.gov>


On 5/15/06 6:26 AM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:

> 
> 
> 
> On 5/15/06 4:15 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
> wrote:
> 
>>   
>> Hello All
>> I have sent a problem to the earlier also but my problem is still unsolve so
>> i
>> have modified the problem in another way please can any body give me code to
>> make a graph between some items which are in a text file in the following
>> formate:
>> Example
>> item1 interacts with item2 and i want to make graph by giving any item as
>> input and asking all interactions of that item.
>> 
>> item 1      item 2
>> A            B
>> A            C
>> C            B
>> D            B
>> D            E
>> A            F
>> G            A  
> 
> Not a bioperl answer, but in your case, I would suggest looking at using
> cytoscape to do this.  Look here for details:
> 
> http://www.cytoscape.org/

I forgot to mention, if you are looking for a perl solution, I would look at
the Graph module.

http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod

You can create the graph according to the docs and then use the neighbors()
method (if I remember correctly) to get the nodes connected to the query
node.
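
If installing a module is a hurdle, the same lookup can be sketched in plain
Perl with a hash of hashes. This assumes the interactions are undirected and
that the header line has been removed from the file; the sub names are made up
for illustration, and the data is inlined here instead of read from a file:

```perl
use strict;
use warnings;

# Build an undirected adjacency map from "item1 item2" lines.
sub read_interactions {
    my ($fh) = @_;
    my %adj;
    while (my $line = <$fh>) {
        my ($x, $y) = split ' ', $line;
        next unless defined $y;      # skip blank or malformed lines
        $adj{$x}{$y} = 1;
        $adj{$y}{$x} = 1;
    }
    return \%adj;
}

# All partners of one item, sorted alphabetically.
sub interactions_of {
    my ($adj, $item) = @_;
    my @partners = sort keys %{ $adj->{$item} || {} };
    return @partners;
}

# Example data from the original post, without the header line
my $text = <<'END';
A B
A C
C B
D B
D E
A F
G A
END

open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
my $adj = read_interactions($fh);
close $fh;

print join(' ', interactions_of($adj, 'A')), "\n";   # B C F G
```

With the Graph module installed, Graph::Undirected with add_edge and
neighbours should give the same answer and adds path finding and the like on
top.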

Sean


From akarger at CGR.Harvard.edu  Mon May 15 08:20:11 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 15 May 2006 08:20:11 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>

This tool is quite nice, and may save me a lot of perldoc'ing.

A couple of minor interface thoughts. 

1) There are quite a lot of methods for many of the classes. As such, I
think I'll often want to browse through what's available in a class. But
60% or so of the screen real estate is used for "Enter a search
string... OR select a class from the list". IMO, it would be better to
have two pages, a search page and a result page.   It only takes a click
on Back (or a "new search" button) to get to a new search, and now you
can use your whole screen for reading your results.

2) Please sort the "select a class from the list" alphabetically. I
guess I can enter a search term to get the right classes, but it would
be nice to be able to browse.
2a) if you want to be really fancy, make a javascript nested menu with
expandable submenus. OK, maybe not.

3) Minimalist is nice, but documentation is even nicer. It wasn't clear
to me that the search searches within class names rather than function
names. What I really want to know sometimes is which module has, say,
the revcom method in it. So, if it's not easy to include that within
this search, then at least tell me what my search space is.

4) When I search for something that's not found, I get a screen that
looks pretty familiar, with the extra text "No match to string found"
down at the bottom. It took me a while to even notice it. (Studies show
that most users don't read most of the text on a page.) Bold might be
nice here. Or put the error at the top of the screen. Or both.

5) I'll save my stupidest comment for last - please make the page title
"Bioperl Deobfuscator", so that when I bookmark it I'll know what the
bookmark stands for.

Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From sb at mrc-dunn.cam.ac.uk  Mon May 15 09:08:32 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 14:08:32 +0100
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
References: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk>

Amir Karger wrote:
> This tool is quite nice, and may save me a lot of perdoc'ing.

Yes, many thanks to everyone involved.


> A couple of minor interface thoughts. 
> 
> 1)There's quite a lot of methods for many of the classes. As such, I
> think I'll often want to browse through what's available in a class. But
> 60% or so of the screen real estate is used for "Enter a search
> string... OR select a class from the list". IMO, it would be better to
> have two pages, a search page and a result page.   It only takes a click
> on Back (or a "new search" button) to get to a new search, and now you
> can use your whole screen for reading your results.

As the compromise it must be, I like the way it behaves. I don't like 
lots of windows. I especially don't like pop up windows. Right now when 
I'm using the bioperl docs I tend to have a whole bunch of tabs open to 
different class pages at once, so being able to see an overview all on 
one page in Deobfuscator is very nice.

Further to that, I'd love it if clicking on a method name caused an 
in-place css(&|javascript) reveal (similar to how a well implemented 
drop down menu works in a website) rather than a new window opened. 
Alternatively, just have more columns in the results table, ie. usage, 
function, returns, args columns. I feel that opening a window for each 
method you want to understand is far too slow.

I'd also really like a link to the code for the method as well. The 
bioperl docs are rarely complete enough that you can really understand 
what every method is supposed to do without looking at the code.


> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> to me that the search searches within class names rather than function
> names. What I really want to know sometimes is which module has, say,
> the revcom method in it.

This would be a great feature to add.


Another minor interface thought:
6) Have a little more cell padding in all the tables. Things are just a 
little too cramped and things start to look messy/ run into each other.


From cjfields at uiuc.edu  Mon May 15 09:59:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 08:59:57 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk>
Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 8:09 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Amir Karger wrote:
> > This tool is quite nice, and may save me a lot of perdoc'ing.
> 
> Yes, many thanks to everyone involved.

The Deobfuscator currently indexes bioperl-1.4, so it's not completely
up-to-date.  I believe Mauricio and Dave may be working on updating to the
newer versions and maybe bioperl-live, as well as getting the other bioperl
packages up and running.

For modules added after v1.4 I use the script from the FAQ question mentioned
on the Deobfuscator wiki page to get up-to-date methods, then read the
ActiveState HTML'd perldocs that are generated when installing with PPM (I make
a custom PPM/PPD file and install it myself every once in a while):

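#!/usr/bin/perl -w
use strict;
use Class::Inspector;

# Class name to inspect, e.g. Bio::Seq
my $class = shift || die "Usage: methods perl_class_name\n";
eval "require $class" or die "Could not load $class: $@";
# List all public methods, inherited ones included, with full package names
print join("\n", sort @{ Class::Inspector->methods($class, 'full', 'public') }), "\n";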

> > A couple of minor interface thoughts.
> >
> > 1)There's quite a lot of methods for many of the classes. As such, I
> > think I'll often want to browse through what's available in a class. But
> > 60% or so of the screen real estate is used for "Enter a search
> > string... OR select a class from the list". IMO, it would be better to
> > have two pages, a search page and a result page.   It only takes a click
> > on Back (or a "new search" button) to get to a new search, and now you
> > can use your whole screen for reading your results.
> 
> As the compromise it must be, I like the way it behaves. I don't like
> lots of windows. I especially don't like pop up windows. Right now when
> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
> different class pages at once, so being able to see an overview all on
> one page in Deobfuscator is very nice.
>
> Further to that, I'd love it if clicking on a method name caused an
> in-place css(&|javascript) reveal (similar to how a well implemented
> drop down menu works in a website) rather than a new window opened.
> Alternatively, just have more columns in the results table, ie. usage,
> function, returns, args columns. I feel that opening a window for each
> method you want to understand is far too slow.

Agreed.

> I'd also really like a link to the code for the method as well. The
> bioperl docs are rarely complete enough that you can really understand
> what every method is supposed to do without looking at the code.

The methods that pop up are in columns along with the class module that
implements the method.  


If you click on that link you get PDOC documentation for the module which
includes most of the code (strangely, though Deobfuscator indexes bioperl
1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
something a bit more detailed?

> > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> > to me that the search searches within class names rather than function
> > names. What I really want to know sometimes is which module has, say,
> > the revcom method in it.

That's listed in the method results table (the next column has the module
with a link to the module's online docs).
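
For finding which class actually implements a method outside the web interface, a minimal sketch in core Perl follows. It walks the inheritance chain and reports the first package that defines the method. The My::PrimarySeq and My::Seq packages are hypothetical stand-ins so the sketch runs without BioPerl; they are not real BioPerl classes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Return the first package in $class's inheritance chain that defines
# $method, or nothing if no package in the chain does.
sub defining_class {
    my ($class, $method) = @_;
    no strict 'refs';
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        return $pkg if defined &{"${pkg}::${method}"};
    }
    return;
}

# Hypothetical stand-in classes (assumption: not part of BioPerl).
{
    package My::PrimarySeq;
    sub new    { return bless {}, shift }
    sub revcom { return 'revcom called' }

    package My::Seq;
    our @ISA = ('My::PrimarySeq');
    sub seq { return 'ACGT' }
}

print defining_class('My::Seq', 'revcom'), "\n";
```

With a real module the class would first have to be loaded, for example with eval "require $class" as in the FAQ script above.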


Chris


> This would be a great feature to add.
> 
> 
> Another minor interface thought:
> 6) Have a little more cell padding in all the tables. Things are just a
> little too cramped and things start to look messy/ run into each other.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon May 15 12:08:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 11:08:30 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk>
Message-ID: <001601c67839$cf289490$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 3:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
> subspecies/variant names
> 
> Chris Fields wrote:
> > Sendu Bala wrote:
> >> In bioperl up to at least 1.5.1, when one of the database modules
> >> comes across a species rank it does:
> >>
> >> if ($rank eq 'species') { # get rid of genus from species name
> >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); }
> >
> > The XML example from NCBI Taxonomy I mentioned previously seems to
> > have everything in the classification, from superkingdom down to
> > species (no strain unfortunately, and I'm not sure about subspecies);
> > if it's missing the rank then the designation doesn't exist or is
> > tagged as 'no rank'.  Like I mentioned before I'm not intimately
> > familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> > don't have a clue as to how everything is parsed and plugged in to
> > Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> > through the data so it shouldn't be too hard to change what you
> > want.
> 
> Yes, that's all true, but I'm not sure what it has to do with what I was
> saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
> own implementation I change the rank of all 'no rank' Nodes below
> species to 'variant'.

Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'.  Guess not.
> 
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
> 
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)
 
Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
deal with it.  I also noticed that subspecies also contains the entire
string:

    <Taxon>
      <TaxId>135461</TaxId>
      <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
      <Rank>subspecies</Rank>
    </Taxon>
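
To make the effect of that split concrete, here is a small sketch that applies the same split(/\s+/, $name, 2) to a species name and to the subspecies string above. The helper name strip_first_word is mine, and applying it to a subspecies name is only an illustration; the quoted code runs it for the 'species' rank only.

```perl
use strict;
use warnings;

# Same split as in the quoted snippet: drop the first
# whitespace-separated word and keep the remainder.
sub strip_first_word {
    my ($taxon_name) = @_;
    (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    return $taxon_name;
}

for my $name ('Homo sapiens', 'Bacillus subtilis subsp. subtilis') {
    printf "%-35s => %s\n", $name, strip_first_word($name);
}
```

The first word is removed whether or not it is the genus, so for the subspecies string the genus goes but the species epithet stays in front.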

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
almost every time I don't get the actual scientific name for the node (the
one on the GenBank ORGANISM line); I get the name with the strain chopped off
instead, and a number of times the names get mangled.  The regexes below only
grab from the topmost tags:

Script:
---------------------------------
#! perl
use strict;
use warnings;

use Bio::DB::Taxonomy;
# XML file of NCBI taxonomy records; defaults to tax.xml
my $file = shift(@ARGV) || 'tax.xml';

print "\nNCBI XML output ScientificName tag for each node:\n";
my @taxid = ();
open (my $taxfile, '<', $file) or die "Can't open file $file: $!\n";
while (<$taxfile>){
	# only the top-level tags (two-space indent) are matched
	if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
		print "$1\t";
		push @taxid, $1;
	}
	print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close $taxfile;

print "\nBio::DB::Taxonomy scientific_name:\n";
# one factory is enough for all lookups
my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
for my $id (@taxid){
	my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
	print $node->ncbi_taxid,"\t",$node->scientific_name,"\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------
Note Bacillus subtilis in the Bio::Tax output above.  Not one of those is
the scientific name as defined by NCBI (and most taxonomists for that
matter).

So, in a nutshell, there's a problem here.  I don't know if your fix works
for that, but I definitely don't think the 'scientific name' should be
assembled ad hoc but should be taken from the tagname for that node.  I am
currently reduced to grabbing the feature primary_tagged 'source' and
getting the 'organism' tagname from that.  I cannot stress enough that it
should NOT be that way.

As for 'binomial' == 'scientific_name', I agree; I see it as well and that
should be fixed.
 
...
> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my
frustrations with Bio::Species (not Bio::DB::Taxonomy).  My point here was,
since there is no real way to surmise from a GenBank flatfile what the
taxonomic ranks are w/o guessing (which seems to break more often than not
when dealing with complex names), there shouldn't be any tie to Bio::Tax
objects, at least directly.  I guess methods could be incorporated into
Bio::Species for those who want to give it a try, but I would like to get a
GenBank file, for once, in which the scientific name/binomial name isn't
mangled by Bio::Species.

Back to Bio::DB::Taxonomy; I don't have a problem with implementing your
methods here; on the contrary, if they fix my problem above then I'll be
more than glad to.  I can't get to it immediately but maybe later
today/tomorrow.
 
> > I'm also not sure that forcing a lookup for every TaxID in every
> > sequence every time it's passed through SeqIO is the best way to go
> > either, though I think it should be required for storing sequences.
> > It's a tricky balance.
> 
> In my own implementation any database lookups are cached, and you have
> the option of not doing any database lookup at all and 'faking' a
> taxonomy from the supplied list of names (so it works just like normal
> Bio::Seq).
>
> 
> > I still think that maybe we should absolve ourselves from using
> > SOURCE/ORGANISM or OS/OC information in GenBank files as anything
> > more than strictly annotation, or reconstruct Bio::Species to maybe a
> >  Bio::Annotation::Species object to handle that annotation and either
> >  deprecate Bio::Species or separate it completely from any
> > Bio::Taxonomy objects.  It would really simplify things.  Then, if
> > anyone is interested in taxonomy, either install a local database or
> >  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
> >  to grab the TaxID info.
> 
> My personal view is that having it as an annotation would serve no real
> purpose. For me the whole point of any kind of species representation in
> bioperl is to allow you to compare species in a biologically meaningful
> way. If it's just some annotation then that means it's basically
> free-form text and you have no guarantee that two sequences from the
> same species are annotated exactly the same - no guarantee that your
> code would identify that those sequences are from the same species.
> The only other useful thing that a species object needs to do is let you
> know how related two different species are - you need to be able to ask
> what a species' class, kingdom etc. are. Again, not viable with an
> annotation - you need something strict like a properly constructed
> Taxonomy.

My point is, a large number of users do NOT use, nor care about, taxonomic
information to the degree they need to know the entire classification of the
organism; many are just as happy about getting the scientific name only,
which is in the GenBank/EMBL file itself.  To take one extreme, it is not
productive to force every user to download the NCBI tax database and use
lookups just to convert sequences from EMBL format to GenBank format.  It's
not productive to allow users to spam the NCBI tax database remotely either,
so hardcoding lookups is, IMHO, a big mistake.  
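
The caching Sendu mentions can be sketched in a few lines, assuming lookups are keyed by TaxID. The fetch_remote function is a hypothetical stand-in that only counts calls; a real version would query a taxonomy database. None of these names come from BioPerl.

```perl
use strict;
use warnings;

my %node_cache;          # TaxID => previously fetched record
my $remote_calls = 0;    # how often the (pretend) remote source was hit

# Hypothetical stand-in for a remote lookup (assumption, not a BioPerl call).
sub fetch_remote {
    my ($taxid) = @_;
    $remote_calls++;
    return { taxid => $taxid };
}

# Return the record for $taxid, fetching it only on the first request.
sub cached_lookup {
    my ($taxid) = @_;
    $node_cache{$taxid} = fetch_remote($taxid)
        unless exists $node_cache{$taxid};
    return $node_cache{$taxid};
}

sub remote_call_count { return $remote_calls }

cached_lookup($_) for (224308, 224308, 281309, 224308);
print "remote calls: ", remote_call_count(), "\n";
```

Four requests for two distinct TaxIDs result in two remote calls; repeated IDs are served from the hash.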

> I guess it comes down to the philosophy of parsing a file. Do you try
> and reflect exactly what the file contains, letter for letter, so that
> your resulting object can recreate that file letter for letter, or do
> you parse the file and extract the correct /meaning/ in order to be more
> useful?
> I think there can be a choice by the user, and this is best done by
> making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
> as in my own implementation.

I understand both philosophies, but the latter implies that you know the
intention of the ones submitting the sequence.  99.9% of the time that's
fine, something I can live with.  However, when we mess up something as
simple as getting the scientific name for an organism when the information
is directly in the flat file (ORGANISM line) by trying to 'imply' what the
classification is, yes, I get frustrated.  Even more frustrating to me is
that Bio::DB::Taxonomy, which should return accurate information directly
from the Taxonomy database, still manages to screw up the scientific name.  

The NCBI definition in the sample record:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

states that the ORGANISM line contains the formal scientific name and its
lineage (no ranking).  If the lineage is very long it is abbreviated, so you
don't get the same thing as you would by using the TaxID.
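
As a rough sketch of taking the name straight from the flat file, the following pulls the ORGANISM line and the unranked lineage out of GenBank record text with a regex. It assumes the organism name fits on the ORGANISM line itself; a name that wraps onto a continuation line would be misread as lineage. The function name parse_organism and the record fragment are for illustration only.

```perl
use strict;
use warnings;

# Return (scientific name, arrayref of lineage terms) from GenBank record
# text, or an empty list if there is no ORGANISM line.
sub parse_organism {
    my ($record) = @_;
    return unless $record =~ /^  ORGANISM  (.+)\n((?:^ {12}\S.*\n)*)/m;
    my ($name, $block) = ($1, $2);
    $block =~ s/\s+/ /g;     # join the wrapped lineage lines
    $block =~ s/^ | $//g;    # trim leading and trailing space
    $block =~ s/\.$//;       # drop the terminating period
    my @lineage = split /;\s*/, $block;
    return ($name, \@lineage);
}

# Illustrative record fragment (assumption: trimmed to the relevant lines).
my $record = <<'END';
SOURCE      Bacillus subtilis subsp. subtilis str. 168
  ORGANISM  Bacillus subtilis subsp. subtilis str. 168
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
REFERENCE   1
END

my ($name, $lineage) = parse_organism($record);
print "$name\n";
print join(' > ', @$lineage), "\n";
```

The lineage comes back as a plain ordered list with no ranks attached, which is all the flat file provides.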

So, in essence, I believe you are correct, that Bio::Species can be used as
a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with
caveats or warnings for possible inaccuracies.  I also believe that lookups
should be allowed but optional, not required (i.e. left up to the user, as
you state).  

I just feel that it's somewhat misleading to imply, by delegating to
Bio::Taxonomy, that Bio::Species contains accurate taxonomic information
when NCBI themselves state that the GenBank flatfile classification can be
incomplete and does not supply rankings (genus, species) in the file.  It's
our best guess in most cases, and a best guess by definition is not very
accurate.  If you want taxonomic accuracy, use the TaxID and a local tax
database.  I feel that we shouldn't punish those who don't worry/care about
taxonomy by implementing Bio::Species with methods that mangle data that's
directly in the flat file they're parsing.

Okay, not to cut short this discussion, but I have to get back to $job.
I'll try adding your fixes in a bit later today/tomorrow; if they pass tests
I'll commit them in.

Chris


From hlapp at gmx.net  Mon May 15 12:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID: <C78E4724-CC95-483E-876B-69AF7C1CC6AF@gmx.net>

You found the right instance. Unfortunately with the way the bioperl  
swissprot parser works the group (RG) isn't promoted to author if  
there is no author in addition (in fact you may debate whether that  
would even be the best way of doing things), so it doesn't find it on  
second occurrence by unique key.

If you can live without this entry, or any other entry that causes a  
hiccup, just supply the flag --safe and it will gracefully move on to  
the next entry.

Fixing the issue would require either fixing the bioperl swissprot  
parser (or Bio::Annotation::Reference) to stick the RG group into the  
author slot if there is no author, or extending  
Bio::Annotation::Reference to also feature a group and having biosql  
use it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)  
should just use that in place of a missing author?
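
A minimal sketch of that fallback, assuming a reference object with authors() and rg() accessors. My::Reference is a hypothetical stand-in so the example runs on its own, and author_or_group is an illustrative helper, not existing Bioperl-db code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a reference object (assumption: only the
# authors() and rg() accessors matter here).
{
    package My::Reference;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub authors { return $_[0]{authors} }
    sub rg      { return $_[0]{rg} }
}

# Use the author list when present, otherwise fall back to the group (RG).
sub author_or_group {
    my ($ref) = @_;
    my $authors = $ref->authors;
    return $authors if defined $authors && length $authors;
    return $ref->rg;
}

my $ref = My::Reference->new(rg => 'The German cDNA consortium');
print author_or_group($ref), "\n";
```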

The downside is that upon round-tripping an entry, the RG annotation  
line will become an RA annotation line. How bad would that be?

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for  
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   
> <======  Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.    
> <======  Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL  
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first  
> sequence
> +--------------+-----------+----------------------------------------------------------+-------+---------+----------------------+
> |          139 |      NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
> +--------------+-----------+----------------------------------------------------------+-------+---------+----------------------+
>
> and the error when the script choked was
>
>  MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,  
> values were
>  ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
>  databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
>  Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing i'm not the first person to encounter this, but don't  
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon May 15 13:01:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 13:01:14 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx>
References: <4466AD7F.6050700@campus.iztacala.unam.mx>
Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>

Hey, thanks to Laura & David for this interface.

Any idea why most of the Bio::Ontology::* modules show up without  
their leading Bio::Ontology? And clicking on those hyperlinks doesn't  
go anywhere either ... Anything different with those modules that I  
can fix?

	-hilmar

On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:

> I'm glad to announce the availability of the Deobfuscator interface at
> the BioPerl website. You can use it at the following URL:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Many thanks to Laura Kavanaugh and David Messina for this great
> contribution to the BioPerl project!
>
> Mauricio.
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 13:22:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 12:22:13 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>
Message-ID: <000301c67844$1b506280$15327e82@pyrimidine>

That's strange.  Clicking on the list gives me the results for that module.
When I click on the hyperlinks in the results section they open fine; the
method column links open a new page containing usage-function-returns-args
and the class column links open pdoc (same page) for bioperl-live.  I'm
using Firefox 1.5 on WinXP.

Chris



From sb at mrc-dunn.cam.ac.uk  Mon May 15 14:00:15 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 19:00:15 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine>
References: <001601c67839$cf289490$15327e82@pyrimidine>
Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
> deal with it.  I also noticed that subspecies also contains the entire
> string:
> 
>     <Taxon>
>       <TaxId>135461</TaxId>
>       <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
>       <Rank>subspecies</Rank>
>     </Taxon>

Yes, this is one of the problems I mentioned in the first post to this
thread.


> As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
> I don't get the actual scientific name for the node (from the GenBank
> ORGANISM line) almost every time; I get the name with the strain chopped off
> instead and a number of times the names get mangled.

[snip, should be:]
> 224308  Bacillus subtilis subsp. subtilis str. 168
> 281309  Bacillus thuringiensis serovar konkukian str. 97-27

[snip, but Bio::DB::Taxonomy gives:]
> 224308  subtilis Bacillus subtilis subsp. subtilis
> 281309  Bacillus cereus group thuringiensis

[snip]
> So, in a nutshell, there's a problem here.  I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite as
you say. My solution was to munge the actual ScientificName but 'ensure'
that the binomial would give you back the actual binomial name you
wanted - which is the intent of current Bio::DB::Taxonomy code.

my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
my $leaf_node = $species0->taxonomy->get_leaves();
print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
print "Species0 subspecies = '", $species0->subspecies, "'\n";
print "Species0 variants = '", scalar($species0->variant), "'\n";
print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:
sci_name of Node = 'str. 168'
Species0 subspecies = 'subsp. subtilis'
Species0 variants = 'str. 168'
Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

sci_name of Node = 'str. 97-27'
Species0 subspecies = ''
Species0 variants = 'serovar konkukian str. 97-27'
Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the
ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp.
subtilis str. 168', when I ask for the variant I don't want that whole
string. I just want the bit that will be different when comparing other
strains of this subspecies of this species of Bacillus. I want 'str.
168'. Note that my objects never store the original ScientificName; it
is due to 'luck' (or as I like to think, a good implementation) that the
binomial method is able to reconstruct a string that is identical to
what the original ScientificName was.
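
The idea of keeping only the part of each name that differs from its parent can be sketched as follows. This is a guess at the approach under simple assumptions, not the TFBS code itself: each node's full name is assumed to start with the full name of the node above it, and the function name distinguishing_parts is hypothetical.

```perl
use strict;
use warnings;

# Given full names ordered from genus down to the leaf node, strip from each
# name the full name of the node above it, leaving the distinguishing part.
sub distinguishing_parts {
    my @full_names = @_;
    my @parts;
    my $parent = '';
    for my $full (@full_names) {
        my $part = $full;
        $part =~ s/^\Q$parent\E\s+// if length $parent;
        push @parts, $part;
        $parent = $full;
    }
    return @parts;
}

my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);
my @parts = distinguishing_parts(@lineage);
print "'$_'\n" for @parts;
print "rejoined: '", join(' ', @parts), "'\n";
```

Joining the stripped parts with single spaces gives back the leaf's full name, which is why a binomial-style method can rebuild the original string without storing it.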

If you'd like to see my code let me know. You can't just drop the code
snippet I posted in this thread into existing bioperl modules; quite a
bit else has to change as well. I'll have to make an updated
taxonomy_the_tfbs_way.tar.gz file available if you want an example
implementation; the current version of that file is now out of date - it
doesn't do any of what I describe above.


From hlapp at gmx.net  Mon May 15 14:08:49 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 14:08:49 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine>
References: <000301c67844$1b506280$15327e82@pyrimidine>
Message-ID: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>

Safari or Firefox on MacOSX don't do this. Note that the appearance  
in the browsable list is already different (the prefix is missing),  
and the JavaScript link also lacks the prefix in the module name in  
contrast to others, e.g., Bio::Ontology::Ontology (which is one of  
the few Bio::Ontology exceptions that do work and do display correctly).

I suppose there is something peculiar about the code formatting of  
those modules? Some of the modules under Bio::OntologyIO are also  
affected BTW.

What happens is that after you click on the link the page appears to  
reload (i.e., gets submitted) but the second table that is supposed  
to open underneath the first doesn't appear. However, the sort-by  
drop-down selector does appear.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 15:07:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:07:59 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>
Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine>

I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
which I can try it on).  I'll let you know what I find.  

This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
all the classes have links that work (I added newline and tab to make it a
bit more readable) :

Bio::OntologyIO
	Parser factory for Ontology formats
Bio::OntologyIO::Handlers::BaseSAXHandler
	no short description available
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
	no short description available
Bio::Ontology::OntologyI
	Interface for an ontology implementation
Bio::Ontology::TermFactory
	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
Bio::Ontology::OntologyStore
	A repository of ontologies
Bio::Ontology::RelationshipFactory
	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
Bio::Ontology::Ontology
	standard implementation of an Ontology

So the names seem fine here.

When I click on a class (Bio::Ontology::Ontology) I get in the results
section:

Method                 Class                           Returns         Usage
add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI

....and so on

Where each method is clickable and opens a new page containing a table:

Bio::Ontology::Ontology::add_relationship
Usage	add_relationship(RelationshipI relationship): RelationshipI
Function	Adds a relationship object to the ontology engine.
Returns	Its argument.
Args	A RelationshipI object.


Each class is also linked to the bioperl-live PDOC.  Clicking on class
Bio::Ontology::Ontology in the results table gets me this page (no new
page):

http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html


Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Monday, May 15, 2006 1:09 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Safari or Firefox on MacOSX don't do this. Note that the appearance
> in the browsable list is already different (the prefix is missing),
> and the JavaScript link also lacks the prefix in the module name in
> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> the few Bio::Ontology exceptions that do work and do display correctly).
> 
> I suppose there is something peculiar about the code formatting of
> those modules? Some of the modules under Bio::OntologyIO are also
> affected BTW.
> 
> What happens is after you click on the link the page appears to
> reload (i.e., gets submitted) but the second table that is supposed
> to open underneath the first doesn't appear. However, the sort-by
> drop-down selector does appear.
> 
> 	-hilmar
> 


From cjfields at uiuc.edu  Mon May 15 15:12:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:12:34 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine>

I just tried the same thing (links, search, etc) with Mac OS X v 10.3.9 and
Safari (no Firefox sorry) and it worked fine as well (all links, no missing
Bio::Ontology, etc).  Not sure what it could be...

Chris



From arareko at campus.iztacala.unam.mx  Mon May 15 15:20:10 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 15 May 2006 14:20:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx>

Laura and Dave would be very happy to see all of your 
comments/suggestions/enhancements/complaints summarized in the 
appropriate wiki page. Just be sure to sign them properly with your name 
and date:

http://bioperl.org/wiki/Deobfuscator

I think they'll have to discuss which features would be nice to implement
and which wouldn't, depending on the direction they want their project to
go. But don't worry, they're extremely nice people who are open to all
kinds of ideas. Best of all, the Deobfuscator is open source, so
everyone is invited to contribute to it; just ask them for the code :)

On my side, I'm tweaking the code so that it can browse the different
BioPerl packages (core, run, ext) and their respective releases (stable,
developer, cvs).

Regards,
Mauricio.

Chris Fields wrote:
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>> Sent: Monday, May 15, 2006 8:09 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Amir Karger wrote:
>>> This tool is quite nice, and may save me a lot of perldoc'ing.
>> Yes, many thanks to everyone involved.
> 
> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.
> 
> For modules added after v1.4 I use the script from the FAQ question mentioned
> on the Deobfuscator wiki page to get up-to-date methods, then grab the
> ActiveState HTML'd perldocs generated when installing using PPM (I make a
> custom PPM/PPD file and install it myself every once in a while):
> 
> #!/usr/bin/perl -w
> use Class::Inspector;
> # The class name comes from the command line, e.g. Bio::Seq
> $class = shift || die "Usage: methods perl_class_name\n";
> # Load the class so that its methods can be inspected
> eval "require $class";
> # Print every public method, inherited ones included, fully qualified
> print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}),
>      "\n";
> 
>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.   It only takes a click
>>> on Back (or a "new search" button) to get to a new search, and now you
>>> can use your whole screen for reading your results.
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>> different class pages at once, so being able to see an overview all on
>> one page in Deobfuscator is very nice.
>>
>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie. usage,
>> function, returns, args columns. I feel that opening a window for each
>> method you want to understand is far too slow.
> 
> Agreed.
> 
>> I'd also really like a link to the code for the method as well. The
>> bioperl docs are rarely complete enough that you can really understand
>> what every method is supposed to do without looking at the code.
> 
> The methods that pop up are in columns along with the class module that
> implements the method.  
> 
> 
> If you click on that link you get PDOC documentation for the module which
> includes most of the code (strangely, though Deobfuscator indexes bioperl
> 1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
> something a bit more detailed?
> 
>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
>>> to me that the search searches within class names rather than function
>>> names. What I really want to know sometimes is which module has, say,
>>> the revcom method in it.
> 
> That's listed in the method results table (the next column has the module
> with a link to the module's online docs).
> 
> 
> Chris
> 
> 
>> This would be a great feature to add.
>>
>>
>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Mon May 15 15:23:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 15:23:55 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine>
References: <000501c67852$e1bb55c0$15327e82@pyrimidine>
Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>

I wasn't using the search. It's in the scrollable table for browsing.  
-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ClarkeW at AGR.GC.CA  Mon May 15 15:40:15 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 15 May 2006 15:40:15 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>

Hey everyone, 

 
I have been developing some code to download and parse BLAST reports
from a remote server using SOAP::Lite and to insert the results into
a MySQL database. The problem is that my program seems to take up a
huge amount of RAM. For a single job of 10000 queries it can consume
as much as a couple hundred MB within an hour. I realize that a lot of
work is being done, but this seems like far too much, which leads me to
the subject of my post: I think I may have traced the source of the
memory leak to Bio::SearchIO. I have used Devel::Size to track the size
of my variables and tried other debugging steps, but have had no luck
resolving this very frustrating problem. My code is as follows:

 
    my $result = $connector->getQueryResult($query_id);

    # Read the report through an in-memory filehandle on the scalar
    open my $FH, "<", \$result
        or die "Cannot open in-memory filehandle: $!";

    my $searchio = Bio::SearchIO->new(-format => "blast",
                                      -fh     => $FH);

    while (my $o_blast = $searchio->next_result()) {
        my $clone_id = $o_blast->query_name();

        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);

 
This is just the leading and trailing code around the use of
Bio::SearchIO, since there is quite a lot of it. I am mostly wondering
whether anyone has had problems with SearchIO and its memory usage. I
looked at its source code, but I am afraid it is out of my league.
Any help, suggestions, or questions would be great. Thanks.
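
One way to narrow this down might be to keep each report, its filehandle,
and its parser inside a tight lexical scope and to close the handle
explicitly once the report has been read. The following is only a sketch
under assumptions: it uses core Perl alone, a made-up two-query report
string in place of the getQueryResult() output, and a hypothetical helper
query_names() that stands in for the Bio::SearchIO loop. It says nothing
about whether Bio::SearchIO itself leaks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: stands in for the Bio::SearchIO loop in the post.
# It reads a report held in a scalar through an in-memory filehandle and
# returns the query names found on "Query=" lines.
sub query_names {
    my ($report) = @_;
    my @names;

    # Open a read-only filehandle on the scalar (core Perl 5.8 and later).
    open my $fh, '<', \$report
        or die "Cannot open in-memory filehandle: $!";

    while (my $line = <$fh>) {
        # BLAST text reports introduce each query with "Query= name".
        push @names, $1 if $line =~ /^Query=\s*(\S+)/;
    }

    # Close explicitly rather than waiting for $fh to go out of scope.
    close $fh;
    return @names;
}

# Made-up stand-in for what $connector->getQueryResult($query_id) returns.
my $result = "BLASTN 2.2.13\n\nQuery= clone_A\n\nQuery= clone_B\n";

my @names = query_names($result);
print "$_\n" for @names;
```

Because $report, $fh, and @names are all lexical to the sub, nothing from
one report should survive into the next call unless the caller keeps a
reference to it, which makes it easier to see which object is growing.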


From dmessina at wustl.edu  Mon May 15 15:34:10 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 14:34:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>

Responding to:
>>> Amir Karger
>> Sendu Bala
>  Chris Fields


> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating  
> to the
> newer versions and maybe bioperl-live, as well as getting the other  
> bioperl
> packages up and running.

That's correct -- Mauricio is currently working on a version that  
will allow you to search 1.4, 1.5.1, or bioperl-live. The  
Deobfuscator indexes will be updated (daily?) to keep them in sync  
with the CVS repository.


>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a  
>>> class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be  
>>> better to
>>> have two pages, a search page and a result page.   It only takes  
>>> a click
>>> on Back (or a "new search" button) to get to a new search, and  
>>> now you
>>> can use your whole screen for reading your results.
>>
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now  
>> when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs  
>> open to
>> different class pages at once, so being able to see an overview  
>> all on
>> one page in Deobfuscator is very nice.

I think the current behavior makes sense as the default, but I like  
the idea of being able to view the search results in a separate  
window for easier browsing. Thanks for the suggestion; I'll add it to  
the list.


>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie.  
>> usage,
>> function, returns, args columns. I feel that opening a window for  
>> each
>> method you want to understand is far too slow.
>
> Agreed.

Yeah, the way it currently works is admittedly lame, and was done as  
a placeholder until we figured out a better way to do it. An in-place  
reveal sounds like a good solution.


>>> 2) Please sort the "select a class from the list" alphabetically. I
>>> guess I can enter a search term to get the right classes, but it  
>>> would
>>> be nice to be able to browse.

Agreed. I think we were doing this in an earlier test version, but I  
must have left it out of the release I handed off to Mauricio.


>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't  
>>> clear
>>> to me that the search searches within class names rather than  
>>> function
>>> names. What I really want to know sometimes is which module has,  
>>> say,
>>> the revcom method in it.
>>
>> This would be a great feature to add.

That's a great idea.


>>> 4) When I search for something that's not found, I get a screen that
>>> looks pretty familiar, with the extra text "No match to string  
>>> found"
>>> down at the bottom. It took me a while to even notice it.  
>>> (Studies show
>>> that most users don't read most of the text on a page.) Bold  
>>> might be
>>> nice here. Or put the error at the top of the screen. Or both.

Added to the list.


>>> 5) I'll save my stupidest comment for last - please make the page  
>>> title
>>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what  
>>> the
>>> bookmark stands for.

Added to the list. Not stupid, by the way -- much to my surprise,  
there are at least 2 or 3 other (obviously inferior :) )  
deobfuscators floating around out there.


>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are  
>> just a
>> little too cramped and things start to look messy/ run into each  
>> other.

Added to the list.


Thanks to all of you for taking the time to give such detailed  
feedback -- it's really helpful.

There is a wiki page on the BioPerl site for this project
(http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments
there for tracking and further discussion. Please feel free to add to it.


Dave


-- 
Dave Messina
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1825


From faruque at ebi.ac.uk  Mon May 15 15:47:27 2006
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Mon, 15 May 2006 20:47:27 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>

>> My personal view is that having it as an annotation would serve no  
>> real
>> purpose. For me the whole point of any kind of species  
>> representation in
>> bioperl is to allow you to compare species in a biologically  
>> meaningful
>> way. If it's just some annotation then that means it's basically

I understand the need to find the species name of entries, especially
now that so many complete genomes have been given their own
strain-specific tax nodes. I also think it is a shame that the NCBI tax
dump does not give a rank to entries such as these (they cannot easily
be distinguished from unofficial ranks higher in the tree without
ascending the tree).
Would it be useful for the species name to be included within EMBL
file headers, e.g. in a line called OB (OB is a terrible suggestion
based on 'Organism Binomial', since OS is already in use)?

For example, here are two entries for the species 'Apple stem grooving
virus'. Without delving into the tax tree or including an OB line, the
second one would appear to be a different species.

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
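
To make the OB idea concrete, here is a minimal sketch in plain Perl (no
BioPerl calls) of how a script could group entries by species if such a
line existed. The OB line code is only the hypothetical one proposed above
and is not part of the EMBL format; parse_headers and group_by_species are
names made up for this illustration, and the '//' record terminators were
added so the two fragments can be told apart.

```perl
use strict;
use warnings;

# Two header fragments in the style of the examples above. The OB line is
# the hypothetical 'Organism Binomial' line; it is not real EMBL syntax.
my $text = <<'END';
AC   D14995; S47260;
OS   Apple stem grooving virus
OB   Apple stem grooving virus
//
AC   AY646511;
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
//
END

# Collect the first value seen for each two-letter line code, per record.
sub parse_headers {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\n/, $text) {
        if ($line =~ m{^//}) {
            push @records, +{%current} if %current;
            %current = ();
        }
        elsif ($line =~ /^([A-Z]{2})\s{3}(.*\S)/) {
            $current{$1} = $2 unless exists $current{$1};
        }
    }
    push @records, +{%current} if %current;
    return @records;
}

# Group primary accessions by species, preferring OB and falling back to OS.
sub group_by_species {
    my (@records) = @_;
    my %by_species;
    for my $rec (@records) {
        my $species = exists $rec->{OB} ? $rec->{OB} : $rec->{OS};
        (my $acc = $rec->{AC}) =~ s/;.*//;    # keep the first accession only
        push @{ $by_species{$species} }, $acc;
    }
    return %by_species;
}

my %groups = group_by_species(parse_headers($text));
for my $species (sort keys %groups) {
    print "$species: @{ $groups{$species} }\n";
}
```

With the OB lines present both accessions land under one species; delete
the OB lines from the sample text and the fallback to OS puts them under
two different names, which is the problem described above.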


> My point is, a large number of users do NOT use, nor care about,  
> taxonomic
> information to the degree they need to know the entire  
> classification of the
> organism; many are just as happy about getting the scientific name  
> only,
> which is in the GenBank/EMBL file itself.  To take one extreme, it  
> is not
> productive to force every user to download the NCBI tax database  
> and use
> lookups just to convert sequences from EMBL format to GenBank  
> format.  It's
> not productive to allow users to spam the NCBI tax database  
> remotely either,
> so hardcoding lookups is, IMHO, a big mistake.

I don't think you need to add any information to turn an embl-format  
file into a Genbank flatfile, but maybe I'm missing something obvious.

Nadeem


--
Dr S.M. Nadeem N. Faruque
9 Barley Court
Saffron Walden
Essex  CB11 3HG
01799 500 120


From dmessina at wustl.edu  Mon May 15 16:12:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 15:12:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu>

On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote:

> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar

I'm seeing this too on OS X with Safari 2.0.3.

If you type 'goflat' (without the quotes) into the search box, you'll  
see the behavior. Chris, can you try it again this way just to  
confirm it's an OS/browser-specific thing?

Not sure what's going on, Hilmar -- I'll take a look.

Dave


From cjfields at uiuc.edu  Mon May 15 16:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 15:56:29 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>
Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine>

Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
X), and those links are broken like you said.  Could be something to do with
indexing.  

Using the methods script in the FAQ
(http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 2:24 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar
> 
> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> 
> > I'll have to give it a try on Mac OS X (we have an ancient G4 in
> > the lab
> > which I can try it on).  I'll let you know what I find.
> >
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox
> > on WinXP
> > and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
> > all the classes have links that work (I added newline and tab to
> > make it a
> > bit more readable) :
> >
> > Bio::OntologyIO
> > 	Parser factory for Ontology formats
> > Bio::OntologyIO::Handlers::BaseSAXHandler
> > 	no short description available
> > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> > 	no short description available
> > Bio::Ontology::OntologyI
> > 	Interface for an ontology implementation
> > Bio::Ontology::TermFactory
> > 	Instantiates a new Bio::Ontology::TermI (or derived class) through a
> > factory
> > Bio::Ontology::OntologyStore
> > 	A repository of ontologies
> > Bio::Ontology::RelationshipFactory
> > 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> > through a factory
> > Bio::Ontology::Ontology
> > 	standard implementation of an Ontology
> >
> > So the names seem fine here.
> >
> > When I click on a class (Bio::Ontology::Ontology) I get in the results
> > section:
> >
> > Method                 Class                           Returns         Usage
> > add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> > add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> > add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
> > ....and so on
> >
> > Where each method is clickable and opens a new page containing a
> > table:
> >
> > Bio::Ontology::Ontology::add_relationship
> > Usage	add_relationship(RelationshipI relationship): RelationshipI
> > Function	Adds a relationship object to the ontology engine.
> > Returns	Its argument.
> > Args	A RelationshipI object.
> >
> >
> > Each class is also linked to the bioperl-live PDOC.  Clicking on class
> > Bio::Ontology::Ontology in the results table gets me this page (no new
> > page):
> >
> > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Monday, May 15, 2006 1:09 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >> in the browsable list is already different (the prefix is missing),
> >> and the JavaScript link also lacks the prefix in the module name in
> >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >> the few Bio::Ontology exceptions that do work and do display
> >> correctly).
> >>
> >> I suppose there is something peculiar about the code formatting of
> >> those modules? Some of the modules under Bio::OntologyIO are also
> >> affected BTW.
> >>
> >> What happens is after you click on the link the page appears to
> >> reload (i.e., gets submitted) but the second table that is supposed
> >> to open underneath the first doesn't appear. However, the sort-by
> >> drop-down selector does appear.
> >>
> >> 	-hilmar
> >>
> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>
> >>> That's strange.  Clicking on the list gives me the results for that
> >>> module.
> >>> When I click on the hyperlinks in the results section they open
> >>> fine; the
> >>> method column links opens a new page containing usage-function-
> >>> returns-args
> >>> and the class column links opens pdoc (same page) for bioperl-
> >>> live.  I'm
> >>> using Firefox 1.5 on WinXP.
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>> To: Mauricio Herrera Cuadra
> >>>> Cc: bioperl-l
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Hey, thanks to Laura & David for this interface.
> >>>>
> >>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>> doesn't
> >>>> go anywhere either ... Anything different with those modules that I
> >>>> can fix?
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>
> >>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>> interface at
> >>>>> the BioPerl website. You can use it at the following URL:
> >>>>>
> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>
> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>> contribution to the BioPerl project!
> >>>>>
> >>>>> Mauricio.
> >>>>>
> >>>>> --
> >>>>> MAURICIO HERRERA CUADRA
> >>>>> arareko at campus.iztacala.unam.mx
> >>>>> Laboratorio de Genética
> >>>>> Unidad de Morfofisiología y Función
> >>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon May 15 17:29:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 16:29:14 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>
Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque
> Sent: Monday, May 15, 2006 2:47 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> >> My personal view is that having it as an annotation would serve no
> >> real
> >> purpose. For me the whole point of any kind of species
> >> representation in
> >> bioperl is to allow you to compare species in a biologically
> >> meaningful
> >> way. If it's just some annotation then that means it's basically
> 
> I understand the need to find the species name of entries, especially
> now that so many complete genomes have been given their own strain-
> specific tax nodes, and I also think it is a shame that the ncbi tax
> dump does not give a rank to entries such as these (they cannot
> easily be distinguished from unofficial ranks higher in the tree
> without ascending the tree).
> Would it be useful for the species name to be included within EMBL
> file headers, eg in a line called OB (OB is a terrible suggestion
> based on 'Organism Binomial' since OS is already in use)?
> 
> eg two examples of the species 'Apple stem grooving virus', where the
> second one would appear to be a different species without delving
> into the tax tree or the inclusion of an OB line.
> 
> AC   D14995; S47260;
> DE   Apple stem grooving virus genome, complete sequence.
> OS   Apple stem grooving virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.
> 
> AC   AY646511;
> DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
> OS   Citrus tatter leaf virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.

Jason also mentions a few examples (see below).  The problem lies in the
fact that EMBL and GenBank flatfiles do not give hierarchy ranking for
taxonomy, so it's a best guess.  What I'm seeing is that the guess is wrong
more often than not when it comes to complex scientific names (viruses,
bacteria, etc).  Notice the doubling of the strain in the following GenBank
files passed through SeqIO (genbank->genbank conversion, BTW; haven't tried
EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales;
            Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium; tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species?  What happens
when there is a two part strain name (space separated) and a sub-species or
variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
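
The examples above can be run through a deliberately naive positional
split to see why word order alone is not enough. This is only an
illustration in plain Perl; it is not the parsing code used by SeqIO or
Bio::Species, and naive_split is a made-up name.

```perl
use strict;
use warnings;

# Naive positional split: first word is the genus, second is the species,
# and whatever is left over is an unclassified remainder.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, @rest) = split ' ', $organism;
    return ($genus, $species, join(' ', @rest));
}

for my $name (
    'Staphylococcus haemolyticus JCSC1435',
    'Muntiacus muntjak vaginalis',
    'Aspergillus nidulans FGSC A4',
    'Cryptococcus neoformans var. grubii H99',
) {
    my ($genus, $species, $rest) = naive_split($name);
    print "genus=$genus species=$species remainder='$rest'\n";
}
```

The remainder comes out as 'JCSC1435' (a strain), 'vaginalis' (a
sub-species), 'FGSC A4' (a two-word strain) and 'var. grubii H99' (a
variety plus a strain). Nothing in the string itself says which is which,
so any rule based on position has to guess.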


> > My point is, a large number of users do NOT use, nor care about,
> > taxonomic
> > information to the degree they need to know the entire
> > classification of the
> > organism; many are just as happy about getting the scientific name
> > only,
> > which is in the GenBank/EMBL file itself.  To take one extreme, it
> > is not
> > productive to force every user to download the NCBI tax database
> > and use
> > lookups just to convert sequences from EMBL format to GenBank
> > format.  It's
> > not productive to allow users to spam the NCBI tax database
> > remotely either,
> > so hardcoding lookups is, IMHO, a big mistake.
> 
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines (OS/OC lines in EMBL, I
believe) are handled, which is through a Bio::Species object.  The problem
is, like I mentioned above, that the flat file carries no hierarchical rank
for each name, just the order of the names.  We can try to make a best guess
based on that order, but it's obviously very tricky, particularly when
dealing with subspecies, strains, etc.

NCBI also states that the classification is often too long to fit in a
file, so it may be incomplete (I think they leave out nodes which have 'no
rank' tags, but I can't be completely sure), so there's another issue.

Anyway, this is where the lookup would come in, which would require a local
taxonomy  database (we can't spam the NCBI remote database, that would just
be rude) which would give the complete taxonomic classification if it worked
properly.  

So now we have three possible situations:

1) One extreme : We require a lookup to get it right (which, BTW, it
currently doesn't); this by default requires a local database.  
2) Middle of the road : we try and guess the information as best as we can
with the information given (the current situation); this is breaking more
and more often now, so is becoming more unreliable.
3) Other extreme : we punt and absolve ourselves of even trying to parse the
data and just have a strict tagname->value or similar simple construct to
handle the data.

#3 as default with option to do #1 is probably best (least error prone with
option for most information), with caching to speed up lookups as Sendu Bala
does now.
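
A rough sketch of what "#3 by default, #1 on request, with caching" could
look like, in plain Perl. None of this is existing BioPerl API:
organism_info, the lookup callback and the %local_db stand-in for a local
taxonomy database are all assumptions made for the illustration. The one
lineage in %local_db is copied from the Staphylococcus example earlier in
this message.

```perl
use strict;
use warnings;

# Cache of lineages already fetched, keyed by organism name.
my %lineage_cache;

# Option #3: return the flat-file values untouched as tag => value pairs.
# Option #1: only when a lookup callback is supplied, add the full lineage,
# asking the callback at most once per organism name.
sub organism_info {
    my ($tags, $lookup) = @_;          # $lookup is optional
    my %info = %$tags;
    if ($lookup) {
        my $name = $info{ORGANISM};
        $lineage_cache{$name} = $lookup->($name)
            unless exists $lineage_cache{$name};
        $info{lineage} = $lineage_cache{$name};
    }
    return \%info;
}

# Stand-in for a local taxonomy database (hypothetical).
my %local_db = (
    'Staphylococcus haemolyticus JCSC1435' =>
        [qw(Bacteria Firmicutes Bacillales Staphylococcus)],
);
my $calls  = 0;
my $lookup = sub { my ($name) = @_; $calls++; return $local_db{$name} };

my $tags  = { ORGANISM => 'Staphylococcus haemolyticus JCSC1435' };
my $plain = organism_info($tags);             # no lookup: tags only
my $full  = organism_info($tags, $lookup);    # first call hits the database
my $again = organism_info($tags, $lookup);    # second call uses the cache

print "lookups performed: $calls\n";
print join('; ', @{ $full->{lineage} }), "\n";
```

Users who only want the scientific name never touch the database, and
those who ask for the lineage pay for each lookup once.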

Chris

 
> Nadeem
> 
> 
> --
> Dr S.M. Nadeem N. Faruque
> 9 Barley Court
> Saffron Walden
> Essex  CB11 3HG
> 01799 500 120
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at gmx.net  Mon May 15 17:37:56 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 17:37:56 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine>
References: <000a01c67862$0a00cab0$15327e82@pyrimidine>
Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>

It does have the following line though (and a 'use' statement for  
OntologyIO);

@ISA = qw( Bio::OntologyIO );

So what is it doing 'wrong' (there aren't any tests or so in which  
anything erroneous would show)?

	-hilmar

On May 15, 2006, at 4:56 PM, Chris Fields wrote:

> Okay, I see what you mean.  Using the search term "Bio::Ont*" also  
> explains
> why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and  
> Mac OS
> X), and those links are broken like you said.  Could be something  
> to do with
> indexing.
>
> Using the methods script in the FAQ
> (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> I get this:
>
> C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> Bio::OntologyIO::simplehierarchy::Dumper
> Bio::OntologyIO::simplehierarchy::basename
> Bio::OntologyIO::simplehierarchy::dirname
> Bio::OntologyIO::simplehierarchy::fileparse
> Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
>
> Chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 18:03:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 17:03:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat


C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat


I get nada.  It could be related to the way the methods are parsed using
Class::Inspector :

print join ("\n", sort
@{Class::Inspector->methods($class,'full','public')}), "\n";
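
For comparison, here is a core-only sketch of the same kind of listing
that walks @ISA and the symbol tables directly. It is not Class::Inspector
and not the FAQ script; public_methods and the two My::* packages are
made up for the example, and it assumes Perl 5.10 or later for the mro
module.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# List every sub reachable from $class as Package::name, taking the
# nearest definition in method resolution order and skipping names that
# start with an underscore.
sub public_methods {
    my ($class) = @_;
    my (%seen, @found);
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        my $stash = \%{"${pkg}::"};
        for my $name (sort keys %$stash) {
            next if $name =~ /^_/ or $seen{$name};
            next unless defined &{"${pkg}::${name}"};
            $seen{$name} = 1;
            push @found, "${pkg}::${name}";
        }
    }
    return sort @found;
}

# Hypothetical classes standing in for a parser factory and one format.
package My::Parent;
sub new           { return bless {}, shift }
sub next_ontology { return }

package My::Child;
our @ISA = ('My::Parent');
sub parse { return }

package main;
print "$_\n" for public_methods('My::Child');
print "unloaded class gives ", scalar(my @none = public_methods('No::Such::Class')), " methods\n";
```

In this sketch a package that was never loaded simply yields an empty
list, so empty output on its own does not say whether the class has no
methods or whether it failed to load. Checking that the require actually
succeeded before listing would separate the two cases.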

I haven't tried it on all the weird Bio::Ontology-missing modules (don't
have time today).  It's not common to all of those modules though:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser
Bio::OntologyIO::DESTROY
Bio::OntologyIO::InterProParser::next_ontology
Bio::OntologyIO::InterProParser::parse
Bio::OntologyIO::InterProParser::secondary_accessions_map
Bio::OntologyIO::new
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented


Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 4:38 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> It does have the following line though (and a 'use' statement for
> OntologyIO);
> 
> @ISA = qw( Bio::OntologyIO );
> 
> So what is it doing 'wrong' (there aren't any tests or so in which
> anything erroneous would show)?
> 
> 	-hilmar
> 
> On May 15, 2006, at 4:56 PM, Chris Fields wrote:
> 
> > Okay, I see what you mean.  Using the search term "Bio::Ont*" also
> > explains
> > why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and
> > Mac OS
> > X), and those links are broken like you said.  Could be something
> > to do with
> > indexing.
> >
> > Using the methods script in the FAQ
> > (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> > I get this:
> >
> > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> > Bio::OntologyIO::simplehierarchy::Dumper
> > Bio::OntologyIO::simplehierarchy::basename
> > Bio::OntologyIO::simplehierarchy::dirname
> > Bio::OntologyIO::simplehierarchy::fileparse
> > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >> Sent: Monday, May 15, 2006 2:24 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> I wasn't using the search. It's in the scrollable table for browsing.
> >> -hilmar
> >>
> >> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> >>
> >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in
> >>> the lab
> >>> which I can try it on).  I'll let you know what I find.
> >>>
> >>> This is what I get when I do a search for 'Bio::Ont*' using Firefox
> >>> on WinXP
> >>> and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
> >>> all the classes have links that work (I added newline and tab to
> >>> make it a
> >>> bit more readable) :
> >>>
> >>> Bio::OntologyIO
> >>> 	Parser factory for Ontology formats
> >>> Bio::OntologyIO::Handlers::BaseSAXHandler
> >>> 	no short description available
> >>> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> >>> 	no short description available
> >>> Bio::Ontology::OntologyI
> >>> 	Interface for an ontology implementation
> >>> Bio::Ontology::TermFactory
> >>> 	Instantiates a new Bio::Ontology::TermI (or derived class)
> >>> through a
> >>> factory
> >>> Bio::Ontology::OntologyStore
> >>> 	A repository of ontologies
> >>> Bio::Ontology::RelationshipFactory
> >>> 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> >>> through a factory
> >>> Bio::Ontology::Ontology
> >>> 	standard implementation of an Ontology
> >>>
> >>> So the names seem fine here.
> >>>
> >>> When I click on a class (Bio::Ontology::Ontology) I get in the results
> >>> section:
> >>>
> >>> Method                  Class                            Returns          Usage
> >>> add_relationship        Bio::Ontology::Ontology          Its argument.    add_relationship(RelationshipI relationship): RelationshipI
> >>> add_relationship_type   Bio::Ontology::OntologyEngineI   not documented   not documented
> >>> add_term                Bio::Ontology::Ontology          its argument.    add_term(TermI term): TermI
> >>>
> >>> ....and so on
> >>>
> >>> Where each method is clickable and opens a new page containing a
> >>> table:
> >>>
> >>> Bio::Ontology::Ontology::add_relationship
> >>> Usage	add_relationship(RelationshipI relationship): RelationshipI
> >>> Function	Adds a relationship object to the ontology engine.
> >>> Returns	Its argument.
> >>> Args	A RelationshipI object.
> >>>
> >>>
> >>> Each class is also linked to the bioperl-live PDOC.  Clicking on class
> >>> Bio::Ontology::Ontology in the results table gets me this page (no new
> >>> page):
> >>>
> >>> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >>>
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >>>> Sent: Monday, May 15, 2006 1:09 PM
> >>>> To: Chris Fields
> >>>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >>>> in the browsable list is already different (the prefix is missing),
> >>>> and the JavaScript link also lacks the prefix in the module name in
> >>>> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >>>> the few Bio::Ontology exceptions that do work and do display
> >>>> correctly).
> >>>>
> >>>> I suppose there is something peculiar about the code formatting of
> >>>> those modules? Some of the modules under Bio::OntologyIO are also
> >>>> affected BTW.
> >>>>
> >>>> What happens is that after you click on the link the page appears to
> >>>> reload (i.e., gets submitted) but the second table that is supposed to
> >>>> open underneath the first doesn't appear. However, the sort-by drop-down
> >>>> selector does appear.
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>>>
> >>>>> That's strange.  Clicking on the list gives me the results for that
> >>>>> module.  When I click on the hyperlinks in the results section they open
> >>>>> fine; the method column links open a new page containing
> >>>>> usage-function-returns-args and the class column links open pdoc (same
> >>>>> page) for bioperl-live.  I'm using Firefox 1.5 on WinXP.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>>>> To: Mauricio Herrera Cuadra
> >>>>>> Cc: bioperl-l
> >>>>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>>>
> >>>>>> Hey, thanks to Laura & David for this interface.
> >>>>>>
> >>>>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>>>> doesn't
> >>>>>> go anywhere either ... Anything different with those modules
> >>>>>> that I
> >>>>>> can fix?
> >>>>>>
> >>>>>> 	-hilmar
> >>>>>>
> >>>>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>>>
> >>>>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>>>> interface at
> >>>>>>> the BioPerl website. You can use it at the following URL:
> >>>>>>>
> >>>>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>>>
> >>>>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>>>> contribution to the BioPerl project!
> >>>>>>>
> >>>>>>> Mauricio.
> >>>>>>>
> >>>>>>> --
> >>>>>>> MAURICIO HERRERA CUADRA
> >>>>>>> arareko at campus.iztacala.unam.mx
> >>>>>>> Laboratorio de Genética
> >>>>>>> Unidad de Morfofisiología y Función
> >>>>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> ===========================================================
> >>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>>>> ===========================================================


From cjfields at uiuc.edu  Mon May 15 20:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <a7d26051.b90db78f.81ac600@expms6.cites.uiuc.edu>

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne" <ClarkeW at agr.gc.ca>  
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO  
>To: <bioperl-l at lists.open-bio.org>
>
>Hey everyone,
>
>I have been developing some code to download and parse blast reports
>from a remote server using SOAP::Lite as well as insert the results into
>a MySQL database. The problem I am having is that my program seems to be
>taking up a huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred Mb inside an hour.

If you're parsing 10000 queries (10000 different BLAST reports, right?) then it's 
not necessarily a memory leak as much as it is object creation.  Each report 
generates hit objects which in turn generate hsp objects.  I think Jason 
recommends using the tabular output option (-m8 or -m9) for huge reports as 
it cuts down considerably on this.  If you are cycling through each report it 
shouldn't be as much of a problem unless your BLAST reports are really huge.  
Have you tried parsing a single report to see if the problem persists?
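
As a rough illustration of why tabular output is lighter: each hit in -m8 output is one tab-separated line, so it can be read with plain string operations and without building hit or HSP objects.  The sketch below assumes the usual 12-column -m8 layout (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score).  The function name top_hits_from_tabular and the sample lines are made up for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): read -m8 style lines from a
# filehandle and keep at most $max_hits hits per query, in file order.
sub top_hits_from_tabular {
    my ($fh, $max_hits) = @_;
    my %hits;    # query id => array ref of small hit records
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip -m9 comment lines and blanks
        my @f = split /\t/, $line;
        next unless @f >= 12;                     # ignore malformed lines
        my ($query, $subject, $identity, $evalue, $bits) = @f[0, 1, 2, 10, 11];
        my $list = $hits{$query} ||= [];
        next if @$list >= $max_hits;              # already have enough for this query
        push @$list, { subject  => $subject,
                       identity => $identity,
                       evalue   => $evalue,
                       bits     => $bits };
    }
    return \%hits;
}

# Made-up sample data, read through an in-memory filehandle.
my $report = join "\n",
    join("\t", qw(q1 s1 98.5 200 3 0 1 200 11 210 1e-100 380)),
    join("\t", qw(q1 s2 90.0 180 18 0 5 184 1 180 2e-60 250)),
    join("\t", qw(q2 s9 75.0 120 30 1 1 120 40 159 3e-10 60)),
    '';
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $hits = top_hits_from_tabular($fh, 1);
close $fh;

printf "%s best hit: %s (e-value %s)\n",
    $_, $hits->{$_}[0]{subject}, $hits->{$_}[0]{evalue}
    for sort keys %$hits;
```

Only one small hash per retained hit stays in memory, so the footprint depends on the number of hits kept rather than on the size of the report.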

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely run 
into a problem with an infinite loop that occurs due to a change in NCBI's text 
output.  You can try updating bioperl from CVS in either case to see if that helps 
any.  Tabular output and XML output, AFAIK, are the same regardless of version; 
this bug only affected text output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO. I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
>my $result = $connector->getQueryResult($query_id);
>my $FH;
>open $FH, "<", \$result;
>my $searchio = new Bio::SearchIO(-format => "blast",
>                                 -fh     => $FH);
>while (my $o_blast = $searchio->next_result()) {
>        my $clone_id  = $o_blast->query_name();
>        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
>
>this is just the leading and tailing code surrounding the use of
>Bio::SearchIO since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks


From torsten.seemann at infotech.monash.edu.au  Mon May 15 20:18:44 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 16 May 2006 10:18:44 +1000
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
Message-ID: <44691A64.8040607@infotech.monash.edu.au>

> taking up a huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

>  my $result = $connector->getQueryResult($query_id);
>                 my $searchio = new Bio::SearchIO(-format => "blast",
>                 while (my $o_blast = $searchio->next_result()) {
>                         my $clone_id = $o_blast->query_name();
>                         my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is 
causing the problem? ie. is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return 
freed memory back to the OS? This may affect how you were measuring 
"memory usage".

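One way to test the first point is to keep a weakened reference to each result and see whether it becomes undefined once the loop variable goes out of scope.  Below is a minimal sketch using the core Scalar::Util module.  The LeakySink and CleanSink packages are hypothetical stand-ins for whatever class $bdbi belongs to, and is_released is a made-up helper name.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-ins for the class behind $bdbi.
package LeakySink;
sub new { return bless { seen => [] }, shift }
sub form_push_SQL {
    my ($self, $obj) = @_;
    push @{ $self->{seen} }, $obj;    # keeps a strong reference: the "leak"
    return 'INSERT ...';
}

package CleanSink;
sub new           { return bless {}, shift }
sub form_push_SQL { return 'INSERT ...' }    # uses the object, stores nothing

package main;

# Returns true if nothing still holds the result after the call.
sub is_released {
    my ($sink) = @_;
    my $probe;
    {
        my $result = { query_name => 'clone_1' };   # stands in for $o_blast
        $probe = $result;
        weaken($probe);                 # a weak reference does not keep $result alive
        $sink->form_push_SQL($result, 'clone_1', 5);
    }
    return !defined $probe;             # undefined means the object was freed
}

print "LeakySink released result: ", (is_released(LeakySink->new) ? "yes" : "no"), "\n";
print "CleanSink released result: ", (is_released(CleanSink->new) ? "yes" : "no"), "\n";
```

If the probe stays defined with the real $bdbi, something is retaining the result objects between iterations.
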
-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From kmdaily at indiana.edu  Mon May 15 17:00:12 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Mon, 15 May 2006 17:00:12 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu>

I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


From letondal at pasteur.fr  Tue May 16 02:06:19 2006
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 May 2006 08:06:19 +0200
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
	<C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr>


On May 15, 2006, at 9:34 PM, David Messina wrote:

>>>> A couple of minor interface thoughts.
>>>>
>>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>>> think I'll often want to browse through what's available in a
>>>> class. But
>>>> 60% or so of the screen real estate is used for "Enter a search
>>>> string... OR select a class from the list". IMO, it would be
>>>> better to
>>>> have two pages, a search page and a result page.   It only takes
>>>> a click
>>>> on Back (or a "new search" button) to get to a new search, and
>>>> now you
>>>> can use your whole screen for reading your results.
>>>
>>> As the compromise it must be, I like the way it behaves. I don't like
>>> lots of windows. I especially don't like pop up windows. Right now
>>> when
>>> I'm using the bioperl docs I tend to have a whole bunch of tabs
>>> open to
>>> different class pages at once, so being able to see an overview
>>> all on
>>> one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite ajaxian ones) that reach a compromise between 
several windows for easily browsing large results, and composing 
everything in one window to get an overview - the 2 examples that come 
in my mind currently are (not biology related):
	- http://montreal.mspace.fm/chi/sched/
	- http://www.live.com/
		(see the slider on the top right enabling to squeeze or enlarge the 
results area)


--
Catherine Letondal -- Institut Pasteur


From cjfields at uiuc.edu  Tue May 16 07:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS.  I believe Brian added Entrezgene.pm after the last 
developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu


From bernd.web at gmail.com  Tue May 16 07:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour wrt the pod.
Do these look familiar ?

Some example code:
my $query = Bio::DB::Query::GenBank->new
       (-query   =>'Lassa Virus[ORGN]',
        -reldate => '30',
        -db      => 'protein',
        -ids => [195052,2981014,11127914],
        -maxids => 30 );

my $gb = Bio::DB::GenBank->new(-format => 'fasta');
my $seqio = $gb->get_Stream_by_query($query);
while (my $seq = $seqio->next_seq) {
       print $seq->desc,"\n"; }

The module states that if we provide -ids that:
       If you provide an array reference of IDs in -ids, the query will be
       ignored and the list of IDs will be used when the query is passed to a
       Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case actually the query is passed ('Lassa Virus[ORGN]'),
not the IDs. Also $query->query shows the original query. Am I doing
something wrong or is the pod not reflecting current behaviour of this
module?

I was also surprised that if internet is down no warning is thrown for
$query->query or $query->count at all. Only the get_Stream_by_query
above will warn us if the site is unreachable (500 Internal Server
Error).

$query->ids or $query->count will not throw a warning and
@ids=$query->ids will just be an empty array. (I realize $query->count
is not initialized, so I am using this now to check for success, but a
warning from WebDBSeqI would be more appropriate I think).

Last, the example from the pod is not working, but no warnings are raised:
          # initialize the list yourself
          my $query =
Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);

$query->count returns zero w/o any warning. Of course this query did
not specify a DB. Only if we specify -db=>'nucleotide' $query->count
is 3.
However, why is there no warning if we set -db=>'protein' or if we did not set this?

On the NCBI website, searching the Protein DB for 195052 returns:
      See Details. No items found.
      The following term(s) refer to a different DB:195052

But this is not reflected via Bio::DB::Query::GenBank.

Can I check for this situation in the code apart from checking on
$query->count == 0 ? Or would it indeed be better to check for these
situations in the module?
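
Until the module reports these cases itself, a caller-side guard might look like the sketch below.  It relies only on the count and ids methods discussed above.  check_query is a hypothetical helper, and FakeQuery is a stand-in object so the sketch runs without network access or BioPerl installed.  Whether an unreachable server shows up as an exception or as an undefined count is an assumption here, so both are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the outcome of any query object that offers
# count() and ids(), as Bio::DB::Query::GenBank does.
sub check_query {
    my ($query) = @_;
    my $count = eval { $query->count };
    return ('error', "query failed: $@") if $@;
    return ('error', 'count is undefined (server unreachable?)')
        unless defined $count;
    return ('empty', 'no IDs matched in this database') if $count == 0;
    my @ids = $query->ids;
    return ('ok', scalar(@ids) . ' of ' . $count . ' IDs retrieved');
}

# Stand-in for a real query object, for illustration only.
package FakeQuery;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub count { return $_[0]{count} }
sub ids   { return @{ $_[0]{ids} || [] } }

package main;

for my $case (
    FakeQuery->new(count => 3, ids => [195052, 2981014, 11127914]),
    FakeQuery->new(count => 0),    # e.g. IDs that belong to another database
    FakeQuery->new(),              # count never initialized
) {
    my ($status, $message) = check_query($case);
    print "$status: $message\n";
}
```

This only separates "nothing found" from "nothing returned"; it cannot say which database the IDs really belong to.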

Regards,
Bernd


From chen_li3 at yahoo.com  Tue May 16 10:55:51 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 07:55:51 -0700 (PDT)
Subject: [Bioperl-l] module for 6 reading frames
Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I wonder which module is available for translating DNA
sequence into 6 reading frames.

Thank you,

Li



From smarkel at scitegic.com  Tue May 16 11:10:35 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 08:10:35 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <OF41BF3DF8.D7365B03-ON88257170.00534209-88257170.00535904@scitegic.com>

Li,

Use the translate() function in Bio::Tools::CodonTable.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:

> Hi all,
> 
> I wonder which module is available for translating DNA
> sequence into 6 reading frames.
> 
> Thank you,
> 
> Li


From golharam at umdnj.edu  Tue May 16 12:18:19 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:18:19 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>

I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From golharam at umdnj.edu  Tue May 16 12:24:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:24:03 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1>

Never mind.  I see its in CPAN.

-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu] 
Sent: Tuesday, May 16, 2006 12:18 PM
To: 'bioperl-l at bioperl.org'
Subject: Where is Bio::ASN1::EntrezGene?


I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From cjfields at uiuc.edu  Tue May 16 13:27:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 12:27:32 -0500
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, May 16, 2006 11:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
> 
> I just updated my local copy of bioperl from cvs.  When I ran the
> configure script, it says I need the external module
> Bio::ASN1::EntrezGene.  Which package contains this module?
> 
> --
> Ryan Golhar  -  golharam at umdnj.edu
> The Informatics Institute of UMDNJ


From ClarkeW at AGR.GC.CA  Tue May 16 16:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>


Thank you for the suggestions and comments. However, I think
I should clear a few things up. I am running bioperl v1.4, I am cycling
through the blast reports, which should not be of absurd size since they
only contain the top 5 hits, and I am using top to track (although I
realize fairly inaccurately) the memory usage. I have looked through the
code for both AAFCBLAST and BEAST_UPDATE but do not believe the
leak/problem to be contained within them since they are almost
exclusively using method calls and those variables should be destroyed
upon leaving the scope of the method. I have used Devel::Size to check
the size of the variables $bdbi and $searchio and $connector and on each
iteration these variables have the same size. Any other suggestions
would be greatly appreciated as I have nearly gone insane trying to
track this problem down.

Thanks, Wayne 


-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] 
Sent: Monday, May 15, 2006 6:19 PM
To: Clarke, Wayne
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO

> taking up a huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

>  my $result = $connector->getQueryResult($query_id);
>                 my $searchio = new Bio::SearchIO(-format => "blast",
>                 while (my $o_blast = $searchio->next_result()) {
>                         my $clone_id = $o_blast->query_name();
>                         my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is 
causing the problem? ie. is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return 
freed memory back to the OS? This may affect how you were measuring 
"memory usage".

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From smarkel at scitegic.com  Tue May 16 16:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>

Li,

You can either do the substring, and reverse complement, yourself
or you can use the translate() function in Bio::PrimarySeq.  It
inherits from Bio::PrimarySeqI, so check there for the documentation.
That translate() function takes a "-frame" argument.
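
For the do-it-yourself route (substring plus reverse complement), a minimal sketch in plain Perl follows.  It assumes the standard genetic code, drops an incomplete trailing codon, and writes X for any codon containing a character other than A, C, G or T.  The names translate_frame, revcom and six_frames are made up for this sketch; with BioPerl installed, the translate() method and its frame argument mentioned above is the more direct option.

```perl
use strict;
use warnings;

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
my $AA  = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %IDX = (T => 0, C => 1, A => 2, G => 3);

# Translate $dna starting at 0-based $offset (0, 1 or 2).
sub translate_frame {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $i = $offset; $i + 3 <= length $dna; $i += 3) {
        my @b = split //, uc substr($dna, $i, 3);
        if (grep { !exists $IDX{$_} } @b) {    # ambiguous or unknown base
            $protein .= 'X';
            next;
        }
        $protein .= substr($AA, 16 * $IDX{$b[0]} + 4 * $IDX{$b[1]} + $IDX{$b[2]}, 1);
    }
    return $protein;
}

# Reverse complement of an unambiguous DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse uc $dna;    # scalar assignment, so this reverses the string
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Returns a hash ref keyed by frame label: +1, +2, +3, -1, -2, -3.
sub six_frames {
    my ($dna) = @_;
    my $rc = revcom($dna);
    my %frames;
    for my $f (0 .. 2) {
        $frames{ '+' . ($f + 1) } = translate_frame($dna, $f);
        $frames{ '-' . ($f + 1) } = translate_frame($rc,  $f);
    }
    return \%frames;
}

my $frames = six_frames('ATGGCCTAA');
print "$_\t$frames->{$_}\n" for sort keys %$frames;
```

Stop codons are reported as * rather than ending the translation, so each frame is translated along its full length.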

Scott

PS In future, please respond to the list.  That way others see
the questions and answers.

chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:

> Dear Dr. Markel,
> 
>     I browsed through the documentation of
> Bio::Tools::CodonTable and found this line:
> 
> my $translation= $CodonTable->translate($seq);
> 
> I think this line is to do the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2,3, and -1, -2, -3? 
> 
> 
> Thank you,
> 
> Li
> 
> --- smarkel at scitegic.com wrote:
> 
> > Li,
> > 
> > Use the translate() function in Bio::Tools::CodonTable.
> > 
> > Scott
> > 
> > Scott Markel, Ph.D.
> > Principal Bioinformatics Architect  email:  smarkel at scitegic.com
> > SciTegic Inc.                       mobile: +1 858 205 3653
> > 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> > San Diego, CA 92121                 fax:    +1 858 279 8804
> > USA                                 web:    http://www.scitegic.com
> > 
> > 
> > bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:
> > 
> > > Hi all,
> > > 
> > > I wonder which module is available for translating DNA
> > > sequence into 6 reading frames.
> > > 
> > > Thank you,
> > > 
> > > Li
> > 


From cjfields at uiuc.edu  Tue May 16 17:15:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:15:10 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>
Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine>

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which
SearchIO is broken.  You seem to fit #2. 

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you have
the same problem (a CPU spike and increasing memory usage) then it may be
fixed in CVS.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 3:57 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> With regards to the suggestions/comments made thank you. However I think
> I should clear a few things up. I am running bioperl v1.4, I am cycling
> through the blast reports which should not be of absurd size since they
> only contain the top 5 hits, and I am using top to track(although I
> realize fairly inaccurately) the memory usage. I have looked through the
> code for both AAFCBLAST and BEAST_UPDATE but do not believe the
> leak/problem to be contained within them since they are almost
> exclusively using method calls and those variables should be destroyed
> upon leaving the scope of the method. I have used Devel::Size to check
> the size of the variables $bdbi and $searchio and $connector and on each
> iteration these variables have the same size. Any other suggestions
> would be greatly appreciated as I have nearly gone insane trying to
> track this problem down.
> 
> Thanks, Wayne
> 
> 
> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Monday, May 15, 2006 6:19 PM
> To: Clarke, Wayne
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> > taking up a huge amount of RAM. For a single job of 10000 queries it
> > can consume as much as a couple hundred Mb inside an hour. I realize
> 
> >  my $result = $connector->getQueryResult($query_id);
> >                 my $searchio = new Bio::SearchIO(-format => "blast",
> >                 while (my $o_blast = $searchio->next_result()) {
> >                         my $clone_id = $o_blast->query_name();
> >                         my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); }
> 
> Some comments:
> 
> Have you considered that whatever class/module $bdbi belongs to is
> causing the problem? ie. is it keeping a reference to $o_blast around?
> 
> Are you aware that Perl garbage collection does not necessarily return
> freed memory back to the OS? This may affect how you were measuring
> "memory usage".
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia


From ClarkeW at AGR.GC.CA  Tue May 16 17:24:51 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 17:24:51 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>


Thanks Chris, 

I did forget to mention however that I did parse one single report and
found no problems, it finished fast and with no noticeable memory usage.
I will consider getting my SA to update bioperl from CVS as a precaution
but he has already stated he prefers to wait for the release of v1.5.
Even a single job of 10000 will finish but the problem is that I am
trying to loop through many jobs of 10000 and it seems to be additive
for reasons I can not determine. During testing I noticed that the RSS
on top decreased around 80% MEM usage, but then the shared mem
increased. I am wondering if this is due to the perl garbage collector
freeing up memory but keeping it in its pool for use; if so, that is fine
as long as it does not then want to reach into swapped mem.

Thanks again, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Tuesday, May 16, 2006 3:15 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Memory Leak in Bio::SearchIO

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which
SearchIO is broken.  You seem to fit #2. 

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you have
the same problem (a CPU spike and increasing memory usage) then it may be
fixed in CVS.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 3:57 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> With regards to the suggestions/comments made thank you. However I think
> I should clear a few things up. I am running bioperl v1.4, I am cycling
> through the blast reports which should not be of absurd size since they
> only contain the top 5 hits, and I am using top to track (although I
> realize fairly inaccurately) the memory usage. I have looked through the
> code for both AAFCBLAST and BEAST_UPDATE but do not believe the
> leak/problem to be contained within them since they are almost
> exclusively using method calls and those variables should be destroyed
> upon leaving the scope of the method. I have used Devel::Size to check
> the size of the variables $bdbi and $searchio and $connector and on each
> iteration these variables have the same size. Any other suggestions
> would be greatly appreciated as I have nearly gone insane trying to
> track this problem down.
> 
> Thanks, Wayne
> 
> 
> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Monday, May 15, 2006 6:19 PM
> To: Clarke, Wayne
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> > taking up a huge amount of RAM. For a single job of 10000 queries it
> > can consume as much as a couple hundred Mb inside an hour. I realize
> 
> >  my $result = $connector->getQueryResult($query_id);
> >                 my $searchio = new Bio::SearchIO(-format => "blast",
> >                 while (my $o_blast = $searchio->next_result()) {
> >                         my $clone_id = $o_blast->query_name();
> >                         my $statement = $bdbi->form_push_SQL
> ($o_blast, $clone_id, 5); }
> 
> Some comments:
> 
> Have you considered that whatever class/module $bdbi belongs to is
> causing the problem? ie. is it keeping a reference to $o_blast around?
> 
> Are you aware that Perl garbage collection does not necessarily return
> freed memory back to the OS? This may affect how you were measuring
> "memory usage".
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Tue May 16 17:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> Thanks Chris,
> 
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October).  It's
considered a developer release but is pretty stable; well, except for that
whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I can not determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased. I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use, if so that is fine
> as long as it does not then want to reach into swapped mem.
> 
> Thanks again, Wayne
> ...


From cjfields at uiuc.edu  Tue May 16 18:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
> 
> Hi all,
> 
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar ?
> 
> Some example code:
> my $query = Bio::DB::Query::GenBank->new
>        (-query   =>'Lassa Virus[ORGN]',
>         -reldate => '30',
>         -db      => 'protein',
>         -ids => [195052,2981014,11127914],
>         -maxids => 30 );
> 
> $gb = new Bio::DB::GenBank(format=>'fasta');
> my $seqio = $gb->get_Stream_by_query($query);
> while (my $seq = $seqio->next_seq) {
>        print $seq->desc,"\n"; }
> 
> The module states that if we provide -ids that:
>        If you provide an array reference of IDs in -ids, the query will be
>        ignored and the list of IDs will be used when the query is passed
> to a
>        Bio::DB::GenBank object's get_Stream_by_query() method.
> 
> In the above case actually the query is passed ('Lassa Virus[ORGN]'),
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting current behaviour of this
> module?
> 
> I was also surprised that if internet is down no warning is thrown for
> $query->query or $query->count at all. Only the get_Stream_by_query
> above will warn us if the site is unreachable (500 Internal Server
> Error).

I believe this has to do with the difference in the objects and the way they
retrieve request data; Bio::DB::GenBank and Bio::DB::Query::GenBank use
different methods to retrieve ids.  Bio::DB::GenBank's get_Stream_by_query
method just makes it a bit easier to retrieve a list of UIDs directly
instead of saving them as an array and then reposting them using
get_Stream_by_id.  Not foolproof but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it is supposed to be
an interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
>           # initialize the list yourself
>           my $query =
> Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
> 
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' $query->count
> is 3.
> However, why not any warning if we set -db=>'protein' or if we did not set
> this?
>
>
> On the NCBI website searching Protein DB returns for 195052:
>       See Details. No items found.
>       The following term(s) refer to a different DB:195052
> 
> But this is not reflected via Bio::DB::Query::GenBank.
> 
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
> 
> Regards,
> Bernd

I can probably play around with adding a few things in tomorrow and clean up
the POD somewhat.  I'm planning a rewrite for EUtilities-based searches but
that's a ways off still...  Can't promise much; I'm pretty busy til next
week.

Chris


From chen_li3 at yahoo.com  Tue May 16 20:53:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 17:53:17 -0700 (PDT)
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com>

Hi all,

Thank you very much for the help.

I have some DNA sequences printed on the screen. But
the default output is longer than I expect.  I need 50
nucleotides/line. I searched CPAN but could not find the
right module.  Which bioperl module can do this job?

Li



From kmdaily at indiana.edu  Tue May 16 09:57:52 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Tue, 16 May 2006 09:57:52 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>

OK, got that installed. But I still get an error:

Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.

I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


-----Original Message-----
From: Christopher Fields [mailto:cjfields at uiuc.edu]
Sent: Tue 5/16/2006 7:38 AM
To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
 
You'll have to install from CVS.  I believe Brian added Entrezgene.pm after
the last developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
>Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From skirov at utk.edu  Wed May 17 07:48:29 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Wed, 17 May 2006 07:48:29 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
	<20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
Message-ID: <446B0D8D.40901@utk.edu>

You are using an old Bio::Annotation::DBLink module. Did you download
only entrezgene.pm or the whole bioperl? If the whole thing, what do the
tests tell you?
Stefan
 
Daily, Kenneth Michael wrote:

>OK, got that installed. But I still get an error:
>
>Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.
>
>I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>
>
>-----Original Message-----
>From: Christopher Fields [mailto:cjfields at uiuc.edu]
>Sent: Tue 5/16/2006 7:38 AM
>To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
> 
>You'll have to install from CVS.  I believe Brian added Entrezgene.pm after
>the last developer release (1.5.1):
>
>http://www.bioperl.org/wiki/Installing_BioPerl
>
>Chris
>
>---- Original message ----
>  
>
>>Date: Mon, 15 May 2006 17:00:12 -0400
>>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>>To: <bioperl-l at lists.open-bio.org>
>>
>>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in 
>>    
>>
>Bio/SeqIO). How can I get this module?
>  
>
>>Kenny Daily
>>IU School of Informatics
>>kmdaily at indiana.edu
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>    
>>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>  
>


From osborne1 at optonline.net  Tue May 16 20:46:00 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 16 May 2006 20:46:00 -0400
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>
Message-ID: <C08FEA88.877A%osborne1@optonline.net>

Chen Li,

There's some documentation on translate() in bptutorial:

http://bioperl.org/Core/Latest/bptutorial.html

You could also use the translate_6frames() method of Bio::SeqUtils.
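
For anyone unsure what the six frames actually are, here is a minimal
core-Perl sketch that does not use BioPerl at all. The sub names (revcom,
translate_frame, six_frames) and the test sequence are made up for
illustration, and the standard genetic code is assumed; with BioPerl
installed, the translate() and translate_6frames() methods mentioned in this
thread are the better route.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated in TCAG order
# (first base, then second, then third).
my $AA    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(T C A G);
my %CODON;
my $i = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            $CODON{"$b1$b2$b3"} = substr($AA, $i++, 1);
        }
    }
}

# Reverse complement of an unambiguous DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse uc $dna;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Translate starting at a 0-based offset; unknown codons become 'X',
# and a trailing partial codon is ignored.
sub translate_frame {
    my ($dna, $offset) = @_;
    $dna = uc $dna;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $CODON{$codon} ? $CODON{$codon} : 'X';
    }
    return $protein;
}

# Frames +1..+3 read the given strand, -1..-3 read its reverse complement.
sub six_frames {
    my ($dna) = @_;
    my $rc = revcom($dna);
    my %frames;
    for my $k (0 .. 2) {
        $frames{ '+' . ($k + 1) } = translate_frame($dna, $k);
        $frames{ '-' . ($k + 1) } = translate_frame($rc,  $k);
    }
    return %frames;
}

my %frames = six_frames('ATGGCCTGA');
print "$_\t$frames{$_}\n" for sort keys %frames;
```

The point is only that frames 2 and 3 are the same strand read from an
offset of one or two bases, and the negative frames are the same three
offsets applied to the reverse complement.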


Brian O.


On 5/16/06 4:52 PM, "smarkel at scitegic.com" <smarkel at scitegic.com> wrote:

> Li,
> 
> You can either do the substring, and reverse complement, yourself
> or you can use the translate() function in Bio::PrimarySeq.  It
> inherits from Bio::PrimarySeqI, so check there for the documentation.
> That translate() function takes a "-frame" argument.
> 
> Scott
> 
> PS In future, please respond to the list.  That way others see
> the questions and answers.
> 
> chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:
> 
>> Dear Dr. Markel,
>> 
>>     I browse through the document of
>> Bio::Tools::CodonTable and find this line:
>> 
>> my $translation= $CodonTable->translate($seq);
>> 
>> I think this line is to do the translation. Here is my
>> question: which line in the doc says how to translate
>> the remaining frames 2,3, and -1, -2, -3?
>> 
>> 
>> Thank you,
>> 
>> Li
>> 
>> --- smarkel at scitegic.com wrote:
>> 
>>> Li,
>>> 
>>> Use the translate() function in
>>> Bio::Tools::CodonTable.
>>> 
>>> Scott
>>> 
>>> Scott Markel, Ph.D.
>>> Principal Bioinformatics Architect  email:
>>> smarkel at scitegic.com
>>> SciTegic Inc.                       mobile: +1 858
>>> 205 3653
>>> 10188 Telesis Court, Suite 100      voice:  +1 858
>>> 799 5603
>>> San Diego, CA 92121                 fax:    +1 858
>>> 279 8804
>>> USA                                 web:
>>> http://www.scitegic.com
>>> 
>>> 
>>> bioperl-l-bounces at lists.open-bio.org wrote on
>>> 16.05.2006 07:55:51:
>>> 
>>>> Hi all,
>>>> 
>>>> I wonder which module is available for translating
>>> DNA
>>>> sequence into 6 reading frames.
>>>> 
>>>> Thank you,
>>>> 
>>>> Li
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From e-just at northwestern.edu  Wed May 17 11:03:41 2006
From: e-just at northwestern.edu (Eric Just)
Date: Wed, 17 May 2006 10:03:41 -0500
Subject: [Bioperl-l] Modware: a BioPerl based API for Chado
Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu>

Hi Everyone,

We are announcing a new Sourceforge Project called Modware that may be of 
interest to you.   It is an object-oriented API written in Perl that 
creates BioPerl object representations of biological features stored in a 
Chado database. It basically creates a Bio::Seq object for chromosomes in 
Chado and creates Bio::SeqFeature::Gene objects for protein coding 
transcripts stored in Chado.  Things like contigs are represented as 
Bio::SeqFeature::Generic objects.  We also provide many methods for 
manipulating these objects once they are in memory.

For download please visit our Sourceforge project page:
http://sourceforge.net/projects/gmod-ware

For API documentation and some short examples of selected use cases visit 
our project home page:
http://gmod-ware.sourceforge.net/

This software is adapted from the production middleware code that dictyBase 
uses.  Modware 0.1 requires that the latest stable GMOD release (0.003) be
installed.  We are currently calling it a release candidate and, if the
feedback we get turns up no major install bugs, will call it an official
release (we've installed it on only two different machines).  If you
would like a version that works on the latest CVS version of GMOD, let me
know and I'll expedite getting that out the door.

Lastly, please use the direct download version, we have not fully recovered 
from the recent Sourceforge CVS issues.

Please try the software out and let us know what you think!


Sincerely,
Eric Just and Sohel Merchant

e-just at northwestern.edu
s-merchant at northwestern.edu


============================================

Eric Just
e-just at northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 


From sb at mrc-dunn.cam.ac.uk  Wed May 17 13:46:45 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 17 May 2006 18:46:45 +0100
Subject: [Bioperl-l] Bio::Map:: enhancements
Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk>

I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998

I'm interested in what people have to say about the secondary 
enhancement I talk about there. Is it a sane thing to do? What are the 
better ways of doing that?
If it /is/ ok, I suppose I'd have to go back and alter 
Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker.


Oh, on a side note, you'll see I had to override RangeI's intersection 
method to work on multiple ranges. Why is RangeI limited to an 
intersection of only two ranges?
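
As a point of comparison, the general N-range case is small when ranges are
plain start/end pairs. The sketch below is core Perl only and is not the
Bio::RangeI API: intersect_ranges is a hypothetical helper, each range is
assumed to be an inclusive [start, end] array ref, and strand is ignored.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# The intersection of any number of ranges starts at the largest start
# and ends at the smallest end; if those cross, there is no common region.
sub intersect_ranges {
    my @ranges = @_;
    return unless @ranges;
    my $start = max(map { $_->[0] } @ranges);
    my $end   = min(map { $_->[1] } @ranges);
    return if $start > $end;
    return [$start, $end];
}

my $common = intersect_ranges([10, 50], [20, 60], [15, 40]);
print $common ? "@$common\n" : "no overlap\n";
```

Folding a two-range intersection over the list pairwise gives the same
answer, which is one way a two-argument method could be extended.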

Cheers,
Sendu.


From David_Waner/San_Diego/Accelrys at scitegic.com  Thu May 18 15:30:46 2006
From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com)
Date: Thu, 18 May 2006 12:30:46 -0700
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
	Windows
Message-ID: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>

BioPerl Users/Developers,

In our testing we have found severe performance problems using BioPerl 
with Perl 5.8 on Windows (but not on Linux). They show up especially in 
SeqIO when reading or writing Fasta files containing large (~16 MB) 
sequences.  The same files that can be read in 1 or 2 seconds with Windows 
Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.

Although the fault is clearly with Perl, not with BioPerl, I have 
identified a couple of places where BioPerl could be modified in order to 
save Windows Perl 5.8 users a lot of time, while not affecting other 
users. 

For example, in my testing the following excerpt from 
Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 
16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }
 
whereas the following replacement code should be equivalent: 

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;          # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;  # Change all single CRs to NEWLINE
    }
 
but executes in less than 1 second.
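
A quick way to check the "should be equivalent" part is to run both versions
over the same inputs and compare. This is a core-Perl sketch only:
old_normalize and new_normalize are made-up names, the $ONMAC guard is
dropped (the non-Mac branch is assumed), and it says nothing about timing.

```perl
use strict;
use warnings;

# The two substitutions currently in _readline() (non-Mac branch assumed).
sub old_normalize {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed replacement: one substitution for CR/LF pairs, then tr for lone CRs.
sub new_normalize {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;
    $line =~ tr/\015/\n/;
    return $line;
}

my @samples = (
    "ACGT\015\012TTGA\015\012",   # DOS line endings
    "ACGT\012TTGA\012",           # Unix line endings
    "ACGT\015TTGA\015",           # lone CRs
    "AC\015\015\012GT",           # CR followed by CR/LF
    "",
);
for my $s (@samples) {
    my ($old, $new) = (old_normalize($s), new_normalize($s));
    (my $shown = $s) =~ s/([\015\012])/sprintf('\\%03o', ord $1)/ge;
    printf "%-28s %s\n", "'$shown'", $old eq $new ? 'same' : 'DIFFERENT';
}
```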

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
 
to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
 
in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
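
One caveat on this second change: \s matches a slightly larger set than the
four characters in the tr list (form feed, for instance), so the two are
only interchangeable if such characters never occur in sequence data. A
core-Perl sketch of the difference, with made-up sub names and made-up
input strings:

```perl
use strict;
use warnings;

sub strip_regex { my ($s) = @_; $s =~ s/\s//g;       return $s }
sub strip_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d; return $s }

my $typical = "ACGT ACGT\tACGT\r\nACGT\n";
my $odd     = "ACGT\fACGT";    # form feed: matched by \s, not in the tr list

print "typical: ", (strip_regex($typical) eq strip_tr($typical) ? "same" : "different"), "\n";
print "odd:     ", (strip_regex($odd)     eq strip_tr($odd)     ? "same" : "different"), "\n";
```

If that matters, adding \f to the tr list would close the gap for form feed.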

There are also problems in reading files with the <> operator when $/ is 
redefined to "\n>", where reading the first line of Fasta files containing 
large sequences takes ~50 seconds, but reading subsequent lines or files 
takes about 1 second. I don't have a work-around for this.

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes? 

I plan to submit a bug report to perlbug, but don't know when or if the 
problem will be fixed. 

- David


From cjfields at uiuc.edu  Thu May 18 16:07:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 May 2006 15:07:14 -0500
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
	onWindows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine>

David,

I have seen some slowdowns with Bio::SeqIO associated with GenBank files,
which this could be related to.  I can't do anything about it (test or
commit changes) until next week but someone else using Windows might (though
we are few and far between, and I'm switching to Mac OS X in fall).  Would
be nice to try the changes and test it out on a few platforms.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of
> David_Waner/San_Diego/Accelrys at scitegic.com
> Sent: Thursday, May 18, 2006 2:31 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
> onWindows
> 
> BioPerl Users/Developers,
> 
> In our testing we have found severe performance problems using BioPerl
> with Perl 5.8 on Windows (but not on Linux). They show up especially in
> SeqIO when reading or writing Fasta files containing large (~16 MB)
> sequences.  The same files that can be read in 1 or 2 seconds with Windows
> Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.
> 
> Although the fault is clearly with Perl, not with BioPerl, I have
> identified a couple of places where BioPerl could be modified in order to
> save Windows Perl 5.8 users a lot of time, while not affecting other
> users.
> 
> For example, in my testing the following excerpt from
> Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a
> 16 MB sequence):
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015?\012/\n/g;
>         $line =~ s/\015/\n/g unless $ONMAC;
>     }
> 
> whereas the following replacement code should be equivalent:
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015\012/\012/g;          # Change all CR/LF pairs to LF
>         $line =~ tr/\015/\n/ unless $ONMAC;  # Change all single CRs to NEWLINE
>     }
> 
> but executes in less than 1 second.
> 
> In addition, changing:
> 
>     defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
> 
> to:
> 
>     defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
> 
> in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
> 
> There are also problems in reading files with the <> operator when $/ is
> redefined to "\n>", where reading the first line of Fasta files containing
> large sequences takes ~50 seconds, but reading subsequent lines or files
> takes about 1 second. I don't have a work-around for this.
> 
> I would like to ask the mailing list:
> 
> 1. Has anyone else run into this problem? Any fixes?
> 2. Do you think BioPerl should incorporate these changes?
> 
> I plan to submit a bug report to perlbug, but don't know when or if the
> problem will be fixed.
> 
> - David
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Thu May 18 16:27:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 16:27:57 -0400
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
 Windows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <C092510D.87EB%osborne1@optonline.net>

David,

What are the results from the relevant t/*.t files before and after these
patches?

Brian O.


On 5/18/06 3:30 PM, "David_Waner/San_Diego/Accelrys at scitegic.com"
<David_Waner/San_Diego/Accelrys at scitegic.com> wrote:

> BioPerl Users/Developers,
> 
> In our testing we have found severe performance problems using BioPerl
> with Perl 5.8 on Windows (but not on Linux). They show up especially in
> SeqIO when reading or writing Fasta files containing large (~16 MB)
> sequences.  The same files that can be read in 1 or 2 seconds with Windows
> Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.
> 
> Although the fault is clearly with Perl, not with BioPerl, I have
> identified a couple of places where BioPerl could be modified in order to
> save Windows Perl 5.8 users a lot of time, while not affecting other
> users. 
> 
> For example, in my testing the following excerpt from
> Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a
> 16 MB sequence):
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015?\012/\n/g;
>         $line =~ s/\015/\n/g unless $ONMAC;
>     }
>  
> whereas the following replacement code should be equivalent:
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015\012/\012/g;          # Change all CR/LF pairs to LF
>         $line =~ tr/\015/\n/ unless $ONMAC;  # Change all single CRs to NEWLINE
>     }
>  
> but executes in less than 1 second.
> 
> In addition, changing:
> 
>     defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
>  
> to:
> 
>     defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
>  
> in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
> 
> There are also problems in reading files with the <> operator when $/ is
> redefined to "\n>", where reading the first line of Fasta files containing
> large sequences takes ~50 seconds, but reading subsequent lines or files
> takes about 1 second. I don't have a work-around for this.
> 
> I would like to ask the mailing list:
> 
> 1. Has anyone else run into this problem? Any fixes?
> 2. Do you think BioPerl should incorporate these changes?
> 
> I plan to submit a bug report to perlbug, but don't know when or if the
> problem will be fixed.
> 
> - David
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hubert.prielinger at gmx.at  Thu May 18 16:41:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 18 May 2006 14:41:27 -0600
Subject: [Bioperl-l] parsing xml output
Message-ID: <446CDBF7.10908@gmx.at>

hi,
what is the best way to parse NCBI- and WU-BLAST XML output?
And is it possible to parse both with the same parser, or does their
XML output differ?

thanks


From staffa at niehs.nih.gov  Thu May 18 16:49:15 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Thu, 18 May 2006 16:49:15 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>

Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
Namely the six D.melanogaster sequences.  
Specifically to find gene entries and learn the gene name, begin and end and CDS.
Please point me to appropriate modules and documentation.


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From adamnkraut at gmail.com  Thu May 18 17:07:42 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Thu, 18 May 2006 17:07:42 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>

I am currently using a pairwise alignment algorithm written in C (not by
me).  The program consists of a library of routines, structures, and
definitions which I do not want to spend a lot of time abstracting.  I
already have a hack method of writing the parameters and inputs I want from
perl, calling the c program with system( ), and then parsing the output in
Perl.  Any good programmer would probably smack me but I'm just an undergrad
and I needed to show my boss that this works in order to spend more time on
it.

So on to my question, what is the preferred method of extending Bioperl to
use this algorithm?  I have just read the XS tutorial and a bit about Inline
C.  Can I put the main function in my script using Inline, and then just
point Inline at the rest of the C library?  The program has several
C-structures that are semantically equivalent to Bioperl objects, so I just
need somewhere to start.  I will spend some more time so that I have a more
specific question, I just wanted a little feedback, this is my first post to
the bioperl list.

Thanks,
Adam


From osborne1 at optonline.net  Thu May 18 17:54:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 17:54:01 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <C0926539.87F5%osborne1@optonline.net>

Nick,

Have you read the Feature-Annotation HOWTO? This would be a good starting
point...

Brian O.


On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Would like a fairly simple way to extract certain information from Genbank
> Genomic File Annotations.
> Namely the six D.melanogaster sequences.
> Specifically to find gene entries and learn the gene name, begin and end and
> CDS.
> Please point me to appropriate modules and documentation.
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at duke.edu  Thu May 18 18:22:32 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 18 May 2006 18:22:32 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>

We don't parse WU-BLAST XML at this time.  We'd welcome someone
contributing this.

NCBI BLAST XML is parsed with the blastxml format of Bio::SearchIO.

-jason
On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:

> hi,
> what is the best way to parse NCBI- and WU- Blast XML output....
> and is it possible to parse both with the same parser, or differ their
> XML output...
>
> thanks

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From MEC at stowers-institute.org  Thu May 18 18:39:15 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 18 May 2006 17:39:15 -0500
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <CED81D34E37D5043A1211565277A51E50563F496@exchkc02.stowers-institute.org>

Li,

Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat
fasta on standard input to 50 char wide fasta on standard output.

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' 

You can call it like this (the one-liner reads standard input, so the
input file is redirected in rather than passed as an argument):

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' < inputfile.fasta > outputfile.fasta
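
For readers without BioPerl at hand, the re-wrapping idea itself can be
sketched in plain Perl. This is only an illustration of what a 50-column
line width means for FASTA output, not how Bio::SeqIO implements it;
rewrap_fasta is a hypothetical helper name and the demo record is made up.

```perl
use strict;
use warnings;

# Re-wrap FASTA text so that every sequence line holds at most $width
# residues. Header lines (starting with '>') are passed through as-is.
sub rewrap_fasta {
    my ($fasta, $width) = @_;
    my @out;
    my $seq = '';

    # Emit the sequence collected so far in $width-sized chunks.
    my $flush = sub {
        push @out, unpack("(A$width)*", $seq) if length $seq;
        $seq = '';
    };

    for my $line (split /\n/, $fasta) {
        if ($line =~ /^>/) {
            $flush->();           # finish the previous record first
            push @out, $line;
        }
        else {
            $line =~ s/\s+//g;    # drop stray whitespace inside sequence
            $seq .= $line;
        }
    }
    $flush->();
    return join("\n", @out) . "\n";
}

# Made-up demo record: one header and a single 120-base sequence line.
my $input   = ">seq1 demo\n" . ('ACGT' x 30) . "\n";
my $wrapped = rewrap_fasta($input, 50);
print $wrapped;
```

With the demo record above, the 120 bases come back as lines of 50, 50
and 20 characters under the unchanged header.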

Does this help?

--Malcolm Cook


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
>Sent: Tuesday, May 16, 2006 7:53 PM
>To: bioperl-l at bioperl.org
>Subject: [Bioperl-l] module for formating sequence output on the screen
>
>Hi all,
>
>Thank you very much for the help.
>
>I have some DNA sequences printed on the screen. But
>the default output is longer than I expect.  I need 50
>nucleotides/line. I searched CPAN but cannot find the
>right module.  Which bioperl module can do this job?
>
>Li
>


From gish at watson.wustl.edu  Thu May 18 19:57:03 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Thu, 18 May 2006 18:57:03 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>
Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM>

Just to clarify, the XML output from WU-BLAST conforms to the standard
NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
incompatible, but care was taken to ensure compatibility.  If someone
identifies a difference that prevents parsing or proper interpretation of
the WU-BLAST output, please let me know.
Regards,
--Warren 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Thursday, May 18, 2006 5:23 PM
> To: Hubert Prielinger
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] parsing xml output
> 
> we don't parse WU-BLAST XML at this time.  We'd welcome someone  
> contributing this.
> 
> ncbi XML is parsed with blastxml format.
> 
> -jason
> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
> 
> > hi,
> > what is the best way to parse NCBI- and WU- Blast XML output....
> > and is it possible to parse both with the same parser, or 
> differ their
> > XML output...
> >
> > thanks
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


From cjfields at uiuc.edu  Thu May 18 21:10:50 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Thu, 18 May 2006 20:10:50 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <ca6b1c14.ba9e5f4f.81c0100@expms6.cites.uiuc.edu>

Just to make sure everybody knows, if you use bioperl v1.5.1, 
SearchIO::blastxml uses XML::Parser which should come with most recent perl 
distributions.   The bioperl-live version has switched over to XML::SAX for SAX2 
parsing and it is recommended that you install XML::SAX::ExpatXS as well for 
faster parsing. 

Chris

---- Original message ----
>Date: Thu, 18 May 2006 18:57:03 -0500
>From: "Warren Gish" <gish at watson.wustl.edu>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: "'Hubert Prielinger'" <hubert.prielinger at gmx.at>
>Cc: bioperl-l at bioperl.org
>
>Just to clarify, the XML output from WU-BLAST conforms to the standard
>NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
>incompatible, but care was taken to ensure compatibility.  If someone
>identifies a difference that prevents parsing or proper interpretation of
>the WU-BLAST output, please let me know.
>Regards,
>--Warren 
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> Jason Stajich
>> Sent: Thursday, May 18, 2006 5:23 PM
>> To: Hubert Prielinger
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] parsing xml output
>> 
>> we don't parse WU-BLAST XML at this time.  We'd welcome someone  
>> contributing this.
>> 
>> ncbi XML is parsed with blastxml format.
>> 
>> -jason
>> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>> 
>> > hi,
>> > what is the best way to parse NCBI- and WU- Blast XML output....
>> > and is it possible to parse both with the same parser, or 
>> differ their
>> > XML output...
>> >
>> > thanks
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 


From jason.stajich at duke.edu  Fri May 19 08:52:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 08:52:13 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>

Whoops - sorry Warren - for some reason I had it in my mind that it  
was different.  So the blastxml parser should work fine.  The WUBLAST  
tab-delimited output is different than NCBI's -m8/9 though, right?

-jason


On May 18, 2006, at 7:57 PM, Warren Gish wrote:

> Just to clarify, the XML output from WU-BLAST conforms to the standard
> NCBI_BlastOutput.dtd.  Technically, contents of data fields could  
> still be
> incompatible, but care was taken to ensure compatibility.  If someone
> identifies a difference that prevents parsing or proper  
> interpretation of
> the WU-BLAST output, please let me know.
> Regards,
> --Warren
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Jason Stajich
>> Sent: Thursday, May 18, 2006 5:23 PM
>> To: Hubert Prielinger
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] parsing xml output
>>
>> we don't parse WU-BLAST XML at this time.  We'd welcome someone
>> contributing this.
>>
>> ncbi XML is parsed with blastxml format.
>>
>> -jason
>> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>
>>> hi,
>>> what is the best way to parse NCBI- and WU- Blast XML output....
>>> and is it possible to parse both with the same parser, or
>> differ their
>>> XML output...
>>>
>>> thanks
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu May 18 18:42:05 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:42:05 +1000
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <446CF83D.60207@infotech.monash.edu.au>

> what is the best way to parse NCBI- and WU- Blast XML output....
> and is it possible to parse both with the same parser, or differ their 
> XML output...


For NCBI BLAST XML format, use
	Bio::SearchIO->new(-format=>'blastxml', ...)

I don't know if 'blastxml' will load WU-BLAST XML format.
http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it.

Why not try it, and report back the results to the bioperl list?

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From torsten.seemann at infotech.monash.edu.au  Thu May 18 18:37:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:37:17 +1000
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <446CF71D.2070207@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
> Namely the six D.melanogaster sequences.  
> Specifically to find gene entries and learn the gene name, begin and end and CDS.
> Please point me to appropriate modules and documentation.

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/HOWTOs
-> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/FAQ
-> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia

From gish at watson.wustl.edu  Fri May 19 10:50:08 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Fri, 19 May 2006 09:50:08 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
Message-ID: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>

Right, the WU-BLAST tabbed output contains more fields.  (See
http://blast.wustl.edu/blast/tabular.html).
--Warren

> Whoops - sorry Warren - for some reason I had it in my mind that it  
> was different.  So the blastxml parser should work fine.  The  
> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
> right?
>
> -jason


From adamnkraut at gmail.com  Fri May 19 11:04:01 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Fri, 19 May 2006 11:04:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
In-Reply-To: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
	<OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com>

The program generates an ensemble of weighted suboptimal alignments by use
of a partition function and stochastic backtracking.  The algorithm is quite
novel and it's really only part of a larger multi-scale comparative modeling
project. The documentation is here:

http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html

While I think this would be useful to the bioperl community if it were fully
abstracted/extended, I would at the least like to be able to pass in any two
sequences and get back SimpleAlign objects for our internal uses first.  I
have a good idea on how to get started.  I will be sure to post when I get
into trouble.


On 5/19/06, aaron.j.mackey at gsk.com <aaron.j.mackey at gsk.com> wrote:
>
> bioperl-ext is the package in which alignment algorithms and/or BioPerl
> "wrapped" external C libraries live.  Subprojects in bioperl-ext use both
> XS and Inline::C, that's up to you.
>
> You'll need to get your C code compiled to a dynamically loaded library
> (.so) to use either XS or Inline::C; this precludes any reuse of the C
> main() function (although your Inline::C wrapper might recapitulate/copy
> the main() function code).
>
> Out of curiosity, what pairwise alignment algorithm are you using?  This
> is a heavily beaten path, you might want to dig around first to see if
> someone else already has what you need.
>
> -Aaron
>
>


From slenk at emich.edu  Fri May 19 10:42:41 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Fri, 19 May 2006 10:42:41 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
Message-ID: <f141831f144a37.f144a37f141831@emich.edu>

There is nothing wrong with a reasonable way that works - better not 
to put yourself down.

Inline is good if you can get it to work for you - I have had issues 
with linking Inline to dynamic libraries. I believe Inline makes a 
file that has linkage characteristics specified. Try it and see, then 
tell people how you did it. My two cents.

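Another way to drive an external executable is open3 (from the core
IPC::Open3 module), reading from and writing to its pipes. I use it
(with primer3 and local lab automation code) - snippet follows:

use IPC::Open3;    # provides open3()
use IO::Handle;    # provides autoflush() on filehandles

my $pid     = 0;
my $cancmd  = 'cancmd.exe';
my $write   = 0;
my $read    = 0;

sub new {
    my $class = shift;

    my $c = {};

    # child's STDOUT and STDERR both arrive on RDRFH
    $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);

    $write = \*WTRFH;
    $read  = \*RDRFH;

    # flush each request right away so the child sees it
    $write->autoflush();

    bless $c, $class;
    return $c;
}

Just write your request, then read the reply back - I make sure that
each request and each reply is a newline terminated text line - and be
sure you reap the child pid (waitpid) when you are done.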
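
As a minimal, self-contained sketch of that request/reply pattern (not
the code from the lab setup described above): the child here is a
stand-in perl process that upper-cases each line it receives, and the
helper name ask is hypothetical. Only core modules are used.

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Handle;

# Hypothetical stand-in for the real external program: a child perl
# that upper-cases each line it receives and flushes it straight back.
my @child_cmd = ($^X, '-e', '$| = 1; while (<STDIN>) { print uc($_) }');

# Send one newline-terminated request and return the one-line reply.
sub ask {
    my ($to_child, $from_child, $request) = @_;
    print {$to_child} "$request\n";
    my $reply = <$from_child>;
    chomp $reply if defined $reply;
    return $reply;
}

# Passing undef for the error handle sends the child's STDERR to the
# same pipe as its STDOUT.
my $pid = open3(my $to_child, my $from_child, undef, @child_cmd);
$to_child->autoflush(1);

my @replies = map { ask($to_child, $from_child, $_) } qw(acgt ttga);
print "$_\n" for @replies;

close $to_child;      # end-of-file tells the child to exit
close $from_child;
waitpid($pid, 0);     # reap the child so it does not linger as a zombie
my $exit_status = $? >> 8;
```

Keeping every request and reply to exactly one line is what makes the
blocking read in ask safe; a child that buffers its output or answers
with several lines would need a different protocol.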


----- Original Message -----
From: Adam Kraut <adamnkraut at gmail.com>
Date: Thursday, May 18, 2006 5:07 pm
Subject: [Bioperl-l] writing a pairwise alignment module: XS and 
Inline C?

> I am currently using a pairwise alignment algorithm written in C 
> (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time 
> abstracting.  I
> already have a hack method of writing the parameters and inputs I 
> want from
> perl, calling the c program with system( ), and then parsing the 
> output in
> Perl.  Any good programmer would probably smack me but I'm just an 
> undergrad and I needed to show my boss that this works in order to
> spend more time on
> it.
> 
> So on to my question, what is the preferred method of extending 
> Bioperl to
> use this algorithm?  I have just read the XS tutorial and a bit 
> about Inline
> C.  Can I put the main function in my script using Inline, and 
> then just
> point Inline at the rest of the C library?  The program has several
> C-structures that are semantically equivalent to Bioperl objects, 
> so just
> need somewhere to start.  I will spend some more time so that I 
> have a more
> specific question, I just wanted a little feedback, this is my 
> first post to
> the bioperl list.
> 
> Thanks,
> Adam
> 


From hubert.prielinger at gmx.at  Fri May 19 12:52:28 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 10:52:28 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
Message-ID: <446DF7CC.5060509@gmx.at>

hi,
I wondered whether it is also possible to get the species (taxonomy) for
every hit from the XML output (either WU-BLAST or NCBI BLAST) when I do a
general search.
regards

Warren Gish wrote:
> Right, the WU-BLAST tabbed output contains more fields.  (See
> http://blast.wustl.edu/blast/tabular.html).
> --Warren
>
>   
>> Whoops - sorry Warren - for some reason I had it in my mind that it  
>> was different.  So the blastxml parser should work fine.  The  
>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
>> right?
>>
>> -jason
>>     
>


From staffa at niehs.nih.gov  Fri May 19 14:12:47 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Fri, 19 May 2006 14:12:47 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <C0926539.87F5%osborne1@optonline.net>
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>

Specifically:
I have the document to which you refer, but in the printout of tags etc.
I have not seen the one thing I need, namely the values in this line:
     mRNA            join(380..509,578..1913,7784..8649,9439..10200)
Is that a location object?


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


> ----------
> From: 	Brian Osborne
> Sent: 	Thursday, May 18, 2006 5:54 PM
> To: 	Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> Subject: 	Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> 
> Nick,
> 
> Have you read the Feature-Annotation HOWTO? This would be a good starting
> point...
> 
> Brian O.
> 
> 
> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
> wrote:
> 
> > Would like a fairly simple way to extract certain information from Genbank
> > Genomic File Annotations.
> > Namely the six D.melanogaster sequences.
> > Specifically to find gene entries and learn the gene name, begin and end and
> > CDS.
> > Please point me to appropriate modules and documentation.
> > 
> > 
> > Nick Staffa
> > Telephone: 919-316-4569  (NIEHS: 6-4569)
> > Scientific Computing Support Group
> > NIEHS Information Technology Support Services Contract
> > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> > National Institute of Environmental Health Sciences
> > National Institutes of Health
> > Research Triangle Park, North Carolina
> > 
> > 


From chandan.kr.singh at gmail.com  Fri May 19 14:37:26 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sat, 20 May 2006 00:07:26 +0530
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
References: <C0926539.87F5%osborne1@optonline.net>
	<7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com>

On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] <staffa at niehs.nih.gov> wrote:
>
> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?


Yes, it is a location object. If you want it as a string (which is what
your mail seems to ask for), you just have to do this, where $fet is the
feature object:

my $loc = $fet->location();

my $loc_str = $loc->to_FTstring();

Hope it helps.
Chandan

> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
>
>
> > ----------
> > From:         Brian Osborne
> > Sent:         Thursday, May 18, 2006 5:54 PM
> > To:   Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> > Subject:      Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> >
> > Nick,
> >
> > Have you read the Feature-Annotation HOWTO? This would be a good
> starting
> > point...
> >
> > Brian O.
> >
> >
> > On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov
> >
> > wrote:
> >
> > > Would like a fairly simple way to extract certain information from
> Genbank
> > > Genomic File Annotations.
> > > Namely the six D.melanogaster sequences.
> > > Specifically to find gene entries and learn the gene name, begin and
> end and
> > > CDS.
> > > Please point me to appropriate modules and documentation.
> > >
> > >
> > > Nick Staffa
> > > Telephone: 919-316-4569  (NIEHS: 6-4569)
> > > Scientific Computing Support Group
> > > NIEHS Information Technology Support Services Contract
> > > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> > > National Institute of Environmental Health Sciences
> > > National Institutes of Health
> > > Research Triangle Park, North Carolina
> > >
> > >


From osborne1 at optonline.net  Fri May 19 15:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <C0939738.8849%osborne1@optonline.net>

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and
end positions. These could be called "split" locations, and a very common
example is the join statement in the CDS feature found in Genbank entries
(e.g. join(45..122,233..267)). This calls for a specialized object,
Bio::Location::SplitLocationI, which is a container for Location objects:

      for my $feature ( $seqobj->top_SeqFeatures ) {
          if ( $feature->location->isa('Bio::Location::SplitLocationI')
               && $feature->primary_tag eq 'CDS' ) {
              for my $location ( $feature->location->sub_Location ) {
                  print $location->start . ".." . $location->end . "\n";
              }
          }
      }
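
To see what those sub-locations amount to, here is a plain-Perl sketch,
with no BioPerl involved, that splits the join(...) string from the mRNA
line in question into start/end pairs. The helper name parse_join is
hypothetical, and the sketch assumes a simple join of plain ranges; real
feature tables can also contain complement(), fuzzy bounds and remote
references, which the BioPerl location objects handle and this does not.

```perl
use strict;
use warnings;

# Split a simple GenBank join(...) location string into [start, end] pairs.
sub parse_join {
    my ($location) = @_;
    my ($inner) = $location =~ /^join\((.*)\)$/
        or die "not a simple join(): $location\n";
    my @exons;
    for my $range (split /,/, $inner) {
        my ($start, $end) = $range =~ /^(\d+)\.\.(\d+)$/
            or die "unsupported range: $range\n";
        push @exons, [$start, $end];
    }
    return @exons;
}

my @exons = parse_join('join(380..509,578..1913,7784..8649,9439..10200)');
printf "%d..%d\n", @$_ for @exons;

# Overall extent of the feature: first start and last end.
my ($first, $last) = ($exons[0][0], $exons[-1][1]);
```

Each pair corresponds to one sub-location in the loop above, and the
overall extent is what a feature's own start and end would cover.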


Brian O.


On 5/19/06 2:12 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Specifically: 
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?
> 
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
>> ----------
>> From:  Brian Osborne
>> Sent:  Thursday, May 18, 2006 5:54 PM
>> To:  Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
>> Subject:  Re: [Bioperl-l] Reading GenBank Genomic File Annotation
>> 
>> Nick,
>> 
>> Have you read the Feature-Annotation HOWTO? This would be a good starting
>> point...
>> 
>> Brian O.
>> 
>> 
>> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
>> wrote:
>> 
>>> Would like a fairly simple way to extract certain information from Genbank
>>> Genomic File Annotations.
>>> Namely the six D.melanogaster sequences.
>>> Specifically to find gene entries and learn the gene name, begin and end and
>>> CDS.
>>> Please point me to appropriate modules and documentation.
>>> 
>>> 
>>> Nick Staffa
>>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>>> Scientific Computing Support Group
>>> NIEHS Information Technology Support Services Contract
>>> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
>>> National Institute of Environmental Health Sciences
>>> National Institutes of Health
>>> Research Triangle Park, North Carolina
>>> 
>>> 


From hubert.prielinger at gmx.at  Fri May 19 16:42:09 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 14:42:09 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
	<446DF7CC.5060509@gmx.at>
	<F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
Message-ID: <446E2DA1.1050503@gmx.at>

hi warren,
does that mean that if I alter the DTD (if that is possible) by adding the
taxonomic id to it, I should then get the taxonomic id tag in the xml
file (theoretically)?
I guess this would only be possible with a local search (blastall), not
with an online search.

greetings

Warren Gish wrote:
>
> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>
>> hi,
>> I wondered whether it is also possible to get the species (taxonomy) for
>> every hit from the XML output (either WU-BLAST or NCBI BLAST) when I do a
>> general search.
>> regards
>>
> The taxonomic id is not an entity in the NCBI XML DTD.  If the 
> information was embedded in deflines, one could conceivably parse for 
> it, but I believe the NCBI only distributes taxids in their ASN.1 data 
> and in their pre-formatted BLAST databases, and NCBI BLAST only reports
> taxids in its ASN.1 output format, where taxid is available as an entity.
>
> --Warren
>
>


From cjfields at uiuc.edu  Fri May 19 16:56:56 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:56:56 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>

You'll have to pull the GI or accession from each hit and do a lookup, either
by grabbing the sequence and using Bio::Species or by using Bio::DB::Taxonomy;
there isn't any taxonomy information directly incorporated into BLAST reports
AFAIK.

Chris

---- Original message ----
>Date: Fri, 19 May 2006 10:52:28 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>
>hi,
>I wondered whether it is also possible to get the species (taxonomy) for
>every hit from the XML output (either WU-BLAST or NCBI BLAST) when I do a
>general search.
>regards
>
>Warren Gish wrote:
>> Right, the WU-BLAST tabbed output contains more fields.  (See
>> http://blast.wustl.edu/blast/tabular.html).
>> --Warren
>>
>>   
>>> Whoops - sorry Warren - for some reason I had it in my mind that it  
>>> was different.  So the blastxml parser should work fine.  The  
>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
>>> right?
>>>
>>> -jason
>>>     
>>


From cjfields at uiuc.edu  Fri May 19 16:59:35 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:59:35 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu>

Um, I don't think it works that way.  I'm pretty sure the XML is generated from
the ASN.1 output.  As Warren says, I don't think you can get at the taxonomy
information directly.  Indirectly is another matter...

Chris

---- Original message ----
>Date: Fri, 19 May 2006 14:42:09 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>
>hi warren,
>that means if I alter the DTD (if that is possible) by adding the 
>taxonomic id to the DTD..... then I should have the taxonomic id tag in 
>the xml file (theoretically)
>but I guess this is only possible with a local search (blastall) but not 
>with an online search.
>
>greetings
>
>Warren Gish wrote:
>>
>> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>>
>>> hi,
>>> I wondered whether it is also possible to get the species (taxonomy) for
>>> every hit from the XML output (either WU-BLAST or NCBI BLAST) when I do a
>>> general search.
>>> regards
>>>
>> The taxonomic id is not an entity in the NCBI XML DTD.  If the 
>> information was embedded in deflines, one could conceivably parse for 
>> it, but I believe the NCBI only distributes taxids in their ASN.1 data 
>> and in their pre-formatted BLAST databases, and NCBI BLAST only reports
>> taxids in its ASN.1 output format, where taxid is available as an entity.
>>
>> --Warren
>>
>>
>


From hubert.prielinger at gmx.at  Fri May 19 17:30:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 15:30:20 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E3854.5010708@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at>
Message-ID: <446E38EC.9020100@gmx.at>

ok, thanks,
it appears that I only need the species the protein is derived from, so
I guess Bio::Species would be enough for me, right?
Would it work if I just pull the accession from the BLAST output file,
pass in that accession code, and get the species name back as the return
value?
Is it possible to pass in just the accession code? When I looked it up,
the documentation always talked about the entire file.

regards
>
>
> Christopher Fields wrote:
>> You'll have to pull the GI or accession from each hit and do a lookup 
>> by either grabbing the sequence and using Bio::Species or use 
>> Bio::DB::Taxonomy; there isn't any tax information directly 
>> incorporated into BLAST reports AFAIK.
>>
>> Chris
>>
>> ---- Original message ----
>>  
>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re: 
>>> [Bioperl-l] parsing xml output  To: Warren Gish 
>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>
>>> hi,
>>> I wondered whether is it also possible in the xml output (either WU 
>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>>> do a general search.
>>> regards
>>>
>>> Warren Gish wrote:
>>>    
>>>> Right, the WU-BLAST tabbed output contains more fields.  (See 
>>>> http://blast.wustl.edu/blast/tabular.html).
>>>> --Warren
>>>>
>>>>        
>>>>> Whoops - sorry Warren - for some reason I had it in my mind that 
>>>>> it  was different.  So the blastxml parser should work fine.  The  
>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 
>>>>> though,  right?
>>>>>
>>>>> -jason
>>>>>             
>
>


From jason.stajich at duke.edu  Fri May 19 18:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID: <FAE3151B-301F-4A42-9EFD-D1F8D3CBE752@duke.edu>

There is a gi2taxid table in the /pub/taxonomy part of NCBI FTP site  
(ftp.ncbi.nih.gov) -- I have used this to take GI numbers from report  
and get taxonomy for overall classification. I think something like  
this exists in the scripts or examples directory in the bioperl  
distro. I know I posted about it when I wrote about it a while ago.
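
The lookup itself can be sketched roughly as follows. This is only a
minimal sketch, assuming the table is plain text with one GI and one
taxid per line separated by whitespace; the sub name load_gi2taxid and
the sample rows are made up for illustration, and the real file name and
layout should be checked on the FTP site.

```perl
use strict;
use warnings;

# Read a two-column "GI <whitespace> taxid" table from a filehandle,
# keeping only the GIs we care about so a large table stays cheap in memory.
sub load_gi2taxid {
    my ($fh, @wanted_gis) = @_;
    my %wanted = map { $_ => 1 } @wanted_gis;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split ' ', $line;
        next unless defined $taxid && $wanted{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Hypothetical sample rows standing in for the downloaded table.
my $table = "101\t9606\n102\t10090\n103\t9606\n";
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $taxid_for = load_gi2taxid($fh, 101, 103);
close $fh;

print "$_ => $taxid_for->{$_}\n" for sort { $a <=> $b } keys %$taxid_for;
```

The GIs collected from the BLAST hits go in as the wanted list, and the
resulting taxids can then be handed to a taxonomy lookup.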

-jason
On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote:

> ok, thanks,
> it appears that I only need the species where the Protein is derived
> from, so I guess Bio:Species would satisfy me, or?
> and it would work that I just pull off the accession from the blast
> output file and then assign the accession code and get as return value
> the  species name.
> is it possible to just assign the accession code, because I looked up
> but they were always talking of the entire file.
>
> regards
>>
>>
>> Christopher Fields wrote:
>>> You'll have to pull the GI or accession from each hit and do a  
>>> lookup
>>> by either grabbing the sequence and using Bio::Species or use
>>> Bio::DB::Taxonomy; there isn't any tax information directly
>>> incorporated into BLAST reports AFAIK.
>>>
>>> Chris
>>>
>>> ---- Original message ----
>>>
>>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re:
>>>> [Bioperl-l] parsing xml output  To: Warren Gish
>>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>>
>>>> hi,
>>>> I wondered whether is it also possible in the xml output (either WU
>>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I
>>>> do a general search.
>>>> regards
>>>>
>>>> Warren Gish wrote:
>>>>
>>>>> Right, the WU-BLAST tabbed output contains more fields.  (See
>>>>> http://blast.wustl.edu/blast/tabular.html).
>>>>> --Warren
>>>>>
>>>>>
>>>>>> Whoops - sorry Warren - for some reason I had it in my mind that
>>>>>> it  was different.  So the blastxml parser should work fine.  The
>>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9
>>>>>> though,  right?
>>>>>>
>>>>>> -jason
>>>>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From ewijaya at i2r.a-star.edu.sg  Sat May 20 08:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>


Dear experts,

Is there any Bioperl method that lets
you verify the sequence format of a file?

For example, given a file, we wish
to check (returning true or false) whether
it is in FASTA format, GenBank format, etc.

Such a method would be useful in a web application
as a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE




From aaron.j.mackey at gsk.com  Fri May 19 09:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
 C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>

bioperl-ext is the package where alignment algorithms and BioPerl-wrapped
external C libraries live.  Subprojects in bioperl-ext use both
XS and Inline::C, so the choice is up to you.

You'll need to get your C code compiled to a dynamically loaded library 
(.so) to use either XS or Inline::C; this precludes any reuse of the C 
main() function (although your Inline::C wrapper might recapitulate/copy 
the main() function code).
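
To give a rough idea of the Inline::C route, here is a minimal sketch,
assuming Inline::C and a C compiler are installed. The function
match_count is a made-up stand-in for a routine from your own library
and is not part of bioperl-ext. Note that there is no main(); the Perl
script takes over that role.

```perl
use strict;
use warnings;

# Inline::C compiles the C source below into a shared library on first run
# and binds each C function as a Perl sub of the same name.
use Inline C => <<'END_C';
/* Hypothetical stand-in for a routine from an alignment library:
   count identical positions between two sequences. */
int match_count(char *a, char *b) {
    int n = 0;
    while (*a && *b) {
        if (*a == *b) n++;
        a++;
        b++;
    }
    return n;
}
END_C

# What main() used to do (read parameters, call the library, report)
# now happens on the Perl side.
my $matches = match_count('ACGTAC', 'ACCTAG');
print "identical positions: $matches\n";
```

To call into an already compiled library instead of pasting source,
Inline::C also accepts LIBS and INC configuration options; the exact
values depend on where your library and headers are installed.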

Out of curiosity, what pairwise alignment algorithm are you using?  This 
is a heavily beaten path, you might want to dig around first to see if 
someone else already has what you need.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/18/2006 05:07:42 PM:

> I am currently using a pairwise alignment algorithm written in C (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time abstracting.  I
> already have a hack method of writing the parameters and inputs I want
> from perl, calling the c program with system( ), and then parsing the
> output in Perl.  Any good programmer would probably smack me but I'm just
> an undergrad and I needed to show my boss that this works in order to
> spend more time on it.
> 
> So on to my question, what is the preferred method of extending Bioperl
> to use this algorithm?  I have just read the XS tutorial and a bit about
> Inline C.  Can I put the main function in my script using Inline, and
> then just point Inline at the rest of the C library?  The program has
> several C-structures that are semantically equivalent to Bioperl objects,
> so just need somewhere to start.  I will spend some more time so that I
> have a more specific question, I just wanted a little feedback, this is
> my first post to the bioperl list.
> 
> Thanks,
> Adam
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jason.stajich at duke.edu  Sat May 20 10:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID: <F42D42CC-B609-48DF-B291-E0CE803D527C@duke.edu>

Try Bio::Tools::GuessSeqFormat
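
Something along these lines should give a true/false answer. It is a
minimal sketch, assuming BioPerl is installed; the helper name
looks_like_format and the sample text are made up for illustration. The
guess is a heuristic, so for taint checking you would still want to
parse the input with Bio::SeqIO afterwards.

```perl
use strict;
use warnings;
use Bio::Tools::GuessSeqFormat;

# Return true if the guessed format of $text matches $expected (e.g. 'fasta').
# guess() returns undef when the format cannot be determined.
sub looks_like_format {
    my ($text, $expected) = @_;
    my $guess = Bio::Tools::GuessSeqFormat->new(-text => $text)->guess;
    return defined $guess && $guess eq $expected;
}

# Hypothetical sample input.
my $fasta = ">seq1 test\nACGTACGTACGT\nACGTACGTACGT\n";
print looks_like_format($fasta, 'fasta') ? "FASTA\n" : "not FASTA\n";
```

The constructor also takes -file or -fh instead of -text if the data is
already on disk or in an open filehandle.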

On May 20, 2006, at 8:36 AM, Wijaya Edward wrote:

>
> Dear expert,
>
> Is there any Bioperl method that allows
> you to check verify sequence type in a file?
>
> For example, given a file we wish
> to check (return true  or false) whether
> it is in FASTA format, GENBANK format, etc.
>
> This method is useful in web application
> as taint checking procedure.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sat May 20 20:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,


I tried one of the scripts from the Graphics HOWTO under a Cygwin
environment (GD and libpng are already installed). I typed
this line in a Cygwin X window:


$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format
`/tmp/magick-qKiRPDRS'.

Any idea?


Thank you very much,

Li




From osborne1 at optonline.net  Sat May 20 20:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <C095339A.886C%osborne1@optonline.net>

Chen,

Not sure. However, whenever I see a new or incomprehensible error message
like "display: no decode delegate for this image format" I Google it.

Brian O.


On 5/20/06 8:15 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> Dear all,
> 
> 
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> 
> 
> $ perl render_blast1.pl data1.txt | display -
> 
> And here is the result:
> 
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
> 
> Any idea?
> 
> 
> Thank you very much,
> 
> Li
> 
> 


From n.saunders at uq.edu.au  Sun May 21 18:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0 
RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and for selected proteins, 
get the Uniprot accession of interacting partners (this is outlined in the 
documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test script 
and ran it on a selection of XML files.  The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
		  		  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI 
XML versions 1 and 2.5):

1.  MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5. DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name 
/usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------


Looking at the module code, it seems that the first 2 errors relate to a
parameter "proteinInteractorRef", found in PSI MI version 1 but not
version 2.5.  Error 3 I haven't yet figured out.  DIP PSI MI XML version 1
for single species seems OK, but it seems there are species names in the
complete dataset that cause problems (error 5).
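
A quick way to sort files into those two groups before parsing is to
look for that element directly. This is a minimal sketch in plain Perl;
the sub name has_protein_interactor_ref and the one-line XML fragments
are made up for illustration and are not real database records.

```perl
use strict;
use warnings;

# Scan an XML stream line by line and report whether it contains any
# <proteinInteractorRef ...> element, the construct the parser seems to need.
sub has_protein_interactor_ref {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return 1 if $line =~ /<proteinInteractorRef\b/;
    }
    return 0;
}

# Hypothetical one-line fragments standing in for real files.
my %sample = (
    'with ref'    => qq{<proteinParticipant><proteinInteractorRef ref="P1"/></proteinParticipant>\n},
    'without ref' => qq{<participant><interactorRef>1</interactorRef></participant>\n},
);

for my $name (sort keys %sample) {
    open my $fh, '<', \$sample{$name} or die "Cannot open sample: $!";
    printf "%-12s %s\n", $name, has_protein_interactor_ref($fh) ? 'yes' : 'no';
    close $fh;
}
```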


Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there 
plans to get it to work with version 2.5 files from all sources (MINT and 
IntAct) ?  Googling and checking the list archives didn't give a lot of hits 
which made me think it's not a widely-used module.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://psychro.bioinformatics.unsw.edu.au/neil


From torsten.seemann at infotech.monash.edu.au  Sun May 21 21:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) 
into the input of a program called "display". This program is part of 
the ImageMagick toolkit, standard on most Linux installations. Because 
you are using Windows you probably don't have it installed! Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.
	
-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From darin.london at duke.edu  Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil.  To be considered
as a potential speaker, you must submit an abstract by
Monday, June 5th, 2006.  We look forward to a great conference this
year. Please consult the official BOSC 2006 website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC-related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal-compatible calendar, there is an EventDB
calendar set up with all BOSC-related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.


From osborne1 at optonline.net  Mon May 22 17:37:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 22 May 2006 17:37:50 -0400
Subject: [Bioperl-l] problems with Bio::Graph
In-Reply-To: <4470E708.3070402@uq.edu.au>
Message-ID: <C097A76E.88A9%osborne1@optonline.net>

Neil,

Let me propose an alternative. In the past few months I've been working on a
Bioperl package for handling protein interaction networks, it is called
bioperl-network. It's similar to the Bio::Graph modules, except for the
following:

- It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The
advantage is that we are not responsible for maintaining the algorithm code,
the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been
working on these and has fixed some significant ones recently.

- It uses names and concepts from Graph. It also has separate notions of
edge and interaction, where one edge can have one or more interactions.

- It uses more method names and conventions borrowed from interaction
databases and PSI MI. For example, a node can be a protein complex composed
of multiple Seq objects, not just a protein.
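
To illustrate the edge versus interaction distinction, here is a minimal
sketch using plain Perl hashes. It is not the bioperl-network or Graph
API, and the node and interaction identifiers are made up.

```perl
use strict;
use warnings;

# One edge per unordered pair of nodes; each edge holds a list of interactions.
my %edge;

# Build the same key for (A,B) and (B,A) so the edge is undirected.
sub edge_key { return join '|', sort @_ }

sub add_interaction {
    my ($node_a, $node_b, $interaction_id) = @_;
    push @{ $edge{ edge_key($node_a, $node_b) } }, $interaction_id;
}

# Hypothetical identifiers, for illustration only.
add_interaction('protA', 'protB', 'int1');
add_interaction('protB', 'protA', 'int2');   # same edge, second interaction
add_interaction('protA', 'protC', 'int3');

for my $key (sort keys %edge) {
    printf "%s: %d interaction(s)\n", $key, scalar @{ $edge{$key} };
}
```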

This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard
Adams are major contributors to it. It's also worth mentioning that it's not
complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think
it should be able to handle the code you've shown (and if it cannot then
I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm
not mistaken there's a version 1 -> version 2 converter.

I'm about to put this into CVS so you can take a look, should you choose to.

Brian O.


On 5/21/06 6:17 PM, "Neil Saunders" <n.saunders at uq.edu.au> wrote:

> dear all,
> 
> I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0
> RC1 with Ubuntu 5.10 i686.
> 
> I would like to parse files in PSI MI XML 2.5 format and for selected
> proteins, 
> get the Uniprot accession of interacting partners (this is outlined in the
> documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test
> script 
> and ran it on a selection of XML files.  The script is simply:
> 
> ----------------------------------------------------------------
> use strict;
> use Bio::Graph::IO;
> 
> my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
> my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
>  '-format' => 'psi_xml');
> my $gr = $graphio->next_network;
> ----------------------------------------------------------------
> 
> Here's a summary of the error messages with some sample files (I tried PSI MI
> XML versions 1 and 2.5):
> 
> 1.  MINT database 9707552_small.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 2. IntAct database yeast_small-11.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 3. IntAct database yeast_small-11.xml (PSI 1)
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.
> 
> 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
> These give no errors
> 
> 5. DIP file dip20060402.mif (PSI 1, complete dataset)
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
> STACK: Bio::Species::validate_species_name
> /usr/local/share/perl/5.8.7/Bio/Species.pm:340
> STACK: Bio::Species::classification
> /usr/local/share/perl/5.8.7/Bio/Species.pm:170
> STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
> STACK: Bio::Graph::IO::psi_xml::_proteinInteractor
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
> STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
> STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
> STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
> STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
> STACK: Bio::Graph::IO::psi_xml::next_network
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
> STACK: ./biograph.pl:18
> -----------------------------------------------------------
> 
> 
> Looking at the module code, it seems that the first 2 errors relate to a
> parameter "proteinInteractorRef", found in PSI MI version 1 but not version
> 2.5. 
>   Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single
> species seems OK, but it seems there are species names in the complete dataset
> that cause problems (error 5).
> 
> 
> Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there
> plans to get it to work with version 2.5 files from all sources (MINT and
> IntAct) ?  Googling and checking the list archives didn't give a lot of hits
> which made me think it's not a widely-used module.
> 
> thanks,
> Neil


From torsten.seemann at infotech.monash.edu.au  Mon May 22 17:53:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 23 May 2006 07:53:02 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <447232BE.1080001@infotech.monash.edu.au>

Chen Li

>  perl render_blast1.pl data1.txt >im.png

Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example 
script is creating a PNG image. The last line is:
print $panel->png;

> and Perl runs without any problem. I use adobe
> photoshop to open them and Adobe can't recognize them.
> If I use ACDSee to open them I only get a black
> background. If I issue this line under Cygwin X window
> display im.png  or display im.gif
> Cygwin says:
> display: Improper image header `im.png'.
> It seems Perl can't produce an image with right
> format.

Are you sure Perl is producing a PNG file at all?
How many bytes does im.png use? Zero?

Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ?

It says: "If you are on a Windows platform, you need to put STDOUT into 
binary mode so that the PNG file does not go through Window's carriage 
return/linefeed transformations. Before the final print statement, put 
the statement binmode(STDOUT)."

ie. your script should have

binmode(STDOUT);
print $panel->png;

as the last 2 lines.
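
If you want to check what actually ended up in the file, you can look at
its first bytes. This is a minimal sketch in plain Perl; the sub names
are made up, and the 'mangled' case assumes a text-mode write expanded
every line feed to CR LF.

```perl
use strict;
use warnings;

# The 8-byte signature every PNG file starts with.
my $PNG_MAGIC = "\x89PNG\x0d\x0a\x1a\x0a";

# Classify the first bytes of a file: 'png', 'mangled' (signature present but
# each line feed was expanded to CR LF by a text-mode write), or 'other'.
sub classify_png_header {
    my ($bytes) = @_;
    return 'png' if substr($bytes, 0, 8) eq $PNG_MAGIC;
    (my $mangled = $PNG_MAGIC) =~ s/\x0a/\x0d\x0a/g;
    return 'mangled' if index($bytes, $mangled) == 0;
    return 'other';
}

# Read the first 16 bytes of a file in raw mode and classify them.
sub classify_png_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    read $fh, my $head, 16;
    close $fh;
    $head = '' unless defined $head;
    return classify_png_header($head);
}

print classify_png_file($ARGV[0]), "\n" if @ARGV;
```

Run it with im.png as the only argument. A result of 'mangled' points at
the binmode problem, while 'other' suggests the script is not writing
PNG data at all.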

> Do you experience the same problem before?

No.

--Torsten


From chen_li3 at yahoo.com  Mon May 22 09:25:53 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 06:25:53 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <4471148C.5090404@infotech.monash.edu.au>
Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>

Dear Dr. Seemann,


Thank you very much for the reply.

I ran this line:
 perl render_blast1.pl data1.txt >im.gif
or
 perl render_blast1.pl data1.txt >im.png

and Perl runs without any problem. When I open the files in Adobe
Photoshop, it can't recognize them.
If I open them in ACDSee I only get a black
background. If I run this line in a Cygwin X window

display im.png  or display im.gif

Cygwin says:

display: Improper image header `im.png'.

or display: Improper image header `im.gif'.

It seems Perl can't produce an image in the right
format.


Have you experienced the same problem before?

Li


--- Torsten Seemann
<torsten.seemann at infotech.monash.edu.au> wrote:

> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> > $ perl render_blast1.pl data1.txt | display -
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> 
> You are piping the output of the Perl script (which
> is a GIF/PNG image) 
> into the input of a program called "display". This
> program is part of 
> the ImageMagick toolkit, standard on most Linux
> installations. Because 
> you are using Windows you probably don't have it
> installed! Try this:
> 
> $ perl render_blast1.pl data1.txt > image.gif
> 
> Then load 'image.gif' into whatever your favourite
> image viewer is.
> 	
> -- 
> Dr Torsten Seemann              
> http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash
> University, Australia
> 
> 




From chen_li3 at yahoo.com  Mon May 22 18:57:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 15:57:42 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <447232BE.1080001@infotech.monash.edu.au>
Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com>

Hi,

I tried both, with and without the statement
binmode(STDOUT) before the last line, print
$panel->png; but there is no difference. I get a file
of 2432 bytes.

Li


> Chen Li
> 
> >  perl render_blast1.pl data1.txt >im.png
> 
> Based on http://bioperl.org/wiki/HOWTO:Graphics I
> believe the example 
> script is creating a PNG image. The last line is:
> print $panel->png;
> 
> > and Perl runs without any problem. I use adobe
> > photoshop to open them and Adobe can't recognize
> them.
> > If I use ACDSee to open them I only get a black
> > background. If I issue this line under Cygwin X
> window
> > display im.png  or display im.gif
> > Cygwin says:
> > display: Improper image header `im.png'.
> > It seems Perl can't produce an image with right
> > format.
> 
> Are you sure Perl is producing a PNG file at all?
> How many bytes does im.png use? Zero?
> 
> Did you notice this in
> http://bioperl.org/wiki/HOWTO:Graphics ?
> 
> It says: "If you are on a Windows platform, you need
> to put STDOUT into 
> binary mode so that the PNG file does not go through
> Window's carriage 
> return/linefeed transformations. Before the final
> print statement, put 
> the statement binmode(STDOUT)."
> 
> ie. your script should have
> 
> binmode(STDOUT);
> print $panel->png;
> 
> as the last 2 lines.
> 
> > Do you experience the same problem before?
> 
> No.
> 
> --Torsten
> 




From barry.moore at genetics.utah.edu  Mon May 22 21:00:06 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 22 May 2006 19:00:06 -0600
Subject: [Bioperl-l] Problems with Unflattener.pm
Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>

Hi All,

NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into
an infinite recursive loop.  The trouble occurs in the method
find_best_matches between lines 2258 and 2281, and in particular the
loop is perpetuated by line 2273.  NT_113910 has a fairly complex
features table, but I have as yet been unable to figure out why
this loop is not exiting properly.  This has been submitted to
bugzilla, but I'll post here so it gets documented on the list also.
Any suggestions from Chris or others would be greatly appreciated.

This problem can be recreated as follows:

Grab NT_113910 from genbank.
bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk

Pass NT_113910.gbk on the command line to the attached script.


#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio =
     Bio::SeqIO->new(-file   => $file,
                     -format => 'GenBank');
my $out =
     Bio::SeqIO->new(-format => 'asciitree');
while (my $seq = $seqio->next_seq()) {

         # get top level unflattened SeqFeatureI objects
         $unflattener->unflatten_seq(-seq       => $seq,
                                     -use_magic => 1);
         $out->write_seq($seq);
}


From miker at biotiquesystems.com  Mon May 22 19:56:52 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 16:56:52 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike>


As best as I can tell, using Bio::SeqIO to parse a uniprot file ignores the
sequence version, and calling seq_version() on the resulting RichSeq object
returns undef.

It looks like swiss.pm is trying to parse the version out of the SV line, which
apparently doesn't exist any more?  The sequence version(s) are now specified as
part of the Date (DT) lines.  
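
As a stopgap, the version can be pulled out of the DT lines directly.
This is a minimal sketch in plain Perl and not a patch to swiss.pm; the
sub name seq_version_from_dt is made up, and the "sequence version N"
wording on the DT line is my assumption about the newer layout, with
made-up dates and numbers in the sample.

```perl
use strict;
use warnings;

# Pull the sequence version out of the DT lines of one flat-file entry.
# Returns undef if no DT line mentions a sequence version.
sub seq_version_from_dt {
    my ($entry_text) = @_;
    for my $line (split /\n/, $entry_text) {
        return $1 if $line =~ /^DT\s+.*\bsequence version (\d+)/;
    }
    return;
}

# Hypothetical DT block written in the newer style.
my $entry = <<'END_ENTRY';
DT   01-JAN-2000, integrated into UniProtKB/Swiss-Prot.
DT   01-FEB-2001, sequence version 2.
DT   01-MAR-2006, entry version 40.
END_ENTRY

my $version = seq_version_from_dt($entry);
print "sequence version: ", (defined $version ? $version : 'unknown'), "\n";
```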

Is this not a bug?  Is swiss.pm not designed to parse uniprot files?

Thanks for any help ...


From jason.stajich at duke.edu  Mon May 22 21:37:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 21:37:13 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike>
References: <002a01c67dfb$663cc600$c100a8c0@mike>
Message-ID: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>

Sounds like a "missing feature" =)

AFAIK the module was only written for swissprot files.  It is  
possible there have been changes in the format that have not been  
tracked to the current code.  We'd certainly appreciate someone  
testing it out as versions evolve.  If you submit a bug to bugzilla  
with version of bioperl and example files you can track when a fix is  
in.  We of course appreciate anyone's efforts to provide a patch as  
most bugs get fixed of late when someone gets "itchy" enough to fix  
them.

-jason

On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:

>
> As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> ignores the
> sequence version, and calling seq_version() on the resulting  
> RichSeq object
> returns undef.
>
> It looks like swiss.pm is trying to parse the version out of the SV  
> line, which
> apparently doesn't exist any more?  The sequence version(s) are now  
> specified as
> part of the Date (DT) lines.
>
> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>
> Thanks for any help ...
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Mon May 22 22:04:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 22:04:17 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>

We ask that people post patches as attachments in bugzilla so we can  
track what the bug was and why the patch fixes it.

I am not totally sure this patch works because it seems like we need  
to strip out more information now from the DT line if the $date  
actually contains more information than just the date.

If you would go ahead and create a bug in bugzilla for this  
(http://bugzilla.open-bio.org), this sort of conversation can be tracked to  
the bug.

If any of this is unclear please let us know - I thought we had put  
some pages up about this sort of thing on the wiki but maybe they  
need to be expanded.

-jason
On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Monday, May 22, 2006 6:37 PM
>> To: Michael Rogoff
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>>
>>
>> Sounds like a "missing feature" =)
>>
>> AFAIK the module was only written for swissprot files.  It is
>> possible there have been changes in the format that have not been
>> tracked to the current code.  We'd certainly appreciate someone
>> testing it out as versions evolve.  If you submit a bug to bugzilla
>> with version of bioperl and example files you can track when
>> a fix is
>> in.  We of course appreciate anyone's efforts to provide a patch as
>> most bugs get fixed of late when someone gets "itchy" enough to fix
>> them.
>>
>> -jason
>>
>> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
>>
>>>
>>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
>>> ignores the
>>> sequence version, and calling seq_version() on the resulting
>>> RichSeq object
>>> returns undef.
>>>
>>> It looks like swiss.pm is trying to parse the version out
>> of the SV
>>> line, which
>>> apparently doesn't exist any more?  The sequence version(s)
>> are now
>>> specified as
>>> part of the Date (DT) lines.
>>>
>>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>>>
>>> Thanks for any help ...
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Marc.Logghe at DEVGEN.com  Tue May 23 03:08:37 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 23 May 2006 09:08:37 +0200
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>

Hi Li,
Did you check your script for any other print statements (to STDOUT,
that is) that potentially could contaminate your png stream ?

Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
> Sent: Tuesday, May 23, 2006 12:58 AM
> To: Torsten Seemann
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] problems iwth Bio::graphics module
> 
> Hi,
> 
> I try both: either with or without this statement
>  binmode(STDOUT) before the last line print $panel->png; But 
> there are no differenes. I get a file of 2432 bytes.
> 
> Li
> 
> 
> 
> > Chen Li
> > 
> > >  perl render_blast1.pl data1.txt >im.png
> > 
> > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe 
> the example 
> > script is creating a PNG image. The last line is:
> > print $panel->png;
> > 
> > > and Perl runs without any problem. I use adobe photoshop to open 
> > > them and Adobe can't recognize
> > them.
> > > If I use ACDSee to open them I only get a black background. If I 
> > > issue this line under Cygwin X
> > window
> > > display im.png  or display im.gif
> > > Cygwin says:
> > > display: Improper image header `im.png'.
> > > It seems Perl can't produce an image with right format.
> > 
> > Are you sure Perl is producing a PNG file at all?
> > How many bytes does im.png use? Zero?
> > 
> > Did you notice this in
> > http://bioperl.org/wiki/HOWTO:Graphics ?
> > 
> > It says: "If you are on a Windows platform, you need to put STDOUT 
> > into binary mode so that the PNG file does not go through Window's 
> > carriage return/linefeed transformations. Before the final print 
> > statement, put the statement binmode(STDOUT)."
> > 
> > ie. your script should have
> > 
> > binmode(STDOUT);
> > print $panel->png;
> > 
> > as the last 2 lines.
> > 
> > > Do you experience the same problem before?
> > 
> > No.
> > 
> > --Torsten
> > 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection 
> around http://mail.yahoo.com 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From chen_li3 at yahoo.com  Tue May 23 09:27:06 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 06:27:06 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>
Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>

Dear Dr. Logghe,

Thank you so much. After following your suggestion, the
script now works under Cygwin. Here are the
last two lines:

either binmode (STDOUT);
       print STDOUT $panel->png;

or only print STDOUT $panel->png;

They both work for me. I know that Perl prints to
STDOUT by default, so I don't know why it works once
STDOUT is added after print. Could you explain it?

BTW I copied this script from the Graphics HOWTO on the
Bioperl website, and only one line contains a print
statement, which is 'print $panel->png'. 

Once again thank you so much,

Li

--- Marc Logghe <Marc.Logghe at devgen.com> wrote:

> Hi Li,
> Did you check your script for any other print
> statements (to STDOUT,
> that is) that potentially could contaminate your png
> stream ?
> 
> Marc
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org 
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On
> Behalf Of chen li
> > Sent: Tuesday, May 23, 2006 12:58 AM
> > To: Torsten Seemann
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] problems iwth
> Bio::graphics module
> > 
> > Hi,
> > 
> > I try both: either with or without this statement
> >  binmode(STDOUT) before the last line print
> $panel->png; But 
> > there are no differenes. I get a file of 2432
> bytes.
> > 
> > Li
> > 
> > 
> > 
> > > Chen Li
> > > 
> > > >  perl render_blast1.pl data1.txt >im.png
> > > 
> > > Based on http://bioperl.org/wiki/HOWTO:Graphics
> I believe 
> > the example 
> > > script is creating a PNG image. The last line
> is:
> > > print $panel->png;
> > > 
> > > > and Perl runs without any problem. I use adobe
> photoshop to open 
> > > > them and Adobe can't recognize
> > > them.
> > > > If I use ACDSee to open them I only get a
> black background. If I 
> > > > issue this line under Cygwin X
> > > window
> > > > display im.png  or display im.gif
> > > > Cygwin says:
> > > > display: Improper image header `im.png'.
> > > > It seems Perl can't produce an image with
> right format.
> > > 
> > > Are you sure Perl is producing a PNG file at
> all?
> > > How many bytes does im.png use? Zero?
> > > 
> > > Did you notice this in
> > > http://bioperl.org/wiki/HOWTO:Graphics ?
> > > 
> > > It says: "If you are on a Windows platform, you
> need to put STDOUT 
> > > into binary mode so that the PNG file does not
> go through Window's 
> > > carriage return/linefeed transformations. Before
> the final print 
> > > statement, put the statement binmode(STDOUT)."
> > > 
> > > ie. your script should have
> > > 
> > > binmode(STDOUT);
> > > print $panel->png;
> > > 
> > > as the last 2 lines.
> > > 
> > > > Do you experience the same problem before?
> > > 
> > > No.
> > > 
> > > --Torsten
> > > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection 
> > around http://mail.yahoo.com 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From lstein at cshl.edu  Tue May 23 10:06:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 10:06:27 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <200605231006.28392.lstein@cshl.edu>

Hi,

It is possible that your version of display can't handle PNG images. Try 
saving the output as a file and then opening it in another image program:

	perl render_blast1.pl data1.txt > data1.png

Another thing to watch out for is that, depending on what version of Perl 
you're using, you may have to insert this statement into the render_blast1.pl 
script (somewhere near the top):

	binmode STDOUT;

Lincoln


On Saturday 20 May 2006 20:15, chen li wrote:
> Dear all,
>
>
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
>
>
> $ perl render_blast1.pl data1.txt | display -
>
> And here is the result:
>
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
>
> Any idea?
>
>
> Thank you very much,
>
> Li
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From Derek.Fairley at bll.n-i.nhs.uk  Tue May 23 10:39:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Tue, 23 May 2006 15:39:16 +0100
Subject: [Bioperl-l] Bio::Restriction::IO query
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C04019F@bllmail.bll.n-i.nhs.uk>

Hi folks,

I'm new to BioPerl, and struggling to make the Bio::Restriction::*
modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically,
I'm having some trouble understanding the behaviour of the
Bio::Restriction::IO module. I'm trying to use this to create a
Bio::Restriction::EnzymeCollection object from a local REBASE file
(which is in bairoch-format); this will in turn be passed to a
Bio::Restriction::Analysis object.

The following test script (derived from the Bio::Restriction::IO
perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new(	-file => "REBASE_file",
						-format =>'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar
$collection->each_enzyme, "\n";

# Note that using -format=>'bairoch' without capitalisation (as shown in
# the perldoc synopsis) throws an exception: Failed to load module
# Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of
enzymes in the default enzyme set - regardless of the number of enzymes
in the file. A default Bio::Restriction::EnzymeCollection object has
presumably been created (as the 'read()' and 'each_enzyme' methods are
available) but it didn't come from the local file. The result is the
same if the Bio::Restriction::IO->new() method is called with no
arguments - a default EnzymeCollection object is created. It's not clear
to me where this has come from.

My (mis?)understanding was that the default set of enzymes would be
loaded on creation of a new Bio::Restriction::Analysis object (in the
absence of a -enzymes=>... argument). Presumably this is down to my poor
understanding of the BioPerl object model... ;-)

So: how should I create an EnzymeCollection object from file?

Any help or advice would be gratefully received.

PS. Congratulations to the development team for creating a very
impressive and useful open source toolkit.

Derek.


-----------------------------------------
Derek Fairley, Ph.D.
Regional Virus Laboratory,
Kelvin Building,
Royal Victoria Hospital, 
Grosvenor Road,
Belfast,
N. Ireland.
BT12 6BA

Tel. +44 (0)2890 635303


From rowan.mitchell at bbsrc.ac.uk  Tue May 23 10:53:42 2006
From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth))
Date: Tue, 23 May 2006 15:53:42 +0100
Subject: [Bioperl-l] Assembly::IO ace output
Message-ID: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>

Hi 

I am very interested in writing ace format files and had assumed that I
would be able to do this with Assembly::IO until I tried it! I see there
has been some correspondence last year on this, but as far as I can see
this is still not implemented in 1.5.1. Is this correct ? Is it planned
to be included; are there modules under development available ?

many thanks

Rowan

===============================================
Dr Rowan Mitchell
Rothamsted Research
Harpenden
Herts AL5 2JQ UK

Tel: +44 (0)1582 763133 x2469
Fax: +44 (0)1582 763010
E-mail: rowan.mitchell at bbsrc.ac.uk
WWW: http://www.rothamsted.bbsrc.ac.uk/
===============================================
Rothamsted Research is a company limited by guarantee, registered in
England under the registration number 2393175 and a not for profit
charity number 802038.


From rfsouza at cecm.usp.br  Tue May 23 16:17:36 2006
From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S})
Date: Tue, 23 May 2006 17:17:36 -0300
Subject: [Bioperl-l] Assembly::IO ace output
In-Reply-To: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
References: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
Message-ID: <20060523201736.GA28401@cecm.usp.br>

Hi Rowan,

On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote:
> Hi 
> 
> I am very interested in writing ace format files and had assumed that I
> would be able to do this with Assembly::IO until I tried it! I see there
> has been some correspondence last year on this, but as far as I can see
> this is still not implemented in 1.5.1. Is this correct ? Is it planned
> to be included; are there modules under development available ?

As far as I know, there are no plans to add write support to
Bio::Assembly::IO. When I wrote the original modules there was no need
for this so I left it aside.

Best regards,
Robson

> many thanks
> 
> Rowan
> 
> ===============================================
> Dr Rowan Mitchell
> Rothamsted Research
> Harpenden
> Herts AL5 2JQ UK
> 
> Tel: +44 (0)1582 763133 x2469
> Fax: +44 (0)1582 763010
> E-mail: rowan.mitchell at bbsrc.ac.uk
> WWW: http://www.rothamsted.bbsrc.ac.uk/
> ===============================================
> Rothamsted Research is a company limited by guarantee, registered in
> England under the registration number 2393175 and a not for profit
> charity number 802038.
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From lstein at cshl.edu  Tue May 23 16:53:34 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 16:53:34 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
	<200605231006.28392.lstein@cshl.edu>
Message-ID: <200605231653.36087.lstein@cshl.edu>

Hi Chen,

It looks to me like you cut and pasted the data1.txt file from the web site, 
consequently replacing the tabs with spaces. Please get table1.txt from the 
BioPerl distribution, as instructed in the tutorial.

Best,

Lincoln

On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> Hi,
>
> It is possible that your version of display can't handle PNG images. Try
> saving the output as a file and then opening it in another image program:
>
> 	perl render_blast1.pl data1.txt > data1.png
>
> Another thing to watch out for is that, depending on what version of Perl
> you're using, you may have to insert this statement into the
> render_blast1.pl script (somewhere near the top):
>
> 	binmode STDOUT;
>
> Lincoln
>
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From chen_li3 at yahoo.com  Tue May 23 17:46:16 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 14:46:16 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231653.36087.lstein@cshl.edu>
Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com>

Dear Dr. Stein,

Thank you so much. I followed your suggestions and
downloaded the code from the Bioperl CVS website. Now
everything is working.


Li 


--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi Chen,
> 
> It looks to me like you cut and paste the data1.txt
> file from the web site, 
> consequently replacing the tabs with spaces. Please
> get table1.txt from the 
> BioPerl distribution, as instructed in the tutorial.
> 
> Best,
> 
> Lincoln
> 
> On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> > Hi,
> >
> > It is possible that your version of display can't
> handle PNG images. Try
> > saving the output as a file and then opening it in
> another image program:
> >
> > 	perl render_blast1.pl data1.txt > data1.png
> >
> > Another thing to watch out for is that, depending
> on what version of Perl
> > you're using, you may have to insert this
> statement into the
> > render_blast1.pl script (somewhere near the top):
> >
> > 	binmode STDOUT;
> >
> > Lincoln
> >
> > On Saturday 20 May 2006 20:15, chen li wrote:
> > > Dear all,
> > >
> > >
> > > I try one script from GraphicsHowTo under Cygwin
> > > environment(GD and libpng already installed). I
> type
> > > this line in Cygwin X window:
> > >
> > >
> > > $ perl render_blast1.pl data1.txt | display -
> > >
> > > And here is the result:
> > >
> > > display: no decode delegate for this image
> format
> > > `/tmp/magick-qKiRPDRS'.
> > >
> > > Any idea?
> > >
> > >
> > > Thank you very much,
> > >
> > > Li
> > >
> > >
> > >
> __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > > http://mail.yahoo.com
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 




From chen_li3 at yahoo.com  Tue May 23 18:59:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 15:59:46 -0700 (PDT)
Subject: [Bioperl-l] How to download sequence files either in EMBL format
Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com>

Hi all,

I need to download the sequence for a gene. I went to the
NCBI website, found the gene of interest, and downloaded the
file in GenBank format (saved as sequence.genbank). But
to my surprise this so-called GenBank format file
doesn't contain many features, such as exons, compared
to the one in Ensembl. 

My question: where can I download this sequence file
in EMBL format? It looks like the one in EMBL might
contain other information, such as exons.

Thank you very much,

Li



From osborne1 at optonline.net  Wed May 24 10:33:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 24 May 2006 10:33:16 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>
Message-ID: <C099E6EC.88F0%osborne1@optonline.net>

Li,

The Graphics HOWTO talks about this Windows workaround in _four_ different
places, it's impossible to miss if you read it from start to finish. This is
what one should do if one wants to use these modules and one is a novice.
Example:

Important! Remember that if you are on a Windows platform, you need to put
STDOUT into binary mode so that the PNG file does not go through Window's
carriage return/linefeed transformations. Before the final print statement,
write binmode(STDOUT).
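
To see why the workaround matters, here is a minimal sketch (not taken from
the HOWTO) that pushes the 8-byte PNG signature through a newline-translating
layer and through a raw one. The helper name bytes_on_disk is made up for
this illustration, and the explicit ':crlf' layer only stands in for what a
text-mode STDOUT does on Windows; it needs core Perl only, no GD.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The PNG signature contains both a CR LF pair and a bare LF.
my $png_signature = "\x89PNG\x0D\x0A\x1A\x0A";

# Hypothetical helper: write $bytes through the given PerlIO layer,
# then read the file back untranslated to see what really landed on disk.
sub bytes_on_disk {
    my ($layer, $bytes) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, ">$layer", $path or die "Cannot write $path: $!";
    print {$out} $bytes;
    close $out or die "Cannot close $path: $!";

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                       # slurp the whole file
    my $written = <$in>;
    close $in;
    return $written;
}

# ':crlf' mimics a text-mode handle; ':raw' is what binmode() gives you.
my $text_mode   = bytes_on_disk(':crlf', $png_signature);
my $binary_mode = bytes_on_disk(':raw',  $png_signature);

for my $case ([':crlf', $text_mode], [':raw', $binary_mode]) {
    my ($layer, $written) = @$case;
    printf "%-6s %2d bytes, signature %s\n",
        $layer, length($written),
        ($written eq $png_signature ? 'intact' : 'corrupted');
}
```

Any image viewer checks that signature first, so a file written through the
translating layer is rejected before any pixel data is read. Calling
binmode(STDOUT) before print $panel->png avoids the translation.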

Brian O.


On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com> wrote:

> BTW I copy  this script from GraphicsHowTo on Bioperl
> website and only one line contains print statement,
> which is 'print $panel->png'. 


From chen_li3 at yahoo.com  Wed May 24 12:17:15 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 24 May 2006 09:17:15 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <C099E6EC.88F0%osborne1@optonline.net>
Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com>

Thanks, but Dr. Stein has already helped me figure out
what was going on: I should have copied the source
code for the examples from CVS instead of "cut and
paste" from the HOWTO tutorial. Sorry for any
inconvenience.

Li

--- Brian Osborne <osborne1 at optonline.net> wrote:

> Li,
> 
> The Graphics HOWTO talks about this Windows
> workaround in _four_ different
> places, it's impossible to miss if you read it from
> start to finish. This is
> what one should do if one wants to use these modules
> and one is a novice.
> Example:
> 
> Important! Remember that if you are on a Windows
> platform, you need to put
> STDOUT into binary mode so that the PNG file does
> not go through Window's
> carriage return/linefeed transformations. Before the
> final print statement,
> write binmode(STDOUT).
> 
> Brian O.
> 
> 
> On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com>
> wrote:
> 
> > BTW I copy  this script from GraphicsHowTo on
> Bioperl
> > website and only one line contains print
> statement,
> > which is 'print $panel->png'. 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From ULNJUJERYDIX at spammotel.com  Wed May 24 21:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi
Thanks for the help offered thus far!
I am trying to annotate TFBS on a -1000 to +50 bp promoter sequence using
BioPerl, and I was asked to have the ruler numbered accordingly (starting at
-1000). Is there any way at all to do this in BioPerl without changing the
.pm file?

thanks guys..
kevin


From cjfields at uiuc.edu  Thu May 25 12:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live.  Seems to get
caught up in the loop during recursion: debugging shows it is unable to get
past 'find_best_matches: (/15)'.  There are lots of unmatched pairs here
with this sequence, so could that be the problem?  I'm not terribly familiar
with Unflattener...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Barry Moore
> Sent: Monday, May 22, 2006 8:00 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Problems with Unflattener.pm
> 
> Hi All,
> 
> NT_113910 appears to throw Bio::SeqFeatures::Tools::Unflattener into
> an infinite recursive loop.  The trouble occurs in the method
> find_best_matches between lines 2258 and 2281, and in particular the
> loop is perpetuated by line 2273.   NT_113910 has a fairly complex
> features table, and but I have as yet been unable to figure out why
> this loop is not exiting properly.  This has been submitted to
> bugzilla, but I'll post here so it gets documented on the list also.
> Any suggestions from Chris or others would be greatly appreciated.
> 
> This problem can be recreated as follows:
> 
> Grab NT_113910 from genbank.
> bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
> 
> Pass NT_113910.gbk on the command line to the attached script.
> 
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
> 
> my $file = shift;
> 
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
> 
> # first fetch a genbank SeqI object
> my $seqio =
>      Bio::SeqIO->new(-file   => $file,
>                      -format => 'GenBank');
> my $out =
>      Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
> 
>          # get top level unflattened SeqFeatureI objects
>          $unflattener->unflatten_seq(-seq       => $seq,
>                                      -use_magic => 1);
>          $out->write_seq($seq);
> }
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu May 25 15:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this).  

>From UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) is
this tidbit:
----------------------------------------------------------
 UniProtKB release 7.0 of 07-Feb-2006

    Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.

----------------------------------------------------------

We should probably explain on the swissprot wiki page that the format is in a
state of flux at the moment.  I've added this tidbit to the bug page (#2003)
as well.
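
For illustration, here is a minimal sketch of how the three new-style DT
lines quoted above could be picked apart in plain Perl.  It is not the code
in Bio::SeqIO::swiss; parse_dt_lines() and the hash keys are hypothetical
names chosen for this example, and only the new format from the release
notes is handled.

```perl
use strict;
use warnings;

# parse_dt_lines() is a hypothetical helper, not part of Bio::SeqIO::swiss.
# It understands only the new-style DT lines from the release notes above.
sub parse_dt_lines {
    my @lines = @_;
    my %info;
    for my $line (@lines) {
        # "DT   DD-MMM-YYYY, <description>."
        next unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4}),\s+(.+?)\.\s*$/;
        my ($date, $what) = ($1, $2);
        if ($what =~ /^integrated into (\S+)$/) {
            @info{qw(created database)} = ($date, $1);
        }
        elsif ($what =~ /^sequence version (\d+)$/) {
            @info{qw(seq_date seq_version)} = ($date, $1);
        }
        elsif ($what =~ /^entry version (\d+)$/) {
            @info{qw(entry_date entry_version)} = ($date, $1);
        }
    }
    return \%info;
}

my $info = parse_dt_lines(
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
);
print "$_ = $info->{$_}\n" for sort keys %$info;
```

Keeping the date and the version number in separate fields is the point of
the sketch; how these would map onto RichSeq attributes is left open here.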

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Monday, May 22, 2006 9:04 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> We ask that people post patches to the bugzilla as an attachment to
> the bugzilla so we can track what and why the bug was that the patch
> fixes.
> 
> I am not totally sure this patch works because it seems like we need
> to strip out more information now from the DT line if the $date
> actually contains more information than just the date.
> 
> If you would go ahead and create a bug in bugzilla for  this (http://
> bugzilla.open-bio.org) this sort of conversation can be tracked to
> the bug.
> 
> If any of this is unclear please let us know - I though we had put
> some pages up about this sort of thing on the wiki but maybe they
> need to be expanded.
> 
> -jason
> On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:
> 
> > I have a patch that seems to work but I'm not familiar with the
> > proper method to
> > "provide" it.  How do I go about that?
> >
> > The patch is pretty simple, it just parses the sequence version out
> > of the date
> > line where it now hides:
> >
> >          #date
> >          elsif( /^DT\s+(.*)/ ) {
> >            my $date = $1;
> > +
> > +          if ($date =~ /sequence version (\d+)/i) {
> > +              $params{'-seq_version'} ||= $1;
> > +          }
> > +
> >            $date =~ s/\;//;
> >            $date =~ s/\s+$//;
> >            push @{$params{'-dates'}}, $date;
> >          }
> >
> > By the way, what is the difference between Bio::Seq::version and
> > Bio::Seq::RichSeq::seq_version?
> >
> >
> >> -----Original Message-----
> >> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >> Sent: Monday, May 22, 2006 6:37 PM
> >> To: Michael Rogoff
> >> Cc: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> >>
> >>
> >> Sounds like a "missing feature" =)
> >>
> >> AFAIK the module was only written for swissprot files.  It is
> >> possible there have been changes in the format that have not been
> >> tracked to the current code.  We'd certainly appreciate someone
> >> testing it out as versions evolve.  If you submit a bug to bugzilla
> >> with version of bioperl and example files you can track when
> >> a fix is
> >> in.  We of course appreciate anyone's efforts to provide a patch as
> >> most bugs get fixed of late when someone gets "itchy" enough to fix
> >> them.
> >>
> >> -jason
> >>
> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> >>
> >>>
> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
> >>> ignores the
> >>> sequence version, and calling seq_version() on the resulting
> >>> RichSeq object
> >>> returns undef.
> >>>
> >>> It looks like swiss.pm is trying to parse the version out
> >> of the SV
> >>> line, which
> >>> apparently doesn't exist any more?  The sequence version(s)
> >> are now
> >>> specified as
> >>> part of the Date (DT) lines.
> >>>
> >>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >>>
> >>> Thanks for any help ...
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 


From miker at biotiquesystems.com  Mon May 22 21:51:10 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 18:51:10 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>
Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike>

I have a patch that seems to work but I'm not familiar with the proper method to
"provide" it.  How do I go about that?

The patch is pretty simple, it just parses the sequence version out of the date
line where it now hides:

         #date
         elsif( /^DT\s+(.*)/ ) {
           my $date = $1;
+
+          if ($date =~ /sequence version (\d+)/i) {
+              $params{'-seq_version'} ||= $1;
+          }
+
           $date =~ s/\;//;
           $date =~ s/\s+$//;
           push @{$params{'-dates'}}, $date;
         }

By the way, what is the difference between Bio::Seq::version and
Bio::Seq::RichSeq::seq_version?


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Monday, May 22, 2006 6:37 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> 
> Sounds like a "missing feature" =)
> 
> AFAIK the module was only written for swissprot files.  It is  
> possible there have been changes in the format that have not been  
> tracked to the current code.  We'd certainly appreciate someone  
> testing it out as versions evolve.  If you submit a bug to bugzilla  
> with version of bioperl and example files you can track when 
> a fix is  
> in.  We of course appreciate anyone's efforts to provide a patch as  
> most bugs get fixed of late when someone gets "itchy" enough to fix  
> them.
> 
> -jason
> 
> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> 
> >
> > As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> > ignores the
> > sequence version, and calling seq_version() on the resulting  
> > RichSeq object
> > returns undef.
> >
> > It looks like swiss.pm is trying to parse the version out 
> of the SV  
> > line, which
> > apparently doesn't exist any more?  The sequence version(s) 
> are now  
> > specified as
> > part of the Date (DT) lines.
> >
> > Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >
> > Thanks for any help ...
> >
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 


From chen_li3 at yahoo.com  Tue May 23 11:48:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 08:48:46 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com>

Dear Dr. Stein,

I got the job partially done by adding this line
(under Cygwin):

print STDOUT $panel->png;

It works in that I can now produce an image that other
programs can view, but only partially, because the image
is not exactly the same as the one shown on the website.
Enclosed is the image I get.
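
For illustration, the two suggestions quoted below (write the image to a
file, and set binmode) can be combined in a small stand-alone sketch.  It
does not use Bio::Graphics at all: write_binary() is a hypothetical helper,
and the PNG signature stands in for the data that $panel->png would return
in the real script.

```perl
use strict;
use warnings;

# write_binary() is a hypothetical helper; in the real script the data
# would come from $panel->png rather than a hard-coded string.
sub write_binary {
    my ($path, $data) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    binmode $fh;              # disable newline translation on this handle
    print {$fh} $data;
    close $fh or die "Cannot close $path: $!";
    return -s $path;          # size on disk, in bytes
}

# The 8-byte PNG signature contains both "\r\n" and "\n", so it is a
# handy probe for unwanted newline translation.
my $signature = "\x89PNG\r\n\x1a\n";
my $size = write_binary('probe.png', $signature);
print "wrote $size bytes\n";
```

If the reported size differs from 8, something between the script and the
file is rewriting line endings, which would also corrupt a real PNG.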

Thank you,

Li

--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi,
> 
> It is possible that your version of display can't
> handle PNG images. Try 
> saving the output as a file and then opening it in
> another image program:
> 
> 	perl render_blast1.pl data1.txt > data1.png
> 
> Another thing to watch out for is that, depending on
> what version of Perl 
> you're using, you may have to insert this statement
> into the render_blast1.pl 
> script (somewhere near the top):
> 
> 	binmode STDOUT;
> 
> Lincoln
> 
> 
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: im1
Type: image/x-png
Size: 2423 bytes
Desc: 2615755531-im1
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060523/6870f840/attachment-0002.bin>

From cjfields at uiuc.edu  Thu May 25 21:28:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 20:28:14 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <D422B7D5-C92D-436A-8385-01CFD306DFA8@uiuc.edu>

This patch works only for the recent change in swissprot seq format  
for sequence versions on the DT line.  I checked it out vs the test  
data provided with bioperl (t\data\swiss.dat).  I did manage to get  
it working for both old and new using a modification to your patch  
but there's another issue; using $seq->get_dates, which should only  
show dates, shows the entire line (date and version info).  Jason  
mentioned that there needs to be a better way to address this which  
I'm looking into.
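
For illustration, one possible way to keep the stored dates clean is to
split each DT line body into a bare date and an optional sequence version
before anything is pushed onto the dates list.  This is only a sketch:
split_dt() is a hypothetical helper, not the modification mentioned above,
and the old-style example line is an assumption about the pre-7.0 layout.

```perl
use strict;
use warnings;

# split_dt() is a hypothetical helper: given the text that follows "DT   ",
# it returns the bare date and, when present, the sequence version.
sub split_dt {
    my ($body) = @_;
    my ($date) = $body =~ /^(\d{2}-[A-Z]{3}-\d{4})/i
        or return;                       # no leading date: give up
    my ($seq_version) = $body =~ /sequence version (\d+)/i;
    return ($date, $seq_version);        # version is undef if absent
}

for my $body (
    '15-OCT-2001, sequence version 3.',             # new style
    '01-FEB-1996 (Rel. 33, Last sequence update)',  # assumed old style
) {
    my ($date, $ver) = split_dt($body);
    printf "%s  seq_version=%s\n", $date, defined $ver ? $ver : 'n/a';
}
```

With a split like this, only the bare date would go into the dates list
and the version would be set separately, so the two no longer share a
field.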

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri May 26 10:38:29 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 26 May 2006 10:38:29 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <200605261038.30380.lstein@cshl.edu>

Hi,

For some reason I didn't see the first posting on this. In current bioperl 
live, the ruler can have negative numberings - I use this routinely. You need 
to create a feature that starts in negative coordinates. What is happening to 
you when you try this?

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jelenaob at gmail.com  Fri May 26 12:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading enzyme list from a file REBASE bairoch.605 using
Bio::Restriction::IO;

1. But for some reason the number of enzymes in the list is always 532
which is a default set of enzymes in enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
|| die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is that using a lower-case format name
throws an exception, but when the "B" is capitalized it is OK.
I assume it cannot load the file and does not initialize the enzyme
collection properly.

Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?


Thanks in advance,


Jelena Obradovic
jelenaob at gmail.com


From cjfields at uiuc.edu  Fri May 26 15:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Hi there,
> 
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
> 
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
> 
> Is there any known issue with this module or a workaround?
> 
> And here is the code I have been using:
> 
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
> format=>"Bairoch")
> || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
 
my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "Bairoch");

should be

my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
it requires a specific format, which I believe is case-sensitive).  Judging by
the modules in the Bio/Restriction/IO directory, the format should be one of
bairoch, itype2, or withrefm.  You can also build your own if needed, using
those as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initilize enzyme
> collection properly.
> 
> Can't call method "each_enzyme" on an undefined value at
> .../cgi-bin/seq-load.pl line 51.

My guess?  The reason it works with an uppercase ('Bairoch') is that it
can't find the module and uses the default set of enzymes as a fallback.
The exception that you reported when you use lowercase ('bairoch') is real
and I reported it as a bug (there are a few I found in that module).

You might want to try using one of the other formats if you can get the
files in the right format from REBASE.  I'm looking into the bugs
specifically associated with Bio::Restriction::IO::bairoch.

> Any thoughts?
> 
> 
> Thanks in advance,
> 
> 
> Jelena Obradovic
> jelenaob at gmail.com
> 


From osborne1 at optonline.net  Fri May 26 15:43:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 26 May 2006 15:43:18 -0400
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <C09CD296.8961%osborne1@optonline.net>

Chris,

SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
should work). This is what the documentation says and what the code seems to
suggest. This is probably what the Restriction modules should do as well.
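
For illustration, case-insensitive handling of the format argument could
look roughly like the sketch below.  normalize_format() is a hypothetical
helper, not an existing Bio::Restriction::IO or Bio::SeqIO method, and the
list of names is simply the three module names mentioned earlier in this
thread.

```perl
use strict;
use warnings;

# Format names taken from the module files mentioned earlier in the thread.
my %KNOWN = map { $_ => 1 } qw(bairoch itype2 withrefm);

# normalize_format() is a hypothetical helper: it lower-cases the name
# and rejects anything that is not a known format.
sub normalize_format {
    my ($format) = @_;
    my $name = lc $format;
    die "Unknown format '$format'\n" unless $KNOWN{$name};
    return $name;
}

print normalize_format($_), "\n" for qw(bairoch Bairoch BAIROCH);
```

Dying on an unknown name, rather than silently falling back to a default
enzyme set, would also have made the original problem easier to spot.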

Brian O.


From cjfields at uiuc.edu  Fri May 26 16:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <C09CD296.8961%osborne1@optonline.net>
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad.  Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment.  None of them implements a write
method, though the POD implies that writing works:

SYNOPSIS

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet.  In fact, the
tests are pretty confusing; when did we allow this syntax: '-format => 8'?
Anyway, I'm muddling my way through this and will probably write something
up for the project priority list if I can't work this bug out.  

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Chris,
> 
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to
> suggest. This is probably what the Restriction modules should do as well.
> 
> Brian O.
> 
> 


From andreas.bender at complife.org  Fri May 26 10:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID: <e83118520605260750w3e66286bmbd6a14be3d2299d6@mail.gmail.com>

Dear All,

Has anyone here implemented some cool programs/tools using Bioperl? Or
is there someone from the Bioperl core team who wants to present
Bioperl itself at our conference? We are holding a "free software"
session (free at least as in free beer, ideally also open source under
a GNU-type license) at our "Computational Life Sciences" conference in
Cambridge/UK later this year, and you are warmly welcome to present
your software there. Please contact me directly or visit the website
if you have any questions.

Enjoy the weekend,
Andreas


                                  Call for Contributions
==================================================
               LIFE SCIENCE FREE SOFTWARE SESSION

          held at CompLife 2006 (http://www.complife.org)
     in Cambridge, United Kingdom, on September 27 - 29, 2006
==================================================
In recent years more and more free and open source software has been
developed for chemo- and bioinformatics, molecular modelling or other
Life Science applications, but many of the programs are not well
known. During the CompLife 2006 conference we will organize a special
session dedicated to this type of free software. The demo session will
be preceded by a short session with room for brief introductory
presentations, whereas the demo session itself will allow attendees to
see the tools in action. Authors of free software will have the
opportunity to present their program to the CompLife audience, which
will consist of researchers and users from computer science, biology,
chemistry and everything in between.

If you are interested in the free software session, send us an
email at fss at complife.org and briefly describe your program and how
you intend to present it at the conference (1-2 pages max; please
include a URL to a downloadable version where available). The only
restrictions are that the program must be freely available to
everyone, or even open source, and that it must be related to Life
Science applications. The deadline for these proposals is June 16th,
2006. In mid-July we will notify you whether your software demo was
accepted.
************************

-- 
Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006:
Visit http://www.complife.org for more information!

Andreas Kieron Patrick Bender - http://www.andreasbender.de
Novartis Institutes for BioMedical Research, Cambridge/MA


From cjfields at uiuc.edu  Fri May 26 17:19:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 16:19:08 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine>

The POD documentation is a bit misleading for Bio::Restriction::IO.  Brian's
right, there needs to be more flexibility with the case for the formats
used.  I found a few other odd things as well which I may file bug reports
for.  Looks like another post for the project priority list.

 
Chris

 


From jay at jays.net  Sat May 27 12:47:27 2006
From: jay at jays.net (Jay Hannah)
Date: Sat, 27 May 2006 11:47:27 -0500
Subject: [Bioperl-l] "Project OpenLab" (working title)
Message-ID: <4478829F.5030508@jays.net>

Hola --

We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)

   "Project OpenLab":
   http://omaha.pm.org/kwiki/?BioPerl

- Does any such project already exist? 
- If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
- I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
- I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
- I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
- I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.

Thanks for your time,

j


From fernan at iib.unsam.edu.ar  Sat May 27 18:30:44 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sat, 27 May 2006 19:30:44 -0300
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar>

+----[ Jay Hannah <jay at jays.net> (27.May.2006 15:15):
|
| Hola --

Hola!

| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
| 
|    "Project OpenLab":
|    http://omaha.pm.org/kwiki/?BioPerl
| 
| - Does any such project already exist? 

mmm ... maybe ... both GUS (Genomics Unified Schema:
gusdb.org, though not developed around bioperl) and GMOD
(Generic Model Organism Database: gmod.org) provide you with 
i) RDBMS storage
ii) a Perl object layer
iii) a web app framework

Though certainly overkill for the needs you describe
in the wiki, they can be customized to work in the way you
describe or at least serve as a guide.

| - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 

Have you considered Perl Catalyst? It has the benefits of
allowing you to work with bioperl modules naturally (it's
Perl!) a choice of templating toolkits (Template Toolkit, Mason,
among others) and will provide you with an almost ready to
go controller/url dispatcher.

| - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
| - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
| - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
| - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
| 
| Thanks for your time,
| 
| j
|
+----]

Good luck,

Fernan


From epsteinj at mail.nih.gov  Fri May 26 14:46:32 2006
From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E])
Date: Fri, 26 May 2006 14:46:32 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler
	havenegative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov>

While this is being discussed and we have Lincoln's attention; in example 4 on the Biographics Howto:
   http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html
how can one assign directional arrows to the graded segments which represent the BLAST hits?  I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'?  What other techniques do you recommend for associating directionality with these hits?

Thanks&regards,

Jonathan


From jobradovic at gmail.com  Fri May 26 16:55:35 2006
From: jobradovic at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 13:55:35 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>

Hi guys, I tried the other formats, and it works fine with the "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena

On 5/26/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> > Sent: Friday, May 26, 2006 11:47 AM
> > To: Bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> >
> > Hi there,
> >
> > I have tried loading enzyme list from a file REBASE bairoch.605 using
> > Bio::Restriction::IO;
> >
> > 1. But for some reason the number of enzymes in the list is always 532
> > which is a default set of enzymes in enzyme collection.
> >
> > Is there any known issue with this module or a workaround?
> >
> > And here is the code I have been using:
> >
> > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
> >                                     -format=>"Bairoch")
> > || die "can't load the file bairoch.605: $!";
> > my $enzymes = $re_in->read;
> > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
>                                     -format=>"Bairoch");
>
> should be
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
>                                     -format=>"bairoch");
>
> Note the case change for the format; this is noted in the bug report you
> submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO
> (i.e. requires a specific format, which I believe is case-sensitive).
> Judging by the modules in Bio/Restriction/IO directory, looks like the
> Bio::Restriction::IO format should match one of the following formats:
> bairoch, itype2, withrefm, and you can also build your own if needed using
> the previous as examples and implementing Bio::Restriction::IO::base.
>
> > 2. The other problem is that when I use a lower-case format name
> > it throws an exception, but when "B" is capitalized it is OK.
> > I assume it cannot load the file and does not initialize the enzyme
> > collection properly.
> >
> > Can't call method "each_enzyme" on an undefined value at
> > .../cgi-bin/seq-load.pl line 51.
>
> My guess?  The reason it works with an uppercase ('Bairoch') is that it
> can't find the module and uses the default set of enzymes as a fallback.
> The exception that you reported when you use lowercase ('bairoch') is real
> and I reported it as a bug (there are a few I found in that module).
>
> You might want to try using one of the other formats if you can get the
> files in the right format from REBASE.  I'm looking into the bugs
> specifically associated with Bio::Restriction::IO::bairoch.
>
> > Any thoughts?
> >
> >
> > Thanks in advance,
> >
> >
> > Jelena Obradovic
> > jelenaob at gmail.com
> >
>
>


-- 
Jelena Obradovic
Email: jobradovic at gmail.com


From gad14 at cornell.edu  Fri May 26 16:02:33 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Fri, 26 May 2006 16:02:33 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
Message-ID: <44775ED9.4020208@cornell.edu>

Hi,

I'm running local blast with Bio::Tools::Run::StandAloneBlast. 
Everything seems to work ok up to the point of accessing the results. I 
am able to print the results but when I try to do more than one thing 
with the result, nothing is returned for the second activity..

I'd like to first sort the results into groups of results that hit the 
db seq once, twice, three times, etc - where the results are stored as 
SeqFeature objects in temporary arrays whose contents are printed 
sequentially to stdout when the whole sort is complete.

Secondly, I need to print the results in Hit Table (i.e. -m 8) format to 
stdout.

If I've sorted the results, the sorted results will print to screen;
however, when I try to print the Hit Table results nothing is returned,
as if the blast results have evaporated... and vice versa: if I comment
out the part where I point my sorting subroutine to the blast results
reference, my hit table results suddenly print to screen. It's almost
as if the reference to the SearchIO obj that holds the StandAloneBlast
results is lost after one use?? (I'm beginning to think there is
something naive about the way I'm using references?..)


Here's an abbreviated version of my code:


my $ref_seq_objs; # ref to array of Sequence obj's
my $genome_seq; # fasta containing 1 genomic sequence

my @params = ('program'  => 'blastn',
              'database' => $genome_seq,
             );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $blast_report = $factory->blastall($ref_seq_objs); #OK

#######
### the following 2 actions seem to be mutually exclusive.
# 1) sort results into 1-hitter, 2-hitter, etc. groups of
# SeqFeature objs stored in arrays. arrays are then printed
# to stdout
&sort_results($blast_report);

# 2) print blast results
&print_blast_results($blast_report);
#######


sub print_blast_results{
   my $report = shift;
   while(my $result = $report->next_result()){
     while(my $hit = $result->next_hit()){
       while(my $hsp = $hit->next_hsp()){
         # NOTE: $hsp_q_seq_obj is not defined in this abbreviated excerpt
         my $q_name = $hsp_q_seq_obj->display_id;
         print join(", ",$q_name,$hit->name,$hsp->bits)."\n";
       }
     }
   }
}


I'm about to lose my mind on this... any assistance appreciated!

Thanks,
Genevieve


From rvosa at sfu.ca  Sun May 28 03:43:23 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sun, 28 May 2006 00:43:23 -0700
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <4479549B.5030202@sfu.ca>

The TreeBaseII team (part of the cipres project: http://www.phylo.org) 
are working on a lab database system for storage of intermediate 
calculation results and data (sequence alignments, trees, taxon sets). I 
think what you're discussing is a bit more molecular and less 
phylogenetic, but it does sound similar in spirit.

Rutger

Jay Hannah wrote:
> Hola --
>
> We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
>
>    "Project OpenLab":
>    http://omaha.pm.org/kwiki/?BioPerl
>
> - Does any such project already exist? 
> - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
> - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
> - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
> - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
> - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
>
> Thanks for your time,
>
> j
>

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Sun May 28 09:55:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 28 May 2006 08:55:47 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
	<286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <EA78F27A-074E-4C9D-AC70-27D4CC20F8C4@uiuc.edu>

Again, it's b/c 'withrefm' is a valid Restriction::IO module and  
'withref' is not.  Similar to the case issue you saw before with  
'bairoch.'  Making this more lenient would help but there are more  
serious issues with these modules that need to be addressed...

http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes
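
Until the modules are more lenient, one way to avoid the silent fallback is to check the format name before constructing the reader. The following is only a minimal sketch in plain Perl: the list of names is taken from the formats mentioned in this thread (bairoch, itype2, withrefm) and may not match every installation, and the helper `check_format` is a made-up name, not part of BioPerl.

```perl
use strict;
use warnings;

# Format names taken from this thread; adjust to whatever modules
# actually exist under Bio/Restriction/IO in your installation.
my %KNOWN_FORMAT = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper (not part of BioPerl): returns the lower-cased
# format name if it is known, or dies with a readable message.
sub check_format {
    my ($format) = @_;
    my $name = lc $format;
    die "Unknown Bio::Restriction::IO format '$format'; expected one of: "
        . join(', ', sort keys %KNOWN_FORMAT) . "\n"
        unless $KNOWN_FORMAT{$name};
    return $name;
}

print check_format('Bairoch'), "\n";            # prints "bairoch"
my $ok = eval { check_format('withref'); 1 };   # dies inside the eval
print "withref rejected\n" unless $ok;
```

The returned name could then be passed as the -format argument, so that a typo such as 'withref' fails loudly instead of quietly yielding the default enzyme set.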

Chris

On May 26, 2006, at 3:55 PM, Jelena Obradovic wrote:

> Hi guys, I tried the other formats, and it works fine with the
> "withrefm" format but not with "withref".
>
> Thanks a lot for your response.
>
> Cheers,
>
> Jelena
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From osborne1 at optonline.net  Sun May 28 11:03:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 28 May 2006 11:03:37 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
Message-ID: <C09F3409.8992%osborne1@optonline.net>

Genevieve,

Does this simplified code, without the &sort_results($blast_report) line,
work?

By the way, no one can really help you here because you haven't shown us all
of the code. The code you are showing certainly looks OK.


Brian O.


On 5/26/06 4:02 PM, "Genevieve DeClerck" <gad14 at cornell.edu> wrote:

> &sort_results($blast_report);


From simon.rayner.mlist at gmail.com  Mon May 29 03:37:24 2006
From: simon.rayner.mlist at gmail.com (mailing lists)
Date: Mon, 29 May 2006 15:37:24 +0800
Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64
	running SuSE linux
Message-ID: <f73437f70605290037q3c7637e4h29faa3aed16ec77a@mail.gmail.com>

Hello,

I'm having a problem trying to install the bioperl-ext package on my
system.

biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL
Writing Makefile for Bio::Ext::Align
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make
cc -c  -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -g -Wall -pipe   -DVERSION=\"0.1\" -DXS_VERSION=
\"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE"
-DPOSIX -DNOERROR Align.c
In file included from Align.xs:12:
./libs/sw.h:1360:1: warning: "/*" within comment
.
.
.
Running Mkbootstrap for Bio::Ext::Align ()
chmod 644 Align.bs
rm -f blib/arch/auto/Bio/Ext/Align/Align.so
LD_RUN_PATH="" cc  -shared -L/usr/local/lib64 Align.o  -o
blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a  -lm
/usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld:
libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not
be used when making a shared object; recompile with -fPIC
libs/libsw.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #

the -fPIC flag is already set in the makefile.

I found a similar problem in an earlier posting with the following
suggestions....


  From: Aaron J. Mackey <amackey <at> pcbi.upenn.edu>
  Subject: Re: compiling bioperl-ext
  Newsgroups: gmane.comp.lang.perl.bio.general
  Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50
  minutes ago)

  1) Are you starting with a clean build directory?

  2) Does installing other compiled Perl modules work for you (e.g.
  Data::Dumper or Storable)?

  That's a pretty arcane error, and if the answer to #2 is "no", then I
  don't think we can help you.

  -Aaron


....In my case, both 1) and 2) are true.  I installed Data::Dumper without
any problems.


I've found plenty of similar instances for other software, and it seems to
relate to 32/64-bit issues.

Does anyone have any suggestions about how to get around this?

thanks

Simon Rayner


From ULNJUJERYDIX at spammotel.com  Mon May 29 05:46:21 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Mon, 29 May 2006 17:46:21 +0800
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <200605261038.30380.lstein@cshl.edu>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>

Hi!
Oh, it was under a slightly different subject line, asking about the
create-image-map feature.
I am using the stable version 1.4 of bioperl now. In any case, I have not
added the sequence as a feature-annotated seq, as I already have the bp
positions where the TF binds (in 1-1050 numbering), so what I did was just
add graded segments based on the position.
I saw that there is a scale function for the arrow glyph; however, it is a
multiply function. Can it be hacked to take an offset value (i.e. subtract
1000 from the scale)?

cheers
kevin


> Hi,
>
> For some reason I didn't see the first posting on this. In current bioperl
> live, the ruler can have negative numberings - I use this routinely. You
> need to create a feature that starts in negative coordinates. What is
> happening to you when you try this?
>
> Lincoln
>
> On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > Hi
> > thanks for the help offered thus far!
> > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq
> > using bioperl. therefore i was asked to make the numberings as such
> > (-1000) is there any way at all to do this in bioperl without changing
> > the .pm file?
> >
> > thanks guys..
> > kevin
> >
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From shameer at ncbs.res.in  Mon May 29 06:07:17 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 29 May 2006 15:37:17 +0530 (IST)
Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple
	Servers
Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1>

Dear All,

My query may not be directly related to BioPerl, but I am sure I will get
some ideas on how to proceed. Some possibilities may be available from Pise
or related modules.

Query :
---------
We have several public servers (say a, b, c). Each of them takes a
PDB file as input, processes it, and displays the result. Now I need to
create a web page (a meta-server/integrated web server) with three radio
buttons (a, b, c) and a single input form to accept a PDB file from the
user (:( passing a file as an argument seems somewhat impossible to
me). I need the output as 3 links on the next page.

Is there any BioPerl module / CGI / Perl trick to do this?

Thanks in advance,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675
W - http://caps.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."


From torsten.seemann at infotech.monash.edu.au  Tue May 30 02:41:31 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 16:41:31 +1000
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BE91B.30001@infotech.monash.edu.au>

> my $ref_seq_objs; # ref to array of Sequence obj's
> my $genome_seq; # fasta containing 1 genomic sequence
> my @params = ('program'  => 'blastn',
>               'database' => $genome_seq,
>              );

The database parameter needs to be the same thing you would pass to the
"-d" option in "blastall". I don't think you can pass a perl string
here, i.e. there needs to be a properly formatted set of blast indices
for your genome sequence on the disk in the appropriate place.
See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html

> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $blast_report = $factory->blastall($ref_seq_objs); #OK

But I could be wrong, and $blast_report here contains a valid BLAST report.

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sb at mrc-dunn.cam.ac.uk  Tue May 30 03:59:28 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 30 May 2006 08:59:28 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Hi,
[snip]
> If I've sorted the results, the sorted results will print to screen;
> however, when I try to print the Hit Table results nothing is returned,
> as if the blast results have evaporated... and vice versa: if I comment
> out the part where I point my sorting subroutine to the blast results
> reference, my hit table results suddenly print to screen.
[snip]
> Here's an abbreviated version of my code:
[snip]
> #######
> ### the following 2 actions seem to be mutually exclusive.
> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
> # SeqFeature objs stored in arrays. arrays are then printed
> # to stdout
> &sort_results($blast_report);
> 
> # 2) print blast results
> &print_blast_results($blast_report);

> sub print_blast_results{
>    my $report = shift;
>    while(my $result = $report->next_result()){
[snip]

You didn't give us your sort_results subroutine, but is it as simple as
they both use $report->next_result (and/or $result->next_hit), but you
don't reset the internal counter back to the start, so the second
subroutine tries to get the next_result and finds the first subroutine
has already looked at the last result and so next_result returns false?

 From a quick look it wasn't obvious how to reset the counter. Hopefully
this can be done and someone else knows how.
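
To illustrate the suspected cause, here is a minimal sketch in plain Perl. It does not use the SearchIO API at all: `make_report` and `count_results` are made-up stand-ins, and the toy `next_result` simply hands out each item once, which is the behaviour assumed above for the real report object.

```perl
use strict;
use warnings;

# Toy stand-in for a report object: next_result hands out each item
# once and then returns undef, like a one-pass parser.
sub make_report {
    my @items = @_;
    return { next_result => sub { return shift @items } };
}

# Walks the iterator to the end and counts what it saw.
sub count_results {
    my ($report) = @_;
    my $n = 0;
    $n++ while defined $report->{next_result}->();
    return $n;
}

my $report = make_report(qw(r1 r2 r3));
print count_results($report), "\n";   # first pass sees 3 results
print count_results($report), "\n";   # second pass sees 0: already drained

# Workaround: drain the iterator once into an array, then reuse the array.
my $fresh = make_report(qw(r1 r2 r3));
my @results;
while (defined(my $r = $fresh->{next_result}->())) { push @results, $r }
print scalar(@results), "\n";         # 3, and @results can be walked repeatedly
```

If the real report behaves the same way, collecting the result objects into an array once and passing that array to both subroutines would sidestep the problem. Whether the hits and HSPs inside each result can be walked twice is a separate question this sketch does not answer.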


From torsten.seemann at infotech.monash.edu.au  Tue May 30 04:18:45 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 18:18:45 +1000
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return
	undef"
Message-ID: <447BFFE5.8010508@infotech.monash.edu.au>

FYI Bioperl developers:

I just audited the bioperl-live CVS and found about 450 occurrences of 
"return undef".

Page 199 of "Perl Best Practices" by Damian Conway, and this URL
http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:

"Use return; instead of return undef; if you want to return nothing. If 
someone assigns the return value to an array, the latter creates an 
array of one value (undef), which evaluates to true. The former will 
correctly handle all contexts."

So I'm guessing at least some of these 450 occurrences *could* result in 
bugs and should probably be changed.
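
A minimal sketch of the difference, using two made-up subroutines (`find_undef` and `find_bare` are illustrative names, not taken from the BioPerl code base):

```perl
use strict;
use warnings;

sub find_undef { return undef }   # one-element list (undef) in list context
sub find_bare  { return }         # empty list in list context, undef in scalar

my @a = find_undef();
my @b = find_bare();
print scalar(@a), "\n";           # 1: the array holds a single undef
print scalar(@b), "\n";           # 0: the array is empty

print "undef version looks true\n" if @a;   # printed
print "bare version looks true\n"  if @b;   # not printed

my $s = find_bare();
print defined $s ? "defined\n" : "undef in scalar context\n";
```

One caveat when auditing: a bare return yields an empty list, so a call placed inside a list of key/value pairs (for example when building a hash) contributes nothing and shifts the following elements. Call sites written that way may rely on the explicit undef.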

Your opinion may differ :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From cjfields at uiuc.edu  Tue May 30 10:07:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:07:45 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with
	"returnundef"
In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au>
Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine>

Torsten,

Any way you can post a list of some/all of the offending lines or modules?
Sounds like something to consider, but if the list is as large as you say we
may need something (bugzilla? wiki?) to track the changes and make sure
they pass tests; I'm sure a large majority will.

I'm guessing Jason would want this somewhere on the project priority list or
bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
page on the wiki for proposed code changes?

Chris



From lstein at cshl.edu  Tue May 30 10:47:48 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 30 May 2006 10:47:48 -0400
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
	<5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
Message-ID: <200605301047.49127.lstein@cshl.edu>

Hi Kevin,

I'm afraid that there is no offset value. You'll need the 1.51 version of 
bioperl to handle negative numbers properly. I understand your reluctance to 
upgrade just to get the Bio::Graphics functionality. You might consider 
checking out just the Bio/Graphics subtree and installing that. It should 
work on top of 1.4
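
Since there is no offset option, the shift has to be applied to the positions themselves before the features are built. A minimal sketch in plain Perl, under stated assumptions: `to_promoter_coord` is a made-up helper, the +1 base is assumed to sit at index 1001 of the 1-1050 numbering, and there is assumed to be no position 0 (so -1 is followed directly by +1).

```perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based position in the stored sequence to a
# promoter-relative coordinate. $plus_one is the 1-based index of the +1
# base (1001 for a -1000..+50 fragment). Assumes there is no position 0.
sub to_promoter_coord {
    my ($pos, $plus_one) = @_;
    my $rel = $pos - $plus_one;
    return $rel >= 0 ? $rel + 1 : $rel;
}

# Prints the mapping for the two ends of the fragment and the two
# positions on either side of the -1/+1 boundary.
for my $pos (1, 1000, 1001, 1050) {
    printf "%4d -> %+d\n", $pos, to_promoter_coord($pos, 1001);
}
```

A drawn ruler is a plain number line that passes through 0, so for plotting purposes a simple `$pos - 1001` may be the better fit; which convention to use is a judgment call.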

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Tue May 30 10:50:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:50:06 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine>

Jason, Brian, et al,

I found several major issues with Bio::Restriction::IO (this popped up while
bug squashing).  In particular, the POD is pretty misleading.  It states
(directly from perldoc):

SYNOPSIS
        use Bio::Restriction::IO;

        $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                         -format => 'withrefm');
        $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                         -format => 'bairoch');
        my $res = $in->read; # a Bio::Restriction::EnzymeCollection
        $out->write($res);

      # or

      #    use Bio::Restriction::IO;
      #
      #    #input file format can be read from the file extension (dat|xml)
      #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
      #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
      #
      #    # World's shortest flat<->xml format converter:
      #    print $out $_ while <$in>;

So, I have found several problems with these modules.  I really hate to
criticize code here, as my own is pretty hacky, but I think these are things
to seriously mull over: 

1)	Note that, though some of the lines above are commented they are
still there in POD and thus present in perldoc/pod2html etc.  So, judging
from the above, it suggests using the script above should read in from one
format and write out to another (like SeqIO).  However, NONE of the current
write() methods are implemented for any of the IO modules (withrefm, base,
itype2, bairoch), so this does not happen as expected.  You get the nasty
thrown 'method not implemented' error instead when writing.
2)	The commented statements in POD above also suggest that REBASE XML
format is supported when there is no XML module.  
3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
made it unusable until I added a few small changes; it still can't handle
multisite/multicut enzymes properly, so in essence it is useless until that
is addressed.
4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
up its own methods?

I'm working on at least getting the 'bairoch' input format up and running
(so at least it gets the enzymes into a
Bio::Restriction::EnzymeCollection).  From this point I'm not sure where
to proceed.  The POD obviously needs to be corrected to reflect that writing
formats is not implemented (and the bit about XML should be taken out
completely); that's the easy part, which I am working on and plan on
committing today.  However, these modules don't seem to be used too frequently so I'm
not sure whether it's worth spending too much time getting these up to speed
at the moment (adding write methods, switching to Bio::Root::Root, etc); I
have other priorities at the moment (including a way overdue ListSummary).
I'm also not sure who else is (using|working) on these so I don't want to
(make too many changes|step on someone else's toes), but these are, IMHO,
pretty serious problems.  

Any thoughts?

Chris


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Tue May 30 12:34:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 11:34:18 -0500
Subject: [Bioperl-l] Bio::Restriction::IO changes
Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine>

Jason, Brian, et al:

I have made changes to the Bio::Restriction::IO POD to remove any reference
to write functions: almost none have been implemented yet, so including
them in the POD is a bit misleading.  At the moment, you can't write to any
REBASE format except for 'base', which I found is the only one that works.
And, upon further checking, even that one has issues: it looks like there
are problems with multicut/multisite enzymes when writing in 'base' format
which I'm not delving into ('TaqII' only displays one site when writing when
it has two cut sites).  I'll add this to the wiki and a bug report
(enhancement) for this module.

I am also removing mention of the XML and 'bairoch' formats (the former
isn't present and the latter is broken at the moment) and have added a few
things to the POD TO DO section.  

Rob (if you're out there somewhere in the ether), have you made any more
changes to these modules that need to be committed?  Didn't know if any of
these issues have already been addressed/changed etc.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From jelenaob at gmail.com  Tue May 30 00:58:35 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Mon, 29 May 2006 21:58:35 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>

Hello everybody,

Does anybody know how to remove the background color of the Panel?
Currently I am not adding anything to it, so that I can troubleshoot the
problem, and I have tried setting all the color attributes I could find on
the panel, but no luck. Whatever I do, I get the BLUE border of the panel.

Has anybody faced the same problem?

Thanks in advance,

Jelena

And here is the code I am currently using:

-----------------------------------------------------------------------------------------------------------
my $panel =
    Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
                              -width => 800,
                              -pad_left => 10,
                              -pad_right => 10,
                              -key_color => 'white',
                              -bgcolor => 'white',
                              -gridcolor=>'black',
                              -fgcolor => 'black',
                              -grid => 0,
                              );
   my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
     -url  => '/tmpimages');
   #make clickable image
   print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
   print $map;

-----------------------------------------------------------------------------------------------------------


From luciap at sas.upenn.edu  Tue May 30 14:49:48 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 14:49:48 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
Message-ID: <1149014988.447c93cc01761@128.91.55.38>

Hi

I am here again. I finally got around to writing the "collapse nodes" function
and have a couple of questions.

In order to collapse any node $node, I first have to get the parent,
which I can do as $parent=$node->ancestor

and then the children as:
@children=$node->get_all_Descendents (or should I use each_Descendent?)

Then, before deleting $node, I have to assign all its children to $parent,
and here is where I am kind of confused.
Can I use the add_Descendent function for this?
I've been trying to write something like this:
foreach $child (@children){
         $parent=add_Descendent->$child;
}
but this doesn't work, and I think it is because I don't have any idea of
what I am doing.
Any suggestions?

thanks


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From rvosa at sfu.ca  Tue May 30 14:52:52 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 11:52:52 -0700
Subject: [Bioperl-l] For CVS developers - potential pitfall
	with	"returnundef"
In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine>
References: <000c01c683f2$6ca62570$15327e82@pyrimidine>
Message-ID: <447C9484.9030102@sfu.ca>

Although I agree with the sentiment of following PBP, I'm not so sure 
changing 'return undef' to 'return' *now* will fix any bugs without 
introducing new, subtle ones.
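
To make the trade-off concrete, here is a minimal sketch in plain Perl (no
BioPerl involved; the sub names find_undef and find_bare are made up for
illustration).  It shows the list-context pitfall of 'return undef' that PBP
describes, and one way a bare 'return' can shift values when the call sits
inside a key/value list:

```perl
use strict;
use warnings;

# Hypothetical lookups that fail to find anything.
sub find_undef { return undef }   # explicit undef
sub find_bare  { return }         # bare return

# List context: 'return undef' yields a one-element list, which is true.
my @a = find_undef();
my @b = find_bare();
print "return undef, list context: ", scalar(@a), " element(s)\n";
print "bare return,  list context: ", scalar(@b), " element(s)\n";

# Scalar context: both give undef, so scalar callers are unaffected.
my $s = find_bare();
print "bare return, scalar context: ",
      (defined $s ? "defined" : "undef"), "\n";

# The new, subtle problem: a bare return inside a key/value list
# contributes nothing, so the values that follow shift by one position.
my @pairs_undef = (id => find_undef(), name => 'x');
my @pairs_bare  = (id => find_bare(),  name => 'x');
print "pairs with return undef: ", scalar(@pairs_undef), " elements\n";
print "pairs with bare return:  ", scalar(@pairs_bare),  " elements\n";
```

So scalar callers should see no difference, while callers that pass the
result straight into a named-argument list could be affected by the change.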

Chris Fields wrote:
> Torsten,
>
> Any way you can post a list of some/all of the offending lines or modules?
> Sounds like something to consider, but if the list is as large as you say we
> made need something (bugzilla? wiki?) to track the changes and make sure
> they pass tests; I'm sure a large majority will.  
>
> I'm guessing Jason would want this somewhere on the project priority list or
> bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
> page on the wiki for proposed code changes?
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>> Sent: Tuesday, May 30, 2006 3:19 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>> "returnundef"
>>
>> FYI Bioperl developers:
>>
>> I just audited the bioperl-live CVS and found about 450 occurrences of
>> "return undef".
>>
>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:
>>
>> "Use return; instead of return undef; if you want to return nothing. If
>> someone assigns the return value to an array, the latter creates an
>> array of one value (undef), which evaluates to true. The former will
>> correctly handle all contexts."
>>
>> So I'm guessing at least some of these 450 occurrences *could* result in
>> bugs and should probably be changed.
>>
>> Your opinion may differ :-)
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>   

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From luciap at sas.upenn.edu  Tue May 30 16:11:52 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 16:11:52 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
Message-ID: <1149019912.447ca7085124e@128.91.55.38>

Hi
OK, that was silly, but what I have in my code is what you just wrote.
The problem is that if I write

$parent->add_Descendent($child)

it tells me that I am calling the method "add_Descendent" on an undefined value
(but I did define $parent before??)

So here is the code so far:

use Bio::TreeIO;
 my $in = new Bio::TreeIO(-file => 'Test2.tre',
                          -format => 'newick');
 my $out = new Bio::TreeIO(-file => '>mytree.out',
                           -format => 'newick');
 while( my $tree = $in->next_tree ) {
    foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
    my $bootstrap=$node->_creation_id;

    if ($bootstrap < 70 ){
            my $parent = $node->ancestor;
            my @children=$node->get_all_Descendents;
            foreach my $child (@children){
                $parent->add_Descendent($child);
            }

........

eventually (once I have assigned the children to the parent successfully) I'll add:
$tree->remove_Node($node);

        }
    }
    $out->write_tree($tree);
}

Quoting aaron.j.mackey at gsk.com:

> > foreach $child (@children){
> >          $parent=add_Descendent->$child;
> > }
>
> I think what you want is $parent->add_Descendent($child)
>
> -Aaron
>


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From jason.stajich at duke.edu  Tue May 30 16:30:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 30 May 2006 16:30:56 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149019912.447ca7085124e@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>

you need to special case the root - it won't have an ancestor.  just  
protect the my $parent = $node->ancestor with an if statement as I  
did below

On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:

> Hi
> OK that was silly, but what I have in my code is what you just wrote
> But the problem is that if I write
>
> $parent->add_Descendent($child)
>
> it tells me that I am calling  the method "add_Descendent" on an  
> undefined value
> (but I did define $parent before??)
>
> So here it goes the code so far:
>
> use Bio::TreeIO;
>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>                           -format => 'newick');
>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>                            -format => 'newick');
>  while( my $tree = $in->next_tree ) {
>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>     my $bootstrap=$node->_creation_id;
>
>     if ($bootstrap < 70 ){
>         if ( my $parent = $node->ancestor ) {    # <<< added: skip the root
>               my @children=$node->get_all_Descendents;
>               foreach my $child (@children){
>                  $parent->add_Descendent($child);
>               }
>         }
>
> ........
>
> eventually I'll add (once I assigned the children to the parent  
> successfully):
> $tree->remove_Node($node);
>
>         }
>     }
>     $out->write_tree($tree);
> }
>
> Quoting aaron.j.mackey at gsk.com:
>
>>> foreach $child (@children){
>>>          $parent=add_Descendent->$child;
>>> }
>>
>> I think what you want is $parent->add_Descendent($child)
>>
>> -Aaron
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
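
For anyone who wants to see the collapse logic in isolation, here is a
rough, self-contained sketch.  It uses plain hashes as stand-in nodes rather
than Bio::Tree::Node objects, and every name in it (new_node, add_child,
collapse_node, internal_nodes, the 'support' field) is made up for
illustration.  It only reparents the direct children of the collapsed node;
with the real API that probably means each_Descendent rather than
get_all_Descendents, which returns descendants at every level.

```perl
use strict;
use warnings;

# Plain-hash stand-in for tree nodes (NOT the Bio::Tree::Node API).
sub new_node {
    my ($name, $support) = @_;
    return { name => $name, support => $support,
             parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return $child;
}

# Collapse one internal node: hand its direct children to its parent,
# then detach it.  The root has no parent, so it is left alone.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->{parent} or return 0;    # special-case the root
    my @children = @{ $node->{children} };       # copy before modifying
    add_child($parent, $_) for @children;
    $node->{children}   = [];
    $parent->{children} = [ grep { $_ != $node } @{ $parent->{children} } ];
    $node->{parent}     = undef;
    return 1;
}

# All nodes that have at least one child, collected depth-first.
sub internal_nodes {
    my ($node) = @_;
    return () unless @{ $node->{children} };
    return ($node, map { internal_nodes($_) } @{ $node->{children} });
}

# Toy tree: root -> (inner[support 50] -> (B, C), A)
my $root  = new_node('root');
my $inner = add_child($root, new_node('inner', 50));
add_child($root,  new_node('A'));
add_child($inner, new_node('B'));
add_child($inner, new_node('C'));

# Collect the nodes first, then modify the tree.
for my $node (internal_nodes($root)) {
    next unless defined $node->{support} && $node->{support} < 70;
    collapse_node($node);
}
print join(',', map { $_->{name} } @{ $root->{children} }), "\n";
```

The list of internal nodes is built before anything is collapsed, so the
loop never walks a structure that is changing underneath it.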

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Tue May 30 17:40:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 16:40:18 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447C9484.9030102@sfu.ca>
Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine>

Agreed, though I think these changes should be implemented at some point
(Conway's argument here makes sense and it is nice for Torsten to check this
out).  If proper tests are written then any changes resulting in errors
should be picked up by checking the appropriate test suite, though I know it
doesn't absolutely guarantee it.  ; P  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 1:53 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> "returnundef"
> 
> Although I agree with the sentiment of following PBP, I'm not so sure
> changing 'return undef' to 'return' *now* will fix any bugs without
> introducing new, subtle ones.
> 


From rvosa at sfu.ca  Tue May 30 17:58:25 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 14:58:25 -0700
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine>
References: <001901c68433$026b1ad0$15327e82@pyrimidine>
Message-ID: <447CC001.4050000@sfu.ca>

I've been following the perl6 mailing lists for a while now. I think 
this time around it won't really take that long (one year?) for 
pugs/perl6 stacks to become more than just toys. I think especially 
large projects, like bioperl, will really benefit from the improved OO 
implementation in perl6, so it might be of interest to at least 
fantasize about it.

Chris Fields wrote:
> Ha!  Or maybe the 'nonexistent' bioperl-experimental.  Wonder what'll
> happen once Perl6 comes to term?
>
> -CJF
>
>   
>> -----Original Message-----
>> From: Rutger Vos [mailto:rvosa at sfu.ca]
>> Sent: Tuesday, May 30, 2006 4:48 PM
>> To: Chris Fields
>> Subject: Re: [Bioperl-l] For CVS developers - potential
>> pitfallwith"returnundef"
>>
>> Surely this will all sort itself out in bioperl6 ;-)
>>

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Tue May 30 18:08:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 17:08:26 -0500
Subject: [Bioperl-l] For CVS developers -
	potentialpitfallwith"returnundef"
In-Reply-To: <447CC001.4050000@sfu.ca>
Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine>

Agreed.  I would say that in probably 6-12 months' time it might be a good
idea to try getting something actually started, maybe under the
'bioperl-experimental' title Jason has mentioned.  One could always try
getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up
from there, with emphasis on key areas (seq. parsing and so on).

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 4:58 PM
> To: bioperl list
> Subject: Re: [Bioperl-l] For CVS developers -
> potentialpitfallwith"returnundef"
> 
> I've been following the perl6 mailing lists for a while now. I think
> this time around it won't really take that long (one year?) for
> pugs/perl6 stacks to become more than just toys. I think especially
> large projects, like bioperl, will really benefit from the improved OO
> implementation in perl6, so it might be of interest to at least
> fantasize about it.
> 


From ULNJUJERYDIX at spammotel.com  Tue May 30 23:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.
If anyone is interested, I have opted to change line 161 of arrow.pm
(Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it's
/usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from

      $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

      $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just for this one-off use.


Strangely, at line 112 of arrow.pm in bioperl ver 1.51 I found a hidden
option for a coords offset:
    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset    = 1 unless defined $relative_coords_offset;
but entering the option -relative_coords_offset=>1000 in the arrow glyphs
didn't do anything...


Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyph; however, it is a
> multiply function. Can it be hacked to take an offset value (i.e. minus
> the scale by 1000)?
>
> cheers
> kevin
>
>
> Hi,
> >
> > For some reason I didn't see the first posting on this. In current
> > bioperl live, the ruler can have negative numberings - I use this
> > routinely. You need to create a feature that starts in negative
> > coordinates. What is happening to you when you try this?
> >
> > Lincoln
> >
> > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > Hi
> > > thanks for the help offered thus far!
> > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq
> > > using bioperl. therefore i was asked to make the numberings as such
> > > (-1000). is there any way at all to do this in bioperl without
> > > changing the .pm file?
> > >
> > > thanks guys..
> > > kevin
> > >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
> >
>
>
>


From sb at mrc-dunn.cam.ac.uk  Wed May 31 04:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be 
> what's going on.. I am using $blast_report->next_result in both 
> subroutines. It appears that analyzing the blast results first w/ my 
> sort subroutine empties (?) the $blast_result object so that when I try 
> to print, there is nothing left to print. (and vice versa when I print 
> first then try to sort).
> So, from the looks of things, using next_result has the effect of 
> popping the Bio::Search::Result::ResultI objects off of the SearchIO 
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying 
to read it all twice like this:
open(FILE, "file");
while (<FILE>) {
     print # prints each line in the file
}
while (<FILE>) {
     print # never happens, we never enter this while loop
}

To get the second while loop to print anything we need to say seek(FILE, 
0, 0) before it. Or in the first while loop store each line in an array, 
and then make the second loop a foreach through that array.
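
A minimal, self-contained sketch of both options, assuming an in-memory
filehandle in place of a real file (the sample text and variable names are
made up for illustration):

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file here.
my $text = "line 1\nline 2\nline 3\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

# First pass: read to end of file, keeping a copy of each line.
my @lines;
while (my $line = <$fh>) {
    push @lines, $line;
}

# A second read without rewinding yields nothing: we are already at EOF.
my @second_pass = <$fh>;

# Option 1: rewind to the start, after which the lines can be read again.
seek($fh, 0, 0) or die "Cannot seek: $!";
my @after_seek = <$fh>;

# Option 2: skip the filehandle and loop over the stored copy instead.
my $from_array = join '', @lines;

close $fh;
```

Option 2 costs memory proportional to the file size, but it also works for
handles that cannot be rewound (pipes, sockets).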


> It seems I could get around this by making a copy of the blast report by 
> setting it to another new variable...(not the most elegant solution) but 
> I'm having trouble with this...
> 
> If I do:
> 
>     my $blast_report_copy = $blast_report;
> 
> I'm just copying the reference to the SearchIO blast result, so it 
> doesn't help me. How can I make another physical copy of this blast 
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object 
contains a filehandle. But for a simple object you might recursively 
loop through the data structure and copy each element out into a similar 
data structure.
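
A minimal sketch of such a recursive copy, assuming the data consists only
of nested hash and array references (deep_copy and the sample data are
illustrative names, not part of bioperl):

```perl
use strict;
use warnings;

# Recursively copy nested hash/array references. Anything else
# (plain scalars, code refs, filehandles, blessed objects) is
# returned as-is, so such items remain shared with the original.
sub deep_copy {
    my ($data) = @_;
    my $type = ref $data;
    if ($type eq 'HASH') {
        my %copy;
        $copy{$_} = deep_copy($data->{$_}) for keys %$data;
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { deep_copy($_) } @$data ];
    }
    return $data;
}

my $original = { hits => [ { name => 'hit1', hsps => [ 1, 2 ] } ] };
my $copy     = deep_copy($original);

# Changing the copy leaves the original untouched.
push @{ $copy->{hits}[0]{hsps} }, 3;
```

Because blessed objects and filehandles are passed through unchanged, this
would not give an independent copy of a report object that is still
reading from a stream, which is the caveat mentioned above.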


> But better yet, the way to go is to 'reset the counter,' or to find a 
> way to look at/print/sort the results without removing data from the 
> blast result object. How is this done though??

It would be rather nice if this worked:
my $blast_report = $factory->blastall($ref_seq_objs);
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
     # $_ is a ResultI object, use as normal
}
seek($blast_fh, 0, 0); # this would be great, but does it work?
while (<$blast_fh>) {
     # go through the results again in your second subroutine
}

An alternative hacky way of doing it, which may also not work, would be 
to go through your $blast_report as normal, but then before going 
through it a second time, say
my $fh = $blast_report->_fh;
seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of 
its own for this) of solving the problem is, the first time you go 
through each next_result, next_hit and next_hsp, just store the returned 
objects in an array of arrays of arrays. Then the second time get the 
objects from your array structure instead of with the method calls.
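
A rough sketch of that store-then-reuse pattern. To keep it runnable
without bioperl, a toy one-pass iterator (Toy::Iter, a made-up class)
stands in for the report, result and hit objects, and plain strings stand
in for HSPs; with real objects you would call next_result, next_hit and
next_hsp in the same places:

```perl
use strict;
use warnings;

# Toy one-pass iterator. This is NOT BioPerl code; it only imitates
# the "call next until it returns undef" style of the next_* methods.
package Toy::Iter;
sub new  { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub next { my ($self) = @_; return shift @{ $self->{items} } }

package main;

# One "result" holding two "hits", each holding some "HSPs".
my $hit_a  = Toy::Iter->new('hsp1', 'hsp2');
my $hit_b  = Toy::Iter->new('hsp3');
my $result = Toy::Iter->new($hit_a, $hit_b);
my $report = Toy::Iter->new($result);

# First pass: walk the iterators once and keep everything they return.
my @stored;    # indexed as $stored[result][hit][hsp]
while (my $r = $report->next) {
    my @hits;
    while (my $h = $r->next) {
        my @hsps;
        while (my $hsp = $h->next) {
            push @hsps, $hsp;
        }
        push @hits, \@hsps;
    }
    push @stored, \@hits;
}

# Second pass: read from the stored structure, not from the iterators.
my $count = 0;
for my $hits (@stored) {
    for my $hsps (@$hits) {
        $count += @$hsps;
    }
}
print "stored $count HSPs\n";
```

This sketch keeps only the innermost items; if the second pass also needs
the result or hit objects themselves, store those alongside their children.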


From heikki at sanbi.ac.za  Wed May 31 06:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l] For CVS developers - potential pitfall with "returnundef"
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more 
likely that there is a well-hidden bug caused by accidentally assigning undef 
to a one-element array than that someone is intentionally writing code that 
expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation code 
and could not see any differences in the test output. 

Let's remove them!

	-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out).  If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it.  ; P
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > Sent: Tuesday, May 30, 2006 1:53 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > "returnundef"
> >
> > Although I agree with the sentiment of following PBP, I'm not so sure
> > changing 'return undef' to 'return' *now* will fix any bugs without
> > introducing new, subtle ones.
> >
> > Chris Fields wrote:
> > > Torsten,
> > >
> > > Any way you can post a list of some/all of the offending lines or
> > > modules?
> > >
> > > Sounds like something to consider, but if the list is as large as you
> > > say we may need something (bugzilla? wiki?) to track the changes and
> > > make sure they pass tests; I'm sure a large majority will.
> > >
> > > I'm guessing Jason would want this somewhere on the project priority
> > > list or bugzilla, with a link to the actual list, but I'm not sure.
> > > Maybe start a page on the wiki for proposed code changes?
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > >> "returnundef"
> > >>
> > >> FYI Bioperl developers:
> > >>
> > >> I just audited the bioperl-live CVS and found about 450 occurrences of
> > >> "return undef".
> > >>
> > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > >> suggest:
> > >>
> > >> "Use return; instead of return undef; if you want to return nothing.
> > >> If someone assigns the return value to an array, the latter creates an
> > >> array of one value (undef), which evaluates to true. The former will
> > >> correctly handle all contexts."
> > >>
> > >> So I'm guessing at least some of these 450 occurrences *could* result
> > >> in bugs and should probably be changed.
> > >>
> > >> Your opinion may differ :-)
> > >>
> > >> --
> > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > >>
> >
> > --
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Rutger Vos, PhD. candidate
> > Department of Biological Sciences
> > Simon Fraser University
> > 8888 University Drive
> > Burnaby, BC, V5A1S6
> > Phone: 604-291-5625
> > Fax: 604-291-3496
> > Personal site: http://www.sfu.ca/~rvosa
> > FAB* lab: http://www.sfu.ca/~fabstar
> > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Wed May 31 06:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <000f01c683f8$5771ed50$15327e82@pyrimidine>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
Message-ID: <200605311244.29187.heikki@sanbi.ac.za>


Chris,

Thanks for stepping in. I feel partly responsible here because I originally 
changed some of Rob's code but have not followed up since.

There has not been any active development on these modules, so do not worry
about stepping on anyone's toes.

   -Heikki

On Tuesday 30 May 2006 16:50, Chris Fields wrote:
> Jason, Brian, et al,
>
> I found several major issues with Bio::Restriction::IO (this popped up
> while bug squashing).  In particular, the POD is pretty misleading.  It
> states (directly from perldoc):
>
> SYNOPSIS
>         use Bio::Restriction::IO;
>
>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                          -format => 'withrefm');
>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                          -format => 'bairoch');
>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>         $out->write($res);
>
>       # or
>
>       #    use Bio::Restriction::IO;
>       #
>       #    #input file format can be read from the file extension (dat|xml)
>       #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>       #
>       #    # World's shortest flat<->xml format converter:
>       #    print $out $_ while <$in>;
>
> So, I have found several problems with these modules.  I really hate to
> criticize code here, as my own is pretty hacky, but I think these are
> things to seriously mull over:
>
> 1)	Note that, though some of the lines above are commented they are
> still there in POD and thus present in perldoc/pod2html etc.  So, judging
> from the above, it suggests using the script above should read in from one
> format and write out to another (like SeqIO).  However, NONE of the current
> write() methods are implemented for any of the IO modules (withrefm, base,
> itype2, bairoch), so this does not happen as expected.  You get the nasty
> thrown 'method not implemented error' instead when writing.
> 2)	The commented statements in POD above also suggest that REBASE XML
> format is supported when there is no XML module.
> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
> made it unusable until I added a few small changes; it still can't handle
> multisite/multicut enzymes properly, so in essence it is useless until that
> is addressed.
> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
> up its own methods?
>
> I'm working on at least getting the 'bairoch' input format up and running
> (so at least it gets the enzymes into a
> Bio::Restriction::EnzymeCollection).  From this point I'm not sure where
> to proceed.  The POD obviously needs to be corrected to reflect that
> writing formats is not implemented (and the bit about XML should be taken
> out completely); that's the easy part which I am working on and plan
> on committing today.  However, these modules don't seem to be used too
> frequently so I'm not sure whether it's worth spending too much time
> getting these up to speed at the moment (adding write methods, switching to
> Bio::Root::Root, etc); I have other priorities at the moment (including a
> way overdue ListSummary). I'm also not sure who else is (using|working) on
> these so I don't want to (make too many changes|step on someone else's
> toes), but these are, IMHO, pretty serious problems.
>
> Any thoughts?
>
> Chris
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Wed May 31 09:10:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 08:10:00 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
	<200605311244.29187.heikki@sanbi.ac.za>
Message-ID: <C8B60E1D-D5A5-4661-AA2B-CEE1E5B5D758@uiuc.edu>

Heikki,

I mainly just changed a few things so no one would get the wrong
idea from the POD (that the modules write formats as well) and added a
few things to the TO DO.  I also added a warning to
Bio::Restriction::IO::bairoch for the multisite/multicut issue.
Besides that I haven't done much to them.  I also added a bit to the
Project Priority List in case someone wants to take it up.  I may
tinker with it but it's not really high on my priority list.  I've
been pretty busy getting the ListSummaries back up to speed (very
busy mail lists since the last one) and am writing/testing a new
interface to NCBI EUtilities which I may donate at some point in the
next few months or so.

Chris


On May 31, 2006, at 5:44 AM, Heikki Lehvaslaiho wrote:

>
> Chris,
>
> Thanks for stepping in. I feel partly responsible here because I  
> originally
> changed some of Rob's code but have not followed up since.
>
> There has not been any active development on these modules, so do not
> worry about stepping on anyone's toes.
>
>    -Heikki
>
> On Tuesday 30 May 2006 16:50, Chris Fields wrote:
>> Jason, Brian, et al,
>>
>> I found several major issues with Bio::Restriction::IO (this  
>> popped up
>> while bug squashing).  In particular, the POD is pretty  
>> misleading.  It
>> states (directly from perldoc):
>>
>> SYNOPSIS
>>         use Bio::Restriction::IO;
>>
>>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>>                                          -format => 'withrefm');
>>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>>                                          -format => 'bairoch');
>>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>>         $out->write($res);
>>
>>       # or
>>
>>       #    use Bio::Restriction::IO;
>>       #
>>       #    #input file format can be read from the file extension  
>> (dat|xml)
>>       #    $in  = Bio::Restriction::IO->newFh(-file =>  
>> "inputfilename");
>>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>>       #
>>       #    # World's shortest flat<->xml format converter:
>>       #    print $out $_ while <$in>;
>>
>> So, I have found several problems with these modules.  I really  
>> hate to
>> criticize code here, as my own is pretty hacky, but I think these are
>> things to seriously mull over:
>>
>> 1)	Note that, though some of the lines above are commented they are
>> still there in POD and thus present in perldoc/pod2html etc.  So,  
>> judging
>> from the above, it suggests using the script above should read in  
>> from one
>> format and write out to another (like SeqIO).  However, NONE of  
>> the current
>> write() methods are implemented for any of the IO modules  
>> (withrefm, base,
>> itype2, bairoch), so this does not happen as expected.  You get  
>> the nasty
>> thrown 'method not implemented error' instead when writing.
>> 2)	The commented statements in POD above also suggest that REBASE XML
>> format is supported when there is no XML module.
>> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
>> made it unusable until I added a few small changes; it still can't  
>> handle
>> multisite/multicut enzymes properly, so in essence it is useless  
>> until that
>> is addressed.
>> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
>> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO  
>> and make
>> up its own methods?
>>
>> I'm working on at least getting the 'bairoch' input format up and  
>> running
>> (so at least it gets the enzymes into a
>> Bio::Restriction::EnzymeCollection).  From this point I'm not
>> sure where
>> to proceed.  The POD obviously needs to be corrected to reflect that
>> writing formats is not implemented (and the bit about XML should  
>> be taken
>> out completely); that's the easy part which I am working on and plan
>> on committing today.  However, these modules don't seem to be used too
>> frequently so I'm not sure whether it's worth spending too much time
>> getting these up to speed at the moment (adding write methods,  
>> switching to
>> Bio::Root::Root, etc); I have other priorities at the moment  
>> (including a
>> way overdue ListSummary). I'm also not sure who else is (using| 
>> working) on
>> these so I don't want to (make too many changes|step on someone  
>> else's
>> toes), but these are, IMHO, pretty serious problems.
>>
>> Any thoughts?
>>
>> Chris
>>
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of the Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Wed May 31 09:07:10 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 08:07:10 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

  TODO: check if the POD is in the Wiki yet, and if not, put it here? 

I used Pod::Simple::Wiki (format 'mediawiki') to burn the
bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into
the wiki page via my web browser. (Is that proper procedure? Is the plan to
just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"? 

It's an amazing document. IMHO it should be listed prominently on bioperl.org.

HTH,

j


From osborne1 at optonline.net  Wed May 31 09:58:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 09:58:01 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447D94FE.8090305@jays.net>
Message-ID: <C0A31929.89F9%osborne1@optonline.net>

Jay,

Excellent! Now we need to answer a few more questions for ourselves:

- Do we remove the file bptutorial.pl from the package now? I'd say yes, we
don't want to have to maintain two bptutorials.

- What do we do with the script part of bptutorial.pl? It certainly could be
excised and put into the examples/ directory, for example, but this would
break a few of the paths that are being used.

- A link to bptutorial? Or a link to the existing tutorials page?
http://www.bioperl.org/wiki/Tutorials.

Any thoughts on these?


Brian O.


On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:

> http://www.bioperl.org/wiki/Bptutorial.pl
> 
> I think I just partially fulfilled this TODO:
> 
>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> 
> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the
> wiki page via my web browser. (Is that proper procedure? Is the plan to just
> do that manually from time to time as the document changes?)
> 
> Now what?
> 
> Should there be a new link on the far left of bioperl.org called "Tutorial"?
> 
> It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> 
> HTH,
> 
> j


From luciap at sas.upenn.edu  Wed May 31 10:06:13 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Wed, 31 May 2006 10:06:13 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
Message-ID: <1149084373.447da2d5c5339@128.91.55.38>

Hi
Thanks
a couple more questions
why is the bootstrap value stored as the node id? Is that right?

also, in the add_descendant method, how do you set the $ignoreoverwrite
parameter to true?

Lucia

Quoting Jason Stajich <jason.stajich at duke.edu>:

> you need to special case the root - it won't have an ancestor.  just
> protect the my $parent = $node->ancestor with an if statement as I
> did below
>
> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>
> > Hi
> > OK that was silly, but what I have in my code is what you just wrote
> > But the problem is that if I write
> >
> > $parent->add_Descendent($child)
> >
> > it tells me that I am calling the method "add_Descendent" on an
> > undefined value
> > (but I did define $parent before??)
> >
> > So here it goes the code so far:
> >
> > use Bio::TreeIO;
> >  my $in = new Bio::TreeIO(-file => 'Test2.tre',
> >                           -format => 'newick');
> >  my $out = new Bio::TreeIO(-file => '>mytree.out',
> >                            -format => 'newick');
> >  while( my $tree = $in->next_tree ) {
> >     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
> >     my $bootstrap=$node->_creation_id;
> >
> >     if ($bootstrap < 70 ){
> >    >>> if(        my $parent = $node->ancestor ) {
> >               my @children=$node->get_all_Descendents;
> >               foreach my $child (@children){
> >                  $parent->add_Descendent($child);
> >               }
>          }
> >
> > ........
> >
> > eventually I'll add (once I assigned the children to the parent
> > successfully):
> > $tree->remove_Node($node);
> >
> >         }
> >     }
> >     $out->write_tree($tree);
> > }
> >
> > Quoting aaron.j.mackey at gsk.com:
> >
> >>> foreach $child (@children){
> >>>          $parent=add_Descendent->$child;
> >>> }
> >>
> >> I think what you want is $parent->add_Descendent($child)
> >>
> >> -Aaron
> >>
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From sb at mrc-dunn.cam.ac.uk  Wed May 31 10:56:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 15:56:49 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>

Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more 
> likely that there is a well-hidden bug caused by accidentally assigning undef 
> to a one-element array than that someone is intentionally writing code that 
> expects that behaviour!
> 
> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
> and could not see any differences in the test output. 
> 
> Let's remove them!

Just looking for all return undef;s isn't enough. It's entirely possible 
to do something like:

my $return_value;
{
   # do something that assigns to return_value on success
   # on failure, just do nothing
}
return $return_value;

The bioperl docs will typically explicitly state that undef is returned, 
and under what circumstance. If a user suffers from the 
undef-into-array-problem, yes it can be slightly unexpected, but lots of 
unexpected things will happen when you don't use a method correctly, as 
per the docs!

Fixing the return of undef is either a job that shouldn't be done, or a 
much harder job than expected.
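
A small sketch of that pattern, assuming a made-up lookup() routine (not a
bioperl method): there is no literal "return undef" for a text search to
find, yet list-context callers see exactly the same one-element list.

```perl
use strict;
use warnings;

# Hypothetical lookup: returns the value for a key, or undef when absent.
sub lookup {
    my ($key) = @_;
    my %table = (alpha => 1);
    my $return_value;
    $return_value = $table{$key} if exists $table{$key};
    return $return_value;    # undef on failure, without saying "undef"
}

# In list context a failed lookup still yields a one-element list (undef),
# so the array is non-empty and tests true.
my @found      = lookup('beta');
my $looks_true = @found ? 1 : 0;

# In scalar context the same call behaves as documented.
my $scalar = lookup('beta');
```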


From bernd.web at gmail.com  Wed May 31 10:30:30 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 31 May 2006 16:30:30 +0200
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <447D94FE.8090305@jays.net> <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com>

Hi,

I am not sure to what extent bptutorial will be removed, but
I actually like having bptutorial.pl in my BioPerl base for reference.

regards,
Bernd

On 5/31/06, Brian Osborne <osborne1 at optonline.net> wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>
> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.
>
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
>
> Any thoughts on these?
>
>
> Brian O.
>
>
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the
> > wiki page via my web browser. (Is that proper procedure? Is the plan to just
> > do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> >
> > HTH,
> >
> > j
>


From lstein at cshl.edu  Wed May 31 12:03:13 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:03:13 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is 
documented to return a single scalar, then returning undef is appropriate. If 
the subroutine is documented to return "false" on failure, then one must call 
return (or "return ()" ).
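
A short sketch of how the two forms differ by calling context; the
subroutine and variable names are invented for illustration:

```perl
use strict;
use warnings;

sub fail_with_undef { return undef; }   # always exactly one value: undef
sub fail_with_bare  { return; }         # undef in scalar, () in list context

# Scalar context: both give undef.
my $s1 = fail_with_undef();
my $s2 = fail_with_bare();

# List context: one element versus none.
my @l1 = fail_with_undef();
my @l2 = fail_with_bare();

# Inside a larger list, a bare return contributes nothing, so the
# surrounding key/value pairs shift. Callers written against the
# "return undef" behaviour can break here after a blanket change.
my %h_undef   = (name => fail_with_undef(), id => 42);
my @args_bare = ('name', fail_with_bare(), 'id', 42);
```

Here %h_undef ends up with two keys, while @args_bare holds only three
elements, so the same list used as a hash would pair 'name' with 'id'.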

Changing all the return undefs to return is going to expose hidden bugs in the 
code written by people who are using BioPerl. While I agree wholeheartedly 
with the proposed audit, I think we need to expect that people are going to 
complain.

Lincoln


On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well-hidden bug caused by accidentally assigning
> undef to a one-element array than that someone is intentionally writing
> code that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > Agreed, though I think these changes should be implemented at some point
> > (Conway's argument here makes sense and it is nice for Torsten to check
> > this out).  If proper tests are written then any changes resulting in
> > errors should be picked up by checking the appropriate test suite, though
> > I know it doesn't absolutely guarantee it.  ; P
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > "returnundef"
> > >
> > > Although I agree with the sentiment of following PBP, I'm not so sure
> > > changing 'return undef' to 'return' *now* will fix any bugs without
> > > introducing new, subtle ones.
> > >
> > > Chris Fields wrote:
> > > > Torsten,
> > > >
> > > > Any way you can post a list of some/all of the offending lines or
> > > > modules?
> > > >
> > > > Sounds like something to consider, but if the list is as large as you
> > > > say we may need something (bugzilla? wiki?) to track the changes and
> > > > make sure they pass tests; I'm sure a large majority will.
> > > >
> > > > I'm guessing Jason would want this somewhere on the project priority
> > > > list or bugzilla, with a link to the actual list, but I'm not sure.
> > > > Maybe start a page on the wiki for proposed code changes?
> > > >
> > > > Chris
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > >> To: bioperl-l at lists.open-bio.org
> > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > >> "returnundef"
> > > >>
> > > >> FYI Bioperl developers:
> > > >>
> > > >> I just audited the bioperl-live CVS and found about 450 occurrences
> > > >> of "return undef".
> > > >>
> > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > >> suggest:
> > > >>
> > > >> "Use return; instead of return undef; if you want to return nothing.
> > > >> If someone assigns the return value to an array, the latter creates
> > > >> an array of one value (undef), which evaluates to true. The former
> > > >> will correctly handle all contexts."
> > > >>
> > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > >> result in bugs and should probably be changed.
> > > >>
> > > >> Your opinion may differ :-)
> > > >>
> > > >> --
> > > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > >
> > > --
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Rutger Vos, PhD. candidate
> > > Department of Biological Sciences
> > > Simon Fraser University
> > > 8888 University Drive
> > > Burnaby, BC, V5A1S6
> > > Phone: 604-291-5625
> > > Fax: 604-291-3496
> > > Personal site: http://www.sfu.ca/~rvosa
> > > FAB* lab: http://www.sfu.ca/~fabstar
> > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed May 31 12:34:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:34:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine>

Brian, Jay,

I think it would be nice to have the tutorial prominently displayed somehow
(Jay's suggestion), with a link provided via the tutorials page.  Hopefully
this will help with the bioperl newbies.

Jay, looks like there are still some weird formatting issues with the
bptutorial wiki page, something which I ran into before when getting the
Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
spaces preceding a line denotes code for some reason).  Not much you can do
in these cases except remove the extra spaces in those spots.  Looking good
though!  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Wednesday, May 31, 2006 8:58 AM
> To: Jay Hannah; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Jay,
> 
> Excellent! Now we need to answer a few more questions for ourselves:
> 
> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
> we don't want to have to maintain two bptutorials.
> 
> - What do we do with the script part of bptutorial.pl? It certainly could
> be excised and put into the examples/ directory, for example, but this
> would break a few of the paths that are being used.
> 
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
> 
> Any thoughts on these?
> 
> 
> Brian O.
> 
> 
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> 
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> > into the wiki page via my web browser. (Is that proper procedure? Is the
> > plan to just do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called
> > "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on
> > bioperl.org.
> >
> > HTH,
> >
> > j


From cjfields at uiuc.edu  Wed May 31 12:44:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:44:31 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine>

My feeling is that the test suite 'should' pick up a large majority of
problems if changes are made to these lines; the quotes reflect the utopian
idea that the tests are all written well (I believe 99% of the tests are,
BTW).  You can always try the changes (wholesale or on smaller chunks of
code), see if they pass tests on different OS's using 'make/nmake test',
revert the ones that didn't pass, etc.  It's a matter of someone being
willing to try it out.

I think the original argument here (originating from Damian Conway and 'Perl
Best Practices') is that using 'return undef' is something we shouldn't be
doing, since it can itself lead to subtle errors.  Not that everything we do
is considered 'a good practice' by any means.  If I remember correctly from
'OOPerl', Conway doesn't like combined get/setters either (he prefers
separate getters and setters); we use the 'bad' combined version
predominantly in Bioperl.
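
For readers who have not met the two accessor styles, a minimal sketch of
both follows. The class My::Feature and its fields are hypothetical and are
not taken from any BioPerl module; the sketch only contrasts a combined
get/setter with a separate getter and setter.

```perl
use strict;
use warnings;

# Hypothetical class for illustration; not a BioPerl module.
package My::Feature;

sub new { my ($class) = @_; return bless {}, $class }

# Combined get/setter: sets when given an argument, always returns the value.
sub name {
    my $self = shift;
    $self->{name} = shift if @_;
    return $self->{name};
}

# Separate accessors: each method does exactly one thing.
sub get_score { my ($self) = @_; return $self->{score} }
sub set_score { my ($self, $score) = @_; $self->{score} = $score; return $self }

package main;

my $f = My::Feature->new;
$f->name('exon1');     # set via the combined method
$f->set_score(42);     # set via the dedicated setter
print $f->name, " ", $f->get_score, "\n";
```

Note that both getters here end in "return $self->{...}", so an unset field
comes back as a one-element list holding undef in list context, which is the
same pattern the audit is about.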

Chris



From hlapp at gmx.net  Wed May 31 10:59:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 10:59:53 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net>

I agree. Thanks to Torsten for the audit and Chris for stepping up.

	-hilmar

On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote:

> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well-hidden bug caused by accidentally assigning
> undef into a one-element array than that someone is intentionally writing
> code that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of the Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 14:08:43 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:08:43 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
	<200605311203.13922.lstein@cshl.edu>
Message-ID: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>


On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:

> If the subroutine is documented to return "false" on failure, then  
> one must call
> return (or "return ()" ).

The problem seems to be that 'a value that evaluates to either true
or false', 'a [meaningful] value or undef', and 'a value or
false' ('a value or no value') are not the same in Perl. And what
would/should one expect if the doc states 'true on success and false
otherwise'?

Maybe the documentation should also be fixed to avoid any ambiguity.  
I.e., avoid documenting 'a value or false' because it may be  
ambiguous (not only) to the less proficient. 'True or false' should  
imply a value being returned.

Comments?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lstein at cshl.edu  Wed May 31 14:14:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:14:59 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
Message-ID: <200605311415.00414.lstein@cshl.edu>

If the documentation says "returns false" then I expect to be able to do this:

	@result = foo();
	die "foo() failed" unless @result;

If the documentation says "returns undef" then I expect this:

	@result = foo();
	die "foo() failed" unless $result[0];
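
As a rough sketch of how these two checks interact with the two return
styles, the snippet below runs each check against a sub that fails with a
bare return and one that fails with "return undef". All four sub names are
hypothetical stand-ins, not BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a failing foo(); not taken from BioPerl.
sub foo_returns_false { return }        # empty list in list context
sub foo_returns_undef { return undef }  # one-element list holding undef

# Each check returns 1 if it would trigger the die, 0 otherwise.
sub empty_list_check    { my @result = @_; return @result    ? 0 : 1 }
sub first_element_check { my @result = @_; return $result[0] ? 0 : 1 }

printf "bare return:  unless \@result => %d, unless \$result[0] => %d\n",
    empty_list_check(foo_returns_false()), first_element_check(foo_returns_false());
printf "return undef: unless \@result => %d, unless \$result[0] => %d\n",
    empty_list_check(foo_returns_undef()), first_element_check(foo_returns_undef());
```

The "unless @result" test misses a failure signalled with "return undef",
while "unless $result[0]" catches both, at the price of also treating a
legitimate 0 or empty string in the first slot as a failure.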

Lincoln


On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > If the subroutine is documented to return "false" on failure, then
> > one must call
> > return (or "return ()" ).
>
> The problem seems to be that 'a value that evaluates to either true
> or false' and 'a [meaningful] value or undef' and 'a value or
> false' ('a value or no value) are not the same in perl. And what
> would/should one expect if the doc states 'true on success and false
> otherwise'?
>
> Maybe the documentation should also be fixed to avoid any ambiguity.
> I.e., avoid documenting 'a value or false' because it may be
> ambiguous (not only) to the less proficient. 'True or false' should
> imply a value being returned.
>
> Comments?
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 14:31:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:31:21 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
	<200605311415.00414.lstein@cshl.edu>
Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net>


On May 31, 2006, at 2:14 PM, Lincoln Stein wrote:

> If the documentation says "returns false" then I expect to be able  
> to do this:
>
> 	@result = foo();
> 	die "foo() failed" unless @result;

Except if the alternative to 'false' would be a scalar, you normally  
wouldn't assign it to an array, would you?

I.e., I wouldn't expect this strict of a behavior from an open-source
package written largely by people whose job is biological science,
not programming Perl and knowing and following DC to the letter ... I'd
rather be on the safe side and assign to a scalar.

Just my $0.02 ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 14:50:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 13:50:30 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, May 31, 2006 9:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much
> > more likely that there is a well-hidden bug caused by accidentally
> > assigning undef into a one-element array than that someone is
> > intentionally writing code that expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> > code and could not see any differences in the test output.
> >
> > Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Agreed, though looking for these is obviously much harder.  

The way to get around those is:

return $return_value if $return_value;
return;

which I've seen used in a number of get/set methods. 
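
One caveat with the two-line idiom above: it tests truth, not definedness, so
a legitimate 0 or empty string is dropped as well. A minimal sketch comparing
it with a "defined" test follows; both helper names are hypothetical and are
not BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are illustrative, not BioPerl methods.
# Truthiness test, as in the two-line idiom: 0 and '' count as "no value".
sub value_if_true {
    my ($return_value) = @_;
    return $return_value if $return_value;
    return;
}

# Definedness test: only undef counts as "no value".
sub value_if_defined {
    my ($return_value) = @_;
    return $return_value if defined $return_value;
    return;
}

for my $v (5, 0, undef) {
    my @t = value_if_true($v);
    my @d = value_if_defined($v);
    printf "%-5s truthy-list=%d defined-list=%d\n",
        (defined $v ? $v : 'undef'), scalar(@t), scalar(@d);
}
```

Which test is right depends on whether 0 or '' is a meaningful value for the
attribute in question.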

> The bioperl docs will typically explicitly state that undef is returned,
> and under what circumstance. If a user suffers from the
> undef-into-array-problem, yes it can be slightly unexpected, but lots of
> unexpected things will happen when you don't use a method correctly, as
> per the docs!

Right, but the argument you make is that code will always work as expected
from the perldoc examples.  My recent experiences with the
Bio::Restriction::IO and Bio::Species classes show that the docs are not
always up-to-date and may indicate the unimplemented intent of the author
more than the actual implementation.  Again, I believe a large majority of
the docs are fine, but it's those few errors that made a devil's advocate of
me...

> Fixing the return of undef is either a job that shouldn't be done, or a
> much harder job than expected.

I don't think ignoring the problem is the best answer here, though I agree
the problem is more complicated than it looks at first glance.  Judging from
code I've trawled through lately, I've seen a lot of methods (mainly
get/setters) that are essentially copied multiple times in the same or across
similar modules to save time.  You could see a scenario where, in those
instances, so-called 'bad code' would spread quite quickly.

I think adding a wiki page to address some of these issues would be nice,
something separate from the Project Priority List.

Chris


From forward at hongyu.org  Wed May 31 14:03:46 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Wed, 31 May 2006 11:03:46 -0700
Subject: [Bioperl-l] New functions for SimpleAlign.pm
Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org>

Greetings,

I am a new member in this mailing list. Nice to be here.

I wrote two more functions for the alignment module SimpleAlign.pm
that calculate the percentage of identity based on the shortest and
longest sequence length, respectively. I also found an error in the
no_residues() function that calculates the number of residues in the
alignment.
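
To illustrate what "percentage of identity based on the shortest and longest
sequence length" could mean, here is a rough sketch for two aligned strings.
It is an assumption about the general idea only: the helper
pairwise_identity is hypothetical, works on plain gapped strings, and does
not reproduce the submitted functions or any Bio::SimpleAlign method.

```perl
use strict;
use warnings;

# Hypothetical helper on two gapped strings of equal length; it does not
# use or reproduce any Bio::SimpleAlign method.
sub pairwise_identity {
    my ($seq_a, $seq_b) = @_;
    die "aligned sequences must have equal length\n"
        unless length($seq_a) == length($seq_b);

    # Count columns where both sequences carry the same residue (gaps excluded).
    my $identical = 0;
    for my $i (0 .. length($seq_a) - 1) {
        my $x = uc substr($seq_a, $i, 1);
        my $y = uc substr($seq_b, $i, 1);
        $identical++ if $x ne '-' && $x eq $y;
    }

    # Ungapped lengths supply the two denominators.
    my ($short, $long) = sort { $a <=> $b }
                         map  { my $s = $_; $s =~ tr/-//d; length $s }
                         ($seq_a, $seq_b);
    die "sequences contain only gaps\n" unless $short;

    return (100 * $identical / $short, 100 * $identical / $long);
}

my ($by_short, $by_long) = pairwise_identity('ACGT-A', 'ACCTGA');
printf "identity vs shortest: %.1f%%, vs longest: %.1f%%\n", $by_short, $by_long;
```

With the example input, four columns match; the ungapped lengths are 5 and 6,
so the two percentages differ only in the denominator.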

I am wondering whether they can be added to the official bioperl  
package. I've contacted the original author of this module, Heikki  
Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.

Thanks.

-- 
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.


From cjfields at uiuc.edu  Wed May 31 15:39:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 14:39:26 -0500
Subject: [Bioperl-l] New functions for SimpleAlign.pm
In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org>
Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine>

I added a bit to the FAQ about this:

http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F

and the HOWTO explains things a bit more directly:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

In brief, these need to be submitted to Bugzilla as either code enhancements
(for your added methods) or bugs with the patch to the relevant code.  Code
enhancements probably should include some code and test cases to demonstrate
usage.  Patches to buggy code are checked to make sure they pass relevant
tests by the core developers.  Submitting it to the mail list is definitely
the first step, though, so you're on the right path.

Chris



From cjfields at uiuc.edu  Wed May 31 16:40:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 15:40:19 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine>

What about modules that have 'throw_not_implemented' statements present?
Here's a list with the total for each.  Some of these are interfaces (I
filtered out names ending in 'I' or 'IO' to drop the interface modules, but
that misses a few).  A number of them are implementations, though
(Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically
incomplete:

Instances: 1	Module : Bio::AlignIO::maf
Instances: 25	Module : Bio::Assembly::Contig
Instances: 2	Module : Bio::Assembly::ContigAnalysis
Instances: 2	Module : Bio::Biblio::BiblioBase
Instances: 4	Module : Bio::DB::Expression
Instances: 2	Module : Bio::DB::Expression::geo
Instances: 5	Module : Bio::DB::Flat
Instances: 2	Module : Bio::DB::Query::WebQuery
Instances: 17	Module : Bio::DB::SeqFeature::Store
Instances: 2	Module : Bio::DB::SeqVersion
Instances: 3	Module : Bio::DB::Taxonomy
Instances: 1	Module : Bio::FeatureIO::bed
Instances: 1	Module : Bio::Map::Marker
Instances: 1	Module : Bio::MapIO::fpc
Instances: 1	Module : Bio::MapIO::mapmaker
Instances: 1	Module : Bio::Restriction::IO::bairoch
Instances: 1	Module : Bio::Restriction::IO::itype2
Instances: 1	Module : Bio::Restriction::IO::withrefm
Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances: 3	Module : Bio::Tools::Run::WrapperBase

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 1:15 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfall with "return undef"
> 
> If the documentation says "returns false" then I expect to be able to do
> this:
> 
> 	@result = foo();
> 	die "foo() failed" unless @result;
> 
> If the documentation says "returns undef" then I expect this:
> 
> 	@result = foo();
> 	die "foo() failed" unless $result[0];
> 
> Lincoln
> 
> 
> On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > If the subroutine is documented to return "false" on failure, then
> > > one must call
> > > return (or "return ()" ).
> >
> > The problem seems to be that 'a value that evaluates to either true
> > or false' and 'a [meaningful] value or undef' and 'a value or
> > false' ('a value or no value') are not the same in perl. And what
> > would/should one expect if the doc states 'true on success and false
> > otherwise'?
> >
> > Maybe the documentation should also be fixed to avoid any ambiguity.
> > I.e., avoid documenting 'a value or false' because it may be
> > ambiguous (not only) to the less proficient. 'True or false' should
> > imply a value being returned.
> >
> > Comments?
> >
> > 	-hilmar
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
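
The difference Lincoln describes in the quoted message above can be seen in
a few lines of plain Perl. This is only a minimal sketch: the two
subroutine names are made up for illustration and do not come from any
Bioperl module.

```perl
use strict;
use warnings;

# Hypothetical failing subroutines; the names are illustrative only.
sub fail_with_bare_return  { return }          # empty list in list context
sub fail_with_return_undef { return undef }    # one-element list: (undef)

my @bare  = fail_with_bare_return();
my @undef = fail_with_return_undef();

# "returns false": the array itself is empty, so 'unless @result' fires.
print "bare return:  ", scalar(@bare),  " element(s)\n";

# "returns undef": the array holds a single undef, so it is true as a
# whole and the caller has to test $result[0] instead.
print "return undef: ", scalar(@undef), " element(s)\n";

# In scalar context both forms yield undef, so the difference only
# shows up when the caller assigns to a list or an array.
my $scalar_bare  = fail_with_bare_return();
my $scalar_undef = fail_with_return_undef();
print "scalar context: both undefined\n"
    if !defined $scalar_bare && !defined $scalar_undef;
```

So a caller written against "returns false" silently passes the check when
the subroutine actually does 'return undef', which is the pitfall the
subject line refers to.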


From lstein at cshl.edu  Wed May 31 17:07:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 17:07:06 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <200605311707.08196.lstein@cshl.edu>


> Instances: 17	Module : Bio::DB::SeqFeature::Store

This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual 
base class. The throw_not_implemented() calls are there to force developers 
to override the needed interface methods.

If this is not the right way to do it, let me know and I'll fix it.
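
For readers who have not seen the pattern, here is a minimal sketch of a
virtual base class whose interface methods throw until a subclass overrides
them. Everything in it is an assumption made for illustration: the My::Store
packages are hypothetical, and the local throw_not_implemented() is a
plain-Perl stand-in for the Bioperl root method of the same name, not the
actual Bio::DB::SeqFeature::Store code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a virtual base class; not the real
# Bio::DB::SeqFeature::Store code.
package My::Store;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Plain-Perl stand-in for Bioperl's throw_not_implemented():
# report which method the subclass forgot to override.
sub throw_not_implemented {
    my $self   = shift;
    my $method = (caller(1))[3];    # fully qualified name of the calling sub
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface methods: every subclass is expected to override these.
sub store { shift->throw_not_implemented }
sub fetch { shift->throw_not_implemented }

# Hypothetical adaptor that overrides only one of the two methods.
package My::Store::memory;
our @ISA = ('My::Store');

sub store {
    my ($self, $id, $feature) = @_;
    $self->{features}{$id} = $feature;
    return 1;
}

package main;

my $db = My::Store::memory->new;
$db->store(f1 => 'exon');

# fetch() was never overridden, so the base class throws.
my $ok  = eval { $db->fetch('f1'); 1 };
my $err = $@;
print $ok ? "fetched\n" : "caught: $err";
```

The failure shows up at the first call to the missing method rather than at
load time, which is why a count of these calls cannot by itself tell a
deliberate abstract class from an unfinished implementation.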

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 17:21:57 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:21:57 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>


On May 31, 2006, at 4:40 PM, Chris Fields wrote:

> What about modules that have 'throw_not_implemented' statements  
> present?

Those are often if not always legitimate - the problem is those that
don't have them but fail to override an inherited interface or
abstract method.

If something is not implemented, what better way is there to express
this than throwing an exception? (And, if it's not an interface
or abstract base class, saying so in the documentation.)

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 17:25:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:25:48 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net>


On May 31, 2006, at 2:50 PM, Chris Fields wrote:

> I've seen a lot of methods (mainly get/setters)
> that are essentially copied multiple times in the same or across  
> similar
> modules to save time.  You could see a scenario where, in those  
> instances,
> so-called 'bad code' would spread quite quickly.

This will usually be code generated by macros, e.g. the emacs macros  
for getter/setter generation for properties.

If the macro generates wrong code, that's indeed pretty bad. (We've  
had that.) OTOH it should be spotted quickly as well. And macro  
changes or new macros should probably be scrutinized by all eyes  
watching ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 17:40:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 16:40:22 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>
Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine>

I think, as long as it's reflected in the docs that something doesn't work
(hasn't been implemented) then there's no problem.  It's when the docs are
misleading that we run into problems.  

The sticking point lies with some classes, such as IO classes (like SeqIO,
or Restriction::IO, with read and write methods), where the IO base class
advertises that a particular format can be both read and written, but what
actually works depends on whether the derived class overrides the base or
interface method (in other words, 'doesn't work as advertised' only in
specific circumstances).  I don't know how to solve this issue except to
state in the docs that specific formats don't implement write() methods.

Personally, I haven't had an issue with it and it probably makes no
difference, but I think it needs to be pointed out.  The most extreme I ran
into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that
didn't implement the write() method but left this in the synopsis in POD:

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

  # or

  #    use Bio::Restriction::IO;
  #
  #    #input file format can be read from the file extension (dat|xml)
  #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
  #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
  #
  #    # World's shortest flat<->xml format converter:
  #    print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO
classes!  Bio::AlignIO also has a few as well (maf and Stockholm formats
don't write).
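
One way to find out which direction a format plugin really supports,
without calling the method and catching the exception, is to compare what
can() returns for the plugin and for the base class. The sketch below only
illustrates the idea: the My::IO packages and the implements() helper are
hypothetical and are not part of Bio::Restriction::IO or Bio::AlignIO.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for an IO base class and two format plugins;
# none of this is the real Bio::Restriction::IO code.
package My::IO::Base;
sub new   { my $class = shift; return bless {}, $class }
sub read  { die "read() not implemented\n" }
sub write { die "write() not implemented\n" }

package My::IO::readonly_format;      # implements read() only
our @ISA = ('My::IO::Base');
sub read { return 'collection' }

package My::IO::full_format;          # implements both directions
our @ISA = ('My::IO::Base');
sub read  { return 'collection' }
sub write { return 1 }

package main;

# A method counts as implemented only if the class resolves it to
# something other than the base-class stub. can() returns a code
# reference, so comparing references tells the two cases apart.
sub implements {
    my ($class, $method) = @_;
    my $found = $class->can($method) or return 0;
    my $stub  = My::IO::Base->can($method);
    return (defined $stub && $found == $stub) ? 0 : 1;
}

for my $class (qw(My::IO::readonly_format My::IO::full_format)) {
    printf "%-26s read=%d write=%d\n", $class,
        implements($class, 'read'), implements($class, 'write');
}
```

A plain $class->can('write') is not enough here, because the inherited stub
makes it true for every plugin.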

Chris


From hlapp at gmx.net  Wed May 31 17:55:37 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:55:37 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine>
References: <002401c684fa$d28e7640$15327e82@pyrimidine>
Message-ID: <CB29173C-0BFC-43CA-A620-519084AFEE04@gmx.net>

This is documentation cruft resulting from copy&paste w/o later  
fixing it. (which isn't a justification)

Note that not implementing the write is as legitimate as not  
implementing the read method ... It should be pointed out in the  
documentation though that it will depend on the actual implementation  
of the format whether it supports reading or writing or both.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From slenk at emich.edu  Wed May 31 17:52:13 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed, 31 May 2006 17:52:13 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
Message-ID: <100682f110067a83.10067a83100682f1@emich.edu>


Isn't it fairly standard in OO schemes/languages to have an exception thrown
if a method can't be found at the end of a search up the class hierarchy? I
recall being very mad at Smalltalk because "method not found" kept biting me.
C++ has abstract base classes (classes with pure virtual functions) that
cannot be instantiated directly; they are meant to be inherited from and then
implemented.

Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be?
Do the Bioperl people feed their needs into Perl 6, so that all the effort
that went into coding Bio::Root is handled for them next time by Perl 6
itself? Make the Perl 6 people solve these issues with your input, and then
you will not have to deal with implementing it yourselves. I'll just bet that
you are not the only potential users of Perl 6 who will have to solve these
issues eventually.


From arareko at campus.iztacala.unam.mx  Wed May 31 18:49:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 31 May 2006 17:49:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine>
References: <001201c684d0$263c5530$15327e82@pyrimidine>
Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx>

Brian, Jay, Chris,

I agree with what Bernd Web said in another reply. For some people it will
be nice to still be able to run the script from the codebase and
interact with it.

I don't think it should be much of a problem to maintain both tutorials,
as long as the 'main' one is the one in the CVS tree. From reading what
Jay did to convert it into mediawiki format, I suppose this can
easily be done again for each new change to the script (again, this is
just my guess). Besides, as far as I've seen, commits to the script
aren't frequent at all.

I've added a link in the left menu of the wiki. If you think it should 
point to the Tutorials page instead of the Bptutorial.pl page please let 
me know.

Regards,
Mauricio.

Chris Fields wrote:
> Brian, Jay,
> 
> I think it would be nice to have the tutorial prominently displayed somehow
> (Jay's suggestion), with a link provided via the tutorials page.  Hopefully
> this will help with the bioperl newbies.
> 
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Wednesday, May 31, 2006 8:58 AM
>> To: Jay Hannah; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>>
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
>> we don't want to have to maintain two bptutorials.
>>
>> - What do we do with the script part of bptutorial.pl? It certainly could
>> be excised and put into the examples/ directory, for example, but this
>> would break a few of the paths that are being used.
>>
>> - A link to bptutorial? Or a link to the existing tutorials page?
>> http://www.bioperl.org/wiki/Tutorials.
>>
>> Any thoughts on these?
>>
>>
>> Brian O.
>>
>>
>> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>>
>>> http://www.bioperl.org/wiki/Bptutorial.pl
>>>
>>> I think I just partially fulfilled this TODO:
>>>
>>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
>>>
>>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
>>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
>>> into the wiki page via my web browser. (Is that proper procedure? Is the
>>> plan to just do that manually from time to time as the document changes?)
>>>
>>> Now what?
>>>
>>> Should there be a new link on the far left of bioperl.org called
>>> "Tutorial"? It's an amazing document. IMHO it should be listed
>>> prominently on bioperl.org.
>>>
>>> HTH,
>>>
>>> j

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed May 31 20:43:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:43:48 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <200605311707.08196.lstein@cshl.edu>
Message-ID: <002801c68514$72f11480$15327e82@pyrimidine>


> -----Original Message-----
> From: Lincoln Stein [mailto:lstein at cshl.edu]
> Sent: Wednesday, May 31, 2006 4:07 PM
> To: Chris Fields
> Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfall with "return undef"
> 
> 
> > Instances: 17	Module : Bio::DB::SeqFeature::Store
> 
> This is intentional. Bio::DB::SeqFeature::Store is intended to be a
> virtual
> base class. The throw_not_implemented() calls are there to force
> developers
> to override the needed interface methods.
> 
> If this is not the right way to do it, let me know and I'll fix it.

That seems like the right way to me, though I don't really know what the
'right way' is.  Sorry Lincoln, I didn't mean to aim anything at you
specifically; I responded to your last post to stay in the thread, so to
speak.  It was meant to be a general statement that some classes haven't
implemented methods specified by their abstract base or interface class.
This is just output from a quickie script I wrote up to check on this and see
how many of these statements are out there, and since there isn't a foolproof
way to know what an abstract base class is, it pulls in a few abstract classes
(such as yours) along with all the others.  At least there aren't as many
hits as Torsten's ~400-500 for 'return undef'! 
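
The script itself was not posted, so the following is only a guess at what
such a scan might look like, using the core File::Find module. The
subroutine names, the name-based interface filter and the output format are
all assumptions modelled on the list earlier in the thread.

```perl
use strict;
use warnings;
use File::Find;

# Count throw_not_implemented occurrences in a chunk of source text.
# This is a plain text match, so mentions in POD or comments count too.
sub count_stubs {
    my ($source) = @_;
    my $count = () = $source =~ /\bthrow_not_implemented\b/g;
    return $count;
}

# Crude name filter: treat modules whose name ends in 'I' or 'IO' as
# interfaces. It misses abstract classes that do not follow the naming
# convention.
sub looks_like_interface {
    my ($module) = @_;
    return $module =~ /(?:I|IO)$/ ? 1 : 0;
}

# Walk a source tree and return { module name => instance count }.
sub scan_tree {
    my ($root) = @_;
    my %hits;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.pm$/;
        my $path = $_;
        # Turn /root/Bio/AlignIO/maf.pm into Bio::AlignIO::maf.
        (my $module = $path) =~ s{^\Q$root\E/?}{};
        $module =~ s{\.pm$}{};
        $module =~ s{/}{::}g;
        return if looks_like_interface($module);
        open my $fh, '<', $path or return;
        local $/;                       # slurp the whole file
        my $n = count_stubs(scalar <$fh>);
        $hits{$module} = $n if $n;
    }}, $root);
    return \%hits;
}

# Usage (hypothetical file name): perl count_stubs.pl /path/to/bioperl-live
if (@ARGV) {
    my $hits = scan_tree($ARGV[0]);
    print "Instances: $hits->{$_}\tModule : $_\n" for sort keys %$hits;
}
```

Because the filter only looks at names, intentionally abstract classes that
lack the 'I' suffix still show up in the output next to the genuinely
incomplete implementations.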

Anyway, I'm not sure what would be the best place to address code problems
or issues like the unimplemented methods issue or Torsten's audits (list,
wiki, etc); it's a delicate issue b/c it's bordering on code critiquing and
what constitutes good vs. bad code.  I remember some pretty heated arguments
about the 'proper' way to do things a while back involving AUTOLOAD'ing
methods, which I think is summarized somewhere in the wiki.  Myself, I'm a
microbiologist and not a programmer, so I'm prone to bouts of hackery, but I
try to have the code at least do what the docs state.

Chris


From cjfields at uiuc.edu  Wed May 31 20:56:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:56:12 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine>

Mauricio et al,

Sounds good, except that there are a few issues with the formatting done by
Pod::Simple::Wiki, such as wrapping some things in <code> tags when they
obviously aren't code; I don't know if there is a workaround for that
(Jay?).  It may not be anything too serious though.

There was a similar issue with the INSTALL doc conversion to wiki that I ran
into, in that I don't think it will be easy converting one way or the other
(POD->wiki or wiki->POD or text), so syncing updates with wiki and CVS docs
could be an issue we'll have to face in the future.

We could strip the POD out of the script and have the docs on the wiki
(Brian's idea), or have minimal POD in the tutorial and keep the wiki
updated, just to simplify things, but this may not appeal to those who use
perldoc frequently (I personally use browsable prettified HTML).

cjf

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Wednesday, May 31, 2006 5:49 PM
> To: Chris Fields
> Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Brian, Jay, Chris,
> 
> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.
> 
> I don't think it should be a lot of problem to maintain both tutorials,
> as long as the 'main' one is the one in the CVS tree. By reading what
> Jay did in order to convert it into mediawiki format, I suppose this can
> be easily done again for each new change to the script (again, this is
> just my guessing). Besides, as far as I've seen, there aren't frequent
> commits to the script at all.
> 
> I've added a link in the left menu of the wiki. If you think it should
> point to the Tutorials page instead of the Bptutorial.pl page please let
> me know.
> 
> Regards,
> Mauricio.
> 
> Chris Fields wrote:
> > Brian, Jay,
> >
> > I think it would be nice to have the tutorial prominently displayed
> somehow
> > (Jay's suggestion), with a link provided via the tutorials page.
> Hopefully
> > this will help with the bioperl newbies.
> >
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> >> Sent: Wednesday, May 31, 2006 8:58 AM
> >> To: Jay Hannah; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> >>
> >> Jay,
> >>
> >> Excellent! Now we need to answer a few more questions for ourselves:
> >>
> >> - Do we remove the file bptutorial.pl from the package now? I'd say
> yes,
> >> we
> >> don't want to have to maintain two bptutorials.
> >>
> >> - What do we do with the script part of bptutorial.pl? It certainly
> could
> >> be
> >> excised and put into the examples/ directory, for example, but this
> would
> >> break a few of the paths that are being used.
> >>
> >> - A link to bptutorial? Or a link to the existing tutorials page?
> >> http://www.bioperl.org/wiki/Tutorials.
> >>
> >> Any thoughts on these?
> >>
> >>
> >> Brian O.
> >>
> >>
> >> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> >>
> >>> http://www.bioperl.org/wiki/Bptutorial.pl
> >>>
> >>> I think I just partially fulfilled this TODO:
> >>>
> >>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >>>
> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> >> the
> >>> wiki page via my web browser. (Is that proper procedure? Is the plan
> to
> >> just
> >>> do that manually from time to time as the document changes?)
> >>>
> >>> Now what?
> >>>
> >>> Should there be a new link on the far left of bioperl.org called
> >> "Tutorial"?
> >>> It's an amazing document. IMHO it should be listed prominently on
> >> bioperl.org.
> >>> HTH,
> >>>
> >>> j
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From osborne1 at optonline.net  Wed May 31 21:37:15 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 21:37:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <C0A3BD0B.8A2C%osborne1@optonline.net>

Mauricio,

Bernd didn't say he wanted the _script_ in the package; he said he wanted
bptutorial.pl in the package, not indicating whether it was the
documentation or the script that was important. It's my suspicion that the
documentation is more important than the script, and this is what my last
letter was asking, in part: is the script important? Or can we focus on the
text/POD part?

Brian O.


On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
<arareko at campus.iztacala.unam.mx> wrote:

> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.


From cjfields at uiuc.edu  Wed May 31 21:42:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 20:42:54 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu>
Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk
> Sent: Wednesday, May 31, 2006 4:52 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> Isn't it fairly standard in OO schemes/languages to have an exception
> thrown if a method
> can't be found at the
> end of a search up the class hierarchy? I recall being very mad at
> Smalltalk because "method
> not found" kept
> biting me. C++ has pure virtual base classes that do not allow objects to
> be instantiated
> directly; they are
> meant to be inherited and then implemented.

Perl will throw an error if it can't find a method in a class hierarchy.
It will do a few things first before dying, like looking for AUTOLOAD, etc.
AUTOLOAD has its supporters and detractors; I try to stay away from it as
much as possible.

Not sure about C++ like pure virtual classes in Perl5, i.e. not allowing
direct object instantiation, but Perl6 is supposed to have them, at least
according to Apocalypse 12.  From what Mr. Wall says about OOP in Perl5,
it's essentially 'bolted on' but works with caveats (is 'private' really
'private'?).  Perl6 is rebuilt from scratch (internals are OO).
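
For what it's worth, the Perl5 idiom being discussed can be sketched in a few lines. This is only an illustration under assumptions: the package names (My::ShapeI, My::Square, My::Blob) are made up, and a plain die stands in for throw_not_implemented so that the sketch runs without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical interface class; the names are illustrative, not BioPerl's.
package My::ShapeI;

sub new {
    my ($class, %args) = @_;
    # Refuse direct instantiation, loosely like a C++ abstract base class.
    die "$class is abstract; instantiate a subclass\n"
        if $class eq __PACKAGE__;
    return bless {%args}, $class;
}

# "Abstract" method: dies unless a subclass overrides it.
sub area {
    my $self = shift;
    die ref($self) . " does not implement area()\n";
}

package My::Square;
our @ISA = ('My::ShapeI');

sub area { my $self = shift; return $self->{side} ** 2 }

package My::Blob;    # forgets to override area()
our @ISA = ('My::ShapeI');

package main;

print My::Square->new(side => 3)->area, "\n";

# The inherited stub turns the omission into a clear run-time error.
my $ok = eval { My::Blob->new->area; 1 };
print "caught: $@" unless $ok;
```

Without the stub in the base class, Perl would still die, but only with its generic "Can't locate object method" message after the method search (and any AUTOLOAD) has failed.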

> Perl 6 was mentioned a bit back. Is this issue addressed there? Should it
> be? Do the Bioperl
> people feed their
> needs into Perl 6 so that all the code effort to make Bio::Root is handled
> for them in the next
> effort by Perl 6
> itself. Make the Perl 6 people solve these issues with your input, then
> you will not have to
> deal with
> implementing it yourselves. I'll just bet that you are not the only
> potential users of Perl 6 who
> will have to solve
> these issues eventually.

I think Perl6 will solve most (if not all) these problems since it's a
complete rebuild.  In fact, it's pretty much a new language altogether from
what I have seen (and the little I have played around with using Pugs).
Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be
necessary to immediately convert all of bioperl to Perl6.  Though I have
also heard of a Perl5->6 converter in the works as well...  

From an OO standpoint, I believe everything is considered an object in
Perl6, though it's not supposed to force you into using objects according to
the Apocalypses that I have read.  I actually see a lot there that reminds
me of C++ (but in a Perl-ish way, of course).  Apocalypse 12 is a good
primer, though you may want to go through the others first, they're heavy
slogging:

http://dev.perl.org/perl6/doc/design/apo/A12.html

Not sure what you mean by 'feeding our needs into Perl6'.  I have
periodically checked on perl6 progress and they seem to have everything well
under control.

Chris
 
> ----- Original Message -----
> From: Hilmar Lapp <hlapp at gmx.net>
> Date: Wednesday, May 31, 2006 5:21 pm
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> >
> > On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> >
> > > What about modules that have 'throw_not_implemented' statements
> > > present?
> >
> > Those are often if not always legitimate - the problem are those
> > that
> > don't have them but fail to override an inherited interface or
> > abstract method.
> >
> > If something is not implemented what is the better way to express
> > this other than throwing an exception? (and if it's not an
> > interface
> > or abstract base class, saying so in the documentation)
> >
> > 	-hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
> >
> >
> >
> >


From jay at jays.net  Wed May 31 21:54:01 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 20:54:01 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <447E48B9.4080503@jays.net>

Brian Osborne wrote:
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.

We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki."

Just a thought.

> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.

/README says this:

 scripts/    - Useful production-quality scripts with POD documentation
 examples/   - Scripts demonstrating the many uses of Bioperl

I'm personally not clear on the difference. Little stuff should start in examples/ and graduate to scripts/ once they've matured? 

Is the doc/ tree being abandoned?

doc/faq        (empty?)
doc/howto      
doc/howto/examples
doc/howto/figs (empty?)
doc/howto/html (empty?)
doc/howto/pdf  (empty?)
doc/howto/sgml (empty?)
doc/howto/txt  (empty?)
doc/howto/xml  (empty?)

Does all that stuff now officially live, and get changed, in the wiki, never to return to the distro?

Any reason those empty dirs aren't nuked out of CVS?

Chris Fields wrote:
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  

Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :)
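
The conversion step itself can be sketched roughly as follows. It assumes Pod::Simple::Wiki is installed from CPAN; pod_to_mediawiki is a hypothetical helper name, and the POD string is a made-up sample rather than the real bptutorial.pl text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Pod::Simple::Wiki;

# Convert a POD string to MediaWiki markup and return it as a string.
# (Hypothetical helper, not part of any distribution.)
sub pod_to_mediawiki {
    my ($pod) = @_;
    my $parser = Pod::Simple::Wiki->new('mediawiki');
    my $wiki   = '';
    $parser->output_string(\$wiki);          # collect output in $wiki
    $parser->parse_string_document($pod);
    return $wiki;
}

# Made-up sample POD, standing in for the tutorial text.
my $pod = "=head1 NAME\n\nbptutorial - a BioPerl tutorial\n\n=cut\n";
print pod_to_mediawiki($pod);
```

For a whole file, calling parse_file with the path of the script on the same kind of parser object should work the same way, since the Wiki classes are built on Pod::Simple.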

Mauricio Herrera Cuadra wrote:
> I've added a link in the left menu of the wiki. If you think it should 
> point to the Tutorials page instead of the Bptutorial.pl page please let 
> me know.

Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?

Documentation  (linked on the left menu)
- Quick start
- FAQ
- HOWTOs
- Tutorials

(What's the conceptual difference between a HOWTO and a tutorial?)

It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?)  Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static?

Why isn't the short "Current events" just listed on the top of the "News" page?

Sick of my endless questions yet? -grin-

j


From cjfields at uiuc.edu  Wed May 31 23:09:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 22:09:38 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine>


...

> We certainly wouldn't want to try to maintain two copies, one POD one in
> wiki. That would be the worst of all options. One option that hasn't been
> mentioned yet is to keep maintenance of that in POD in the distro (leaving
> the cool runability alone), and then flag that document as unchangeable in
> the wiki with a note on top "Maintenance of this document is done in POD
> in the distro. Submit POD patches to bioperl-l and we'll re-post an
> updated copy to this wiki."
> 
> Just a thought.

There are probably three schools of thought on docs: those who like nice
docs with links within and beyond BioPerl (hence the wiki), those who like
including docs with the distribution, and those who would like both.  The
last would be nice but isn't realistic unless we can come up with a way to
sync the docs we want to include with the distribution between the wiki and
CVS w/o too much trouble.  I'm in the first school of thought since
rich text with links is better and more informative than plain text any day.
It might be a very small school though...

> > - What do we do with the script part of bptutorial.pl? It certainly
> could be
> > excised and put into the examples/ directory, for example, but this
> would
> > break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?

Most docs have been moved over to the wiki, which generates nicely formatted
docs for printing.
...

> Does all that stuff officially live in and is being changed in the wiki,
> never to return to the distro?

It's easier to add changes in the wiki and add markup, links, etc.  Much
richer text, so on.
 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know
> nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing
> then ran off to work. :)

No big deal.  

> Mauricio Herrera Cuadra wrote:
> > I've added a link in the left menu of the wiki. If you think it should
> > point to the Tutorials page instead of the Bptutorial.pl page please let
> > me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Okay, though Mauricio may know a bit more on how/if this can be done.
Mauricio?

> (What's the conceptual difference between a HOWTO and a tutorial?)

I believe the reasoning is along these lines: HOWTOs focus on specific
areas (graphics, trees, BLAST report parsing, etc) and thus usually go into
greater detail. The tutorials are more broadly based (sort of a general
bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
has additional information over the tutorial (at least it did the last time
I looked at the tutorial, which has been a while).

> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on
> the wiki but static?

Hence the problem we face now.  It is something we need to really look into
before adding too much more to the wiki.  IMHO, I think we should have very
little information directly in the distribution itself since it's already
quite large.  It's almost as easy to have a bare-bones INSTALL file, which
would point to the wiki for additional information.  But I may be very much
alone in that train of thought ; >

> Why isn't the short "Current events" just listed on the top of the "News"
> page?

Don't know.
 
> Sick of my endless questions yet? -grin-

Not really.

cjf

> j
> 


From gad14 at cornell.edu  Tue May 30 12:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful. I think this must be 
what's going on.. I am using $blast_report->next_result in both 
subroutines. It appears that analyzing the blast results first w/ my 
sort subroutine empties (?) the $blast_report object so that when I try 
to print, there is nothing left to print (and vice versa when I print 
first then try to sort).
So, from the looks of things, using next_result has the effect of 
popping the Bio::Search::Result::ResultI objects off of the SearchIO 
blast report object??

It seems I could get around this by making a copy of the blast report by 
setting it to another new variable...(not the most elegant solution) but 
I'm having trouble with this...

If I do:

	my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it 
doesn't help me. How can I make another physical copy of this blast 
result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a 
way to look at/print/sort the results without removing data from the 
blast result object. How is this done though??
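
One way around this, sketched under assumptions: drain the report once into an array, then hand the array (or a reference to it) to both subroutines. My::FakeReport below is a made-up stand-in for the real SearchIO object so that the sketch runs on its own; the only thing it shares with the real object is a next_result method that can be walked once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a SearchIO report: a one-pass iterator (hypothetical class).
package My::FakeReport;
sub new { my ($class, @items) = @_; return bless { queue => [@items] }, $class }
sub next_result { my $self = shift; return shift @{ $self->{queue} } }

package main;

# Drain the iterator once; afterwards the array can be walked repeatedly.
sub collect_results {
    my ($report) = @_;
    my @results;
    while (defined(my $result = $report->next_result)) {
        push @results, $result;
    }
    return @results;
}

my $report  = My::FakeReport->new('query1', 'query2');
my @results = collect_results($report);

# Both "passes" see every result because neither touches the iterator.
print "sort pass:  @results\n";
print "print pass: @results\n";
```

With a real report the array would hold result objects, and the two subroutines would loop over the array with foreach instead of calling next_result themselves.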

Sendu and Brian, I didn't post the sort_results subroutine because it is 
sprawling, as is a lot of my code. The code I provided was more like an 
aid for my explanation of the problem.. it doesn't actually run - sorry 
for the confusion, I should have been more clear on that.  The important 
thing to know perhaps is that both sort_results and print_blast_results 
contain a loop where I am using the 'next_result' method to 
view blast results. (And to clarify for Torsten, the blastall() is 
working just fine - the analysis/viewing of the results object is where 
I am encountering the problem.)


Any other ideas would be greatly appreciated...

Thank you,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>> Hi,
> 
> [snip]
> 
>> If I've sorted the results the sorted-results will print to screen, 
>> however when I try to print the Hit Table results nothing is returned, 
>> as if the blast results have evaporated.... and vice versa, if I 
>> comment out the part where I point my sorting subroutine to the blast 
>> results reference,  my hit table results suddenly prints to screen.
> 
> [snip]
> 
>> Here's an abbreviated version of my code:
> 
> [snip]
> 
>> #######
>> ### the following 2 actions seem to be mutually exclusive.
>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>> # SeqFeature objs stored in arrays. arrays are then printed
>> # to stdout
>> &sort_results($blast_report);
>>
>> # 2) print blast results
>> &print_blast_results($blast_report);
> 
> 
>> sub print_blast_results{
>>    my $report = shift;
>>    while(my $result = $report->next_result()){
> 
> [snip]
> 
> You didn't give us your sort_results subroutine, but is it as simple as 
> they both use $report->next_result (and/or $result->next_hit), but you 
> don't reset the internal counter back to the start, so the second 
> subroutine tries to get the next_result and finds the first subroutine 
> has already looked at the last result and so next_result returns false?
> 
> From a quick look it wasn't obvious how to reset the counter. Hopefully 
> this can be done and someone else knows how.
> 


From lstein at cshl.edu  Wed May 31 11:17:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 11:17:39 -0400
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg
	values
In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
Message-ID: <200605311117.41479.lstein@cshl.edu>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead 
and use the current Bio::Graphics development tree? Since 1.5.1 it supports 
negative coordinates. Here's an illustration:

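 #!/usr/bin/perl

 use strict;
 use warnings;

 use Bio::Graphics;            # also loads Bio::Graphics::Panel
 use Bio::Graphics::Feature;

 # A backbone feature spanning the whole region, and a stranded feature
 # inside it; both start at negative coordinates.
 my $whole   = Bio::Graphics::Feature->new(-start => -200, -end => +200);
 my $feature = Bio::Graphics::Feature->new(-start  => -100,
                                           -end    => +100,
                                           -strand => +1);
 my $panel   = Bio::Graphics::Panel->new(-start     => -200,
                                         -end       => +200,
                                         -width     => 800,
                                         -pad_left  => 10,
                                         -pad_right => 10);

 # The arrow glyph draws the ruler; -tick => 2 asks for major and minor ticks.
 $panel->add_track($whole,
                   -glyph  => 'arrow',
                   -double => 1,
                   -tick   => 2);
 $panel->add_track($feature,
                   -glyph    => 'box',
                   -stranded => 1);

 binmode STDOUT;               # PNG data is binary
 print $panel->png;

 exit 0;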

The resulting image is attached.
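
If the binding sites are already stored in 1..1050 numbering, they can be shifted before the features are built rather than patching the label code in arrow.pm. A minimal sketch, with assumptions stated up front: shift_range is a hypothetical helper, the site names and positions are invented, and the offset of 1000 simply mirrors the "$label-1000" change quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shift a 1-based range so it reads relative to a reference point.
# (Hypothetical helper; the offset of 1000 is an assumption.)
sub shift_range {
    my ($start, $end, $offset) = @_;
    return ($start - $offset, $end - $offset);
}

# Invented binding sites in 1..1050 numbering: [name, start, end].
my @sites = (['siteA', 120, 131], ['siteB', 990, 1004]);

for my $site (@sites) {
    my ($name, $start, $end) = @$site;
    my ($s, $e) = shift_range($start, $end, 1000);
    printf "%s\t%d\t%d\n", $name, $s, $e;
}
```

The shifted values would then go in as the -start and -end arguments when creating each feature, as in the script above.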

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
>       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
>       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just  for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
>     my $relative_coords_offset = $self->option('relative_coords_offset');
>     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
>
> Hi!
>
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyp however, it is a
> > multiply function, can it be hacked to take in a offset value (ie minus
> > the
> > scale by 1000?)
> >
> > cheers
> > kevin
> >
> >
> > Hi,
> >
> > > For some reason I didn't see the first posting on this. In current
> >
> > bioperl
> >
> > > live, the ruler can have negative numberings - I use this routinely.
> > > You need
> > > to create a feature that starts in negative coordinates. What is
> >
> > happening
> >
> > > to
> > > you when you try this?
> > >
> > > Lincoln
> > >
> > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > Hi
> > > > thanks for the help offered thus far!
> > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq
> > >
> > > using
> > >
> > > > bioperl. therefore i was asked to make the numberings as such (-1000)
> >
> > is
> >
> > > > there any way at all to do this in bioperl without changing the .pm
> > >
> > > file?
> > >
> > > > thanks guys..
> > > > kevin
> > > >
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/eaeb5e28/attachment-0002.png>

From lstein at cshl.edu  Wed May 31 12:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have
	neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious 
header". I took out Kevin's email address in case it is the "spammotel" 
header that is bothering it.

Lincoln

----------  Forwarded Message  ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg 
values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein <lstein at cshl.edu>
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau" <ULNJUJERYDIX at spammotel.com>


-------------------------------------------------------

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/6c5f4137/attachment-0002.png>

From rvosa at sfu.ca  Tue May 30 15:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

the open bioinformatics foundation has been kind enough to host a 
mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, 
the cpan distribution for phylogenetic analysis using perl).

The scope of this list is at present fairly broad as it is both meant 
for user questions and development discussion on deeper integration with 
bioperl.

You are invited to sign up at: 
http://lists.open-bio.org/mailman/listinfo/bio-phylo-l

Best wishes,

Rutger Vos

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From bioperlanand at yahoo.com  Mon May  1 14:36:20 2006
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 1 May 2006 11:36:20 -0700 (PDT)
Subject: [Bioperl-l] how to obtain GIs from clone_ids
Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>


Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry) 
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id? Any suggestions..

Thanks in advance.

Anand

		


From cuiw at mail.nih.gov  Mon May  1 15:39:01 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Mon, 1 May 2006 15:39:01 -0400
Subject: [Bioperl-l] how to obtain GIs from clone_ids
In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F48B0@nihcesmlbx10.nih.gov>

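use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Entrez query for the clone ID
my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string);

# Number of matching records, and their GI numbers
my $count = $query->count;
my @ids   = $query->ids;

# Print one GI per line
print "$_\n" for @ids;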



From s.ryazansky at gmail.com  Mon May  1 17:55:13 2006
From: s.ryazansky at gmail.com (Sergei Ryazansky)
Date: Mon, 1 May 2006 21:55:13 +0000 (UTC)
Subject: [Bioperl-l] blast program to run locally on windows
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID: <loom.20060501T235327-11@post.gmane.org>

Hi,
Can you post your formatdb.log file here?


From cjfields at uiuc.edu  Tue May  2 00:15:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 May 2006 23:15:19 -0500
Subject: [Bioperl-l] blast program to run locally on windows
In-Reply-To: <loom.20060501T235327-11@post.gmane.org>
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
	<loom.20060501T235327-11@post.gmane.org>
Message-ID: <D54C8321-6A9C-4674-8C7E-5452DEF84599@uiuc.edu>

We managed to work our way through it.  He hadn't set ncbi.ini to the  
correct directories; the database was formatted correctly.

Chris

On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote:

> Hi,
> Can you post your formatdb.log file here?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue May  2 12:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness with using extra parameters ('seq_start',
'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
gone through, fixed, and committed.  I also have added a few tests to DB.t
for everything (all changes were in Bio::DB::WebDBSeqI and
Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
manage to get it added as well (with tests).  This is how NCBI defines
complexity:

complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest

Here's my quandary: when setting complexity to '0', you get a blob back (the
main sequence as well as any subsequences, such as CDS); this is in essence
a sequence stream with multiple alphabet types.  So, I now have it set up to
do this:

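use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::SeqIO;

my $acc = shift or die "usage: $0 accession\n";

my $factory = Bio::DB::GenBank->new(-format     => 'fasta',
                                    -complexity => 0
                                   );

# with complexity 0 this returns a Bio::SeqIO stream, not a single Bio::Seq
my $seqin  = $factory->get_Seq_by_acc($acc);
my $seqout = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}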

since I thought returning an array would be horrendously expensive on
memory, esp. with larger sequences.  Currently this is only set up for
sequences which are retrieved when complexity is set to '0' so it's a pretty
unique case.  Regardless, I'm worried that, since users expect a Bio::Seq
object instead of a Bio::SeqIO object here, it will cause a lot of confusion
with the API.  Any suggestions/gripes?

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From mamillerpa at yahoo.com  Tue May  2 07:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
make FASTA subset files for some bacterial strains.  I haven't been
able to parse out the strain information from the OS or RC lines.
These lines typically look like:

OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki.  In addition to the wiki, I have been making use of 
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant.  I have also tried to walk through all of the feature,
annotation and reference objects but I still can't seem to parse out
the information I need.  (For brevity, the example I'm including below
only lists the code I used for the annotation objects.)  Also, this
code only prints the information...  I know that I'll have to write a
FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

---   ---   ---


#!/usr/bin/perl

use Bio::SeqIO;

my $usage = "getaccs.pl file format\n";
my $file = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

  my $species_object = $seq->species;
  my $species_string = $species_object->species;
  my $variant_string = $species_object->variant;
  my $common_string = $species_object->common_name;
  my $sub_string = $species_object->sub_species;
  my $binomial = $species_object->binomial('FULL');

  print "display   ",$seq->display_id,"\n";
  print "accession ",$seq->accession_number,"\n";
  print "desc      ",$seq->desc,"\n";

  print "species   ",$species_string,"\n";
  print "variant   ",$variant_string,"\n";
  print "common    ",$common_string,"\n";
  print "sub       ",$sub_string,"\n";
  print "binomial  ",$binomial,"\n";

  print $seq->seq,"\n";

  my $anno_collection = $seq->annotation;
  for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
      print "tagname : ", $value->tagname, "\n";
      # $value is an Bio::Annotation, and has an "as_text" method
      print "  annotation value: ", $value->as_text, "\n";

      if ($value->tagname eq "reference") {
        my $hash_ref = $value->hash_tree;
        for my $key (keys %{$hash_ref}) {
          print $key,": ",$hash_ref->{$key},"\n";
        }
      }
    }
  }
  print "\n";
}

exit;


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller



From cjfields at uiuc.edu  Tue May  2 14:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post!  Just wanted to add that I'm adding a
warning to the get_Seq* methods, telling users to use the appropriate
get_Stream* method when complexity == 0, before returning the Bio::SeqIO
object.

CJF



From jason.stajich at duke.edu  Tue May  2 14:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html

or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has been resolved really so hopefully  
James will speak up if he's implemented anything.
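
In the meantime, one workaround is to scan the raw OS and RC lines directly.
What follows is only a minimal sketch in plain Perl, assuming the lines look
like the two examples in the original post.  The sub name parse_strain is
made up for illustration and is not part of Bioperl, and strain names that
themselves contain '.' or ';' would need a more careful pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl method): pull a strain name out of a
# single OS or RC line. Returns undef when no strain is found.
sub parse_strain {
    my ($line) = @_;
    my $strain;
    if ($line =~ /^RC\s+.*?\bSTRAIN=([^;.]+)/i) {
        # RC lines: "RC STRAIN=ABC123."
        $strain = $1;
    }
    elsif ($line =~ /^OS\s+.*\bstrain\s+([^.;()]+)/i) {
        # OS lines: "OS Genus species ... strain ABC123."
        $strain = $1;
    }
    return unless defined $strain;
    $strain =~ s/\s+$//;    # drop trailing whitespace
    return $strain;
}

for my $line ('OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
              'RC STRAIN=ABC123.') {
    my $strain = parse_strain($line);
    print defined $strain ? "$strain\n" : "no strain found\n";
}
```

Run over the flat file line by line, this would give a strain per record
that can then be used to decide which FASTA subset a sequence belongs to.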

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From mblanche at berkeley.edu  Tue May  2 15:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <C07D0179.2183%mblanche@berkeley.edu>

Dear all--

I have been trying to use the intersection function to extract overlapping
regions from alternatively spliced exons, as in the following script. The
object returned by 'my $overlap = $exon1->intersection($exon2);' is
actually losing the strand of $exon1 if $exon1 is from the negative strand.
Is this behavior expected? Should I check the strand of $exon1 before
working on the object returned by any Bio::RangeI function?

Many thanks 

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features(    -type =>'processed_transcript',
                                    -attributes =>{Gene => $gene->group});
                   
    for my $tc (@tcs){
        my @exons = $tc->features (-type => 'exon',
                                     -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 16:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D0179.2183%mblanche@berkeley.edu>
Message-ID: <C07D3699.84BC%osborne1@optonline.net>

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result but if they aren't on the same strand then no strand
information is returned.
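
To make that rule concrete, here is a rough sketch in plain Perl.  It is not
the Bio::RangeI code: the hash layout, the sub name intersect_ranges, and the
use of 0 to mean "no strand" are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Illustration only: a range is a plain hash with start, end and strand.
sub intersect_ranges {
    my ($r1, $r2) = @_;
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;    # the ranges do not overlap

    # keep the strand only when both ranges agree; 0 stands for "no strand"
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;
    return { start => $start, end => $end, strand => $strand };
}

my $overlap = intersect_ranges({ start => 10, end => 50, strand => -1 },
                               { start => 30, end => 80, strand => -1 });
print "$overlap->{start}..$overlap->{end} strand $overlap->{strand}\n";
```

Under this rule two minus-strand ranges should give a minus-strand result.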

Brian O.




From mblanche at berkeley.edu  Tue May  2 16:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D3699.84BC%osborne1@optonline.net>
Message-ID: <C07D100A.218A%mblanche@berkeley.edu>

Brian--

Even when both elements of intersection() are from the negative strand, the
returned object is on the positive strand and $overlap is actually the
reverse complement of the intersection between the 2 exons. Here is part
of the output from the script below:

===
ex1     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the return object is positive as in:

===
ex1     Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2     Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output:

Many thanks all...

Marco


use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features(    -type =>'processed_transcript',
                                    -attributes =>{Gene => $gene->group});
                   
    for my $tc (@tcs){
        my @exons = $tc->features (-type => 'exon',
                                     -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";
                print     "ex1\tStrand: ", $exon1->strand, "\n",
                        $exon1->seq, "\n";
                print     "ex2\tStrand: ", $exon2->strand, "\n",
                        $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n",
                        $overlap->seq, "\n";
            }
        }
    }
}


______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 17:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D100A.218A%mblanche@berkeley.edu>
Message-ID: <C07D4C3D.84C4%osborne1@optonline.net>

Marco,

Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...
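
Until we know where the strand goes missing, a possible stopgap is to compare
the strands yourself and reverse-complement the overlap sequence by hand.
This is only a sketch in plain Perl: both sub names are made up for
illustration, neither is a Bioperl method, and it assumes plain A/C/G/T
sequence strings like the ones in your output.

```perl
use strict;
use warnings;

# Reverse-complement a plain DNA string (A, C, G, T only, either case).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: return the overlap sequence on the exon's strand.
# If the exon is on the minus strand but the overlap was reported on
# another strand, flip the sequence; otherwise return it unchanged.
sub overlap_on_exon_strand {
    my ($overlap_seq, $overlap_strand, $exon_strand) = @_;
    return $exon_strand == -1 && $overlap_strand != -1
        ? revcomp($overlap_seq)
        : $overlap_seq;
}

# First 11 bases of the "overlap" sequence from the minus-strand example
print overlap_on_exon_strand('CAGTCCTTGCG', 1, -1), "\n";
```

For that input the result is CGCAAGGACTG, which is how ex1 ends in the
posted output.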

Brian O.


On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:

> Brian--
> 
> Even when both elements of intersection() are from the negative strand, the
> returned object is on the positive strand and $overlap is actually the
> reverse complement of the intersection between the 2 exons. Here is part
> of the output from the script below:
> 
> ===
> ex1     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
> ex2     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
> CAAATCG
> overlap Strand: 1
> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
> TGCCGACTGCCATGTTCAACTAATAAACCGG
> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
> ...
> 
> If both are from the positive strand, the return object is positive as in:
> 
> ===
> ex1     Strand: 1
> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
> TTTGTGCCTGTTTCAGTATAAATTAATTATG
> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
> AAATATACATATATGCAACATATATAACTTC
> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
> ex2     Strand: 1
> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
> overlap Strand: 1
> CAACGCAGACGTG
> 
> Is there something I am missing? Here is the script generating the output
> 
> Many thanks all...
> 
> Marco
> 
> 
> use strict;
> use warnings;
> use Bio::DB::GFF;
> 
> MAIN:{
> 
>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                 -dsn =>
> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                 -user => 'guest');
>     my $test_db = $db->segment('4');
>     
>     # Load up the exons into $exons_p
>     for my $gene ($test_db->features(-types => 'gene')){
> 
>         my $exons_p = extractExons($gene);
> 
>         cluster($exons_p) unless ($#{$exons_p} == -1);
> 
>     }
> }
> 
> sub extractExons {
>     my $gene = shift;
>     my %ex_list;
>     my @tcs = $gene->features(    -type =>'processed_transcript',
>                                     -attributes =>{Gene => $gene->group});
>                  
>     for my $tc (@tcs){
>         my @exons = $tc->features (-type => 'exon',
>                                      -attributes => {Parent => $tc->group}
> );        
>     
>         for (@exons){
>             my $ex_id    = $_->id;
>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
> 
>         }
>     
>     }
>     my @values = values %ex_list;
>     return(\@values);
> }
> 
> sub cluster {
>     my $exons_p = shift;
>     
>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>             my $exon1 = $exons_p->[$s];
>             my $exon2 = $exons_p->[$t];
>             
>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>             
>                 my $overlap = $exon1->intersection($exon2);
>                 
>                 print "===\n";;
>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>                         $exon1->seq, "\n";
>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>                         $exon2->seq, "\n";
>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>                         $overlap->seq, "\n";
>             }
>         }
>     }
> }
> 
> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
> 
>> Marco,
>> 
>> Yes, this is how intersection() is supposed to work. If both of the Range
>> objects have the same strand then the strand information is returned as part
>> of the result but if they aren't on the same strand then no strand
>> information is returned.
>> 
>> Brian O.
>> 
>> 
>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
>> 
>>> Dear all--
>>> 
>>> I have been trying to use the intersection function to extract overlapping
>>> regions from alternatively spliced exons, as in the following script. The
>>> object returned by 'my $overlap = $exon1->intersection($exon2);' is
>>> actually losing the strand of $exon1 if $exon1 is from the negative strand.
>>> Is this behavior expected? Should I check the strand of $exon1 before
>>> working on the object returned by any Bio::RangeI function?
>>> 
>>> Many thanks 
>>> 
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::DB::GFF;
>>> 
>>> MAIN:{
>>> 
>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>                                 -dsn =>
>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>                                 -user => 'guest');
>>>     my $test_db = $db->segment('4');
>>>     
>>>     # Load up the exons into $exons_p
>>>     for my $gene ($test_db->features(-types => 'gene')){
>>> 
>>>         my $exons_p = extractExons($gene);
>>> 
>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>> 
>>>     }
>>> }
>>> 
>>> sub extractExons {
>>>     my $gene = shift;
>>>     my %ex_list;
>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>>                                     -attributes =>{Gene => $gene->group});
>>>                
>>>     for my $tc (@tcs){
>>>         my @exons = $tc->features (-type => 'exon',
>>>                                      -attributes => {Parent => $tc->group}
>>> );        
>>>     
>>>         for (@exons){
>>>             my $ex_id    = $_->id;
>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>> 
>>>         }
>>>     
>>>     }
>>>     my @values = values %ex_list;
>>>     return(\@values);
>>> }
>>> 
>>> sub cluster {
>>>     my $exons_p = shift;
>>>     
>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>             my $exon1 = $exons_p->[$s];
>>>             my $exon2 = $exons_p->[$t];
>>>             
>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>             
>>>                 my $overlap = $exon1->intersection($exon2);
>>>                
>>>                 print "===\n";;
>>>                 print "ex1\n", $exon1->seq, "\n";
>>>                 print "ex2\n", $exon2->seq, "\n";
>>>                 print "overlap\n", $overlap->seq, "\n";
>>>             }
>>>         }
>>>     }
>>> }
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>> 
>>> mblanche at uclink.berkeley.edu
>>> 
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>> 
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>> 
>> 
> 
> ______________________________
> Marco Blanchette, Ph.D.
> 
> mblanche at uclink.berkeley.edu
> 
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
> 
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062


From mblanche at berkeley.edu  Tue May  2 18:31:44 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 15:31:44 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D4C3D.84C4%osborne1@optonline.net>
Message-ID: <C07D2BE0.2196%mblanche@berkeley.edu>

Brian--

I checked out last week's version from CVS.

Silly question: How do I get the version of BioPerl I am using... Never had
to check a module/bundle version number before...

Marco


On 5/2/06 14:49, "Brian Osborne" <osborne1 at optonline.net> wrote:

> Marco,
> 
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
> 
> Brian O.
> 
> 
> On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> 
>> Brian--
>> 
>> Even when both elements of intersection() are from the negative strand, the
>> returned object is on the positive strand and $overlap is actually the
>> reverse complement of the intersection between the 2 exons. Here is part
>> of the output from the script below:
>> 
>> ===
>> ex1     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
>> ex2     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
>> CAAATCG
>> overlap Strand: 1
>> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
>> TGCCGACTGCCATGTTCAACTAATAAACCGG
>> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
>> ...
>> 
>> If both are from the positive strand, the return object is positive as in:
>> 
>> ===
>> ex1     Strand: 1
>> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
>> TTTGTGCCTGTTTCAGTATAAATTAATTATG
>> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
>> AAATATACATATATGCAACATATATAACTTC
>> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
>> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
>> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
>> ex2     Strand: 1
>> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
>> overlap Strand: 1
>> CAACGCAGACGTG
>> 
>> Is there something I am missing? Here is the script generating the output
>> 
>> Many thanks all...
>> 
>> Marco
>> 
>> 
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> 
>> MAIN:{
>> 
>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>                                 -dsn =>
>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>                                 -user => 'guest');
>>     my $test_db = $db->segment('4');
>>     
>>     # Load up the exons into $exons_p
>>     for my $gene ($test_db->features(-types => 'gene')){
>> 
>>         my $exons_p = extractExons($gene);
>> 
>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>> 
>>     }
>> }
>> 
>> sub extractExons {
>>     my $gene = shift;
>>     my %ex_list;
>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>                                     -attributes =>{Gene => $gene->group});
>>                 
>>     for my $tc (@tcs){
>>         my @exons = $tc->features (-type => 'exon',
>>                                      -attributes => {Parent => $tc->group}
>> );        
>>     
>>         for (@exons){
>>             my $ex_id    = $_->id;
>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>> 
>>         }
>>     
>>     }
>>     my @values = values %ex_list;
>>     return(\@values);
>> }
>> 
>> sub cluster {
>>     my $exons_p = shift;
>>     
>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>             my $exon1 = $exons_p->[$s];
>>             my $exon2 = $exons_p->[$t];
>>             
>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>             
>>                 my $overlap = $exon1->intersection($exon2);
>>                 
>>                 print "===\n";;
>>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>>                         $exon1->seq, "\n";
>>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>>                         $exon2->seq, "\n";
>>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>>                         $overlap->seq, "\n";
>>             }
>>         }
>>     }
>> }
>> 

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From arareko at campus.iztacala.unam.mx  Tue May  2 18:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run 
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take 
a look into the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.
-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From heikki at sanbi.ac.za  Wed May  3 02:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week's version from CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not too easy:

	perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.

     -Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From nuclearn at gmail.com  Wed May  3 02:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi everybody,

    I am parsing a blastx report using BioPerl modules (Bio::SearchIO).
The blastx result was created by NCBI-BLAST. How can I obtain the strand
(+ or -) of the query sequence against the hit protein? I tried the strand
function, but nothing was reported. I also tried the frame function, but
the result is usually 0, 1 or 2, so it gives no information about the
query strand (+ or -).
  How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************


From cjfields at uiuc.edu  Wed May  3 09:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

$hsp->strand():

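use strict;
use warnings;
use Bio::SearchIO;

# Parse the BLAST report named on the command line
my $parser = Bio::SearchIO->new (-file => shift @ARGV,
                                 -format => 'blast');

# Walk every result, hit and HSP, printing the strand of each HSP
while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print $hsp->strand,"\n";
        }
    }
}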

This will give 1 or -1.
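
If a BLAST-style signed frame (+1 to +3, -1 to -3) is what is wanted, the strand can be combined with the 0-based frame value mentioned in the question. This is only a minimal sketch: it assumes the strand arrives as 1 or -1 and the frame as 0, 1 or 2, and `signed_frame` is a hypothetical helper name, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): combine a strand (1 or -1)
# with a 0-based frame (0, 1 or 2) into a BLAST-style signed frame string.
sub signed_frame {
    my ($strand, $frame) = @_;
    die "strand must be 1 or -1\n"
        unless defined $strand && ($strand == 1 || $strand == -1);
    die "frame must be 0, 1 or 2\n"
        unless defined $frame && $frame =~ /^[012]$/;
    my $sign = $strand < 0 ? '-' : '+';
    return $sign . ($frame + 1);
}

# Example values only; in a real script they would come from the HSP object.
print signed_frame( 1, 0), "\n";   # +1
print signed_frame(-1, 2), "\n";   # -3
```

Inside the loop above, the two arguments would presumably be `$hsp->strand('query')` and `$hsp->query->frame`; check both against the documentation of your BioPerl version before relying on them.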

Chris



From osborne1 at optonline.net  Wed May  3 11:22:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 11:22:27 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <C07E42F3.84E3%osborne1@optonline.net>

Mark,

So you're trying to get the information in the RC line from a Swissprot
format file?

Brian O.
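
For reference, an RC line of the form shown in the quoted message (`RC STRAIN=ABC123.`) can also be picked apart with a plain regular expression, independent of what the Bio::SeqIO parser exposes. A minimal sketch only: `strain_from_rc` is a hypothetical helper, and the assumption that other tokens on the line are separated by semicolons should be checked against the actual file.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the STRAIN token out of one RC line of a
# UniProt/EMBL-style flat file. Returns undef when the line has none.
sub strain_from_rc {
    my ($line) = @_;
    # Assumption: tokens on an RC line are separated by semicolons.
    return unless $line =~ /^RC\s+(?:.*?;\s*)?STRAIN=([^;]+)/;
    my $strain = $1;
    $strain =~ s/[.\s]+$//;    # drop a trailing period and whitespace
    return $strain;
}

print strain_from_rc('RC STRAIN=ABC123.'), "\n";    # ABC123
```

The same function can be applied line by line while reading the flat file, remembering the most recent strain seen for the current entry.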


On 5/2/06 7:41 AM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Hello all.
> 
> I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
> make FASTA subset files for some bacterial strains.  I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
> 
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
> 
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
> 
> I have included some code I pasted together from various pages on the
> bioperl wiki.  In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
> 
> The code I have so far reports the species but not the subspecies or
> variant.  I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need.  (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)  Also, this
> code only prints the information...  I know that I'll have to write a
> FASTA sequence object separately.
> 
> Any suggestions?
> 
> Thanks,
> Mark
> 
> ---   ---   ---
> 
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> 
> my $usage = "getaccs.pl file format\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file",
>                             -format => $format );
> 
> while (my $seq = $inseq->next_seq) {
> 
>   my $species_object = $seq->species;
>   my $species_string = $species_object->species;
>   my $variant_string = $species_object->variant;
>   my $common_string = $species_object->common_name;
>   my $sub_string = $species_object->sub_species;
>   my $binomial = $species_object->binomial('FULL');
> 
>   print "display   ",$seq->display_id,"\n";
>   print "accession ",$seq->accession_number,"\n";
>   print "desc      ",$seq->desc,"\n";
> 
>   print "species   ",$species_string,"\n";
>   print "variant   ",$variant_string,"\n";
>   print "common    ",$common_string,"\n";
>   print "sub       ",$sub_string,"\n";
>   print "binomial  ",$binomial,"\n";
> 
>   print $seq->seq,"\n";
> 
>   my $anno_collection = $seq->annotation;
>   for my $key ( $anno_collection->get_all_annotation_keys ) {
>     my @annotations = $anno_collection->get_Annotations($key);
>     for my $value ( @annotations ) {
>       print "tagname : ", $value->tagname, "\n";
>       # $value is a Bio::Annotation, and has an "as_text" method
>       print "  annotation value: ", $value->as_text, "\n";
> 
>       if ($value->tagname eq "reference") {
>         my $hash_ref = $value->hash_tree;
>         for my $key (keys %{$hash_ref}) {
>           print $key,": ",$hash_ref->{$key},"\n";
>         }
>       }
>     }
>   }
>   print "\n";
> }
> exit;
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 


From MEC at stowers-institute.org  Wed May  3 11:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>

Marco,

It appears that your code assumes that the exons returned from the call
to Bio::DB::GFF::features are sorted by start; I don't think that is
guaranteed (at least not in the documentation I'm reading).  Also, I
think your code will not report the overlap between two exons that have an
intervening overlapping exon.  Depending on what your application is,
you may care.  For example, e1, e2, e3 all intersect pairwise, but your
code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------
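
To make the pairwise case concrete, here is a minimal pure-Perl sketch that compares every pair of ranges and reports each intersection. The coordinates are read off the diagram above (e1 = 4..10, e2 = 6..11, e3 = 7..9). The strand values, the hash layout and the `intersect` and `all_pairwise_overlaps` helpers are illustrative assumptions, not the Bio::RangeI API. The strand rule follows the behaviour described earlier in this thread for intersection(): keep the strand when both ranges agree, otherwise report none (0 here).

```perl
use strict;
use warnings;

# Illustrative helper, not the Bio::RangeI method: intersect two ranges
# given as hashes with start, end and strand keys.
sub intersect {
    my ($r1, $r2) = @_;
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;    # no overlap
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;
    return { start => $start, end => $end, strand => $strand };
}

# Compare every unordered pair once and collect the overlaps.
sub all_pairwise_overlaps {
    my ($ranges) = @_;    # array ref of [name, range] pairs
    my @found;
    for my $s (0 .. $#{$ranges}) {
        for my $t ($s + 1 .. $#{$ranges}) {
            my $ov = intersect($ranges->[$s][1], $ranges->[$t][1]) or next;
            push @found, [ $ranges->[$s][0], $ranges->[$t][0], $ov ];
        }
    }
    return \@found;
}

# Coordinates taken from the diagram; the strand is an assumed example value.
my @exons = (
    [ e1 => { start => 4, end => 10, strand => -1 } ],
    [ e2 => { start => 6, end => 11, strand => -1 } ],
    [ e3 => { start => 7, end => 9,  strand => -1 } ],
);

for my $hit (@{ all_pairwise_overlaps(\@exons) }) {
    my ($name1, $name2, $ov) = @$hit;
    print "$name1 x $name2: $ov->{start}..$ov->{end} strand $ov->{strand}\n";
}
```

A loop of this shape, with the inner index starting at $s + 1 and running to the end of the list, visits every unordered pair (e1/e2, e1/e3, e2/e3) regardless of the order in which the exons were returned.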

Out of curiosity, what is your application?  Designing primers for gene
resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract 
>overlapping
>region from alternatively spliced exons as in the following script. The
>returned object from the 'my $overlap = 
>$exon1->intersection($exon2);' is
>actually loosing the strand of $exon1 if $exon1 is from the 
>negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object return by any Bio::RangeI function?
>
>Many thanks 
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use Bio::DB::GFF;
>
>MAIN:{
>
>    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                -dsn =>
>'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                -user => 'guest');
>    my $test_db = $db->segment('4');
>    
>    # Load up the exons into $exons_p
>    for my $gene ($test_db->features(-types => 'gene')){
>
>        my $exons_p = extractExons($gene);
>
>        cluster($exons_p) unless ($#{$exons_p} == -1);
>
>    }
>}
>
>sub extractExons {
>    my $gene = shift;
>    my %ex_list;
>    my @tcs = $gene->features(    -type =>'processed_transcript',
>                                    -attributes =>{Gene => 
>$gene->group});
>                   
>    for my $tc (@tcs){
>        my @exons = $tc->features (-type => 'exon',
>                                     -attributes => {Parent => 
>$tc->group}
>);        
>    
>        for (@exons){
>            my $ex_id    = $_->id;
>            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>        }
>    
>    }
>    my @values = values %ex_list;
>    return(\@values);
>}
>
>sub cluster {
>    my $exons_p = shift;
>    
>    for (my $s = 0; $s <= $#{$exons_p}; $s++){
>        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>            my $exon1 = $exons_p->[$s];
>            my $exon2 = $exons_p->[$t];
>            
>            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>            
>                my $overlap = $exon1->intersection($exon2);
>                
>                print "===\n";
>                print "ex1\n", $exon1->seq, "\n";
>                print "ex2\n", $exon2->seq, "\n";
>                print "overlap\n", $overlap->seq, "\n";
>            }
>        }
>    }
>}
>______________________________
>Marco Blanchette, Ph.D.
>
>mblanche at uclink.berkeley.edu
>
>Donald C. Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>-- 
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From sdavis2 at mail.nih.gov  Wed May  3 12:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>
Message-ID: <C07E5028.AF8A%sdavis2@mail.nih.gov>


On 5/3/06 11:09 AM, "Cook, Malcolm" <MEC at stowers-institute.org> wrote:

> Marco,
> 
> It appears that your code assumes that the exons returned from the call
> to Bio::DB::GFF::features are sorted by start; I don't think that is
> guaranteed (at least not in the documentation I'm reading).  Also I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon.  Depending on what your application is,
> you may care.  For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
> 
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC table
browser or via Penn State University's Galaxy server (written in python and
downloadable) in case you want a quick solution to what I think is your
problem....
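
If a web tool is more than you need, the same clustering can be sketched
in plain Perl. This is only a minimal sketch, not a BioPerl method:
merge_ranges() is a hypothetical helper, and the exons are assumed to be
reduced to [start, end] pairs (for real features that would be
$exon->start and $exon->end) on the same reference sequence and strand.
Sorting first means the order in which the exons come back from the
database does not matter.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] ranges into clusters ("superexons").
# Each input range is an array ref [start, end] in inclusive
# coordinates; input order does not matter because we sort first.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # Overlaps the current cluster: extend its end if needed.
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$r ];    # start a new cluster (copy, not alias)
        }
    }
    return @merged;
}

# e1, e2, e3 roughly as in the diagram above, plus one unrelated exon.
my @clusters = merge_ranges([4, 10], [6, 11], [7, 9], [20, 25]);
print join('..', @$_), "\n" for @clusters;
```

The three overlapping exons collapse into a single 4..11 cluster and the
unrelated one stays separate.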

Sean


From osborne1 at optonline.net  Wed May  3 16:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID: <C07E8961.84F2%osborne1@optonline.net>

Mark,

The RC line is part of the description of a reference, I'm guessing 'RC'
stands for Reference Comment. In order to get the attributes of a reference
you'll first do something like:

my $anno_collection = $seq->annotation;
my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

$references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the
Reference object is a kind of Annotation object.

Brian O.


On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Yeah.  Do you have any experience with that?
> 
> Mark
> 
> --- Brian Osborne <osborne1 at optonline.net> wrote:
> 
>> Mark,
>> 
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>> 
>> Brian O.
> 
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 


From cjfields at uiuc.edu  Wed May  3 17:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in
	Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to
Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
epost/efetch.  I didn't want to break anything too severely so you can only
use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
methods yet).  I also added tests to DB.t, a few each for protein and
nucleotide retrieval using batch mode and so far they all pass fine.  

I haven't tested the upper sequence limit for this yet to see if it's at all
comparable to just using efetch but it seems a bit faster.  The eutils
coursebook states that one should only post ~500 at a time (I think you can
get a bit higher though).
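
To stay under that limit, a long ID list can be split into chunks before
posting. The following is a minimal sketch in plain Perl; batch_ids() is
a hypothetical helper (not part of Bio::DB::GenBank), and the batch size
of 500 is simply the coursebook figure mentioned above.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements.
# batch_ids() is a hypothetical helper, not a BioPerl method.
sub batch_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @batches;
    # splice removes up to $size IDs from the front on each pass.
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

my @gis     = (1 .. 1234);           # stand-ins for real GI numbers
my @batches = batch_ids(500, @gis);
printf "%d batches, last one holds %d ids\n",
    scalar @batches, scalar @{ $batches[-1] };
```

Each array ref in @batches could then be handed to whichever retrieval
call you use, one batch per request.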

Also, at the moment it only works for GIs (NOT accessions, which
apparently epost does not accept).  If we want to continue using this
method for retrieval then we may need a workaround for accs.

CJF

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Wed May  3 17:44:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 04 May 2006 07:44:48 +1000
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au>

Marco,

> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

-- 
Torsten Seemann <torsten.seemann at infotech.monash.edu.au>
Victorian Bioinformatics Consortium


From cjfields at uiuc.edu  Wed May  3 18:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 17:08:37 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented
	inBio::DB::GenBank/GenPept
In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine>
Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, May 03, 2006 4:10 PM
> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Batch retrieval partially implemented
> inBio::DB::GenBank/GenPept
> 
> Just wanted to let you guys know I have added a few bits and pieces to
> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
                     ^^^^^^^^^^^^^^^^^^^
                     Bio::DB::NCBIHelper
Fat fingers!

> epost/efetch.  I didn't want to break anything too severely so you can
> only
> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
> methods yet).  I also added tests to DB.t, a few each for protein and
> nucleotide retrieval using batch mode and so far they all pass fine.
> 
> I haven't tested the upper sequence limit for this yet to see if it's at
> all
> comparable to just using efetch but it seems a bit faster.  The eutils
> coursebook states that one should only post ~500 at a time (I think you
> can
> get a bit higher though).
> 
> Also, at the moment it only works at the moment for GI's (NOT accessions,
> which apparently epost does not accept).  If we want to continue using
> this
> method for retrieval then we may need a workaround for accs.
> 
> CJF
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 


From arareko at campus.iztacala.unam.mx  Wed May  3 18:24:23 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 03 May 2006 17:24:23 -0500
Subject: [Bioperl-l] Batch retrieval partially
	implemented	inBio::DB::GenBank/GenPept
In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine>
References: <000001c66efe$21dbcf80$15327e82@pyrimidine>
Message-ID: <44592D97.6090906@campus.iztacala.unam.mx>

hehehe :)

Chris Fields wrote:
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Wednesday, May 03, 2006 4:10 PM
>> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Batch retrieval partially implemented
>> inBio::DB::GenBank/GenPept
>>
>> Just wanted to let you guys know I have added a few bits and pieces to
>> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
>                      ^^^^^^^^^^^^^^^^^^^
>                      Bio::DB::NCBIHelper
> Fat fingers!
> 
>> epost/efetch.  I didn't want to break anything too severely so you can
>> only
>> use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
>> methods yet).  I also added tests to DB.t, a few each for protein and
>> nucleotide retrieval using batch mode and so far they all pass fine.
>>
>> I haven't tested the upper sequence limit for this yet to see if it's at
>> all
>> comparable to just using efetch but it seems a bit faster.  The eutils
>> coursebook states that one should only post ~500 at a time (I think you
>> can
>> get a bit higher though).
>>
>> Also, at the moment it only works at the moment for GI's (NOT accessions,
>> which apparently epost does not accept).  If we want to continue using
>> this
>> method for retrieval then we may need a workaround for accs.
>>
>> CJF
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From fernan at iib.unsam.edu.ar  Wed May  3 20:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run 
| packages (stable & developer releases) for the FreeBSD operating system.
| 
| For instructions on how to install BioPerl ports in FreeBSD, please take 
| a look into the Getting Bioperl section of the BioPerl Wiki.
| 
+----]

Great job Mauricio,

thanks for contributing this!

Fernan


From miker at biotiquesystems.com  Tue May  2 23:31:59 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Tue, 2 May 2006 20:31:59 -0700
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
Message-ID: <007b01c66e62$23161d20$c100a8c0@mike>


I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank
files that contain CONTIG entries with gaps.  One such record is NW_925173.

When I try to parse this file using Bio::SeqIO::genbank, it will enter an
infinite loop and spin until it runs out of memory.  

I'm pretty certain it relates to this bug:
http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that
genbank records with CONTIG gaps are not valid and can't be parsed.  But this
bug actually claims to be fixed, which is strange, since looking at the code for
FTLocationFactory (where the loop is) it's still right there.  I assume that
this may be fixed in other contexts but is still not fixed in
Bio::SeqIO::genbank?  Or am I doing something wrong?

I think that this should probably be filed as an open bug.  I would think that
even if bioperl isn't interested in parsing this type of file via SeqIO,
certainly you'd want to ensure that no finite input file would send the parser
into an infinite loop.  Have others encountered this problem?  Is there any plan
to address it?

Thanks very much for any information or help!

-Mike

P.S.  I've played around with my version of FTLocationFactory and it seems to
actually work and parse the gaps.  I'm not sure if I've created other bugs or if
it works in all cases, but at least the parser doesn't die.  I also don't know
that my hacky code is appropriate for putting back in to BioPerl, but I'm happy
to provide it if someone wants to check it out and/or consider it for checkin.


From ULNJUJERYDIX at spammotel.com  Wed May  3 04:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with
	Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help!
I can't figure out the docs instructions

I want to create an imagemap of short sequence matches with a longer one
with clickable imagemaps for the short sequences. I figure I can do this
easily enough using the example script for parsing blast output but I need
an example script to understand how to produce the html code for the
imagemap. I can find only rather cryptic references about how this can be
done (see below).

    $boxes = $panel->boxes
    @boxes = $panel->boxes
    The boxes() method returns a list of arrayrefs containing the
coordinates of each glyph.  The method is useful for constructing an
image map.  In a scalar context, boxes() returns an arrayref.  In a
list context, the method returns the list directly.

    Each member of the list is an arrayref of the following format:

      [ $feature, $x1, $y1, $x2, $y2, $track ]

    The first element is the feature object; either an
Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl
Bio::SeqFeatureI object.  The coordinates are the topleft and
bottomright corners of the glyph, including any space allocated for
labels. The track is the Bio::Graphics::Glyph object corresponding to
the track that the feature is rendered inside.

    $position = $panel->track_position($track)
    After calling gd() or boxes(), you can learn the resulting Y
coordinate of a track by calling track_position() with the value
returned by add_track() or unshift_track().  This will return undef if
called before gd() or boxes() or with an invalid track.

    @pixel_coords = $panel->location2pixel(@feature_coords)
    Public routine to map feature coordinates (in base pairs) into pixel
coordinates relative to the left-hand edge of the picture. If you
define a -background callback, the callback may wish to invoke this
routine in order to translate base coordinates into pixel coordinates.

    $left = $panel->left
    $right = $panel->right
    $top   = $panel->top
    $bottom = $panel->bottom
    Return the pixel coordinates of the *drawing area* of the panel, that
is, exclusive of the padding.


got it from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html
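
For what it's worth, the boxes() description above can be turned into
HTML along these lines. This is only a sketch under assumptions:
image_map() and html_escape() are hypothetical helpers, the link scheme
is invented, and Demo::Feature is a stand-in so the example runs without
a real panel. With a real panel you would pass $panel->boxes instead,
and the features would need to provide display_name().

```perl
use strict;
use warnings;

# Escape the characters that matter inside an HTML attribute value.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a client-side image map from a list shaped like $panel->boxes:
# each element is [ $feature, $x1, $y1, $x2, $y2, $track ].
# $url_for is a caller-supplied code ref mapping a feature to a link.
sub image_map {
    my ($map_name, $url_for, @boxes) = @_;
    my $html = sprintf qq{<map name="%s">\n}, html_escape($map_name);
    for my $box (@boxes) {
        my ($feature, $x1, $y1, $x2, $y2) = @$box;   # track (6th) unused
        $html .= sprintf qq{  <area shape="rect" coords="%d,%d,%d,%d" href="%s" alt="%s">\n},
            $x1, $y1, $x2, $y2,
            html_escape($url_for->($feature)),
            html_escape($feature->display_name);
    }
    return $html . "</map>\n";
}

{
    package Demo::Feature;   # stand-in for a Bio::SeqFeatureI object
    sub new          { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub display_name { return $_[0]{name} }
}

# With a real panel this list would come from: my @boxes = $panel->boxes;
my @boxes = ([ Demo::Feature->new('hit1'), 10, 20, 110, 30, undef ]);
print image_map('hits', sub { '/hit?id=' . $_[0]->display_name }, @boxes);
```

The printed map is then attached to the panel image with an
<img ... usemap="#hits"> tag in the same page.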


From s.johri at imperial.ac.uk  Thu May  4 08:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of
sequences.
The function fu_and_li_D(@ingroup,$extmutations) takes 2 args, the
first being the ingroup (population) and the second being the number of
external mutations, which is calculated from an outgroup sequence.
 
My question is: which function do I use to calculate the number of
external mutations?
Would this be the singleton_count() function?
The singleton_count() function takes a PopGen object, which represents
a clustal alignment file.
Would I include the outgroup in a multiple fasta file for alignment with
clustal?
 
Any suggestions as to how to calculate the number of external mutations
would be much appreciated.
 
Thanks for your help!
 

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
From hlapp at gmx.net  Thu May  4 12:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <C9D4D0CB-8340-4157-A603-3935C8F581E6@gmx.net>

Infinite loop on a file you can download (i.e., as opposed to a file  
you tinkered with) is never ok. Could you file this as a bug report?  
And ideally attach your patch?

Thanks,

	-hilmar

On May 2, 2006, at 11:31 PM, Michael Rogoff wrote:

>
> I've encountered a pretty serious bug in Bio::SeqIO when parsing  
> certain genbank
> files that contain CONTIG entries with gaps.  One such record is  
> NW_925173.
>
> When I try to parse this file using Bio::SeqIO::genbank, it will  
> enter an
> infinite loop and spin until it runs out of memory.
>
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to  
> indicate that
> genbank records with CONTIG gaps are not valid and can't be  
> parsed.  But this
> bug actually claims to be fixed, which is strange, since looking at  
> the code for
> FTLocationFactory (where the loop is) it's still right there.  I  
> assume that
> this may be fixed in other contexts but is still not fixed in
> Bio::SeqIO::genbank?  Or am I doing something wrong?
>
> I think that this should probably be filed as an open bug.  I would  
> think that
> even if bioperl isn't interested in parsing this type of file via  
> SeqIO,
> certainly you'd want to ensure that no finite input file would send  
> the parser
> into an infinite loop.  Have others encountered this problem?  Is  
> there any plan
> to address it?
>
> Thanks very much for any information or help!
>
> -Mike
>
> P.S.  I've played around with my version of FTLocationFactory and  
> it seems to
> actually work and parse the gaps.  I'm not sure if I've created  
> other bugs or if
> it works in all cases, but at least the parser doesn't die.  I also  
> don't know
> that my hacky code is appropriate for putting back in to BioPerl,  
> but I'm happy
> to provide it if someone wants to check it out and/or consider it  
> for checkin.
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From saldroubi at yahoo.com  Thu May  4 13:03:00 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Thu, 4 May 2006 10:03:00 -0700 (PDT)
Subject: [Bioperl-l] Is webiste down?
Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>

All,
  
  Is the bioperl website down?  I can't get to http://www.bioperl.org 
  
  
  Thank you. 
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From arareko at campus.iztacala.unam.mx  Thu May  4 14:22:52 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 04 May 2006 13:22:52 -0500
Subject: [Bioperl-l] Is webiste down?
In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
Message-ID: <445A467C.4070700@campus.iztacala.unam.mx>

Website is ok; maybe your gateway can't look up the bioperl server at the 
moment.

Regards,
Mauricio.

Sam Al-Droubi wrote:
> All,
>   
>   Is the bioperl website down?  I can't get to http://www.bioperl.org 
>   
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu May  4 14:40:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 13:40:32 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine>

Are you using the CONTIG record or the full GenBank file?  I see
problems with both (using bioperl-live) which seem unrelated to one another.
The full file seems to be running a bit slow b/c the full GenBank record is
huge (~55 MB) but the CONTIG file does exactly what you said (runs out of
memory).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> Sent: Tuesday, May 02, 2006 10:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> 
> I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> genbank
> files that contain CONTIG entries with gaps.  One such record is
> NW_925173.
> 
> When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> infinite loop and spin until it runs out of memory.
> 
> I'm pretty certain it relates to this bug:
> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> that
> genbank records with CONTIG gaps are not valid and can't be parsed.  But
> this
> bug actually claims to be fixed, which is strange, since looking at the
> code for
> FTLocationFactory (where the loop is) it's still right there.  I assume
> that
> this may be fixed in other contexts but is still not fixed in
> Bio::SeqIO::genbank?  Or am I doing something wrong?
> 
> I think that this should probably be filed as an open bug.  I would think
> that
> even if bioperl isn't interested in parsing this type of file via SeqIO,
> certainly you'd want to ensure that no finite input file would send the
> parser
> into an infinite loop.  Have others encountered this problem?  Is there
> any plan
> to address it?
> 
> Thanks very much for any information or help!
> 
> -Mike
> 
> P.S.  I've played around with my version of FTLocationFactory and it seems
> to
> actually work and parse the gaps.  I'm not sure if I've created other bugs
> or if
> it works in all cases, but at least the parser doesn't die.  I also don't
> know
> that my hacky code is appropriate for putting back in to BioPerl, but I'm
> happy
> to provide it if someone wants to check it out and/or consider it for
> checkin.
> 
> 
> 


From j.abbott at imperial.ac.uk  Thu May  4 11:44:44 2006
From: j.abbott at imperial.ac.uk (James Abbott)
Date: Thu, 04 May 2006 16:44:44 +0100
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or
	RC	lines
In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
	<7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
Message-ID: <445A216C.7090108@imperial.ac.uk>

Jason Stajich wrote:
> I don't know if any of this has been resolved really so hopefully  
> James will speak up if he's implemented anything.
Not as yet, I'm afraid - $job is keeping me overly busy at the moment, 
but it's on my todo list....

Cheers,
James

-- 
Dr. James Abbott <j.abbott at imperial.ac.uk>
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London


From hubert.prielinger at gmx.at  Thu May  4 15:35:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 13:35:42 -0600
Subject: [Bioperl-l] can't parse blast file anymore
Message-ID: <445A578E.8050207@gmx.at>

Hi,
the following perl script worked fine until a few days ago....

==============================================================
#!/usr/bin/perl -w

use Bio::SearchIO;
use strict;
use DBI;
use Net::MySQL;

#use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);

print "trying to connect to database \n";
my $database = 'antimicro_peptides';
my $host = 'ppc7.bio.ucalgary.ca';
my $user = 'Hubert';
my $password = 'REDACTED';

my $mysql = Net::MySQL->new(
        hostname => $host,
        database => $database,
        user     => $user,
        password => $password,
    );
   

print "Connection established \n";

my $selectID = 0;
my $count = 0;


##output database results
#while (my @row = $sth->fetchrow_array)
#   { print "@row\n" }


print "start program\n";
my $directory = '/home/Hubert/test';
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
  if ($file =~ /txt$/)   {
      $count++;
    print "read file $file \n";
  

    $file = $directory . '/' . $file;

    my $search = new Bio::SearchIO (-format => 'blast',
                                       -file => $file);
    print "bioperl seems to work....\n";                           
    my $cutoff_len = 10;
                               
    #iterate over each query sequence
    print "try to enter while loop\n";
    while (my $result = $search->next_result) {
    print "entered 1st while loop\n";
   
      #iterate over each hit on the query sequence
      while (my $hit = $result->next_hit) {
      print "entered 2nd while loop\n";
       
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
        print "entered 3rd while loop\n";
           
          if ($hsp->length('sbjct') <= $cutoff_len) {
          #print $hsp->hit_string, "\n";
               
            for ($hsp->hit_string) {        #$hsp->hit_string
             print "count files....., $count ,\n";
.................

===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop


But it doesn't enter the first while loop; it gets stuck there. At first I 
thought it was a Linux problem, because I updated from FC4 to FC5, but it 
isn't, because perl is working fine, and it seems bioperl is working fine 
too, but it cannot parse the file anymore.....

regards
Hubert


From barry.moore at genetics.utah.edu  Thu May  4 17:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK]   can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <BD1D97AA-99BD-451C-9835-4F22A59BCFDD@genetics.utah.edu>

Hubert,

My first suggestion would be to log onto your calgary server and
change your password real quick (unless you intended to post your
password to the world).  Well, this isn't an answer, but it may help
you find one.  Use perl -d your_script.pl to run your script under
the debugger.  Type 'n' to step forward to the line where you start
the while loop.  Type 'x $result' to see that an object exists (it
should or you'd have gotten an error).  Type 's' to step into the
next_result call, and then continue to type 'n' and 's' as needed to
burrow down to see if you can find where you're hanging.
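
On the password point, one way to keep credentials out of a script you
might paste to a list is to read them from the environment. A minimal
sketch, assuming made-up variable names (BLASTDB_USER and
BLASTDB_PASSWORD are not anything Net::MySQL or DBI looks for; you would
pass the returned values to the constructor yourself):

```perl
use strict;
use warnings;

# Read database credentials from a hash of environment variables.
# BLASTDB_USER and BLASTDB_PASSWORD are invented names for this sketch.
sub db_credentials {
    my ($env) = @_;
    for my $key (qw(BLASTDB_USER BLASTDB_PASSWORD)) {
        die "environment variable $key is not set\n"
            unless defined $env->{$key} && length $env->{$key};
    }
    return ($env->{BLASTDB_USER}, $env->{BLASTDB_PASSWORD});
}

# Normally you would pass \%ENV; a literal hash keeps the demo runnable.
my ($user, $password) = db_credentials({
    BLASTDB_USER     => 'demo',
    BLASTDB_PASSWORD => 'not-a-real-password',
});
print "connecting as $user\n";
```

The script then dies with a clear message if the variables are missing,
and nothing secret ends up in the source file.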

Barry

On May 4, 2006, at 1:35 PM, Hubert Prielinger wrote:

> Hi,
> the following perl script worked fine until a few days ago....
>
> ==============================================================
> #!/usr/bin/perl -w
>
> use Bio::SearchIO;
> use strict;
> use DBI;
> use Net::MySQL;
>
> #use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);
>
> print "trying to connect to database \n";
> my $database = 'antimicro_peptides';
> my $host = 'ppc7.bio.ucalgary.ca';
> my $user = 'Hubert';
> my $password = 'REDACTED';
>
> my $mysql = Net::MySQL->new(
>         hostname => $host,
>         database => $database,
>         user     => $user,
>         password => $password,
>     );
>
>
> print "Connection established \n";
>
> my $selectID = 0;
> my $count = 0;
>
>
>
> ##output database results
> #while (my @row = $sth->fetchrow_array)
> #   { print "@row\n" }
>
>
>
> print "start program\n";
> my $directory = '/home/Hubert/test';
> opendir(DIR, $directory) || die("Cannot open directory");
> print "opened directory\n";
>
> foreach my $file (readdir(DIR))  {
>   if ($file =~ /txt$/)   {
>       $count++;
>     print "read file $file \n";
>
>
>     $file = $directory . '/' . $file;
>
>     my $search = new Bio::SearchIO (-format => 'blast',
>                                        -file => $file);
>     print "bioperl seems to work....\n";
>     my $cutoff_len = 10;
>
>     #iterate over each query sequence
>     print "try to enter while loop\n";
>     while (my $result = $search->next_result) {
>     print "entered 1st while loop\n";
>
>       #iterate over each hit on the query sequence
>       while (my $hit = $result->next_hit) {
>       print "entered 2nd while loop\n";
>
>         #iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>         print "entered 3rd while loop\n";
>
>           if ($hsp->length('sbjct') <= $cutoff_len) {
>           #print $hsp->hit_string, "\n";
>
>             for ($hsp->hit_string) {        #$hsp->hit_string
>              print "count files....., $count ,\n";
> .................
>
> ===================================================================
>
> Output:
>
> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
> trying to connect to database
> Connection established
> start program
> opened directory
> read file 40026.txt
> bioperl seems to work....
> try to enter while loop
>
>
> but it doesn't enter the first while loop, it stuck there, first I
> thought it is a linux problem, because I updated from FC4 to FC5,  
> but it
> isn't because perl is working fine, and it seems bioperl is working  
> fine
> too, but it cannot parse the file anymore.....
>
> regards
> Hubert
>
>
>
>
>


From cjfields at uiuc.edu  Thu May  4 18:27:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 17:27:57 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine>
Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine>

Here's another odd bit.  This is what I get for the CONTIG line when I
passed a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped out the 'join' here as well.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, May 04, 2006 1:41 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> Are you using the CONTIG record or the full GenBank file?  I see
> problems with both (using bioperl-live) which seem unrelated to one
> another.  The full file seems to be running a bit slow b/c the full
> GenBank record is huge (~55 MB) but the CONTIG file does exactly what
> you said (runs out of memory).
> 
> Chris
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > Sent: Tuesday, May 02, 2006 10:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> >
> > I've encountered a pretty serious bug in Bio::SeqIO when parsing certain
> > genbank files that contain CONTIG entries with gaps.  One such record is
> > NW_925173.
> >
> > When I try to parse this file using Bio::SeqIO::genbank, it will enter an
> > infinite loop and spin until it runs out of memory.
> >
> > I'm pretty certain it relates to this bug:
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate
> > that genbank records with CONTIG gaps are not valid and can't be parsed.
> > But this bug actually claims to be fixed, which is strange, since looking
> > at the code for FTLocationFactory (where the loop is) it's still right
> > there.  I assume that this may be fixed in other contexts but is still not
> > fixed in Bio::SeqIO::genbank?  Or am I doing something wrong?
> >
> > I think that this should probably be filed as an open bug.  I would think
> > that even if bioperl isn't interested in parsing this type of file via
> > SeqIO, certainly you'd want to ensure that no finite input file would send
> > the parser into an infinite loop.  Have others encountered this problem?
> > Is there any plan to address it?
> >
> > Thanks very much for any information or help!
> >
> > -Mike
> >
> > P.S.  I've played around with my version of FTLocationFactory and it seems
> > to actually work and parse the gaps.  I'm not sure if I've created other
> > bugs or if it works in all cases, but at least the parser doesn't die.  I
> > also don't know that my hacky code is appropriate for putting back in to
> > BioPerl, but I'm happy to provide it if someone wants to check it out
> > and/or consider it for checkin.
> >


From hlapp at gmx.net  Thu May  4 18:39:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 18:39:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
References: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>

The two notations are equivalent and syntactically correct, or so I  
believe ... I don't think 100% verbatim preservation should be the  
goal. Or am I missing the point?
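
To make the equivalence concrete, here is a minimal sketch in plain Perl. It
makes no BioPerl calls, and the helper name normalize_contig is made up for
illustration only. It treats a one-element join(...) as the same location as
the bare element, and leaves a multi-element join untouched:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: strip a join(...) wrapper
# when it encloses exactly one element (no comma at nesting depth 0).
sub normalize_contig {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                       # CONTIG lines may be wrapped
    if ($loc =~ /^join\((.*)\)$/) {
        my $inner = $1;
        my $depth = 0;
        for my $ch (split //, $inner) {
            $depth++ if $ch eq '(';
            $depth-- if $ch eq ')';
            return $loc if $depth < 0;                  # unbalanced, leave as is
            return $loc if $ch eq ',' && $depth == 0;   # more than one element
        }
        return $inner;
    }
    return $loc;
}

my $original  = 'join(AADB02014027.1:1..8541)';
my $roundtrip = 'AADB02014027.1:1..8541';
print normalize_contig($original) eq normalize_contig($roundtrip)
    ? "equivalent\n"
    : "different\n";
```

Whether a downstream tool accepts the bare form is a separate question; the
sketch only shows that the two strings describe the same single segment.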

On May 4, 2006, at 6:27 PM, Chris Fields wrote:

> Here's another odd bit.  This is what I get for the CONTIG line when I
> passed a simple contig file (NW_925062, with one join) through  
> Bio::SeqIO:
>
> -----------------------------------
> ....
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /db_xref="taxon:9606"
>                      /mol_type="genomic DNA"
>                      /chromosome="11"
>                      /organism="Homo sapiens"
> CONTIG      AADB02014027.1:1..8541
>
> //
> -----------------------------------
> Here's the original:
> -----------------------------------
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014027.1:1..8541)
> //
> -----------------------------------
>
> Looks like it lopped out the 'join' here as well.
>
> Chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hubert.prielinger at gmx.at  Thu May  4 19:57:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 17:57:44 -0600
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A7449.1080607@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
Message-ID: <445A94F8.9000903@gmx.at>

Torsten Seemann wrote:
> Hubert
>
>> the following perl script worked fine until a few days ago....
>>
>>    #iterate over each query sequence
>>    print "try to enter while loop\n";
>>  
>>
> die "Bad BLAST report" if not defined $search;
>
>>    while (my $result = $search->next_result) {
>>    print "entered 1st while loop\n";
>>
>> Output:
>>
>> [Hubert@ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>> try to enter while loop
>>
>> but it doesn't enter the first while loop, it stuck there, first I  
>>
> What is the value of $search before you start the WHILE loop ?
>
>


Hi,

$search is defined, like this:

my $search = new Bio::SearchIO (-format => 'blast',
                                -file   => $file);

If I try it with the debugger, as Barry suggested, then I get the following:

 
DB<1> n
main::(Blast.pl:24):    print "Connection established \n";
  DB<1> n
Connection established
main::(Blast.pl:26):    my $selectID = 0;
  DB<1> n
main::(Blast.pl:27):    my $count = 0;
  DB<1> n
main::(Blast.pl:37):    print "start program\n";
  DB<1> n
start program
main::(Blast.pl:38):    my $directory = '/home/Hubert/test';
  DB<1> n
main::(Blast.pl:39):    opendir(DIR, $directory) || die("Cannot open directory");
  DB<1> n
main::(Blast.pl:40):    print "opened directory\n";
  DB<1> n
opened directory
main::(Blast.pl:42):    foreach my $file (readdir(DIR))  {
  DB<1> n
main::(Blast.pl:43):      if ($file =~ /txt$/)   {
  DB<1> n
main::(Blast.pl:44):            $count++;
  DB<1> n
main::(Blast.pl:45):        print "read file $file \n";
  DB<1> n
read file 40026.txt
main::(Blast.pl:48):        $file = $directory . '/' . $file;
  DB<1> n
main::(Blast.pl:50):        my $search = new Bio::SearchIO (-format => 'blast',
main::(Blast.pl:51):                                        -file => $file);
  DB<1> n
main::(Blast.pl:52):            print "bioperl seems to work....\n";
  DB<1> s $search
main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $search;
  DB<<2>> n

  DB<2> n
bioperl seems to work....
main::(Blast.pl:53):        my $cutoff_len = 10;
  DB<2> n
main::(Blast.pl:56):        print "try to enter while loop\n";
  DB<2> n
try to enter while loop
main::(Blast.pl:57):        while (my $result = $search->next_result) {
  DB<2> s $result
main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $result;
  DB<<3>>


From torsten.seemann at infotech.monash.edu.au  Thu May  4 17:38:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 07:38:17 +1000
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <445A7449.1080607@infotech.monash.edu.au>

Hubert

>the following perl script worked fine until a few days ago....
>
>    #iterate over each query sequence
>    print "try to enter while loop\n";
>  
>
die "Bad BLAST report" if not defined $search;

>    while (my $result = $search->next_result) {
>    print "entered 1st while loop\n";
>
>Output:
>
>[Hubert@ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I 
>  
>
What is the value of $search before you start the WHILE loop ?


From barry.moore at genetics.utah.edu  Thu May  4 20:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result', and you should see the object dumped to
the screen.

Or just 's' by itself on the while line, which will step you into the
next_result sub, where you can look around and watch what's happening.

B

>   DB<2> s $result
> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
> 3:      $result;
>   DB<<3>>


From hubert.prielinger at gmx.at  Thu May  4 22:04:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 20:04:20 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
Message-ID: <445AB2A4.7020405@gmx.at>

if I do so it returns:
0 undef


Barry Moore wrote:
> That should be 'x $result', and you should see the object dumped to
> the screen.
>
> Or just 's' by itself on the while line, which will step you into the
> next_result sub, where you can look around and watch what's happening.
>
> B


From torsten.seemann at infotech.monash.edu.au  Fri May  5 00:40:34 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 14:40:34 +1000
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AB2A4.7020405@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
	<445AB2A4.7020405@gmx.at>
Message-ID: <445AD742.4070408@infotech.monash.edu.au>

Hubert Prielinger wrote:
> if I do so it returns:
> 0 undef

That means the value of $search was undef.
That means that it could not parse or open the BLAST report.
I repeat the line that I put in my earlier email which you ignored.

# your line
my $search = Bio::SearchIO->new( ..... );

# then check if it was successful!
die "could not open blast report" if not defined $search;

--Torsten
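
In the same spirit, it can help to rule out the file itself before blaming
the parser. The following is only a sketch in plain Perl: the sub name
blast_report_problem is hypothetical, it is not a BioPerl function, and the
"first line names the BLAST program" test is an assumption about plain-text
NCBI BLAST output, so other flavours (HTML, XML, tabular) would be flagged.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of BioPerl): returns a reason
# string if the file is unlikely to parse, or undef if it looks OK.
sub blast_report_problem {
    my ($file) = @_;
    return "no such file"  unless -e $file;
    return "not readable"  unless -r $file;
    return "file is empty" unless -s $file;

    open(my $fh, '<', $file) or return "cannot open: $!";
    my $first;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;          # skip leading blank lines
        $first = $line;
        last;
    }
    close $fh;
    return "only blank lines" unless defined $first;

    # Assumption: a text report starts with the program name, e.g. BLASTP.
    return "first line does not look like BLAST output"
        unless $first =~ /^\s*T?BLAST[NPX]\b/i;
    return;
}

# Usage: perl check_report.pl 40026.txt
my $file = shift @ARGV;
if (defined $file) {
    my $problem = blast_report_problem($file);
    die "$file: $problem\n" if defined $problem;
    print "$file looks like a BLAST report\n";
}
```

If this passes and next_result still returns nothing, the report format
produced by the upgraded BLAST is the next thing to compare against an old
report that used to parse.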


From jason.stajich at duke.edu  Fri May  5 09:21:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:21:38 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>

A space after the '>' is causing the problem, since we infer the ID as
everything after the '>' and BEFORE the first whitespace.  Get rid of the
space:
   $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE
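
To see why the name comes out empty, here is a small stand-alone sketch. The
two helper subs are made up for illustration; header_id only mimics the rule
described above (ID = text between '>' and the first whitespace) and is not
the actual BioPerl parser code.

```perl
use strict;
use warnings;

# Same substitution as the one-liner above, applied to a single line.
sub fix_header {
    my ($line) = @_;
    $line =~ s/^>\s+/>/;
    return $line;
}

# Illustrative only: take everything after '>' up to the first whitespace.
sub header_id {
    my ($line) = @_;
    return $line =~ /^>(\S*)/ ? $1 : undef;
}

my @lines = (
    "> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel\n",
    "MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS\n",
);

for my $line (@lines) {
    next unless $line =~ /^>/;              # only header lines matter
    printf "before: [%s]  after: [%s]\n",
        header_id($line), header_id(fix_header($line));
}
# prints: before: []  after: [gi|90108701|pdb|2AHZ|B]
```

The empty "before" value is the same [] that shows up in the exception
message.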

On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:

> contents of the input file has a single sequence:
>
>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS
> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
> ------------------------------------------
> this is the script that tries to parse it:
>
> use Bio::AlignIO;
> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>                            -file   => 'test.fasta');
> while( my $aln = $inseq->next_aln ) {
>      print "name: ", $aln->displayname;
>      print "length: ", $aln->length;
>      print "\n";
> }
>
> ------------------------------------------
> and this is the result of running that script on winxp
>
> D:\msa\NAK MUTANTS>perl parseFasta.pl
>
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name []
> STACK Bio::SimpleAlign::displayname
> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
> STACK toplevel parseFasta.pl:11
>
> --------------------------------------
> D:\msa\NAK MUTANTS>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From thoufek at pngg.org  Thu May  4 12:50:44 2006
From: thoufek at pngg.org (T.D. Houfek)
Date: Thu, 04 May 2006 12:50:44 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
Message-ID: <445A30E4.6070103@pngg.org>

Using Bioperl 1.5, having trouble with writing FASTA-style quality files 
using Bio::Seq::Quality.

I create the Bio::Seq::Quality object, giving its constructor an ID, a 
description, a nucleotide sequence, and a quality sequence. I then write 
the sequence FASTA and the quality FASTA. The description string will 
appear in the header line of the sequence FASTA, but not in the header 
line of the quality FASTA.

Can anybody help me figure out how to fix this? I've attached a sample 
script and output.

-T.D.

------------------- sample script follows ---------------------------------------

#!/usr/bin/perl
use strict;
use Bio::Seq::Quality;
use Bio::SeqIO;

my $id = "bogus_id";
my $desc = "bogus description";
my $seq = "ATTATTATTATTATT";
my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";

my $sequal_obj = Bio::Seq::Quality->new(
-display_id => $id,
-desc => $desc,
-seq => $seq,
-qual => $qual
);

my $qualout = Bio::SeqIO->new(
-file => ">myfile.qual",
-format => 'qual'
);
my $seqout = Bio::SeqIO->new(
-file => ">myfile.seq",
-format => 'Fasta'
);

$seqout->write_seq($sequal_obj);
$qualout->write_seq($sequal_obj);


------------------ sample output follows ---------------------------------------

tdhoufek@aether:~$ cat myfile.seq
>bogus_id bogus description
ATTATTATTATTATT
tdhoufek@aether:~$ cat myfile.qual
>bogus_id
10 20 30 10 20 30 10 20 30 10 20 30 10 20 30

--------------------------------------------------------------------------------------------------


-- 
T.D. Houfek
senior bioinformatics developer
plant nematode genetics group
north carolina state university
Email: thoufek at pngg.org
----------------------------------------------------------
use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/;
$u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom;
$t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_])
;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(-
$u=>(join"",@x))->translate()->$u;$z =~s/X/ /g;print"$z\n"


From jason.stajich at duke.edu  Fri May  5 09:27:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:27:51 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
	<B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu>

[replying to myself]

although if you are trying to just read a sequence not an alignment  
then you want to use Bio::SeqIO.

See the copious help on the HOWTO page at bioperl website including a  
sequence and feature howto and beginner's guide.
  http://bioperl.org/wiki/HOWTOs

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From osborne1 at optonline.net  Fri May  5 10:04:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 05 May 2006 10:04:02 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
Message-ID: <C080D392.8567%osborne1@optonline.net>

T.D.,

According to the documentation,
http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks
right. What are you trying to create?

Brian O.




From cjfields at uiuc.edu  Fri May  5 10:24:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 09:24:05 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>
Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine>

I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
from the longer file Michael used as an example here (NW_925173). I believe
the CONTIG line is currently handled like a feature so I think it goes
through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is;
I think it's getting beaten up in there somehow. I may see what happens if
it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
and just glob the whole mess together as is.


Chris

...
FEATURES             Location/Qualifiers
     source          1..44976370
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
            gap(441),AADB02014318.1:1..173584,gap(676),
            AADB02014319.1:1..377558,gap(20),
            complement(AADB02014320.1:1..431263),gap(20),
            AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
            gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
            gap(4611),AADB02014325.1:1..383881,gap(20),
            complement(AADB02014326.1:1..381633),gap(1930),
            complement(AADB02014327.1:1..460053),gap(20),
            AADB02014328.1:1..4186,gap(1587),
...
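
For discussion, here is a rough stand-alone sketch of how a CONTIG string
with gap() and complement() elements could be split without any risk of
looping forever. It is plain Perl, the sub name parse_contig is made up, and
it neither uses nor imitates the FTLocationFactory code. It only handles the
element shapes visible in the excerpt above (accession:start..end, gap(N),
complement(...)); anything else is treated as an error.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: every pass through the loop either consumes
# input or dies, so it always terminates on finite input.
sub parse_contig {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                       # undo line wrapping
    $loc =~ s/^join\((.*)\)$/$1/;           # drop the outer join(...)
    my @parts;
    while (length $loc) {
        if ($loc =~ s/^gap\((\d+)\)//) {
            push @parts, { type => 'gap', length => $1 };
        }
        elsif ($loc =~ s/^complement\(([^:()]+):(\d+)\.\.(\d+)\)//) {
            push @parts, { type => 'seq', acc => $1, start => $2,
                           end => $3, strand => -1 };
        }
        elsif ($loc =~ s/^([^:(),]+):(\d+)\.\.(\d+)//) {
            push @parts, { type => 'seq', acc => $1, start => $2,
                           end => $3, strand => 1 };
        }
        else {
            die "cannot parse CONTIG element at: " . substr($loc, 0, 30) . "\n";
        }
        if (length $loc) {
            $loc =~ s/^,//
                or die "expected ',' at: " . substr($loc, 0, 30) . "\n";
        }
    }
    return @parts;
}

# Shortened input assembled from elements of the record above.
my @parts = parse_contig(
    'join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,'
  . 'gap(441),complement(AADB02014320.1:1..431263))'
);
for my $p (@parts) {
    if ($p->{type} eq 'gap') {
        print "gap of $p->{length}\n";
    }
    else {
        print "$p->{acc} $p->{start}..$p->{end} strand $p->{strand}\n";
    }
}
```

The point of the final else branch is the one Michael raised: unrecognized
input should produce an error, not an endless loop.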

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Thursday, May 04, 2006 5:39 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> The two notations are equivalent and syntactically correct, or so I
> believe ... I don't think 100% verbatim preservation should be the
> goal. Or am I missing the point?


From hlapp at gmx.net  Fri May  5 10:47:50 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 5 May 2006 10:47:50 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <C080D392.8567%osborne1@optonline.net>
References: <C080D392.8567%osborne1@optonline.net>
Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net>

He wants the description on the description line, like for the  
sequence file.

Thomas, my guess is the code doesn't print the description to the  
line although I haven't made sure. Do you want to volunteer and  
check, add that print statement and post the patch?

	-hilmar
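
As a starting point, here is a rough sketch of the output being asked for.
It is plain Perl and deliberately independent of Bio::SeqIO::qual, whose
write_seq code I have not checked; the sub name format_qual_record and the
wrap width of 50 values per line are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical formatter: builds a FASTA-style quality record whose
# header line carries the description as well as the ID.
sub format_qual_record {
    my ($id, $desc, $quals, $per_line) = @_;
    $per_line ||= 50;                        # assumed wrap width
    my $header = ">$id";
    $header .= " $desc" if defined $desc && length $desc;

    # Accept either an array ref or a space-separated string of scores.
    my @q = ref $quals ? @$quals : split ' ', $quals;
    my @lines;
    while (my @chunk = splice(@q, 0, $per_line)) {
        push @lines, join(' ', @chunk);
    }
    return join("\n", $header, @lines) . "\n";
}

print format_qual_record(
    'bogus_id',
    'bogus description',
    '10 20 30 10 20 30 10 20 30 10 20 30 10 20 30',
);
```

In a real patch the ID, description and scores would come from the sequence
object rather than from literal strings.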

On May 5, 2006, at 10:04 AM, Brian Osborne wrote:

> T.D.,
>
> According to the documentation,
> http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file  
> looks
> right. What are you trying to create?
>
> Brian O.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From dmessina at wustl.edu  Fri May  5 11:24:47 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 10:24:47 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct.

 From a cursory walk through the code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     $header .= " $desc";   # $desc is only in scope inside this block
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");
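
A minimal, standalone sketch of that idea, for illustration only. It is not the actual Bio::SeqIO::qual code: Mock::Seq and qual_header are hypothetical names, and the mock merely stands in for a Bio::Seq::Quality object so the snippet runs without BioPerl. Since a "my" variable declared in an if condition is only visible inside that block, the description is appended there:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::Seq::Quality object; it only
# provides the two accessors the header logic needs.
package Mock::Seq;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub display_id { return $_[0]{id} }
sub desc       { return $_[0]{desc} }

package main;

# Build a FASTA-style header, appending the description when there is one.
sub qual_header {
    my ($source) = @_;
    my $header = $source->display_id;
    if ($source->can('desc') and my $desc = $source->desc()) {
        $desc =~ s/\n//g;        # keep the header on a single line
        $header .= " $desc";     # $desc is scoped to this block
    }
    return ">$header";
}

print qual_header(Mock::Seq->new(id => 'bogus_id', desc => 'bogus description')), "\n";
print qual_header(Mock::Seq->new(id => 'bogus_id')), "\n";
```

An object without a description simply gets the bare ID in its header, which matches what the sequence FASTA writer appears to do in the sample output earlier in the thread.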

Dave


From dmessina at wustl.edu  Fri May  5 10:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <DCF490F7-46CC-47B7-81A7-229BCC819980@wustl.edu>

T.D.,

 From a cursory walk through your code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     $header .= " $desc";   # $desc is only in scope inside this block
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");

Dave


From hubert.prielinger at gmx.at  Fri May  5 14:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

hi,
I did as you suggested and got this error message:

Can't call method "next_result" on an undefined value at....

Then I searched the internet and found a thread suggesting that adding 
"use strict" solves the problem, but I'm already using "use strict".

thanks

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>   
>> if I do so it returns:
>> 0 undef
>>     
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search;
>
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From cjfields at uiuc.edu  Fri May  5 15:18:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:18:16 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine>

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri May  5 15:27:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:27:12 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine>

Sorry, mail got sent before I finished it!  Here I go again...

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);
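
Before handing $file to the parser, it may also be worth ruling out the simplest causes of an undefined $search, such as a wrong path or an empty file. The following is only a sketch using core Perl; report_problem is a hypothetical helper name, not part of BioPerl:

```perl
use strict;
use warnings;

# Return a description of what is wrong with $file, or undef if it
# looks like something a parser could at least open.
sub report_problem {
    my ($file) = @_;
    return "no file name given"      unless defined $file && length $file;
    return "'$file' does not exist"  unless -e $file;
    return "'$file' is a directory"  if     -d _;
    return "'$file' is not readable" unless -r _;
    return "'$file' is empty"        unless -s _;
    return undef;
}

# Check every file named on the command line.
for my $file (@ARGV) {
    my $problem = report_problem($file);
    print defined $problem ? "SKIP: $problem\n" : "OK:   $file\n";
}
```

Only files that pass this check would then be given to Bio::SearchIO->new; anything that still fails after that points at the report contents rather than the path.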

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;

my @dirlist = ("/home/Hubert/test");

find (\&dir, @dirlist);

sub printdir {
    return unless /txt$/; 
    return if (-d);
    my $parser = Bio::SearchIO->new(-file => $_,
                                    -format => 'blast');	
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }
}

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From barry.moore at genetics.utah.edu  Fri May  5 15:39:37 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 5 May 2006 13:39:37 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at>
Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu>

Hubert-

If you want to send me your script and input file I'll try to have a  
look at it.

Barry

On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote:

> hi,
> I have done, as you suggested and I got the error message:
>
> Can't call method "next_result" on an undefined value at....
>
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
>
> thanks
>
> Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>
>>> if I do so it returns:
>>> 0 undef
>>>
>>
>> That means the value of $search was undef.
>> That means that it could not parse or open the BLAST report.
>> I repeat the line that I put in my earlier email which you ignored.
>>
>> # your line
>> my $search = Bio::SearchIO->new( ..... );
>>
>> # then check if it was successful!
>> die "could not open blast report" if not defined $search;
>>
>> --Torsten
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri May  5 16:07:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 15:07:53 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine>
Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine>

Oops!  This is what happens when I copy and paste in a hurry.

> use File::Find;
> use Bio::SearchIO;
> 
> my @dirlist = ("/home/Hubert/test");
> 
> find (\&dir, @dirlist);
> 
> sub printdir {
  ^^^^^^^^^^^

Should be: sub dir {

>     return unless /txt$/;
>     return if (-d);
>     my $parser = Bio::SearchIO->new(-file => $_,
>                                     -format => 'blast');
>     while (my $result = $parser->next_result) {
>         while (my $hit = $result->next_hit) {
>             while (my $hsp = $hit->next_hsp) {
>                 # do stuff here
>             }
>         }
>     }
> }
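
For reference, here is a self-contained sketch of just the directory walk, using only the core File::Find module. The name find_reports is hypothetical, and the parsing step is left out on purpose; each returned path would be passed to Bio::SearchIO->new as in the outline above.

```perl
use strict;
use warnings;
use File::Find;

# Collect plain files ending in .txt below the given directories.
sub find_reports {
    my @dirs = @_;
    my @found;
    find(sub {
        return unless /\.txt$/;           # $_ is the bare file name here
        return if -d;                     # skip directories named *.txt
        push @found, $File::Find::name;   # path including the start directory
    }, @dirs);
    my @sorted = sort @found;
    return @sorted;
}

print "$_\n" for find_reports(@ARGV ? @ARGV : '.');
```

Because find() changes into each directory as it goes, $_ can be tested and opened directly inside the callback, while $File::Find::name keeps the path that includes the starting directory for use afterwards.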

Hubert, if the file you are parsing looks fine (i.e. valid BLAST output),
post it and your script on Bugzilla and let us take a look.  Leave out your
password though ; >

Chris


From golharam at umdnj.edu  Fri May  5 15:58:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 05 May 2006 15:58:03 -0400
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine>
Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>

I'm not sure how applicable this is, but I've seen a problem with Perl
if the LANG environment variable contains UTF8 (e.g. LANG=en_US.UTF8).
I changed mine to en_US and lots of Perl string parsing problems went
away.
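
A quick way to see whether this applies to a given environment is to look at the locale variables from inside Perl. This is only a sketch: utf8_locale_vars is a hypothetical helper, and whether a UTF-8 locale actually affects the BLAST parsing in this thread is an assumption that has not been verified.

```perl
use strict;
use warnings;

# Return the names of locale variables whose value mentions UTF-8.
sub utf8_locale_vars {
    my (%env) = @_;
    return grep { defined $env{$_} && $env{$_} =~ /utf-?8/i }
           qw(LC_ALL LC_CTYPE LANG);
}

my @suspects = utf8_locale_vars(%ENV);
print @suspects
    ? "UTF-8 locale set via: @suspects\n"
    : "no UTF-8 locale variables set\n";
```

If a variable is reported, rerunning the script with LANG=en_US in the environment would show whether the locale makes any difference.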

Also, what about running the bioperl tests on your installation (make
test).  What happens?


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Friday, May 05, 2006 3:18 PM
To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore


What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping
through your files and performing a task on each one, such as parsing
output.  It changes into the working directory each time; you should be
able to do something like this:

use File::Find;
use Bio::SearchIO;


Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to

> use strict and then the problem is solved.... but I'm already using 
> use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report. I 
> > repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org 
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri May  5 17:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 16:56:29 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine>
Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine>

Okay, I have changed the way the CONTIG line is handled in
Bio::SeqIO::genbank.  It was handling it as a feature; I just changed it
over to handling it as a Bio::Annotation::SimpleValue object with the value
being the entire contig section.  It seems to pass tests fine but I'm
operating off Windows and my wife's IBook went to the great desktop in the
sky (motherboard), so I can't test it there.  Pulling the file off using
Bio::DB::GenBank (using the no-redirect flag) works w/o crashing out.
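
For anyone who needs the individual pieces of the stored CONTIG string, a plain-Perl sketch like the following can split it up. This is not how Bio::SeqIO::genbank or FTLocationFactory handle it; parse_contig is a hypothetical helper, and it only understands the element shapes seen in this thread (acc:start..end, complement(...), and gap(N)).

```perl
use strict;
use warnings;

# Split a CONTIG value into its segments and gaps.
sub parse_contig {
    my ($contig) = @_;
    (my $body = $contig) =~ s/\s+//g;       # remove line wraps and padding
    $body =~ s/^join\((.*)\)$/$1/;          # unwrap an outer join(...), if any
    my @parts;
    for my $item (split /,/, $body) {
        if ($item =~ /^gap\((\d+)\)$/) {
            push @parts, { type => 'gap', length => $1 };
        }
        elsif ($item =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            push @parts, {
                type   => 'segment',
                acc    => $2,
                start  => $3,
                end    => $4,
                strand => $1 ? -1 : 1,
            };
        }
        else {
            die "unrecognized CONTIG element: $item\n";
        }
    }
    return @parts;
}

my @parts = parse_contig(
    'join(AADB02014319.1:1..377558,gap(20),complement(AADB02014320.1:1..431263))'
);
printf "%-7s %s\n", $_->{type}, $_->{acc} // "$_->{length} bp" for @parts;
```

The loop visits each comma-separated element exactly once and dies on anything it does not recognize, so it cannot spin on a finite input. A value with or without the outer join(...) yields the same list of parts.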

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 9:24 AM
> To: 'Hilmar Lapp'
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
> from the longer file Michael used as an example here (NW_925173). I
> believe
> the CONTIG line is currently handled like a feature so I think it goes
> through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix
> is;
> I think it's getting beaten up in there somehow. I may see what happens if
> it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
> and just glob the whole mess together as is.
> 
> 
> Chris
> 
> ...
> FEATURES             Location/Qualifiers
>      source          1..44976370
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
>             gap(441),AADB02014318.1:1..173584,gap(676),
>             AADB02014319.1:1..377558,gap(20),
>             complement(AADB02014320.1:1..431263),gap(20),
>             AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
>             gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
>             gap(4611),AADB02014325.1:1..383881,gap(20),
>             complement(AADB02014326.1:1..381633),gap(1930),
>             complement(AADB02014327.1:1..460053),gap(20),
>             AADB02014328.1:1..4186,gap(1587),
> ...
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> > Sent: Thursday, May 04, 2006 5:39 PM
> > To: Chris Fields
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> > The two notations are equivalent and syntactically correct, or so I
> > believe ... I don't think 100% verbatim preservation should be the
> > goal. Or am I missing the point?
> >
> > On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> >
> > > Here's another odd bit.  This is what I get for the CONTIG line when I
> > > passed a simple contig file (NW_925062, with one join) through
> > > Bio::SeqIO:
> > >
> > > -----------------------------------
> > > ....
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /db_xref="taxon:9606"
> > >                      /mol_type="genomic DNA"
> > >                      /chromosome="11"
> > >                      /organism="Homo sapiens"
> > > CONTIG      AADB02014027.1:1..8541
> > >
> > > //
> > > -----------------------------------
> > > Here's the original:
> > > -----------------------------------
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /organism="Homo sapiens"
> > >                      /mol_type="genomic DNA"
> > >                      /db_xref="taxon:9606"
> > >                      /chromosome="11"
> > > CONTIG      join(AADB02014027.1:1..8541)
> > > //
> > > -----------------------------------
> > >
> > > Looks like it lopped out the 'join' here as well.
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > >> Sent: Thursday, May 04, 2006 1:41 PM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>
> > >> Are you using the CONTIG record or the full GenBank file?  I see
> > >> problems with both (using bioperl-live) which seem unrelated to one
> > >> another.
> > >> The full file seems to be running a bit slow b/c the full GenBank
> > >> record
> > >> is
> > >> huge (~55 MB) but the CONTIG file does exactly what you said (runs
> > >> out of
> > >> memory).
> > >>
> > >> Chris
> > >>
> > >>> -----Original Message-----
> > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > >>> Sent: Tuesday, May 02, 2006 10:32 PM
> > >>> To: bioperl-l at lists.open-bio.org
> > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>>
> > >>>
> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> > >>> certain
> > >>> genbank
> > >>> files that contain CONTIG entries with gaps.  One such record is
> > >>> NW_925173.
> > >>>
> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will
> > >>> enter
> > >> an
> > >>> infinite loop and spin until it runs out of memory.
> > >>>
> > >>> I'm pretty certain it relates to this bug:
> > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> > >>> indicate
> > >>> that
> > >>> genbank records with CONTIG gaps are not valid and can't be
> > >>> parsed.  But
> > >>> this
> > >>> bug actually claims to be fixed, which is strange, since looking
> > >>> at the
> > >>> code for
> > >>> FTLocationFactory (where the loop is) it's still right there.  I
> > >>> assume
> > >>> that
> > >>> this may be fixed in other contexts but is still not fixed in
> > >>> Bio::SeqIO::genbank?  Or am I doing something wrong?
> > >>>
> > >>> I think that this should probably be filed as an open bug.  I would
> > >> think
> > >>> that
> > >>> even if bioperl isn't interested in parsing this type of file via
> > >>> SeqIO,
> > >>> certainly you'd want to ensure that no finite input file would
> > >>> send the
> > >>> parser
> > >>> into an infinite loop.  Have others encountered this problem?  Is
> > >>> there
> > >>> any plan
> > >>> to address it?
> > >>>
> > >>> Thanks very much for any information or help!
> > >>>
> > >>> -Mike
> > >>>
> > >>> P.S.  I've played around with my version of FTLocationFactory and it
> > >> seems
> > >>> to
> > >>> actually work and parse the gaps.  I'm not sure if I've created
> > >>> other
> > >> bugs
> > >>> or if
> > >>> it works in all cases, but at least the parser doesn't die.  I also
> > >> don't
> > >>> know
> > >>> that my hacky code is appropriate for putting back in to BioPerl,
> > >>> but
> > >> I'm
> > >>> happy
> > >>> to provide it if someone wants to check it out and/or consider it
> > >>> for
> > >>> checkin.
> > >>>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hubert.prielinger at gmx.at  Fri May  5 19:54:55 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 17:54:55 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
Message-ID: <445BE5CF.2000007@gmx.at>

hi ryan,
nothing happened when I added the verbose flag.
And how can I test my bioperl installation?


Ryan Golhar wrote:
> I'm not sure how applicable this is, but I've seen a problem with Perl
> if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8).
> I've changed mine to en_US and lots of perl string parsing problems went
> away.
>
> Also, what about running the bioperl tests on your installation (make
> test).  What happens?
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 3:18 PM
> To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>
>
> What happens if you add the verbose flag?
>
> my $search = new Bio::SearchIO (-verbose => 1,
>                                 -format => 'blast',
>                                 -file => $file);
>
> Added thought : you might want to look at File::Find for stepping
> through your files and performing a task on each one, such as parsing
> output.  It changes into the working directory each time; you should be
> able to do something like this:
>
> use File::Find;
> use Bio::SearchIO;
>
>
>
>
> Original Message-----
>   
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Friday, May 05, 2006 1:30 PM
>> To: Torsten Seemann; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>>
>> hi,
>> I have done, as you suggested and I got the error message:
>>
>> Can't call method "next_result" on an undefined value at....
>>
>> then I looked up at the internet and found a thread which suggested to
>>     
>
>   
>> use strict and then the problem is solved.... but I'm already using 
>> use strict..
>>
>> thanks
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>
>>>       
>>>> if I do so it returns:
>>>> 0 undef
>>>>
>>>>         
>>> That means the value of $search was undef.
>>> That means that it could not parse or open the BLAST report. I 
>>> repeat the line that I put in my earlier email which you ignored.
>>>
>>> # your line
>>> my $search = Bio::SearchIO->new( ..... );
>>>
>>> # then check if it was successful!
>>> die "could not open blast report" if not defined $search;
>>>
>>> --Torsten
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org 
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>       
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org 
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From hubert.prielinger at gmx.at  Fri May  5 20:01:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 18:01:11 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <445BE747.5020202@gmx.at>

hi
I have posted my script and the blast file to bugzilla......


From hubert.prielinger at gmx.at  Fri May  5 21:21:33 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 19:21:33 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BE747.5020202@gmx.at>
References: <445BE747.5020202@gmx.at>
Message-ID: <445BFA1D.5060008@gmx.at>

The Bugzilla posting didn't work. What is the exact email address for 
Bugzilla?

Hubert Prielinger wrote:
> hi
> I have posted my script and the blast file to bugzilla......
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From cjfields at uiuc.edu  Fri May  5 21:38:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 20:38:47 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BFA1D.5060008@gmx.at>
Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine>

Hubert,

Calm down.  Breathe in, breathe out.  Relax.......

Okay, here is the place to start.  Read the instructions there first.

http://www.bioperl.org/wiki/Bugs

Bugs are reported at this site:

http://bugzilla.bioperl.org/

Again, follow the instructions.  You will have to create a user name and
password to submit.  Once that is set up, click the "Submit a new bug" link
on the main bugzilla page.  On that page, fill out all information first and
a description of the error and hit 'commit'.   Add the BLAST report and some
sample script by clicking on the "Create a New Attachment" link (you'll have
to do this for each file).  Once you go back to the bug page you should see
two attachments and the bug report.  Any commits get sent through the
bioperl-guts-l mail list which most developers subscribe to, so they'll know
there's a new bug out there.  

I will not be able to get to it personally; our home computer died a slow
painful death today (RIP 2002-2006) but I can get to it next week.  If you
post the bug, somebody might be able to get to it sooner!

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 8:22 PM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore
> 
> they bugzilla posting didn't work, what is the exact email address for
> bugzilla
> 
> Hubert Prielinger wrote:
> > hi
> > I have posted my script and the blast file to bugzilla......
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri May  5 22:26:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 21:26:35 -0500
Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files)
Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine>

I committed a change to NCBIHelper that permits the downloading of CON
(contig) files and corrects an issue where no sequence features were saved
when rebuilding those files.  If you use Bio::DB::GenBank regularly to
download genome files, this likely will NOT affect your code unless you
explicitly set the format type to 'genbank', like so:

$factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank'

I believe most will not have that setting since the default was already
'gb'.  Now, the default is 'gbwithparts', which returns the full sequence
regardless.  If it is a file with a CONTIG line, the sequence is built on
NCBI's end (and will include seq features if they are present).  As Brian
said, we'll let NCBI do the work for us!

If you need the actual file w/o sequence, then you can set the format to
'genbank' (like above) and it will grab it for you.  There was an unrelated
problem with CONTIG line parsing that I also fixed, where I changed the
format over to a Bio::Annotation::SimpleValue as a workaround for now; for
some reason some CON files were misparsed and resulted in infinite loops or
missing 'join' statements.  

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Sat May  6 18:22:05 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 06 May 2006 16:22:05 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
Message-ID: <445D218D.2030504@gmx.at>

ok, thanks
I have submitted the bug
bug #1994




From torsten.seemann at infotech.monash.edu.au  Sat May  6 20:57:14 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 07 May 2006 10:57:14 +1000
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D218D.2030504@gmx.at>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at>
Message-ID: <445D45EA.8020804@infotech.monash.edu.au>

Hubert Prielinger wrote:
> ok, thanks
> I have submitted the bug
> bug #1994

This is a line from the script you sent to Bugzilla:

my $search = new Bio::SearchIO (
-verbose => 1,-format => 'blast', -file => $file)
or die "could not open blast report" if not defined my $search;

Although syntactically correct, I don't think it is doing what you want.
Please change it to this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die 
"could not open blast report";

or alternatively, this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
if (not defined $search) {
   die "could not open blast report";
}

and let us know what happens.

All the example output you have supplied still suggests that Bio::SearchIO
cannot load or parse your blast report.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia


From mamillerpa at yahoo.com  Sat May  6 19:07:30 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Sat, 6 May 2006 16:07:30 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <C07E8961.84F2%osborne1@optonline.net>
Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com>

Thanks for your responses, Jason and Brian.

Brian, your suggestion works great.  I had really hoped that by parsing
the OS line as well, I could be sure I wasn't missing any sequences
from my organisms.  Well, I gave up on that and just obtained the NCBI
taxonomy values.  I find it pretty easy to work with them in bioperl.
Unfortunately, walking through all of TrEMBL takes a while, and I'm
getting this error:

  Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line
55, <GEN0> line 3253682.

When I try to extract annotations, etc., from entries like:

  DHE4_UNKP

with:

  my $species_object = $seq->species;
  my $taxid_string = $species_object->ncbi_taxid;

I guess I have to write an error handler for incomplete taxonomy
values.
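
A minimal sketch of such a handler, assuming the failure comes from
$seq->species returning undef for entries without taxonomy.  The
Demo::Seq and Demo::Species classes and the taxid_for helper are
hypothetical stand-ins, not BioPerl classes; they only let the sketch
run without BioPerl installed.

```perl
use strict;
use warnings;

# Return the NCBI taxon id for a sequence object, or nothing (undef in
# scalar context) when the entry carries no species information.
sub taxid_for {
    my ($seq) = @_;
    my $species = $seq->species;
    return unless defined $species;
    return $species->ncbi_taxid;
}

# Hypothetical stand-ins for the real sequence and species objects.
{
    package Demo::Species;
    sub new        { my ($class, $taxid) = @_; return bless { taxid => $taxid }, $class }
    sub ncbi_taxid { return $_[0]{taxid} }
}
{
    package Demo::Seq;
    sub new     { my ($class, $species) = @_; return bless { species => $species }, $class }
    sub species { return $_[0]{species} }
}

my @seqs = (
    Demo::Seq->new(Demo::Species->new(562)),
    Demo::Seq->new(undef),    # an entry with no taxonomy at all
);

for my $seq (@seqs) {
    my $taxid = taxid_for($seq);
    print defined $taxid ? "taxid $taxid\n" : "no taxonomy, skipping\n";
}
```

With real entries the same check would go right after the call to
$seq->species, so that entries without taxonomy are skipped or logged
instead of killing a long run.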

Bye for now,
Mark


--- Brian Osborne <osborne1 at optonline.net> wrote:

> Mark,
> 
> The RC line is part of the description of a reference, I'm guessing
> 'RC'
> stands for Reference Comment. In order to get the attributes of a
> reference
> you'll first do something like:
> 
> my $anno_collection = $seq->annotation;
> my @references = $anno_collection->get_Annotations('reference');
> 
> To get the comment field for a specific reference you can do:
> 
> $references[0]->comment;
> 
> See the Feature-Annotation HOWTO for more information on Annotations,
> the
> Reference object is a kind of Annotation object.
> 
> Brian O.
> 
> 
> On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:
> 
> > Yeah.  Do you have any experience with that?
> > 
> > Mark
> > 
> > --- Brian Osborne <osborne1 at optonline.net> wrote:
> > 
> >> Mark,
> >> 
> >> So you're trying to get the information in the RC line from a
> >> Swissprot
> >> format file?
> >> 
> >> Brian O.
> > 
> > 
> > ---   ---   ---   ---   ---   ---   ---   ---
> > 
> > Mark A. Miller
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com 
> 
> 
> 


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller



From cjfields at uiuc.edu  Sat May  6 23:33:40 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sat, 6 May 2006 22:33:40 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>

The -verbose flag was my suggestion; it should output a ton of debugging info 
from SearchIO::blast; if you see anything there, then it means that it's at least 
attempting to parse the report.  

Of course I can't test this myself at the moment since my wife's computer died 
(along with the bioperl setup); I'm using a loaner computer at the moment.

Chris



From chen_li3 at yahoo.com  Sun May  7 03:34:55 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 00:34:55 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>

Hi all,

I use Bio::Tools::Run::Primer3 to design PCR primers.
I want to change some default values, for example, to
increase the PCR product size to 490-510 bp instead of
using the default value of 100-300 bp. What should I
do?


Thanks,

Li



From jason.stajich at duke.edu  Sun May  7 16:49:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 16:49:29 -0400
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
Message-ID: <F69897D1-C65F-47F3-8324-EC2E52A2ACCD@duke.edu>

The problem is in how SearchIO was being initialized, the code  
basically looked like this:

  my $x = new Foo() or die if not defined my $x;

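which is invalid for two reasons.
  1) if not defined my $x;
  The second "my $x" declares a brand-new, undefined variable, so
  "defined my $x" will ALWAYS be false and the test says nothing about
  the object you just created.

  2) my $x = new Foo() or die ;
  Will cast the new object as a boolean.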

Whenever things aren't working, take a look at the code and try to
walk through any shortcuts.  For clarity make it a two-step process:
my $x = new Foo();
die 'no valid $x' unless defined $x;

Please note that currently BioPerl WILL die (via throw) if you try  
and ask for an invalid file when you initialize a new IO object  --  
this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm)  
which all the IO objects use, so you don't really need to do a test  
on the object after all.
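
Since the constructor throws rather than returning undef, one way to
handle a bad file is to trap the exception.  This is only a sketch:
Demo::IO is a hypothetical stand-in for an IO class (it is not part of
BioPerl) that dies when the file is missing, and open_report is an
invented helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an IO class whose constructor throws
# when the requested file does not exist.
{
    package Demo::IO;
    sub new {
        my ($class, %args) = @_;
        my $file = $args{-file};
        die "Could not open $file: file not found\n" unless -e $file;
        return bless { file => $file }, $class;
    }
}

# Step 1: construct inside eval so a thrown exception is trapped.
# Step 2: test the variable in a separate statement.
sub open_report {
    my ($file) = @_;
    my $io = eval { Demo::IO->new(-file => $file) };
    return (undef, $@) unless defined $io;
    return ($io, '');
}

my ($io, $err) = open_report('no_such_blast_report.txt');
print defined $io ? "opened\n" : "failed: $err";
```

The declaration and the test stay in separate statements, so there is
only ever one $io in play.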

--jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sun May  7 17:01:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 17:01:29 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
Message-ID: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>

I put up some info on the wiki (and I encourage other people to do  
the same!)
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3

Set the command line parameters by just calling a function of the  
name of the parameter.  To get a list of the available options, this  
perl code will report it to you:

# what are the arguments, and what do they mean?
   my $args = $primer3->arguments;

   print "ARGUMENT\tMEANING\n";
   foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"}

The info for PRODUCT_SIZE_RANGE is:
   (size range list, default 100-300) space separated list of product  
sizes eg <a>-<b> <x>-<y>

I believe you can set the PCR product size with
   $primer3->primer_product_size_range("490-510");

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sun May  7 21:18:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 18:18:17 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>
Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>

Hi Jason,

I added the line
$primer3->primer_product_size_range("490-510");
to my script, but it has no effect, and primer3 doesn't
complain about it either.

Li



From hubert.prielinger at gmx.at  Sun May  7 21:41:14 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 07 May 2006 19:41:14 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au>
Message-ID: <445EA1BA.9050301@gmx.at>

hi,
I have corrected that, and now I finally got a few error messages:

blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
Madden, Alejandro A. Schäffer,
blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
David J. Lipman
blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
generation of
blast.pm: unrecognized line protein database search programs", Nucleic 
Acids Res. 25:3389-3402.
blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1

After that line it hangs without terminating.




From cjfields at uiuc.edu  Sun May  7 22:04:13 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 7 May 2006 21:04:13 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>

These are debugging lines (not errors); you still have the -verbose flag set.  

Did you follow Jason's advice?  I believe he's right on the money about the issue 
at hand...

Chris



From jason.stajich at duke.edu  Sun May  7 22:47:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 22:47:00 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu>

I'm not really familiar with the module beyond what the documentation
says, so did you try using the add_targets method to add arguments
instead?  I had thought the AUTOLOAD method took care of access to the
cmd line arguments, as it does for the other Run modules, but I am not
really sure.  Perhaps folks on the list who use this module can provide
better advice.

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 10:49:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 10:49:22 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <C084D2B2.85D7%osborne1@optonline.net>

Li,

Read the documentation, Bio::Tools::Run::Primer3. It shows examples of the
correct syntax. Also look at bioperl-run/t/Primer3.t.

Brian O.




From roy at colibase.bham.ac.uk  Mon May  8 07:12:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 08 May 2006 12:12:49 +0100
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <445F27B1.40501@colibase.bham.ac.uk>

Hi Li,

I think the syntax you need is:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE.

Incidentally, such a restricted product size range may mean that Primer3 
is unable to design any suitable primers. If I recall correctly, this 
doesn't cause an error, you just get a Bio::Tools::Primer3 object with 
no primers in it. I have had some success with testing for this, and if 
necessary relaxing some constraints on primer design and re-running 
Primer3.
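
The test-and-relax approach could be sketched roughly as below.  The
design step is passed in as a code reference; design_with_fallback and
$fake_design are invented names, and the fake design step is only a
stand-in so that the sketch runs without Primer3 installed.

```perl
use strict;
use warnings;

# Try each set of constraints in turn, strictest first, and stop at the
# first one for which the design step reports at least one primer pair.
# $design takes a hash ref of Primer3 parameters and returns the number
# of primer pairs found.
sub design_with_fallback {
    my ($design, @attempts) = @_;
    for my $params (@attempts) {
        my $found = $design->($params);
        return ($params, $found) if $found;
    }
    return;    # nothing worked, even with the loosest settings
}

# Stand-in for the real design step: it pretends that primers can only
# be found when the product size window is at least 40 bp wide.
my $fake_design = sub {
    my ($params) = @_;
    my ($min, $max) = split /-/, $params->{PRIMER_PRODUCT_SIZE_RANGE};
    return $max - $min >= 40 ? 1 : 0;
};

my ($used, $count) = design_with_fallback(
    $fake_design,
    { PRIMER_PRODUCT_SIZE_RANGE => '490-510' },
    { PRIMER_PRODUCT_SIZE_RANGE => '480-520' },
    { PRIMER_PRODUCT_SIZE_RANGE => '450-550' },
);

print "settled on $used->{PRIMER_PRODUCT_SIZE_RANGE}\n" if $used;
```

In a real script the callback would pass the parameters to add_targets,
run Primer3, and count the primers in the returned Bio::Tools::Primer3
object.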

Hope this helps.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk



From chen_li3 at yahoo.com  Mon May  8 09:21:54 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 06:21:54 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk>
Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com>

I think Dr. Chaudhuri is correct. 

I added the following lines to my script (actually
copied from the documentation):

$primer3->add_targets(
PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

$primer3->add_targets('PRIMER_MIN_TM'=>60,
'PRIMER_MAX_TM'=>64);

to design primers with a product size of 490-510
bp and a primer annealing Tm of 60 to 64 C.

Here is part of the output in the file called
temp.out:

.......... original sequence.....
GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT
.................

PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_PAIR_PENALTY=0.1544
PRIMER_LEFT_PENALTY=0.081468
PRIMER_RIGHT_PENALTY=0.072951
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
...............................
PRIMER_PRODUCT_SIZE=501

..............

This is what I want. If you don't set special
parameters such as the annealing Tm, the program will use the
default ones. If you set your own parameters, they will
show up after the sequence (see this output example).

If one needs to set more parameters and wants to know
what parameters are available, just browse the BEGIN
section of the module's code.
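
For anyone reading this in the archive: the temp.out excerpt above is
plain KEY=VALUE text, so individual tags can be picked out with a few
lines of Perl. This is only a sketch, assuming one KEY=VALUE pair per
line as in the excerpt; parse_primer3_output and @wanted are names made
up for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: read KEY=VALUE lines into a hash,
# ignoring anything that does not look like a tag line.
sub parse_primer3_output {
    my ($text) = @_;
    my %tag;
    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;                         # tolerate Windows line ends
        next unless $line =~ /^([A-Z0-9_]+)=(.*)$/;
        $tag{$1} = $2;
    }
    return \%tag;
}

# Excerpt of the output quoted in this message
my $text = <<'END';
PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
PRIMER_PRODUCT_SIZE=501
END

my $tag    = parse_primer3_output($text);
my @wanted = qw(PRIMER_LEFT_SEQUENCE PRIMER_PRODUCT_SIZE);

# Print only the tags of interest, skipping any that are absent
print "$_\t$tag->{$_}\n" for grep { exists $tag->{$_} } @wanted;
```

Filtering on a list of wanted keys also sidesteps the echoed input
sequence, since it is simply never printed.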

Now I have another question: the program always prints
out the original sequence at the beginning. Is it
possible not to do that?

Thanks to all for joining this topic,

Li 


--- Roy Chaudhuri <roy at colibase.bham.ac.uk> wrote:

> Hi Li,
> 
> I think the syntax you need is:
> 
>
$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
> 
> I guess you may also need to change the parameter
> PRIMER_PRODUCT_OPT_SIZE.
> 
> Incidentally, such a restricted product size range
> may mean that Primer3 
> is unable to design any suitable primers. If I
> recall correctly, this 
> doesn't cause an error, you just get a
> Bio::Tools::Primer3 object with 
> no primers in it. I have had some success with
> testing for this, and if 
> necessary relaxing some constraints on primer design
> and re-running 
> Primer3.
> 
> Hope this helps.
> Roy.
> 
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
> 
> http://xbase.bham.ac.uk
> 
> > Hi Jason,
> > 
> > I add the line code   
> > $primer3->primer_product_size_range("490-510");
> >  to my script. But it doesn't work nor primer3
> > complains it.
> > 
> > Li
> > 
> > --- Jason Stajich <jason.stajich at duke.edu> wrote:
> > 
> >> > I put up some info on the wiki (and I encourage
> >> > other people to do  
> >> > the same!)
> >> >
> >
>
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3
> >> > 
> >> > Set the command line parameters by just calling
> a
> >> > function of the  
> >> > name of the parameter.  To get a list of the
> >> > available options, this  
> >> > perl code will report it to you:
> >> > 
> >> > # what are the arguments, and what do they
> mean?
> >> >    my $args = $primer3->arguments;
> >> > 
> >> >    print "ARGUMENT\tMEANING\n";
> >> >    foreach my $key (keys %{$args}) {print
> "$key\t",
> >> > $$args{$key}, "\n"}
> >> > 
> >> > The info for PRODUCT_SIZE_RANGE is:
> >> >    (size range list, default 100-300) space
> >> > separated list of product  
> >> > sizes eg <a>-<b> <x>-<y>
> >> > 
> >> > I believe you can set the PCR product size with
> >> >   
> $primer3->primer_product_size_range("490-510");
> >> > 
> >> > -jason
> >> > On May 7, 2006, at 3:34 AM, chen li wrote:
> >> > 
> >>> > > Hi all,
> >>> > >
> >>> > > I use Bio::Tools::Run::Primer3 to design PCR
> >> > primers.
> >>> > > I want to change some default values, for
> example,
> >> > to
> >>> > > increase the PCR product size to 490-510 bp
> >> > instead of
> >>> > > using the default value of 100-300 bp. What
> should
> >> > I
> >>> > > do ?
> >>> > >
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Li
> >>> > >
> >>> > >
> __________________________________________________
> >>> > > Do You Yahoo!?
> >>> > > Tired of spam?  Yahoo! Mail has the best
> spam
> >> > protection around
> >>> > > http://mail.yahoo.com
> >>> > >
> _______________________________________________
> >>> > > Bioperl-l mailing list
> >>> > > Bioperl-l at lists.open-bio.org
> >>> > >
> >> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> > 
> >> > --
> >> > Jason Stajich
> >> > Duke University
> >> > http://www.duke.edu/~jes12
> >> > 
> >> > 
> >> > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection around 
> > http://mail.yahoo.com 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From hubert.prielinger at gmx.at  Mon May  8 15:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,
I have solved the problem. I'm parsing BLAST 2.2.13 output and had
installed an early BioPerl 1.5.1, in which bug 1934 wasn't fixed yet.
I had to replace the blast.pm file, and now it works properly.

thank you very much
Hubert

Christopher Fields wrote:
> These are debugging lines (not errors); you still have the -verbose flag set.  
>
> Did you follow Jason's advice?  I believe he's right on the money about the issue 
> at hand...
>
> Chris
>
> ---- Original message ----
>   
>> Date: Sun, 07 May 2006 19:41:14 -0600
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>> Subject: Re: [Bioperl-l] [BULK]  can't parse blast file anymore  
>> To: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>, bioperl-
>>     
> l at bioperl.org, Chris Fields <cjfields at uiuc.edu>, Jason Stajich 
> <jason.stajich at duke.edu>
>   
>> hi,
>> I have corrected that and now I finally I got a few error messages:
>>
>> blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
>> Madden, Alejandro A. Schäffer,
>> blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
>> David J. Lipman
>> blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
>> generation of
>> blast.pm: unrecognized line protein database search programs", Nucleic 
>> Acids Res. 25:3389-3402.
>> blast.pm: unrecognized line RID: 
>>     
> 1137529800-24476-151611170370.BLASTQ1
>   
>> after that line it stops without terminating....
>>
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>       
>>>> ok, thanks
>>>> I have submitted the bug
>>>> bug #1994
>>>>         
>>> This is a line from the script you sent to Bugzilla:
>>>
>>> my $search = new Bio::SearchIO (
>>> -verbose => 1,-format => 'blast', -file => $file)
>>> or die "could not open blast report" if not defined my $search;
>>>
>>> Although syntactically correct, I don't think it is doing what you want.
>>> Please change it to this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or 
>>> die "could not open blast report";
>>>
>>> or alternatively, this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>>> if (not defined $search) {
>>>   die "could not open blast report";
>>> }
>>>
>>> and let us know what happens.
>>>
>>> all the example output you have supplied still suggests that 
>>> Bio::SearchIO can not load or parse your blast report.
>>>
>>>       
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From s.johri at imperial.ac.uk  Mon May  8 11:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,
 
I'm trying to use codeml from PAML to estimate Ka and Ks values from
sequences within a multi-FASTA file.
I'm using the code which has been posted on the BioPerl wiki.
 
However, when I run the code, I get the errors shown below.
 
I did a Google search to see if anyone had come across similar
problems. In those cases the problem seems to have been due to the
sequences not being a multiple of 3 in length.
In my code I check whether each sequence is a multiple of 3 and, if not, I
alter the sequence until this is the case, but I still get the
same error messages.
 
Any suggestions as to why this could be happening?
 
Thanks!!!
 
Saurabh Johri
Tuberculosis Research Group
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------
 
------------- EXCEPTION Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output
STACK Bio::Tools::Phylo::PAML::_parse_summary
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
STACK Bio::Tools::Phylo::PAML::next_result
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
------------------------------------
 
>Rv3923c
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_cdc1551
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>Rv3923c_mtb_f11
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_c1
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
 
------------------------------------


From chen_li3 at yahoo.com  Mon May  8 20:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers
for one sequence:

#!/cygdrive/c/Perl/bin/perl.exe

use warnings;
use strict;

use Bio::Tools::Run::Primer3;
use Bio::SeqIO;

my $file_in  = 'piwil2.fa';
my $file_out = 'temp.out';
my $seqio    = Bio::SeqIO->new(-file => $file_in);

# read the first sequence from the input file
my $seq     = $seqio->next_seq;
my $primer3 = Bio::Tools::Run::Primer3->new(
    -seq     => $seq,
    -outfile => $file_out,
    -path    => "c:/Perl/local/primer3_1.0.0/src/primer3.exe",
);

unless ($primer3->executable) {
    print "primer3 can not be found. Is it installed?\n";
    exit(-1);
}

# set your own parameters for the primers or product
$primer3->add_targets(
    'PRIMER_OPT_GC_PERCENT' => '50',
    'PRIMER_OPT_SIZE'       => '24',
    'PRIMER_OPT_TM'         => '60',
);

my $result = $primer3->run;

exit;

I tried to modify it for multiple sequences by using a
while loop as follows:

while (my $seq = $seqio->next_seq) {
    my $primer3 = Bio::Tools::Run::Primer3->new();
    # ... design the primers ...
}

I get primers only for the last sequence. It seems the
earlier ones are overwritten.

Any ideas will be highly appreciated.

Li




From jason.stajich at duke.edu  Mon May  8 20:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for a difference in length, so  
PAML isn't going to produce any interesting values, but maybe  
you are just providing an example?
b) I think you are missing the trailing gaps in the alignment of the  
Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned  
sequences as input.
c) The sequences, in the reading frame you have provided (and using  
the standard translation table), have stop codons in them; this will  
cause failure as well.

Which code from the wiki are you running, the 'running PAML' part of  
the HOWTO?

Try looking at the actual output from PAML to figure out what is wrong.
Add this when initializing the Run object:
-save_tempfiles => 1,
-verbose => 1,

then open up the tempdir that is reported and look at the output  
files (mlc file).
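
Points (a) to (c) can also be checked before codeml is ever started. A
minimal sketch in plain Perl, assuming the sequences are already held in
a hash of id => sequence and that the standard genetic code applies;
check_codon_alignment is a made-up name and the toy sequences are not
the Rv3923c data.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check for a codon alignment.
# Returns a list of human-readable problems (empty list = nothing found).
sub check_codon_alignment {
    my (%seq) = @_;
    my @problems;
    my %len = map { $_ => length $seq{$_} } keys %seq;

    # aligned sequences must all have the same length
    my %distinct = map { $_ => 1 } values %len;
    push @problems, "sequences differ in length"
        if scalar(keys %distinct) > 1;

    for my $id (sort keys %seq) {
        # the length must be a whole number of codons
        push @problems, "$id: length $len{$id} is not a multiple of 3"
            if $len{$id} % 3;

        # report every in-frame stop codon (standard table, frame 1)
        my @codons = map { uc $_ } ($seq{$id} =~ /(.{3})/g);
        my @stops  = grep { $codons[$_] =~ /^(?:TAA|TAG|TGA)$/ } 0 .. $#codons;
        push @problems,
            "$id: in-frame stop at codon " . join(',', map { $_ + 1 } @stops)
            if @stops;
    }
    return @problems;
}

my %toy = (
    seqA => 'ATGAAATAGGGGCCC',    # TAG is the third codon
    seqB => 'ATGAAACAGGGG',       # three bases shorter, no gaps added
);
print "$_\n" for check_codon_alignment(%toy);
```

An empty result does not prove that codeml will be happy, but any line
it prints is worth fixing first.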

-jason

On May 8, 2006, at 11:38 AM, Johri, Saurabh wrote:

> Hi all,
>
> I'm trying to use codeml from PAML to estimate Ka, Ks values from
> sequences within a multi fasta file:
> i'm using the code which has been posted on the bioperl wiki...
>
> However, when I run the code, i get the following errors:
>
> I did a google search to see if anyone had come across similar
> problems.... in which case the problem seems to have been due to the
> sequences not being a multiple of 3,
> In my code I check if the sequence is a multiple of 3 and if  not, i
> alter the sequences until this is the case, although I still have the
> same error messages,
>
> Any suggestions as to why this could be happening?
>
> Thanks!!!
>
> Saurabh Johri
> Tuberculosis Research Group
> Centre for Molecular Microbiology & Infection
> Imperial College London
> SW7 2AZ
>
>
>
>
> -------------------- WARNING ---------------------
> MSG: There was an error - see error_string for the program output
> ---------------------------------------------------
>
> ------------- EXCEPTION Bio::Root::NotImplemented -------------
> MSG: Unknown format of PAML output
> STACK Bio::Tools::Phylo::PAML::_parse_summary
> /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
> STACK Bio::Tools::Phylo::PAML::next_result
> /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
> ------------------------------------
>
>> Rv3923c
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_cdc1551
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>> Rv3923c_mtb_f11
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_c1
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_210
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mbovis
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>
> ------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 21:17:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 21:17:22 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>
Message-ID: <C08565E2.85FD%osborne1@optonline.net>

Li,

If you're analyzing multiple input sequences you're going to have to create
multiple output sequences.

Brian O.


On 5/8/06 8:21 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> I get primers only for the last sequence. It seems the
> earlier ones are overwritten.


From WiersmaP at AGR.GC.CA  Mon May  8 21:28:27 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Mon, 8 May 2006 21:28:27 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca>

Hi Li,

 
When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it
opens -outfile=>"filename" for writing and then closes.  That's why
putting it in a loop will overwrite your output file each time so you
only see the last one.  I suppose you could read in each output file
before looping to the next seq and append it to another file.

 
If you're doing a fair bit of work with this module it would be worth
looking at the Bio::Tools::Primer3 module.  The statement $result =
$primer3->run produces a Bio::Tools::Primer3 object which has all the
methods you need for customizing your output.
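
A rough sketch of the loop with one output file per sequence, so that
nothing is overwritten. It assumes BioPerl, bioperl-run and a primer3
binary are installed; outfile_for and design_all are hypothetical names,
and the product size range is just the value used earlier in this
thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a sequence id into a safe, distinct file name.
sub outfile_for {
    my ($id) = @_;
    (my $safe = $id) =~ s/[^A-Za-z0-9_.-]/_/g;
    return "$safe.primer3.out";
}

# Sketch of the loop; BioPerl is loaded only when this is actually called.
sub design_all {
    my ($fasta, $primer3_path) = @_;
    require Bio::SeqIO;
    require Bio::Tools::Run::Primer3;

    my $seqio = Bio::SeqIO->new(-file => $fasta, -format => 'fasta');
    while (my $seq = $seqio->next_seq) {
        my $primer3 = Bio::Tools::Run::Primer3->new(
            -seq     => $seq,
            -outfile => outfile_for($seq->display_id),  # one file per sequence
            -path    => $primer3_path,
        );
        $primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE => '490-510');

        my $p3 = $primer3->run;    # a Bio::Tools::Primer3 object
        printf "%s\t%d result(s)\n", $seq->display_id, $p3->number_of_results;
    }
}

# Run only when called as: perl script.pl input.fa /path/to/primer3
design_all(@ARGV) if @ARGV == 2;
```

Printing a summary per sequence inside the loop also means the combined
report can be collected by redirecting standard output.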

 
Paul

 
Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC

wiersmap at agr.gc.ca

 
From simon_sask at yahoo.com  Tue May  9 04:06:04 2006
From: simon_sask at yahoo.com (Simon K. Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is
supposed to show the raw alignments using
Bio::SearchIO.  The script is written to parse a
PSI-BLAST report.  I found an old email in the archive
from Jason stating that this should parse other
flavors of blast reports as well.  

What do I need to do to make this script parse non-PSI
blast reports?  I tried to just specify a file and
that the -format is 'blast', but I get an error
stating that the object method 'raw_hit_data' is not
defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because
I'd like to get the size of the gaps, not just the
number.

Any help will be much appreciated.
Many thanks




From cjfields at uiuc.edu  Tue May  9 08:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <fe65cab2.b5b5696a.81acb00@expms6.cites.uiuc.edu>

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO
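
Once the aligned strings are in hand (if I remember the HSP interface
correctly, query_string and hit_string return them, but please check the
HOWTO), the gap sizes are a short regex away. A sketch in plain Perl;
gap_runs is a made-up helper and the string is a toy example, not taken
from a real report.

```perl
use strict;
use warnings;

# Hypothetical helper: given one aligned string with gaps written as '-',
# return [start_column, length] for each run of gaps.
# Columns are 1-based positions in the alignment.
sub gap_runs {
    my ($aligned) = @_;
    my @runs;
    while ($aligned =~ /(-+)/g) {
        # $-[1] is the 0-based offset where the captured run starts
        push @runs, [ $-[1] + 1, length $1 ];
    }
    return @runs;
}

my $query = 'ACGT---ACG-TTGA';    # toy aligned string
for my $run (gap_runs($query)) {
    printf "gap of %d at column %d\n", $run->[1], $run->[0];
}
```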

Chris

---- Original message ----
>Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
>From: "Simon K. Chan" <simon_sask at yahoo.com>  
>Subject: [Bioperl-l] Raw Blast Alignment  
>To: bioperl-l at lists.open-bio.org
>
>Hi Fellow Bioperl-ers,
>
>bioperl-live/examples/searchio/rawwriter.pl is
>supposed to show the raw alignments using
>Bio::SearchIO.  The script is written to parse a
>PSI-BLAST report.  I found an old email in the archive
>from Jason stating that this should parse other
>flavors of blast reports as well.  
>
>What do I need to do to make this script parse non-PSI
>blast reports?  I tried to just specify a file and
>that the -format is 'blast', but I get an error
>stating that the object method 'raw_hit_data' is not
>defined in Bio::Search::Hit::BlastHit.
>
>Basically, I want to obtain the raw alignment because
>I'd like to get the size of the gaps, not just the
>number.
>
>Any help will be much appreciated.
>Many thanks
>
>
>__________________________________________________
>Do You Yahoo!?
>Tired of spam?  Yahoo! Mail has the best spam protection around 
>http://mail.yahoo.com 
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From peterm at bioinf.uni-leipzig.de  Tue May  9 08:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,
I am using the Bio::Graphics module to draw sequences and their features 
with Bio::SeqFeature::Generic.
The features I want to highlight are occurrences of transcription 
binding factors. Therefore I want to give every factor its own color, 
but I didn't see how to manage it. I can only colorize complete tracks.
Is there a known workaround?

Thanks, Peter


From Marc.Logghe at DEVGEN.com  Tue May  9 10:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
> 
> Hi all,
> I am using the Bio::Graphics module to draw sequences and 
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of 
> transcription binding factors. Therefore I want to give every 
> factor its own color, but i didn't see how to manage it. I 
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine
to the option.
-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');

    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be
get_attributes()) but you get the idea.
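
To give each factor its own color rather than a threshold, the callback
can remember which color it handed out for which name. A sketch under a
few assumptions: the factor name is stored in a tag called 'factor', the
palette is arbitrary, and MockFeature is only a stand-in so the callback
can be tried without Bio::Graphics.

```perl
use strict;
use warnings;

# Build a callback that hands out one colour per distinct factor name.
# The tag name 'factor' and the palette are assumptions for this sketch.
sub make_color_callback {
    my @palette = qw(red green blue orange purple cyan);
    my %color_for;
    return sub {
        my $feat = shift;
        my ($name) = $feat->get_tag_values('factor');
        # first time we see a name: take the next colour, wrapping around
        $color_for{$name} = $palette[ scalar(keys %color_for) % scalar(@palette) ]
            unless exists $color_for{$name};
        return $color_for{$name};
    };
}

# Tiny stand-in for a feature object, for trying the callback only.
package MockFeature;
sub new            { my ($class, %tags) = @_; return bless {%tags}, $class }
sub get_tag_values { my ($self, $tag) = @_; return $self->{$tag} }

package main;

my $cb = make_color_callback();
for my $name (qw(SP1 GATA1 SP1)) {
    my $color = $cb->(MockFeature->new(factor => $name));
    print "$name\t$color\n";
}
```

The code reference returned by make_color_callback() would then be
passed as the value of -bgcolor. With more factors than palette entries
the colors repeat.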
Cheers,
Marc


From Marc.Logghe at DEVGEN.com  Tue May  9 10:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,
Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics

The examples show the principle I mentioned in my previous post (e.g.
Example 4), but for the -label or -description options.
As said, you can apply this to (most of?) the other
options as well.
Regards,
ML 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Marc Logghe
> Sent: Tuesday, May 09, 2006 4:13 PM
> To: Peter Menzel; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] colorize features
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Peter 
> > Menzel
> > Sent: Tuesday, May 09, 2006 2:44 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] colorize features
> > 
> > Hi all,
> > I am using the Bio::Graphics module to draw sequences and their 
> > features with Bio::SeqFeature::Generic.
> > The features I want to highlight are occurrences of transcription 
> > binding factors. Therefore I want to give every factor its 
> own color, 
> > but i didn't see how to manage it. I only can colorize complete 
> > tracks.
> > Is there a known workaround?
> 
> Yes, instead of giving a hardcoded color value you can pass a 
> subroutine to the option.
> -bgcolor => sub {
>     my $feat = shift;
>     # get your attribute on which you want to base your color
>     my ($attr) = $feat->get_tag_values('my_attribute');
> 
>     return $attr > 10 ? 'red' : 'green'
> }
> 
> Not sure about the method calls I am making here (could as well be
> get_attributes()) but you get the idea.
> Cheers,
> Marc
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From WiersmaP at AGR.GC.CA  Tue May  9 11:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted.  In the Bio::Tools::Primer3 module the author uses "$p3" for the object.  If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output the results before looping to the next sequence.

>From Bio::Tools::Primer3.pm:

 # how many results were there?
 my $num=$p3->number_of_results;
 print "There were $num results\n";

 # get all the results
 my $all_results=$p3->all_results;
 print "ALL the results\n";
 foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

 # get specific results
 my $result1=$p3->primer_results(1);
 print "The first primer is\n";
 foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

I read both documents. What I understand is that
Bio::Tools::Run::Primer3 is for designing primers and
Bio::Tools::Primer3 is for parsing the results. When I
read the documents I did not see the line
 $result = $primer3->run in Bio::Tools::Primer3. I
wonder how you got this information.

Thanks,

Li 

--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
>  
> 
> When you execute $primer3->run with a
> Bio::Tools::Run::Primer3 object it
> opens -outfile=>"filename" for writing and then
> closes.  That's why
> putting it in a loop will overwrite your output file
> each time so you
> only see the last one.  I suppose you could read in
> each output file
> before looping to the next seq and append it to
> another file.
> 
>  
> 
> If you're doing a fair bit of work with this module
> it would be worth
> looking at the Bio::Tools::Primer3 module.  The
> statement $result =
> $primer3->run produces a Bio::Tools::Primer3 object
> which has all the
> methods you need for customizing your output.
> 
>  
> 
> Paul
> 
>  
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> 
> wiersmap at agr.gc.ca
> 
>  
> 
>  
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From chen_li3 at yahoo.com  Tue May  9 13:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul, it REALLY works.

I have other questions:
1) When I run the script I use this line at the
command prompt:

perl primer.pl >test

When I check the default output file (temp.out) used by
the script, I only see the information about the last
sequence, which is different from what is in the test
file. In the test file I get the information for
all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly
to print out selected information, such as the primer
sequence and the size of the PCR product? Or do I have to
parse the file myself?

After I get all this information I would like to post
the script for batch-designing PCR primers.


Thanks,

Li 


--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
> The line "my $result = $primer3->run" is already in
> the code you submitted.  In the Bio::Tools::Primer3
> module the author uses "$p3" for the object.  If you
> change your line to "my $p3 = $primer3->run" you
> should be able to run the examples below. Process
> the results for each sequence and output the results
> before looping to the next sequence.
> 
> >From Bio::Tools::Primer3.pm:
> 
>  # how many results were there?
>  my $num=$p3->number_of_results;
>  print "There were $num results\n";
> 
>  # get all the results
>  my $all_results=$p3->all_results;
>  print "ALL the results\n";
>  foreach my $key (keys %{$all_results}) {print
> "$key\t${$all_results}{$key}\n"}
> 
>  # get specific results
>  my $result1=$p3->primer_results(1);
>  print "The first primer is\n";
>  foreach my $key (keys %{$result1}) {print
> "$key\t${$result1}{$key}\n"}
> 
> Paul
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> ?
> 
> 
> 
> -----Original Message-----
> From: chen li [mailto:chen_li3 at yahoo.com] 
> Sent: Monday, May 08, 2006 8:32 PM
> To: Wiersma, Paul
> Subject: Re: [Bioperl-l] use primer3 to design
> primers with multiple sequences
> 
> Hi Paul,
> 
> I read both documents. What I understand is that
> Bio::Tools::Run::Primer3 is for designing primers and
> Bio::Tools::Primer3 is for parsing the results. When
> I read the documents I do not see this line
>  $result = $primer3->run in Bio::Tools::Primer3. I
> wonder how you got this information.
> 
> Thanks,
> 
> Li 
> 
> --- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:
> 
> > Hi Li,
> > 
> >  
> > 
> > When you execute $primer3->run with a Bio::Tools::Run::Primer3
> > object it opens -outfile=>"filename" for writing and then closes.
> > That's why putting it in a loop will overwrite your output file
> > each time so you only see the last one.  I suppose you could read
> > in each output file before looping to the next seq and append it
> > to another file.
> > 
> > If you're doing a fair bit of work with this module it would be
> > worth looking at the Bio::Tools::Primer3 module.  The statement
> > $result = $primer3->run produces a Bio::Tools::Primer3 object
> > which has all the methods you need for customizing your output.
> > 
> >  
> > 
> > Paul
> > 
> >  
> > 
> > Paul A. Wiersma
> > Agriculture and Agri-Food Canada/Agriculture et
> > Agroalimentaire Canada
> > Summerland, BC
> > 
> > wiersmap at agr.gc.ca
> > 
> >  
> > 
> >  
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> 


From WiersmaP at AGR.GC.CA  Tue May  9 13:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of Primer3.pm modules.  Hopefully you can see how I've picked out parts of the results for printing.  You can modify it as you need to output only some results.

>>>>>>>>
  # design the primers. This runs primer3 and returns a 
  # Bio::Tools::Primer3 object with the results
my $results=$primer3->run;

  # see the Bio::Tools::Primer3 pod for
  # things that you can get from this. For example:

print "There were ", $results->number_of_results+1, " primers\n";

my @out_keys_part = qw(	START
			LENGTH
			TM
			GC_PERCENT
			SELF_ANY
			SELF_END
			SEQUENCE );

for (my $i=0;$i <= $results->number_of_results;$i++){
	
	# get specific results
	my $result1=$results->primer_results($i);
 
	print "\n",$i+1;	
	for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)){
			
			# PRIMER_LEFT and PRIMER_RIGHT hold "start,length"
			my ($start, $length) = split /,/, ${$result1}{$key};
			${$result1}{$key."_START"} = $start;
			${$result1}{$key."_LENGTH"} = $length;
			foreach my $partkey (@out_keys_part) {
				print "\t", ${$result1}{$key."_".$partkey};
			} 
			print "\n";
	}
	print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'}, ", PAIR ANY COMPL: ",
				${$result1}{'PRIMER_PAIR_COMPL_ANY'};
	print ", PAIR 3\' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>
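
For Li's second question (printing only the primer sequences and the product size), a minimal sketch along the same lines might look like this. It runs without primer3: the $pair hash is a made-up stand-in for what primer_results($i) returns, and summarize_pair is a hypothetical helper name, not part of any BioPerl module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by
# $results->primer_results($i); all values here are made up.
my $pair = {
    PRIMER_LEFT           => '60,20',
    PRIMER_RIGHT          => '259,20',
    PRIMER_LEFT_SEQUENCE  => 'ACGTACGTACGTACGTACGT',
    PRIMER_RIGHT_SEQUENCE => 'TGCATGCATGCATGCATGCA',
    PRIMER_PRODUCT_SIZE   => 200,
};

# Build one tab-separated line holding only the fields of interest.
sub summarize_pair {
    my ($result) = @_;
    # PRIMER_LEFT and PRIMER_RIGHT are "start,length" strings
    my ($left_start)  = split /,/, $result->{PRIMER_LEFT};
    my ($right_start) = split /,/, $result->{PRIMER_RIGHT};
    return join("\t",
        $result->{PRIMER_LEFT_SEQUENCE},
        $result->{PRIMER_RIGHT_SEQUENCE},
        $left_start,
        $right_start,
        $result->{PRIMER_PRODUCT_SIZE});
}

print summarize_pair($pair), "\n";
```

In a real script the same helper could be called on each primer_results($i) inside the loop, printing one line per primer pair.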

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC
V0H 1Z0
wiersmap at agr.gc.ca
 



From cjfields at uiuc.edu  Tue May  9 17:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those
problematic bacterial tax names again).  I have a simple script that runs
output to STDOUT to generate a list of hits.  Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
         hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
         hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
         hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
         hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
         hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216  <--
         hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
         hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

...

Most (but not all) of the strain numbers get repeated (marked with arrows).
This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
(and thus passed through Bio::SeqIO).  Anyone seen this before?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Tue May  9 19:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again).  I have a simple script that runs
> output to STDOUT to generate a list of hits.  Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis
> K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, 
because the 'subsp.' is gone?

Here are the pertinent parts of the GenBank file:

LOCUS       NC_002944            4829781 bp    DNA     circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
   ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
             Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
             Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium
             avium complex (MAC).

                      /organism="Mycobacterium avium subsp. paratuberculosis K-10"
                      /strain="K-10"
                      /sub_species="paratuberculosis"
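
Until the parser is sorted out, a script that builds its own label from the /organism and /strain qualifiers could guard against printing the strain twice. This is only a sketch under that assumption: organism_label is a hypothetical helper, it does a naive string check, and it is not how Bio::SeqIO::genbank builds its names.

```perl
use strict;
use warnings;

# Append the strain to an organism name only when the name does not
# already end with it. A naive string check, not BioPerl's parser.
sub organism_label {
    my ($organism, $strain) = @_;
    return $organism unless defined $strain && length $strain;
    return $organism if $organism =~ /(?:^|\s)\Q$strain\E$/;
    return "$organism $strain";
}

# /organism already carries the strain, so nothing is appended
print organism_label('Mycobacterium avium subsp. paratuberculosis K-10', 'K-10'), "\n";
# here the strain is missing from the name, so it is appended once
print organism_label('Corynebacterium jeikeium', 'K411'), "\n";
```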


> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO).  Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal mainly with bacteria, and should look into this too. I 
haven't been using the GenBank headers directly, only the features, so I 
never came across this.

Another thing which may crop up is when no species has been allocated 
yet but the genus is known (or something like that). In that case the 
name is written as "Genus spp.", e.g. Gallibacterium spp.

--Torsten


From chen_li3 at yahoo.com  Tue May  9 21:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much.

Just as you pointed out in your latest email, I have now
figured out that the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first pair of primers.  1) Since it
is a hash, I should be able to get the value
for a given key by telling Perl which key
to look up. 2) Since it is also a reference,
I should dereference it to get the actual value.

I don't know much about OO Perl, and your code looks
a little bit complicated to me. But I got the job done
by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


#get the value for the key in the hash reference
	   my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
	   print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";


There is one point I don't understand:

When I add these two lines into my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size.  The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm =  ${$result1}{'PRIMER_RIGHT_TM'};
> 
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 




From zhouyubio at gmail.com  Tue May  9 21:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID: <loom.20060510T032601-573@post.gmane.org>

Qunfeng <qfdong <at> iastate.edu> writes:

> 
> Hi there,
> 
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
> 
> I am not very familiar with BioPerl. I tried to follow the example showing 
> in the above page to retrieve pubmed ID under each Reference tag , i.e., 
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The 
> authors() works for me.  Appreciate any suggestions.
> 
> Qunfeng 
> 


Hi, 

I have the same problem as you. Here is what I have done: use a regular
expression to match the value of the 'location' tag, if there is one.

#------------------
my $ann = $seqobj->annotation(); # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title,"\n";
    print "Location: ", $ref->location, "\n";
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
        my $pmid = $1;
        print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------


From osborne1 at optonline.net  Tue May  9 23:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To: <loom.20060510T032601-573@post.gmane.org>
Message-ID: <C086CFDD.865A%osborne1@optonline.net>

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id found in the
56961711 entry using the pubmed() method. Note that there are 4 references,
only one of which has a Pubmed id. Also, the authors() method prints out the
authors, not the Pubmed id. If you have a problem please show your code and
tell us which version of Bioperl you're using.

Brian O.


use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
  print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}




From sb at mrc-dunn.cam.ac.uk  Wed May 10 05:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,
I'm a little confused as to how names are supposed to work in 
Bio::Taxonomy::Node.

In the bioperl versions that I've looked at, a Node doesn't seem to store 
the most important information about itself - its scientific name - in 
an obvious place. bioperl 1.5.1 puts it at the start of the 
classification list. I'd have thought sticking it in -name would make 
more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
   -object_id => 9606, # or -ncbi_taxid. Required tag
   -names => {
       'scientific' => ['sapiens'],
       'common_name' => ['human']
   },
   -rank => 'species'  # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 
'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics 
Bio::Taxonomy::Node->new takes the -name value and makes a common name 
out of it, whilst the name() method passes any 'scientific' name to the 
scientific_name() method which is unable to set any value (and warns 
about this), only get.

It seems like the need to have this classification array work the same 
way as Bio::Species is causing some unnecessary restrictions. Can't the 
more sensible idea of having a dedicated storage spot for the 
ScientificName and other parameters be used, with the classification 
array either being generated just-in-time from the hash-stored data, or 
indeed being generated from the Lineage field?


Also, why does a node store the complete hierarchy on itself in the 
classification array? If we're going that far, why don't the 
Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a 
get_taxonomy() method instead of a get_Taxonomy_Node() method. 
get_taxonomy() could, from a single efetch.fcgi lookup, create a 
complete Bio::Taxonomy with all the nodes. Whilst most nodes would only 
have a minimum of information, if you could simply ask a node what its 
rank and scientific name was you could easily build a classification 
array, or ask what Kingdom your species was in etc.

Are there good reasons for Taxonomy working the way it does in 1.5.1, or 
would I not be wasting my time re-writing things to make more sense (to me)?
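
To make the just-in-time idea concrete, here is a rough sketch in plain Perl. It does not use the real Bio::Taxonomy API: the %nodes layout, the field names and both helper functions are assumptions made up for illustration. Each node stores only its own rank, scientific name and parent id.

```perl
use strict;
use warnings;

# Hypothetical node records: each one stores its own rank and
# scientific name plus a parent id, not a full classification.
my %nodes = (
    9606 => { rank => 'species', scientific_name => 'sapiens',   parent => 9605 },
    9605 => { rank => 'genus',   scientific_name => 'Homo',      parent => 9604 },
    9604 => { rank => 'family',  scientific_name => 'Hominidae', parent => undef },
);

# Walk up the parent links to build the classification on demand,
# most specific name first.
sub classification {
    my ($nodes, $id) = @_;
    my @names;
    while (defined $id && exists $nodes->{$id}) {
        push @names, $nodes->{$id}{scientific_name};
        $id = $nodes->{$id}{parent};
    }
    return @names;
}

# Find the scientific name at a given rank in a node's lineage.
sub name_at_rank {
    my ($nodes, $id, $rank) = @_;
    while (defined $id && exists $nodes->{$id}) {
        return $nodes->{$id}{scientific_name}
            if $nodes->{$id}{rank} eq $rank;
        $id = $nodes->{$id}{parent};
    }
    return;
}

print join(', ', classification(\%nodes, 9606)), "\n";
print name_at_rank(\%nodes, 9606, 'family'), "\n";
```

With storage like this the classification array never has to be kept on the node itself, and a question such as "which family is this species in" becomes a short walk up the parents.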


Cheers,
Sendu.


From osborne1 at optonline.net  Wed May 10 10:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID: <C08771EE.866A%osborne1@optonline.net>

Paul,

I took your code, added some "run" code and made it into a script and added
this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.


On 5/9/06 1:59 PM, "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> $results->number_of_results


From stoltzfu at umbi.umd.edu  Tue May  9 16:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative
	object
Message-ID: <D8EE6983-2123-45B3-967C-0E4982428CFA@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to  
facilitate comparative analysis
using evolutionary methods by 1) managing evolutionary relationships  
(by linking data to trees)
and 2) allowing coordinated analysis of different types of data (by  
implementing a generic concept
of "character-state" data).

Bio::CDAT would take advantage of existing BioPerl objects and would  
include the functionality
of Rutger Vos's Bio::Phylo.  It would provide the framework to  
develop interfaces to analysis tools
(phylogeny inference, evolutionary rate models, functional shift  
inference, etc), as well as to file
formats and visualization methods appropriate for such analyses.

A proposal is attached.  We would like to hear your thoughts (e.g.,  
see the section on "Questions
to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060509/48aeca4b/attachment-0003.pdf>
-------------- next part --------------


From zhouyubio at gmail.com  Wed May 10 04:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To: <C086CFDD.865A%osborne1@optonline.net>
References: <loom.20060510T032601-573@post.gmane.org>
	<C086CFDD.865A%osborne1@optonline.net>
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks!

I am using Bioperl-1.4, not bioperl-live. That may be the reason why
it does not work!




--
Best Wishes!

Yu


From cjfields at uiuc.edu  Wed May 10 11:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using
$seq->species->binomial chops some of the strain designations off, so really
neither one works optimally for bacterial genus-species-strain taxonomy.
Hilmar made the suggestion that it's probably best to grab the NCBI TaxID
and parse it out that way by looking it up in the taxonomy database (using
Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank
does.  

I wonder if we should be trying to shove most of this stuff into species
objects directly from the beginning; in other words, maybe we should try to
get the information in Bio::Annotation objects and then, after the
parsing/IO is finished, have a method to get the information into
Bio::Species objects when wanted/needed; a check could be added against the
NCBI Taxonomy database there.  

Anyway, I really haven't looked at how they are parsed out and don't have
the time at the moment.  I may look into this as well, but not until I get
back from a conference (end of May).  Jason and Brian have been calling for a
refactoring of Bio::SeqIO::genbank for a while; maybe it's getting time to
do something about it...

Chris 



From cuiw at mail.nih.gov  Wed May 10 12:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F4999@nihcesmlbx10.nih.gov>


'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output
hash.

You can find all legal keys by "print keys %{$result1};"
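
A small sketch of that check, using a made-up hash in place of the real result (so the keys and values below are assumptions, and value_or_na is a hypothetical helper): list the keys that are present, and fall back to a placeholder for a missing one so the print does not trigger the "uninitialized value" warning.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a per-pair result hash; values are made up.
my $result1 = {
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_PRODUCT_SIZE  => 200,
};

# Return the value for a key, or a placeholder when it is missing,
# so string interpolation never sees an undefined value.
sub value_or_na {
    my ($hash, $key) = @_;
    return defined $hash->{$key} ? $hash->{$key} : 'n/a';
}

# List the keys that are actually present
print join("\n", sort keys %{$result1}), "\n";

# Look up one key that exists and one that does not
print "$_\t", value_or_na($result1, $_), "\n"
    for qw(PRIMER_LEFT_SEQUENCE PRIMER_SEQUENCE_ID);
```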


There is one point I don't understand:

When I add these two lines into my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

From WiersmaP at AGR.GC.CA  Wed May 10 12:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results; by default they only end up in $results->primer_results(0).  If you try to access them using $results->primer_results(1) (or anything but 0) the keys are missing, which is why Perl gives you the "uninitialized value" message.
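
As a sketch of how a script might cope with that, the sequence-level ID can be read once from entry 0 and reused for every pair. The @pairs list below is a made-up stand-in for primer_results(0), primer_results(1) and so on, and label_pairs is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for primer_results(0), primer_results(1), ...;
# only entry 0 carries PRIMER_SEQUENCE_ID, as described above.
my @pairs = (
    { PRIMER_SEQUENCE_ID => 'seq1', PRIMER_PRODUCT_SIZE => 200 },
    { PRIMER_PRODUCT_SIZE => 215 },
);

# Read the sequence-level ID once from entry 0, then label every
# primer pair with it.
sub label_pairs {
    my (@results) = @_;
    return () unless @results;
    my $id = $results[0]{PRIMER_SEQUENCE_ID};
    $id = 'unknown' unless defined $id;
    my @lines;
    for my $i (0 .. $#results) {
        push @lines,
            join("\t", $id, $i + 1, $results[$i]{PRIMER_PRODUCT_SIZE});
    }
    return @lines;
}

print "$_\n" for label_pairs(@pairs);
```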

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much.

Just like you point out in your latest email I now
figure out the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first pair of primers.  1) Since it
is a hash I should be able to get the specific value
for its corresponding key by telling Perl which key
is the entry for the value. 2) Also, since it is a reference,
I should dereference it to get the so-called true value.

I don't know too much OO and Perl and your code looks
a little bit complicated to me. But I get the job done
by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


#get the value for the key in the hash reference

	   my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
	   print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";


There is one point I don't understand:

When I add these two lines into my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains about it
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size.  The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') which can be pulled out
> with split.
>
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
>
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 
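
The dereferencing and split described in the quoted explanation can be tried without primer3 installed. A minimal sketch, assuming a hand-built hash in place of a real result; the values and the start_and_length helper are illustrative only.

```perl
use strict;
use warnings;

# Hand-built stand-in for one primer pair; values are illustrative only.
my $result1 = {
    PRIMER_LEFT     => '60,20',
    PRIMER_RIGHT    => '559,20',
    PRIMER_RIGHT_TM => '59.8',
};

# Split a "start,length" string into its two fields.
sub start_and_length {
    my ($pair) = @_;
    my ($start, $length) = split /,/, $pair;
    return ($start, $length);
}

# $result1->{KEY} is the arrow form of ${$result1}{KEY}
my ($left_start,  $left_len)  = start_and_length($result1->{PRIMER_LEFT});
my ($right_start, $right_len) = start_and_length($result1->{PRIMER_RIGHT});

print "left:  start=$left_start length=$left_len\n";
print "right: start=$right_start length=$right_len\n";
print "right Tm: $result1->{PRIMER_RIGHT_TM}\n";
```

The arrow form is usually easier to read than the ${...}{...} form, and both address the same hash entry.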




From cuiw at mail.nih.gov  Wed May 10 14:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences:
	bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F49A0@nihcesmlbx10.nih.gov>

Hope this works!

Bio::Tools::Primer3 line 264 should be:
 
$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID by

####read primer3 output file############
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");

########  print id###############
print $p3->seqobject->id;
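
If patching the installed module is not an option, the ID can also be read straight from the primer3 output file, which is plain TAG=VALUE text with each record closed by a line containing only "=". A minimal sketch under that assumption; the sample record and parse_first_record are illustrative and not part of BioPerl.

```perl
use strict;
use warnings;

# Illustrative primer3-style record: TAG=VALUE lines, closed by a lone "=".
# The values are made up for this sketch.
my $output = <<'END';
PRIMER_SEQUENCE_ID=test1
SEQUENCE=ACGTACGTACGTACGTACGTACGT
PRIMER_LEFT=0,20
PRIMER_LEFT_SEQUENCE=ACGTACGTACGTACGTACGT
=
END

# Collect the TAG=VALUE pairs of the first record into a hash.
sub parse_first_record {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        last if $line eq '=';                      # end-of-record marker
        my ($tag, $value) = split /=/, $line, 2;   # split on the first "=" only
        $record{$tag} = $value if defined $value;
    }
    return \%record;
}

my $record = parse_first_record($output);
print "PRIMER_SEQUENCE_ID\t$record->{PRIMER_SEQUENCE_ID}\n";
```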

Wenwu Cui, PhD
NIH/NCI




From cjfields at uiuc.edu  Wed May 10 14:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there.  I'm getting ready for a conference in a couple weeks
so I may be off the radar a bit here and there.  The next ListSummary won't
be posted until May 26.  Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From chen_li3 at yahoo.com  Wed May 10 20:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post
about primer3.

But now I am a little confused, even after reading the
documents: What is the relationship between these two
modules? What is the correct/standard way to use them to
do batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (because I don't
need all the information). But there is a little
trouble: PRIMER_SEQUENCE_ID can't be accessed using
this method. And Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without providing the line

use Bio::Tools::Primer3;

in the script.  How to explain this? Is it because of the
following line?

my $result=$primer3->run;

The last question: which line of code is used to invoke
the program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 


From jason.stajich at duke.edu  Wed May 10 20:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <B1D9C06A-09FF-4342-81E4-7D38AD66F4CA@duke.edu>

Bio::Tools::Run::XXX modules are for running applications...


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Wed May 10 20:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the
standard here.  Nodes are defined by the data in the taxonomy dump
dbs from NCBI.
The eutils are pretty worthless except for taxid->name or the reverse; you
can't get the full taxonomy (or couldn't when that implementation was
written).

The "name" method refers to the name of the node - each level in the  
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a  
Bio::Species and/or being able to read  a genbank file and the  
organism taxonomy data as a list and instantiating.  If we could rely  
on everything being in a DB of course this would be simpler.

Another problem is the depth of the taxonomy is not constant for  
every node so assuming that a fixed number of slots will be filled in  
to generate the taxonomy leads to problems.

Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the  
best example of working code as this is how I really wanted it to  
work, the Bio::Species hacks are only there to shoehorn data  
retrieved from genbank files in.  With the flatfile implementation  
you have to walk all the way up the db hierarchy to get the kingdom  
for a node so you do have to build up the classification hierarchy as  
each node only stores data about itself.
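
The walk up the hierarchy can be sketched with a plain hash standing in for the flatfile db. This is not the Bio::DB::Taxonomy::flatfile code; the node table is abridged for illustration (the real NCBI dump has more intermediate nodes) and lineage() is a hypothetical helper.

```perl
use strict;
use warnings;

# Toy node table keyed by taxid. Each node knows only its own name,
# rank and parent, as in the description above.
my %nodes = (
    9606 => { name => 'Homo sapiens', rank => 'species', parent => 9605 },
    9605 => { name => 'Homo',         rank => 'genus',   parent => 9604 },
    9604 => { name => 'Hominidae',    rank => 'family',  parent => 1    },
    1    => { name => 'root',         rank => 'no rank', parent => 1    },
);

# Follow parent links from $taxid up to the root and return the
# names in root-to-leaf order.
sub lineage {
    my ($taxid) = @_;
    my @names;
    while (1) {
        my $node = $nodes{$taxid} or die "Unknown taxid $taxid";
        unshift @names, $node->{name};
        last if $node->{parent} == $taxid;   # the root is its own parent
        $taxid = $node->{parent};
    }
    return @names;
}

my @lineage = lineage(9606);
print join('; ', @lineage), "\n";
```

Because the loop stops only at the root, it copes with lineages of any depth, which is the point made above about not assuming a fixed number of slots.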

I'm not exactly sure what you are proposing to do, but would  
definitely enjoy another pair of hands, I don't really have time to  
mess with it any time soon.

-jason
On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at a Node doesn't seem to store
> the most important information about itself - its scientific name - in
> an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>    -object_id => 9606, # or -ncbi_taxid. Required tag
>    -names => {
>        'scientific' => ['sapiens'],
>        'common_name' => ['human']
>    },
>    -rank => 'species'  # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to the
> scientific_name() method which is unable to set any value (and warns
> about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't the
> more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored data, or
> indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method.
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would only
> have a minimum of information, if you could simply ask a node what its
> rank and scientific name was you could easily build a classification
> array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in 1.5.1, or
> would I not be wasting my time re-writing things to make more sense
> (to me)?
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cuiw at mail.nih.gov  Wed May 10 21:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B07D391@nihcesmlbx10.nih.gov>

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3 module so that you can parse the result file.
 
2. There is a bug in Bio::Tools::Primer3.pm line 264, as I mentioned. Once fixed, it can output PRIMER_SEQUENCE_ID.
 
3. primer3.exe is called in the Bio::Tools::Run::Primer3  "run" function, please read the function definition.
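
For point 3, the general pattern a wrapper module follows is to start the external program and read back what it prints. The sketch below shows that pattern only and is not the actual body of the "run" function; run_program is a hypothetical helper, and the running perl binary stands in for primer3.exe so that the example works without primer3 installed.

```perl
use strict;
use warnings;

# Run an external command and return its output lines.
# The list form of open bypasses the shell, so arguments need no quoting.
sub run_program {
    my (@command) = @_;
    open(my $pipe, '-|', @command)
        or die "Cannot run $command[0]: $!";
    my @lines = <$pipe>;
    close($pipe) or die "$command[0] exited abnormally (status $?)";
    chomp @lines;
    return @lines;
}

# $^X is the path of the running perl binary; here it plays the role
# of the external program and prints a tiny primer3-style record.
my @out = run_program($^X, '-e', 'print "PRIMER_LEFT=0,20\n=\n"');
print "$_\n" for @out;
```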




From cjfields at uiuc.edu  Wed May 10 23:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely see
the use of a local database.  I ran a few tests, really unrelated to this,
using the powerscripting test page at NCBI for eutils (for the curious, at
http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
retrieve XML-formatted taxonomic information; here's the bacterium Frankia
sp. CcI3 TaxID info, which looks like they have everything set up by rank.
It gives quite a bit of information. 
 
<?xml version="1.0"?>
<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet>

<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <Division>Bacteria</Division>
  <GeneticCode>
    <GCId>11</GCId>
    <GCName>Bacterial and Plant Plastid</GCName>
  </GeneticCode>
  <MitoGeneticCode>
    <MGCId>0</MGCId>
    <MGCName>Unspecified</MGCName>
  </MitoGeneticCode>
  <Lineage>cellular organisms; Bacteria; Actinobacteria; Actinobacteria
(class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
Frankia</Lineage>
  <LineageEx>
    <Taxon>
      <TaxId>131567</TaxId>
      <ScientificName>cellular organisms</ScientificName>
      <Rank>no rank</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>201174</TaxId>
      <ScientificName>Actinobacteria</ScientificName>
      <Rank>phylum</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1760</TaxId>
      <ScientificName>Actinobacteria (class)</ScientificName>
      <Rank>class</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85003</TaxId>
      <ScientificName>Actinobacteridae</ScientificName>
      <Rank>subclass</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2037</TaxId>
      <ScientificName>Actinomycetales</ScientificName>
      <Rank>order</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85013</TaxId>
      <ScientificName>Frankineae</ScientificName>
      <Rank>suborder</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
  <CreateDate>1999/10/22</CreateDate>
  <UpdateDate>2005/01/19</UpdateDate>
  <PubDate>2000/02/02</PubDate>
</Taxon>
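
As a rough illustration of how the LineageEx block could feed a classification list, the sketch below pulls taxid, name and rank out of an abridged copy of the XML above. It uses a regular expression only to stay dependency-free; a real implementation would more likely use an XML parser, and none of this is existing BioPerl code.

```perl
use strict;
use warnings;

# Abridged copy of the LineageEx block shown above.
my $xml = <<'END';
<LineageEx>
  <Taxon>
    <TaxId>2</TaxId>
    <ScientificName>Bacteria</ScientificName>
    <Rank>superkingdom</Rank>
  </Taxon>
  <Taxon>
    <TaxId>1854</TaxId>
    <ScientificName>Frankia</ScientificName>
    <Rank>genus</Rank>
  </Taxon>
</LineageEx>
END

# Pull (taxid, name, rank) out of each <Taxon> element in document order.
my @lineage;
while ($xml =~ m{<Taxon>\s*
                 <TaxId>(\d+)</TaxId>\s*
                 <ScientificName>([^<]+)</ScientificName>\s*
                 <Rank>([^<]+)</Rank>\s*
                 </Taxon>}gx) {
    push @lineage, { taxid => $1, name => $2, rank => $3 };
}

# Index names by rank; ranks that repeat (such as "no rank") would
# overwrite each other here, so the ordered list is kept as well.
my %by_rank = map { ($_->{rank}, $_->{name}) } @lineage;
print "$_->{rank}\t$_->{name}\n" for @lineage;
```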


Chris



From jason.stajich at duke.edu  Thu May 11 08:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>

Great - now we just need someone to volunteer to actually work on this.

The current code grabs most of this, but I believe it expects a different
XML format.


On May 10, 2006, at 11:36 PM, Chris Fields wrote:

> I think you can get pretty much everything now, though I can definitely see
> the use of a local database.  I ran a few tests, really unrelated to this,
> using the powerscripting test page at NCBI for eutils (for the curious, at
> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
> retrieve XML-formatted taxonomic information; here's the bacterium Frankia
> sp. CcI3 TaxID info, which looks like they have everything set up by rank.
> It gives quite a bit of information.
>
> <?xml version="1.0"?>
> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> <TaxaSet>
>
> <Taxon>
>   <TaxId>106370</TaxId>
>   <ScientificName>Frankia sp. CcI3</ScientificName>
>   <ParentTaxId>1854</ParentTaxId>
>   <Rank>species</Rank>
>   <Division>Bacteria</Division>
>   <GeneticCode>
>     <GCId>11</GCId>
>     <GCName>Bacterial and Plant Plastid</GCName>
>   </GeneticCode>
>   <MitoGeneticCode>
>     <MGCId>0</MGCId>
>     <MGCName>Unspecified</MGCName>
>   </MitoGeneticCode>
>   <Lineage>cellular organisms; Bacteria; Actinobacteria; Actinobacteria
> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
> Frankia</Lineage>
>   <LineageEx>
>     <Taxon>
>       <TaxId>131567</TaxId>
>       <ScientificName>cellular organisms</ScientificName>
>       <Rank>no rank</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>2</TaxId>
>       <ScientificName>Bacteria</ScientificName>
>       <Rank>superkingdom</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>201174</TaxId>
>       <ScientificName>Actinobacteria</ScientificName>
>       <Rank>phylum</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>1760</TaxId>
>       <ScientificName>Actinobacteria (class)</ScientificName>
>       <Rank>class</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>85003</TaxId>
>       <ScientificName>Actinobacteridae</ScientificName>
>       <Rank>subclass</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>2037</TaxId>
>       <ScientificName>Actinomycetales</ScientificName>
>       <Rank>order</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>85013</TaxId>
>       <ScientificName>Frankineae</ScientificName>
>       <Rank>suborder</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>74712</TaxId>
>       <ScientificName>Frankiaceae</ScientificName>
>       <Rank>family</Rank>
>     </Taxon>
>     <Taxon>
>       <TaxId>1854</TaxId>
>       <ScientificName>Frankia</ScientificName>
>       <Rank>genus</Rank>
>     </Taxon>
>   </LineageEx>
>   <CreateDate>1999/10/22</CreateDate>
>   <UpdateDate>2005/01/19</UpdateDate>
>   <PubDate>2000/02/02</PubDate>
> </Taxon>
>
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>> Sent: Wednesday, May 10, 2006 7:54 PM
>> To: Sendu Bala
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>
>> I would use the implementation that talks to the flatfile db as the
>> standard here.  Nodes are defined by the data in the taxonomy dump
>> dbs from NCBI.
>> The eutils are pretty worthless except for taxid->name or the reverse; you
>> can't get the full taxonomy (or couldn't when that implementation was
>> written).
>>
>> The "name" method refers to the name of the node - each level in the
>> taxonomy can have a "name".
>>
>> The bits of hackiness relate to wrapping the node object as a
>> Bio::Species and/or being able to read  a genbank file and the
>> organism taxonomy data as a list and instantiating.  If we could rely
>> on everything being in a DB of course this would be simpler.
>>
>> Another problem is the depth of the taxonomy is not constant for
>> every node so assuming that a fixed number of slots will be filled in
>> to generate the taxonomy leads to problems.
>>
>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
>> best example of working code as this is how I really wanted it to
>> work, the Bio::Species hacks are only there to shoehorn data
>> retrieved from genbank files in.  With the flatfile implementation
>> you have to walk all the way up the db hierarchy to get the kingdom
>> for a node so you do have to build up the classification hierarchy as
>> each node only stores data about itsself.
>>
>> I'm not exactly sure what you are proposing to do, but would
>> definitely enjoy another pair of hands, I don't really have time to
>> mess with it any time soon.
>>
>> -jason
>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>
>>> Hi,
>>> I'm a little confused as to how names are supposed to work in
>>> Bio::Taxonomy::Node.
>>>
>>> In the bioperl versions that I've looked at a Node doesn't seem to
>>> store
>>> the most important information about itself - it's scientific name
>>> - in
>>> an obvious place. bioperl 1.5.1 puts it at the start of the
>>> classification list. I'd have thought sticking it in -name would  
>>> make
>>> more sense, but this is used only for the GenBank common name.
>>>
>>> The Bio::Taxonomy docs still suggests:
>>>
>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>    -object_id => 9606, # or -ncbi_taxid. Requird tag
>>>    -names => {
>>>        'scientific' => ['sapiens'],
>>>        'common_name' => ['human']
>>>    },
>>>    -rank => 'species'  # Required tag
>>> );
>>>
>>> and whilst Bio::Taxonomy::Node does not accept -names, it does  
>>> have a
>>> 'name' method which claims to work like:
>>>
>>> $obj->name('scientific', 'sapiens');
>>>
>>> This kind of thing would be really nice, but afaics
>>> Bio::Taxonomy::Node->new takes the -name value and makes a common  
>>> name
>>> out of it, whilst the name() method passes any 'scientific' name to
>>> the
>>> scientific_name() method which is unable to set any value (and warns
>>> about this), only get.
>>>
>>> It seems like the need to have this classification array work the  
>>> same
>>> way as Bio::Species is causing some unnecessary restrictions. Can't
>>> the
>>> more sensible idea of having a dedicated storage spot for the
>>> ScientificName and other parameters be used, with the classification
>>> array either being generated just-in-time from the hash-stored
>>> data, or
>>> indeed being generated from the Lineage field?
>>>
>>>
>>> Also, why does a node store the complete hierarchy on itself in the
>>> classification array? If we're going that far, why don't the
>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
>>> get_taxonomy() method instead of a get_Taxonomy_Node() method.
>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
>>> only
>>> have a minimum of information, if you could simply ask a node  
>>> what its
>>> rank and scientific name was you could easily build a classification
>>> array, or ask what Kingdom your species was in etc.
>>>
>>> Are there good reasons for Taxonomy working the way it does in
>>> 1.5.1, or
>>> would I not be wasting my time re-writing things to make more sense
>>> (to me)?
>>>
>>>
>>> Cheers,
>>> Sendu.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Thu May 11 07:51:44 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 12:51:44 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
	<655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> I would use the implementation that talks to the flatfile db as the 
> standard here.  nodes are defined by the data in from taxonomy dump
> dbs from ncbi. the eutils is pretty worthless except for taxid->name
> or reverse, you can't get the full taxonomy (or couldn't when that
> implementation was written).

I'm not sure what you mean. In 1.5.1 you have access to the full
taxonomy because you're using efetch.fcgi. Indeed, you parse the full
taxonomy already to get the classification.


> The "name" method refers to the name of the node - each level in the
>  taxonomy can have a "name".

Yes, and to me the 'name of the node' is its scientific name (something
like 'sapiens'), not a 'common' name. So why is it stored as a
'common' name in the object? Why don't the DB::Taxonomy modules store
the actual common names (something like 'human')?


> The bits of hackiness relate to wrapping the node object as a 
> Bio::Species and/or being able to read  a genbank file and the
> organism taxonomy data as a list and instantiating.  If we could rely
> on everything being in a DB of course this would be simpler.

I think that Taxonomy stuff could be done in a 'pure' way, with a new
Bio::Species made as a wrapper around an appropriate Taxonomy module(s)
that cheated and made fake nodes from a genbank list and then made a
proper Bio::Taxonomy.


> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itsself.

I'm still actually using bioperl 1.4 but I'm looking at 1.5.1 assuming
it is the latest available and I see that the flatfile implementation
works the same way as the entrez one. The requested node is fetched, but
then internally it walks the hierarchy purely so it can build a
classification list which is then stored on the object. If you're
already retrieving every node above the requested node, why not just
return every node? Why not just return a whole Bio::Taxonomy?


> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch 
of modules that inherit from bioperl and then add/alter to suit my 
needs. In this regard they're a bit limited and kind of hard-coded to my 
way of thinking, but hopefully you can see my intent and perhaps use 
some of my implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single 
database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
supplied rank list); it has the ranks that its Nodes have and knows
(without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information
about themselves and have no classification array or anything like that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
the classification.
# Nodes have a scientific name and common names
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives
easy access to what I really want to do:

my $human = TFBS::Species->new(-common_name => 'human');
# returns the array you'd expect from a normally created, fully
# classified Bio::Species
my @classification = $human->classification;
my $kingdom = $human->kingdom; # returns 'Metazoa'

# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't 
make any difference!)
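
The design points above (nodes that store only their own data, with the
classification derived by walking up through the parents) can be sketched
in plain Perl as follows. This is only an illustration under stated
assumptions, not the code in the tarball: the %nodes table and the
classification() and name_at_rank() helpers are hypothetical names, and
the ids, names and ranks are copied from the Frankia sp. CcI3 efetch
record quoted elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical in-memory taxonomy. Each node stores only its own name,
# rank and parent id; no node carries a classification array.
my %nodes = (
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
);

# Build the classification on demand by walking from a node to the root.
# Returns names ordered from the most specific node up to the root.
sub classification {
    my ($taxid) = @_;
    my @names;
    while (defined $taxid) {
        my $node = $nodes{$taxid} or die "unknown taxid $taxid\n";
        push @names, $node->{name};
        $taxid = $node->{parent};
    }
    return @names;
}

# Find the name of the ancestor (or the node itself) holding a given rank.
# Returns undef if no node on the path to the root has that rank.
sub name_at_rank {
    my ($taxid, $rank) = @_;
    while (defined $taxid) {
        my $node = $nodes{$taxid} or die "unknown taxid $taxid\n";
        return $node->{name} if $node->{rank} eq $rank;
        $taxid = $node->{parent};
    }
    return;
}

print join('; ', reverse classification(106370)), "\n";
print name_at_rank(106370, 'superkingdom'), "\n";   # prints "Bacteria"
```

Because ranks are looked up rather than assumed to sit at fixed positions,
branches of different depth and 'no rank' nodes need no special handling
in this sketch.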

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down/ discuss.


Cheers,
Sendu.


From sb at mrc-dunn.cam.ac.uk  Thu May 11 08:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...


> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects 
that XML, and parses it as fully as flatfile.pm does. Nothing more to 
do. Weren't you the person that wrote that parser?

I parse the same XML in my version of entrez.pm (see my previous email); 
the main difference being I make Nodes out of each Taxon instead of just 
adding each Taxon's ScientificName to the classification array.
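
As a rough illustration of that difference, the sketch below turns each
Taxon inside LineageEx into its own node record instead of keeping only
the ScientificName. It is not the code from either entrez.pm: the
nodes_from_lineage() helper and the hash keys are hypothetical, the XML
is an abridged copy of the Frankia record from this thread, regexes
stand in for a real XML parser to keep the example self-contained, and
it assumes LineageEx lists taxa root-first so that each entry's parent
is the entry before it.

```perl
use strict;
use warnings;

# Abridged efetch-style record (lineage shortened to two entries).
my $xml = <<'XML';
<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <LineageEx>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
</Taxon>
XML

# Hypothetical helper: one hash per <Taxon> in <LineageEx>, each linked
# to the previous entry as its parent (root-first order is assumed).
sub nodes_from_lineage {
    my ($doc) = @_;
    my ($lineage) = $doc =~ m{<LineageEx>(.*?)</LineageEx>}s;
    return () unless defined $lineage;

    my (@nodes, $parent_id);
    while ($lineage =~ m{<Taxon>(.*?)</Taxon>}sg) {
        my $block = $1;
        my %node = (parent_id => $parent_id);
        ($node{id})   = $block =~ m{<TaxId>(\d+)</TaxId>};
        ($node{name}) = $block =~ m{<ScientificName>(.*?)</ScientificName>};
        ($node{rank}) = $block =~ m{<Rank>(.*?)</Rank>};
        push @nodes, \%node;
        $parent_id = $node{id};
    }
    return @nodes;
}

for my $n (nodes_from_lineage($xml)) {
    printf "%s\t%s\t%s\n", $n->{id}, $n->{rank}, $n->{name};
}
```

The first entry has no parent_id here only because the sample lineage is
abridged; with the full record it would be the root node.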


From jason.stajich at duke.edu  Thu May 11 09:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
	<44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID: <AAFFC5EC-8B54-4D87-BE38-CB90785AD4B5@duke.edu>

I guess so. I have long since forgotten what it supports, though, since
I don't regularly use it. Sorry.

On May 11, 2006, at 8:22 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>> Great - now we just need someone to volunteer to actually work on  
>> this.
>
> Now I'm really confused...
>
>
>> The current code grabs most of this but I believe expects a  
>> different XML
>
> No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez  
> expects
> that XML, and parses it as fully as flatfile.pm does. Nothing more to
> do. Weren't you the person that wrote that parser?
>
> I parse the same XML in my version of entrez.pm (see my previous  
> email);
> the main difference being I make Nodes out of each Taxon instead of  
> just
> adding each Taxon's ScientificName to the classification array.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Thu May 11 10:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh... 

To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
but I myself have seen issues with the way Bio::Species treats bacterial
strains (I guess this also involves Bio::Taxonomy::Node since that's what
Bio::Species delegates to).  Seems it likes to repeat some strain names when
using $seq->species->common_name.  Not a killer problem but annoying since
the correct name is in the source tag in the feature table!  I 'could' take
a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
previously, but it'll take a while to get going.  I'm really more interested
in getting epost-esearch-efetch sequence retrieval up and running first with
the same API as Bio::DB::GenBank/GenPept and Bio::DB::Query::GenBank, then
donating the code (late summer/fall???) after working out namespace issues so
it doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I suppose I
could also look at Bio::DB::Taxonomy to see what's up in the next couple of
weeks (after conference), unless someone gets to it sooner.

Chris



From jason.stajich at duke.edu  Thu May 11 11:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>


I think you'll see it is different, and mostly a limitation of the
genbank format: the Bio::Species objects that you get from a genbank
parse do not represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus
sub-species versus strain when none of the members of the list are
labeled.  We have some of the same problems for swissprot as well.  I
just don't think we can do it right from the genbank file data alone,
so I don't see a lot of point in expecting Bio::Species to provide more
than a representation of what is in the file and just return that
array.


It has seemed like we need to special case things pretty heavily or  
do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?  What  
happens when there is a two part strain name (space separated) and a  
sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321

Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443


On May 11, 2006, at 10:57 AM, Chris Fields wrote:

> Heh...
>
> To tell the truth, I haven't looked at Bio::DB::Taxonomy in any  
> depth yet,
> but I myself have seen issues with the way Bio::Species treats  
> bacterial
> strains (I guess this also involves Bio::Taxonomy::Node since  
> that's what
> Bio::Species delegates to).  Seems it likes to repeat some strain  
> names when
> using $seq->species->common_name.  Not a killer problem but  
> annoying since
> the correct name is in the source tag in the feature table!  I  
> 'could' take
> a look at it but I can't guarantee quick results.
>
> Jason, I could add Taxonomy to the EUtilities overhaul I mentioned  
> to you
> previously but it'll take awhile to get going.  I'm really more  
> interested
> in getting epost-esearch-efetch sequence retrieval up and running  
> first with
> the same API as Bio::DB::GenBank/Genpept and  
> Bio::DB::Query::GenBank, donate
> the code (late summer/fall???) after working out namespace issues  
> so it
> doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I  
> suppose I
> could also look at Bio::DB:Taxonomy to see what's up in the next  
> couple of
> weeks (after conference), unless someone gets to it sooner.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>> Sent: Thursday, May 11, 2006 7:05 AM
>> To: Chris Fields
>> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala'
>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>
>> Great - now we just need someone to volunteer to actually work on  
>> this.
>>
>> The current code grabs most of this but I believe expects a different
>> XML
>>
>>
>> On May 10, 2006, at 11:36 PM, Chris Fields wrote:
>>
>>> I think you can get pretty much everything now, though I can
>>> definitely see
>>> the use of a local database.  I ran a few tests, really unrelated
>>> to this,
>>> using the powerscripting test page at NCBI for eutils (for the
>>> curious, at
>>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was
>>> able to
>>> retrieve XML-formatted taxonomic information; here's the bacterium
>>> Frankia
>>> sp. CcI3 TaxID info, which looks like they have everything set up
>>> by rank.
>>> It gives quite a bit of information.
>>>
>>> <?xml version="1.0"?>
>>> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
>>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
>>> <TaxaSet>
>>>
>>> <Taxon>
>>>   <TaxId>106370</TaxId>
>>>   <ScientificName>Frankia sp. CcI3</ScientificName>
>>>   <ParentTaxId>1854</ParentTaxId>
>>>   <Rank>species</Rank>
>>>   <Division>Bacteria</Division>
>>>   <GeneticCode>
>>>     <GCId>11</GCId>
>>>     <GCName>Bacterial and Plant Plastid</GCName>
>>>   </GeneticCode>
>>>   <MitoGeneticCode>
>>>     <MGCId>0</MGCId>
>>>     <MGCName>Unspecified</MGCName>
>>>   </MitoGeneticCode>
>>>   <Lineage>cellular organisms; Bacteria; Actinobacteria;
>>> Actinobacteria
>>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
>>> Frankia</Lineage>
>>>   <LineageEx>
>>>     <Taxon>
>>>       <TaxId>131567</TaxId>
>>>       <ScientificName>cellular organisms</ScientificName>
>>>       <Rank>no rank</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>2</TaxId>
>>>       <ScientificName>Bacteria</ScientificName>
>>>       <Rank>superkingdom</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>201174</TaxId>
>>>       <ScientificName>Actinobacteria</ScientificName>
>>>       <Rank>phylum</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>1760</TaxId>
>>>       <ScientificName>Actinobacteria (class)</ScientificName>
>>>       <Rank>class</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>85003</TaxId>
>>>       <ScientificName>Actinobacteridae</ScientificName>
>>>       <Rank>subclass</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>2037</TaxId>
>>>       <ScientificName>Actinomycetales</ScientificName>
>>>       <Rank>order</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>85013</TaxId>
>>>       <ScientificName>Frankineae</ScientificName>
>>>       <Rank>suborder</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>74712</TaxId>
>>>       <ScientificName>Frankiaceae</ScientificName>
>>>       <Rank>family</Rank>
>>>     </Taxon>
>>>     <Taxon>
>>>       <TaxId>1854</TaxId>
>>>       <ScientificName>Frankia</ScientificName>
>>>       <Rank>genus</Rank>
>>>     </Taxon>
>>>   </LineageEx>
>>>   <CreateDate>1999/10/22</CreateDate>
>>>   <UpdateDate>2005/01/19</UpdateDate>
>>>   <PubDate>2000/02/02</PubDate>
>>> </Taxon>
>>>
>>>
>>> Chris
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>>>> Sent: Wednesday, May 10, 2006 7:54 PM
>>>> To: Sendu Bala
>>>> Cc: bioperl-l at lists.open-bio.org
>>>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>>>
>>>> I would use the implementation that talks to the flatfile db as the
>>>> standard here.  nodes are defined by the data in from taxonomy dump
>>>> dbs from ncbi.
>>>> the eutils is pretty worthless except for taxid->name or  
>>>> reverse, you
>>>> can't get the full taxonomy (or couldn't when that  
>>>> implementation was
>>>> written).
>>>>
>>>> The "name" method refers to the name of the node - each level in  
>>>> the
>>>> taxonomy can have a "name".
>>>>
>>>> The bits of hackiness relate to wrapping the node object as a
>>>> Bio::Species and/or being able to read  a genbank file and the
>>>> organism taxonomy data as a list and instantiating.  If we could  
>>>> rely
>>>> on everything being in a DB of course this would be simpler.
>>>>
>>>> Another problem is the depth of the taxonomy is not constant for
>>>> every node so assuming that a fixed number of slots will be  
>>>> filled in
>>>> to generate the taxonomy leads to problems.
>>>>
>>>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as  
>>>> the
>>>> best example of working code as this is how I really wanted it to
>>>> work, the Bio::Species hacks are only there to shoehorn data
>>>> retrieved from genbank files in.  With the flatfile implementation
>>>> you have to walk all the way up the db hierarchy to get the kingdom
>>>> for a node so you do have to build up the classification  
>>>> hierarchy as
>>>> each node only stores data about itsself.
>>>>
>>>> I'm not exactly sure what you are proposing to do, but would
>>>> definitely enjoy another pair of hands, I don't really have time to
>>>> mess with it any time soon.
>>>>
>>>> -jason
>>>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>>>
>>>>> Hi,
>>>>> I'm a little confused as to how names are supposed to work in
>>>>> Bio::Taxonomy::Node.
>>>>>
>>>>> In the bioperl versions that I've looked at a Node doesn't seem to store
>>>>> the most important information about itself - its scientific name - in
>>>>> an obvious place. bioperl 1.5.1 puts it at the start of the
>>>>> classification list. I'd have thought sticking it in -name would make
>>>>> more sense, but this is used only for the GenBank common name.
>>>>>
>>>>> The Bio::Taxonomy docs still suggest:
>>>>>
>>>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>>>    -object_id => 9606, # or -ncbi_taxid. Required tag
>>>>>    -names => {
>>>>>        'scientific' => ['sapiens'],
>>>>>        'common_name' => ['human']
>>>>>    },
>>>>>    -rank => 'species'  # Required tag
>>>>> );
>>>>>
>>>>> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
>>>>> 'name' method which claims to work like:
>>>>>
>>>>> $obj->name('scientific', 'sapiens');
>>>>>
>>>>> This kind of thing would be really nice, but afaics
>>>>> Bio::Taxonomy::Node->new takes the -name value and makes a common name
>>>>> out of it, whilst the name() method passes any 'scientific' name to the
>>>>> scientific_name() method which is unable to set any value (and warns
>>>>> about this), only get.
>>>>>
>>>>> It seems like the need to have this classification array work the same
>>>>> way as Bio::Species is causing some unnecessary restrictions. Can't the
>>>>> more sensible idea of having a dedicated storage spot for the
>>>>> ScientificName and other parameters be used, with the classification
>>>>> array either being generated just-in-time from the hash-stored data, or
>>>>> indeed being generated from the Lineage field?
>>>>>
>>>>>
>>>>> Also, why does a node store the complete hierarchy on itself in the
>>>>> classification array? If we're going that far, why don't the
>>>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
>>>>> get_taxonomy() method instead of a get_Taxonomy_Node() method?
>>>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would only
>>>>> have a minimum of information, if you could simply ask a node what its
>>>>> rank and scientific name was you could easily build a classification
>>>>> array, or ask what Kingdom your species was in etc.
>>>>>
>>>>> Are there good reasons for Taxonomy working the way it does in 1.5.1, or
>>>>> would I not be wasting my time re-writing things to make more sense
>>>>> (to me)?
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Sendu.
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Thu May 11 13:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	and run-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred to should only occur when reading a Primer3 output file; the Bio::Tools::Run::Primer3->run method takes the results and transfers them directly to a Bio::Tools::Primer3 object without an intermediate file.  A Data::Dumper look at the Bio::Tools::Primer3 object shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results' and then again in the 'results_by_number' hash, but only in the '0' hash.

All of this doesn't really matter for Li's original concern.  If you want to include the id of the sequence along with the primer3 results, just take it from the seq object (i.e. $seq->display_id() ).  Since you are in a loop taking one sequence at a time, this $seq will be the one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3 module so that you can parse the result file.
 
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once fixed, it can output 
 
3. primer3.exe is called in the Bio::Tools::Run::Primer3  "run" function, please read the function definition.


From cjfields at uiuc.edu  Thu May 11 13:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar said
so as well), especially for local databases.  I wonder, though, if there is
a way that RichSeqs like GenBank, when passed through SeqIO, can just be
'short-circuited' using the sequence builder to accept what's on the
SOURCE or ORGANISM line of a file as is, without forcing it into
Bio::Species/Bio::Taxonomy::Node.  Or maybe diminish the role of the
SOURCE/ORGANISM lines altogether to just simple Annotation objects and place
much greater emphasis on the TaxID itself, in effect decoupling the TaxID
(taxonomic information) from SOURCE/ORGANISM (annotation information).

In other words, have GenBank/EMBL classification lines and organism lines
essentially stay like they are in the input file (use simple objects).
Then, if one were really intent on getting the full name, classification,
etc., or one wanted to store their sequences in bioperl-db, they would be
required to either have a local db of NCBI Taxonomy or remote access to a
similar database (NCBI or something else) so a lookup could be accomplished
using the TaxID.  If they use BioSQL, then require them to preload their
BioSQL database with NCBI's taxonomy, something Hilmar already strongly
suggests.
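
To make the TaxID-lookup idea concrete, here is a minimal sketch in plain Perl (no BioPerl calls). The %taxonomy hash is a hypothetical stand-in for a local copy of the NCBI taxonomy; the TaxIDs, names and ranks are copied from the Frankia XML quoted further down, and the parent links above the genus are an assumption based on the LineageEx order. The sub names lineage_for and name_at_rank are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a local NCBI taxonomy table.
# Parent links other than 106370 -> 1854 are assumed from the LineageEx order.
my %taxonomy = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
);

# Walk from a TaxID up to the root; returns nodes in leaf -> root order.
# %seen guards against a cyclic parent link in a damaged table.
sub lineage_for {
    my ($taxid) = @_;
    my (@lineage, %seen);
    while (defined $taxid && exists $taxonomy{$taxid} && !$seen{$taxid}++) {
        push @lineage, { taxid => $taxid, %{ $taxonomy{$taxid} } };
        $taxid = $taxonomy{$taxid}{parent};
    }
    return @lineage;
}

# Return the name found at a given rank, or nothing if the lineage
# has no node with that rank.
sub name_at_rank {
    my ($taxid, $rank) = @_;
    for my $node (lineage_for($taxid)) {
        return $node->{name} if $node->{rank} eq $rank;
    }
    return;
}

print join('; ', map { $_->{name} } lineage_for(106370)), "\n";
print 'genus: ', name_at_rank(106370, 'genus'), "\n";
```

Because every rank is labelled, nothing has to be guessed from the position of a word in the ORGANISM line, and a lineage of any depth works the same way.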

If anyone isn't interested in the taxonomic information or doesn't want to
bother grabbing the database or setting up remote access, tough luck; just
grab the Bio::Annotation/Bio::Species object and use that.  As the saying
goes, "you can't be all things to all people."  At some point you have to
throw your arms in the air, do the best you can, but give up trying to
please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus sub-
> species versus strain when none of the members of the list are
> labeled.  This is some of the same problems we have for swissprot as
> well.  I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
> 
> 
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
> 
> Can you guess what value is the strain versus sub-species?  What
> happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
> 
> SOURCE      Staphylococcus haemolyticus JCSC1435
>    ORGANISM  Staphylococcus haemolyticus JCSC1435
>              Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
> 
> versus
> SOURCE      Muntiacus muntjak vaginalis
>    ORGANISM  Muntiacus muntjak vaginalis
>              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>              Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>              Pecora; Cervidae; Muntiacinae; Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
> 
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>    ORGANISM  Aspergillus nidulans FGSC A4
>              Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
>              Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
> 
> Genus should be Aspergillus or Emericella ?
> 
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>    ORGANISM  Cryptococcus neoformans var. grubii H99
>              Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>              Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
>              Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky!  This really points out the problem here.  It used to be
a problem for only a few cases but with so many bacterial and fungal genomes
that's changed.  

The Frankia XML example has the scientific name set to "Frankia sp. CcI3",
which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS
line in EMBL files.  It looks like the lines are parsed into and then built
from the ground-up in Bio::SeqIO::genbank using Bio::Species objects, which,
in my case with the strain designation, is where the problem lies.  They
could be placed in annotation objects with (-tagname => 'SOURCE', -value =>
'Frankia sp. CcI3') or similar settings.  Or simplify Bio::Species to only
represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or
EMBL OS/OC lines and nothing more complex than that (no complex taxonomy;
for that you use the TaxID and local database). 

Okay,  I need to lay off the coffee now...

Chris

> On May 11, 2006, at 10:57 AM, Chris Fields wrote:
> 
> > Heh...
> >
> > To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
> > but I myself have seen issues with the way Bio::Species treats bacterial
> > strains (I guess this also involves Bio::Taxonomy::Node since that's what
> > Bio::Species delegates to).  Seems it likes to repeat some strain names when
> > using $seq->species->common_name.  Not a killer problem but annoying since
> > the correct name is in the source tag in the feature table!  I 'could' take
> > a look at it but I can't guarantee quick results.
> >
> > Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
> > previously but it'll take awhile to get going.  I'm really more interested
> > in getting epost-esearch-efetch sequence retrieval up and running first with
> > the same API as Bio::DB::GenBank/Genpept and Bio::DB::Query::GenBank, donate
> > the code (late summer/fall???) after working out namespace issues so it
> > doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I suppose I
> > could also look at Bio::DB::Taxonomy to see what's up in the next couple of
> > weeks (after conference), unless someone gets to it sooner.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> >> Sent: Thursday, May 11, 2006 7:05 AM
> >> To: Chris Fields
> >> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala'
> >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> >>
> >> Great - now we just need someone to volunteer to actually work on this.
> >>
> >> The current code grabs most of this but I believe expects a different XML
> >>
> >>
> >> On May 10, 2006, at 11:36 PM, Chris Fields wrote:
> >>
> >>> I think you can get pretty much everything now, though I can definitely see
> >>> the use of a local database.  I ran a few tests, really unrelated to this,
> >>> using the powerscripting test page at NCBI for eutils (for the curious, at
> >>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
> >>> retrieve XML-formatted taxonomic information; here's the bacterium Frankia
> >>> sp. CcI3 TaxID info, which looks like they have everything set up by rank.
> >>> It gives quite a bit of information.
> >>>
> >>> <?xml version="1.0"?>
> >>> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
> >>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> >>> <TaxaSet>
> >>>
> >>> <Taxon>
> >>>   <TaxId>106370</TaxId>
> >>>   <ScientificName>Frankia sp. CcI3</ScientificName>
> >>>   <ParentTaxId>1854</ParentTaxId>
> >>>   <Rank>species</Rank>
> >>>   <Division>Bacteria</Division>
> >>>   <GeneticCode>
> >>>     <GCId>11</GCId>
> >>>     <GCName>Bacterial and Plant Plastid</GCName>
> >>>   </GeneticCode>
> >>>   <MitoGeneticCode>
> >>>     <MGCId>0</MGCId>
> >>>     <MGCName>Unspecified</MGCName>
> >>>   </MitoGeneticCode>
> >>>   <Lineage>cellular organisms; Bacteria; Actinobacteria;
> >>> Actinobacteria
> >>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
> >>> Frankia</Lineage>
> >>>   <LineageEx>
> >>>     <Taxon>
> >>>       <TaxId>131567</TaxId>
> >>>       <ScientificName>cellular organisms</ScientificName>
> >>>       <Rank>no rank</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>2</TaxId>
> >>>       <ScientificName>Bacteria</ScientificName>
> >>>       <Rank>superkingdom</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>201174</TaxId>
> >>>       <ScientificName>Actinobacteria</ScientificName>
> >>>       <Rank>phylum</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>1760</TaxId>
> >>>       <ScientificName>Actinobacteria (class)</ScientificName>
> >>>       <Rank>class</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>85003</TaxId>
> >>>       <ScientificName>Actinobacteridae</ScientificName>
> >>>       <Rank>subclass</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>2037</TaxId>
> >>>       <ScientificName>Actinomycetales</ScientificName>
> >>>       <Rank>order</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>85013</TaxId>
> >>>       <ScientificName>Frankineae</ScientificName>
> >>>       <Rank>suborder</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>74712</TaxId>
> >>>       <ScientificName>Frankiaceae</ScientificName>
> >>>       <Rank>family</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>1854</TaxId>
> >>>       <ScientificName>Frankia</ScientificName>
> >>>       <Rank>genus</Rank>
> >>>     </Taxon>
> >>>   </LineageEx>
> >>>   <CreateDate>1999/10/22</CreateDate>
> >>>   <UpdateDate>2005/01/19</UpdateDate>
> >>>   <PubDate>2000/02/02</PubDate>
> >>> </Taxon>
> >>> </TaxaSet>
> >>>
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> >>>> Sent: Wednesday, May 10, 2006 7:54 PM
> >>>> To: Sendu Bala
> >>>> Cc: bioperl-l at lists.open-bio.org
> >>>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> >>>>
> >>>> I would use the implementation that talks to the flatfile db as the
> >>>> standard here.  Nodes are defined by the data from the taxonomy dump
> >>>> dbs from ncbi.
> >>>> The eutils is pretty worthless except for taxid->name or reverse, you
> >>>> can't get the full taxonomy (or couldn't when that implementation was
> >>>> written).
> >>>>
> >>>> The "name" method refers to the name of the node - each level in the
> >>>> taxonomy can have a "name".
> >>>>
> >>>> The bits of hackiness relate to wrapping the node object as a
> >>>> Bio::Species and/or being able to read a genbank file and the
> >>>> organism taxonomy data as a list and instantiating.  If we could rely
> >>>> on everything being in a DB of course this would be simpler.
> >>>>
> >>>> Another problem is the depth of the taxonomy is not constant for
> >>>> every node so assuming that a fixed number of slots will be filled in
> >>>> to generate the taxonomy leads to problems.
> >>>>
> >>>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
> >>>> best example of working code as this is how I really wanted it to
> >>>> work, the Bio::Species hacks are only there to shoehorn data
> >>>> retrieved from genbank files in.  With the flatfile implementation
> >>>> you have to walk all the way up the db hierarchy to get the kingdom
> >>>> for a node so you do have to build up the classification hierarchy as
> >>>> each node only stores data about itself.
> >>>>
> >>>> I'm not exactly sure what you are proposing to do, but would
> >>>> definitely enjoy another pair of hands, I don't really have time to
> >>>> mess with it any time soon.
> >>>>
> >>>> -jason
> >>>>
> >>>> --
> >>>> Jason Stajich
> >>>> Duke University
> >>>> http://www.duke.edu/~jes12
> >>>>
> >>>>
> >>>
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Thu May 11 20:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	and run-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3 modules then you are doing well.

To expand a little on Wenwu's explanations: a Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs in order to run are collected inside the object.  This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method). Then, when everything is set the way you want it, you 'run' the Primer3 program by calling $primer3_object->run.  The "wrapper" collects all the run parameters and sends them off to the Primer3 executable.  Primer3 does the analysis and outputs the results to "stdout" in boulder-io format.

By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in boulder-io format ('tag'='value') stored in out.txt.  Because out.txt is not closed between the sequences processed by the script, all of the results end up concatenated in out.txt.  However, if you supply an output filename (-outfile=>$file_out) to the "wrapper", each line of output from Primer3 is written to $file_out and the file is closed at the end of the Primer3 output.  If your script then loops to another sequence it will open the same outfile again and overwrite it.

One last important detail for the "wrapper" object.  When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run).  $results_object is a Bio::Tools::Primer3 object; it contains the results of your Primer3 run and has methods for getting at that information.  This includes finding out how many primer sets were found and the means to access the primer set results one at a time.  It does work as advertised.  Because all of the primer sets are based on the same sequence, Primer3 outputs SEQUENCE and PRIMER_SEQUENCE_ID only once instead of once per primer set.  That is why they only show up in $results_object as if they belonged to the first primer set (set '0') and are not available for the other primer sets.
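
For anyone parsing the boulder-io text themselves, the point about the once-only tags can be sketched in plain Perl as below. This does not use Bio::Tools::Primer3; the sub name sets_from_boulder and the sample record (including its sequences) are made up for illustration, and the tag layout, with the first primer pair unnumbered and later pairs carrying _1_, _2_ and so on, is an assumption about the Primer3 output rather than something taken from the module.

```perl
use strict;
use warnings;

# Illustrative boulder-io record ('tag=value', ended by a lone '=').
# The sequences are invented; only the tag names mimic Primer3 output.
my $boulder = <<'END';
PRIMER_SEQUENCE_ID=clone_A
SEQUENCE=ACGTACGTACGTACGTACGT
PRIMER_LEFT_SEQUENCE=ACGTACGT
PRIMER_RIGHT_SEQUENCE=TTGCAACG
PRIMER_LEFT_1_SEQUENCE=CGTACGTA
PRIMER_RIGHT_1_SEQUENCE=GCAACGTT
=
END

# Group primer tags by set number. Tags that describe the whole record
# (sequence id and sequence) are copied into every primer set.
sub sets_from_boulder {
    my ($text) = @_;
    my (%shared, %by_number);
    for my $line (split /\n/, $text) {
        last if $line eq '=';                    # end of record
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;
        if ($tag =~ /^PRIMER_(LEFT|RIGHT)(?:_(\d+))?_SEQUENCE$/) {
            my $n = defined $2 ? $2 : 0;         # unnumbered tags are set 0
            $by_number{$n}{ lc $1 } = $value;
        }
        elsif ($tag eq 'PRIMER_SEQUENCE_ID' || $tag eq 'SEQUENCE') {
            $shared{$tag} = $value;
        }
    }
    return map { +{ number => $_, %shared, %{ $by_number{$_} } } }
           sort { $a <=> $b } keys %by_number;
}

for my $set (sets_from_boulder($boulder)) {
    print join("\t", @{$set}{qw(PRIMER_SEQUENCE_ID number left right)}), "\n";
}
```

When you run through the wrapper instead, taking the id from the sequence object you passed in achieves the same thing with less work.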

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module and run-primer3 module?

First, thank you all for replying to my previous post
about primer3.

But now I am a little confused even after reading the
documents: What is the relationship between these two
modules? What is the correct/standard way to use them for
batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (because I don't
need all the information). But there is a little
trouble: PRIMER_SEQUENCE_ID can't be accessed using
this method. And Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without providing the line of code

use Bio::Tools::Primer3 

in the script.  How can this be explained? Is it because of the
following line of code?

my $result=$primer3->run; 

The last question: which line of code is used to invoke
the program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 


From torsten.seemann at infotech.monash.edu.au  Fri May 12 00:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for gene predictions generally allow you to 
iterate through the predicted genes. Each prediction is usually returned 
as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() 
method to print them as GFF.

So something as simple as this *may* work:

use Bio::Tools::Glimmer;
my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
while (my $gene = $parser->next_prediction) {
   # gff_string() returns the line without a trailing newline
   print $gene->gff_string, "\n";
}

If you want separate GFF lines for each exon, you'll have to do another 
loop over $gene->exons() etc., each of which is luckily also a 
Bio::SeqFeature!

Or if you want to modify some of the GFF columns first, e.g. the source tag, 
just do $gene->source_tag('mynewtag') before printing it.

Hope this helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Fri May 12 00:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making
	with	Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

	http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project.
It produces pretty clickable image maps from BLAST reports.

Hope it helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From brianjgilmartin at hotmail.com  Fri May 12 05:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <BAY107-F354AD036A551D290A1874CBCAC0@phx.gbl>

please remove me from the list



From sb at mrc-dunn.cam.ac.uk  Fri May 12 06:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes 
across a species rank it does:

if ($rank eq 'species') {
   # get rid of genus from species name
   (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}

However, even though the true scientific name is usually 'Genus species' in 
the database, note the 'usually' - sometimes the species is a multiword 
item that does not include the Genus, so we can't just do a simple split 
and take the second word.
The same applies to levels below species, eg. 'Avian erythroblastosis 
virus' is a variant of the species 'Avian leukosis virus' but 'Avian 
erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current 
rank and the previous rank. Maybe even that's not so perfect, but it 
must be a lot better than turning the species 'Avian leukosis virus' 
into the species 'virus' (especially given that the genus here is 
'Alpharetrovirus')!

# we need to be going root(kingdom) -> leaf (species or lower) order
#
# we need to be storing untouched versions of the scientific name of
# the previous rank ($self->{_last_raw})
#
# probably only bother to start doing this when we get to genus
my $last_raw = $self->{_last_raw};
$self->{_last_raw} = $sci_name;
if ($last_raw) {
   # \Q...\E quotes regex metacharacters such as the parentheses in
   # 'Avian erythroblastosis virus (strain ES4)'
   $sci_name =~ s/\Q$last_raw\E//;
   $sci_name =~ s/^\s+//;
}

Are there even more strange species (and lower) names that would still 
not work well with the above solution?
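
To try the idea outside the database modules, here is a self-contained sketch of the same stripping step. The sub name strip_parent_names is made up for illustration, the names are passed in as a plain root -> leaf list rather than read from a taxonomy db, and falling back to the full name when nothing would be left is an added assumption, not part of the snippet above.

```perl
use strict;
use warnings;

# Remove from each scientific name whatever it shares with the name of
# the rank directly above it. Names must be given in root -> leaf order.
sub strip_parent_names {
    my @raw_names = @_;
    my (@stripped, $last_raw);
    for my $sci_name (@raw_names) {
        my $short = $sci_name;
        if (defined $last_raw) {
            $short =~ s/\Q$last_raw\E//;   # match the previous name literally
            $short =~ s/^\s+//;
        }
        # assumption: keep the full name if stripping would leave nothing
        $short = $sci_name if $short eq '';
        push @stripped, $short;
        $last_raw = $sci_name;
    }
    return @stripped;
}

print join(' | ', strip_parent_names(
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
)), "\n";

print join(' | ', strip_parent_names(
    'Muntiacus',
    'Muntiacus muntjak',
    'Muntiacus muntjak vaginalis',
)), "\n";
```

With these inputs 'Avian leukosis virus' is left whole because it does not contain the genus name, while the Muntiacus names reduce to one word per rank.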

Cheers,
Sendu.


From s_maheshwari84 at rediffmail.com  Fri May 12 09:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

  
hello
I am a student at the Centre for DNA Fingerprinting and Diagnostics (CDFD).
I am working on protein-protein interactions but I am unable to use the protein interaction module, i.e. ProteinGraph.pm.
Actually I am facing lots of problems in the program I have written. Please help me; for the last four months I have not been able to solve the same problem.
I am pasting my program here and am also attaching it.

#!/usr/bin/perl
use lib "/usr/local/bioxapps/bioperl/library/";
use strict;
use Bio::Graph::SimpleGraph;
use Bio::Graph::IO;
our @ISA=qw( Bio::SeqI);
use Bio::Graph::Edge;
use Bio::Graph::IO::dip;
use Bio::Graph::IO::psi_xml;
use Clone qw(clone);
use vars  qw(@ISA);
use Bio::AnnotatableI;
use Bio::IdentifiableI;
our @ISA = qw(Bio::Graph::SimpleGraph);
@ISA = qw(Bio::Graph::IO);
our @ISA=qw(Expoerter);
use Bio::Graph::ProteinGraph;
use Class::AutoClass;
use Bio::Graph::SimpleGraph::Traversal;

my $graphio = Bio::Graph::IO->new(-file   => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
print "$graphio";
my $graph   = $graphio->next_network();
print "$graph->nodes\t";
$graph->remove_dup_edges();
my @un=$graph->unconnected_nodes();
print "\nthe unconnected nodes are =@un";
my @n=$graph->subgraph();
print "\subgraph=@n\n";
#print "Please the protein-id whose clusering coefficient is to be detemined\n";
#my $v=<STDIN>;
my $density = $graph->density();
print "\ngraph density=$density\n";
my @graphs = $graph->components();
print "\nno of Connected components=$#graphs\n";
print "\nplease enter the protein-id whom you want to remove from the network\n";
my $no=<STDIN>;
$graph->remove_nodes($graph->nodes_by_id($no));
my $count = $graph->edge_count();
print "\nno of edges=$count\n ";
my $ncount = $graph->node_count();
print "\nno of nodes=$ncount\n ";

print"\nenter the protein  whose interactions is to be find "; 
my $x=<STDIN>;
my $node = $graph->nodes_by_id($x);
#print " this is $node\n";
my @neighbors = $graph->neighbors($node); 
print "to check";
print join",",map{$_->object_id()} @neighbors;
my @nodes = $graph->nodes();
print "\nno of nodes = @nodes\t\n";
my @hubs;
foreach my $nodi (@nodes) 
 {
  if ($graph->neighbor_count($node) > 10) 
      {
       push @hubs, $nodi;
      }
  }
  
foreach my $r(@hubs)
  {
     my @y=@$r;
      print "the following proteins have > 10 interactors=@y\n";
  }
  #siblingual protein

 my @edgeref = $graph->articulation_points();
 print "no of articulation points=$#edgeref\n";
 print "please enter the protein whom you want to check for articulation point \n ";
 my $nod=<STDIN>;
  # make pathgen graph
  my $grap = Bio::Graph::IO->new(-file   => 'org.txt',-format => 'dip');
  my $gra   = $grap->next_network();
  $graph->remove_dup_edges();
  $graph->union($gra);
  my @duplicates = $graph->dup_edges();
  print "these interactions exist in cere and c.elegan\n=@duplicates";
  print "please enter the first protein for identifiaction of shortest path\n";
  my $p1=<STDIN>;
  print "please enter the second  protein for identifiaction of shortest path\n";
  my $p2=<STDIN>;
  
    my @a=$graph->shortest_paths();
 print "shortest path=@a\t\n";
    
  
with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060512/fe287972/attachment-0003.obj>

From chen_li3 at yahoo.com  Thu May 11 13:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you, I finally
came up with a script for my personal needs:
1) batch primer design
2) setting some of the parameters instead of using all the
default values
3) outputting only part of the information for the first
pair of primers rather than all of them (but you can
choose)
4) results that can be exported into Excel for my
convenience.

Enclosed are the script and the test results.  I
also include some lines showing how I figured out which
keys/entries are available for change. If you don't
want the sequence part, just add # to comment it out.

Any comments are welcome.

BTW the solution suggested by Dr. Cui and Paul doesn't
work for me.

Once again thank you very much,

Li  

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment-0003.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment-0003.txt>

From Marc.Logghe at DEVGEN.com  Fri May 12 11:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,
What actually is the problem? Do you get errors? Is the script not
behaving as you expect?
You might also attach the input file sample1.txt so that people can try
it.
Regards,
Marc
  



From stoltzfu at umbi.umd.edu  Fri May 12 11:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID: <A52F256F-A851-4429-A5B1-D3162A344790@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to facilitate
comparative analysis using evolutionary methods by 1) managing
evolutionary relationships (by linking data to trees) and 2) allowing
coordinated analysis of different types of data (by implementing a
generic concept of "character-state" data).  Bio::CDAT would leverage
existing BioPerl objects and include the functionality of Rutger Vos's
Bio::Phylo.  It would provide the framework to develop interfaces to
analysis tools (phylogeny inference, evolutionary rate models,
functional shift inference, etc.), as well as to file formats and
visualization methods appropriate for such analyses.  A proposal is
available at

   http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on  
"Questions to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel


From sdavis2 at mail.nih.gov  Fri May 12 11:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID: <C08A2811.B6B5%sdavis2@mail.nih.gov>


On 5/12/06 9:55 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> hello
> I am a studnt at Center for DNA Finger Printing and Diagnostics(CDFD).
> I am working on protein protein interaction but I am unable to use the protein
> interaction module i.e. ProteinGraph.pm..
> Actially I am facing lots of problem in the programme I have written Please
> help me since last four months I am not able to solve the same problem..
> I am pasting my programe here also I am attaching it also. ......

You haven't really told us what you are trying to do or what problems you
are having.

Sean


From cjfields at uiuc.edu  Fri May 12 13:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
> 
> if ($rank eq 'species') {
>    # get rid of genus from species name
>    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have
everything in the classification, from superkingdom down to species (no
strain unfortunately, and I'm not sure about subspecies); if it's missing
the rank then the designation doesn't exist or is tagged as 'no rank'.  As
I mentioned before, I'm not intimately familiar with Bio::Taxonomy,
Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how
everything is parsed and plugged into Bio::Taxonomy objects.  I do know
that XML::Twig is used for parsing through the data so it shouldn't be too
hard to change what you want.

I haven't tried using Bio::DB::Taxonomy directly yet, but I would have
thought that the binomial is just built from the XML twig 'LineageEx'
Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and
species from 'Species', and that the scientific name is from the tag
'ScientificName'.  Guess not. 

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
> 
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
> 
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>    $sci_name =~ s/$last_raw//;
>    $sci_name =~ s/^\s+//;
> }
> 
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?

I don't think taking Genus/Species directly from the scientific name
(normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for
EMBL) is the best way to go about it since it's really a best guess using
regex; Jason pointed out several examples where this falls apart, and being
a bacterial man I have found many examples myself.  I'm also not sure that
forcing a lookup for every TaxID in every sequence every time it's passed
through SeqIO is the best way to go either, though I think it should be
required for storing sequences.  It's a tricky balance.  

I still think that maybe we should absolve ourselves from using
SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than
strictly annotation, or reconstruct Bio::Species to maybe a
Bio::Annotation::Species object to handle that annotation and either
deprecate Bio::Species or separate it completely from any Bio::Taxonomy
objects.  It would really simplify things.  Then, if anyone is interested in
taxonomy, either install a local database or use Entrez efetch, and then use
Bio::DB::Taxonomy (fixed of course) to grab the TaxID info.  Seems like
we're running more and more into exceptions to the rule as more genomes are
made available.

Anyway, using Bio::Species for GenBank is really screwy for bacterial names,
so currently I get around BioPerl issues with bacterial names by grabbing
the 'source' seqfeature and pulling the 'organism' tag out.  But it really
shouldn't be that obfuscated, right?

Chris

> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sdavis2 at mail.nih.gov  Sat May 13 08:19:21 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 May 2006 08:19:21 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com>
References: <20060513041853.16091.qmail@webmail31.rediffmail.com>
Message-ID: <4465CEC9.2010909@mail.nih.gov>

saurabh maheshwari wrote:
>  
> hello
> Thanks for your prompt reply.
> Actaully I am trying to make a protein interaction graph from a dip 
> file.But I am not able to do so.In my last mail I have already attached 
> my program which is giving some error and I am not able troble shot 
> them.Please help
> Thanks

I meant that since we don't know what error(s) you are getting, it is 
really not possible to determine what the problem is.  Also, someone 
else on the list offered to look at your code if you were to provide the 
input file.  I find it helpful to look at this webpage every now and 
then to remind myself what constitutes a useful question to email lists:

http://www.catb.org/~esr/faqs/smart-questions.html

Sean




From s_maheshwari84 at rediffmail.com  Sat May 13 01:17:58 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 13 May 2006 05:17:58 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com>

  
hello
I am very happy to see the prompt replies from the group members.
As you all suggested, I have attached the required files.
I have attached all three files: first the input file, second an error file in which I saved the errors I was getting, and third the program file.
There is something in the error file that I want to understand.
Here is one line from it:
## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
What does this stand for?
Second, I want to get a connected graph from my data.
Let me explain the type of connected graph I mean with an example.
Suppose there are five objects connected in this way:
A connected to B
A connected to C
B connected to C
D connected to C
E connected to A
I want to create one whole link between all five.
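
The five-object example above can be checked in plain Perl without any Bio::Graph module. The following is only a sketch under that assumption: the adjacency hash and the helper connected_components are hypothetical names, and this breadth-first search is not how Bio::Graph::ProteinGraph implements components().

```perl
use strict;
use warnings;

# The five links from the example, as pairs of node names.
my @edges = (['A', 'B'], ['A', 'C'], ['B', 'C'], ['D', 'C'], ['E', 'A']);

# Build an undirected adjacency list.
my %adj;
for my $edge (@edges) {
    my ($u, $v) = @$edge;
    $adj{$u}{$v} = 1;
    $adj{$v}{$u} = 1;
}

# Hypothetical helper: return the connected components, each as an
# array ref of sorted node names, found by breadth-first search.
sub connected_components {
    my ($adj) = @_;
    my (%seen, @components);
    for my $start (sort keys %$adj) {
        next if $seen{$start};
        $seen{$start} = 1;
        my @queue = ($start);
        my @members;
        while (@queue) {
            my $node = shift @queue;
            push @members, $node;
            for my $next (sort keys %{ $adj->{$node} }) {
                next if $seen{$next}++;   # skip nodes already queued
                push @queue, $next;
            }
        }
        push @components, [ sort @members ];
    }
    return @components;
}

my @components = connected_components(\%adj);
print scalar(@components), " component(s): ",
      join(' | ', map { join(',', @$_) } @components), "\n";
```

For these five links every object is reachable from every other, so the sketch reports a single component containing A, B, C, D and E.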


Please help me I am not getting the result


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.dip
Type: application/octet-stream
Size: 5794 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0006.obj>
-------------- next part --------------
bash-2.05b$ perl from.pl
Bio::Graph::ProteinGraph=HASH(0x1182e70)
Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes
the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160)

graph density=0.00826446280991736

no of Connected components=60

please enter the protein-id whom you want to remove from the network
XMECF2

no of edges=61

no of nodes=122

enter the protein  whose interactions is to be find XMECF2
XMECF2
 interacts with map{->object_id()}

no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850
) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq::
RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH
(0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40)
Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri
chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0
x1198250) Bio::Seq::RichSeq=HASH(0x11d0240) Bio::Seq::RichSeq=HASH(0x11c8f20) Bi
o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich
Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1
1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio:
:Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe
q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c
b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S
eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq=
HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e
60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq
::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA
SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700
) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq::
RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH
(0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0)
Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri
chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0
x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi
o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich
Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1
1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio:
:Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe
q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c
4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S
eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq=
HASH(0x11c8e30) Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4
20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq
::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA
SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530
) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq::
RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH
(0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40)
Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri
chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0
x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi
o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich
Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1
1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio:
:Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe
q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a
d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S
eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq=
HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6
90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq
::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA
SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0
)
Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib
rary//Bio/Graph/ProteinGraph.pm line 477, <STDIN> line 2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0007.obj>

From cjfields at uiuc.edu  Sat May 13 14:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally
honest.  I have not looked at any of the Bio::Graph modules and have no idea
how they are implemented, and I haven't looked at your input file, but I can
tell right off the bat your script has major logic problems.  I can also
pretty much  tell that you don't understand the object model we use here, at
all.  This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> what this stand for

Did you cut and paste from several other scripts hoping that it would work?
I say that b/c you mix styles quite frequently here, using objects correctly
(deref'ing with '->') and incorrectly (print "$object").  You also declare
(and redeclare) @ISA four times for a script (not needed unless you're
declaring a class and inheriting methods from other modules).  You also use
@ISA once with a misspelled module name (I don't think there is a module
named 'Expoerter').  So, I'm actually stunned that the script doesn't crash
at all.  Yikes!
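
To illustrate the point about @ISA, here is a small self-contained sketch. The package names My::Base and My::Derived are hypothetical and have nothing to do with the Bio::Graph modules; the only claim is that @ISA belongs in a class that inherits, not in a script that merely calls methods.

```perl
use strict;
use warnings;

# A base class with a constructor and one method.
package My::Base;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub hello { my ($self) = @_; return 'hello from ' . ref($self) }

# A derived class: this is the one place where @ISA is needed,
# because My::Derived inherits new() and hello() from My::Base.
package My::Derived;
our @ISA = ('My::Base');

# The script itself only creates objects and calls methods,
# so it declares no @ISA of its own.
package main;
my $obj = My::Derived->new(name => 'demo');
print $obj->hello, "\n";
```

In a script that only uses BioPerl classes, the 'use Bio::...' lines are enough; the modules set up their own inheritance.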

Okay, brutal honesty time over.  Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

means that what you are printing out is a reference to an object (it refers
to the object class and the location in memory) and is NOT what you want.
You should be doing something along the lines of $object->method, not 'print
$object', to get at the object data and methods.  You use this several times
in your script already; that should be a big hint as the areas where it
doesn't work do not use this syntax.  Read the documentation for the many
varied modules you use in your script.  Look at script examples.  Start
simply, then work your way up.  

Also, using the '->' dereferencing operator inside double quotes doesn't
work; you have to do something like:

print $graph->nodes,"\t";

not 

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.
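
A minimal sketch of the difference, using a hypothetical stand-in class Demo::Graph rather than the real Bio::Graph::ProteinGraph (its nodes and node_count methods here are assumptions made up for the demo):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a graph object.
package Demo::Graph;
sub new        { my ($class, @nodes) = @_; return bless { nodes => [@nodes] }, $class }
sub nodes      { my ($self) = @_; return @{ $self->{nodes} } }
sub node_count { my ($self) = @_; return scalar @{ $self->{nodes} } }

package main;
my $graph = Demo::Graph->new(qw(A B C));

# Wrong: only $graph is interpolated, followed by the literal text "->nodes".
my $wrong = "$graph->nodes";

# Right: call the method outside the quotes and concatenate ...
my $count = 'count=' . $graph->node_count;

# ... or, to keep it inside a string, interpolate an anonymous array.
my $list = "nodes=@{[ $graph->nodes ]}";

print "$wrong\n$count\n$list\n";
```

The first line printed looks like Demo::Graph=HASH(0x...)->nodes, the same symptom as in the output above; the other two lines show the method results.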

If any of what I just said doesn't make any sense, you really need to pick
up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and
'Programming Perl' by Wall et al.  I don't know if anyone can really help at
this point w/o completely writing the script for you.  We will fix problems
to a point but we, for the most part, will not do your work for you.

Chris



From hubert.prielinger at gmx.at  Sat May 13 23:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than BLAST output files, 
such as the text output files from the search tool mpSrch that is offered by
EBI? I ask because WU-BLAST output files can already be parsed with BioPerl.

thanks
Hubert


From arareko at campus.iztacala.unam.mx  Sun May 14 00:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at 
the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great 
contribution to the BioPerl project!

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Sun May 14 12:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them.  If you want you could create a new plugin
module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or
fasta to get an idea of how it works.  Judging by the mpsrch output I'm
pretty sure you would have to build a custom plugin for it.  

A viable alternative: looking through the mail list it looks like mpsrch is
a multiprocessor implementation of ssearch, itself an implementation of the
Smith-Waterman algorithm for local alignments in the FASTA package of
programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...

Chris



From chen_li3 at yahoo.com  Sun May 14 13:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get a reverse-complementary sequence out of a
fasta sequence file. The Synopsis of Bio::Seq
shows that I can do it this way:

$revcom=$seqobj->revcom();

I used the following script to try to get the job done,
but it doesn't work. Then I read the documentation of
Bio::Seq, and it looks like it doesn't contain a revcom
method.

Any idea will be appreciated.

Li 


###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq; 
use Bio::SeqIO;     
       
my $file='c:/perl/local/primer3_1.0.0/src/est.txt';   
 
    
my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                            -format=>'fasta' );
                            
    my $seqobj=$seqIO->next_seq();#create object  
    
  print "what attributes/keys are available:\n";    
  for my $key (sort keys %$seqobj){
           my $value=$seqobj->{$key};
	    print "$key\t=>\t$value\n"
	    }
# These are the output on the screen	    
#primary_id =>      gi|54093|emb|X61809.1|
#primary_seq =>     Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can be
#accessed right away
# as to primary_seq it is an object of class
#Bio::PrimarySeq and, according to the
#documentation, it provides the following methods:
                #new   
		#seq 
		#validate_seq 
		#subseq 
		#length 
		#display_id
		#accession_number 
		#primary_id 
		#alphabet 
		#desc 
		#can_call_new
		#id 
		#is_circular 
		#object_id
		#version 
		#authority 
		#namespace 
		#display_name 
		#description 
    
print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n"; 
print "revcom=",$seqobj->revcom,"\n\n";
                      
        my $now_time=localtime;
        print  $now_time, "\n\n";  
        exit;

 #These are the output on the screen 
	#primary_id=gi|54093|emb|X61809.1|
	#id=gi|54093|emb|X61809.1
	#revcom=Bio::Seq=HASH(0x10493304)
	#Sun May 14 12:45:20 2006

      


From cjfields at uiuc.edu  Sun May 14 13:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

	#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here.  The wiki gives the way to get the
rev. comp. string as '$seq->revcom->seq', not '$seq->revcom'.

When I ran your script with that line changed to the wiki version I got
(using my test seq):

what attributes/keys are available:
primary_id      =>      test,
primary_seq     =>      Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris



From chen_li3 at yahoo.com  Sun May 14 14:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the
link for this syntax: $seq->revcom->seq?

Li




From cjfields at uiuc.edu  Sun May 14 14:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <b3ef767e.b86a2fe8.820dd00@expms6.cites.uiuc.edu>

I think the confusion lies in what revcom returns.  This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this
page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom.  '$seq_obj->revcom' returns a 
sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string.

Hence, '$seq_obj->revcom->seq'.
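To make the object-versus-string distinction concrete without needing a BioPerl install, here is a minimal sketch. MiniSeq is a hypothetical stand-in class written only for this illustration; it is not Bio::Seq and does not reproduce BioPerl's implementation. It only mimics the one behaviour discussed in this thread: revcom returns a new object, and seq returns the string held by that object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a sequence class (NOT BioPerl code).
    package MiniSeq;

    sub new {
        my ($class, %args) = @_;
        return bless { seq => $args{-seq} }, $class;
    }

    # Accessor: returns the plain sequence string.
    sub seq { return $_[0]{seq} }

    # Returns a NEW object holding the reverse complement, not a string.
    sub revcom {
        my ($self) = @_;
        my $rc = scalar reverse $self->seq;   # reverse the string
        $rc =~ tr/ACGTacgt/TGCAtgca/;         # complement each base
        return ref($self)->new(-seq => $rc);
    }
}

my $seqobj = MiniSeq->new(-seq => 'AAGGTC');

# Printing the object itself shows a reference, e.g. MiniSeq=HASH(0x...)
print "revcom=", $seqobj->revcom, "\n";

# Calling seq on the returned object gives the string: GACCTT
print "revcom=", $seqobj->revcom->seq, "\n";
```

The first print reproduces the kind of HASH(0x...) output seen in the original script; the second shows why chaining ->seq onto revcom is needed.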

Chris



From Marc.Logghe at DEVGEN.com  Sun May 14 16:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,
> doesn't work. Then I read documentation of Bio::Seq and it 
> looks like it doesn't contain revcom method.
Here the Deobfuscator interface that Mauricio announced earlier comes in
handy:
http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=
If you look in the methods table, you will find that the revcom
method is inherited from, and implemented by, Bio::PrimarySeqI.
HTH,
Marc 


From sb at mrc-dunn.cam.ac.uk  Mon May 15 04:18:11 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 09:18:11 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine>
References: <000f01c675e6$a61bde90$15327e82@pyrimidine>
Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu Bala wrote:
>> In bioperl up to at least 1.5.1, when one of the database modules 
>> comes across a species rank it does:
>> 
>> if ($rank eq 'species') {
>>     # get rid of genus from species name
>>     (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
>> }
> 
> The XML example from NCBI Taxonomy I mentioned previously seems to 
> have everything in the classification, from superkingdom down to 
> species (no strain unfortunately, and I'm not sure about subspecies);
> if it's missing the rank then the designation doesn't exist or is 
> tagged as 'no rank'.  Like I mentioned before I'm not intimately 
> familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I 
> don't have a clue as to how everything is parsed and plugged in to 
> Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> through the data so it shouldn't be too hard to change what you
> want.

Yes, that's all true, but I'm not sure what it has to do with what I was
saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
own implementation I change the rank of all 'no rank' Nodes below
species to 'variant'.


> I haven't tried using Bio::DB::Taxonomy directly yet, but I would 
> have thought that the binomial is just built from the XML twig 
> 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> tag 'Genus' and species from 'Species', and that the scientific name
> is from the tag 'ScientificName'.  Guess not.

No. See above for what it actually does. That is a copy/paste from the
code (there, $taxon_name == ScientificName). When it finds a species
rank it does that split because in the NCBI taxonomy database the
'genus' rank for a human has a ScientificName of 'Homo', whilst the
'species' rank has a ScientificName of 'Homo sapiens', and the bioperl
model (quite rightly, I think) wants the 'species' node to not have
information of other nodes (well, except for the classification array).
So it removes the 'Homo' from 'Homo sapiens', giving a species name of
'sapiens'. This then allows the binomial method to return 'Homo sapiens'
instead of 'Homo Homo sapiens'.

(though in a bizarre twist, and this is one of my problems with how
names are currently represented in the Taxonomy modules, 'Scientific
Name' and 'binomial' are synonymous)


[snip]
>> My solution is to just remove whatever is the same between the 
>> current rank and the previous rank. Maybe even that's not so 
>> perfect, but it must be a lot better than turning the species 
>> 'Avian leukosis virus' into the species 'virus' (especially given 
>> that the genus here is 'Alpharetrovirus')!
> 
> I don't think taking Genus/Species directly from the scientific 
> name (normally what is in the SOURCE or ORGANISM annotation for 
> GenBank or OS for EMBL) is the best way to go about it [snip]

Perhaps, but again I'm not sure what this has to do with what I was
saying. If you don't want your species name to contain your genus name
you have to do some kind of parsing. My post merely pointed out that the
parsing currently in bioperl does not work for viruses and possibly
other species. I'd like to think that someone cares about this error and
would do the simple fix I offered, or that they already know about the
problem and have done their own fix.
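One possible reading of the fix described above ("remove whatever is the same between the current rank and the previous rank") can be sketched in plain Perl. This is only an illustration under stated assumptions: strip_parent_name is a hypothetical helper name, not a Bio::DB::Taxonomy method, and it simply drops the parent rank's name from the front of the child's name when, and only when, the child's name really starts with it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): remove the parent rank's
# name from the start of a child's ScientificName, leaving the name
# untouched when it does not begin with the parent's name.
sub strip_parent_name {
    my ($name, $parent_name) = @_;
    return $name unless defined $parent_name && length $parent_name;
    if ($name =~ /^\Q$parent_name\E\s+(.+)$/) {
        return $1;
    }
    return $name;
}

# Genus 'Homo', species 'Homo sapiens': the genus is stripped.
print strip_parent_name('Homo sapiens', 'Homo'), "\n";
# prints: sapiens

# Genus 'Alpharetrovirus' does not prefix the species name: left alone.
print strip_parent_name('Avian leukosis virus', 'Alpharetrovirus'), "\n";
# prints: Avian leukosis virus
```

Unlike an unconditional split on whitespace, this never discards the first word of a name unless that word (or words) matches the rank above it.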


> I'm also not sure that forcing a lookup for every TaxID in every 
> sequence every time it's passed through SeqIO is the best way to go 
> either, though I think it should be required for storing sequences. 
> It's a tricky balance.

In my own implementation any database lookups are cached, and you have
the option of not doing any database lookup at all and 'faking' a
taxonomy from the supplied list of names (so it works just like normal
Bio::Seq).


> I still think that maybe we should absolve ourselves from using 
> SOURCE/ORGANISM or OS/OC information in GenBank files as anything 
> more than strictly annotation, or reconstruct Bio::Species to maybe a
>  Bio::Annotation::Species object to handle that annotation and either
>  deprecate Bio::Species or separate it completely from any 
> Bio::Taxonomy objects.  It would really simplify things.  Then, if 
> anyone is interested in taxonomy, either install a local database or
>  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
>  to grab the TaxID info.

My personal view is that having it as an annotation would serve no real
purpose. For me the whole point of any kind of species representation in
bioperl is to allow you to compare species in a biologically meaningful
way. If it's just some annotation then that means it's basically
free-form text and you have no guarantee that two sequences from the
same species are annotated exactly the same - no guarantee that your
code would identify that those sequences are from the same species.
The only other useful thing that a species object needs to do is let you
know how related two different species are - you need to be able to ask
what a species' class, kingdom etc. are. Again, not viable with an
annotation - you need something strict like a properly constructed Taxonomy.

I guess it comes down to the philosophy of parsing a file. Do you try
and reflect exactly what the file contains, letter for letter, so that
your resulting object can recreate that file letter for letter, or do
you parse the file and extract the correct /meaning/ in order to be more
useful?
I think there can be a choice by the user, and this is best done by
making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
as in my own implementation.


From s_maheshwari84 at rediffmail.com  Mon May 15 04:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

  
Hello All
I sent this problem to the list earlier, but it is still unsolved, so I have restated it. Could anybody give me code to build a graph from items listed in a text file in the following format?
Example:
Each line means item 1 interacts with item 2. I want to build the graph, then give any item as input and get back all interactions of that item.

item 1      item 2 
A            B
A            C
C            B
D            B
D            E
A            F
G            A     

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From sdavis2 at mail.nih.gov  Mon May 15 06:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>


On 5/15/06 4:15 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> Hello All
> I have sent a problem to the earlier also but my problem is still unsolve so i
> have modified the problem in another way please can any body give me code to
> make a graph between some items which are in a text file in the following
> formate:
> Example
> item1 interacts with item2 and i want to make graph by giving any item as
> input and asking all interactions of that item.
> 
> item 1      item 2
> A            B
> A            C
> C            B
> D            B
> D            E
> A            F
> G            A   

Not a bioperl answer, but in your case, I would suggest looking at using
cytoscape to do this.  Look here for details:

http://www.cytoscape.org/

Sean


From sdavis2 at mail.nih.gov  Mon May 15 07:03:28 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 07:03:28 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>
Message-ID: <C08DD840.B7DE%sdavis2@mail.nih.gov>


On 5/15/06 6:26 AM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:

> 
> 
> 
> On 5/15/06 4:15 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
> wrote:
> 
>>   
>> Hello All
>> I have sent a problem to the earlier also but my problem is still unsolve so
>> i
>> have modified the problem in another way please can any body give me code to
>> make a graph between some items which are in a text file in the following
>> formate:
>> Example
>> item1 interacts with item2 and i want to make graph by giving any item as
>> input and asking all interactions of that item.
>> 
>> item 1      item 2
>> A            B
>> A            C
>> C            B
>> D            B
>> D            E
>> A            F
>> G            A  
> 
> Not a bioperl answer, but in your case, I would suggest looking at using
> cytoscape to do this.  Look here for details:
> 
> http://www.cytoscape.org/

I forgot to mention, if you are looking for a perl solution, I would look at
the Graph module.

http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod

You can create the graph according to the docs and then use the neighbors()
method (if I remember correctly) to get the nodes connected to the query
node.
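For a file this small, the same lookup can also be sketched with plain Perl hashes instead of the Graph module. The sketch below is that swapped-in approach, not the Graph API; build_graph and interactions_of are hypothetical names chosen for this example, and it assumes the "item 1 / item 2" header line has already been skipped and that interactions are undirected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an undirected adjacency map from "item1 item2" pairs.
sub build_graph {
    my (@lines) = @_;
    my %adj;
    for my $line (@lines) {
        my ($x, $y) = split ' ', $line;
        next unless defined $y;    # skip blank or malformed lines
        $adj{$x}{$y} = 1;          # record the interaction both ways
        $adj{$y}{$x} = 1;
    }
    return \%adj;
}

# Return the sorted list of items that interact with $item.
sub interactions_of {
    my ($adj, $item) = @_;
    my @partners = sort keys %{ $adj->{$item} || {} };
    return @partners;
}

# The pairs from the example table in the question.
my $graph = build_graph('A B', 'A C', 'C B', 'D B', 'D E', 'A F', 'G A');

print join(' ', interactions_of($graph, 'A')), "\n";
# prints: B C F G
```

Reading the pairs from a file instead would only change how the list passed to build_graph is produced.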

Sean


From akarger at CGR.Harvard.edu  Mon May 15 08:20:11 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 15 May 2006 08:20:11 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>

This tool is quite nice, and may save me a lot of perldoc'ing.

A couple of minor interface thoughts. 

1) There are quite a lot of methods for many of the classes. As such, I
think I'll often want to browse through what's available in a class. But
60% or so of the screen real estate is used for "Enter a search
string... OR select a class from the list". IMO, it would be better to
have two pages, a search page and a result page.   It only takes a click
on Back (or a "new search" button) to get to a new search, and now you
can use your whole screen for reading your results.

2) Please sort the "select a class from the list" alphabetically. I
guess I can enter a search term to get the right classes, but it would
be nice to be able to browse.
2a) if you want to be really fancy, make a javascript nested menu with
expandable submenus. OK, maybe not.

3) Minimalist is nice, but documentation is even nicer. It wasn't clear
to me that the search searches within class names rather than function
names. What I really want to know sometimes is which module has, say,
the revcom method in it. So, if it's not easy to include that within
this search, then at least tell me what my search space is.

4) When I search for something that's not found, I get a screen that
looks pretty familiar, with the extra text "No match to string found"
down at the bottom. It took me a while to even notice it. (Studies show
that most users don't read most of the text on a page.) Bold might be
nice here. Or put the error at the top of the screen. Or both.

5) I'll save my stupidest comment for last - please make the page title
"Bioperl Deobfuscator", so that when I bookmark it I'll know what the
bookmark stands for.

Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From sb at mrc-dunn.cam.ac.uk  Mon May 15 09:08:32 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 14:08:32 +0100
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
References: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk>

Amir Karger wrote:
> This tool is quite nice, and may save me a lot of perldoc'ing.

Yes, many thanks to everyone involved.


> A couple of minor interface thoughts. 
> 
> 1)There's quite a lot of methods for many of the classes. As such, I
> think I'll often want to browse through what's available in a class. But
> 60% or so of the screen real estate is used for "Enter a search
> string... OR select a class from the list". IMO, it would be better to
> have two pages, a search page and a result page.   It only takes a click
> on Back (or a "new search" button) to get to a new search, and now you
> can use your whole screen for reading your results.

As the compromise it must be, I like the way it behaves. I don't like 
lots of windows. I especially don't like pop up windows. Right now when 
I'm using the bioperl docs I tend to have a whole bunch of tabs open to 
different class pages at once, so being able to see an overview all on 
one page in Deobfuscator is very nice.

Further to that, I'd love it if clicking on a method name caused an 
in-place css(&|javascript) reveal (similar to how a well implemented 
drop down menu works in a website) rather than a new window opened. 
Alternatively, just have more columns in the results table, ie. usage, 
function, returns, args columns. I feel that opening a window for each 
method you want to understand is far too slow.

I'd also really like a link to the code for the method as well. The 
bioperl docs are rarely complete enough that you can really understand 
what every method is supposed to do without looking at the code.


> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> to me that the search searches within class names rather than function
> names. What I really want to know sometimes is which module has, say,
> the revcom method in it.

This would be a great feature to add.


Another minor interface thought:
6) Have a little more cell padding in all the tables. Things are just a 
little too cramped and things start to look messy/ run into each other.


From cjfields at uiuc.edu  Mon May 15 09:59:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 08:59:57 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk>
Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 8:09 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Amir Karger wrote:
> > This tool is quite nice, and may save me a lot of perldoc'ing.
> 
> Yes, many thanks to everyone involved.

The Deobfuscator currently indexes bioperl-1.4, so it's not completely
up-to-date.  I believe Mauricio and Dave may be working on updating to the
newer versions and maybe bioperl-live, as well as getting the other bioperl
packages up and running.

For modules added after v1.4 I use the script from the FAQ question mentioned
on the Deobfuscator wiki page to get up-to-date methods, then grab the
ActiveState HTML'd perldocs that get generated when installing using PPM (I
make a custom PPM/PPD file and install it myself every once in a while):

#!/usr/bin/perl -w
use Class::Inspector;
$class = shift || die "Usage: methods perl_class_name\n";
eval "require $class";
print join ("\n", sort @{Class::Inspector-
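The listing above breaks off partway through its last line. As a separate illustration of the same idea (listing a class's methods, including inherited ones, together with the class that defines each), here is a sketch that uses only core Perl. It is not the FAQ's Class::Inspector script: it swaps in mro::get_linear_isa plus a symbol-table walk, and methods_with_origin, Animal and Dog are hypothetical names made up for the demo. Note that any function imported into a package would also show up as a "method" here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Two tiny demo classes (hypothetical, for illustration only).
{ package Animal; sub new { return bless {}, shift } sub name { 'animal' } sub speak { '' } }
{ package Dog; our @ISA = ('Animal'); sub speak { 'Woof' } }

# Map each method name to the first class in the method resolution
# order that defines it, which is the class Perl would dispatch to.
sub methods_with_origin {
    my ($class) = @_;
    my %origin;
    no strict 'refs';
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        for my $name (keys %{"${pkg}::"}) {
            next if $name =~ /::$/;                   # nested package, not a sub
            next if exists $origin{$name};            # already found earlier in MRO
            next unless defined &{"${pkg}::${name}"}; # keep only real subs
            $origin{$name} = $pkg;
        }
    }
    return %origin;
}

my %m = methods_with_origin('Dog');
print "$_\t$m{$_}\n" for sort keys %m;
# prints:
# name   Animal
# new    Animal
# speak  Dog
```

Run against a loaded BioPerl class, the same walk would be expected to show inherited methods alongside the interface or base class they come from, which is the information the Deobfuscator's methods table presents.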

> > A couple of minor interface thoughts.
> >
> > 1)There's quite a lot of methods for many of the classes. As such, I
> > think I'll often want to browse through what's available in a class. But
> > 60% or so of the screen real estate is used for "Enter a search
> > string... OR select a class from the list". IMO, it would be better to
> > have two pages, a search page and a result page.   It only takes a click
> > on Back (or a "new search" button) to get to a new search, and now you
> > can use your whole screen for reading your results.
> 
> As the compromise it must be, I like the way it behaves. I don't like
> lots of windows. I especially don't like pop up windows. Right now when
> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
> different class pages at once, so being able to see an overview all on
> one page in Deobfuscator is very nice.
>
> Further to that, I'd love it if clicking on a method name caused an
> in-place css(&|javascript) reveal (similar to how a well implemented
> drop down menu works in a website) rather than a new window opened.
> Alternatively, just have more columns in the results table, ie. usage,
> function, returns, args columns. I feel that opening a window for each
> method you want to understand is far too slow.

Agreed.

> I'd also really like a link to the code for the method as well. The
> bioperl docs are rarely complete enough that you can really understand
> what every method is supposed to do without looking at the code.

The methods that pop up are in columns along with the class module that
implements the method.  


If you click on that link you get PDOC documentation for the module which
includes most of the code (strangely, though Deobfuscator indexes bioperl
1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
something a bit more detailed?

> > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> > to me that the search searches within class names rather than function
> > names. What I really want to know sometimes is which module has, say,
> > the revcom method in it.

That's listed in the method results table (the next column has the module
with a link to the module's online docs).


Chris


> This would be a great feature to add.
> 
> 
> Another minor interface thought:
> 6) Have a little more cell padding in all the tables. Things are just a
> little too cramped and things start to look messy/ run into each other.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon May 15 12:08:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 11:08:30 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk>
Message-ID: <001601c67839$cf289490$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 3:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
> subspecies/variant names
> 
> Chris Fields wrote:
> > Sendu Bala wrote:
> >> In bioperl up to at least 1.5.1, when one of the database modules
> >> comes across a species rank it does:
> >>
> >> if ($rank eq 'species') { # get rid of genus from species name
> >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); }
> >
> > The XML example from NCBI Taxonomy I mentioned previously seems to
> > have everything in the classification, from superkingdom down to
> > species (no strain unfortunately, and I'm not sure about subspecies);
> > if it's missing the rank then the designation doesn't exist or is
> > tagged as 'no rank'.  Like I mentioned before I'm not intimately
> > familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> > don't have a clue as to how everything is parsed and plugged in to
> > Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> > through the data so it shouldn't be too hard to change what you
> > want.
> 
> Yes, that's all true, but I'm not sure what it has to do with what I was
> saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
> own implementation I change the rank of all 'no rank' Nodes below
> species to 'variant'.

Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'.  Guess not.
> 
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
> 
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)
 
Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
deal with it.  I also noticed that subspecies also contains the entire
string:

    <Taxon>
      <TaxId>135461</TaxId>
      <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
      <Rank>subspecies</Rank>
    </Taxon>
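
To make the quoted split concrete, here is a minimal sketch in plain Perl (no
BioPerl needed).  The helper name node_name is hypothetical; only the 'if'
body is taken from the code quoted above, and the names are the ones
mentioned in this thread.  It shows that only the species rank is stripped,
so a subspecies node keeps the genus and species in its name:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the rank check quoted from the 1.5.1 code.
sub node_name {
    my ($rank, $taxon_name) = @_;
    if ($rank eq 'species') { # get rid of genus from species name
        (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    }
    return $taxon_name;
}

my @nodes = (
    [ genus      => 'Homo' ],
    [ species    => 'Homo sapiens' ],
    [ subspecies => 'Bacillus subtilis subsp. subtilis' ],
);

for my $node (@nodes) {
    my ($rank, $name) = @$node;
    printf "%-10s '%s' -> '%s'\n", $rank, $name, node_name($rank, $name);
}
```

The species node comes out as 'sapiens', while the genus and subspecies
nodes are returned unchanged.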

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
I almost never get the actual scientific name for the node (the one on the
GenBank ORGANISM line); I get the name with the strain chopped off instead,
and in a number of cases the name gets mangled.  The regexes below only
grab from the topmost tags:

Script:
---------------------------------
#! perl
use strict;
use warnings;

use Bio::DB::Taxonomy;

# XML file saved from NCBI Taxonomy; defaults to the file used for this post.
my $file = shift(@ARGV) || 'tax.xml';

print "\nNCBI XML output ScientificName tag for each node:\n";
my @taxid = ();
open (my $taxfh, '<', $file) or die "Can't open $file: $!\n";
while (<$taxfh>){
	# The \s{2} anchor matches only the topmost (least-indented) tags.
	if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
		print "$1\t";
		push @taxid, $1;
	}
	print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close $taxfh;

print "\nBio::DB::Taxonomy scientific_name:\n";
# One Entrez-backed factory is enough; no need to rebuild it for every ID.
my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
for my $id (@taxid){
	my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
	print $node->ncbi_taxid,"\t",$node->scientific_name,"\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------
Note Bacillus subtilis in the Bio::Tax output above.  Not one of those is
the scientific name as defined by NCBI (and most taxonomists for that
matter).

So, in a nutshell, there's a problem here.  I don't know if your fix works
for that, but I definitely don't think the 'scientific name' should be
assembled ad hoc; it should be taken from the ScientificName tag for that
node.  I am currently reduced to grabbing the feature whose primary_tag is
'source' and getting the 'organism' tag value from that.  I cannot stress
enough that it should NOT be that way.
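
For anyone who wants the same workaround, a rough sketch follows.  The sub
name organism_from_source is made up for illustration; the calls it relies
on (get_SeqFeatures, primary_tag, has_tag, get_tag_values, and Bio::SeqIO
with -format => 'genbank') are standard BioPerl.  Bio::SeqIO is only loaded
when a GenBank file is given on the command line:

```perl
use strict;
use warnings;

# Return the /organism qualifier of the first 'source' feature, or nothing.
sub organism_from_source {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';
        next unless $feat->has_tag('organism');
        my ($organism) = $feat->get_tag_values('organism');
        return $organism;
    }
    return;
}

if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        my $organism = organism_from_source($seq);
        print $seq->display_id, "\t",
              (defined $organism ? $organism : 'n/a'), "\n";
    }
}
```

This reads the name straight from the annotation, so nothing is split or
reassembled along the way.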

As for 'binomial' == 'scientific_name', I agree; I see it as well and that
should be fixed.
 
...
> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my
frustrations with Bio::Species (not Bio::DB::Taxonomy).  My point here was,
since there is no real way to surmise from a GenBank flatfile what the
taxonomic ranks are w/o guessing (which seems to break more often than not
when dealing with complex names), there shouldn't be any tie to Bio::Tax
objects, at least directly.  I guess methods could be incorporated into
Bio::Species for those who want to give it a try, but I would like to get a
GenBank file, for once, in which the scientific name/binomial name isn't
mangled by Bio::Species.

Back to Bio::DB::Taxonomy; I don't have a problem with implementing your
methods here; on the contrary, if they fix my problem above then I'll be
more than glad to.  I can't get to it immediately but maybe later
today/tomorrow.
 
> > I'm also not sure that forcing a lookup for every TaxID in every
> > sequence every time it's passed through SeqIO is the best way to go
> > either, though I think it should be required for storing sequences.
> > It's a tricky balance.
> 
> In my own implementation any database lookups are cached, and you have
> the option of not doing any database lookup at all and 'faking' a
> taxonomy from the supplied list of names (so it works just like normal
> Bio::Seq).
>
> 
> > I still think that maybe we should absolve ourselves from using
> > SOURCE/ORGANISM or OS/OC information in GenBank files as anything
> > more than strictly annotation, or reconstruct Bio::Species to maybe a
> >  Bio::Annotation::Species object to handle that annotation and either
> >  deprecate Bio::Species or separate it completely from any
> > Bio::Taxonomy objects.  It would really simplify things.  Then, if
> > anyone is interested in taxonomy, either install a local database or
> >  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
> >  to grab the TaxID info.
> 
> My personal view is that having it as an annotation would serve no real
> purpose. For me the whole point of any kind of species representation in
> bioperl is to allow you to compare species in a biologically meaningful
> way. If it's just some annotation then that means it's basically
> free-form text and you have no guarantee that two sequences from the
> same species are annotated exactly the same - no guarantee that your
> code would identify that those sequences are from the same species.
> The only other useful thing that a species object needs to do is let you
> know how related two different species are - you need to be able to ask
> what a species' class, kingdom etc. are. Again, not viable with an
> annotation - you need something strict like a properly constructed
> Taxonomy.

My point is, a large number of users do NOT use, nor care about, taxonomic
information to the degree they need to know the entire classification of the
organism; many are just as happy about getting the scientific name only,
which is in the GenBank/EMBL file itself.  To take one extreme, it is not
productive to force every user to download the NCBI tax database and use
lookups just to convert sequences from EMBL format to GenBank format.  It's
not productive to allow users to spam the NCBI tax database remotely either,
so hardcoding lookups is, IMHO, a big mistake.  

> I guess it comes down to the philosophy of parsing a file. Do you try
> and reflect exactly what the file contains, letter for letter, so that
> your resulting object can recreate that file letter for letter, or do
> you parse the file and extract the correct /meaning/ in order to be more
> useful?
> I think there can be a choice by the user, and this is best done by
> making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
> as in my own implementation.

I understand both philosophies, but the latter implies that you know the
intention of the ones submitting the sequence.  99.9% of the time that's
fine, something I can live with.  However, when we mess up something as
simple as getting the scientific name for an organism when the information
is directly in the flat file (ORGANISM line) by trying to 'imply' what the
classification is, yes, I get frustrated.  Even more frustrating to me is
that Bio::DB::Taxonomy, which should return accurate information directly
from the Taxonomy database, still manages to screw up the scientific name.  

The NCBI definition in the sample record:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

states that the ORGANISM line contains the formal scientific name and its
lineage (no ranking).  If the lineage is very long it is abbreviated, so you
don't get the same thing as you would through using TaxID.

So, in essence, I believe you are correct, that Bio::Species can be used as
a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with
caveats or warnings for possible inaccuracies.  I also believe that lookups
should be allowed but optional, not required (i.e. left up to the user, as
you state).  
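
As a rough illustration of "allowed but optional" lookups with caching, here
is a plain-Perl sketch.  Everything in it is hypothetical (make_taxon_lookup,
the 'enabled' switch, the %fake_db stand-in); in real use the fetch callback
could wrap a Bio::DB::Taxonomy call:

```perl
use strict;
use warnings;

# Build a lookup closure.  'fetch' is any code ref mapping a TaxID to a
# name; 'enabled' lets the caller switch database lookups off entirely.
sub make_taxon_lookup {
    my (%args) = @_;
    my $fetch   = $args{fetch};
    my $enabled = $args{enabled};
    my %cache;
    return sub {
        my ($taxid, $fallback) = @_;
        return $fallback unless $enabled;    # keep the flat-file name as-is
        $cache{$taxid} = $fetch->($taxid) unless exists $cache{$taxid};
        return defined $cache{$taxid} ? $cache{$taxid} : $fallback;
    };
}

# Stand-in for a remote or local taxonomy database.
my %fake_db = (9606 => 'Homo sapiens');
my $calls   = 0;

my $lookup = make_taxon_lookup(
    enabled => 1,
    fetch   => sub { $calls++; return $fake_db{ $_[0] } },
);

for my $i (1 .. 3) {
    my $name = $lookup->(9606, 'name from flat file');
    print "$name\n";
}
print "fetch called $calls time(s)\n";
```

With lookups enabled the database is asked once per TaxID; with
enabled => 0 the name from the flat file is passed through untouched.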

I just feel that it's somewhat misleading to imply, by delegating to
Bio::Taxonomy, that Bio::Species contains accurate taxonomic information
when NCBI themselves state that the GenBank flatfile classification can be
incomplete and does not supply rankings (genus, species) in the file.  It's
our best guess in most cases, and a best guess by definition is not very
accurate.  If you want taxonomic accuracy, use the TaxID and a local tax
database.  I feel that we shouldn't punish those who don't worry/care about
taxonomy by implementing Bio::Species with methods that mangle data that's
directly in the flat file they're parsing.

Okay, not to cut short this discussion, but I have to get back to $job.
I'll try adding your fixes in a bit later today/tomorrow; if they pass tests
I'll commit them in.

Chris


From hlapp at gmx.net  Mon May 15 12:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID: <C78E4724-CC95-483E-876B-69AF7C1CC6AF@gmx.net>

You found the right instance. Unfortunately with the way the bioperl  
swissprot parser works the group (RG) isn't promoted to author if  
there is no author in addition (in fact you may debate whether that  
would even be the best way of doing things), so it doesn't find it on  
second occurrence by unique key.

If you can live without this entry, or any other entry that causes a  
hiccup, just supply the flag --safe and it will gracefully move on to  
the next entry.

Fixing the issue would require either fixing the bioperl swissprot
parser (or Bio::Annotation::Reference) to stick the RG group into the
author slot if there is no author, or fixing Bioperl's
Bio::Annotation::Reference to also feature a group and biosql to use
it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)  
should just use that in place of a missing author?
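
A sketch of what that fallback could look like, assuming the reference
object provides authors() and rg() as Bio::Annotation::Reference does.  The
helper name author_or_group and the Local::FakeReference stand-in class are
made up so the example runs without BioPerl; this is not bioperl-db code:

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the RA authors, fall back to the RG group.
sub author_or_group {
    my ($reference) = @_;
    my $authors = $reference->authors;
    return $authors if defined $authors && length $authors;
    return $reference->rg;
}

{
    # Stand-in with the two accessors used above.
    package Local::FakeReference;
    sub new     { my ($class, %args) = @_; return bless { %args }, $class }
    sub authors { $_[0]{authors} }
    sub rg      { $_[0]{rg} }
}

my $ref = Local::FakeReference->new(rg => 'The German cDNA consortium');
print author_or_group($ref), "\n";
```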

The downside is that upon round-tripping an entry, the RG annotation  
line will become an RA annotation line. How bad would that be?

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for  
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   
> <======  Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.    
> <======  Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL  
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first  
> sequence
> ------+-------+---------+----------------------+
> |          139 |      NULL | Submitted (NOV-2004) to the EMBL/ 
> GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
> +--------------+----------- 
> +----------------------------------------------------
>
> and the error when the script choked was
>
>  MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,  
> values were
>  ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
>  databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
>  Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing I'm not the first person to encounter this, but don't
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon May 15 13:01:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 13:01:14 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx>
References: <4466AD7F.6050700@campus.iztacala.unam.mx>
Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>

Hey, thanks to Laura & David for this interface.

Any idea why most of the Bio::Ontology::* modules show up without  
their leading Bio::Ontology? And clicking on those hyperlinks doesn't  
go anywhere either ... Anything different with those modules that I  
can fix?

	-hilmar

On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:

> I'm glad to announce the availability of the Deobfuscator interface at
> the BioPerl website. You can use it at the following URL:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Many thanks to Laura Kavanaugh and David Messina for this great
> contribution to the BioPerl project!
>
> Mauricio.
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 13:22:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 12:22:13 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>
Message-ID: <000301c67844$1b506280$15327e82@pyrimidine>

That's strange.  Clicking on the list gives me the results for that module.
When I click on the hyperlinks in the results section they open fine; the
method column links open a new page containing usage-function-returns-args
and the class column links open pdoc (same page) for bioperl-live.  I'm
using Firefox 1.5 on WinXP.

Chris



From sb at mrc-dunn.cam.ac.uk  Mon May 15 14:00:15 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 19:00:15 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine>
References: <001601c67839$cf289490$15327e82@pyrimidine>
Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
> deal with it.  I also noticed that subspecies also contains the entire
> string:
> 
>     <Taxon>
>       <TaxId>135461</TaxId>
>       <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
>       <Rank>subspecies</Rank>
>     </Taxon>

Yes, this is one of the problems I mentioned in the first post to this
thread.


> As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
> I don't get the actual scientific name for the node (from the GenBank
> ORGANISM line) almost every time; I get the name with the strain chopped off
> instead and a number of times the names get mangled.

[snip, should be:]
> 224308  Bacillus subtilis subsp. subtilis str. 168
> 281309  Bacillus thuringiensis serovar konkukian str. 97-27

[snip, but Bio::DB::Taxonomy gives:]
> 224308  subtilis Bacillus subtilis subsp. subtilis
> 281309  Bacillus cereus group thuringiensis

[snip]
> So, in a nutshell, there's a problem here.  I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite as
you say. My solution was to munge the actual ScientificName but 'ensure'
that the binomial would give you back the actual binomial name you
wanted - which is the intent of current Bio::DB::Taxonomy code.

my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
my $leaf_node = $species0->taxonomy->get_leaves();
print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
print "Species0 subspecies = '", $species0->subspecies, "'\n";
print "Species0 variants = '", scalar($species0->variant), "'\n";
print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:
sci_name of Node = 'str. 168'
Species0 subspecies = 'subsp. subtilis'
Species0 variants = 'str. 168'
Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

sci_name of Node = 'str. 97-27'
Species0 subspecies = ''
Species0 variants = 'serovar konkukian str. 97-27'
Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the
ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp.
subtilis str. 168', when I ask for the variant I don't want that whole
string. I just want the bit that will be different when comparing other
strains of this subspecies of this species of Bacillus. I want 'str.
168'. Note that my objects never store the original ScientificName; it
is due to 'luck' (or as I like to think, a good implementation) that the
binomial method is able to reconstruct a string that is identical to
what the original ScientificName was.
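
To illustrate the idea of keeping only the differing bit per node (this is
not the TFBS::Species code, just a sketch with a hypothetical own_part
helper and a simplified lineage), each node's ScientificName can be trimmed
by its parent's name, and the pieces rejoin into the original string:

```perl
use strict;
use warnings;

# Return the part of a node's name that its parent does not already carry.
sub own_part {
    my ($name, $parent_name) = @_;
    return $name unless defined $parent_name;
    my $prefix = "$parent_name ";
    return index($name, $prefix) == 0 ? substr($name, length $prefix) : $name;
}

# Simplified lineage for taxid 224308, from genus down to the strain.
my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);

my (@parts, $parent);
for my $name (@lineage) {
    push @parts, own_part($name, $parent);
    $parent = $name;
}

print join(' | ', @parts), "\n";    # each node's own part
print join(' ', @parts), "\n";      # rejoined full name
```

The last node contributes only 'str. 168', and joining all parts gives back
'Bacillus subtilis subsp. subtilis str. 168'.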

If you'd like to see my code let me know. You can't just drop the code
snippet I posted in this thread into existing bioperl modules; quite a
bit else has to change as well. I'll have to make an updated
taxonomy_the_tfbs_way.tar.gz file available if you want an example
implementation; the current version of that file is now out of date - it
doesn't do any of what I describe above.


From hlapp at gmx.net  Mon May 15 14:08:49 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 14:08:49 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine>
References: <000301c67844$1b506280$15327e82@pyrimidine>
Message-ID: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>

Safari or Firefox on MacOSX don't do this. Note that the appearance  
in the browsable list is already different (the prefix is missing),  
and the JavaScript link also lacks the prefix in the module name in  
contrast to others, e.g., Bio::Ontology::Ontology (which is one of  
the few Bio::Ontology exceptions that do work and do display correctly).

I suppose there is something peculiar about the code formatting of  
those modules? Some of the modules under Bio::OntologyIO are also  
affected BTW.

What happens is that after you click on the link the page appears to
reload (i.e., gets submitted), but the second table that is supposed to
open underneath the first doesn't appear. However, the sort-by drop
down selector does appear.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 15:07:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:07:59 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>
Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine>

I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
which I can try it on).  I'll let you know what I find.  

This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
all the classes have links that work (I added newline and tab to make it a
bit more readable) :

Bio::OntologyIO	
	Parser factory for Ontology formats
Bio::OntologyIO::Handlers::BaseSAXHandler	
	no short description available
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
	no short description available
Bio::Ontology::OntologyI
	Interface for an ontology implementation
Bio::Ontology::TermFactory
	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
Bio::Ontology::OntologyStore
	A repository of ontologies
Bio::Ontology::RelationshipFactory
	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
Bio::Ontology::Ontology
	standard implementation of an Ontology

So the names seem fine here.

When I click on a class (Bio::Ontology::Ontology) I get in the results
section:

Method                 Class                           Returns         Usage
add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI

....and so on

Where each method is clickable and opens a new page containing a table:

Bio::Ontology::Ontology::add_relationship
Usage	add_relationship(RelationshipI relationship): RelationshipI
Function	Adds a relationship object to the ontology engine.
Returns	Its argument.
Args	A RelationshipI object.


Each class is also linked to the bioperl-live PDOC.  Clicking on class
Bio::Ontology::Ontology in the results table gets me this page (no new
page):

http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html


Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Monday, May 15, 2006 1:09 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Safari or Firefox on MacOSX don't do this. Note that the appearance
> in the browsable list is already different (the prefix is missing),
> and the JavaScript link also lacks the prefix in the module name in
> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> the few Bio::Ontology exceptions that do work and do display correctly).
> 
> I suppose there is something peculiar about the code formatting of
> those modules? Some of the modules under Bio::OntologyIO are also
> affected BTW.
> 
> What happens is after you click on the link the page appears to
> reload (i.e., gets submitted) but the second table that is supposed to
> open underneath the first doesn't appear. However, the sort-by drop
> down selector does appear.
> 
> 	-hilmar
> 
> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> 
> > That's strange.  Clicking on the list gives me the results for that module.
> > When I click on the hyperlinks in the results section they open fine; the
> > method column links opens a new page containing usage-function-returns-args
> > and the class column links opens pdoc (same page) for bioperl-live.  I'm
> > using Firefox 1.5 on WinXP.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >> Sent: Monday, May 15, 2006 12:01 PM
> >> To: Mauricio Herrera Cuadra
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> Hey, thanks to Laura & David for this interface.
> >>
> >> Any idea why most of the Bio::Ontology::* modules show up without
> >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
> >> go anywhere either ... Anything different with those modules that I
> >> can fix?
> >>
> >> 	-hilmar
> >>
> >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>
> >>> I'm glad to announce the availability of the Deobfuscator interface at
> >>> the BioPerl website. You can use it at the following URL:
> >>>
> >>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>
> >>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>> contribution to the BioPerl project!
> >>>
> >>> Mauricio.
> >>>
> >>> --
> >>> MAURICIO HERRERA CUADRA
> >>> arareko at campus.iztacala.unam.mx
> >>> Laboratorio de Genética
> >>> Unidad de Morfofisiología y Función
> >>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From cjfields at uiuc.edu  Mon May 15 15:12:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:12:34 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine>

I just tried the same thing (links, search, etc) with Mac OS X v 10.3.9 and
Safari (no Firefox sorry) and it worked fine as well (all links, no missing
Bio::Ontology, etc).  Not sure what it could be...

Chris


From arareko at campus.iztacala.unam.mx  Mon May 15 15:20:10 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 15 May 2006 14:20:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx>

Laura and Dave would be very happy to see all of your 
comments/suggestions/enhancements/complaints summarized in the 
appropriate wiki page. Just be sure to sign them properly with your name 
and date:

http://bioperl.org/wiki/Deobfuscator

I think they'll have to discuss which features would be nice to implement
and which would not, depending on the direction they want their project to
go. But don't worry, they're extremely nice people who are open to all
kinds of ideas. Best of all, the Deobfuscator is open source, so
everyone is invited to contribute to it; just ask them for the code :)

On my side, I'm tweaking the code so that it can browse the different
BioPerl packages (core, run, ext) and their respective releases
(stable, developer, cvs).

Regards,
Mauricio.

Chris Fields wrote:
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>> Sent: Monday, May 15, 2006 8:09 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Amir Karger wrote:
>>> This tool is quite nice, and may save me a lot of perldoc'ing.
>> Yes, many thanks to everyone involved.
> 
> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.
> 
> For modules added after v1.4 I use the script in the FAQ question mentioned
> on the Deobfuscator wiki page to get up-to-date methods, then grab the
> ActiveState HTML'd perldocs generated when installing with PPM (I make a
> custom PPM/PPD file and install it myself every once in a while):
> 
> #!/usr/bin/perl -w
> use Class::Inspector;
> $class = shift || die "Usage: methods perl_class_name\n";
> eval "require $class";
> print join ("\n", sort @{Class::Inspector->methods($class, 'full', 'public')}),
>     "\n";
> 
>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.   It only takes a click
>>> on Back (or a "new search" button) to get to a new search, and now you
>>> can use your whole screen for reading your results.
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>> different class pages at once, so being able to see an overview all on
>> one page in Deobfuscator is very nice.
>>
>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie. usage,
>> function, returns, args columns. I feel that opening a window for each
>> method you want to understand is far too slow.
> 
> Agreed.
> 
>> I'd also really like a link to the code for the method as well. The
>> bioperl docs are rarely complete enough that you can really understand
>> what every method is supposed to do without looking at the code.
> 
> The methods that pop up are in columns along with the class module that
> implements the method.  
> 
> 
> If you click on that link you get PDOC documentation for the module which
> includes most of the code (strangely, though Deobfuscator indexes bioperl
> 1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
> something a bit more detailed?
> 
>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
>>> to me that the search searches within class names rather than function
>>> names. What I really want to know sometimes is which module has, say,
>>> the revcom method in it.
> 
> That's listed in the method results table (the next column has the module
> with a link to the module's online docs).
> 
> 
> Chris
> 
> 
>> This would be a great feature to add.
>>
>>
>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Mon May 15 15:23:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 15:23:55 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine>
References: <000501c67852$e1bb55c0$15327e82@pyrimidine>
Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>

I wasn't using the search. It's in the scrollable table for browsing.  
-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ClarkeW at AGR.GC.CA  Mon May 15 15:40:15 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 15 May 2006 15:40:15 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>

Hey everyone, 

 
I have been developing some code to download and parse BLAST reports
from a remote server using SOAP::Lite and to insert the results into
a MySQL database. The problem I am having is that my program seems to be
taking up a huge amount of RAM. For a single job of 10000 queries it
can consume as much as a couple of hundred MB inside an hour. I realize
that a lot of work is being done, but this seems like way too much. This
leads me to the subject of my post. I think I may have traced the source
of the memory leak to Bio::SearchIO. I have used Devel::Size to track
the size of my variables and tried other debugging steps, but have had no
luck resolving this very frustrating problem. My code is as
follows:

 
    my $result = $connector->getQueryResult($query_id);

    # Open an in-memory filehandle on the downloaded report text
    open my $FH, "<", \$result
        or die "Cannot open in-memory filehandle: $!";

    my $searchio = Bio::SearchIO->new(-format => "blast",
                                      -fh     => $FH);

    while (my $o_blast = $searchio->next_result()) {
        my $clone_id  = $o_blast->query_name();
        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
        # ... (remainder of the loop omitted)
    }
 
This is just the leading and trailing code surrounding the use of
Bio::SearchIO, since there is quite a lot of it. I am mostly wondering
whether anyone has had problems with SearchIO and its memory usage. I
looked at its source code, but I am afraid it is out of my league.
Any help, suggestions, or questions would be great. Thanks.
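
One way to narrow this down, sketched here under assumptions: keep each
report's filehandle and parser inside a single subroutine call, so that
everything tied to one report can be freed before the next one is fetched.
The sketch uses core Perl only. count_queries and the report strings are
hypothetical stand-ins for the Bio::SearchIO loop above; the sketch does not
reproduce or diagnose the leak itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for parsing one report; in the post this work is
# done by Bio::SearchIO->next_result() and the form_push_SQL call.
sub count_queries {
    my ($report_text) = @_;

    # In-memory filehandle on the report string, as in the post.
    open my $fh, '<', \$report_text
        or die "Cannot open in-memory filehandle: $!";

    my $queries = 0;
    while (my $line = <$fh>) {
        $queries++ if $line =~ /^Query=/;
    }

    close $fh;          # release the handle explicitly
    return $queries;    # $fh and $report_text go out of scope here
}

# Two made-up report fragments standing in for downloaded results.
my @reports = (
    "BLASTN 2.2.13\nQuery= clone_A\n",
    "BLASTN 2.2.13\nQuery= clone_B\nQuery= clone_C\n",
);

my $total = 0;
$total += count_queries($_) for @reports;
print "Parsed $total queries\n";
```

If memory still grows with the per-report work confined like this, that
would suggest looking at what survives between calls rather than at the
calling code.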


From dmessina at wustl.edu  Mon May 15 15:34:10 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 14:34:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>

Responding to:
>>> Amir Karger
>> Sendu Bala
>  Chris Fields


> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.

That's correct -- Mauricio is currently working on a version that  
will allow you to search 1.4, 1.5.1, or bioperl-live. The  
Deobfuscator indexes will be updated (daily?) to keep them in sync  
with the CVS repository.


>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.   It only takes a click
>>> on Back (or a "new search" button) to get to a new search, and now you
>>> can use your whole screen for reading your results.
>>
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>> different class pages at once, so being able to see an overview all on
>> one page in Deobfuscator is very nice.

I think the current behavior makes sense as the default, but I like  
the idea of being able to view the search results in a separate  
window for easier browsing. Thanks for the suggestion; I'll add it to  
the list.


>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie. usage,
>> function, returns, args columns. I feel that opening a window for each
>> method you want to understand is far too slow.
>
> Agreed.

Yeah, the way it currently works is admittedly lame, and was done as  
a placeholder until we figured out a better way to do it. An in-place  
reveal sounds like a good solution.


>>> 2) Please sort the "select a class from the list" alphabetically. I
>>> guess I can enter a search term to get the right classes, but it would
>>> be nice to be able to browse.

Agreed. I think we were doing this in an earlier test version, but I  
must have left it out of the release I handed off to Mauricio.


>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
>>> to me that the search searches within class names rather than function
>>> names. What I really want to know sometimes is which module has, say,
>>> the revcom method in it.
>>
>> This would be a great feature to add.

That's a great idea.


>>> 4) When I search for something that's not found, I get a screen that
>>> looks pretty familiar, with the extra text "No match to string found"
>>> down at the bottom. It took me a while to even notice it. (Studies show
>>> that most users don't read most of the text on a page.) Bold might be
>>> nice here. Or put the error at the top of the screen. Or both.

Added to the list.


>>> 5) I'll save my stupidest comment for last - please make the page title
>>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what the
>>> bookmark stands for.

Added to the list. Not stupid, by the way -- much to my surprise,  
there are at least 2 or 3 other (obviously inferior :) )  
deobfuscators floating around out there.


>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.

Added to the list.


Thanks to all of you for taking the time to give such detailed  
feedback -- it's really helpful.

There is a wiki page on the BioPerl site for this project
(http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments
there for tracking and further discussion. Please feel free to add to it.


Dave


-- 
Dave Messina
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1825


From faruque at ebi.ac.uk  Mon May 15 15:47:27 2006
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Mon, 15 May 2006 20:47:27 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>

>> My personal view is that having it as an annotation would serve no real
>> purpose. For me the whole point of any kind of species representation in
>> bioperl is to allow you to compare species in a biologically meaningful
>> way. If it's just some annotation then that means it's basically
I understand the need to find the species name of entries, especially
now that so many complete genomes have been given their own
strain-specific tax nodes. I also think it is a shame that the NCBI tax
dump does not give a rank to entries such as these (they cannot easily
be distinguished from unofficial ranks higher in the tree without
ascending the tree).
Would it be useful for the species name to be included within EMBL
file headers, e.g. in a line called OB? (OB is a terrible suggestion
based on 'Organism Binomial', since OS is already in use.)

Below are two examples of the species 'Apple stem grooving virus'. Without
delving into the tax tree or including an OB line, the second one
would appear to be a different species.

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.


> My point is, a large number of users do NOT use, nor care about, taxonomic
> information to the degree they need to know the entire classification of the
> organism; many are just as happy about getting the scientific name only,
> which is in the GenBank/EMBL file itself.  To take one extreme, it is not
> productive to force every user to download the NCBI tax database and use
> lookups just to convert sequences from EMBL format to GenBank format.  It's
> not productive to allow users to spam the NCBI tax database remotely either,
> so hardcoding lookups is, IMHO, a big mistake.

I don't think you need to add any information to turn an EMBL-format
file into a GenBank flatfile, but maybe I'm missing something obvious.

Nadeem


--
Dr S.M. Nadeem N. Faruque
9 Barley Court
Saffron Walden
Essex  CB11 3HG
01799 500 120


From dmessina at wustl.edu  Mon May 15 16:12:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 15:12:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu>

On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote:

> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar

I'm seeing this too on OS X with Safari 2.0.3.

If you type 'goflat' (without the quotes) into the search box, you'll  
see the behavior. Chris, can you try it again this way just to  
confirm it's an OS/browser-specific thing?

Not sure what's going on, Hilmar -- I'll take a look.

Dave


From cjfields at uiuc.edu  Mon May 15 16:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 15:56:29 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>
Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine>

Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
X), and those links are broken like you said.  Could be something to do with
indexing.  

Using the methods script in the FAQ
(http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 2:24 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar
> 
> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> 
> > I'll have to give it a try on Mac OS X (we have an ancient G4 in
> > the lab
> > which I can try it on).  I'll let you know what I find.
> >
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox
> > on WinXP and this Deobfuscator link
> > (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have
> > links that work (I added newline and tab to make it a bit more
> > readable):
> >
> > Bio::OntologyIO
> > 	Parser factory for Ontology formats
> > Bio::OntologyIO::Handlers::BaseSAXHandler
> > 	no short description available
> > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> > 	no short description available
> > Bio::Ontology::OntologyI
> > 	Interface for an ontology implementation
> > Bio::Ontology::TermFactory
> > 	Instantiates a new Bio::Ontology::TermI (or derived class) through a
> > factory
> > Bio::Ontology::OntologyStore
> > 	A repository of ontologies
> > Bio::Ontology::RelationshipFactory
> > 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> > through a factory
> > Bio::Ontology::Ontology
> > 	standard implementation of an Ontology
> >
> > So the names seem fine here.
> >
> > When I click on a class (Bio::Ontology::Ontology) I get in the results
> > section:
> >
> > Method                 Class                           Returns         Usage
> > add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> > add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> > add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
> >
> > ....and so on
> >
> > Where each method is clickable and opens a new page containing a
> > table:
> >
> > Bio::Ontology::Ontology::add_relationship
> > Usage	add_relationship(RelationshipI relationship): RelationshipI
> > Function	Adds a relationship object to the ontology engine.
> > Returns	Its argument.
> > Args	A RelationshipI object.
> >
> >
> > Each class is also linked to the bioperl-live PDOC.  Clicking on class
> > Bio::Ontology::Ontology in the results table gets me this page (no new
> > page):
> >
> > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Monday, May 15, 2006 1:09 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >> in the browsable list is already different (the prefix is missing),
> >> and the JavaScript link also lacks the prefix in the module name in
> >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >> the few Bio::Ontology exceptions that do work and do display
> >> correctly).
> >>
> >> I suppose there is something peculiar about the code formatting of
> >> those modules? Some of the modules under Bio::OntologyIO are also
> >> affected BTW.
> >>
> >> What happens is after you click on the link the page appears to
> >> reload (i.e., gets submitted) but the second table that is supposed
> >> to open underneath the first doesn't appear. However, the sort-by drop
> >> down selector does appear.
> >>
> >> 	-hilmar
> >>
> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>
> >>> That's strange.  Clicking on the list gives me the results for that
> >>> module.
> >>> When I click on the hyperlinks in the results section they open
> >>> fine; the
> >>> method column links opens a new page containing usage-function-
> >>> returns-args
> >>> and the class column links opens pdoc (same page) for bioperl-
> >>> live.  I'm
> >>> using Firefox 1.5 on WinXP.
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>> To: Mauricio Herrera Cuadra
> >>>> Cc: bioperl-l
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Hey, thanks to Laura & David for this interface.
> >>>>
> >>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>> doesn't
> >>>> go anywhere either ... Anything different with those modules that I
> >>>> can fix?
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>
> >>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>> interface at
> >>>>> the BioPerl website. You can use it at the following URL:
> >>>>>
> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>
> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>> contribution to the BioPerl project!
> >>>>>
> >>>>> Mauricio.
> >>>>>
> >>>>> --
> >>>>> MAURICIO HERRERA CUADRA
> >>>>> arareko at campus.iztacala.unam.mx
> >>>>> Laboratorio de Genética
> >>>>> Unidad de Morfofisiología y Función
> >>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From cjfields at uiuc.edu  Mon May 15 17:29:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 16:29:14 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>
Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque
> Sent: Monday, May 15, 2006 2:47 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> >> My personal view is that having it as an annotation would serve no
> >> real
> >> purpose. For me the whole point of any kind of species
> >> representation in
> >> bioperl is to allow you to compare species in a biologically
> >> meaningful
> >> way. If it's just some annotation then that means it's basically
> 
> I understand the need to find the species name of entries, especially
> now that so many complete genomes have been given their own strain-
> specific tax nodes, and I also think it is a shame that the ncbi tax
> dump does not give a rank to entries such as these (they cannot
> easily be distinguished from unofficial ranks higher in the tree
> without ascending the tree).
> Would it be useful for the species name to be included within EMBL
> file headers, eg in a line called OB (OB is a terrible suggestion
> based on 'Organism Binomial' since OS is already in use)?
> 
> eg two examples of the species 'Apple stem grooving virus', where the
> second one would appear to be a different species without delving
> into the tax tree or the inclusion of an OB line.
> 
> AC   D14995; S47260;
> DE   Apple stem grooving virus genome, complete sequence.
> OS   Apple stem grooving virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.
> 
> AC   AY646511;
> DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
> OS   Citrus tatter leaf virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.

Jason also mentions a few examples (see below).  The problem lies in the
fact that EMBL and GenBank flatfiles do not give hierarchy ranking for
taxonomy, so it's a best guess.  What I'm seeing is that the guess is wrong
more often than not when it comes to complex scientific names (viruses,
bacteria, etc).  Notice the doubling of the strain in the following GenBank
files passed through SeqIO (genbank->genbank conversion, BTW; haven't tried
EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales;
            Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium;
            tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species?  What happens
when there is a two part strain name (space separated) and a sub-species or
variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
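
To show why the guess goes wrong on names like these, here is a small sketch of a purely positional split. It is only an illustration of the problem; naive_split is a made-up name and this is not the code Bio::Species or SeqIO actually uses.

```perl
use strict;
use warnings;

# Naive split of an ORGANISM line: first word = genus, second = species,
# everything else lumped together as "rest". This illustrates the guessing
# problem only; it is not the code BioPerl actually uses.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, @rest) = split ' ', $organism;
    return { genus => $genus, species => $species, rest => join(' ', @rest) };
}

for my $name (
    'Staphylococcus haemolyticus JCSC1435',     # rest is a strain
    'Muntiacus muntjak vaginalis',              # rest is a subspecies
    'Aspergillus nidulans FGSC A4',             # rest is a two-word strain
    'Cryptococcus neoformans var. grubii H99',  # rest is variety plus strain
) {
    my $p = naive_split($name);
    printf "%-15s %-12s [%s]\n", @{$p}{qw(genus species rest)};
}
```

The "rest" field comes out as a strain, a subspecies, a two-word strain, or a variety plus strain depending on the entry, and nothing in the string itself says which. That is the information only the rank in the taxonomy database can supply.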


> > My point is, a large number of users do NOT use, nor care about,
> > taxonomic
> > information to the degree they need to know the entire
> > classification of the
> > organism; many are just as happy about getting the scientific name
> > only,
> > which is in the GenBank/EMBL file itself.  To take one extreme, it
> > is not
> > productive to force every user to download the NCBI tax database
> > and use
> > lookups just to convert sequences from EMBL format to GenBank
> > format.  It's
> > not productive to allow users to spam the NCBI tax database
> > remotely either,
> > so hardcoding lookups is, IMHO, a big mistake.
> 
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines
in EMBL, I believe), which is by using a Bio::Species object.  The problem is,
like I mentioned above, that the flat file carries no hierarchical ranking,
just the order of the ranks.  We can try to make a best guess based on that,
but it's obviously very tricky, particularly when dealing with subspecies,
strains, etc.

NCBI also states that the classification is often too long for the flat
file and so may be incomplete (I think they leave out nodes which have 'no
rank' tags, but I can't be completely sure), so there's another issue.

Anyway, this is where the lookup would come in.  It would require a local
taxonomy database (we can't spam the NCBI remote database, that would just
be rude), which would give the complete taxonomic classification if it
worked properly.

So now we have three possible situations:

1) One extreme : We require a lookup to get it right (which, BTW, it
currently doesn't); this by default requires a local database.  
2) Middle of the road : we try and guess the information as best as we can
with the information given (the current situation); this is breaking more
and more often now, so is becoming more unreliable.
3) Other extreme : we punt and absolve ourselves of even trying to parse the
data and just have a strict tagname->value or similar simple construct to
handle the data.

#3 as default with option to do #1 is probably best (least error prone with
option for most information), with caching to speed up lookups as Sendu Bala
does now.
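
Roughly what I have in mind for "#3 as default with option to do #1", as a sketch only. make_resolver and the lookup callback are hypothetical names, not existing BioPerl API, the hash stands in for a local taxonomy database, and the organism string is assumed to be what the unmangled record would carry.

```perl
use strict;
use warnings;

# Option 3: keep SOURCE/ORGANISM text verbatim as tag => value pairs.
# Option 1 (opt-in): resolve the full lineage through a lookup callback,
# caching results so each organism is looked up at most once.
# All names here (make_resolver, lookup) are hypothetical, not BioPerl API.
sub make_resolver {
    my (%arg) = @_;
    my $lookup = $arg{lookup};    # optional code ref: name -> array ref of ranks
    my %cache;
    return sub {
        my ($record) = @_;
        my %out = %$record;       # raw tags pass through untouched
        return \%out unless $lookup;
        my $name = $record->{ORGANISM};
        $cache{$name} = $lookup->($name) unless exists $cache{$name};
        $out{lineage} = $cache{$name};
        return \%out;
    };
}

# Stand-in for a local taxonomy database; counts how often it is hit.
my $hits = 0;
my %local_db = (
    'Azoarcus sp. EbN1' => [ 'Bacteria', 'Proteobacteria', 'Betaproteobacteria',
                             'Rhodocyclales', 'Rhodocyclaceae', 'Azoarcus' ],
);
my $db_lookup = sub { $hits++; return $local_db{ $_[0] } };

my $record = { SOURCE => 'Azoarcus sp. EbN1', ORGANISM => 'Azoarcus sp. EbN1' };

my $plain = make_resolver();                      # default: no parsing at all
my $full  = make_resolver(lookup => $db_lookup);  # opt-in lookup

my $r1 = $plain->($record);
my $r2 = $full->($record);
my $r3 = $full->($record);    # second call is served from the cache

print "default keeps: $r1->{ORGANISM}\n";
print "lookup adds:   ", join('; ', @{ $r2->{lineage} }), "\n";
print "database hits: $hits\n";
```

The default path never touches the name, so a genbank->genbank round trip cannot mangle it, and anyone who wants the full classification pays for the lookup explicitly.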

Chris

 
> Nadeem
> 
> 
> --
> Dr S.M. Nadeem N. Faruque
> 9 Barley Court
> Saffron Walden
> Essex  CB11 3HG
> 01799 500 120


From hlapp at gmx.net  Mon May 15 17:37:56 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 17:37:56 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine>
References: <000a01c67862$0a00cab0$15327e82@pyrimidine>
Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>

It does have the following line though (and a 'use' statement for  
OntologyIO);

@ISA = qw( Bio::OntologyIO );

So what is it doing 'wrong' (there aren't any tests or so in which  
anything erroneous would show)?

	-hilmar

On May 15, 2006, at 4:56 PM, Chris Fields wrote:

> Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
> why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
> X), and those links are broken like you said.  Could be something to do with
> indexing.
>
> Using the methods script in the FAQ
> (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> I get this:
>
> C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> Bio::OntologyIO::simplehierarchy::Dumper
> Bio::OntologyIO::simplehierarchy::basename
> Bio::OntologyIO::simplehierarchy::dirname
> Bio::OntologyIO::simplehierarchy::fileparse
> Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
>
> Chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 18:03:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 17:03:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat


C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat


I get nada.  It could be related to the way the methods are parsed using
Class::Inspector :

print join ("\n", sort
@{Class::Inspector->methods($class,'full','public')}), "\n";
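
For anyone who wants to poke at this without Class::Inspector, here is a core-Perl sketch that does something similar. It is not the FAQ script: try_load and list_subs are made-up names, and unlike the 'full' listing above it does not collapse overridden methods. It does report a load failure separately, which is worth ruling out when a class prints nothing or only a partial list.

```perl
use strict;
use warnings;

# Try to load a class by name. Returns '' on success, or the error text,
# so a class that failed to compile is not mistaken for one with no methods.
sub try_load {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;
    my $ok = eval { require $file; 1 };
    return $ok ? '' : ($@ || 'unknown error');
}

# List every sub reachable from $class through @ISA, fully qualified.
# Functions imported into a package (e.g. from File::Basename) are listed
# too, since they sit in the same symbol table as real methods.
sub list_subs {
    my ($class, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$class}++;
    no strict 'refs';
    my @found = grep { defined &$_ }
                map  { "${class}::$_" } keys %{"${class}::"};
    push @found, list_subs($_, $seen) for @{"${class}::ISA"};
    return sort @found;
}

# Two throwaway classes so the sketch runs without BioPerl installed.
{ package Demo::Base;  sub new { return bless {}, shift } sub name { 'base' } }
{ package Demo::Child; our @ISA = ('Demo::Base');
  use File::Basename qw(basename); sub parse { 1 } }

print "$_\n" for list_subs('Demo::Child');

my $err = try_load('Demo::No::Such::Class');
print "load failed: ", ($err ? 'yes' : 'no'), "\n";
```

Note that the imported basename shows up under Demo::Child, the same way Dumper, basename, dirname and fileparse show up under Bio::OntologyIO::simplehierarchy in the earlier output.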

I haven't tried it on all the weird Bio::Ontology-missing modules (don't
have time today).  It's not common to all of those modules though:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser
Bio::OntologyIO::DESTROY
Bio::OntologyIO::InterProParser::next_ontology
Bio::OntologyIO::InterProParser::parse
Bio::OntologyIO::InterProParser::secondary_accessions_map
Bio::OntologyIO::new
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented


Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 4:38 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> It does have the following line though (and a 'use' statement for
> OntologyIO);
> 
> @ISA = qw( Bio::OntologyIO );
> 
> So what is it doing 'wrong' (there aren't any tests or so in which
> anything erroneous would show)?
> 
> 	-hilmar
> 
> On May 15, 2006, at 4:56 PM, Chris Fields wrote:
> 
> > Okay, I see what you mean.  Using the search term "Bio::Ont*" also
> > explains why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP
> > and Mac OS X), and those links are broken like you said.  Could be
> > something to do with indexing.
> >
> > Using the methods script in the FAQ
> > (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> > I get this:
> >
> > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> > Bio::OntologyIO::simplehierarchy::Dumper
> > Bio::OntologyIO::simplehierarchy::basename
> > Bio::OntologyIO::simplehierarchy::dirname
> > Bio::OntologyIO::simplehierarchy::fileparse
> > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >> Sent: Monday, May 15, 2006 2:24 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> I wasn't using the search. It's in the scrollable table for browsing.
> >> -hilmar
> >>
> >> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> >>
> >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in
> >>> the lab
> >>> which I can try it on).  I'll let you know what I find.
> >>>
> >>> This is what I get when I do a search for 'Bio::Ont*' using Firefox
> >>> on WinXP
> >>> and this Deobfuscator link (http://bioperl.org/cgi-bin/
> >>> deob_interface.cgi?);
> >>> all the classes have links that work (I added newline and tab to
> >>> make it a
> >>> bit more readable) :
> >>>
> >>> Bio::OntologyIO
> >>> 	Parser factory for Ontology formats
> >>> Bio::OntologyIO::Handlers::BaseSAXHandler
> >>> 	no short description available
> >>> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> >>> 	no short description available
> >>> Bio::Ontology::OntologyI
> >>> 	Interface for an ontology implementation
> >>> Bio::Ontology::TermFactory
> >>> 	Instantiates a new Bio::Ontology::TermI (or derived class)
> >>> through a
> >>> factory
> >>> Bio::Ontology::OntologyStore
> >>> 	A repository of ontologies
> >>> Bio::Ontology::RelationshipFactory
> >>> 	Instantiates a new Bio::Ontology::RelationshipI (or derived class)
> >>> through a factory
> >>> Bio::Ontology::Ontology
> >>> 	standard implementation of an Ontology
> >>>
> >>> So the names seem fine here.
> >>>
> >>> When I click on a class (Bio::Ontology::Ontology) I get in the
> >>> results
> >>> section:
> >>>
> >>> Method                 Class                           Returns         Usage
> >>> add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> >>> add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> >>> add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
> >>>
> >>> ....and so on
> >>>
> >>> Where each method is clickable and opens a new page containing a
> >>> table:
> >>>
> >>> Bio::Ontology::Ontology::add_relationship
> >>> Usage	add_relationship(RelationshipI relationship): RelationshipI
> >>> Function	Adds a relationship object to the ontology engine.
> >>> Returns	Its argument.
> >>> Args	A RelationshipI object.
> >>>
> >>>
> >>> Each class is also linked to the bioperl-live PDOC.  Clicking on
> >>> class
> >>> Bio::Ontology::Ontology in the results table gets me this page
> >>> (no new
> >>> page):
> >>>
> >>> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >>>
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >>>> Sent: Monday, May 15, 2006 1:09 PM
> >>>> To: Chris Fields
> >>>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >>>> in the browsable list is already different (the prefix is missing),
> >>>> and the JavaScript link also lacks the prefix in the module name in
> >>>> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >>>> the few Bio::Ontology exceptions that do work and do display
> >>>> correctly).
> >>>>
> >>>> I suppose there is something peculiar about the code formatting of
> >>>> those modules? Some of the modules under Bio::OntologyIO are also
> >>>> affected BTW.
> >>>>
> >>>> What happens is that after you click on the link the page appears to
> >>>> reload (i.e., gets submitted), but the second table that is supposed
> >>>> to open underneath the first doesn't appear. However, the sort-by
> >>>> drop-down selector does appear.
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>>>
> >>>>> That's strange.  Clicking on the list gives me the results for
> >>>>> that module.  When I click on the hyperlinks in the results section
> >>>>> they open fine; the method column links open a new page containing
> >>>>> usage-function-returns-args and the class column links open pdoc
> >>>>> (same page) for bioperl-live.  I'm using Firefox 1.5 on WinXP.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>>>> To: Mauricio Herrera Cuadra
> >>>>>> Cc: bioperl-l
> >>>>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>>>
> >>>>>> Hey, thanks to Laura & David for this interface.
> >>>>>>
> >>>>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>>>> doesn't
> >>>>>> go anywhere either ... Anything different with those modules
> >>>>>> that I
> >>>>>> can fix?
> >>>>>>
> >>>>>> 	-hilmar
> >>>>>>
> >>>>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>>>
> >>>>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>>>> interface at
> >>>>>>> the BioPerl website. You can use it at the following URL:
> >>>>>>>
> >>>>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>>>
> >>>>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>>>> contribution to the BioPerl project!
> >>>>>>>
> >>>>>>> Mauricio.
> >>>>>>>
> >>>>>>> --
> >>>>>>> MAURICIO HERRERA CUADRA
> >>>>>>> arareko at campus.iztacala.unam.mx
> >>>>>>> Laboratorio de Genética
> >>>>>>> Unidad de Morfofisiología y Función
> >>>>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> ===========================================================
> >>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>>>> ===========================================================


From cjfields at uiuc.edu  Mon May 15 20:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <a7d26051.b90db78f.81ac600@expms6.cites.uiuc.edu>

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne" <ClarkeW at agr.gc.ca>  
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO  
>To: <bioperl-l at lists.open-bio.org>
>
>Hey everyone, 
>
> 
>
>I have been developing some code to download and parse blast reports
>from a remote server using SOAP::Lite as well as insert the results into
>a mysql database. The problem I am having is that my program seems to be
>taking up a huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred MB inside an hour. 

If you're parsing 10000 queries (10000 different BLAST reports, right?) then
it's not necessarily a memory leak so much as object creation.  Each report
generates hit objects, which in turn generate HSP objects.  I think Jason
recommends using the tabular output option (-m8 or -m9) for huge reports, as
it cuts down considerably on this.  If you are cycling through each report it
shouldn't be as much of a problem unless your BLAST reports are really huge.
Have you tried parsing a single report to see if the problem persists?
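
As a rough illustration of why the tabular option is lighter: each -m8/-m9
row is one tab-separated line, so it can be read with plain Perl without
building hit and HSP objects. The following is only a minimal sketch,
assuming the usual 12-column layout of the legacy tabular output. The sub
name top_rows, the field labels and the sample rows are made up for
illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# Our own labels for the 12 standard tabular columns (an assumption of
# this sketch; the report itself only carries them in -m9 comment lines).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: keep at most $max_rows rows per query.
# One row is one HSP, so a hit with several HSPs occupies several rows.
sub top_rows {
    my ($fh, $max_rows) = @_;
    my (%seen, @rows);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # -m9 comment or blank line
        my %row;
        @row{@FIELDS} = split /\t/, $line;
        next if ++$seen{ $row{query} } > $max_rows;
        push @rows, \%row;
    }
    return \@rows;
}

# Made-up sample data, read through an in-memory filehandle.
my $report = join "\n",
    "# Fields: Query id, Subject id, % identity, ...",
    join("\t", qw(q1 s1 98.50 200 3 0 1 200 5 204 1e-100 380)),
    join("\t", qw(q1 s2 91.00 180 16 0 1 180 9 188 2e-60 250)),
    join("\t", qw(q1 s3 85.00 150 22 1 10 160 3 152 5e-30 140)),
    join("\t", qw(q2 s7 99.00 300 3 0 1 300 1 300 1e-150 560)),
    "";

open(my $fh, '<', \$report) or die "cannot open in-memory report: $!";
my $rows = top_rows($fh, 2);
close $fh;

# Print query, subject and e-value for each row that was kept.
printf "%s\t%s\t%s\n", @{$_}{qw(query subject evalue)} for @$rows;
```

Only plain hashes are created per row, and rows beyond the per-query limit
are skipped before anything is stored.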

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely
run into an infinite loop caused by a change in NCBI's text output.  You can
try updating bioperl from CVS in either case to see if that helps.  Tabular
and XML output, AFAIK, are the same regardless of version; this bug only
affected text output of BLAST reports.

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO. I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
> 
>
>    my $result = $connector->getQueryResult($query_id);
>
>    my $FH;
>    open $FH, "<", \$result;
>
>    my $searchio = new Bio::SearchIO(-format => "blast",
>                                     -fh     => $FH);
>
>    while (my $o_blast = $searchio->next_result()) {
>        my $clone_id  = $o_blast->query_name();
>        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
>
> 
>
>this is just the leading and tailing code surrounding the use of
>Bio::SearchIO since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks
>
>


From torsten.seemann at infotech.monash.edu.au  Mon May 15 20:18:44 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 16 May 2006 10:18:44 +1000
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
Message-ID: <44691A64.8040607@infotech.monash.edu.au>

> taking up a huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

>  my $result = $connector->getQueryResult($query_id);
>  my $searchio = new Bio::SearchIO(-format => "blast", -fh => $FH);
>  while (my $o_blast = $searchio->next_result()) {
>      my $clone_id = $o_blast->query_name();
>      my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
>  }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is 
causing the problem? ie. is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return 
freed memory back to the OS? This may affect how you were measuring 
"memory usage".

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia
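
The first comment above (a helper object keeping a reference to each
result) can be illustrated without BioPerl. This is only a sketch under
stated assumptions: Report, LeakyWriter and CleanWriter are hypothetical
stand-ins, not the real classes behind $o_blast or $bdbi, and a DESTROY
counter is used simply to show when Perl's reference counting frees an
object.

```perl
use strict;
use warnings;

my $destroyed = 0;    # incremented each time a Report object is freed

package Report;       # hypothetical stand-in for a parsed result object
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub DESTROY { $destroyed++ }

package LeakyWriter;  # hypothetical: remembers every report it is given
sub new      { return bless { seen => [] }, shift }
sub push_sql {
    my ($self, $report) = @_;
    push @{ $self->{seen} }, $report;   # extra reference keeps report alive
    return "INSERT ...";
}

package CleanWriter;  # hypothetical: uses the report, stores nothing
sub new      { return bless {}, shift }
sub push_sql { my ($self, $report) = @_; return "INSERT ..."; }

package main;

# Feed $n reports to a writer; return how many were freed while the
# writer itself was still alive.
sub run {
    my ($writer, $n) = @_;
    $destroyed = 0;
    for my $i (1 .. $n) {
        my $report = Report->new(id => $i);
        $writer->push_sql($report);
    }
    return $destroyed;
}

printf "leaky writer: %d of 100 reports freed during the loop\n",
    run(LeakyWriter->new, 100);
printf "clean writer: %d of 100 reports freed during the loop\n",
    run(CleanWriter->new, 100);
```

With the leaky writer no report should be freed until the writer itself
goes away, whereas with the clean writer each report is freed at the end
of its loop iteration. Checking the size of the writer object on every
iteration is one way to spot the first pattern.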


From kmdaily at indiana.edu  Mon May 15 17:00:12 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Mon, 15 May 2006 17:00:12 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu>

I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


From letondal at pasteur.fr  Tue May 16 02:06:19 2006
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 May 2006 08:06:19 +0200
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
	<C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr>


On May 15, 2006, at 9:34 PM, David Messina wrote:

>>>> A couple of minor interface thoughts.
>>>>
>>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>>> think I'll often want to browse through what's available in a
>>>> class. But
>>>> 60% or so of the screen real estate is used for "Enter a search
>>>> string... OR select a class from the list". IMO, it would be
>>>> better to
>>>> have two pages, a search page and a result page.   It only takes
>>>> a click
>>>> on Back (or a "new search" button) to get to a new search, and
>>>> now you
>>>> can use your whole screen for reading your results.
>>>
>>> As the compromise it must be, I like the way it behaves. I don't like
>>> lots of windows. I especially don't like pop up windows. Right now
>>> when
>>> I'm using the bioperl docs I tend to have a whole bunch of tabs
>>> open to
>>> different class pages at once, so being able to see an overview
>>> all on
>>> one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite Ajax-like ones) that strike a compromise between
using several windows for easily browsing large results and composing
everything in one window to get an overview. The 2 examples that currently
come to mind are (not biology related):
	- http://montreal.mspace.fm/chi/sched/
	- http://www.live.com/
		(see the slider on the top right, which lets you squeeze or enlarge
		the results area)


--
Catherine Letondal -- Institut Pasteur


From cjfields at uiuc.edu  Tue May 16 07:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS.  I believe Brian added Entrezgene.pm after the
last developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
>Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>


From bernd.web at gmail.com  Tue May 16 07:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour wrt the pod.
Do these look familiar ?

Some example code:
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

my $query = Bio::DB::Query::GenBank->new
       (-query   => 'Lassa Virus[ORGN]',
        -reldate => '30',
        -db      => 'protein',
        -ids     => [195052,2981014,11127914],
        -maxids  => 30 );

# stream the matching records as FASTA and print each description
my $gb = Bio::DB::GenBank->new(-format => 'fasta');
my $seqio = $gb->get_Stream_by_query($query);
while (my $seq = $seqio->next_seq) {
       print $seq->desc,"\n";
}

The module states that if we provide -ids that:
       If you provide an array reference of IDs in -ids, the query will be
       ignored and the list of IDs will be used when the query is passed to a
       Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case the query ('Lassa Virus[ORGN]') is actually passed,
not the IDs. Also, $query->query shows the original query. Am I doing
something wrong, or is the pod not reflecting the current behaviour of
this module?

I was also surprised that if the internet connection is down, no warning
is thrown for $query->query or $query->count at all. Only the
get_Stream_by_query call above will warn us if the site is unreachable
(500 Internal Server Error).

$query->ids or $query->count will not throw a warning and
@ids=$query->ids will just be an empty array. (I realize $query->count
is not initialized, so I am using this now to check for success, but a
warning from WebDBSeqI would be more appropriate, I think.)

Last, the example from the pod is not working, but no warnings are raised:
          # initialize the list yourself
          my $query =
            Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);

$query->count returns zero w/o any warning. Of course this query did
not specify a DB. Only if we specify -db=>'nucleotide' is $query->count
3.
However, why is there no warning if we set -db=>'protein' or if we do not
set this at all?

On the NCBI website, searching the Protein DB for 195052 returns:
      See Details. No items found.
      The following term(s) refer to a different DB:195052

But this is not reflected via Bio::DB::Query::GenBank.

Can I check for this situation in the code apart from checking on
$query->count == 0 ? Or would it indeed be better to check for these
situations in the module?

Regards,
Bernd


From chen_li3 at yahoo.com  Tue May 16 10:55:51 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 07:55:51 -0700 (PDT)
Subject: [Bioperl-l] module for 6 reading frames
Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I wonder which module is available for translating DNA
sequence into 6 reading frames.

Thank you,

Li



From smarkel at scitegic.com  Tue May 16 11:10:35 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 08:10:35 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <OF41BF3DF8.D7365B03-ON88257170.00534209-88257170.00535904@scitegic.com>

Li,

Use the translate() function in Bio::Tools::CodonTable.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:

> Hi all,
> 
> I wonder which module is available for translating DNA
> sequence into 6 reading frames.
> 
> Thank you,
> 
> Li


From golharam at umdnj.edu  Tue May 16 12:18:19 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:18:19 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>

I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From golharam at umdnj.edu  Tue May 16 12:24:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:24:03 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1>

Never mind.  I see its in CPAN.

-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu] 
Sent: Tuesday, May 16, 2006 12:18 PM
To: 'bioperl-l at bioperl.org'
Subject: Where is Bio::ASN1::EntrezGene?


I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From cjfields at uiuc.edu  Tue May 16 13:27:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 12:27:32 -0500
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, May 16, 2006 11:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
> 
> I just updated my local copy of bioperl from cvs.  When I ran the
> configure script, it says I need the external module
> Bio::ASN1::EntrezGene.  Which package contains this module?
> 
> --
> Ryan Golhar  -  golharam at umdnj.edu
> The Informatics Institute of UMDNJ
> 


From ClarkeW at AGR.GC.CA  Tue May 16 16:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>


Thank you for the suggestions and comments. However, I think I should
clear a few things up. I am running bioperl v1.4, I am cycling through
the blast reports, which should not be of absurd size since they only
contain the top 5 hits, and I am using top to track the memory usage
(although I realize this is fairly inaccurate). I have looked through the
code for both AAFCBLAST and BEAST_UPDATE but do not believe the
leak/problem to be contained within them, since they almost exclusively
use method calls and those variables should be destroyed upon leaving
the scope of the method. I have used Devel::Size to check the size of the
variables $bdbi, $searchio and $connector, and on each iteration these
variables have the same size. Any other suggestions would be greatly
appreciated, as I have nearly gone insane trying to track this problem
down.

Thanks, Wayne 




From smarkel at scitegic.com  Tue May 16 16:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>

Li,

You can either do the substring, and reverse complement, yourself
or you can use the translate() function in Bio::PrimarySeq.  It
inherits from Bio::PrimarySeqI, so check there for the documentation.
That translate() function takes a "-frame" argument.
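
The first option above (doing the substring and reverse complement
yourself) might look roughly like the following. It is a sketch under
stated assumptions, not the Bio::PrimarySeq implementation: it hard-codes
the standard genetic code, labels the frames 1..3 and -1..-3 by its own
convention, and the sub names revcom, translate_from and six_frames are
made up for illustration.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G order.
my @BASES = qw(T C A G);
my $AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON;
my $idx = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            $CODON{"$b1$b2$b3"} = substr($AMINO, $idx++, 1);
        }
    }
}

# Reverse complement of a DNA string (only A, C, G, T are complemented).
sub revcom {
    my $dna = scalar reverse uc $_[0];
    $dna =~ tr/ACGT/TGCA/;
    return $dna;
}

# Translate starting at a 0-based offset; unknown codons become 'X'
# and a trailing partial codon is ignored.
sub translate_from {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length $dna; $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $CODON{$codon} ? $CODON{$codon} : 'X';
    }
    return $protein;
}

# Return a hash: frame label (1, 2, 3, -1, -2, -3) => translation.
sub six_frames {
    my $dna = uc $_[0];
    my $rc  = revcom($dna);
    my %frames;
    for my $offset (0 .. 2) {
        $frames{ $offset + 1 }    = translate_from($dna, $offset);
        $frames{ -($offset + 1) } = translate_from($rc,  $offset);
    }
    return %frames;
}

my %frames = six_frames('ATGGCCTAA');
printf "%2d  %s\n", $_, $frames{$_} for (1, 2, 3, -1, -2, -3);
```

For real work the BioPerl route is preferable, since it handles other
codon tables and ambiguity codes that this sketch ignores.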

Scott

PS In future, please respond to the list.  That way others see
the questions and answers.

chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:

> Dear Dr. Markel,
> 
>     I browsed through the documentation of 
> Bio::Tools::CodonTable and found this line:
> 
> my $translation= $CodonTable->translate($seq);
> 
> I think this line does the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2, 3, and -1, -2, -3? 
> 
> 
> Thank you,
> 
> Li
> 


From cjfields at uiuc.edu  Tue May 16 17:15:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:15:10 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>
Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine>

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which
SearchIO is broken.  You seem to fit #2. 

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you have
the same problem (a CPU spike and increasing memory usage) then it may be
fixed in CVS.

Chris



From ClarkeW at AGR.GC.CA  Tue May 16 17:24:51 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 17:24:51 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>


Thanks Chris, 

I did forget to mention, however, that I did parse one single report and
found no problems; it finished fast and with no noticeable memory usage.
I will consider getting my SA to update bioperl from CVS as a precaution,
but he has already stated he prefers to wait for the release of v1.5.
Even a single job of 10000 will finish, but the problem is that I am
trying to loop through many jobs of 10000 and the memory use seems to be
additive for reasons I cannot determine. During testing I noticed that
the RSS in top decreased at around 80% MEM usage, but then the shared
mem increased. I am wondering if this is due to the perl garbage
collector freeing up memory but keeping it in its pool for reuse; if so,
that is fine as long as it does not then want to reach into swapped mem.

Thanks again, Wayne

-----Original Message-----
From: Chris Fields [mailto:cjfields at uiuc.edu] 
Sent: Tuesday, May 16, 2006 3:15 PM
To: Clarke, Wayne; bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Memory Leak in Bio::SearchIO

I mentioned two possibilities last time I posted: 1) that the BLAST file
was
too large, or 2) that you are using an old version of bioperl that
SearchIO
is broken.  You seem to fit #2. 

The issue is that NCBI does not consider text BLAST output sacrosanct
and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there
are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about
2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to
the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you
have
the same problem (a CPU spike and increasing memory usage) then it may
be
fixed in CVS.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 3:57 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> With regards to the suggestions/comments made thank you. However I
think
> I should clear a few things up. I am running bioperl v1.4, I am
cycling
> through the blast reports which should not be of absurd size since
they
> only contain the top 5 hits, and I am using top to track(although I
> realize fairly inacuately) the memory usage. I have looked through the
> code for both AAFCBLAST and BEAST_UPDATE but do not believe the
> leak/problem to be contained within them since they are almost
> exclusively using method calls and those variables should be destroyed
> upon leaving the scope of the method. I have used Devel::Size to check
> the size of the variables $bdbi and $searchio and $connector and on
each
> iteration these variables have the same size. Any other suggestions
> would be greatly appreciated as I have nearly gone insane trying to
> track this problem down.
> 
> Thanks, Wayne
> 
> 
> -----Original Message-----
> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
> Sent: Monday, May 15, 2006 6:19 PM
> To: Clarke, Wayne
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> > taking up a huge amount of RAM. For a single job of 10000 queries it
> > can consume as much as a couple hundred Mb inside an hour. I realize
> 
> >  my $result = $connector->getQueryResult($query_id);
> >                 my $searchio = new Bio::SearchIO(-format => "blast",
> >                 while (my $o_blast = $searchio->next_result()) {
> >                         my $clone_id = $o_blast->query_name();
> >                         my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5); }
> 
> Some comments:
> 
> Have you considered that whatever class/module $bdbi belongs to is
> causing the problem? ie. is it keeping a reference to $o_blast around?
> 
> Are you aware that Perl garbage collection does not necessarily return
> freed memory back to the OS? This may affect how you were measuring
> "memory usage".
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Tue May 16 17:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> Thanks Chris,
> 
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October).  It's
considered a developer release but is pretty stable; well, except for that
whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I cannot determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased. I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use; if so that is fine
> as long as it does not then want to reach into swapped mem.
> 
> Thanks again, Wayne
> ...


From cjfields at uiuc.edu  Tue May 16 18:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
> 
> Hi all,
> 
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar ?
> 
> Some example code:
> my $query = Bio::DB::Query::GenBank->new
>        (-query   => 'Lassa Virus[ORGN]',
>         -reldate => '30',
>         -db      => 'protein',
>         -ids     => [195052,2981014,11127914],
>         -maxids  => 30 );
> 
> my $gb = Bio::DB::GenBank->new(-format => 'fasta');
> my $seqio = $gb->get_Stream_by_query($query);
> while (my $seq = $seqio->next_seq) {
>        print $seq->desc,"\n"; }
> 
> The module states that if we provide -ids:
>        If you provide an array reference of IDs in -ids, the query will be
>        ignored and the list of IDs will be used when the query is passed
>        to a Bio::DB::GenBank object's get_Stream_by_query() method.
> 
> In the above case it is actually the query that is passed ('Lassa Virus[ORGN]'),
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting current behaviour of this
> module?
> 
> I was also surprised that if internet is down no warning is thrown for
> $query->query or $query->count at all. Only the get_Stream_by_query
> above will warn us if the site is unreachable (500 Internal Server
> Error).

I believe this has to do with the difference between the objects and the way
they retrieve request data.  Bio::DB::GenBank and Bio::DB::Query::GenBank use
different methods to retrieve ids; Bio::DB::GenBank's get_Stream_by_query
method just makes it a bit easier to retrieve a list of uid's directly
instead of saving them as an array and then reposting them using
get_Stream_by_id.  Not foolproof, but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it is supposed to be
an interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
>           # initialize the list yourself
>           my $query = Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
> 
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' $query->count
> is 3.
> However, why is there no warning if we set -db=>'protein' or if we did not
> set this?
>
>
> On the NCBI website searching Protein DB returns for 195052:
>       See Details. No items found.
>       The following term(s) refer to a different DB:195052
> 
> But this is not reflected via Bio::DB::Query::GenBank.
> 
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
> 
> Regards,
> Bernd

I can probably play around with adding a few things in tomorrow and clean up
the POD somewhat.  I'm planning a rewrite for EUtilities-based searches but
that's a ways off still...  Can't promise much; I'm pretty busy til next
week.
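
In the meantime, a caller-side check along the lines Bernd describes could
be wrapped up roughly like this.  ids_or_warn is a made-up helper name; it
only assumes the query object has the count() and ids() methods that
Bio::DB::Query::GenBank provides:

```perl
use strict;
use warnings;

# Hypothetical helper: return the IDs from a query object, or an empty
# list (with a warning) when count() dies, is undefined, or is zero.
sub ids_or_warn {
    my ($query) = @_;
    my $count = eval { $query->count };
    if ($@) {
        warn "Query failed: $@";
        return;
    }
    unless ($count) {
        warn "Query returned no IDs (wrong -db, no matches, or site unreachable)\n";
        return;
    }
    return $query->ids;
}

# Intended use (needs BioPerl and network access, so left commented out):
#   use Bio::DB::Query::GenBank;
#   my $query = Bio::DB::Query::GenBank->new(
#       -db  => 'nucleotide',
#       -ids => [195052, 2981014, 11127914],
#   );
#   my @ids = ids_or_warn($query);
```

This cannot tell the three failure cases apart; it only makes the empty
result loud instead of silent.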

Chris


From chen_li3 at yahoo.com  Tue May 16 20:53:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 17:53:17 -0700 (PDT)
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com>

Hi all,

Thank you very much for the help.

I have some DNA sequences printed on the screen. But
the default output is longer than I expect.  I need 50
nucleotides/line. I searched CPAN but could not find the
right module.  Which bioperl module can do this job?

Li



From kmdaily at indiana.edu  Tue May 16 09:57:52 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Tue, 16 May 2006 09:57:52 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>

OK, got that installed. But I still get an error:

Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.

I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


-----Original Message-----
From: Christopher Fields [mailto:cjfields at uiuc.edu]
Sent: Tue 5/16/2006 7:38 AM
To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
 
You'll have to install from CVS.  I believe Brian added entrezgene.pm after the last
developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
>Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From skirov at utk.edu  Wed May 17 07:48:29 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Wed, 17 May 2006 07:48:29 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
	<20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
Message-ID: <446B0D8D.40901@utk.edu>

You are using an old Bio::Annotation::DBLink module. Did you download
only entrezgene.pm or the whole bioperl? If the whole distribution, what
do the tests tell you?
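
One way to see which copy of the module Perl actually picks up is a quick
check like the following.  module_report is a made-up helper name; it relies
only on %INC and can(), both core Perl:

```perl
use strict;
use warnings;

# Hypothetical helper: report which file a module was loaded from and
# whether it provides a given method.  Returns (path, has_method);
# path is undef if the module cannot be loaded.
sub module_report {
    my ($module, $method) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return (undef, 0) unless eval { require $file; 1 };
    return ($INC{$file}, $module->can($method) ? 1 : 0);
}

my ($path, $has_url) = module_report('Bio::Annotation::DBLink', 'url');
if (defined $path) {
    print "Loaded from: $path\n";
    print "url() method: ", ($has_url ? "present" : "missing"), "\n";
}
else {
    print "Bio::Annotation::DBLink could not be loaded\n";
}
```

If the path printed is the admin's older installation rather than the CVS
checkout, the PERL5LIB setting is not taking effect for that script.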
Stefan
 
Daily, Kenneth Michael wrote:

>OK, got that installed. But I still get an error:
>
>Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.
>
>I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>
>
>-----Original Message-----
>From: Christopher Fields [mailto:cjfields at uiuc.edu]
>Sent: Tue 5/16/2006 7:38 AM
>To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
> 
>You'll have to install from CVS.  I believe Brian added entrezgene.pm after the last
>developer release (1.5.1):
>
>http://www.bioperl.org/wiki/Installing_BioPerl
>
>Chris
>
>---- Original message ----
>
>>Date: Mon, 15 May 2006 17:00:12 -0400
>>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>
>>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
>>To: <bioperl-l at lists.open-bio.org>
>>
>>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
>>Bio/SeqIO). How can I get this module?
>>
>>Kenny Daily
>>IU School of Informatics
>>kmdaily at indiana.edu
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Tue May 16 20:46:00 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 16 May 2006 20:46:00 -0400
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>
Message-ID: <C08FEA88.877A%osborne1@optonline.net>

Chen Li,

There's some documentation on translate() in bptutorial:

http://bioperl.org/Core/Latest/bptutorial.html

You could also use the translate_6frames() method of Bio::SeqUtils.
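
A short sketch of the translate_6frames() route.  The sequence here is made
up purely for illustration:

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqUtils;

# Made-up example sequence, for illustration only.
my $seq = Bio::Seq->new(
    -id  => 'example',
    -seq => 'ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG',
);

# translate_6frames() returns six protein sequence objects: three from
# the forward strand and three from the reverse complement.
my @frames = Bio::SeqUtils->translate_6frames($seq);

for my $protein (@frames) {
    printf "%-14s %s\n", $protein->display_id, $protein->seq;
}
```

Each returned object gets its own display_id, so the frames can be told
apart in the output.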


Brian O.


On 5/16/06 4:52 PM, "smarkel at scitegic.com" <smarkel at scitegic.com> wrote:

> Li,
> 
> You can either do the substring, and reverse complement, yourself
> or you can use the translate() function in Bio::PrimarySeq.  It
> inherits from Bio::PrimarySeqI, so check there for the documentation.
> That translate() function takes a "-frame" argument.
> 
> Scott
> 
> PS In future, please respond to the list.  That way others see
> the questions and answers.
> 
> chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:
> 
>> Dear Dr. Markel,
>> 
>>     I browsed through the documentation of
>> Bio::Tools::CodonTable and found this line:
>> 
>> my $translation= $CodonTable->translate($seq);
>> 
>> I think this line is to do the translation. Here is my
>> question: which line in the doc says how to translate
>> the remaining frames 2,3, and -1, -2, -3?
>> 
>> 
>> Thank you,
>> 
>> Li
>> 
>> --- smarkel at scitegic.com wrote:
>> 
>>> Li,
>>> 
>>> Use the translate() function in Bio::Tools::CodonTable.
>>> 
>>> Scott
>>> 
>>> Scott Markel, Ph.D.
>>> Principal Bioinformatics Architect  email:  smarkel at scitegic.com
>>> SciTegic Inc.                       mobile: +1 858 205 3653
>>> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
>>> San Diego, CA 92121                 fax:    +1 858 279 8804
>>> USA                                 web:    http://www.scitegic.com
>>> 
>>> 
>>> bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:
>>> 
>>>> Hi all,
>>>> 
>>>> I wonder which module is available for translating DNA
>>>> sequence into 6 reading frames.
>>>> 
>>>> Thank you,
>>>> 
>>>> Li
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From e-just at northwestern.edu  Wed May 17 11:03:41 2006
From: e-just at northwestern.edu (Eric Just)
Date: Wed, 17 May 2006 10:03:41 -0500
Subject: [Bioperl-l] Modware: a BioPerl based API for Chado
Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu>

Hi Everyone,

We are announcing a new Sourceforge Project called Modware that may be of 
interest to you.   It is an object-oriented API written in Perl that 
creates BioPerl object representations of biological features stored in a 
Chado database. It basically creates a Bio::Seq object for chromosomes in 
Chado and creates Bio::SeqFeature::Gene objects for protein coding 
transcripts stored in Chado.  Things like contigs are represented as 
Bio::SeqFeature::Generic objects.  We also provide many methods for 
manipulating these objects once they are in memory.

For download please visit our Sourceforge project page:
http://sourceforge.net/projects/gmod-ware

For API documentation and some short examples of selected use cases visit 
our project home page:
http://gmod-ware.sourceforge.net/

This software is adapted from the production middleware code that dictyBase
uses.  Modware 0.1 requires that the latest stable GMOD release (0.003) be
installed.  We are currently calling it a release candidate; if the feedback
we get turns up no major install bugs (we've installed it on only two
different machines), we will call it an official release.  If you
would like a version that works on the latest CVS version of GMOD, let me
know and I'll expedite getting that out the door.

Lastly, please use the direct download version, we have not fully recovered 
from the recent Sourceforge CVS issues.

Please try the software out and let us know what you think!


Sincerely,
Eric Just and Sohel Merchant

e-just at northwestern.edu
s-merchant at northwestern.edu


============================================

Eric Just
e-just at northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 


From sb at mrc-dunn.cam.ac.uk  Wed May 17 13:46:45 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 17 May 2006 18:46:45 +0100
Subject: [Bioperl-l] Bio::Map:: enhancements
Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk>

I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998

I'm interested in what people have to say about the secondary 
enhancement I talk about there. Is it a sane thing to do? What are the 
better ways of doing that?
If it /is/ ok, I suppose I'd have to go back and alter 
Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker.


Oh, on a side note, you'll see I had to override RangeI's intersection 
method to work on multiple ranges. Why is RangeI limited to an 
intersection of only two ranges?

Cheers,
Sendu.


From David_Waner/San_Diego/Accelrys at scitegic.com  Thu May 18 15:30:46 2006
From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com)
Date: Thu, 18 May 2006 12:30:46 -0700
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
	Windows
Message-ID: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>

BioPerl Users/Developers,

In our testing we have found severe performance problems using BioPerl 
with Perl 5.8 on Windows (but not on Linux). They show up especially in 
SeqIO when reading or writing Fasta files containing large (~16 MB) 
sequences.  The same files that can be read in 1 or 2 seconds with Windows 
Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.

Although the fault is clearly with Perl, not with BioPerl, I have 
identified a couple of places where BioPerl could be modified in order to 
save Windows Perl 5.8 users a lot of time, while not affecting other 
users. 

For example, in my testing the following excerpt from 
Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 
16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }
 
whereas the following replacement code should be equivalent: 

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
    }
 
but executes in less than 1 second.
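
A small check of the "should be equivalent" claim, as a sketch only.  The
sub names are made up, the $ONMAC condition is left out, and it assumes a
platform where "\n" is LF (\012):

```perl
use strict;
use warnings;

# Original form: two substitutions.
sub normalize_s {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g;
    return $line;
}

# Proposed form: one substitution plus tr///.
sub normalize_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;
    $line =~ tr/\015/\n/;
    return $line;
}

# CRLF, bare LF, bare CR, a mixture, and the empty string.
my @samples = ("ACGT\015\012", "ACGT\012", "ACGT\015", "A\015\012C\015G\012T", "");

for my $sample (@samples) {
    my $same = normalize_s($sample) eq normalize_tr($sample);
    printf "%-4s (%d byte input)\n", ($same ? 'same' : 'DIFF'), length $sample;
}
```

This only compares results on a few hand-picked inputs; it says nothing
about the timings, which would need to be measured on Windows Perl 5.8.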

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
 
to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;        # Remove whitespace
 
in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
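
One caveat worth checking before swapping these: \s matches more than the
four characters listed in the tr set (form feed, for one), so the two are
not strictly identical.  A quick illustration, with made-up sub names:

```perl
use strict;
use warnings;

sub strip_regex { my ($s) = @_; $s =~ s/\s//g;       return $s }
sub strip_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d; return $s }

my $typical = "ACGT ACGT\tACGT\r\nACGT\n";
my $unusual = "ACGT\fACGT";    # form feed: matched by \s, not in the tr set

print "typical input: results agree\n"
    if strip_regex($typical) eq strip_tr($typical);
print "form feed: results differ\n"
    if strip_regex($unusual) ne strip_tr($unusual);
```

For ordinary Fasta files this probably does not matter, but adding \f to
the tr list would keep the two closer.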

There are also problems in reading files with the <> operator when $/ is 
redefined to "\n>", where reading the first line of Fasta files containing 
large sequences takes ~50 seconds, but reading subsequent lines or files 
takes about 1 second. I don't have a work-around for this.
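
For anyone who wants to reproduce this outside BioPerl, the reading idiom in
question looks roughly like the sketch below.  It is not a work-around, just
a minimal stand-alone reader (read_fasta_records is a made-up name) that
could be pointed at a large file and timed:

```perl
use strict;
use warnings;

# Read Fasta records from a filehandle using "\n>" as the record separator.
# Returns a list of [header, sequence] pairs.
sub read_fasta_records {
    my ($fh) = @_;
    local $/ = "\n>";
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;          # removes a trailing "\n>" if present
        $chunk =~ s/^>//;      # the first record still has its leading ">"
        my ($header, @lines) = split /\n/, $chunk;
        next unless defined $header;
        push @records, [ $header, join('', @lines) ];
    }
    return @records;
}

# Tiny in-memory example so the sketch runs without an input file.
my $fasta = ">seq1 first\nACGT\nACGT\n>seq2 second\nTTTT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory handle: $!";
for my $record (read_fasta_records($fh)) {
    print "$record->[0]: ", length($record->[1]), " bases\n";
}
close $fh;
```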

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes? 

I plan to submit a bug report to perlbug, but don't know when or if the 
problem will be fixed. 

- David


From cjfields at uiuc.edu  Thu May 18 16:07:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 May 2006 15:07:14 -0500
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
	onWindows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine>

David,

I have seen some slowdowns with Bio::SeqIO associated with GenBank files,
which this could be related to.  I can't do anything about it (test or
commit changes) until next week but someone else using Windows might (though
we are few and far between, and I'm switching to Mac OS X in fall).  Would
be nice to try the changes and test it out on a few platforms.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of
> David_Waner/San_Diego/Accelrys at scitegic.com
> Sent: Thursday, May 18, 2006 2:31 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
> onWindows
> 
> BioPerl Users/Developers,
> 
> In our testing we have found severe performance problems using BioPerl
> with Perl 5.8 on Windows (but not on Linux). They show up especially in
> SeqIO when reading or writing Fasta files containing large (~16 MB)
> sequences.  The same files that can be read in 1 or 2 seconds with Windows
> Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.
> 
> Although the fault is clearly with Perl, not with BioPerl, I have
> identified a couple of places where BioPerl could be modified in order to
> save Windows Perl 5.8 users a lot of time, while not affecting other
> users.
> 
> For example, in my testing the following excerpt from
> Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a
> 16 MB sequence):
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015?\012/\n/g;
>         $line =~ s/\015/\n/g unless $ONMAC;
>     }
> 
> whereas the following replacement code should be equivalent:
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
>         $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
>     }
> 
> but executes in less than 1 second.
> 
> In addition, changing:
> 
>     defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
> 
> to:
> 
>     defined $sequence && $sequence =~ tr/ \t\n\r//d;        # Remove whitespace
> 
> in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
> 
> There are also problems in reading files with the <> operator when $/ is
> redefined to "\n>", where reading the first line of Fasta files containing
> large sequences takes ~50 seconds, but reading subsequent lines or files
> takes about 1 second. I don't have a work-around for this.
> 
> I would like to ask the mailing list:
> 
> 1. Has anyone else run into this problem? Any fixes?
> 2. Do you think BioPerl should incorporate these changes?
> 
> I plan to submit a bug report to perlbug, but don't know when or if the
> problem will be fixed.
> 
> - David
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Thu May 18 16:27:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 16:27:57 -0400
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
 Windows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <C092510D.87EB%osborne1@optonline.net>

David,

What are the results from the relevant t/*t files before and after these
patches?

Brian O.


On 5/18/06 3:30 PM, "David_Waner/San_Diego/Accelrys at scitegic.com"
<David_Waner/San_Diego/Accelrys at scitegic.com> wrote:

> BioPerl Users/Developers,
> 
> In our testing we have found severe performance problems using BioPerl
> with Perl 5.8 on Windows (but not on Linux). They show up especially in
> SeqIO when reading or writing Fasta files containing large (~16 MB)
> sequences.  The same files that can be read in 1 or 2 seconds with Windows
> Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.
> 
> Although the fault is clearly with Perl, not with BioPerl, I have
> identified a couple of places where BioPerl could be modified in order to
> save Windows Perl 5.8 users a lot of time, while not affecting other
> users. 
> 
> For example, in my testing the following excerpt from
> Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a
> 16 MB sequence):
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015?\012/\n/g;
>         $line =~ s/\015/\n/g unless $ONMAC;
>     }
>  
> whereas the following replacement code should be equivalent:
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
>         $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
>     }
>  
> but executes in less than 1 second.
> 
> In addition, changing:
> 
>     defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
>  
> to:
> 
>     defined $sequence && $sequence =~ tr/ \t\n\r//d;        # Remove whitespace
>  
> in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
> 
> There are also problems in reading files with the <> operator when $/ is
> redefined to "\n>", where reading the first line of Fasta files containing
> large sequences takes ~50 seconds, but reading subsequent lines or files
> takes about 1 second. I don't have a work-around for this.
> 
> I would like to ask the mailing list:
> 
> 1. Has anyone else run into this problem? Any fixes?
> 2. Do you think BioPerl should incorporate these changes?
> 
> I plan to submit a bug report to perlbug, but don't know when or if the
> problem will be fixed.
> 
> - David
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hubert.prielinger at gmx.at  Thu May 18 16:41:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 18 May 2006 14:41:27 -0600
Subject: [Bioperl-l] parsing xml output
Message-ID: <446CDBF7.10908@gmx.at>

hi,
what is the best way to parse NCBI- and WU-BLAST XML output?
And is it possible to parse both with the same parser, or does their
XML output differ?

thanks


From staffa at niehs.nih.gov  Thu May 18 16:49:15 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Thu, 18 May 2006 16:49:15 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>

Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
Namely the six D.melanogaster sequences.  
Specifically to find gene entries and learn the gene name, begin and end and CDS.
Please point me to appropriate modules and documentation.


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From adamnkraut at gmail.com  Thu May 18 17:07:42 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Thu, 18 May 2006 17:07:42 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>

I am currently using a pairwise alignment algorithm written in C (not by
me).  The program consists of a library of routines, structures, and
definitions which I do not want to spend a lot of time abstracting.  I
already have a hack method of writing the parameters and inputs I want from
perl, calling the c program with system( ), and then parsing the output in
Perl.  Any good programmer would probably smack me but I'm just an undergrad
and I needed to show my boss that this works in order to spend more time on
it.

So on to my question, what is the preferred method of extending Bioperl to
use this algorithm?  I have just read the XS tutorial and a bit about Inline
C.  Can I put the main function in my script using Inline, and then just
point Inline at the rest of the C library?  The program has several
C-structures that are semantically equivalent to Bioperl objects, so I just
need somewhere to start.  I will spend some more time so that I have a more
specific question; I just wanted a little feedback, as this is my first post
to the bioperl list.

Thanks,
Adam


From osborne1 at optonline.net  Thu May 18 17:54:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 17:54:01 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <C0926539.87F5%osborne1@optonline.net>

Nick,

Have you read the Feature-Annotation HOWTO? This would be a good starting
point...
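
To give a flavour of what the HOWTO covers, here is a rough sketch that
lists gene and CDS features from a GenBank file.  gene_rows is a made-up
helper name; the SeqIO and feature calls are the standard ones:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return [type, gene name, start, end, strand] for
# every gene and CDS feature on a sequence object.
sub gene_rows {
    my ($seq) = @_;
    my @rows;
    for my $feat ($seq->get_SeqFeatures) {
        my $type = $feat->primary_tag;
        next unless $type eq 'gene' || $type eq 'CDS';
        # Not every feature carries a /gene qualifier, so check first.
        my ($name) = $feat->has_tag('gene')
                   ? $feat->get_tag_values('gene')
                   : ('unknown');
        push @rows, [ $type, $name, $feat->start, $feat->end, $feat->strand ];
    }
    return @rows;
}

# Only runs when a GenBank file is named on the command line.
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print join("\t", $seq->display_id, @$_), "\n" for gene_rows($seq);
    }
}
```

For a CDS with a split (join) location, start and end give only the overall
extent; the individual exons are available from the feature's location
object.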

Brian O.


On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Would like a fairly simple way to extract certain information from Genbank
> Genomic File Annotations.
> Namely the six D.melanogaster sequences.
> Specifically to find gene entries and learn the gene name, begin and end and
> CDS.
> Please point me to appropriate modules and documentation.
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at duke.edu  Thu May 18 18:22:32 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 18 May 2006 18:22:32 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>

We don't parse WU-BLAST XML at this time.  We'd welcome someone
contributing this.

NCBI XML is parsed with the blastxml format.
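
A rough sketch of what that looks like; print_blastxml_hsps is a made-up
name, and I believe the blastxml parser needs an XML SAX parser module to
be installed:

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: print one tab-separated line per HSP from an
# NCBI BLAST XML report.
sub print_blastxml_hsps {
    my ($file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $result->query_name, $hit->name,
                           $hsp->evalue, $hsp->percent_identity), "\n";
            }
        }
    }
    return;
}

# Only runs when a report file is named on the command line.
print_blastxml_hsps($ARGV[0]) if @ARGV;
```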

-jason
On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:

> hi,
> what is the best way to parse NCBI- and WU-BLAST XML output?
> And is it possible to parse both with the same parser, or does their
> XML output differ?
>
> thanks
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From MEC at stowers-institute.org  Thu May 18 18:39:15 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 18 May 2006 17:39:15 -0500
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <CED81D34E37D5043A1211565277A51E50563F496@exchkc02.stowers-institute.org>

Li,

Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat
fasta on standard input to 50 char wide fasta on standard output.

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' 

You can call it like this:

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' < inputfile.fasta > outputfile.fasta

Does this help?
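
For anyone who prefers a small script to a one-liner, here is a minimal
sketch of the same idea. The sub name reformat_fasta and its arguments are
made up for illustration; the only assumption about BioPerl is that the
fasta writer honours -width, as in the one-liner.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): copy FASTA records from one
# filehandle to another, wrapping sequence lines at $width characters.
sub reformat_fasta {
    my ($in_fh, $out_fh, $width) = @_;

    my $in  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'fasta');
    my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'fasta',
                              -width => $width);

    # Read one sequence at a time and write it back out re-wrapped.
    while ( my $seq = $in->next_seq ) {
        $out->write_seq($seq);
    }
    return;
}

# Typical use as a filter:
#   reformat_fasta(\*STDIN, \*STDOUT, 50);
```

Called as a filter it would be run the same way as the one-liner, with the
input file redirected to standard input.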

--Malcolm Cook


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
>Sent: Tuesday, May 16, 2006 7:53 PM
>To: bioperl-l at bioperl.org
>Subject: [Bioperl-l] module for formating sequence output on the screen
>
>Hi all,
>
>Thank you very much for the help.
>
>I have some DNA sequences printed on the screen. But
>the default output is longer than I expect.  I need 50
>nucleotides/line. I searched CPAN but cannot find the
>right module.  Which bioperl module can do this job?
>
>Li
>


From gish at watson.wustl.edu  Thu May 18 19:57:03 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Thu, 18 May 2006 18:57:03 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>
Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM>

Just to clarify, the XML output from WU-BLAST conforms to the standard
NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
incompatible, but care was taken to ensure compatibility.  If someone
identifies a difference that prevents parsing or proper interpretation of
the WU-BLAST output, please let me know.
Regards,
--Warren 



From cjfields at uiuc.edu  Thu May 18 21:10:50 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Thu, 18 May 2006 20:10:50 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <ca6b1c14.ba9e5f4f.81c0100@expms6.cites.uiuc.edu>

Just to make sure everybody knows: if you use BioPerl v1.5.1,
Bio::SearchIO::blastxml uses XML::Parser, which should come with most recent
Perl distributions.  The bioperl-live version has switched over to XML::SAX
for SAX2 parsing, and it is recommended that you install XML::SAX::ExpatXS as
well for faster parsing.

Chris



From jason.stajich at duke.edu  Fri May 19 08:52:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 08:52:13 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>

Whoops - sorry Warren - for some reason I had it in my mind that it  
was different.  So the blastxml parser should work fine.  The WUBLAST  
tab-delimited output is different than NCBI's -m8/9 though, right?

-jason



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu May 18 18:42:05 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:42:05 +1000
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <446CF83D.60207@infotech.monash.edu.au>

> what is the best way to parse NCBI and WU-BLAST XML output?
> Is it possible to parse both with the same parser, or does their
> XML output differ?


For NCBI BLAST XML format, use
	Bio::SearchIO->new(-format=>'blastxml', ...)

I don't know if 'blastxml' will load WU-BLAST XML format.
http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it.

Why not try it, and report back the results to the bioperl list?
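
A minimal sketch of such a test, assuming the report file name is given on
the command line (nothing here is specific to either BLAST flavour; whether
a WU-BLAST XML report parses cleanly is exactly what running it would show):

```perl
use strict;
use warnings;
use Bio::SearchIO;

# The report file is supplied by the user; no particular file is assumed.
my $file = shift @ARGV or die "usage: $0 blast_report.xml\n";

my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);

# Walk result -> hit -> HSP and print one tab-separated line per HSP.
while ( my $result = $in->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            print join("\t",
                $result->query_name,
                $hit->name,
                $hsp->evalue,
                $hsp->percent_identity,
            ), "\n";
        }
    }
}
```

If the parser dies or the columns look wrong for a WU-BLAST report, the
error message and a small sample report would be useful to post.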

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Thu May 18 18:37:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:37:17 +1000
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <446CF71D.2070207@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
> Namely the six D.melanogaster sequences.  
> Specifically to find gene entries and learn the gene name, begin and end and CDS.
> Please point me to appropriate modules and documentation.

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/HOWTOs
-> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/FAQ
-> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features
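
As a rough starting point alongside those documents, here is a sketch that
lists gene and CDS features with their names and coordinates. It assumes a
GenBank-format file given on the command line and that the name is stored
in a /gene qualifier, which may not hold for every record.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# The GenBank file is supplied by the user; no particular file is assumed.
my $file = shift @ARGV or die "usage: $0 file.gb\n";

my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');

while ( my $seq = $in->next_seq ) {
    for my $feat ( $seq->get_SeqFeatures ) {
        my $tag = $feat->primary_tag;
        next unless $tag eq 'gene' || $tag eq 'CDS';

        # Assumption: the gene name lives in the /gene qualifier.
        my ($name) = $feat->has_tag('gene')
                   ? $feat->get_tag_values('gene')
                   : ('(no /gene tag)');

        # start/end give the overall extent; to_FTstring gives the
        # location as written in the feature table, including join(...).
        print join("\t", $tag, $name, $feat->start, $feat->end,
                   $feat->location->to_FTstring), "\n";
    }
}
```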

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From gish at watson.wustl.edu  Fri May 19 10:50:08 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Fri, 19 May 2006 09:50:08 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
Message-ID: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>

Right, the WU-BLAST tabbed output contains more fields.  (See
http://blast.wustl.edu/blast/tabular.html).
--Warren

> Whoops - sorry Warren - for some reason I had it in my mind that it  
> was different.  So the blastxml parser should work fine.  The  
> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
> right?
>
> -jason


From adamnkraut at gmail.com  Fri May 19 11:04:01 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Fri, 19 May 2006 11:04:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
In-Reply-To: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
	<OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com>

The program generates an ensemble of weighted suboptimal alignments by use
of a partition function and stochastic backtracking.  The algorithm is quite
novel and it's really only part of a larger multi-scale comparative modeling
project. The documentation is here:

http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html

While I think this would be useful to the bioperl community if it were fully
abstracted/extended, I would at the least like to be able to pass in any two
sequences and get back SimpleAlign objects for our internal uses first.  I
have a good idea on how to get started.  I will be sure to post when I get
into trouble.


On 5/19/06, aaron.j.mackey at gsk.com <aaron.j.mackey at gsk.com> wrote:
>
> bioperl-ext is the package in which alignment algorithms and/or BioPerl
> "wrapped" external C libraries live.  Subprojects in bioperl-ext use both
> XS and Inline::C, that's up to you.
>
> You'll need to get your C code compiled to a dynamically loaded library
> (.so) to use either XS or Inline::C; this precludes any reuse of the C
> main() function (although your Inline::C wrapper might recapitulate/copy
> the main() function code).
>
> Out of curiosity, what pairwise alignment algorithm are you using?  This
> is a heavily beaten path, you might want to dig around first to see if
> someone else already has what you need.
>
> -Aaron
>
>


From slenk at emich.edu  Fri May 19 10:42:41 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Fri, 19 May 2006 10:42:41 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
Message-ID: <f141831f144a37.f144a37f141831@emich.edu>

There is nothing wrong with a reasonable way that works - better not 
to put yourself down.

Inline is good if you can get it to work for you - I have had issues 
with linking Inline to dynamic libraries. I believe Inline makes a 
file that has linkage characteristics specified. Try it and see, then 
tell people how you did it. My two cents.

Another way to use external executables is open3 (from IPC::Open3), then
reading and writing to the pipes. I use it (primer3 and local lab automation
code) - snippet follows:

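use strict;
use warnings;
use IPC::Open3;    # provides open3()
use IO::Handle;    # provides autoflush() on filehandles

my $pid     = 0;
my $cancmd  = 'cancmd.exe';
my $write   = 0;
my $read    = 0;

sub new {

    my $c = {};

    # the child's stdout and stderr both arrive on RDRFH
    $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);

    $write = \*WTRFH;
    $read  = \*RDRFH;

    # flush every write so the child sees each request immediately
    $write->autoflush();

    bless $c;
    return $c;
}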

Just write your request, then read it back - I make sure that each 
pair is a newline terminated text line - be sure you harvest the child 
pid when you are done.
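
The write/read/reap cycle described above could be sketched as follows.
This is only an illustration: ask_child and its arguments are made-up
names, and the demo child is a Perl one-liner standing in for a real
external program.

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Handle;

# Send each request line to a child process and collect one reply line per
# request. @$cmd is the child command; it must answer every input line with
# exactly one unbuffered output line, or this loop will block.
sub ask_child {
    my ($cmd, @requests) = @_;

    # A false third argument sends the child's stderr to the same handle
    # as its stdout.
    my $pid = open3(my $to_child, my $from_child, undef, @$cmd);
    $to_child->autoflush(1);

    my @replies;
    for my $request (@requests) {
        print {$to_child} "$request\n";
        my $reply = <$from_child>;
        last unless defined $reply;    # child exited early
        chomp $reply;
        push @replies, $reply;
    }

    close $to_child;      # signals end of input to the child
    waitpid($pid, 0);     # reap the child so it does not linger as a zombie
    return @replies;
}

# Stand-in child for demonstration only: upper-cases each line and
# flushes after every print.
my @demo_cmd = ($^X, '-pe', 'BEGIN { $| = 1 } $_ = uc');
print join(' ', ask_child(\@demo_cmd, 'acgt', 'ttga')), "\n";
```

The one-line-in, one-line-out discipline matters: if the child buffers its
output or answers with several lines, a blocking read like this can hang.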


----- Original Message -----
From: Adam Kraut <adamnkraut at gmail.com>
Date: Thursday, May 18, 2006 5:07 pm
Subject: [Bioperl-l] writing a pairwise alignment module: XS and 
Inline C?

> I am currently using a pairwise alignment algorithm written in C 
> (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time 
> abstracting.  I
> already have a hack method of writing the parameters and inputs I 
> want from
> perl, calling the c program with system( ), and then parsing the 
> output in
> Perl.  Any good programmer would probably smack me but I'm just an 
> undergrad and I needed to show my boss that this works in order to 
> spend more time on
> it.
> 
> So on to my question, what is the preferred method of extending 
> Bioperl to
> use this algorithm?  I have just read the XS tutorial and a bit 
> about Inline
> C.  Can I put the main function in my script using Inline, and 
> then just
> point Inline at the rest of the C library?  The program has several
> C-structures that are semantically equivalent to Bioperl objects, 
> so just
> need somewhere to start.  I will spend some more time so that I 
> have a more
> specific question, I just wanted a little feedback, this is my 
> first post to
> the bioperl list.
> 
> Thanks,
> Adam
> 


From hubert.prielinger at gmx.at  Fri May 19 12:52:28 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 10:52:28 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
Message-ID: <446DF7CC.5060509@gmx.at>

hi,
I wondered whether it is also possible in the XML output (either WU- or
NCBI-BLAST) to get the species (taxonomy) for every hit, if I do a
general search.
regards



From staffa at niehs.nih.gov  Fri May 19 14:12:47 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Fri, 19 May 2006 14:12:47 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <C0926539.87F5%osborne1@optonline.net>
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>

Specifically: 
I have the document to which you refer,
but have not seen this one thing I need in the printout of tags etc.:
the values in this line:
     mRNA            join(380..509,578..1913,7784..8649,9439..10200)
Is that a location object?


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


> ----------
> From: 	Brian Osborne
> Sent: 	Thursday, May 18, 2006 5:54 PM
> To: 	Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> Subject: 	Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> 
> Nick,
> 
> Have you read the Feature-Annotation HOWTO? This would be a good starting
> point...
> 
> Brian O.
> 
> 
> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
> wrote:
> 
> > Would like a fairly simple way to extract certain information from Genbank
> > Genomic File Annotations.
> > Namely the six D.melanogaster sequences.
> > Specifically to find gene entries and learn the gene name, begin and end and
> > CDS.
> > Please point me to appropriate modules and documentation.
> > 
> > 
> > Nick Staffa
> > Telephone: 919-316-4569  (NIEHS: 6-4569)
> > Scientific Computing Support Group
> > NIEHS Information Technology Support Services Contract
> > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> > National Institute of Environmental Health Sciences
> > National Institutes of Health
> > Research Triangle Park, North Carolina
> > 
> > 


From chandan.kr.singh at gmail.com  Fri May 19 14:37:26 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sat, 20 May 2006 00:07:26 +0530
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
References: <C0926539.87F5%osborne1@optonline.net>
	<7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com>

On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] <staffa at niehs.nih.gov> wrote:
>
> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?


Yes, it is a location object.  If you want it as a string (which is
what it seems from your mail), you just have to do this:

# $fet is the feature object (a Bio::SeqFeatureI)
my $loc = $fet->location();

my $loc_str = $loc->to_FTstring();

Hope it helps.
Chandan



From osborne1 at optonline.net  Fri May 19 15:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <C0939738.8849%osborne1@optonline.net>

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and
end positions. These could be called "split" locations, and a very common
example is the join statement in the CDS feature found in Genbank entries
(e.g. join(45..122,233..267)). This calls for a specialized object,
Bio::Location::SplitLocationI, which is a container for Location objects:

      for my $feature ($seqobj->top_SeqFeatures){
        if ( $feature->location->isa('Bio::Location::SplitLocationI')
                       && $feature->primary_tag eq 'CDS' )  {
            for my $location ( $feature->location->sub_Location ) {
                print $location->start . ".." . $location->end . "\n";
            }
        }
      }


Brian O.




From hubert.prielinger at gmx.at  Fri May 19 16:42:09 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 14:42:09 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
	<446DF7CC.5060509@gmx.at>
	<F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
Message-ID: <446E2DA1.1050503@gmx.at>

hi warren,
that means if I alter the DTD (if that is possible) by adding the 
taxonomic id to the DTD..... then I should have the taxonomic id tag in 
the xml file (theoretically)
but I guess this is only possible with a local search (blastall) but not 
with an online search.

greetings

Warren Gish wrote:
>
> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>
>> hi,
>> I wondered whether is it also possible in the xml output (either WU 
>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>> do a general search.
>> regards
>>
> The taxonomic id is not an entity in the NCBI XML DTD.  If the 
> information was embedded in deflines, one could conceivably parse for 
> it, but I believe the NCBI only distributes taxids in their ASN.1 data 
> and in their pre-formatted BLAST databases, and NCBI BLAST only reports 
> taxids in its ASN.1 output format, where taxid is available as an entity.
>
> --Warren
>
>


From cjfields at uiuc.edu  Fri May 19 16:56:56 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:56:56 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>

You'll have to pull the GI or accession from each hit and do a lookup by either 
grabbing the sequence and using Bio::Species or use Bio::DB::Taxonomy; there 
isn't any tax information directly incorporated into BLAST reports AFAIK.
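
A sketch of the first route (grab the sequence record, then read its
species), assuming protein accessions and network access to NCBI. For
nucleotide hits Bio::DB::GenBank would be the analogous module. The
accessions come from the command line; none are assumed here.

```perl
use strict;
use warnings;
use Bio::DB::GenPept;

my @accessions = @ARGV or die "usage: $0 accession [accession ...]\n";

# Remote lookups against NCBI; this needs a working network connection.
my $db = Bio::DB::GenPept->new;

for my $acc (@accessions) {
    # The fetch may throw or return nothing for an unknown accession.
    my $seq = eval { $db->get_Seq_by_acc($acc) };
    unless ($seq) {
        warn "no record retrieved for $acc\n";
        next;
    }

    # species() returns a Bio::Species object when the record has an
    # organism annotation.
    my $species = $seq->species;
    print "$acc\t", ( $species ? $species->binomial : 'unknown' ), "\n";
}
```

Fetching a full record per hit is slow for large reports, so for many hits
a local lookup table is likely the better choice.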

Chris



From cjfields at uiuc.edu  Fri May 19 16:59:35 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:59:35 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu>

Um, I don't think it works that way.  I'm pretty sure the XML is generated from
the ASN.1 output.  I don't think (as Warren says) that you can directly get to the
tax information.  Indirectly is another matter...

Chris



From hubert.prielinger at gmx.at  Fri May 19 17:30:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 15:30:20 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E3854.5010708@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at>
Message-ID: <446E38EC.9020100@gmx.at>

ok, thanks,
it appears that I only need the species the protein is derived
from, so I guess Bio::Species would be enough for me, right?
And would it work if I just pull the accession from the BLAST
output file, pass in the accession code, and get the species name
as the return value?
Is it possible to pass just the accession code? When I looked it up,
the docs were always talking about the entire file.

regards
>
>
> Christopher Fields wrote:
>> You'll have to pull the GI or accession from each hit and do a lookup 
>> by either grabbing the sequence and using Bio::Species or use 
>> Bio::DB::Taxonomy; there isn't any tax information directly 
>> incorporated into BLAST reports AFAIK.
>>
>> Chris
>>
>> ---- Original message ----
>>  
>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re: 
>>> [Bioperl-l] parsing xml output  To: Warren Gish 
>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>
>>> hi,
>>> I wondered whether is it also possible in the xml output (either WU 
>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>>> do a general search.
>>> regards
>>>
>>> Warren Gish wrote:
>>>    
>>>> Right, the WU-BLAST tabbed output contains more fields.  (See 
>>>> http:// blast.wustl.edu/blast/tabular.html).
>>>> --Warren
>>>>
>>>>        
>>>>> Whoops - sorry Warren - for some reason I had it in my mind that 
>>>>> it  was different.  So the blastxml parser should work fine.  The  
>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 
>>>>> though,  right?
>>>>>
>>>>> -jason
>>>>>             


From jason.stajich at duke.edu  Fri May 19 18:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID: <FAE3151B-301F-4A42-9EFD-D1F8D3CBE752@duke.edu>

There is a gi2taxid table in the /pub/taxonomy part of the NCBI FTP site  
(ftp.ncbi.nih.gov). I have used this to take GI numbers from a report  
and get the taxonomy for overall classification. I think something like  
this exists in the scripts or examples directory in the bioperl  
distro. I know I posted about it when I wrote it a while ago.
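
A minimal sketch of that kind of lookup, assuming the table is a
two-column, tab-delimited file of GI number and taxon ID. The sub name
load_gi_taxids and the demo rows are made up for illustration and are not
taken from the bioperl distro; a real run would open the table downloaded
from the FTP site instead of the in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a "GI<TAB>taxid" table and keep only the GIs we care about.
# Filtering while reading avoids holding the whole table in memory.
sub load_gi_taxids {
    my ($fh, @wanted_gis) = @_;
    my %wanted = map { $_ => 1 } @wanted_gis;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Demo with made-up rows (assumption: real rows have the same two columns)
my $table = "111\t9606\n222\t10090\n333\t562\n";
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $taxid_for = load_gi_taxids($fh, 111, 333);
close $fh;

for my $gi (sort { $a <=> $b } keys %$taxid_for) {
    print "GI $gi => taxid $taxid_for->{$gi}\n";
}
```

The taxon IDs can then be passed on to a taxonomy lookup such as
Bio::DB::Taxonomy, which Chris mentioned, to get a species name.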

-jason
On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote:

> ok, thanks,
> it appears that I only need the species where the Protein is derived
> from, so I guess Bio::Species would satisfy me, or?
> and it would work that I just pull off the accession from the blast
> output file and then assign the accession code and get as return value
> the  species name.
> is it possible to just assign the accession code, because I looked up
> but they were always talking of the entire file.
>
> regards
>>
>>
>> Christopher Fields wrote:
>>> You'll have to pull the GI or accession from each hit and do a  
>>> lookup
>>> by either grabbing the sequence and using Bio::Species or use
>>> Bio::DB::Taxonomy; there isn't any tax information directly
>>> incorporated into BLAST reports AFAIK.
>>>
>>> Chris
>>>
>>> ---- Original message ----
>>>
>>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re:
>>>> [Bioperl-l] parsing xml output  To: Warren Gish
>>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>>
>>>> hi,
>>>> I wondered whether it is also possible in the XML output (either WU
>>>> or NCBI BLAST) to get the species (taxonomy) for every hit, if I
>>>> do a general search.
>>>> regards
>>>>
>>>> Warren Gish wrote:
>>>>
>>>>> Right, the WU-BLAST tabbed output contains more fields.  (See
>>>>> http://blast.wustl.edu/blast/tabular.html).
>>>>> --Warren
>>>>>
>>>>>
>>>>>> Whoops - sorry Warren - for some reason I had it in my mind that
>>>>>> it  was different.  So the blastxml parser should work fine.  The
>>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9
>>>>>> though,  right?
>>>>>>
>>>>>> -jason
>>>>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From ewijaya at i2r.a-star.edu.sg  Sat May 20 08:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>


Dear expert,

Is there any Bioperl method that allows
you to verify the sequence format of a file?

For example, given a file we wish
to check (return true or false) whether
it is in FASTA format, GENBANK format, etc.

Such a method would be useful in a web application
as part of a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE




From aaron.j.mackey at gsk.com  Fri May 19 09:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
 C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>

bioperl-ext is the package in which alignment algorithms and/or BioPerl 
"wrapped" external C libraries live.  Subprojects in bioperl-ext use both 
XS and Inline::C, that's up to you.

You'll need to get your C code compiled to a dynamically loaded library 
(.so) to use either XS or Inline::C; this precludes any reuse of the C 
main() function (although your Inline::C wrapper might recapitulate/copy 
the main() function code).

Out of curiosity, what pairwise alignment algorithm are you using?  This 
is a heavily beaten path, you might want to dig around first to see if 
someone else already has what you need.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/18/2006 05:07:42 PM:

> I am currently using a pairwise alignment algorithm written in C (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time abstracting.  I
> already have a hack method of writing the parameters and inputs I want
> from perl, calling the c program with system( ), and then parsing the
> output in Perl.  Any good programmer would probably smack me but I'm just
> an undergrad and I needed to show my boss that this works in order to
> spend more time on it.
> 
> So on to my question, what is the preferred method of extending Bioperl
> to use this algorithm?  I have just read the XS tutorial and a bit about
> Inline C.  Can I put the main function in my script using Inline, and
> then just point Inline at the rest of the C library?  The program has
> several C-structures that are semantically equivalent to Bioperl objects,
> so just need somewhere to start.  I will spend some more time so that I
> have a more specific question, I just wanted a little feedback, this is
> my first post to the bioperl list.
> 
> Thanks,
> Adam
> 


From jason.stajich at duke.edu  Sat May 20 10:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID: <F42D42CC-B609-48DF-B291-E0CE803D527C@duke.edu>

Try Bio::Tools::GuessSeqFormat
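
A short sketch of how it could be used, assuming BioPerl is installed.
The helper name file_is_format and the sample text are made up for
illustration; only new() with -file or -text and guess() come from the
module itself.

```perl
use strict;
use warnings;
use Bio::Tools::GuessSeqFormat;

# Return true if the guessed format of $file matches $expected (e.g. 'fasta').
# file_is_format is a hypothetical helper, not part of BioPerl.
sub file_is_format {
    my ($file, $expected) = @_;
    my $guess = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
    return defined $guess && $guess eq $expected;
}

# The guesser also accepts a string through -text
my $text   = ">seq1 example\nACGTACGTACGT\n";
my $format = Bio::Tools::GuessSeqFormat->new(-text => $text)->guess;
print defined $format ? "Looks like: $format\n" : "Format not recognised\n";
```

Note that the guess is a heuristic based on the first lines of the input,
so for taint checking it is probably best treated as a first filter
rather than as full validation of the file.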

On May 20, 2006, at 8:36 AM, Wijaya Edward wrote:

>
> Dear expert,
>
> Is there any Bioperl method that allows
> you to check verify sequence type in a file?
>
> For example, given a file we wish
> to check (return true  or false) whether
> it is in FASTA format, GENBANK format, etc.
>
> This method is useful in web application
> as taint checking procedure.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sat May 20 20:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,


I tried one script from the Graphics HOWTO under the Cygwin
environment (GD and libpng already installed). I typed
this line in a Cygwin X window:


$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format
`/tmp/magick-qKiRPDRS'.

Any idea?


Thank you very much,

Li




From osborne1 at optonline.net  Sat May 20 20:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <C095339A.886C%osborne1@optonline.net>

Chen,

Not sure. However, whenever I see a new or incomprehensible error message
like "display: no decode delegate for this image format" I Google it.

Brian O.


On 5/20/06 8:15 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> Dear all,
> 
> 
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> 
> 
> $ perl render_blast1.pl data1.txt | display -
> 
> And here is the result:
> 
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
> 
> Any idea?
> 
> 
> Thank you very much,
> 
> Li
> 
> 


From n.saunders at uq.edu.au  Sun May 21 18:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0 
RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and for selected proteins, 
get the Uniprot accession of interacting partners (this is outlined in the 
documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test script 
and ran it on a selection of XML files.  The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
		  		  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI 
XML versions 1 and 2.5):

1.  MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5. DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name 
/usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------


Looking at the module code, it seems that the first 2 errors relate to a 
parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5. 
Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single 
species seems OK, but it seems there are species names in the complete dataset 
that cause problems (error 5).


Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there 
plans to get it to work with version 2.5 files from all sources (MINT and 
IntAct) ?  Googling and checking the list archives didn't give a lot of hits 
which made me think it's not a widely-used module.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://psychro.bioinformatics.unsw.edu.au/neil


From torsten.seemann at infotech.monash.edu.au  Sun May 21 21:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) 
into the input of a program called "display". This program is part of 
the ImageMagick toolkit, standard on most Linux installations. Because 
you are using Windows you probably don't have it installed! Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.
	
-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From darin.london at duke.edu  Mon May 22 11:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil.  In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006.  We look forward to a great conference this
year. Please consult the Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.
 

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal compatible calendar, there is an EventDB
calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.


From darin.london at duke.edu  Mon May 22 12:00:55 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 09:00:55 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <000301c67db8$e8391f70$6400a8c0@CodonSolutions.local>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil.  In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006.  We look forward to a great conference this
year. Please consult the Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.
 

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an iCal compatible calendar, there is an EventDB
calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.

_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From osborne1 at optonline.net  Mon May 22 17:37:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 22 May 2006 17:37:50 -0400
Subject: [Bioperl-l] problems with Bio::Graph
In-Reply-To: <4470E708.3070402@uq.edu.au>
Message-ID: <C097A76E.88A9%osborne1@optonline.net>

Neil,

Let me propose an alternative. In the past few months I've been working on a
Bioperl package for handling protein interaction networks, it is called
bioperl-network. It's similar to the Bio::Graph modules, except for the
following:

- It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The
advantage is that we are not responsible for maintaining the algorithm code,
the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been
working on these and has fixed some significant ones recently.

- It uses names and concepts from Graph. It also has separate notions of
edge and interaction, where one edge can have one or more interactions.

- It uses more method names and conventions borrowed from interaction
databases and PSI MI. For example, a node can be a protein complex composed
of multiple Seq objects, not just a protein.

This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard
Adams are major contributors to it. It's also worth mentioning that it's not
complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think
it should be able to handle the code you've shown (and if it cannot then
I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm
not mistaken there's a version 1 -> version 2 converter.

I'm about to put this into CVS so you can take a look, should you choose to.

Brian O.


On 5/21/06 6:17 PM, "Neil Saunders" <n.saunders at uq.edu.au> wrote:

> dear all,
> 
> I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0
> RC1 with Ubuntu 5.10 i686.
> 
> I would like to parse files in PSI MI XML 2.5 format and for selected
> proteins, 
> get the Uniprot accession of interacting partners (this is outlined in the
> documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test
> script 
> and ran it on a selection of XML files.  The script is simply:
> 
> ----------------------------------------------------------------
> use strict;
> use Bio::Graph::IO;
> 
> my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
> my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
>  '-format' => 'psi_xml');
> my $gr = $graphio->next_network;
> ----------------------------------------------------------------
> 
> Here's a summary of the error messages with some sample files (I tried PSI MI
> XML versions 1 and 2.5):
> 
> 1.  MINT database 9707552_small.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 2. IntAct database yeast_small-11.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 3. IntAct database yeast_small-11.xml (PSI 1)
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.
> 
> 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
> These give no errors
> 
> 5. DIP file dip20060402.mif (PSI 1, complete dataset)
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
> STACK: Bio::Species::validate_species_name
> /usr/local/share/perl/5.8.7/Bio/Species.pm:340
> STACK: Bio::Species::classification
> /usr/local/share/perl/5.8.7/Bio/Species.pm:170
> STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
> STACK: Bio::Graph::IO::psi_xml::_proteinInteractor
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
> STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
> STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
> STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
> STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
> STACK: Bio::Graph::IO::psi_xml::next_network
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
> STACK: ./biograph.pl:18
> -----------------------------------------------------------
> 
> 
> Looking at the module code, it seems that the first 2 errors relate to a
> parameter "proteinInteractorRef", found in PSI MI version 1 but not version
> 2.5. 
>   Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single
> species seems OK, but it seems there are species names in the complete dataset
> that cause problems (error 5).
> 
> 
> Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there
> plans to get it to work with version 2.5 files from all sources (MINT and
> IntAct) ?  Googling and checking the list archives didn't give a lot of hits
> which made me think it's not a widely-used module.
> 
> thanks,
> Neil


From torsten.seemann at infotech.monash.edu.au  Mon May 22 17:53:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 23 May 2006 07:53:02 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <447232BE.1080001@infotech.monash.edu.au>

Chen Li

>  perl render_blast1.pl data1.txt >im.png

Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example 
script is creating a PNG image. The last line is:
print $panel->png;

> and Perl runs without any problem. I use adobe
> photoshop to open them and Adobe can't recognize them.
> If I use ACDSee to open them I only get a black
> background. If I issue this line under Cygwin X window
> display im.png  or display im.gif
> Cygwin says:
> display: Improper image header `im.png'.
> It seems Perl can't produce an image with right
> format.

Are you sure Perl is producing a PNG file at all?
How many bytes does im.png use? Zero?

Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ?

It says: "If you are on a Windows platform, you need to put STDOUT into 
binary mode so that the PNG file does not go through Window's carriage 
return/linefeed transformations. Before the final print statement, put 
the statement binmode(STDOUT)."

ie. your script should have

binmode(STDOUT);
print $panel->png;

as the last 2 lines.
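
To check whether the redirected file really is a PNG, something like the
following could be used. This is only a sketch: the helper names
write_binary and looks_like_png and the file name are made up for
illustration, and the demo writes stand-in bytes where the HOWTO script
would write the output of $panel->png.

```perl
use strict;
use warnings;

# The eight bytes every PNG file starts with
my $PNG_SIGNATURE = "\x89PNG\x0d\x0a\x1a\x0a";

# Write raw image bytes to a file with the handle in binary mode
sub write_binary {
    my ($file, $bytes) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    binmode $out;    # no CR/LF translation on Windows
    print {$out} $bytes;
    close $out or die "Cannot close $file: $!";
}

# Return true if the file begins with the PNG signature
sub looks_like_png {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    binmode $in;
    my $got = read($in, my $header, 8);
    close $in;
    return defined $got && $got == 8 && $header eq $PNG_SIGNATURE;
}

# Demo: stand-in bytes, not a complete image
my $file = 'signature_test.png';
write_binary($file, $PNG_SIGNATURE . "rest of image data");
print looks_like_png($file) ? "PNG signature OK\n" : "Not a PNG\n";
unlink $file;
```

If the signature check fails on im.png, the bytes were altered or never
were PNG data, which narrows down where to look next.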

> Do you experience the same problem before?

No.

--Torsten


From chen_li3 at yahoo.com  Mon May 22 09:25:53 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 06:25:53 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <4471148C.5090404@infotech.monash.edu.au>
Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>

Dear Dr. Seemann,


Thank you very much for the reply.

I issued this line:
 perl render_blast1.pl data1.txt >im.gif
or 
 perl render_blast1.pl data1.txt >im.png

and Perl ran without any problem. When I use Adobe
Photoshop to open the files, it can't recognize them.
If I use ACDSee to open them I only get a black
background. If I issue this line in a Cygwin X window

display im.png  or display im.gif

Cygwin says:

display: Improper image header `im.png'.

or display: Improper image header `im.gif'.

It seems Perl can't produce an image in the right
format.


Have you experienced the same problem before?

Li


--- Torsten Seemann
<torsten.seemann at infotech.monash.edu.au> wrote:

> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> > $ perl render_blast1.pl data1.txt | display -
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> 
> You are piping the output of the Perl script (which
> is a GIF/PNG image) 
> into the input of a program called "display". This
> program is part of 
> the ImageMagick toolkit, standard on most Linux
> installations. Because 
> you are using Windows you probably don't have it
> installed! Try this:
> 
> $ perl render_blast1.pl data1.txt > image.gif
> 
> Then load 'image.gif' into whatever your favourite
> image viewer is.
> 	
> -- 
> Dr Torsten Seemann              
> http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash
> University, Australia
> 
> 




From chen_li3 at yahoo.com  Mon May 22 18:57:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 15:57:42 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <447232BE.1080001@infotech.monash.edu.au>
Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com>

Hi,

I tried both, with and without the statement
binmode(STDOUT) before the last line, print
$panel->png; but there is no difference. I get a file
of 2432 bytes.

Li


> Chen Li
> 
> >  perl render_blast1.pl data1.txt >im.png
> 
> Based on http://bioperl.org/wiki/HOWTO:Graphics I
> believe the example 
> script is creating a PNG image. The last line is:
> print $panel->png;
> 
> > and Perl runs without any problem. I use adobe
> > photoshop to open them and Adobe can't recognize
> them.
> > If I use ACDSee to open them I only get a black
> > background. If I issue this line under Cygwin X
> window
> > display im.png  or display im.gif
> > Cygwin says:
> > display: Improper image header `im.png'.
> > It seems Perl can't produce an image with right
> > format.
> 
> Are you sure Perl is producing a PNG file at all?
> How many bytes does im.png use? Zero?
> 
> Did you notice this in
> http://bioperl.org/wiki/HOWTO:Graphics ?
> 
> It says: "If you are on a Windows platform, you need
> to put STDOUT into 
> binary mode so that the PNG file does not go through
> Window's carriage 
> return/linefeed transformations. Before the final
> print statement, put 
> the statement binmode(STDOUT)."
> 
> ie. your script should have
> 
> binmode(STDOUT);
> print $panel->png;
> 
> as the last 2 lines.
> 
> > Do you experience the same problem before?
> 
> No.
> 
> --Torsten
> 




From barry.moore at genetics.utah.edu  Mon May 22 21:00:06 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 22 May 2006 19:00:06 -0600
Subject: [Bioperl-l] Problems with Unflattener.pm
Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>

Hi All,

NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into  
an infinite recursive loop.  The trouble occurs in the method  
find_best_matches between lines 2258 and 2281, and in particular the  
loop is perpetuated by line 2273.   NT_113910 has a fairly complex  
features table, but I have as yet been unable to figure out why  
this loop is not exiting properly.  This has been submitted to  
bugzilla, but I'll post here so it gets documented on the list also.   
Any suggestions from Chris or others would be greatly appreciated.

This problem can be recreated as follows:

Grab NT_113910 from genbank.
bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk

Pass NT_113910.gbk on the command line to the attached script.


#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

my $file = shift;

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# first fetch a genbank SeqI object
my $seqio =
     Bio::SeqIO->new(-file   => $file,
                     -format => 'GenBank');
my $out =
     Bio::SeqIO->new(-format => 'asciitree');
while (my $seq = $seqio->next_seq()) {

         # get top level unflattened SeqFeatureI objects
         $unflattener->unflatten_seq(-seq       => $seq,
                                     -use_magic => 1);
         $out->write_seq($seq);
}


From miker at biotiquesystems.com  Mon May 22 19:56:52 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 16:56:52 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike>


As best as I can tell, using Bio::SeqIO to parse a uniprot file ignores the
sequence version, and calling seq_version() on the resulting RichSeq object
returns undef.

It looks like swiss.pm is trying to parse the version out of the SV line, which
apparently doesn't exist any more?  The sequence version(s) are now specified as
part of the Date (DT) lines.  

Is this not a bug?  Is swiss.pm not designed to parse uniprot files?

Thanks for any help ...


From jason.stajich at duke.edu  Mon May 22 21:37:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 21:37:13 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike>
References: <002a01c67dfb$663cc600$c100a8c0@mike>
Message-ID: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>

Sounds like a "missing feature" =)

AFAIK the module was only written for swissprot files.  It is  
possible there have been changes in the format that have not been  
tracked to the current code.  We'd certainly appreciate someone  
testing it out as versions evolve.  If you submit a bug to bugzilla  
with version of bioperl and example files you can track when a fix is  
in.  We of course appreciate anyone's efforts to provide a patch as  
most bugs get fixed of late when someone gets "itchy" enough to fix  
them.

-jason

On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:

>
> As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> ignores the
> sequence version, and calling seq_version() on the resulting  
> RichSeq object
> returns undef.
>
> It looks like swiss.pm is trying to parse the version out of the SV  
> line, which
> apparently doesn't exist any more?  The sequence version(s) are now  
> specified as
> part of the Date (DT) lines.
>
> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>
> Thanks for any help ...
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Mon May 22 22:04:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 22:04:17 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>

We ask that people post patches as attachments in Bugzilla so we can
track what the bug was and why the patch fixes it.

I am not totally sure this patch works because it seems like we need  
to strip out more information now from the DT line if the $date  
actually contains more information than just the date.

If you would go ahead and create a bug in Bugzilla for this
(http://bugzilla.open-bio.org), this sort of conversation can be tracked
to the bug.

If any of this is unclear please let us know - I thought we had put
some pages up about this sort of thing on the wiki but maybe they
need to be expanded.

-jason
On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Monday, May 22, 2006 6:37 PM
>> To: Michael Rogoff
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>>
>>
>> Sounds like a "missing feature" =)
>>
>> AFAIK the module was only written for swissprot files.  It is
>> possible there have been changes in the format that have not been
>> tracked to the current code.  We'd certainly appreciate someone
>> testing it out as versions evolve.  If you submit a bug to bugzilla
>> with version of bioperl and example files you can track when
>> a fix is
>> in.  We of course appreciate anyone's efforts to provide a patch as
>> most bugs get fixed of late when someone gets "itchy" enough to fix
>> them.
>>
>> -jason
>>
>> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
>>
>>>
>>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
>>> ignores the
>>> sequence version, and calling seq_version() on the resulting
>>> RichSeq object
>>> returns undef.
>>>
>>> It looks like swiss.pm is trying to parse the version out
>> of the SV
>>> line, which
>>> apparently doesn't exist any more?  The sequence version(s)
>> are now
>>> specified as
>>> part of the Date (DT) lines.
>>>
>>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>>>
>>> Thanks for any help ...
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Marc.Logghe at DEVGEN.com  Tue May 23 03:08:37 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 23 May 2006 09:08:37 +0200
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>

Hi Li,
Did you check your script for any other print statements (to STDOUT,
that is) that potentially could contaminate your png stream ?

Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
> Sent: Tuesday, May 23, 2006 12:58 AM
> To: Torsten Seemann
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] problems iwth Bio::graphics module
> 
> Hi,
> 
> I try both: either with or without this statement
>  binmode(STDOUT) before the last line print $panel->png; But 
> there are no differenes. I get a file of 2432 bytes.
> 
> Li
> 
> 
> 
> > Chen Li
> > 
> > >  perl render_blast1.pl data1.txt >im.png
> > 
> > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe 
> the example 
> > script is creating a PNG image. The last line is:
> > print $panel->png;
> > 
> > > and Perl runs without any problem. I use adobe photoshop to open 
> > > them and Adobe can't recognize
> > them.
> > > If I use ACDSee to open them I only get a black background. If I 
> > > issue this line under Cygwin X
> > window
> > > display im.png  or display im.gif
> > > Cygwin says:
> > > display: Improper image header `im.png'.
> > > It seems Perl can't produce an image with right format.
> > 
> > Are you sure Perl is producing a PNG file at all?
> > How many bytes does im.png use? Zero?
> > 
> > Did you notice this in
> > http://bioperl.org/wiki/HOWTO:Graphics ?
> > 
> > It says: "If you are on a Windows platform, you need to put STDOUT 
> > into binary mode so that the PNG file does not go through Window's 
> > carriage return/linefeed transformations. Before the final print 
> > statement, put the statement binmode(STDOUT)."
> > 
> > ie. your script should have
> > 
> > binmode(STDOUT);
> > print $panel->png;
> > 
> > as the last 2 lines.
> > 
> > > Do you experience the same problem before?
> > 
> > No.
> > 
> > --Torsten
> > 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection 
> around http://mail.yahoo.com 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From chen_li3 at yahoo.com  Tue May 23 09:27:06 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 06:27:06 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>
Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>

Dear Dr. Logghe,

Thank you so much. The script now works under Cygwin
after following your suggestion. Here are the
last two lines:

either binmode (STDOUT);
       print STDOUT $panel->png;

or only print STDOUT $panel->png;

They both work for me. I know that print writes to
the screen by default, so I don't know why it works
once STDOUT is added after print. Could you explain it?
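
A minimal sketch of the point in question, assuming nothing earlier in
the script has called select(): print with no filehandle writes to the
currently selected handle, which is STDOUT by default, so
'print $panel->png' and 'print STDOUT $panel->png' should behave the
same. The helper name default_handle is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report which handle a bare "print LIST" would use.
# select() with no arguments returns the name of the selected handle.
sub default_handle {
    return select();
}

print "bare print goes to: ", default_handle(), "\n";

# Redirect the default handle to an in-memory buffer, then restore it.
open(my $mem, '>', \my $buffer) or die "cannot open in-memory handle: $!";
my $previous = select($mem);     # bare print now targets $mem
print "captured";                # no filehandle given
select($previous);               # back to the original default
close($mem);

print "buffer holds: $buffer\n";
```

If a script does change the selected handle, naming STDOUT explicitly is
the only form that is guaranteed to reach standard output.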

BTW, I copied this script from the Graphics HOWTO on the
BioPerl website, and only one line contains a print
statement, which is 'print $panel->png'.

Once again thank you so much,

Li

--- Marc Logghe <Marc.Logghe at devgen.com> wrote:

> Hi Li,
> Did you check your script for any other print
> statements (to STDOUT,
> that is) that potentially could contaminate your png
> stream ?
> 
> Marc
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org 
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On
> Behalf Of chen li
> > Sent: Tuesday, May 23, 2006 12:58 AM
> > To: Torsten Seemann
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] problems iwth
> Bio::graphics module
> > 
> > Hi,
> > 
> > I try both: either with or without this statement
> >  binmode(STDOUT) before the last line print
> $panel->png; But 
> > there are no differenes. I get a file of 2432
> bytes.
> > 
> > Li
> > 
> > 
> > 
> > > Chen Li
> > > 
> > > >  perl render_blast1.pl data1.txt >im.png
> > > 
> > > Based on http://bioperl.org/wiki/HOWTO:Graphics
> I believe 
> > the example 
> > > script is creating a PNG image. The last line
> is:
> > > print $panel->png;
> > > 
> > > > and Perl runs without any problem. I use adobe
> photoshop to open 
> > > > them and Adobe can't recognize
> > > them.
> > > > If I use ACDSee to open them I only get a
> black background. If I 
> > > > issue this line under Cygwin X
> > > window
> > > > display im.png  or display im.gif
> > > > Cygwin says:
> > > > display: Improper image header `im.png'.
> > > > It seems Perl can't produce an image with
> right format.
> > > 
> > > Are you sure Perl is producing a PNG file at
> all?
> > > How many bytes does im.png use? Zero?
> > > 
> > > Did you notice this in
> > > http://bioperl.org/wiki/HOWTO:Graphics ?
> > > 
> > > It says: "If you are on a Windows platform, you
> need to put STDOUT 
> > > into binary mode so that the PNG file does not
> go through Window's 
> > > carriage return/linefeed transformations. Before
> the final print 
> > > statement, put the statement binmode(STDOUT)."
> > > 
> > > ie. your script should have
> > > 
> > > binmode(STDOUT);
> > > print $panel->png;
> > > 
> > > as the last 2 lines.
> > > 
> > > > Do you experience the same problem before?
> > > 
> > > No.
> > > 
> > > --Torsten
> > > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection 
> > around http://mail.yahoo.com 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From lstein at cshl.edu  Tue May 23 10:06:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 10:06:27 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <200605231006.28392.lstein@cshl.edu>

Hi,

It is possible that your version of display can't handle PNG images. Try 
saving the output as a file and then opening it in another image program:

	perl render_blast1.pl data1.txt > data1.png

Another thing to watch out for is that, depending on what version of Perl 
you're using, you may have to insert this statement into the render_blast1.pl 
script (somewhere near the top):

	binmode STDOUT;
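
A rough illustration of why binmode matters for PNG output. It uses an
in-memory handle opened with the :crlf layer as a stand-in for
text-mode STDOUT on Windows; that stand-in and the helper name
write_bytes are assumptions for the sketch, not part of the HOWTO
script.

```perl
use strict;
use warnings;

# The 8-byte PNG signature; it contains both "\r\n" and a bare "\n".
my $png_signature = "\x89PNG\r\n\x1a\n";

# Hypothetical helper: write $bytes through a handle that starts with the
# :crlf layer and return what ends up in the in-memory buffer.
sub write_bytes {
    my ($bytes, $use_binmode) = @_;
    my $buffer = '';
    open(my $fh, '>:crlf', \$buffer) or die "cannot open in-memory handle: $!";
    binmode($fh) if $use_binmode;   # switch the handle to raw bytes
    print {$fh} $bytes;
    close($fh);
    return $buffer;
}

my $text_mode   = write_bytes($png_signature, 0);
my $binary_mode = write_bytes($png_signature, 1);

printf "original:  %d bytes\n", length $png_signature;
printf "text mode: %d bytes\n", length $text_mode;
printf "binmode:   %d bytes\n", length $binary_mode;
```

Any extra byte inside the signature is enough for an image viewer to
reject the file, which is why the binmode call has to come before the
first print.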

Lincoln


On Saturday 20 May 2006 20:15, chen li wrote:
> Dear all,
>
>
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
>
>
> $ perl render_blast1.pl data1.txt | display -
>
> And here is the result:
>
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
>
> Any idea?
>
>
> Thank you very much,
>
> Li
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From Derek.Fairley at bll.n-i.nhs.uk  Tue May 23 10:39:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Tue, 23 May 2006 15:39:16 +0100
Subject: [Bioperl-l] Bio::Restriction::IO query
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C04019F@bllmail.bll.n-i.nhs.uk>

Hi folks,

I'm new to BioPerl, and struggling to make the Bio::Restriction::*
modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically,
I'm having some trouble understanding the behaviour of the
Bio::Restriction::IO module. I'm trying to use this to create a
Bio::Restriction::EnzymeCollection object from a local REBASE file
(which is in bairoch-format); this will in turn be passed to a
Bio::Restriction::Analysis object.

The following test script (derived from the Bio::Restriction::IO
perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new(	-file => "REBASE_file",
						-format =>'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar
$collection->each_enzyme, "\n";

#note that using -format=>'bairoch' without capitalisation (as shown in
perldoc synopsis) throws an exception: Failed to load module
Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of
enzymes in the default enzyme set - regardless of the number of enzymes
in the file. A default Bio::Restriction::EnzymeCollection object has
presumably been created (as the 'read()' and 'each_enzyme' methods are
available) but it didn't come from the local file. The result is the
same if the Bio::Restriction::IO->new() method is called with no
arguments - a default EnzymeCollection object is created. It's not clear
to me where this has come from.

My (mis?)understanding was that the default set of enzymes would be
loaded on creation of a new Bio::Restriction::Analysis object (in the
absence of a -enzymes=>... argument). Presumably this is down to my poor
understanding of the BioPerl object model... ;-)

So: how should I create an EnzymeCollection object from file?

Any help or advice would be gratefully received.

PS. Congratulations to the development team for creating a very
impressive and useful open source toolkit.

Derek.


-----------------------------------------
Derek Fairley, Ph.D.
Regional Virus Laboratory,
Kelvin Building,
Royal Victoria Hospital, 
Grosvenor Road,
Belfast,
N. Ireland.
BT12 6BA

Tel. +44 (0)2890 635303


From rowan.mitchell at bbsrc.ac.uk  Tue May 23 10:53:42 2006
From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth))
Date: Tue, 23 May 2006 15:53:42 +0100
Subject: [Bioperl-l] Assembly::IO ace output
Message-ID: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>

Hi 

I am very interested in writing ace format files and had assumed that I
would be able to do this with Assembly::IO until I tried it! I see there
has been some correspondence last year on this, but as far as I can see
this is still not implemented in 1.5.1. Is this correct ? Is it planned
to be included; are there modules under development available ?

many thanks

Rowan

===============================================
Dr Rowan Mitchell
Rothamsted Research
Harpenden
Herts AL5 2JQ UK

Tel: +44 (0)1582 763133 x2469
Fax: +44 (0)1582 763010
E-mail: rowan.mitchell at bbsrc.ac.uk
WWW: http://www.rothamsted.bbsrc.ac.uk/
===============================================
Rothamsted Research is a company limited by guarantee, registered in
England under the registration number 2393175 and a not for profit
charity number 802038.


From rfsouza at cecm.usp.br  Tue May 23 16:17:36 2006
From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S})
Date: Tue, 23 May 2006 17:17:36 -0300
Subject: [Bioperl-l] Assembly::IO ace output
In-Reply-To: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
References: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
Message-ID: <20060523201736.GA28401@cecm.usp.br>

Hi Rowan,

On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote:
> Hi 
> 
> I am very interested in writing ace format files and had assumed that I
> would be able to do this with Assembly::IO until I tried it! I see there
> has been some correspondence last year on this, but as far as I can see
> this is still not implemented in 1.5.1. Is this correct ? Is it planned
> to be included; are there modules under development available ?

As far as I know, there are no plans to add write support to
Bio::Assembly::IO. When I wrote the original modules there was no need
for this so I left it aside.

Best regards,
Robson

> many thanks
> 
> Rowan
> 
> ===============================================
> Dr Rowan Mitchell
> Rothamsted Research
> Harpenden
> Herts AL5 2JQ UK
> 
> Tel: +44 (0)1582 763133 x2469
> Fax: +44 (0)1582 763010
> E-mail: rowan.mitchell at bbsrc.ac.uk
> WWW: http://www.rothamsted.bbsrc.ac.uk/
> ===============================================
> Rothamsted Research is a company limited by guarantee, registered in
> England under the registration number 2393175 and a not for profit
> charity number 802038.
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From lstein at cshl.edu  Tue May 23 16:53:34 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 16:53:34 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
	<200605231006.28392.lstein@cshl.edu>
Message-ID: <200605231653.36087.lstein@cshl.edu>

Hi Chen,

It looks to me like you cut and pasted the data1.txt file from the web site,
consequently replacing the tabs with spaces. Please get data1.txt from the
BioPerl distribution, as instructed in the tutorial.
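
A quick way to check a data file for this problem, as a sketch only: it
assumes the input is expected to have four tab-separated columns, and
the sample values and the helper name has_tab_fields are invented for
illustration.

```perl
use strict;
use warnings;

# Hypothetical check: does a data line still have tab-separated columns?
# $expected is the number of columns the parsing script relies on.
sub has_tab_fields {
    my ($line, $expected) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;
    return @fields == $expected;
}

my $good = "hit1\t381\t2\t200\n";        # tabs intact
my $bad  = "hit1    381    2    200\n";  # tabs turned into spaces by copy-and-paste

for my $line ($good, $bad) {
    print has_tab_fields($line, 4) ? "ok:  " : "BAD: ", $line;
}
```

A line whose tabs were converted to spaces splits into a single field,
so the script sees no usable coordinates even though the file looks
fine in an editor.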

Best,

Lincoln

On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> Hi,
>
> It is possible that your version of display can't handle PNG images. Try
> saving the output as a file and then opening it in another image program:
>
> 	perl render_blast1.pl data1.txt > data1.png
>
> Another thing to watch out for is that, depending on what version of Perl
> you're using, you may have to insert this statement into the
> render_blast1.pl script (somewhere near the top):
>
> 	binmode STDOUT;
>
> Lincoln
>
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From chen_li3 at yahoo.com  Tue May 23 17:46:16 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 14:46:16 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231653.36087.lstein@cshl.edu>
Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com>

Dear Dr. Stein,

Thank you so much. I followed your suggestion and
downloaded the code from the BioPerl CVS website. Now
everything is working.


Li 


--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi Chen,
> 
> It looks to me like you cut and paste the data1.txt
> file from the web site, 
> consequently replacing the tabs with spaces. Please
> get table1.txt from the 
> BioPerl distribution, as instructed in the tutorial.
> 
> Best,
> 
> Lincoln
> 
> On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> > Hi,
> >
> > It is possible that your version of display can't
> handle PNG images. Try
> > saving the output as a file and then opening it in
> another image program:
> >
> > 	perl render_blast1.pl data1.txt > data1.png
> >
> > Another thing to watch out for is that, depending
> on what version of Perl
> > you're using, you may have to insert this
> statement into the
> > render_blast1.pl script (somewhere near the top):
> >
> > 	binmode STDOUT;
> >
> > Lincoln
> >
> > On Saturday 20 May 2006 20:15, chen li wrote:
> > > Dear all,
> > >
> > >
> > > I try one script from GraphicsHowTo under Cygwin
> > > environment(GD and libpng already installed). I
> type
> > > this line in Cygwin X window:
> > >
> > >
> > > $ perl render_blast1.pl data1.txt | display -
> > >
> > > And here is the result:
> > >
> > > display: no decode delegate for this image
> format
> > > `/tmp/magick-qKiRPDRS'.
> > >
> > > Any idea?
> > >
> > >
> > > Thank you very much,
> > >
> > > Li
> > >
> > >
> > >
> __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > > http://mail.yahoo.com
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 




From chen_li3 at yahoo.com  Tue May 23 18:59:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 15:59:46 -0700 (PDT)
Subject: [Bioperl-l] How to download sequence files either in EMBL format
Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com>

Hi all,

I need to download the sequence for a gene. I went to the
NCBI website, found the gene of interest, and downloaded the
file in GenBank format (saved as sequence.genbank). But
to my surprise this so-called GenBank-format file
doesn't contain many features, such as exons, compared
to the one in Ensembl.

My question: where can I download this sequence file
in EMBL format? It looks like the EMBL version might
contain other information, such as exons.

Thank you very much,

Li



From osborne1 at optonline.net  Wed May 24 10:33:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 24 May 2006 10:33:16 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>
Message-ID: <C099E6EC.88F0%osborne1@optonline.net>

Li,

The Graphics HOWTO talks about this Windows workaround in _four_ different
places, it's impossible to miss if you read it from start to finish. This is
what one should do if one wants to use these modules and one is a novice.
Example:

Important! Remember that if you are on a Windows platform, you need to put
STDOUT into binary mode so that the PNG file does not go through Window's
carriage return/linefeed transformations. Before the final print statement,
write binmode(STDOUT).

Brian O.


On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com> wrote:

> BTW I copy  this script from GraphicsHowTo on Bioperl
> website and only one line contains print statement,
> which is 'print $panel->png'. 


From chen_li3 at yahoo.com  Wed May 24 12:17:15 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 24 May 2006 09:17:15 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <C099E6EC.88F0%osborne1@optonline.net>
Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com>

Thanks, but Dr. Stein already helped me figure out
what was going on: I should have copied the source
code for the examples from CVS instead of cutting and
pasting from the HOWTO tutorial. Sorry for any
inconvenience.

Li

--- Brian Osborne <osborne1 at optonline.net> wrote:

> Li,
> 
> The Graphics HOWTO talks about this Windows
> workaround in _four_ different
> places, it's impossible to miss if you read it from
> start to finish. This is
> what one should do if one wants to use these modules
> and one is a novice.
> Example:
> 
> Important! Remember that if you are on a Windows
> platform, you need to put
> STDOUT into binary mode so that the PNG file does
> not go through Window's
> carriage return/linefeed transformations. Before the
> final print statement,
> write binmode(STDOUT).
> 
> Brian O.
> 
> 
> On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com>
> wrote:
> 
> > BTW I copy  this script from GraphicsHowTo on
> Bioperl
> > website and only one line contains print
> statement,
> > which is 'print $panel->png'. 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From ULNJUJERYDIX at spammotel.com  Wed May 24 21:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi,
thanks for the help offered thus far!
Sigh, I am trying to annotate TFBS on a -1000 to +50 bp promoter sequence
using BioPerl, so I was asked to make the ruler numbering negative (e.g.
-1000). Is there any way at all to do this in BioPerl without changing the
.pm file?

thanks guys..
kevin


From cjfields at uiuc.edu  Thu May 25 12:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live.  Seems to get
caught up in the loop during recursion: debugging shows it is unable to get
past 'find_best_matches: (/15)'.  There are lots of unmatched pairs here
with this sequence, so could that be the problem?  I'm not terribly familiar
with Unflattener...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Barry Moore
> Sent: Monday, May 22, 2006 8:00 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Problems with Unflattener.pm
> 
> Hi All,
> 
> NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into
> an infinite recursive loop.  The trouble occurs in the method
> find_best_matches between lines 2258 and 2281, and in particular the
> loop is perpetuated by line 2273.   NT_113910 has a fairly complex
> features table, but I have as yet been unable to figure out why
> this loop is not exiting properly.  This has been submitted to
> bugzilla, but I'll post here so it gets documented on the list also.
> Any suggestions from Chris or others would be greatly appreciated.
> 
> This problem can be recreated as follows:
> 
> Grab NT_113910 from genbank.
> bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
> 
> Pass NT_113910.gbk on the command line to the attached script.
> 
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
> 
> my $file = shift;
> 
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
> 
> # first fetch a genbank SeqI object
> my $seqio =
>      Bio::SeqIO->new(-file   => $file,
>                      -format => 'GenBank');
> my $out =
>      Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
> 
>          # get top level unflattened SeqFeatureI objects
>          $unflattener->unflatten_seq(-seq       => $seq,
>                                      -use_magic => 1);
>          $out->write_seq($seq);
> }
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu May 25 15:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this).  

From UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) is
this tidbit:
----------------------------------------------------------
 UniProtKB release 7.0 of 07-Feb-2006

    Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.

----------------------------------------------------------

Probably should explain on the swissprot wiki page that the format is in a
state of flux at the moment.  I've added this tidbit to the bug page (#2003)
as well.
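
For illustration, here is a minimal sketch of pulling the dates and
version numbers out of the new-style DT lines quoted above. It is plain
Perl, not the swiss.pm code and not the proposed patch; the helper name
parse_dt_line and the hash keys are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the date and any version number out of one
# new-style DT line. Returns a hash reference, or undef for other lines.
sub parse_dt_line {
    my ($line) = @_;
    return unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4}),\s+(.*?)\.?\s*$/;
    my ($date, $note) = ($1, $2);
    my %info = (date => $date, note => $note);
    if ($note =~ /^(sequence|entry) version (\d+)$/i) {
        $info{ lc($1) . '_version' } = $2;
    }
    return \%info;
}

# The Swiss-Prot example lines from the release notes.
my @lines = (
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
);

my %versions;
for my $line (@lines) {
    my $info = parse_dt_line($line) or next;
    for my $key (qw(sequence_version entry_version)) {
        $versions{$key} = $info->{$key} if defined $info->{$key};
    }
}
print "sequence version: $versions{sequence_version}\n";
print "entry version:    $versions{entry_version}\n";
```

Keeping the date separate from the trailing text also covers the
concern raised earlier in the thread that the stored date should not
carry the extra version text.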

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Monday, May 22, 2006 9:04 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> We ask that people post patches to the bugzilla as an attachment to
> the bugzilla so we can track what and why the bug was that the patch
> fixes.
> 
> I am not totally sure this patch works because it seems like we need
> to strip out more information now from the DT line if the $date
> actually contains more information than just the date.
> 
> If you would go ahead and create a bug in bugzilla for  this (http://
> bugzilla.open-bio.org) this sort of conversation can be tracked to
> the bug.
> 
> If any of this is unclear please let us know - I thought we had put
> some pages up about this sort of thing on the wiki but maybe they
> need to be expanded.
> 
> -jason
> On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:
> 
> > I have a patch that seems to work but I'm not familiar with the
> > proper method to
> > "provide" it.  How do I go about that?
> >
> > The patch is pretty simple, it just parses the sequence version out
> > of the date
> > line where it now hides:
> >
> >          #date
> >          elsif( /^DT\s+(.*)/ ) {
> >            my $date = $1;
> > +
> > +          if ($date =~ /sequence version (\d+)/i) {
> > +              $params{'-seq_version'} ||= $1;
> > +          }
> > +
> >            $date =~ s/\;//;
> >            $date =~ s/\s+$//;
> >            push @{$params{'-dates'}}, $date;
> >          }
> >
> > By the way, what is the difference between Bio::Seq::version and
> > Bio::Seq::RichSeq::seq_version?
> >
> >
> >> -----Original Message-----
> >> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >> Sent: Monday, May 22, 2006 6:37 PM
> >> To: Michael Rogoff
> >> Cc: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> >>
> >>
> >> Sounds like a "missing feature" =)
> >>
> >> AFAIK the module was only written for swissprot files.  It is
> >> possible there have been changes in the format that have not been
> >> tracked to the current code.  We'd certainly appreciate someone
> >> testing it out as versions evolve.  If you submit a bug to bugzilla
> >> with version of bioperl and example files you can track when
> >> a fix is
> >> in.  We of course appreciate anyone's efforts to provide a patch as
> >> most bugs get fixed of late when someone gets "itchy" enough to fix
> >> them.
> >>
> >> -jason
> >>
> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> >>
> >>>
> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
> >>> ignores the
> >>> sequence version, and calling seq_version() on the resulting
> >>> RichSeq object
> >>> returns undef.
> >>>
> >>> It looks like swiss.pm is trying to parse the version out
> >> of the SV
> >>> line, which
> >>> apparently doesn't exist any more?  The sequence version(s)
> >> are now
> >>> specified as
> >>> part of the Date (DT) lines.
> >>>
> >>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >>>
> >>> Thanks for any help ...
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From miker at biotiquesystems.com  Mon May 22 21:51:10 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 18:51:10 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>
Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike>

I have a patch that seems to work but I'm not familiar with the proper method to
"provide" it.  How do I go about that?

The patch is pretty simple, it just parses the sequence version out of the date
line where it now hides:

         #date
         elsif( /^DT\s+(.*)/ ) {
           my $date = $1;
+
+          if ($date =~ /sequence version (\d+)/i) {
+              $params{'-seq_version'} ||= $1;
+          }
+
           $date =~ s/\;//;
           $date =~ s/\s+$//;
           push @{$params{'-dates'}}, $date;
         }

By the way, what is the difference between Bio::Seq::version and
Bio::Seq::RichSeq::seq_version?


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Monday, May 22, 2006 6:37 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> 
> Sounds like a "missing feature" =)
> 
> AFAIK the module was only written for swissprot files.  It is  
> possible there have been changes in the format that have not been  
> tracked to the current code.  We'd certainly appreciate someone  
> testing it out as versions evolve.  If you submit a bug to bugzilla  
> with version of bioperl and example files you can track when 
> a fix is  
> in.  We of course appreciate anyone's efforts to provide a patch as  
> most bugs get fixed of late when someone gets "itchy" enough to fix  
> them.
> 
> -jason
> 
> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> 
> >
> > As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> > ignores the
> > sequence version, and calling seq_version() on the resulting  
> > RichSeq object
> > returns undef.
> >
> > It looks like swiss.pm is trying to parse the version out 
> of the SV  
> > line, which
> > apparently doesn't exist any more?  The sequence version(s) 
> are now  
> > specified as
> > part of the Date (DT) lines.
> >
> > Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >
> > Thanks for any help ...
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 


From chen_li3 at yahoo.com  Tue May 23 11:48:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 08:48:46 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com>

Dear Dr. Stein,

I have the job partially done by adding this line
(under Cygwin)

print STDOUT $panel->png;

It is done because I can produce the image to be
viewed by other programs but it is only partially done
because I don't get exactly the same image as that
shown on the website. Enclosed is the image I get.
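
A note on the binmode suggestion quoted below: the PNG signature deliberately contains a CR LF pair, so any newline translation on output corrupts the file from the first few bytes. The following is only a sketch of writing image bytes safely; the helper name emit_image is made up, and the 8-byte PNG signature stands in for the value that $panel->png would return.

```perl
use strict;
use warnings;

# Hypothetical helper: write raw image bytes to a filehandle without
# any newline translation.
sub emit_image {
    my ($fh, $bytes) = @_;
    binmode $fh;           # raw bytes: no CRLF translation
    print {$fh} $bytes;
}

# Stand-in for the value returned by $panel->png: the 8-byte PNG signature.
my $png_bytes = "\x89PNG\r\n\x1a\n";

open my $out, '>', 'signature.bin' or die "cannot open signature.bin: $!";
emit_image($out, $png_bytes);
close $out or die "cannot close signature.bin: $!";
```

In the real script the same call would be emit_image(\*STDOUT, $panel->png), with the output redirected to a file as suggested.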

Thank you,

Li

--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi,
> 
> It is possible that your version of display can't
> handle PNG images. Try 
> saving the output as a file and then opening it in
> another image program:
> 
> 	perl render_blast1.pl data1.txt > data1.png
> 
> Another thing to watch out for is that, depending on
> what version of Perl 
> you're using, you may have to insert this statement
> into the render_blast1.pl 
> script (somewhere near the top):
> 
> 	binmode STDOUT;
> 
> Lincoln
> 
> 
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: im1
Type: image/x-png
Size: 2423 bytes
Desc: 2615755531-im1
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060523/6870f840/attachment-0003.bin>

From cjfields at uiuc.edu  Thu May 25 21:28:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 20:28:14 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <D422B7D5-C92D-436A-8385-01CFD306DFA8@uiuc.edu>

This patch works only for the recent change in swissprot seq format  
for sequence versions on the DT line.  I checked it out vs the test  
data provided with bioperl (t\data\swiss.dat).  I did manage to get  
it working for both old and new using a modification to your patch  
but there's another issue; using $seq->get_dates, which should only  
show dates, shows the entire line (date and version info).  Jason  
mentioned that there needs to be a better way to address this which  
I'm looking into.
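
For illustration only (this is not the modification mentioned above, and not what is in swiss.pm), one way to keep the date separate from the version text in both layouts might look like the sketch below. The helper name split_dt is made up, and the old-style sample line is written from memory of earlier Swiss-Prot releases rather than taken from t\data\swiss.dat.

```perl
use strict;
use warnings;

# Hypothetical helper: return (date, sequence_version) for one DT line.
# sequence_version is undef unless the line is a new-style
# "sequence version N" line.
sub split_dt {
    my ($line) = @_;
    my ($date) = $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4})/ or return;
    my ($seq_version) = $line =~ /sequence version (\d+)/i;
    return ($date, $seq_version);
}

my @lines = (
    'DT   01-NOV-1997 (Rel. 35, Last annotation update)',   # old layout (illustrative)
    'DT   15-OCT-2001, sequence version 3.',                # new layout
);
for my $line (@lines) {
    my ($date, $sv) = split_dt($line) or next;
    printf "%s  seq_version=%s\n", $date, defined $sv ? $sv : 'n/a';
}
```

With something along these lines, only the bare date would be pushed onto the dates list, and the version would be stored separately when the line carries one.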

Chris

On May 22, 2006, at 8:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Monday, May 22, 2006 6:37 PM
>> To: Michael Rogoff
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>>
>>
>> Sounds like a "missing feature" =)
>>
>> AFAIK the module was only written for swissprot files.  It is
>> possible there have been changes in the format that have not been
>> tracked to the current code.  We'd certainly appreciate someone
>> testing it out as versions evolve.  If you submit a bug to bugzilla
>> with version of bioperl and example files you can track when
>> a fix is
>> in.  We of course appreciate anyone's efforts to provide a patch as
>> most bugs get fixed of late when someone gets "itchy" enough to fix
>> them.
>>
>> -jason
>>
>> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
>>
>>>
>>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
>>> ignores the
>>> sequence version, and calling seq_version() on the resulting
>>> RichSeq object
>>> returns undef.
>>>
>>> It looks like swiss.pm is trying to parse the version out
>> of the SV
>>> line, which
>>> apparently doesn't exist any more?  The sequence version(s)
>> are now
>>> specified as
>>> part of the Date (DT) lines.
>>>
>>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>>>
>>> Thanks for any help ...
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri May 26 10:38:29 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 26 May 2006 10:38:29 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <200605261038.30380.lstein@cshl.edu>

Hi,

For some reason I didn't see the first posting on this. In current
bioperl-live, the ruler can have negative numbering - I use this routinely.
You need to create a feature that starts in negative coordinates. What
happens when you try this?

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jelenaob at gmail.com  Fri May 26 12:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading enzyme list from a file REBASE bairoch.605 using
Bio::Restriction::IO;

1. But for some reason the number of enzymes in the list is always 532
which is a default set of enzymes in enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
|| die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is when trying to use format that is lower-case
it throws an exception, but when "B" is capitalized it is ok.
I assume it cannot load a file and does not initialize enzyme
collection properly.

Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?


Thanks in advance,


Jelena Obradovic
jelenaob at gmail.com


From cjfields at uiuc.edu  Fri May 26 15:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Hi there,
> 
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
> 
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
> 
> Is there any known issue with this module or a workaround?
> 
> And here is the code I have been using:
> 
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
> format=>"Bairoch")
> || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
 
my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
                                    -format=>"Bairoch");

should be 

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
                                    -format=>"bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
requires a specific format, which I believe is case-sensitive).  Judging by
the modules in Bio/Restriction/IO directory, looks like the
Bio::Restriction::IO format should match one of the following formats:
bairoch, itype2, withrefm, and you can also build your own if needed using
the previous as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initialize enzyme
> collection properly.
> 
> Can't call method "each_enzyme" on an undefined value at
> .../cgi-bin/seq-load.pl line 51.

My guess?  The reason it works with an uppercase ('Bairoch') is that it
can't find the module and uses the default set of enzymes as a fallback.
The exception that you reported when you use lowercase ('bairoch') is real
and I reported it as a bug (there are a few I found in that module).

You might want to try using one of the other formats if you can get the
files in the right format from REBASE.  I'm looking into the bugs
specifically associated with Bio::Restriction::IO::bairoch.

> Any thoughts?
> 
> 
> Thanks in advance,
> 
> 
> Jelena Obradovic
> jelenaob at gmail.com
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Fri May 26 15:43:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 26 May 2006 15:43:18 -0400
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <C09CD296.8961%osborne1@optonline.net>

Chris,

SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
should work). This is what the documentation says and what the code seems to
suggest. This is probably what the Restriction modules should do as well.
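
As a rough sketch of what case-insensitive handling could look like (this is not the actual SeqIO or Restriction::IO code), the format name can be lower-cased and checked before any module is loaded. The helper name normalize_format is made up; the three format names are the ones listed earlier in this thread.

```perl
use strict;
use warnings;

# Format names taken from the module files mentioned earlier in the thread.
my %known_format = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Hypothetical helper: map a user-supplied format name to the lower-case
# module suffix, or die if it is not one of the known formats.
sub normalize_format {
    my ($format) = @_;
    my $name = lc $format;
    die "unknown format '$format'\n" unless $known_format{$name};
    return $name;
}

print normalize_format($_), "\n" for qw(Bairoch BAIROCH bairoch);
```

Dying on an unknown name, instead of silently falling back to a default enzyme set, would also have made the original problem easier to spot.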

Brian O.


From cjfields at uiuc.edu  Fri May 26 16:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <C09CD296.8961%osborne1@optonline.net>
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad.  Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment.  None have implemented write
methods though POD implies they work:

SYNOPSIS

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet.  In fact, the
tests are pretty confusing; when did we allow this syntax: '-format => 8'?
Anyway, I'm muddling my way through this and will probably write something
up for the project priority list if I can't work this bug out.  

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Chris,
> 
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to
> suggest. This is probably what the Restriction modules should do as well.
> 
> Brian O.
> 
> 


From andreas.bender at complife.org  Fri May 26 10:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID: <e83118520605260750w3e66286bmbd6a14be3d2299d6@mail.gmail.com>

Dear All,

Has anyone here implemented some cool programs/tools using Bioperl? Or
is there someone from the Bioperl core team who wants to present
Bioperl itself at our conference? We are holding a "free software"
session (free at least as in free beer, ideally also open source, some
GNU-type license) at our "Computational Life Sciences" Conference in
Cambridge/UK later this year and you are warmly welcome to present
your software there. Please contact me directly or visit the website
in case of any questions.

Enjoy the weekend,
Andreas


                                  Call for Contributions
==================================================
               LIFE SCIENCE FREE SOFTWARE SESSION

          held at CompLife 2006 (http://www.complife.org)
     in Cambridge, United Kingdom, on September 27 - 29, 2006
==================================================
In the last years more and more free and open source software has been
developed for chemo- and bioinformatics, molecular modelling or other
Life Science applications, but many of the programs are not well
known. During the CompLife 2006 conference we will organize a special
session dedicated to this type of free software. The demo session will
be preceded by a short session having room for brief introductory
presentations whereas the demo session itself will allow attendees to
see the tools in action. Authors of free software will have the
opportunity to present their program to the CompLife audience which
will consist of researchers and users from computer science, biology,
chemistry and everything in between.

In case you are interested in the free software session, send us an
email at fss at complife.org and briefly describe your program and how
you intend to present it at the conference (1-2 pages max - please
include URL to downloadable version where available). The only
restrictions are that the program must be freely available for
everyone or even open source and that it must be related to Life
Science applications. The deadline for these proposals is June 16th,
2006. In mid-July we will notify you if your software demo was
accepted.
************************

-- 
Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006:
Visit http://www.complife.org for more information!

Andreas Kieron Patrick Bender - http://www.andreasbender.de
Novartis Institutes for BioMedical Research, Cambridge/MA


From cjfields at uiuc.edu  Fri May 26 17:19:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 16:19:08 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine>

The POD documentation is a bit misleading for Bio::Restriction::IO.  Brian's
right, there needs to be more flexibility with the case for the formats
used.  I found a few other odd things as well which I may file bug reports
for.  Looks like another post for the project priority list.

 
Chris

 
  _____  

From: Jelena Obradovic [mailto:jobradovic at gmail.com] 
Sent: Friday, May 26, 2006 3:56 PM
To: Chris Fields
Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file

 
Hi guys, I tried with the other formats, and it works fine with "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena

On 5/26/06, Chris Fields <cjfields at uiuc.edu> wrote:


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file 
>
> Hi there,
>
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
>
> 1. But for some reason the number of enzymes in the list is always 532 
> which is a default set of enzymes in enzyme collection.
>
> Is there any known issue with this module or a workaround?
>
> And here is the code I have been using:
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",- 
> format=>"Bairoch")
> || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n"; 

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
                                    -format=>"Bairoch");

should be

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
                                    -format=>"bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (
i.e.
requires a specific format, which I believe is case-sensitive).  Judging by
the modules in Bio/Restriction/IO directory, looks like the
Bio::Restriction::IO format should match one of the following formats:
bairoch, itype2, withrefm, and you can also build your own if needed using
the previous as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case 
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initialize enzyme
> collection properly.
>
> Can't call method "each_enzyme" on an undefined value at 
> .../cgi-bin/seq-load.pl line 51.

My guess?  The reason it works with an uppercase ('Bairoch') is that it
can't find the module and uses the default set of enzymes as a fallback.
The exception that you reported when you use lowercase ('bairoch') is real 
and I reported it as a bug (there are a few I found in that module).

You might want to try using one of the other formats if you can get the
files in the right format from REBASE.  I'm looking into the bugs
specifically associated with Bio::Restriction::IO::bairoch.

> Any thoughts?
>
>
> Thanks in advance,
>
>
> Jelena Obradovic
> jelenaob at gmail.com  <mailto:jelenaob at gmail.com> 
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
Jelena Obradovic
Email: jobradovic at gmail.com


From jay at jays.net  Sat May 27 12:47:27 2006
From: jay at jays.net (Jay Hannah)
Date: Sat, 27 May 2006 11:47:27 -0500
Subject: [Bioperl-l] "Project OpenLab" (working title)
Message-ID: <4478829F.5030508@jays.net>

Hola --

We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)

   "Project OpenLab":
   http://omaha.pm.org/kwiki/?BioPerl

- Does any such project already exist? 
- If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
- I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
- I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
- I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
- I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.

Thanks for your time,

j


From fernan at iib.unsam.edu.ar  Sat May 27 18:30:44 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sat, 27 May 2006 19:30:44 -0300
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar>

+----[ Jay Hannah <jay at jays.net> (27.May.2006 15:15):
|
| Hola --

Hola!

| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
| 
|    "Project OpenLab":
|    http://omaha.pm.org/kwiki/?BioPerl
| 
| - Does any such project already exist? 

mmm ... maybe ... both GUS (Genomics Unified Schema:
gusdb.org, though not developed around bioperl) and GMOD
(Generic Model Organism Database: gmod.org) provide you with 
i) RDBMS storage
ii) a Perl object layer
iii) a web app framework

Though certainly overkill for the needs you describe
in the wiki, they can be customized to work in the way you
describe or at least serve as a guide.

| - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 

Have you considered Perl Catalyst? It lets you work with bioperl
modules naturally (it's Perl!), gives you a choice of templating
toolkits (Template Toolkit, Mason, among others) and provides an
almost ready-to-go controller/URL dispatcher.

| - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
| - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
| - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
| - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
| 
| Thanks for your time,
| 
| j
|
+----]

Good luck,

Fernan


From epsteinj at mail.nih.gov  Fri May 26 14:46:32 2006
From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E])
Date: Fri, 26 May 2006 14:46:32 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler
	havenegative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov>

While this is being discussed and we have Lincoln's attention; in example 4 on the Biographics Howto:
   http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html
how can one assign directional arrows to the graded segments which represent the BLAST hits?  I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'?  What other techniques do you recommend for associating directionality with these hits?

Thanks&regards,

Jonathan


From jobradovic at gmail.com  Fri May 26 16:55:35 2006
From: jobradovic at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 13:55:35 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>

Hi guys, I tried with the other formats, and it works fine with "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena

On 5/26/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> > Sent: Friday, May 26, 2006 11:47 AM
> > To: Bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> >
> > Hi there,
> >
> > I have tried loading enzyme list from a file REBASE bairoch.605 using
> > Bio::Restriction::IO;
> >
> > 1. But for some reason the number of enzymes in the list is always 532
> > which is a default set of enzymes in enzyme collection.
> >
> > Is there any known issue with this module or a workaround?
> >
> > And here is the code I have been using:
> >
> > my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
> >                                    -format=>"Bairoch")
> > || die "can't load the file bairoch.605: $!";
> > my $enzymes = $re_in->read;
> > print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
>                                    -format=>"Bairoch");
>
> should be
>
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",
>                                    -format=>"bairoch");
>
> Note the case change for the format; this is noted in the bug report you
> submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO
> (i.e. requires a specific format, which I believe is case-sensitive).
> Judging by the modules in Bio/Restriction/IO directory, looks like the
> Bio::Restriction::IO format should match one of the following formats:
> bairoch, itype2, withrefm, and you can also build your own if needed using
> the previous as examples and implementing Bio::Restriction::IO::base.
>
> > 2. The other problem is when trying to use format that is lower-case
> > it throws an exception, but when "B" is capitalized it is ok.
> > I assume it cannot load a file and does not initialize enzyme
> > collection properly.
> >
> > Can't call method "each_enzyme" on an undefined value at
> > .../cgi-bin/seq-load.pl line 51.
>
> My guess?  The reason it works with an uppercase ('Bairoch') is that it
> can't find the module and uses the default set of enzymes as a fallback.
> The exception that you reported when you use lowercase ('bairoch') is real
> and I reported it as a bug (there are a few I found in that module).
>
> You might want to try using one of the other formats if you can get the
> files in the right format from REBASE.  I'm looking into the bugs
> specifically associated with Bio::Restriction::IO::bairoch.
>
> > Any thoughts?
> >
> >
> > Thanks in advance,
> >
> >
> > Jelena Obradovic
> > jelenaob at gmail.com
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Jelena Obradovic
Email: jobradovic at gmail.com


From gad14 at cornell.edu  Fri May 26 16:02:33 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Fri, 26 May 2006 16:02:33 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
Message-ID: <44775ED9.4020208@cornell.edu>

Hi,

I'm running local blast with Bio::Tools::Run::StandAloneBlast. 
Everything seems to work ok up to the point of accessing the results. I 
am able to print the results but when I try to do more than one thing 
with the result, nothing is returned for the second activity..

I'd like to first sort the results into groups of results that hit the 
db seq once, twice, three times, etc - where the results are stored as 
SeqFeature objects in temporary arrays whose contents are printed 
sequentially to stdout when the whole sort is complete.

Secondly, I need to print the results in Hit Table (i.e. -m 8) format to 
stdout.

If I've sorted the results, the sorted results will print to screen;
however, when I try to print the Hit Table results nothing is returned,
as if the blast results have evaporated... and vice versa: if I comment
out the part where I point my sorting subroutine to the blast results
reference, my hit table results suddenly print to screen. It's almost
as if the reference to the SearchIO obj that holds the StandAloneBlast
results is lost after one use? (I'm beginning to think there is
something naive about the way I'm using references...)


Here's an abbreviated version of my code:


my $ref_seq_objs; # ref to array of Sequence obj's
my $genome_seq; # fasta containing 1 genomic sequence

my @params = ('program'  => 'blastn',
              'database' => $genome_seq,
             );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $blast_report = $factory->blastall($ref_seq_objs); #OK

#######
### the following 2 actions seem to be mutually exclusive.
# 1) sort results into 1-hitter, 2-hitter, etc. groups of
# SeqFeature objs stored in arrays. arrays are then printed
# to stdout
&sort_results($blast_report);

# 2) print blast results
&print_blast_results($blast_report);
#######


sub print_blast_results{
   my $report = shift;
   while(my $result = $report->next_result()){
     while(my $hit = $result->next_hit()){
       while(my $hsp = $hit->next_hsp()){
         my $q_name = $hsp_q_seq_obj->display_id;
         print join(", ",$q_name,$hit->name,$hsp->bits)."\n";
       }
     }
   }
}


I'm about to lose my mind on this... any assistance appreciated!

Thanks,
Genevieve


From rvosa at sfu.ca  Sun May 28 03:43:23 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sun, 28 May 2006 00:43:23 -0700
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <4479549B.5030202@sfu.ca>

The TreeBaseII team (part of the cipres project: http://www.phylo.org) 
are working on a lab database system for storage of intermediate 
calculation results and data (sequence alignments, trees, taxon sets). I 
think what you're discussing is a bit more molecular and less 
phylogenetic, but it does sound similar in spirit.

Rutger

Jay Hannah wrote:
> Hola --
>
> We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
>
>    "Project OpenLab":
>    http://omaha.pm.org/kwiki/?BioPerl
>
> - Does any such project already exist? 
> - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
> - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
> - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
> - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
> - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
>
> Thanks for your time,
>
> j

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Sun May 28 09:55:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 28 May 2006 08:55:47 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
	<286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <EA78F27A-074E-4C9D-AC70-27D4CC20F8C4@uiuc.edu>

Again, it's b/c 'withrefm' is a valid Restriction::IO module and  
'withref' is not.  Similar to the case issue you saw before with  
'bairoch.'  Making this more lenient would help but there are more  
serious issues with these modules that need to be addressed...

http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes
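
Until the loader is more lenient, a small guard in the calling script can catch a misspelled or wrongly cased format name before it reaches Bio::Restriction::IO. The following is only a sketch: the helper name checked_format is made up, and the list of format names is an assumption based on the modules mentioned in this thread, so adjust it to whatever is actually in Bio/Restriction/IO on your installation.

```perl
use strict;
use warnings;

# Format names taken from the modules mentioned in this thread; the real set
# depends on what is in Bio/Restriction/IO in your installation (assumption).
my %KNOWN_FORMAT = map { $_ => 1 } qw(bairoch base itype2 withrefm);

# Hypothetical helper: lower-case the name and refuse anything unknown,
# rather than letting a bad name fall through to the default enzyme set.
sub checked_format {
    my ($name) = @_;
    my $format = lc $name;
    die "Unknown Bio::Restriction::IO format '$name'\n"
        unless $KNOWN_FORMAT{$format};
    return $format;
}

print checked_format('Bairoch'), "\n";     # normalised to lower case
print checked_format('withrefm'), "\n";    # already valid
print eval { checked_format('withref') } // "rejected: $@";
```

The return value would then be passed as the -format argument, e.g. -format => checked_format($fmt), in the call to Bio::Restriction::IO->new.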

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From osborne1 at optonline.net  Sun May 28 11:03:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 28 May 2006 11:03:37 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
Message-ID: <C09F3409.8992%osborne1@optonline.net>

Genevieve,

Does this simplified code, without the &sort_results($blast_report) line,
work?

By the way, no one can really help you here because you haven't shown us all
of the code. The code you are showing certainly looks OK.


Brian O.


On 5/26/06 4:02 PM, "Genevieve DeClerck" <gad14 at cornell.edu> wrote:

> &sort_results($blast_report);


From simon.rayner.mlist at gmail.com  Mon May 29 03:37:24 2006
From: simon.rayner.mlist at gmail.com (mailing lists)
Date: Mon, 29 May 2006 15:37:24 +0800
Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64
	running SuSE linux
Message-ID: <f73437f70605290037q3c7637e4h29faa3aed16ec77a@mail.gmail.com>

Hello,

I'm having a problem trying to install the bioperl-ext package on my
system.

biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL
Writing Makefile for Bio::Ext::Align
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make
cc -c  -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -g -Wall -pipe   -DVERSION=\"0.1\" -DXS_VERSION=
\"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE"
-DPOSIX -DNOERROR Align.c
In file included from Align.xs:12:
./libs/sw.h:1360:1: warning: "/*" within comment
.
.
.
Running Mkbootstrap for Bio::Ext::Align ()
chmod 644 Align.bs
rm -f blib/arch/auto/Bio/Ext/Align/Align.so
LD_RUN_PATH="" cc  -shared -L/usr/local/lib64 Align.o  -o
blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a  -lm
/usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld:
libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not
be used when making a shared object; recompile with -fPIC
libs/libsw.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #

the -fPIC flag is already set in the makefile.

I found a similar problem in an earlier posting with the following
suggestions....


  From: Aaron J. Mackey <amackey <at> pcbi.upenn.edu>
  Subject: Re: compiling bioperl-ext
  Newsgroups: gmane.comp.lang.perl.bio.general
  Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50
  minutes ago)

  1) Are you starting with a clean build directory?

  2) Does installing other compiled Perl modules work for you (e.g.
  Data::Dumper or Storable)?

  That's a pretty arcane error, and if the answer to #2 is "no", then I
  don't think we can help you.

  -Aaron


....In my case, both 1) and 2) are true.  I installed Data::Dumper without
any problems.


I've found plenty of similar reports for other software, and it seems to
relate to 32/64-bit issues.

Does anyone have any suggestions about how to get around this?

thanks

Simon Rayner


From ULNJUJERYDIX at spammotel.com  Mon May 29 05:46:21 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Mon, 29 May 2006 17:46:21 +0800
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <200605261038.30380.lstein@cshl.edu>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>

Hi!
Oh, it was under a slightly different header, asking about the create-image-map
feature.
I am using the stable version 1.4 of bioperl now. In any case I have not
added the sequence as a feature-annotated seq, as I already have the bp
where the TF binds (in 1-1050 numbering), so what I did was just add
graded segments based on the position.
I saw that there is a scale function for the arrow glyph; however, it is a
multiply function. Can it be hacked to take an offset value (i.e. subtract
1000 from the scale)?

cheers
kevin


Hi,
>
> For some reason I didn't see the first posting on this. In current bioperl
> live, the ruler can have negative numberings - I use this routinely. You
> need to create a feature that starts in negative coordinates. What is
> happening to you when you try this?
>
> Lincoln
>
> On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > Hi
> > thanks for the help offered thus far!
> > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> > bioperl. therefore i was asked to make the numberings as such (-1000) is
> > there any way at all to do this in bioperl without changing the .pm file?
> >
> > thanks guys..
> > kevin
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From shameer at ncbs.res.in  Mon May 29 06:07:17 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 29 May 2006 15:37:17 +0530 (IST)
Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple
	Servers
Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1>

Dear All,

My query may not be directly related to BioPerl, but I am sure I will get
some ideas on how to proceed. Pise or related modules may offer some
possibilities.

Query :
---------
We have several public servers (say a, b, c). Each of them takes a PDB
file as input, processes it and displays the result. Now I need to create
a web page (a meta-server/integrated web server) with three radio
buttons (a, b, c) and a single input form to accept a PDB file from the
user (passing a file as an argument seems somewhat impossible to me).
I need the output as 3 links on the next page.

Is there any BioPerl module / CGI / Perl trick to do this?

Thanks in advance,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675
W - http://caps.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."


From torsten.seemann at infotech.monash.edu.au  Tue May 30 02:41:31 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 16:41:31 +1000
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BE91B.30001@infotech.monash.edu.au>

> my $ref_seq_objs; # ref to array of Sequence obj's
> my $genome_seq; # fasta containing 1 genomic sequence
> my @params = ('program' => 'blastn',
>               'database' => $genome_seq,
>              );

The database parameter needs to be the same thing you would pass to the 
"-d" option in "blastall". I don't think you can pass a perl string 
here. ie. there needs to be a properly formatted set of blast indices 
for your genome sequence on the disk in the appropriate place.
See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html
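
A quick way to test this before calling blastall is to check for the index files on disk. This is a minimal sketch, assuming a single-volume nucleotide database built with the legacy formatdb tool (which writes .nhr, .nin and .nsq files next to the input); the helper name and the default path are placeholders, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $db looks like a formatdb-indexed nucleotide
# database, i.e. the .nhr/.nin/.nsq files sit next to it.
# Assumption: single-volume database; multi-volume databases are laid out differently.
sub nucleotide_db_is_formatted {
    my ($db) = @_;
    for my $ext (qw(nhr nin nsq)) {
        return 0 unless -e "$db.$ext";
    }
    return 1;
}

my $genome_seq = shift(@ARGV) // 'genome.fasta';   # placeholder path
print nucleotide_db_is_formatted($genome_seq)
    ? "$genome_seq has BLAST indices\n"
    : "$genome_seq is not indexed; run formatdb -p F -i $genome_seq first\n";
```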

> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $blast_report = $factory->blastall($ref_seq_objs); #OK

But I could be wrong, and $blast_report here contains a valid BLAST report.

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sb at mrc-dunn.cam.ac.uk  Tue May 30 03:59:28 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 30 May 2006 08:59:28 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Hi,
[snip]
> If I've sorted the results, the sorted results will print to screen;
> however, when I try to print the Hit Table results nothing is returned,
> as if the blast results have evaporated... and vice versa: if I comment
> out the part where I point my sorting subroutine to the blast results
> reference, my hit table results suddenly print to screen.
[snip]
> Here's an abbreviated version of my code:
[snip]
> #######
> ### the following 2 actions seem to be mutually exclusive.
> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
> # SeqFeature objs stored in arrays. arrays are then printed
> # to stdout
> &sort_results($blast_report);
> 
> # 2) print blast results
> &print_blast_results($blast_report);

> sub print_blast_results{
>    my $report = shift;
>    while(my $result = $report->next_result()){
[snip]

You didn't give us your sort_results subroutine, but could it be as simple
as this: both subroutines use $report->next_result (and/or
$result->next_hit), but you don't reset the internal counter back to the
start. The second subroutine then asks for the next_result, finds that the
first subroutine has already consumed the last result, and so next_result
returns false.

 From a quick look it wasn't obvious how to reset the counter. Hopefully
this can be done and someone else knows how.
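
One way to sidestep the reset question is to drain the stream once into an array and hand that array to both subroutines. The sketch below uses a made-up closure (make_stream) as a stand-in for a one-pass iterator; it is not Bio::SearchIO, it only mimics the "returns false when exhausted" behaviour described above, and the two subs are dummies that just count what they were given.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a one-pass result stream; this is NOT Bio::SearchIO,
# it only mimics the "next_* returns false when exhausted" behaviour.
sub make_stream {
    my @items = @_;
    return sub { return shift @items; };
}

my $next_result = make_stream(qw(result1 result2 result3));

# Drain the stream once and keep everything in an array.
my @results;
while (defined(my $r = $next_result->())) {
    push @results, $r;
}

# The stream is now exhausted: a second pass over it sees nothing.
my $leftover = 0;
$leftover++ while defined $next_result->();

# The cached array, by contrast, can be walked as often as needed.
sub sort_results        { my ($list) = @_; return scalar @$list; }
sub print_blast_results { my ($list) = @_; return scalar @$list; }

print "second pass over stream: $leftover\n";
print "sort saw: ",  sort_results(\@results), "\n";
print "print saw: ", print_blast_results(\@results), "\n";
```

In the real script the same pattern would mean pushing each $result returned by $report->next_result onto an array and passing a reference to that array to both subroutines; whether the hits inside each cached result need the same treatment depends on how they are iterated.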


From torsten.seemann at infotech.monash.edu.au  Tue May 30 04:18:45 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 18:18:45 +1000
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return
	undef"
Message-ID: <447BFFE5.8010508@infotech.monash.edu.au>

FYI Bioperl developers:

I just audited the bioperl-live CVS and found about 450 occurrences of 
"return undef".

Page 199 of "Perl Best Practices" by Damian Conway, and this URL
http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:

"Use return; instead of return undef; if you want to return nothing. If 
someone assigns the return value to an array, the latter creates an 
array of one value (undef), which evaluates to true. The former will 
correctly handle all contexts."

So I'm guessing at least some of these 450 occurrences *could* result in 
bugs and should probably be changed.
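
A minimal illustration of the difference in plain Perl (the sub names are invented for this example and do not come from bioperl-live):

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
sub find_undef { return undef; }   # list context: a one-element list containing undef
sub find_empty { return; }         # list context: the empty list; scalar context: undef

my @with_undef = find_undef();
my @with_empty = find_empty();

print "return undef: ", scalar(@with_undef), " element(s), array is ",
      (@with_undef ? "true" : "false"), "\n";
print "return:       ", scalar(@with_empty), " element(s), array is ",
      (@with_empty ? "true" : "false"), "\n";

# In scalar context both forms yield undef, so scalar callers are unaffected.
my $scalar_result = find_empty();
print "scalar context: ", (defined $scalar_result ? "defined" : "undef"), "\n";
```

So a caller writing "if (my @hits = some_method()) { ... }" would enter the block after "return undef" but not after a bare "return".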

Your opinion may differ :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From cjfields at uiuc.edu  Tue May 30 10:07:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:07:45 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with
	"return undef"
In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au>
Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine>

Torsten,

Any way you can post a list of some/all of the offending lines or modules?
Sounds like something to consider, but if the list is as large as you say we
may need something (bugzilla? wiki?) to track the changes and make sure
they pass tests; I'm sure a large majority will.

I'm guessing Jason would want this somewhere on the project priority list or
bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
page on the wiki for proposed code changes?

Chris



From lstein at cshl.edu  Tue May 30 10:47:48 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 30 May 2006 10:47:48 -0400
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
	<5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
Message-ID: <200605301047.49127.lstein@cshl.edu>

Hi Kevin,

I'm afraid that there is no offset value. You'll need the 1.51 version of 
bioperl to handle negative numbers properly. I understand your reluctance to 
upgrade just to get the Bio::Graphics functionality. You might consider 
checking out just the Bio/Graphics subtree and installing that. It should 
work on top of 1.4
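
Since the ruler follows the feature coordinates, the shift has to be applied to the positions before the features are built. A rough sketch of that arithmetic, under the assumption that base 1 of the 1-1050 numbering should be labelled -1000 (so a plain linear ruler puts 0 at base 1001); the helper name and the example binding sites are made up:

```perl
use strict;
use warnings;

# Assumption: base 1 of the 1..1050 numbering should be labelled -1000 on the
# ruler, so a plain linear ruler puts 0 at base 1001.
my $OFFSET = 1001;

# Hypothetical helper: shift a (start, end) pair into promoter-relative coordinates.
sub to_promoter_coords {
    my ($start, $end) = @_;
    return ($start - $OFFSET, $end - $OFFSET);
}

# Made-up binding sites, given in the original 1-based numbering.
my @sites = ([1, 12], [940, 955], [1040, 1050]);
for my $site (@sites) {
    my ($s, $e) = to_promoter_coords(@$site);
    print "$site->[0]..$site->[1] => $s..$e\n";
}
```

The shifted values would then be used as the -start and -end of each feature added to the panel. If the numbering should skip zero, as promoter coordinates often do, the positive side needs one more added; that is a separate convention choice.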

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Tue May 30 10:50:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:50:06 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine>

Jason, Brian, et al,

I found several major issues with Bio::Restriction::IO (this popped up while
bug squashing).  In particular, the POD is pretty misleading.  It states
(directly from perldoc):

SYNOPSIS
        use Bio::Restriction::IO;

        $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                         -format => 'withrefm');
        $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                         -format => 'bairoch');
        my $res = $in->read; # a Bio::Restriction::EnzymeCollection
        $out->write($res);

      # or

      #    use Bio::Restriction::IO;
      #
      #    #input file format can be read from the file extension (dat|xml)
      #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
      #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
      #
      #    # World's shortest flat<->xml format converter:
      #    print $out $_ while <$in>;

So, I have found several problems with these modules.  I really hate to
criticize code here, as my own is pretty hacky, but I think these are things
to seriously mull over: 

1)	Note that, though some of the lines above are commented out, they are
still there in the POD and thus present in perldoc/pod2html etc.  So the
synopsis suggests that the script above should read in from one
format and write out to another (like SeqIO).  However, NONE of the current
write() methods are implemented for any of the IO modules (withrefm, base,
itype2, bairoch), so this does not happen as expected.  You get the nasty
thrown 'method not implemented' error instead when writing.
2)	The commented statements in the POD above also suggest that REBASE XML
format is supported when there is no XML module.
3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
made it unusable until I added a few small changes; it still can't handle
multisite/multicut enzymes properly, so in essence it is useless until that
is addressed.
4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
up its own methods?

I'm working on at least getting the 'bairoch' input format up and running
(so at least it gets the enzymes into a
Bio::Restriction::EnzymeCollection).  From this point I'm not sure where
to proceed.  The POD obviously needs to be corrected to reflect that writing
formats is not implemented (and the bit about XML should be taken out
completely); that's the easy part, which I am working on and plan on committing
today.  However, these modules don't seem to be used too frequently so I'm
not sure whether it's worth spending too much time getting these up to speed
at the moment (adding write methods, switching to Bio::Root::Root, etc); I
have other priorities at the moment (including a way overdue ListSummary).
I'm also not sure who else is (using|working) on these so I don't want to
(make too many changes|step on someone else's toes), but these are, IMHO,
pretty serious problems.  

Any thoughts?

Chris


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Tue May 30 12:34:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 11:34:18 -0500
Subject: [Bioperl-l] Bio::Restriction::IO changes
Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine>

Jason, Brian, et al:

I have made changes to the Bio::Restriction::IO POD to remove any reference
to write functions: almost none have been implemented yet, so including
them in the POD is a bit misleading.  At the moment, you can't write to any
REBASE format except for 'base', which I found is the only one that works.
And, upon further checking, even that one has issues: it looks like there
are problems with multicut/multisite enzymes when writing in 'base' format
which I'm not delving into ('TaqII' only displays one site when written,
though it has two cut sites).  I'll add this to the wiki and file a bug
report (enhancement) for this module.

I am also removing mention of XML and 'bairoch' formats (the former isn't
present and the latter is broken at the moment) and added a few things to
the POD TO DO section.  

Rob (if you're out there somewhere in the ether), have you made any more
changes to these modules that need to be committed?  Didn't know if any of
these issues have already been addressed/changed etc.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From jelenaob at gmail.com  Tue May 30 00:58:35 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Mon, 29 May 2006 21:58:35 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>

Hello everybody,

does anybody know how to remove the background color of the Panel?
Currently I am not adding anything to it, so that I can troubleshoot the
problem, and I have tried setting all the color attributes I could find on
the panel, but no luck. Whatever I do, I get the BLUE border of the panel.

Has anybody faced the same problem?

Thanks in advance,

Jelena

And here is the code I am currently using:

-----------------------------------------------------------------------------------------------------------
my $panel =
    Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
                              -width => 800,
                              -pad_left => 10,
                              -pad_right => 10,
                              -key_color => 'white',
                              -bgcolor => 'white',
                              -gridcolor=>'black',
                              -fgcolor => 'black',
                              -grid => 0,
                              );
   my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
     -url  => '/tmpimages');
   #make clickable image
   print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
   print $map;

-----------------------------------------------------------------------------------------------------------


From luciap at sas.upenn.edu  Tue May 30 14:49:48 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 14:49:48 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
Message-ID: <1149014988.447c93cc01761@128.91.55.38>

Hi

I am here again, I finally got to write the "collapse nodes" function and have a
couple of questions.

In order to collapse any node $node, I first have to get the parent,
which I can do as $parent=$node->ancestor

and then the children as:
@children=$node->get_all_Descendents (or should I use each_Descendent?)

Then before deleting $node I have to assign all its children to $parent,
and here is where I am kind of confused.
Can I use the add_Descendent function for this?
I've been trying to write something like this:
foreach $child (@children){
         $parent=add_Descendent->$child;
}
but this doesn't work, and I think it is because I don't have any idea of what I
am doing.
Any suggestions?

thanks


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From rvosa at sfu.ca  Tue May 30 14:52:52 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 11:52:52 -0700
Subject: [Bioperl-l] For CVS developers - potential pitfall
	with	"returnundef"
In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine>
References: <000c01c683f2$6ca62570$15327e82@pyrimidine>
Message-ID: <447C9484.9030102@sfu.ca>

Although I agree with the sentiment of following PBP, I'm not so sure 
changing 'return undef' to 'return' *now* will fix any bugs without 
introducing new, subtle ones.
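
To make the trade-off concrete, here is a minimal sketch in plain Perl (no
BioPerl needed). The sub names lookup_undef and lookup_bare are made up for
illustration and are not taken from bioperl-live. It shows the list-context
problem that PBP describes, and also one way in which switching to a bare
return could change behaviour for existing callers:

```perl
use strict;
use warnings;

# Two hypothetical lookups that both mean "nothing found".
sub lookup_undef { return undef; }   # explicit undef
sub lookup_bare  { return; }         # bare return

# List context: the explicit undef becomes a one-element list,
# so the array tests true even though nothing was found.
my @from_undef = lookup_undef();
my @from_bare  = lookup_bare();
printf "return undef: %d element(s), %s\n",
    scalar(@from_undef), (@from_undef ? 'true' : 'false');   # 1 element(s), true
printf "return      : %d element(s), %s\n",
    scalar(@from_bare), (@from_bare ? 'true' : 'false');     # 0 element(s), false

# Scalar context: both give undef, so scalar callers see no difference.
my $s1 = lookup_undef();
my $s2 = lookup_bare();
print "both scalar results are undefined\n"
    if !defined $s1 && !defined $s2;

# The flip side: inside a key/value list the bare return contributes
# nothing, so every element after it shifts by one position.
my @pairs_undef = (name => lookup_undef(), size => 10);   # 4 elements
my @pairs_bare  = (name => lookup_bare(),  size => 10);   # 3 elements
printf "key/value list sizes: %d vs %d\n",
    scalar(@pairs_undef), scalar(@pairs_bare);             # 4 vs 3
```

So a blanket search-and-replace looks safe only where no caller relies on
getting exactly one value back, for example when the call sits inside an
argument list or a hash constructor.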

Chris Fields wrote:
> Torsten,
>
> Any way you can post a list of some/all of the offending lines or modules?
> Sounds like something to consider, but if the list is as large as you say we
> may need something (bugzilla? wiki?) to track the changes and make sure
> they pass tests; I'm sure a large majority will.  
>
> I'm guessing Jason would want this somewhere on the project priority list or
> bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
> page on the wiki for proposed code changes?
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>> Sent: Tuesday, May 30, 2006 3:19 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>> "returnundef"
>>
>> FYI Bioperl developers:
>>
>> I just audited the bioperl-live CVS and found about 450 occurrences of
>> "return undef".
>>
>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:
>>
>> "Use return; instead of return undef; if you want to return nothing. If
>> someone assigns the return value to an array, the latter creates an
>> array of one value (undef), which evaluates to true. The former will
>> correctly handle all contexts."
>>
>> So I'm guessing at least some of these 450 occurrences *could* result in
>> bugs and should probably be changed.
>>
>> Your opinion may differ :-)
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From luciap at sas.upenn.edu  Tue May 30 16:11:52 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 16:11:52 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
Message-ID: <1149019912.447ca7085124e@128.91.55.38>

Hi
OK, that was silly, but what I have in my code is what you just wrote.
The problem is that if I write

$parent->add_Descendent($child)

it tells me that I am calling the method "add_Descendent" on an undefined value
(but I did define $parent before??)

So here goes the code so far:

use Bio::TreeIO;
 my $in = new Bio::TreeIO(-file => 'Test2.tre',
                          -format => 'newick');
 my $out = new Bio::TreeIO(-file => '>mytree.out',
                           -format => 'newick');
 while( my $tree = $in->next_tree ) {
    foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
    my $bootstrap=$node->_creation_id;

    if ($bootstrap < 70 ){
            my $parent = $node->ancestor;
            my @children=$node->get_all_Descendents;
            foreach my $child (@children){
                $parent->add_Descendent($child);
            }

........

eventually I'll add (once I have assigned the children to the parent successfully):
$tree->remove_Node($node);

        }
    }
    $out->write_tree($tree);
}

Quoting aaron.j.mackey at gsk.com:

> > foreach $child (@children){
> >          $parent=add_Descendent->$child;
> > }
>
> I think what you want is $parent->add_Descendent($child)
>
> -Aaron
>


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From jason.stajich at duke.edu  Tue May 30 16:30:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 30 May 2006 16:30:56 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149019912.447ca7085124e@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>

You need to special-case the root - it won't have an ancestor.  Just
protect the my $parent = $node->ancestor with an if statement, as I
did below.

On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:

> Hi
> OK that was silly, but what I have in my code is what you just wrote
> But the problem is that if I write
>
> $parent->add_Descendent($child)
>
> it tells me that I am calling  the method "add_Descendent" on an  
> undefined value
> (but I did define $parent before??)
>
> So here it goes the code so far:
>
> use Bio::TreeIO;
>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>                           -format => 'newick');
>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>                            -format => 'newick');
>  while( my $tree = $in->next_tree ) {
>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>     my $bootstrap=$node->_creation_id;
>
>     if ($bootstrap < 70 ){
>    >>> if(        my $parent = $node->ancestor ) {
>               my @children=$node->get_all_Descendents;
>               foreach my $child (@children){
>                  $parent->add_Descendent($child);
>               }
         }
>
> ........
>
> eventually I'll add (once I assigned the children to the parent  
> successfully):
> $tree->remove_Node($node);
>
>         }
>     }
>     $out->write_tree($tree);
> }
>
> Quoting aaron.j.mackey at gsk.com:
>
>>> foreach $child (@children){
>>>          $parent=add_Descendent->$child;
>>> }
>>
>> I think what you want is $parent->add_Descendent($child)
>>
>> -Aaron
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
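
For the archive, the whole collapse step can be sketched without BioPerl at
all. The toy below uses plain hashes in place of Bio::Tree::Node objects;
new_node, add_child, collapse_node, collapse_low_support and to_newick are
hypothetical helpers, not BioPerl methods. It assumes that only the direct
children of a collapsed node should move up to its parent. In BioPerl terms
that would correspond to each_Descendent rather than get_all_Descendents,
since the latter returns the whole subtree beneath the node.

```perl
use strict;
use warnings;

# Minimal hash-based node, standing in for a Bio::Tree::Node.
sub new_node {
    my ($id, $support) = @_;
    return { id => $id, support => $support, parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return;
}

# Collapse one internal node: hand its direct children to its parent,
# then detach it. Returns 0 for the root (it has no parent), 1 otherwise.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->{parent} or return 0;   # the root special case
    my @children = @{ $node->{children} };      # direct children only
    $node->{children} = [];
    add_child($parent, $_) for @children;
    @{ $parent->{children} } = grep { $_ != $node } @{ $parent->{children} };
    $node->{parent} = undef;
    return 1;
}

# Pre-order list of a node and everything beneath it.
sub all_nodes {
    my ($node) = @_;
    return ($node, map { all_nodes($_) } @{ $node->{children} });
}

# Collapse every internal node whose support is below the threshold.
sub collapse_low_support {
    my ($root, $threshold) = @_;
    # Take the snapshot of internal nodes before editing the tree.
    my @internal = grep { scalar @{ $_->{children} } > 0 } all_nodes($root);
    my $count = 0;
    for my $node (@internal) {
        next unless defined $node->{support} && $node->{support} < $threshold;
        $count += collapse_node($node);
    }
    return $count;
}

# Topology-only Newick string, enough to see the effect.
sub to_newick {
    my ($node) = @_;
    my @kids = @{ $node->{children} };
    return $node->{id} unless @kids;
    return '(' . join(',', map { to_newick($_) } @kids) . ')';
}

my $root = new_node('root');
my $n1   = new_node('n1', 60);   # weakly supported clade
my $n2   = new_node('n2', 95);   # well supported clade
add_child($root, $_) for ($n1, $n2);
add_child($n1, new_node($_)) for qw(A B);
add_child($n2, new_node($_)) for qw(C D);

print to_newick($root), "\n";    # ((A,B),(C,D))
collapse_low_support($root, 70);
print to_newick($root), "\n";    # ((C,D),A,B)
```

Two things to check against the real code: the list of internal nodes is
taken before any node is moved, so the loop does not walk a tree that is
changing underneath it; and the support value is read from a plain field
here, whereas the posted script reads it from _creation_id, which looks like
an internal counter rather than the bootstrap value.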

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Tue May 30 17:40:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 16:40:18 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447C9484.9030102@sfu.ca>
Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine>

Agreed, though I think these changes should be implemented at some point
(Conway's argument here makes sense, and it was nice of Torsten to check this
out).  If proper tests are written, then any changes resulting in errors
should be picked up by running the appropriate test suite, though I know
that doesn't absolutely guarantee it.  ;P

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 1:53 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> "returnundef"
> 
> Although I agree with the sentiment of following PBP, I'm not so sure
> changing 'return undef' to 'return' *now* will fix any bugs without
> introducing new, subtle ones.


From rvosa at sfu.ca  Tue May 30 17:58:25 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 14:58:25 -0700
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine>
References: <001901c68433$026b1ad0$15327e82@pyrimidine>
Message-ID: <447CC001.4050000@sfu.ca>

I've been following the perl6 mailing lists for a while now. I think 
this time around it won't really take that long (one year?) for 
pugs/perl6 stacks to become more than just toys. I think especially 
large projects, like bioperl, will really benefit from the improved OO 
implementation in perl6, so it might be of interest to at least 
fantasize about it.

Chris Fields wrote:
> Ha!  Or maybe the 'nonexistent' bioperl-experimental.  Wonder what'll
> happen once Perl6 comes to term?
>
> -CJF
>
>   
>> -----Original Message-----
>> From: Rutger Vos [mailto:rvosa at sfu.ca]
>> Sent: Tuesday, May 30, 2006 4:48 PM
>> To: Chris Fields
>> Subject: Re: [Bioperl-l] For CVS developers - potential
>> pitfallwith"returnundef"
>>
>> Surely this will all sort itself out in bioperl6 ;-)
>>

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Tue May 30 18:08:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 17:08:26 -0500
Subject: [Bioperl-l] For CVS developers -
	potentialpitfallwith"returnundef"
In-Reply-To: <447CC001.4050000@sfu.ca>
Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine>

Agreed.  I would say that in probably 6-12 months' time it might be a good
idea to try getting something actually started, maybe under the
'bioperl-experimental' title Jason has mentioned.  One could always try
getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up
from there, with emphasis on key areas (sequence parsing and so on).

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 4:58 PM
> To: bioperl list
> Subject: Re: [Bioperl-l] For CVS developers -
> potentialpitfallwith"returnundef"
> 
> I've been following the perl6 mailing lists for a while now. I think
> this time around it won't really take that long (one year?) for
> pugs/perl6 stacks to become more than just toys. I think especially
> large projects, like bioperl, will really benefit from the improved OO
> implementation in perl6, so it might be of interest to at least
> fantasize about it.


From ULNJUJERYDIX at spammotel.com  Tue May 30 23:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.
If anyone is interested, I have opted to change line 161 of arrow.pm
(Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it's
/usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from

      $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

      $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just for this one-off use.


Strangely, at line 112 of arrow.pm in bioperl ver 1.51 I found what looks
like a hidden option for a coords offset:
    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset    = 1 unless defined $relative_coords_offset;
but entering the option -relative_coords_offset=>1000 in the arrow glyphs
didn't do anything...
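
The one-line edit above amounts to subtracting a fixed offset from each tick
label. A small stand-alone sketch of that arithmetic follows; shift_label and
promoter_coord are hypothetical helper names, not part of Bio::Graphics. The
second helper is an assumption about what may be wanted if the promoter
numbering is meant to run from -1000 to +50 with no position 0, which plain
subtraction does not give:

```perl
use strict;
use warnings;

# Plain subtraction, as in the edited line: label 1000 becomes 0.
sub shift_label {
    my ($pos, $offset) = @_;
    return $pos - $offset;
}

# Variant with no position 0: positions 1..$upstream map to
# -$upstream..-1, and later positions map to +1, +2, ...
sub promoter_coord {
    my ($pos, $upstream) = @_;
    return $pos <= $upstream ? $pos - $upstream - 1 : $pos - $upstream;
}

printf "%4d -> %5d (plain shift) %5d (no zero)\n",
    $_, shift_label($_, 1000), promoter_coord($_, 1000)
    for (1, 1000, 1001, 1050);
```

With the plain shift, position 1 is labelled -999 and position 1000 is
labelled 0; with the no-zero variant the same positions come out as -1000
and -1. Which one is right depends on how the 1-1050 window was cut.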


> Hi!
> Oh, it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature-annotated seq, as I already have the bp
> where the TF binds (in 1-1050 numbering), so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyph; however, it is a
> multiply function. Can it be hacked to take in an offset value (i.e. minus
> the scale by 1000)?
>
> cheers
> kevin
>
>
> > Hi,
> >
> > For some reason I didn't see the first posting on this. In current bioperl
> > live, the ruler can have negative numberings - I use this routinely. You
> > need to create a feature that starts in negative coordinates. What is
> > happening to you when you try this?
> >
> > Lincoln
> >
> > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > Hi
> > > thanks for the help offered thus far!
> > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> > > bioperl. Therefore I was asked to make the numberings as such (-1000). Is
> > > there any way at all to do this in bioperl without changing the .pm file?
> > >
> > > thanks guys..
> > > kevin
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu


From sb at mrc-dunn.cam.ac.uk  Wed May 31 04:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be 
> what's going on.. I am using $blast_report->next_result in both 
> subroutines. It appears that analyzing the blast results first w/ my 
> sort subroutine empties (?) the $blast_result object so that when I try 
> to print, there is nothing left to print (and vice versa when I print
> first then try to sort).
> So, from the looks of things, using next_result has the effect of 
> popping the Bio::Search::Result::ResultI objects off of the SearchIO 
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying 
to read it all twice like this:
open(FILE, "file") or die "Cannot open file: $!";
while (<FILE>) {
     print; # prints each line in the file
}
while (<FILE>) {
     print; # never happens, we never enter this while loop
}

To get the second while loop to print anything we need to say seek(FILE, 
0, 0) before it. Or in the first while loop store each line in an array, 
and then make the second loop a foreach through that array.
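
Both options can be sketched as follows. This is only an illustration: it uses an in-memory filehandle so that it runs without an input file, and the sample data and variable names are made up.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for a real file.
my $data = "line 1\nline 2\nline 3\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

# First pass: consume the handle and keep a copy of every line.
my @lines;
while (my $line = <$fh>) {
    push @lines, $line;
}

# The handle is now at end-of-file, so a second read returns nothing.
my @second_read = <$fh>;

# Option 1: rewind the handle to the start, then read again.
seek($fh, 0, 0) or die "Cannot seek: $!";
my @after_seek = <$fh>;
close($fh);

# Option 2: work from the stored copy instead of the handle.
my $from_array = join '', @lines;

printf "second read: %d lines, after seek: %d lines, stored: %d lines\n",
    scalar @second_read, scalar @after_seek, scalar @lines;
```

Storing the lines costs memory but works for any handle; seek only works on handles that are actually seekable (not pipes, for example).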


> It seems I could get around this by making a copy of the blast report by 
> setting it to another new variable...(not the most elegant solution) but 
> I'm having trouble with this...
> 
> If I do:
> 
>     my $blast_report_copy = $blast_report;
> 
> I'm just copying the reference to the SearchIO blast result, so it 
> doesn't help me. How can I make another physical copy of this blast 
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object 
contains a filehandle. But for a simple object you might recursively 
loop through the data structure and copy each element out into a similar 
data structure.


> But better yet, the way to go is to 'reset the counter,' or to find a 
> way to look at/print/sort the results without removing data from the 
> blast result object. How is this done though??

It would be rather nice if this worked:
my $blast_report = $factory->blastall($ref_seq_objs);
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
     # $_ is a ResultI object, use as normal
}
seek($blast_fh, 0, 0); # this would be great, but does it work?
while (<$blast_fh>) {
     # go through the results again in your second subroutine
}

An alternative hacky way of doing it, which may also not work, would be 
to go through your $blast_report as normal, but then before going 
through it a second time, say
my $fh = $blast_report->_fh;
seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of 
its own for this) of solving the problem is, the first time you go 
through each next_result, next_hit and next_hsp, just store the returned 
objects in an array of arrays of arrays. Then the second time get the 
objects from your array structure instead of with the method calls.
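
A minimal sketch of that caching idea, assuming nothing about BioPerl internals. The OnePass class and its next_item method are hypothetical stand-ins for a one-pass iterator such as the blast report object; drain() is an illustrative helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical one-pass iterator: each call to next_item hands out the next
# element and returns undef once everything has been consumed.
package OnePass;
sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}
sub next_item {
    my $self = shift;
    return shift @{ $self->{items} };
}

package main;

# Collect everything an iterator method yields until it returns undef.
sub drain {
    my ($obj, $method) = @_;
    my @all;
    while (defined(my $item = $obj->$method)) {
        push @all, $item;
    }
    return @all;
}

my $report = OnePass->new(qw(result1 result2 result3));

# Go through the iterator exactly once and keep what it returned.
my @cached = drain($report, 'next_item');

# Both "subroutines" now work from the cached copy.
my @first_pass  = map  { "print:$_" } @cached;
my @second_pass = sort { $b cmp $a } @cached;

# The iterator itself is exhausted, as in the original problem.
my @leftover = drain($report, 'next_item');

printf "cached %d, leftover %d\n", scalar @cached, scalar @leftover;
```

With a real report the same helper could presumably be called with 'next_result', then 'next_hit' on each result and 'next_hsp' on each hit, to build the nested array structure described above.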


From heikki at sanbi.ac.za  Wed May 31 06:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l]
	=?iso-8859-1?q?For_CVS_developers_-_potential_pitfall?=
	=?iso-8859-1?q?with_=22returnundef=22?=
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more
likely that there is a well-hidden bug caused by accidentally assigning undef
into a one-element array than that someone intentionally wrote code that
expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation code 
and could not see any differences in the test output. 

Let's remove them!

	-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out).  If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it.  ; P
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > Sent: Tuesday, May 30, 2006 1:53 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > "returnundef"
> >
> > Although I agree with the sentiment of following PBP, I'm not so sure
> > changing 'return undef' to 'return' *now* will fix any bugs without
> > introducing new, subtle ones.
> >
> > Chris Fields wrote:
> > > Torsten,
> > >
> > > Any way you can post a list of some/all of the offending lines or
> >
> > modules?
> >
> > > Sounds like something to consider, but if the list is as large as you
> >
> > say we
> >
> > > may need something (bugzilla? wiki?) to track the changes and make
> > > sure they pass tests; I'm sure a large majority will.
> > >
> > > I'm guessing Jason would want this somewhere on the project priority
> >
> > list or
> >
> > > bugzilla, with a link to the actual list, but I'm not sure.  Maybe
> > > start
> >
> > a
> >
> > > page on the wiki for proposed code changes?
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > >> "returnundef"
> > >>
> > >> FYI Bioperl developers:
> > >>
> > >> I just audited the bioperl-live CVS and found about 450 occurrences of
> > >> "return undef".
> > >>
> > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > >> suggest:
> > >>
> > >> "Use return; instead of return undef; if you want to return nothing.
> > >> If someone assigns the return value to an array, the latter creates an
> > >> array of one value (undef), which evaluates to true. The former will
> > >> correctly handle all contexts."
> > >>
> > >> So I'm guessing at least some of these 450 occurrences *could* result
> >
> > in
> >
> > >> bugs and should probably be changed.
> > >>
> > >> Your opinion may differ :-)
> > >>
> > >> --
> > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > >>
> >
> > --
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Rutger Vos, PhD. candidate
> > Department of Biological Sciences
> > Simon Fraser University
> > 8888 University Drive
> > Burnaby, BC, V5A1S6
> > Phone: 604-291-5625
> > Fax: 604-291-3496
> > Personal site: http://www.sfu.ca/~rvosa
> > FAB* lab: http://www.sfu.ca/~fabstar
> > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Wed May 31 06:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <000f01c683f8$5771ed50$15327e82@pyrimidine>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
Message-ID: <200605311244.29187.heikki@sanbi.ac.za>


Chris,

Thanks for stepping in. I feel partly responsible here because I originally 
changed some of Rob's code but have not followed up since.

There has not been active development on these modules, so do not worry about
stepping on anyone's toes.

   -Heikki

On Tuesday 30 May 2006 16:50, Chris Fields wrote:
> Jason, Brian, et al,
>
> I found several major issues with Bio::Restriction::IO (this popped up
> while bug squashing).  In particular, the POD is pretty misleading.  It
> states (directly from perldoc):
>
> SYNOPSIS
>         use Bio::Restriction::IO;
>
>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                          -format => 'withrefm');
>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                          -format => 'bairoch');
>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>         $out->write($res);
>
>       # or
>
>       #    use Bio::Restriction::IO;
>       #
>       #    #input file format can be read from the file extension (dat|xml)
>       #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>       #
>       #    # World's shortest flat<->xml format converter:
>       #    print $out $_ while <$in>;
>
> So, I have found several problems with these modules.  I really hate to
> criticize code here, as my own is pretty hacky, but I think these are
> things to seriously mull over:
>
> 1)	Note that, though some of the lines above are commented they are
> still there in POD and thus present in perldoc/pod2html etc.  So, judging
> from the above, it suggests using the script above should read in from one
> format and write out to another (like SeqIO).  However, NONE of the current
> write() methods are implemented for any of the IO modules (withref, base,
> itype2, bairoch), so this does not happen as expected.  You get the nasty
> thrown 'method not implemented error' instead when writing.
> 2)	The commented statements in POD above also suggest that REBASE XML
> format is supported when there is no XML module.
> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
> made it unusable until I added a few small changes; it still can't handle
> multisite/multicut enzymes properly, so in essence it is useless until that
> is addressed.
> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
> up its own methods?
>
> I'm working on at least getting the 'bairoch' input format up and running
> (so at least it gets the enzymes into a
> Bio::Restriction::Enzyme::Collection).  From this point I'm not sure where
> to proceed.  The POD obviously needs to be corrected to reflect that
> writing formats is not implemented (and the bit about XML should be taken
> out completely); that's the easy part which I am working on and plan
> committing today.  However, these modules don't seem to be used too
> frequently so I'm not sure whether it's worth spending too much time
> getting these up to speed at the moment (adding write methods, switching to
> Bio::Root::Root, etc); I have other priorities at the moment (including a
> way overdue ListSummary). I'm also not sure who else is (using|working) on
> these so I don't want to (make too many changes|step on someone else's
> toes), but these are, IMHO, pretty serious problems.
>
> Any thoughts?
>
> Chris
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Wed May 31 09:10:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 08:10:00 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
	<200605311244.29187.heikki@sanbi.ac.za>
Message-ID: <C8B60E1D-D5A5-4661-AA2B-CEE1E5B5D758@uiuc.edu>

Heikki,

I mainly just changed a few things so no one would get the wrong
idea from the POD (that the modules write formats as well) and added a
few things to the TO DO.  I also added a warning to
Bio::Restriction::IO::bairoch for the multisite/multicut issue.
Besides that I haven't done much to them.  I also added a bit to the
Project Priority List in case someone wants to take it up.  I may
tinker with it but it's not really high on my priority list.  I've
been pretty busy getting the ListSummaries back up to speed (very
busy mailing lists since the last one) and am writing/testing a new
interface to NCBI EUtilities which I may donate at some point in the
next few months or so.

Chris


On May 31, 2006, at 5:44 AM, Heikki Lehvaslaiho wrote:

>
> Chris,
>
> Thanks for stepping in. I feel partly responsible here because I  
> originally
> changed some of Rob's code but have not followed up since.
>
> There have not been active development on these modules so do not  
> worry about
> stepping on anyone's toes.
>
>    -Heikki
>
> On Tuesday 30 May 2006 16:50, Chris Fields wrote:
>> Jason, Brian, et al,
>>
>> I found several major issues with Bio::Restriction::IO (this  
>> popped up
>> while bug squashing).  In particular, the POD is pretty  
>> misleading.  It
>> states (directly from perldoc):
>>
>> SYNOPSIS
>>         use Bio::Restriction::IO;
>>
>>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>>                                          -format => 'withrefm');
>>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>>                                          -format => 'bairoch');
>>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>>         $out->write($res);
>>
>>       # or
>>
>>       #    use Bio::Restriction::IO;
>>       #
>>       #    #input file format can be read from the file extension  
>> (dat|xml)
>>       #    $in  = Bio::Restriction::IO->newFh(-file =>  
>> "inputfilename");
>>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>>       #
>>       #    # World's shortest flat<->xml format converter:
>>       #    print $out $_ while <$in>;
>>
>> So, I have found several problems with these modules.  I really  
>> hate to
>> criticize code here, as my own is pretty hacky, but I think these are
>> things to seriously mull over:
>>
>> 1)	Note that, though some of the lines above are commented they are
>> still there in POD and thus present in perldoc/pod2html etc.  So,  
>> judging
>> from the above, it suggests using the script above should read in  
>> from one
>> format and write out to another (like SeqIO).  However, NONE of  
>> the current
>> write() methods are implemented for any of the IO modules  
>> (withref, base,
>> itype2, bairoch), so this does not happen as expected.  You get  
>> the nasty
>> thrown 'method not implemented error' instead when writing.
>> 2)	The commented statements in POD above also suggest that REBASE XML
>> format is supported when there is no XML module.
>> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
>> made it unusable until I added a few small changes; it still can't  
>> handle
>> multisite/multicut enzymes properly, so in essence it is useless  
>> until that
>> is addressed.
>> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
>> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO  
>> and make
>> up its own methods?
>>
>> I'm working on at least getting the 'bairoch' input format up and  
>> running
>> (so at least it gets the enzymes into a
>> Bio::Restriction::Enzyme::Collection).  From this point I'm not  
>> sure where
>> to proceed.  The POD obviously needs to be corrected to reflect that
>> writing formats is not implemented (and the bit about XML should  
>> be taken
>> out completely); that's the easy part which I am working on and plan
>> committing today.  However, these modules don't seem to be used too
>> frequently so I'm not sure whether it's worth spending too much time
>> getting these up to speed at the moment (adding write methods,  
>> switching to
>> Bio::Root::Root, etc); I have other priorities at the moment  
>> (including a
>> way overdue ListSummary). I'm also not sure who else is (using| 
>> working) on
>> these so I don't want to (make too many changes|step on someone  
>> else's
>> toes), but these are, IMHO, pretty serious problems.
>>
>> Any thoughts?
>>
>> Chris
>>
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of the Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Wed May 31 09:07:10 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 08:07:10 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

  TODO: check if the POD is in the Wiki yet, and if not, put it here? 

I used Pod::Simple::Wiki (format 'mediawiki') to burn the bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"? 

It's an amazing document. IMHO it should be listed prominently on bioperl.org.

HTH,

j


From osborne1 at optonline.net  Wed May 31 09:58:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 09:58:01 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447D94FE.8090305@jays.net>
Message-ID: <C0A31929.89F9%osborne1@optonline.net>

Jay,

Excellent! Now we need to answer a few more questions for ourselves:

- Do we remove the file bptutorial.pl from the package now? I'd say yes, we
don't want to have to maintain two bptutorials.

- What do we do with the script part of bptutorial.pl? It certainly could be
excised and put into the examples/ directory, for example, but this would
break a few of the paths that are being used.

- A link to bptutorial? Or a link to the existing tutorials page?
http://www.bioperl.org/wiki/Tutorials.

Any thoughts on these?


Brian O.


On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:

> http://www.bioperl.org/wiki/Bptutorial.pl
> 
> I think I just partially fulfilled this TODO:
> 
>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> 
> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the
> wiki page via my web browser. (Is that proper procedure? Is the plan to just
> do that manually from time to time as the document changes?)
> 
> Now what?
> 
> Should there be a new link on the far left of bioperl.org called "Tutorial"?
> 
> It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> 
> HTH,
> 
> j


From luciap at sas.upenn.edu  Wed May 31 10:06:13 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Wed, 31 May 2006 10:06:13 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
Message-ID: <1149084373.447da2d5c5339@128.91.55.38>

Hi,

Thanks. A couple more questions:
Why is the bootstrap value stored as the node id? Is that right?

Also, in the add_Descendent method, how do you set the $ignoreoverwrite
parameter to true?

Lucia

Quoting Jason Stajich <jason.stajich at duke.edu>:

> you need to special case the root - it won't have an ancestor.  just
> protect the my $parent = $node->ancestor with an if statement as I
> did below
>
> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>
> > Hi
> > OK that was silly, but what I have in my code is what you just wrote
> > But the problem is that if I write
> >
> > $parent->add_Descendent($child)
> >
> > it tells me that I am calling the method "add_Descendent" on an
> > undefined value
> > (but I did define $parent before??)
> >
> > So here it goes the code so far:
> >
> > use Bio::TreeIO;
> >  my $in = new Bio::TreeIO(-file => 'Test2.tre',
> >                           -format => 'newick');
> >  my $out = new Bio::TreeIO(-file => '>mytree.out',
> >                            -format => 'newick');
> >  while( my $tree = $in->next_tree ) {
> >     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
> >     my $bootstrap=$node->_creation_id;
> >
> >     if ($bootstrap < 70 ){
> >    >>> if(        my $parent = $node->ancestor ) {
> >               my @children=$node->get_all_Descendents;
> >               foreach my $child (@children){
> >                  $parent->add_Descendent($child);
> >               }
>          }
> >
> > ........
> >
> > eventually I'll add (once I assigned the children to the parent
> > successfully):
> > $tree->remove_Node($node);
> >
> >         }
> >     }
> >     $out->write_tree($tree);
> > }
> >
> > Quoting aaron.j.mackey at gsk.com:
> >
> >>> foreach $child (@children){
> >>>          $parent=add_Descendent->$child;
> >>> }
> >>
> >> I think what you want is $parent->add_Descendent($child)
> >>
> >> -Aaron
> >>
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
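
The root special case from the quoted advice can be sketched without BioPerl. Everything below is hypothetical: plain hashes stand in for Bio::Tree::Node objects, and make_node, add_child and collapse_node are illustrative helpers, not BioPerl methods.

```perl
use strict;
use warnings;

# Each mock node has a name, a support value, a parent link and a child list.
sub make_node {
    my ($name, $support) = @_;
    return { name => $name, support => $support, parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return;
}

# Collapse one internal node by handing its direct children to its parent.
# Returns 0 and changes nothing for the root, which has no parent.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->{parent};
    return 0 unless defined $parent;    # the special case for the root

    # Detach the node from its parent, then reattach its children there.
    @{ $parent->{children} } = grep { $_ != $node } @{ $parent->{children} };
    add_child($parent, $_) for @{ $node->{children} };
    $node->{children} = [];
    return 1;
}

# Build root -> (weak -> (A, B), C), where "weak" has support below 70.
my $root = make_node('root', 100);
my $weak = make_node('weak', 45);
my ($leaf_a, $leaf_b, $leaf_c) = map { make_node($_, undef) } qw(A B C);
add_child($root, $weak);
add_child($root, $leaf_c);
add_child($weak, $leaf_a);
add_child($weak, $leaf_b);

my $collapsed_weak = collapse_node($weak);   # succeeds
my $collapsed_root = collapse_node($root);   # skipped: no parent
my $root_children  = join ' ', map { $_->{name} } @{ $root->{children} };

print "root children: $root_children\n";
```

Note that the sketch moves only direct children. In BioPerl, each_Descendent returns the direct children of a node while get_all_Descendents recurses through the whole subtree, so the former is probably the closer match here.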


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From sb at mrc-dunn.cam.ac.uk  Wed May 31 10:56:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 15:56:49 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>

Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well-hidden bug caused by accidentally assigning undef
> into a one-element array than that someone intentionally wrote code that
> expects that behaviour!
> 
> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
> and could not see any differences in the test output. 
> 
> Let's remove them!

Just looking for all return undef;s isn't enough. It's entirely possible 
to do something like:

my $return_value;
{
   # do something that assigns to return_value on success
   # on failure, just do nothing
}
return $return_value;

The bioperl docs will typically explicitly state that undef is returned, 
and under what circumstance. If a user suffers from the 
undef-into-array-problem, yes it can be slightly unexpected, but lots of 
unexpected things will happen when you don't use a method correctly, as 
per the docs!

Fixing the return of undef is either a job that shouldn't be done, or a 
much harder job than expected.
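
To make the three cases concrete, here is a small sketch; the subroutine names are made up for illustration and none of this is BioPerl code.

```perl
use strict;
use warnings;

# Explicit undef: a one-element list (undef) when called in list context.
sub explicit_undef { return undef; }

# Bare return: an empty list in list context, undef in scalar context.
sub bare_return { return; }

# The pattern above: no literal "return undef" appears anywhere, but the
# variable is still undef on the failure path.
sub falls_through {
    my $return_value;
    return $return_value;
}

my @from_undef = explicit_undef();
my @from_bare  = bare_return();
my @from_fall  = falls_through();
my $scalar     = bare_return();

# An array holding a single undef is still true in boolean context.
my $undef_is_true = @from_undef ? 1 : 0;
my $bare_is_true  = @from_bare  ? 1 : 0;

printf "explicit undef: %d element(s), bare return: %d, fall-through: %d\n",
    scalar @from_undef, scalar @from_bare, scalar @from_fall;
```

The fall-through case behaves exactly like the explicit undef, which is why searching the source for "return undef" alone would miss it.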


From bernd.web at gmail.com  Wed May 31 10:30:30 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 31 May 2006 16:30:30 +0200
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <447D94FE.8090305@jays.net> <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com>

Hi,

I am not sure to what extent bptutorial will be removed, but
I actually like having bptutorial.pl in my BioPerl base for reference.

regards,
Bernd

On 5/31/06, Brian Osborne <osborne1 at optonline.net> wrote:
> Jay,
>
> Excellent! Now we need to answer a few more questions for ourselves:
>
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.
>
> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.
>
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
>
> Any thoughts on these?
>
>
> Brian O.
>
>
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it the
> > wiki page via my web browser. (Is that proper procedure? Is the plan to just
> > do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on bioperl.org.
> >
> > HTH,
> >
> > j


From lstein at cshl.edu  Wed May 31 12:03:13 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:03:13 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is 
documented to return a single scalar, then returning undef is appropriate. If 
the subroutine is documented to return "false" on failure, then one must call 
return (or "return ()" ).

Changing all the return undefs to return is going to expose hidden bugs in the 
code written by people who are using BioPerl. While I agree wholeheartedly 
with the proposed audit, I think we need to expect that people are going to 
complain.
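
One way the change could bite callers, sketched with hypothetical subroutine names: code that calls a single-value method inside a list of key/value pairs relies on getting exactly one value back.

```perl
use strict;
use warnings;

# Two versions of a lookup that is documented to return a single scalar.
sub lookup_undef { return undef; }   # always yields exactly one value
sub lookup_bare  { return; }         # yields no value at all in list context

# Calls inside a list are evaluated in list context.
my @with_undef = (name => lookup_undef(), size => 10);
my @with_bare  = (name => lookup_bare(),  size => 10);

# With return undef the pairs stay aligned: name, undef, size, 10.
# With a bare return the list collapses to: name, size, 10, so 'size'
# would end up as the value of 'name' if this list were used as a hash.
printf "with undef: %d elements, with bare return: %d elements\n",
    scalar @with_undef, scalar @with_bare;
```

Callers written like this work today and would silently misbehave after the change, which is the kind of complaint to expect.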

Lincoln


On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well-hidden bug caused by accidentally assigning
> undef into a one-element array than that someone intentionally wrote code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > Agreed, though I think these changes should be implemented at some point
> > (Conway's argument here makes sense and it is nice for Torsten to check
> > this out).  If proper tests are written then any changes resulting in
> > errors should be picked up by checking the appropriate test suite, though
> > I know it doesn't absolutely guarantee it.  ; P
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > "returnundef"
> > >
> > > Although I agree with the sentiment of following PBP, I'm not so sure
> > > changing 'return undef' to 'return' *now* will fix any bugs without
> > > introducing new, subtle ones.
> > >
> > > Chris Fields wrote:
> > > > Torsten,
> > > >
> > > > Any way you can post a list of some/all of the offending lines or
> > >
> > > modules?
> > >
> > > > Sounds like something to consider, but if the list is as large as you
> > >
> > > say we
> > >
> > > > may need something (bugzilla? wiki?) to track the changes and make
> > > > sure they pass tests; I'm sure a large majority will.
> > > >
> > > > I'm guessing Jason would want this somewhere on the project priority
> > >
> > > list or
> > >
> > > > bugzilla, with a link to the actual list, but I'm not sure.  Maybe
> > > > start
> > >
> > > a
> > >
> > > > page on the wiki for proposed code changes?
> > > >
> > > > Chris
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > >> To: bioperl-l at lists.open-bio.org
> > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > >> "returnundef"
> > > >>
> > > >> FYI Bioperl developers:
> > > >>
> > > >> I just audited the bioperl-live CVS and found about 450 occurrences
> > > >> of "return undef".
> > > >>
> > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > >> suggest:
> > > >>
> > > >> "Use return; instead of return undef; if you want to return nothing.
> > > >> If someone assigns the return value to an array, the latter creates
> > > >> an array of one value (undef), which evaluates to true. The former
> > > >> will correctly handle all contexts."
> > > >>
> > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > >> result in bugs and should probably be changed.
> > > >>
> > > >> Your opinion may differ :-)
> > > >>
> > > >> --
> > > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > > >>
> > > >> _______________________________________________
> > > >> Bioperl-l mailing list
> > > >> Bioperl-l at lists.open-bio.org
> > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Rutger Vos, PhD. candidate
> > > Department of Biological Sciences
> > > Simon Fraser University
> > > 8888 University Drive
> > > Burnaby, BC, V5A1S6
> > > Phone: 604-291-5625
> > > Fax: 604-291-3496
> > > Personal site: http://www.sfu.ca/~rvosa
> > > FAB* lab: http://www.sfu.ca/~fabstar
> > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed May 31 12:34:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:34:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine>

Brian, Jay,

I think it would be nice to have the tutorial prominently displayed somehow
(Jay's suggestion), with a link provided via the tutorials page.  Hopefully
this will help with the bioperl newbies.

Jay, looks like there are still some weird formatting issues with the
bptutorial wiki page, something which I ran into before when getting the
Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
spaces preceding a line denotes code for some reason).  Not much you can do
in these cases except remove the extra spaces in those spots.  Looking good
though!  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Wednesday, May 31, 2006 8:58 AM
> To: Jay Hannah; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Jay,
> 
> Excellent! Now we need to answer a few more questions for ourselves:
> 
> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
> we don't want to have to maintain two bptutorials.
>
> - What do we do with the script part of bptutorial.pl? It certainly could
> be excised and put into the examples/ directory, for example, but this
> would break a few of the paths that are being used.
> 
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
> 
> Any thoughts on these?
> 
> 
> Brian O.
> 
> 
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> 
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> > into the wiki page via my web browser. (Is that proper procedure? Is the
> > plan to just do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called
> > "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on
> > bioperl.org.
> >
> > HTH,
> >
> > j
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed May 31 12:44:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:44:31 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine>

My feeling is that the test suite 'should' pick up a large majority of
problems if changes are made to these lines; the quotes reflect the utopian
idea that the tests are all written well (I believe 99% of the tests are,
BTW).  You can always try the changes (wholesale or on smaller chunks of
code), see if they pass tests on different OSes using 'make/nmake test',
revert the ones that didn't pass, etc.  It's a matter of someone being
willing to try it out.

I think the original argument proposed here (originating from Damian Conway
and 'Perl Best Practices') is that using 'return undef' is something we maybe
shouldn't be doing, since it can itself lead to subtle errors.  Not that
everything we do is considered 'a good practice' by any means.  If I
remember correctly from 'OOPerl', Conway doesn't like combined get/setters
either (he prefers separate getters and setters); we use the 'bad' combined
version predominantly in Bioperl.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 11:03 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> I'm afraid that everything depends on the context. If the subroutine is
> documented to return a single scalar, then returning undef is appropriate.
> If
> the subroutine is documented to return "false" on failure, then one must
> call
> return (or "return ()" ).
> 
> Changing all the return undefs to return is going to expose hidden bugs in
> the
> code written by people who are using BioPerl. While I agree wholeheartedly
> with the proposed audit, I think we need to expect that people are going
> to
> complain.
> 
> Lincoln
> 


From hlapp at gmx.net  Wed May 31 10:59:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 10:59:53 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net>

I agree. Thanks to Torsten for the audit and Chris for stepping up.

	-hilmar

On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote:

> In my opinion the sooner the bugs get exposed the better. It is  
> much more
> likely that there is a well hidden bug caused by assigning  
> accidentally undef
> into an one element array that someone intentionally writing code that
> expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old  
> Bio::Variation code
> and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 14:08:43 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:08:43 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
	<200605311203.13922.lstein@cshl.edu>
Message-ID: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>


On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:

> If the subroutine is documented to return "false" on failure, then  
> one must call
> return (or "return ()" ).

The problem seems to be that 'a value that evaluates to either true
or false', 'a [meaningful] value or undef', and 'a value or
false' ('a value or no value') are not the same in perl. And what
would/should one expect if the doc states 'true on success and false
otherwise'?

Maybe the documentation should also be fixed to avoid any ambiguity.  
I.e., avoid documenting 'a value or false' because it may be  
ambiguous (not only) to the less proficient. 'True or false' should  
imply a value being returned.

Comments?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lstein at cshl.edu  Wed May 31 14:14:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:14:59 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
Message-ID: <200605311415.00414.lstein@cshl.edu>

If the documentation says "returns false" then I expect to be able to do this:

	@result = foo();
	die "foo() failed" unless @result;

If the documentation says "returns undef" then I expect this:

	@result = foo();
	die "foo() failed" unless $result[0];
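
A small sketch of why the two idioms above are not interchangeable. The subs fails_with_false() and fails_with_undef() are hypothetical, one per documented convention:

```perl
use strict;
use warnings;

# Hypothetical failing subroutines, one per documented convention.
sub fails_with_false { return }         # "returns false": empty list
sub fails_with_undef { return undef }   # "returns undef": one-element list

my @r1 = fails_with_false();
my @r2 = fails_with_undef();

print "false convention: ", scalar(@r1), " element(s)\n";
print "undef convention: ", scalar(@r2), " element(s)\n";

# 'unless @result' only detects the empty list. The array holding a single
# undef is true in boolean context, so that failure slips through.
print "array test misses the undef convention\n" if @r2;

# 'unless $result[0]' detects both failures in this example.
print "element test catches the undef convention\n" unless $r2[0];
print "element test catches the false convention\n" unless $r1[0];
```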

Lincoln


On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > If the subroutine is documented to return "false" on failure, then
> > one must call
> > return (or "return ()" ).
>
> The problem seems to be that 'a value that evaluates to either true
> or false' and 'a [meaningful] value or undef' and 'a value or
> false' ('a value or no value) are not the same in perl. And what
> would/should one expect if the doc states 'true on success and false
> otherwise'?
>
> Maybe the documentation should also be fixed to avoid any ambiguity.
> I.e., avoid documenting 'a value or false' because it may be
> ambiguous (not only) to the less proficient. 'True or false' should
> imply a value being returned.
>
> Comments?
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 14:31:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:31:21 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
	<200605311415.00414.lstein@cshl.edu>
Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net>


On May 31, 2006, at 2:14 PM, Lincoln Stein wrote:

> If the documentation says "returns false" then I expect to be able  
> to do this:
>
> 	@result = foo();
> 	die "foo() failed" unless @result;

Except if the alternative to 'false' would be a scalar, you normally  
wouldn't assign it to an array, would you?

I.e., I wouldn't expect this strict of a behavior from an open-source
package written largely by people whose job is biological science,
not programming perl knowing and following DC to the letter ... I'd
rather be on the safe side and assign to a scalar.
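
As a sketch of why assigning to a scalar is the safe side: both failure styles give the caller undef there. The subs bare_return() and undef_return() are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical subs standing in for the two failure styles.
sub bare_return  { return }
sub undef_return { return undef }

# Scalar assignment: both styles yield undef.
my $s1 = bare_return();
my $s2 = undef_return();

# List assignment to a single scalar also yields undef either way.
my ($l1) = bare_return();
my ($l2) = undef_return();

for my $v ($s1, $s2, $l1, $l2) {
    print defined $v ? "defined\n" : "undef\n";
}
```

Only callers that capture the result in an array or splice it into a larger list can tell the two styles apart.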

Just my $0.02 ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 14:50:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 13:50:30 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, May 31, 2006 9:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much
> more
> > likely that there is a well hidden bug caused by assigning accidentally
> undef
> > into an one element array that someone intentionally writing code that
> > expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> code
> > and could not see any differences in the test output.
> >
> > Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Agreed, though looking for these is obviously much harder.  

The way to get around those is:

# test definedness so that 0 and '' are still returned
return $return_value if defined $return_value;
return;

which I've seen used in a number of get/set methods. 
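
A minimal sketch of that pattern in a combined get/setter. The package My::Feature and its score() method are hypothetical, and the definedness test is an assumption made here so that legitimate false values such as 0 are not swallowed:

```perl
use strict;
use warnings;

# Hypothetical class with a combined get/setter, for illustration only.
package My::Feature;

sub new { return bless {}, shift }

sub score {
    my ($self, @args) = @_;
    $self->{score} = $args[0] if @args;
    # Test definedness rather than truth so that 0 and '' survive.
    return $self->{score} if defined $self->{score};
    return;   # empty list in list context, undef in scalar context
}

package main;

my $f = My::Feature->new;
my @unset = $f->score;      # nothing set yet: empty list
$f->score(0);
my @zero  = $f->score;      # a legitimate false value: one element

print scalar(@unset), " ", scalar(@zero), "\n";
```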

> The bioperl docs will typically explicitly state that undef is returned,
> and under what circumstance. If a user suffers from the
> undef-into-array-problem, yes it can be slightly unexpected, but lots of
> unexpected things will happen when you don't use a method correctly, as
> per the docs!

Right, but the argument you make is that code will always work as expected
from the perldoc examples.  My recent experiences with the
Bio::Restriction::IO and Bio::Species classes show that the docs are not
always up-to-date and may indicate the unimplemented intent of the author
more than the actual implementation.  Again, I believe a large majority of
the docs are fine, but it's those few errors that made a devil's advocate of
me...

> Fixing the return of undef is either a job that shouldn't be done, or a
> much harder job than expected.

I don't think ignoring the problem is the best answer here, though I agree
the problem is more complicated than at first glance.  Judging from code
I've trawled through a bit lately, I've seen a lot of methods (mainly
get/setters) that are essentially copied multiple times in the same or
across similar modules to save time.  You could see a scenario where, in
those instances, so-called 'bad code' would spread quite quickly.

I think adding a wiki page to address some of these issues would be nice,
something separate from the Project Priority List.

Chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From forward at hongyu.org  Wed May 31 14:03:46 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Wed, 31 May 2006 11:03:46 -0700
Subject: [Bioperl-l] New functions for SimpleAlign.pm
Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org>

Greetings,

I am a new member in this mailing list. Nice to be here.

I wrote two more functions for the alignment module SimpleAlign.pm
that calculate the percentage of identity based on the shortest and
longest sequence length, respectively. I also found an error in the
no_residues() function, which calculates the number of residues in the
alignment.

I am wondering whether they can be added to the official bioperl  
package. I've contacted the original author of this module, Heikki  
Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.

Thanks.

-- 
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.


From cjfields at uiuc.edu  Wed May 31 15:39:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 14:39:26 -0500
Subject: [Bioperl-l] New functions for SimpleAlign.pm
In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org>
Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine>

I added a bit to the FAQ about this:

http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F

and the HOWTO explains things a bit more directly:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

In brief, these need to be submitted to Bugzilla as either code enhancements
(for your added methods) or bugs with the patch to the relevant code.  Code
enhancements probably should include some code and test cases to demonstrate
usage.  Patches to buggy code are checked to make sure they pass relevant
tests by the core developers.  Submitting it to the mail list is definitely
the first step, though, so you're on the right path.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hongyu Zhang
> Sent: Wednesday, May 31, 2006 1:04 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] New functions for SimpleAlign.pm
> 
> Greetings,
> 
> I am a new member in this mailing list. Nice to be here.
> 
> I wrote two more functions for the alignment module SimpleAlign.pm
> that calculate the percentage of identity based on the shortest and
> longest sequence length, respectively. I also found an error in the
> no_residues() function that calculate the number of residues in the
> alignment.
> 
> I am wondering whether they can be added to the official bioperl
> package. I've contacted the original author of this module, Heikki
> Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.
> 
> Thanks.
> 
> --
> Hongyu Zhang, Ph.D.
> Computational biologist
> Ceres Inc.
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed May 31 16:40:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 15:40:19 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine>

What about modules that have 'throw_not_implemented' statements present?
Here's a list with the total for each.  Some of these are interfaces (I
dropped names ending in 'I' or 'IO' to remove the I/IO interfaces, but that
misses a few).  A number of these are implementations, though
(Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically
incomplete:

Instances: 1	Module : Bio::AlignIO::maf
Instances: 25	Module : Bio::Assembly::Contig
Instances: 2	Module : Bio::Assembly::ContigAnalysis
Instances: 2	Module : Bio::Biblio::BiblioBase
Instances: 4	Module : Bio::DB::Expression
Instances: 2	Module : Bio::DB::Expression::geo
Instances: 5	Module : Bio::DB::Flat
Instances: 2	Module : Bio::DB::Query::WebQuery
Instances: 17	Module : Bio::DB::SeqFeature::Store
Instances: 2	Module : Bio::DB::SeqVersion
Instances: 3	Module : Bio::DB::Taxonomy
Instances: 1	Module : Bio::FeatureIO::bed
Instances: 1	Module : Bio::Map::Marker
Instances: 1	Module : Bio::MapIO::fpc
Instances: 1	Module : Bio::MapIO::mapmaker
Instances: 1	Module : Bio::Restriction::IO::bairoch
Instances: 1	Module : Bio::Restriction::IO::itype2
Instances: 1	Module : Bio::Restriction::IO::withrefm
Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances: 3	Module : Bio::Tools::Run::WrapperBase

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 1:15 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfall with "return undef"
> 
> If the documentation says "returns false" then I expect to be able to do
> this:
> 
> 	@result = foo();
> 	die "foo() failed" unless @result;
> 
> If the documentation says "returns undef" then I expect this:
> 
> 	@result = foo();
> 	die "foo() failed" unless $result[0];
> 
> Lincoln
> 
> 
> On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > If the subroutine is documented to return "false" on failure, then
> > > one must call
> > > return (or "return ()" ).
> >
> > The problem seems to be that 'a value that evaluates to either true
> > or false' and 'a [meaningful] value or undef' and 'a value or
> > false' ('a value or no value) are not the same in perl. And what
> > would/should one expect if the doc states 'true on success and false
> > otherwise'?
> >
> > Maybe the documentation should also be fixed to avoid any ambiguity.
> > I.e., avoid documenting 'a value or false' because it may be
> > ambiguous (not only) to the less proficient. 'True or false' should
> > imply a value being returned.
> >
> > Comments?
> >
> > 	-hilmar
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lstein at cshl.edu  Wed May 31 17:07:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 17:07:06 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <200605311707.08196.lstein@cshl.edu>


> Instances: 17	Module : Bio::DB::SeqFeature::Store

This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual 
base class. The throw_not_implemented() calls are there to force developers 
to override the needed interface methods.
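
The pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the actual Bio::DB::SeqFeature::Store code: the package names My::Store and My::Store::Memory, the fetch() method, and the local throw_not_implemented() helper (a stand-in for the one inherited from Bio::Root) are all assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical virtual base class; NOT the real Bio::DB::SeqFeature::Store.
package My::Store;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Stand-in for the throw_not_implemented() helper that BioPerl classes inherit.
sub throw_not_implemented {
    my $self   = shift;
    my $method = (caller(1))[3];    # fully qualified name of the calling sub
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface method that every concrete subclass is expected to override.
sub fetch { shift->throw_not_implemented }

# Hypothetical concrete subclass that overrides the interface method.
package My::Store::Memory;
our @ISA = ('My::Store');

sub fetch {
    my ($self, $id) = @_;
    return $self->{data}{$id};
}

package main;

my $store = My::Store::Memory->new(data => { 42 => 'feature 42' });
my $found = $store->fetch(42);
print "subclass: $found\n";

# Calling the method on the base class itself raises the exception.
my $ok  = eval { My::Store->new->fetch(42); 1 };
my $err = $ok ? '' : $@;
print "base class: $err";
```

The point of the design is that a forgotten override fails loudly at the first call instead of silently returning nothing.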

If this is not the right way to do it, let me know and I'll fix it.

Lincoln


> Instances: 2	Module : Bio::DB::SeqVersion
> Instances: 3	Module : Bio::DB::Taxonomy
> Instances: 1	Module : Bio::FeatureIO::bed
> Instances: 1	Module : Bio::Map::Marker
> Instances: 1	Module : Bio::MapIO::fpc
> Instances: 1	Module : Bio::MapIO::mapmaker
> Instances: 1	Module : Bio::Restriction::IO::bairoch
> Instances: 1	Module : Bio::Restriction::IO::itype2
> Instances: 1	Module : Bio::Restriction::IO::withrefm
> Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
> Instances: 3	Module : Bio::Tools::Run::WrapperBase
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> > Sent: Wednesday, May 31, 2006 1:15 PM
> > To: Hilmar Lapp
> > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> > Subject: Re: [Bioperl-l] For CVS developers - potential
> > pitfall with "return undef"
> >
> > If the documentation says "returns false" then I expect to be able to do
> > this:
> >
> > 	@result = foo();
> > 	die "foo() failed" unless @result;
> >
> > If the documentation says "returns undef" then I expect this:
> >
> > 	@result = foo();
> > 	die "foo() failed" unless $result[0];
> >
> > Lincoln
> >
> > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > > If the subroutine is documented to return "false" on failure, then
> > > > one must call
> > > > return (or "return ()" ).
> > >
> > > The problem seems to be that 'a value that evaluates to either true
> > > or false' and 'a [meaningful] value or undef' and 'a value or
> > > false' ('a value or no value) are not the same in perl. And what
> > > would/should one expect if the doc states 'true on success and false
> > > otherwise'?
> > >
> > > Maybe the documentation should also be fixed to avoid any ambiguity.
> > > I.e., avoid documenting 'a value or false' because it may be
> > > ambiguous (not only) to the less proficient. 'True or false' should
> > > imply a value being returned.
> > >
> > > Comments?
> > >
> > > 	-hilmar
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 17:21:57 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:21:57 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>


On May 31, 2006, at 4:40 PM, Chris Fields wrote:

> What about modules that have 'throw_not_implemented' statements  
> present?

Those are often if not always legitimate - the problem is those that
don't have them but fail to override an inherited interface or
abstract method.

If something is not implemented, what better way is there to express
this than throwing an exception? (And if it's not an interface
or abstract base class, say so in the documentation.)

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 17:25:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:25:48 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net>


On May 31, 2006, at 2:50 PM, Chris Fields wrote:

> I've seen a lot of methods (mainly get/setters)
> that are essentially copied multiple times in the same or across  
> similar
> modules to save time.  You could see a scenario where, in those  
> instances,
> so-called 'bad code' would spread quite quickly.

This will usually be code generated by macros, e.g. the emacs macros  
for getter/setter generation for properties.

If the macro generates wrong code, that's indeed pretty bad. (We've  
had that.) OTOH it should be spotted quickly as well. And macro  
changes or new macros should probably be scrutinized by all eyes  
watching ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 17:40:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 16:40:22 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>
Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine>

I think, as long as it's reflected in the docs that something doesn't work
(hasn't been implemented) then there's no problem.  It's when the docs are
misleading that we run into problems.  

The sticking point lies with some classes, such as the IO classes (SeqIO or
Restriction::IO, with read and write methods), where the IO base class
suggests that a particular format can be both read and written, but what
actually works depends on whether the derived class overrides the base or
interface method (in other words, 'doesn't work as advertised' only in
specific circumstances).  I don't know how to solve this issue except to
state in the docs which formats don't implement write() methods.
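
One way to find such formats programmatically can be sketched as follows. The catch is that can('write') is true for every plugin, because the base class always defines a write() stub; comparing the resolved code reference against the base class's stub shows whether a plugin really overrides it. The packages My::IO and My::IO::readonly and the implements() helper are hypothetical stand-ins, not BioPerl classes.

```perl
use strict;
use warnings;

package My::IO;    # hypothetical base class with "not implemented" stubs

sub new   { my $class = shift; return bless {}, $class }
sub read  { my $self = shift; die ref($self) . " does not implement read()\n" }
sub write { my $self = shift; die ref($self) . " does not implement write()\n" }

package My::IO::readonly;    # hypothetical format plugin: reading only
our @ISA = ('My::IO');

sub read { return 'parsed record' }

package main;

# Returns 1 if the object's class overrides $method instead of inheriting
# the stub from My::IO, and 0 otherwise.
sub implements {
    my ($obj, $method) = @_;
    my $own  = $obj->can($method);
    my $base = My::IO->can($method);
    return ($own && $own != $base) ? 1 : 0;
}

my $io = My::IO::readonly->new;
for my $method (qw(read write)) {
    printf "%-5s  can: %-3s  implemented: %s\n", $method,
        ($io->can($method)        ? 'yes' : 'no'),
        (implements($io, $method) ? 'yes' : 'no');
}
```

A check like this could drive a generated "supported formats" table in the docs, though that is only a suggestion and not something the modules do today.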

Personally, I haven't had an issue with it and it probably makes no
difference, but I think it needs to be pointed out.  The most extreme I ran
into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that
didn't implement the write() method but left this in the synopsis in POD:

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

  # or

  #    use Bio::Restriction::IO;
  #
  #    #input file format can be read from the file extension (dat|xml)
  #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
  #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
  #
  #    # World's shortest flat<->xml format converter:
  #    print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO
classes!  Bio::AlignIO has a few such cases as well (the maf and Stockholm
formats don't write).

Chris


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, May 31, 2006 4:22 PM
> To: Chris Fields
> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> 
> > What about modules that have 'throw_not_implemented' statements
> > present?
> 
> Those are often if not always legitimate - the problem are those that
> don't have them but fail to override an inherited interface or
> abstract method.
> 
> If something is not implemented what is the better way to express
> this other than throwing an exception? (and if it's not an interface
> or abstract base class, saying so in the documentation)
> 
> 	-hilmar
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From hlapp at gmx.net  Wed May 31 17:55:37 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:55:37 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine>
References: <002401c684fa$d28e7640$15327e82@pyrimidine>
Message-ID: <CB29173C-0BFC-43CA-A620-519084AFEE04@gmx.net>

This is documentation cruft resulting from copy&paste w/o later  
fixing it. (which isn't a justification)

Note that not implementing the write is as legitimate as not  
implementing the read method ... It should be pointed out in the  
documentation though that it will depend on the actual implementation  
of the format whether it supports reading or writing or both.

	-hilmar

On May 31, 2006, at 5:40 PM, Chris Fields wrote:

> I think, as long as it's reflected in the docs that something  
> doesn't work
> (hasn't been implemented) then there's no problem.  It's when the  
> docs are
> misleading that we run into problems.
>
> The sticking point lies with some classes, such as IO classes (like  
> SeqIO,
> or Restrict::IO, with read and write methods) where the IO base class
> specifies that it is possible to read and write a particular format  
> but the
> actual implementation varies according to whether or not the  
> derived class
> overrides the base or interface method (in other words, 'doesn't  
> work as
> advertised' only in specific circumstances).  I don't know how to  
> solve this
> issue except to add in the docs that specific formats don't implement
> write() methods.
>
> Personally, I haven't had an issue with it and it probably makes no
> difference, but I think it needs to be pointed out.  The most  
> extreme I ran
> into was Bio::Restriction::IO, which had 3 out of 4 plugin modules  
> that
> didn't implement the write() method but left this in the synopsis  
> in POD:
>
>     use Bio::Restriction::IO;
>
>     $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                      -format => 'withrefm');
>     $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                      -format => 'bairoch');
>     my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>     $out->write($res);
>
>   # or
>
>   #    use Bio::Restriction::IO;
>   #
>   #    #input file format can be read from the file extension (dat| 
> xml)
>   #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>   #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>   #
>   #    # World's shortest flat<->xml format converter:
>   #    print $out $_ while <$in>;
>
> None of this code works; in fact, no XML parser even exists for  
> these IO
> classes!  Bio::AlignIO also has a few as well (maf and Stockholm  
> formats
> don't write).
>
> Chris
>
>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Wednesday, May 31, 2006 4:22 PM
>> To: Chris Fields
>> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki  
>> Lehvaslaiho'
>> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>>
>>
>> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
>>
>>> What about modules that have 'throw_not_implemented' statements
>>> present?
>>
>> Those are often if not always legitimate - the problem are those that
>> don't have them but fail to override an inherited interface or
>> abstract method.
>>
>> If something is not implemented what is the better way to express
>> this other than throwing an exception? (and if it's not an interface
>> or abstract base class, saying so in the documentation)
>>
>> 	-hilmar
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From slenk at emich.edu  Wed May 31 17:52:13 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed, 31 May 2006 17:52:13 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
Message-ID: <100682f110067a83.10067a83100682f1@emich.edu>


Isn't it fairly standard in OO schemes/languages to have an exception thrown
if a method can't be found at the end of a search up the class hierarchy? I
recall being very mad at Smalltalk because "method not found" kept biting me.
C++ has pure virtual base classes that do not allow objects to be
instantiated directly; they are meant to be inherited and then implemented.

Perl 6 was mentioned a bit back. Is this issue addressed there? Should it be?
Do the Bioperl people feed their needs into Perl 6 so that all the code
effort to make Bio::Root is handled for them in the next effort by Perl 6
itself? Make the Perl 6 people solve these issues with your input, then you
will not have to deal with implementing it yourselves. I'll just bet that you
are not the only potential users of Perl 6 who will have to solve these
issues eventually.


----- Original Message -----
From: Hilmar Lapp <hlapp at gmx.net>
Date: Wednesday, May 31, 2006 5:21 pm
Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented

> 
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> 
> > What about modules that have 'throw_not_implemented' statements  
> > present?
> 
> Those are often if not always legitimate - the problem are those that
> don't have them but fail to override an inherited interface or
> abstract method.
> 
> If something is not implemented what is the better way to express
> this other than throwing an exception? (and if it's not an interface
> or abstract base class, saying so in the documentation)
> 
> 	-hilmar
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From arareko at campus.iztacala.unam.mx  Wed May 31 18:49:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 31 May 2006 17:49:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine>
References: <001201c684d0$263c5530$15327e82@pyrimidine>
Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx>

Brian, Jay, Chris,

I agree with what Bernd Web said in another reply. For some people it will
be nice to still be able to run the script from the codebase and
interact with it.

I don't think it should be much of a problem to maintain both tutorials,
as long as the 'main' one is the one in the CVS tree. From reading what
Jay did to convert it into mediawiki format, I suppose this can
easily be done again for each new change to the script (again, this is
just my guess). Besides, as far as I've seen, there aren't frequent
commits to the script at all.

I've added a link in the left menu of the wiki. If you think it should 
point to the Tutorials page instead of the Bptutorial.pl page please let 
me know.

Regards,
Mauricio.

Chris Fields wrote:
> Brian, Jay,
> 
> I think it would be nice to have the tutorial prominently displayed somehow
> (Jay's suggestion), with a link provided via the tutorials page.  Hopefully
> this will help with the bioperl newbies.
> 
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Wednesday, May 31, 2006 8:58 AM
>> To: Jay Hannah; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>>
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
>> we
>> don't want to have to maintain two bptutorials.
>>
>> - What do we do with the script part of bptutorial.pl? It certainly could
>> be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
>>
>> - A link to bptutorial? Or a link to the existing tutorials page?
>> http://www.bioperl.org/wiki/Tutorials.
>>
>> Any thoughts on these?
>>
>>
>> Brian O.
>>
>>
>> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>>
>>> http://www.bioperl.org/wiki/Bptutorial.pl
>>>
>>> I think I just partially fulfilled this TODO:
>>>
>>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
>>>
>>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
>>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
>> the
>>> wiki page via my web browser. (Is that proper procedure? Is the plan to
>> just
>>> do that manually from time to time as the document changes?)
>>>
>>> Now what?
>>>
>>> Should there be a new link on the far left of bioperl.org called
>> "Tutorial"?
>>> It's an amazing document. IMHO it should be listed prominently on
>> bioperl.org.
>>> HTH,
>>>
>>> j
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed May 31 20:43:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:43:48 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <200605311707.08196.lstein@cshl.edu>
Message-ID: <002801c68514$72f11480$15327e82@pyrimidine>


> -----Original Message-----
> From: Lincoln Stein [mailto:lstein at cshl.edu]
> Sent: Wednesday, May 31, 2006 4:07 PM
> To: Chris Fields
> Cc: 'Hilmar Lapp'; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfall with "return undef"
> 
> 
> > Instances: 17	Module : Bio::DB::SeqFeature::Store
> 
> This is intentional. Bio::DB::SeqFeature::Store is intended to be a
> virtual
> base class. The throw_not_implemented() calls are there to force
> developers
> to override the needed interface methods.
> 
> If this is not the right way to do it, let me know and I'll fix it.

That's the right way, though I don't really know what the 'right way' is.
Sorry Lincoln, I didn't mean to direct anything at you specifically; I
responded to your last post to stay in the thread, so to speak.  It was
meant to be a general statement that some classes haven't implemented
methods specified by their abstract base or interface class.  This is just
output from a quickie script I wrote up to check on this and see how many of
these statements are out there, and since there isn't a foolproof method to
know what an abstract base class is, it pulls in a few abstract classes
(such as yours) along with all the others.  At least there aren't as many
hits as Torsten's ~400-500 for 'return undef'!
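
For anyone joining the thread late, the 'return undef' pitfall behind those hits can be shown with a short sketch. The two subroutine names are made up for illustration; the behaviour itself is standard Perl: an explicit "return undef" yields a one-element list in list context, while a bare "return" yields an empty list.

```perl
use strict;
use warnings;

# Hypothetical failing subs, named only for this illustration.
sub fail_with_undef { return undef }    # always hands back exactly one value
sub fail_with_empty { return }          # undef in scalar, () in list context

my @from_undef = fail_with_undef();
my @from_empty = fail_with_empty();

# In list context the explicit undef becomes a one-element list, so a test
# such as "unless @result" does not notice the failure.
print "return undef: ", scalar(@from_undef), " element(s), list is ",
    (@from_undef ? "true" : "false"), "\n";
print "bare return:  ", scalar(@from_empty), " element(s), list is ",
    (@from_empty ? "true" : "false"), "\n";

# In scalar context both forms are indistinguishable: the result is undef.
my $scalar_undef = fail_with_undef();
my $scalar_empty = fail_with_empty();
print "scalar context: both undefined\n"
    if !defined $scalar_undef && !defined $scalar_empty;
```

So a caller written as "die unless @result" only works with the bare return, which is why the documented contract ("returns false" vs. "returns undef") matters.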

Anyway, I'm not sure what would be the best place to address code problems
or issues like the unimplemented methods issue or Torsten's audits (list,
wiki, etc); it's a delicate issue b/c it's bordering on code critiquing and
what constitutes good vs. bad code.  I remember some pretty heated arguments
about the 'proper' way to do things a while back involving AUTOLOAD'ing
methods, which I think is summarized somewhere in the wiki.  Myself, I'm a
microbiologist and not a programmer, so I'm prone to bouts of hackery, but I
try to have the code at least do what the docs state.

Chris

> Lincoln
> 
> 
> > Instances: 2	Module : Bio::DB::SeqVersion
> > Instances: 3	Module : Bio::DB::Taxonomy
> > Instances: 1	Module : Bio::FeatureIO::bed
> > Instances: 1	Module : Bio::Map::Marker
> > Instances: 1	Module : Bio::MapIO::fpc
> > Instances: 1	Module : Bio::MapIO::mapmaker
> > Instances: 1	Module : Bio::Restriction::IO::bairoch
> > Instances: 1	Module : Bio::Restriction::IO::itype2
> > Instances: 1	Module : Bio::Restriction::IO::withrefm
> > Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
> > Instances: 3	Module : Bio::Tools::Run::WrapperBase
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> > > Sent: Wednesday, May 31, 2006 1:15 PM
> > > To: Hilmar Lapp
> > > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> > > Subject: Re: [Bioperl-l] For CVS developers - potential
> > > pitfall with "return undef"
> > >
> > > If the documentation says "returns false" then I expect to be able to
> do
> > > this:
> > >
> > > 	@result = foo();
> > > 	die "foo() failed" unless @result;
> > >
> > > If the documentation says "returns undef" then I expect this:
> > >
> > > 	@result = foo();
> > > 	die "foo() failed" unless $result[0];
> > >
> > > Lincoln
> > >
> > > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > > > If the subroutine is documented to return "false" on failure, then
> > > > > one must call
> > > > > return (or "return ()" ).
> > > >
> > > > The problem seems to be that 'a value that evaluates to either true
> > > > or false' and 'a [meaningful] value or undef' and 'a value or
> > > > false' ('a value or no value) are not the same in perl. And what
> > > > would/should one expect if the doc states 'true on success and false
> > > > otherwise'?
> > > >
> > > > Maybe the documentation should also be fixed to avoid any ambiguity.
> > > > I.e., avoid documenting 'a value or false' because it may be
> > > > ambiguous (not only) to the less proficient. 'True or false' should
> > > > imply a value being returned.
> > > >
> > > > Comments?
> > > >
> > > > 	-hilmar
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed May 31 20:56:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 19:56:12 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <002901c68516$316d4fe0$15327e82@pyrimidine>

Mauricio et al,

Sounds good, except that there are a few issues with the formatting done by
Pod::Simple::Wiki, such as changing some things to <code> tags when they
obviously aren't code; I don't know if there is a workaround for that
(Jay?).  It may not be anything too serious though.

There was a similar issue with the INSTALL doc conversion to wiki that I ran
into, in that I don't think it will be easy converting one way or the other
(POD->wiki or wiki->POD or text), so syncing updates with wiki and CVS docs
could be an issue we'll have to face in the future.

We could strip the POD out of the script and have the docs on the wiki
(Brian's idea), or have minimal POD in the tutorial and keep the wiki
updated, just to simplify things, but this may not appeal to those who use
perldoc frequently (I personally use browsable prettified HTML).

cjf

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Wednesday, May 31, 2006 5:49 PM
> To: Chris Fields
> Cc: 'Brian Osborne'; 'Jay Hannah'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Brian, Jay, Chris,
> 
> I agree with what Bernd Web said in another reply. For some people will
> be nice to still be able to run the script from the codebase and
> interact with it.
> 
> I don't think it should be a lot of problem to maintain both tutorials,
> as long as the 'main' one is the one in the CVS tree. By reading what
> Jay did in order to convert it into mediawiki format, I suppose this can
> be easily done again for each new change to the script (again, this is
> just my guessing). Besides, as far as I've seen, there aren't frequent
> commits to the script at all.
> 
> I've added a link in the left menu of the wiki. If you think it should
> point to the Tutorials page instead of the Bptutorial.pl page please let
> me know.
> 
> Regards,
> Mauricio.
> 
> Chris Fields wrote:
> > Brian, Jay,
> >
> > I think it would be nice to have the tutorial prominently displayed
> somehow
> > (Jay's suggestion), with a link provided via the tutorials page.
> Hopefully
> > this will help with the bioperl newbies.
> >
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or
> more
> > spaces preceding a line denotes code for some reason).  Not much you can
> do
> > in these cases except remove the extra spaces in those spots.  Looking
> good
> > though!
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> >> Sent: Wednesday, May 31, 2006 8:58 AM
> >> To: Jay Hannah; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> >>
> >> Jay,
> >>
> >> Excellent! Now we need to answer a few more questions for ourselves:
> >>
> >> - Do we remove the file bptutorial.pl from the package now? I'd say
> yes,
> >> we
> >> don't want to have to maintain two bptutorials.
> >>
> >> - What do we do with the script part of bptutorial.pl? It certainly
> could
> >> be
> >> excised and put into the examples/ directory, for example, but this
> would
> >> break a few of the paths that are being used.
> >>
> >> - A link to bptutorial? Or a link to the existing tutorials page?
> >> http://www.bioperl.org/wiki/Tutorials.
> >>
> >> Any thoughts on these?
> >>
> >>
> >> Brian O.
> >>
> >>
> >> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> >>
> >>> http://www.bioperl.org/wiki/Bptutorial.pl
> >>>
> >>> I think I just partially fulfilled this TODO:
> >>>
> >>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >>>
> >>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
> >>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> >> the
> >>> wiki page via my web browser. (Is that proper procedure? Is the plan
> to
> >> just
> >>> do that manually from time to time as the document changes?)
> >>>
> >>> Now what?
> >>>
> >>> Should there be a new link on the far left of bioperl.org called
> >> "Tutorial"?
> >>> It's an amazing document. IMHO it should be listed prominently on
> >> bioperl.org.
> >>> HTH,
> >>>
> >>> j
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM


From osborne1 at optonline.net  Wed May 31 21:37:15 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 21:37:15 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E1D5F.1050807@campus.iztacala.unam.mx>
Message-ID: <C0A3BD0B.8A2C%osborne1@optonline.net>

Mauricio,

Bernd didn't say he wanted the _script_ in the package, he said he wanted
bptutorial.pl in the package, not indicating whether it was the
documentation or the script that was important. It's my suspicion that the
documentation is more important than the script, and this is what my last
letter was asking, in part: is the script important? Or can we focus on the
text/POD part?

Brian O.


On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
<arareko at campus.iztacala.unam.mx> wrote:

> I agree with what Bernd Web said in another reply. For some people it will
> be nice to still be able to run the script from the codebase and
> interact with it.


From cjfields at uiuc.edu  Wed May 31 21:42:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 20:42:54 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <100682f110067a83.10067a83100682f1@emich.edu>
Message-ID: <002a01c6851c$b3b8a980$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Stephen Gordon Lenk
> Sent: Wednesday, May 31, 2006 4:52 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> Isn't it fairly standard in OO schemes/languages to have an exception
> thrown if a method can't be found at the end of a search up the class
> hierarchy? I recall being very mad at Smalltalk because "method not
> found" kept biting me. C++ has pure virtual base classes that do not
> allow objects to be instantiated directly; they are meant to be
> inherited and then implemented.

Perl will throw an error if it can't find a method in a class hierarchy.
It will do a few things first before dying, like looking for AUTOLOAD, etc.
AUTOLOAD has its supporters and detractors; I try to stay away from it as
much as possible.
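
A minimal sketch of that lookup behaviour, in case it helps. The classes
Plain and WithAutoload and the method name frobnicate are made up for
illustration; none of this is BioPerl code. A call to a missing method dies
unless an AUTOLOAD sub is found somewhere in the hierarchy:

```perl
use strict;
use warnings;

# Hypothetical classes, for illustration only.
package Plain;
sub new { return bless {}, shift }

package WithAutoload;
our $AUTOLOAD;
sub new { return bless {}, shift }
sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;              # strip the package prefix
    return if $name eq 'DESTROY';   # never intercept object destruction
    return "AUTOLOAD caught '$name'";
}

package main;

# No such method and no AUTOLOAD in the hierarchy: Perl dies.
my $err = eval { Plain->new->frobnicate; 1 } ? '' : $@;
print "Plain: $err";

# Same call on a class that defines AUTOLOAD: the fallback handles it.
my $caught = WithAutoload->new->frobnicate;
print "WithAutoload: $caught\n";
```

The DESTROY guard is the usual reason people are wary of AUTOLOAD: without
it the fallback also fires when objects are destroyed.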

Not sure about C++-like pure virtual classes in Perl5, i.e. not allowing
direct object instantiation, but Perl6 is supposed to have them, at least
according to Apocalypse 12.  From what Mr. Wall says about OOP in Perl5,
it's essentially 'bolted on' but works with caveats (is 'private' really
'private'?).  Perl6 is rebuilt from scratch (internals are OO).
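
For what it's worth, the effect can be approximated by hand in Perl5. This
is only a sketch under my own assumptions: the class names Shape, Square and
Blob are invented, and it imitates the idea behind throw_not_implemented
rather than reproducing Bio::Root's actual code. The constructor refuses
direct instantiation and the "pure virtual" method dies unless a subclass
overrides it:

```perl
use strict;
use warnings;

# Hypothetical names; this mimics the idea, not Bio::Root's implementation.
package Shape;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    croak "Shape is abstract; instantiate a subclass" if $class eq __PACKAGE__;
    return bless {%args}, $class;
}

# "Pure virtual" method: subclasses are expected to override it.
sub area {
    my $self = shift;
    croak ref($self) . " does not implement area()";
}

package Square;
our @ISA = ('Shape');
sub area { my $self = shift; return $self->{side} ** 2 }

package Blob;              # forgets to override area()
our @ISA = ('Shape');

package main;

my $direct  = eval { Shape->new; 1 }      ? 'ok' : 'refused';
my $area    = Square->new(side => 3)->area;
my $missing = eval { Blob->new->area; 1 } ? 'ok' : $@;

print "direct instantiation: $direct\n";
print "Square area: $area\n";
print "Blob: $missing";
```

Note that the check only happens at run time, when the method is actually
called, which is exactly the gap being discussed in this thread.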

> Perl 6 was mentioned a bit back. Is this issue addressed there? Should it
> be? Do the Bioperl people feed their needs into Perl 6 so that all the
> code effort to make Bio::Root is handled for them in the next effort by
> Perl 6 itself? Make the Perl 6 people solve these issues with your input,
> then you will not have to deal with implementing it yourselves. I'll just
> bet that you are not the only potential users of Perl 6 who will have to
> solve these issues eventually.

I think Perl6 will solve most (if not all) of these problems since it's a
complete rebuild.  In fact, it's pretty much a new language altogether from
what I have seen (and the little I have played around with using Pugs).
Parrot is supposed to handle mixes of Perl5/Perl6, so it may not be
necessary to immediately convert all of bioperl to Perl6.  Though I have
also heard of a Perl5->6 converter in the works as well...  

From an OO standpoint, I believe everything is considered an object in
Perl6, though it's not supposed to force you into using objects according to
the Apocalypses that I have read.  I actually see a lot there that reminds
me of C++ (but in a Perl-ish way, of course).  Apocalypse 12 is a good
primer, though you may want to go through the others first, they're heavy
slogging:

http://dev.perl.org/perl6/doc/design/apo/A12.html

Not sure what you mean by 'feeding our needs into Perl6'.  I have
periodically checked on perl6 progress and they seem to have everything well
under control.

Chris
 
> ----- Original Message -----
> From: Hilmar Lapp <hlapp at gmx.net>
> Date: Wednesday, May 31, 2006 5:21 pm
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> >
> > On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> >
> > > What about modules that have 'throw_not_implemented' statements
> > > present?
> >
> > Those are often if not always legitimate - the problem is those that
> > don't have them but fail to override an inherited interface or
> > abstract method.
> >
> > If something is not implemented what is the better way to express
> > this other than throwing an exception? (and if it's not an interface
> > or abstract base class, saying so in the documentation)
> >
> > 	-hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================


From jay at jays.net  Wed May 31 21:54:01 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 20:54:01 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <447E48B9.4080503@jays.net>

Brian Osborne wrote:
> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we
> don't want to have to maintain two bptutorials.

We certainly wouldn't want to try to maintain two copies, one POD one in wiki. That would be the worst of all options. One option that hasn't been mentioned yet is to keep maintenance of that in POD in the distro (leaving the cool runability alone), and then flag that document as unchangeable in the wiki with a note on top "Maintenance of this document is done in POD in the distro. Submit POD patches to bioperl-l and we'll re-post an updated copy to this wiki."

Just a thought.

> - What do we do with the script part of bptutorial.pl? It certainly could be
> excised and put into the examples/ directory, for example, but this would
> break a few of the paths that are being used.

/README says this:

 scripts/    - Useful production-quality scripts with POD documentation
 examples/   - Scripts demonstrating the many uses of Bioperl

I'm personally not clear on the difference. Little stuff should start in examples/ and graduate to scripts/ once they've matured? 

Is the doc/ tree being abandoned?

doc/faq        (empty?)
doc/howto      
doc/howto/examples
doc/howto/figs (empty?)
doc/howto/html (empty?)
doc/howto/pdf  (empty?)
doc/howto/sgml (empty?)
doc/howto/txt  (empty?)
doc/howto/xml  (empty?)

Does all that stuff officially live in and is being changed in the wiki, never to return to the distro?

Any reason those empty dirs aren't nuked out of CVS?

Chris Fields wrote:
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  

Sorry, I spent zero time on the whole conversion. I'm not sure what parts didn't convert well. I've never done that conversion before, and know nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran off to work. :)

Mauricio Herrera Cuadra wrote:
> I've added a link in the left menu of the wiki. If you think it should 
> point to the Tutorials page instead of the Bptutorial.pl page please let 
> me know.

Instead of all these competing links on the left, maybe we should have a master "documentation" page linked on the left cascading like so?

Documentation  (linked on the left menu)
- Quick start
- FAQ
- HOWTOs
- Tutorials

(What's the conceptual difference between a HOWTO and a tutorial?)

It's hard for me to dive into a wiki lifestyle for the huge documentation pillars since it can't ever get back into the distro... (can it?)  Small, throw away stuff is great for the wiki, but huge, established, thoughtful, long documents should be left in the distro? Present (and searchable) on the wiki but static?

Why isn't the short "Current events" just listed on the top of the "News" page?

Sick of my endless questions yet? -grin-

j


From cjfields at uiuc.edu  Wed May 31 23:09:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 22:09:38 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
Message-ID: <000001c68528$d1b6ec10$15327e82@pyrimidine>


...

> We certainly wouldn't want to try to maintain two copies, one POD one in
> wiki. That would be the worst of all options. One option that hasn't been
> mentioned yet is to keep maintenance of that in POD in the distro (leaving
> the cool runability alone), and then flag that document as unchangeable in
> the wiki with a note on top "Maintenance of this document is done in POD
> in the distro. Submit POD patches to bioperl-l and we'll re-post an
> updated copy to this wiki."
> 
> Just a thought.

There are probably three schools of thought on docs: those who like nice
docs with links within and beyond BioPerl (hence the wiki), those who like
including docs with the distribution, and those who would like both.  The
last would be nice but isn't realistic unless we can come up with a way to
sync changes between the wiki and CVS, without too much trouble, for the
docs we want to include with the distribution.  I'm in the first school of
thought since rich text with links is better and more informative than plain
text any day.  It might be a very small school though...

> > - What do we do with the script part of bptutorial.pl? It certainly could
> > be excised and put into the examples/ directory, for example, but this
> > would break a few of the paths that are being used.
> 
> /README says this:
> 
>  scripts/    - Useful production-quality scripts with POD documentation
>  examples/   - Scripts demonstrating the many uses of Bioperl
> 
> I'm personally not clear on the difference. Little stuff should start in
> examples/ and graduate to scripts/ once they've matured?
> 
> Is the doc/ tree being abandoned?

Most docs have been moved over to the wiki, which generates nicely formatted
docs for printing.
...

> Does all that stuff officially live in and is being changed in the wiki,
> never to return to the distro?

It's easier to add changes in the wiki and add markup, links, etc.  Much
richer text, so on.
 
> Any reason those empty dirs aren't nuked out of CVS?
> 
> Chris Fields wrote:
> > Jay, looks like there are still some weird formatting issues with the
> > bptutorial wiki page, something which I ran into before when getting the
> > Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> > spaces preceding a line denotes code for some reason).  Not much you can do
> > in these cases except remove the extra spaces in those spots.  Looking good
> > though!
> 
> Sorry, I spent zero time on the whole conversion. I'm not sure what parts
> didn't convert well. I've never done that conversion before, and know
> nothing about mediawiki. I just blindly let Pod::Simple::Wiki do its thing
> then ran off to work. :)

No big deal.  

> Mauricio Herrera Cuadra wrote:
> > I've added a link in the left menu of the wiki. If you think it should
> > point to the Tutorials page instead of the Bptutorial.pl page please let
> > me know.
> 
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
> 
> Documentation  (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Okay, though Mauricio may know a bit more on how/if this can be done.
Mauricio?

> (What's the conceptual difference between a HOWTO and a tutorial?)

I believe the reasoning is along these lines: HOWTOs focus on specific
areas (graphics, trees, BLAST report parsing, etc.) and thus usually have
greater detail. The tutorials are more broadly based (sort of a general
bioperl HOWTO).  The only exception is the Beginner's HOWTO, but even that
has additional information over the tutorial (at least it did the last time
I looked at the tutorial, which has been a while).

> It's hard for me to dive into a wiki lifestyle for the huge documentation
> pillars since it can't ever get back into the distro... (can it?)  Small,
> throw away stuff is great for the wiki, but huge, established, thoughtful,
> long documents should be left in the distro? Present (and searchable) on
> the wiki but static?

Hence the problem we face now.  It is something we need to really look into
before adding too much more to the wiki.  IMHO, I think we should have very
little information directly in the distribution itself since it's already
quite large.  It's almost as easy to have a bare-bones INSTALL file, which
would point to the wiki for additional information.  But I may be very much
alone in that train of thought ; >

> Why isn't the short "Current events" just listed on the top of the "News"
> page?

Don't know.
 
> Sick of my endless questions yet? -grin-

Not really.

cjf

> j


From gad14 at cornell.edu  Tue May 30 12:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful. I think this must be 
what's going on. I am using $blast_report->next_result in both 
subroutines. It appears that analyzing the blast results first w/ my 
sort subroutine empties (?) the $blast_report object so that when I try 
to print, there is nothing left to print (and vice versa when I print 
first then try to sort).
So, from the looks of things, using next_result has the effect of 
popping the Bio::Search::Result::ResultI objects off of the SearchIO 
blast report object??

It seems I could get around this by making a copy of the blast report by 
setting it to another new variable...(not the most elegant solution) but 
I'm having trouble with this...

If I do:

	my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it 
doesn't help me. How can I make another physical copy of this blast 
result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a 
way to look at/print/sort the results without removing data from the 
blast result object. How is this done though??
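
One way around this, sketched under assumptions: drain the report once into
an ordinary array and hand the array (not the report) to both subroutines.
OnePassReport below is a made-up stand-in for the SearchIO report so the
sketch runs without a BLAST file, and the two subs are placeholders for the
real sort_results and print_blast_results. With real result objects the loop
would normally read "while (my $result = $blast_report->next_result)".

```perl
use strict;
use warnings;

# Stand-in for a SearchIO-style report: a hypothetical one-pass iterator
# whose next_result() returns undef once the items run out.
package OnePassReport;
sub new         { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub next_result { my $self = shift; return shift @{ $self->{items} } }

package main;

my $report = OnePassReport->new('result_A', 'result_B');

# Drain the iterator exactly once into an ordinary array ...
my @results;
while (defined(my $result = $report->next_result)) {
    push @results, $result;
}

# ... then let every consumer loop over the array instead of the report.
# Both subs are placeholders for the real (much longer) subroutines.
sub sort_results        { my @r = @_; return [ sort { $b cmp $a } @r ] }
sub print_blast_results { my @r = @_; return join ",", @r }

my $sorted  = sort_results(@results);
my $printed = print_blast_results(@results);

print "sorted: @$sorted\n";
print "printed: $printed\n";
print "left in report: ", (defined($report->next_result) ? 'something' : 'nothing'), "\n";
```

The same caution would apply one level down if both subroutines also walk
the hits with next_hit.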

Sendu and Brian, I didn't post the sort_results subroutine because it is 
sprawling, as is a lot of my code. The code I provided was more like an 
aid for my explanation of the problem; it doesn't actually run. Sorry 
for the confusion, I should have been more clear on that.  The important 
thing to know perhaps is that both sort_results and print_blast_results 
contain a loop where I am using the 'next_result' method to 
view blast results. (And to clarify for Torsten, the blastall() is 
working just fine - the analysis/viewing of the results object is where 
I am encountering the problem.)


Any other ideas would be greatly appreciated...

Thank you,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>> Hi,
> 
> [snip]
> 
>> If I've sorted the results the sorted-results will print to screen, 
>> however when I try to print the Hit Table results nothing is returned, 
>> as if the blast results have evaporated.... and vice versa, if I 
>> comment out the part where I point my sorting subroutine to the blast 
>> results reference, my hit table results suddenly print to screen.
> 
> [snip]
> 
>> Here's an abbreviated version of my code:
> 
> [snip]
> 
>> #######
>> ### the following 2 actions seem to be mutually exclusive.
>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>> # SeqFeature objs stored in arrays. arrays are then printed
>> # to stdout
>> &sort_results($blast_report);
>>
>> # 2) print blast results
>> &print_blast_results($blast_report);
> 
> 
>> sub print_blast_results{
>>    my $report = shift;
>>    while(my $result = $report->next_result()){
> 
> [snip]
> 
> You didn't give us your sort_results subroutine, but is it as simple as 
> they both use $report->next_result (and/or $result->next_hit), but you 
> don't reset the internal counter back to the start, so the second 
> subroutine tries to get the next_result and finds the first subroutine 
> has already looked at the last result and so next_result returns false?
> 
> From a quick look it wasn't obvious how to reset the counter. Hopefully 
> this can be done and someone else knows how.
> 


From lstein at cshl.edu  Wed May 31 11:17:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 11:17:39 -0400
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg
	values
In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
Message-ID: <200605311117.41479.lstein@cshl.edu>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead 
and use the current Bio::Graphics development tree? Since 1.5.1 it supports 
negative coordinates. Here's an illustration:

 #!/usr/bin/perl

 use strict;

 use Bio::Graphics;
 use Bio::Graphics::Feature;

 my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
 my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
 my $panel   = Bio::Graphics::Panel->new(-start=> -200,
					 -end  => +200,
					 -width=>800,
					 -pad_left=>10,
					 -pad_right=>10);
 $panel->add_track($whole,
		   -glyph=>'arrow',
		   -double=>1,
		   -tick=>2);
 $panel->add_track($feature,
	 	  -glyph=>'box',
		   -stranded=>1);
 print $panel->png;

 exit 0;

The resulting image is attached.
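
If the binding sites are already stored in 1-1050 numbering, they would need
shifting before being used as -start/-end values like the ones above. A
small sketch, with assumptions stated plainly: the helper to_relative and
the site coordinates are invented for illustration, and the shift is a plain
linear one (the same "minus 1000" used in the label hack quoted below), so
zero is a valid coordinate.

```perl
use strict;
use warnings;

# Hypothetical helper: shift a 1-based position so that position
# $origin maps to 0 on the ruler (plain linear shift, zero included).
sub to_relative {
    my ($pos, $origin) = @_;
    return $pos - $origin;
}

# Hypothetical binding sites in 1..1050 numbering: [start, end].
my @sites = ([1, 12], [950, 961], [1040, 1050]);

# Shift every pair by 1000.
my @shifted = map { [ to_relative($_->[0], 1000), to_relative($_->[1], 1000) ] } @sites;

printf "%5d .. %5d\n", @$_ for @shifted;
```

Whether the numbering should skip zero is a convention this thread doesn't
settle; the sketch does not try to.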

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email; I accidentally hit reply.
> If anyone is interested, I have opted to
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> (on Linux it's
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm)
>
>
>       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
>       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just  for this one-off use.
>
>
>
> Strangely, I found at line 112 of arrow.pm in bioperl 1.5.1 a hidden
> option for a coords offset:
>     my $relative_coords_offset = $self->option('relative_coords_offset');
>     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
>
> Hi!
>
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyph; however, it is a
> > multiply function. Can it be hacked to take in an offset value (i.e. minus
> > the scale by 1000)?
> >
> > cheers
> > kevin
> >
> >
> > Hi,
> >
> > > For some reason I didn't see the first posting on this. In current
> > > bioperl live, the ruler can have negative numberings - I use this
> > > routinely. You need to create a feature that starts in negative
> > > coordinates. What is happening to you when you try this?
> > >
> > > Lincoln
> > >
> > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > Hi
> > > > thanks for the help offered thus far!
> > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq
> > > > using bioperl. therefore i was asked to make the numberings as such
> > > > (-1000). is there any way at all to do this in bioperl without
> > > > changing the .pm file?
> > > >
> > > > thanks guys..
> > > > kevin
> > > >
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/eaeb5e28/attachment-0003.png>

From lstein at cshl.edu  Wed May 31 12:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have
	neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious 
header". I took out Kevin's email address in case it is the "spammotel" 
header that is bothering it.

Lincoln

----------  Forwarded Message  ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg 
values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein <lstein at cshl.edu>
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau" <ULNJUJERYDIX at spammotel.com>


-------------------------------------------------------

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/6c5f4137/attachment-0003.png>

From rvosa at sfu.ca  Tue May 30 15:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

the open bioinformatics foundation has been kind enough to host a 
mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, 
the cpan distribution for phylogenetic analysis using perl).

The scope of this list is at present fairly broad as it is both meant 
for user questions and development discussion on deeper integration with 
bioperl.

You are invited to sign up at: 
http://lists.open-bio.org/mailman/listinfo/bio-phylo-l

Best wishes,

Rutger Vos

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From bioperlanand at yahoo.com  Mon May  1 18:36:20 2006
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Mon, 1 May 2006 11:36:20 -0700 (PDT)
Subject: [Bioperl-l] how to obtain GIs from clone_ids
Message-ID: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>


Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry) 
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id? Any suggestions..

Thanks in advance.

Anand

		


From cuiw at mail.nih.gov  Mon May  1 19:39:01 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Mon, 1 May 2006 15:39:01 -0400
Subject: [Bioperl-l] how to obtain GIs from clone_ids
In-Reply-To: <20060501183620.85791.qmail@web37901.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F48B0@nihcesmlbx10.nih.gov>

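use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Entrez query for the clone id
my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string);
my $count = $query->count;   # number of matching records
my @ids   = $query->ids;     # the matching GI numbers

# Print the hit count, then one GI per line
print "$count hit(s)\n";
print "$_\n" for @ids;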

-----Original Message-----
From: Anand Venkatraman [mailto:bioperlanand at yahoo.com] 
Sent: Monday, May 01, 2006 2:36 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] how to obtain GIs from clone_ids


Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry)
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI number for that clone_id? Any suggestions..

Thanks in advance.

Anand

		


From s.ryazansky at gmail.com  Mon May  1 21:55:13 2006
From: s.ryazansky at gmail.com (Sergei Ryazansky)
Date: Mon, 1 May 2006 21:55:13 +0000 (UTC)
Subject: [Bioperl-l] blast program to run locally on windows
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
Message-ID: <loom.20060501T235327-11@post.gmane.org>

Hi,
Can you post your formatdb.log file here?


From cjfields at uiuc.edu  Tue May  2 04:15:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 1 May 2006 23:15:19 -0500
Subject: [Bioperl-l] blast program to run locally on windows
In-Reply-To: <loom.20060501T235327-11@post.gmane.org>
References: <007c01c66883$61f29490$15327e82@pyrimidine>
	<20060425215433.35436.qmail@web36613.mail.mud.yahoo.com>
	<loom.20060501T235327-11@post.gmane.org>
Message-ID: <D54C8321-6A9C-4674-8C7E-5452DEF84599@uiuc.edu>

We managed to work our way through it.  He hadn't set ncbi.ini to the  
correct directories; the database was formatted correctly.

Chris

On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote:

> Hi,
> Can you post your formatdb.log file here?
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue May  2 16:19:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 11:19:34 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
Message-ID: <000901c66e04$33e07370$15327e82@pyrimidine>

I ran into some wonkiness with using extra parameters ('seq_start',
'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
gone through, fixed, and committed.  I also have added a few tests to DB.t
for everything (all changes were in Bio::DB::WebDBSeqI and
Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
manage to get it added as well (with tests).  This is how NCBI defines
complexity:

complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest

Here's my quandary: when setting complexity to '0', you get a blob back (the
main sequence as well as any subsequences, such as CDS); this is in essence
a sequence stream with multiple alphabet types.  So, I now have it set up to
do this:

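use Bio::DB::GenBank;
use Bio::SeqIO;

my $acc    = shift;    # accession number from the command line
my $seqout = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);

my $factory = Bio::DB::GenBank->new(-format => 'fasta',
                                    -complexity => 0
                                   );

# With complexity 0 this returns a sequence stream, not a single Bio::Seq
my $seqin = $factory->get_Seq_by_acc($acc);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}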

since I thought returning an array would be horrendously expensive on
memory, esp. with larger sequences.  Currently this is only set up for
sequences which are retrieved when complexity is set to '0' so it's a pretty
unique case.  Regardless, I'm worried that, since users expect a Bio::Seq
object instead of a Bio::SeqIO object here, it will cause a lot of confusion
with the API.  Any suggestions/gripes?
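
For reference, 'complexity' is ultimately just a query parameter on the NCBI
efetch request.  The lines below are a minimal sketch of where it travels,
with no network access.  The helper name efetch_url, the base URL (the
current E-utilities address) and the accession are assumptions made for
illustration; this is not how Bio::DB::NCBIHelper builds its requests.

```perl
use strict;
use warnings;

# Assumed base URL for NCBI efetch (illustration only).
my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: join key=value pairs into a query string.
# Values are not URL-escaped, so keep them to simple tokens.
sub efetch_url {
    my (%param) = @_;
    # Sort the keys so the resulting URL is stable.
    my $query = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$BASE?$query";
}

my $url = efetch_url(
    db         => 'nucleotide',
    id         => 'J00522',   # placeholder accession
    rettype    => 'fasta',
    complexity => 0,          # 0 = whole blob, per the list above
);
print "$url\n";
```

Changing complexity to 1 in the call above gives the request for the single
bioseq, which is the Entrez default listed above.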

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From mamillerpa at yahoo.com  Tue May  2 11:41:01 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Tue, 2 May 2006 04:41:01 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC lines
Message-ID: <20060502114101.29745.qmail@web50409.mail.yahoo.com>

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
make FASTA subset files for some bacterial strains.  I haven't been
able to parse out the strain information from the OS or RC lines. 
These lines typically look like:

OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki.  In addition to the wiki, I have been making use of 
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant.  I have also tried to walk through all of the feature,
annotation and reference objects but I still can't seem to parse out
the information I need.  (For brevity, the example I'm including below
only lists the code I used for the annotation objects.)  Also, this
code only prints the information...  I know that I'll have to write a
FASTA sequence object separately.

Any suggestions?

Thanks,
Mark

---   ---   ---


#!/usr/bin/perl

use Bio::SeqIO;

my $usage = "getaccs.pl file format\n";
my $file = shift or die $usage;
my $format = shift or die $usage;

my $inseq = Bio::SeqIO->new(-file   => "<$file",
                            -format => $format );

while (my $seq = $inseq->next_seq) {

  my $species_object = $seq->species;
  my $species_string = $species_object->species;
  my $variant_string = $species_object->variant;
  my $common_string = $species_object->common_name;
  my $sub_string = $species_object->sub_species;
  my $binomial = $species_object->binomial('FULL');

  print "display   ",$seq->display_id,"\n";
  print "accession ",$seq->accession_number,"\n";
  print "desc      ",$seq->desc,"\n";

  print "species   ",$species_string,"\n";
  print "variant   ",$variant_string,"\n";
  print "common    ",$common_string,"\n";
  print "sub       ",$sub_string,"\n";
  print "binomial  ",$binomial,"\n";

  print $seq->seq,"\n";

  my $anno_collection = $seq->annotation;
  for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
      print "tagname : ", $value->tagname, "\n";
      # $value is a Bio::Annotation object, and has an "as_text" method
      print "  annotation value: ", $value->as_text, "\n";

      if ($value->tagname eq "reference") {
        my $hash_ref = $value->hash_tree;
        for my $key (keys %{$hash_ref}) {
          print $key,": ",$hash_ref->{$key},"\n";
        }
      }
    }
  }
  print "\n";
}

exit;


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


From cjfields at uiuc.edu  Tue May  2 18:01:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 2 May 2006 13:01:58 -0500
Subject: [Bioperl-l] Bio::DB::GenBank and complexity
In-Reply-To: <000901c66e04$33e07370$15327e82@pyrimidine>
Message-ID: <000a01c66e12$8131a960$15327e82@pyrimidine>

I hate responding to my own post!  Just wanted to add that I'm adding a
warning to the get_Seq* methods, telling users to use the appropriate
get_Stream* method when complexity == 0, before returning the Bio::SeqIO
object.

CJF



From jason.stajich at duke.edu  Tue May  2 18:36:08 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 2 May 2006 14:36:08 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>

This is really a limitation of the EMBL/GenBank format

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html

or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has been resolved really so hopefully  
James will speak up if he's implemented anything.
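
Until the parser handles this, one possible workaround is to read the strain
straight from the raw OS and RC lines.  The lines below are only a sketch:
the sub name strain_from_lines is made up, and both patterns are assumptions
based on the two example lines in the original post.  Real UniProt entries
vary (for instance "(strain K12)" in the OS line), and strain names that
contain a period would be cut short.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a strain name out of raw OS/RC lines,
# bypassing Bio::Species altogether.
sub strain_from_lines {
    my @lines = @_;

    # RC lines carry an explicit STRAIN= token, so try them first.
    for my $line (@lines) {
        if ($line =~ /^RC\s+.*?\bSTRAIN=([^;.]+)/) {
            return $1;
        }
    }

    # Otherwise fall back to "strain XYZ" in the organism line.
    for my $line (@lines) {
        if ($line =~ /^OS\s+.*\bstrain\s+([^\s().;]+)/i) {
            return $1;
        }
    }

    return;    # no strain found
}

my @entry = (
    'OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
    'RC STRAIN=ABC123.',
);
my $strain = strain_from_lines(@entry);
print defined $strain ? "strain: $strain\n" : "no strain found\n";
```

The strain found this way could then be used to decide which FASTA subset
file a sequence belongs in.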

-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From mblanche at berkeley.edu  Tue May  2 19:30:49 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 12:30:49 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <C07D0179.2183%mblanche@berkeley.edu>

Dear all--

I have been trying to use the intersection function to extract overlapping
regions from alternatively spliced exons, as in the following script. The
object returned from 'my $overlap = $exon1->intersection($exon2);' is
actually losing the strand of $exon1 if $exon1 is from the negative strand.
Is this behavior expected? Should I check the strand of $exon1 before
working on the object returned by any Bio::RangeI function?

Many thanks 

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features( -type       => 'exon',
                                   -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";
                print "ex1\n", $exon1->seq, "\n";
                print "ex2\n", $exon2->seq, "\n";
                print "overlap\n", $overlap->seq, "\n";
            }
        }
    }
}
______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 20:17:29 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 16:17:29 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D0179.2183%mblanche@berkeley.edu>
Message-ID: <C07D3699.84BC%osborne1@optonline.net>

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result but if they aren't on the same strand then no strand
information is returned.
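
The rule can be sketched in a few lines of plain Perl.  This is illustrative
only: the ranges here are ordinary hashes rather than Bio::RangeI objects,
the sub name intersect is made up, and 0 is assumed to stand for "no strand
information".  It is not the Bioperl source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for intersection(): ranges are hash references
# with start, end and strand keys.
sub intersect {
    my ($r1, $r2) = @_;
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;    # the ranges do not overlap

    # Keep the strand only when both inputs agree; otherwise use 0.
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;
    return { start => $start, end => $end, strand => $strand };
}

my $exon1   = { start => 100, end => 200, strand => -1 };
my $exon2   = { start => 150, end => 260, strand => -1 };
my $overlap = intersect($exon1, $exon2);
printf "%d..%d strand %d\n", @{$overlap}{qw(start end strand)};
```

Under this rule two exons on the negative strand give an overlap that is
also on the negative strand.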

Brian O.




From mblanche at berkeley.edu  Tue May  2 20:32:58 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 13:32:58 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D3699.84BC%osborne1@optonline.net>
Message-ID: <C07D100A.218A%mblanche@berkeley.edu>

Brian--

Even when both elements of intersection() are from the negative strand, the
returned object is from the positive strand and $overlap is actually the
reverse complement of the intersection between the 2 exons. Here is part
of the output from the script below:

===
ex1     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
ex2     Strand: -1
CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
CAAATCG
overlap Strand: 1
CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
TGCCGACTGCCATGTTCAACTAATAAACCGG
AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
...

If both are from the positive strand, the returned object is positive, as in:

===
ex1     Strand: 1
CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
TTTGTGCCTGTTTCAGTATAAATTAATTATG
CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
AAATATACATATATGCAACATATATAACTTC
CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
ex2     Strand: 1
ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
overlap Strand: 1
CAACGCAGACGTG

Is there something I am missing? Here is the script generating the output:

Many thanks all...

Marco


use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn     => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user    => 'guest');
    my $test_db = $db->segment('4');
    
    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

    }
}

sub extractExons {
    my $gene = shift;
    my %ex_list;
    my @tcs = $gene->features( -type       => 'processed_transcript',
                               -attributes => {Gene => $gene->group});

    for my $tc (@tcs){
        my @exons = $tc->features( -type       => 'exon',
                                   -attributes => {Parent => $tc->group});
    
        for (@exons){
            my $ex_id    = $_->id;
            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});

        }
    
    }
    my @values = values %ex_list;
    return(\@values);
}

sub cluster {
    my $exons_p = shift;
    
    for (my $s = 0; $s <= $#{$exons_p}; $s++){
        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
            my $exon1 = $exons_p->[$s];
            my $exon2 = $exons_p->[$t];
            
            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
            
                my $overlap = $exon1->intersection($exon2);
                
                print "===\n";
                print     "ex1\tStrand: ", $exon1->strand, "\n",
                        $exon1->seq, "\n";
                print     "ex2\tStrand: ", $exon2->strand, "\n",
                        $exon2->seq, "\n";
                print "overlap\tStrand: ", $overlap->strand, "\n",
                        $overlap->seq, "\n";
            }
        }
    }
}


______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From osborne1 at optonline.net  Tue May  2 21:49:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 02 May 2006 17:49:49 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D100A.218A%mblanche@berkeley.edu>
Message-ID: <C07D4C3D.84C4%osborne1@optonline.net>

Marco,

Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...
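
While the cause is being tracked down, the reported output can be put back
on the exons' strand by hand.  This is a stopgap sketch in plain Perl, not a
fix for intersection() itself; the sub names revcom and
overlap_on_parent_strand are made up, and the short sequence is toy data
taken from the start of the first exon in the output above.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (IUPAC ambiguity codes are not handled).
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;           # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: given the strand shared by the two parent exons and
# the strand and sequence reported for their overlap, return the overlap
# as read on the parents' strand.
sub overlap_on_parent_strand {
    my ($parent_strand, $overlap_strand, $overlap_seq) = @_;
    return $overlap_seq if $parent_strand == $overlap_strand;
    return revcom($overlap_seq);
}

# Toy data mirroring the reported case: exons on -1, overlap reported on +1.
my $exon_seq    = 'CTTTTTTCCACATACG';
my $overlap_seq = revcom($exon_seq);
my $fixed       = overlap_on_parent_strand(-1, 1, $overlap_seq);
print "$fixed\n";
```

When the strands already agree the sequence is passed through unchanged, so
the same call is safe for exons on the positive strand.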

Brian O.


On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:

> Brian--
> 
> Even when both elements of intersection() are from the negative strand, the
> returned object is from the positive strand and $overlap is actually the
> reverse complement of the intersection between the 2 exons. Here is part
> of the output from the script below:
> 
> ===
> ex1     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
> ex2     Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
> CAAATCG
> overlap Strand: 1
> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
> TGCCGACTGCCATGTTCAACTAATAAACCGG
> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
> ...
> 
> If both are from the positive strand, the return object is positive as in:
> 
> ===
> ex1     Strand: 1
> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
> TTTGTGCCTGTTTCAGTATAAATTAATTATG
> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
> AAATATACATATATGCAACATATATAACTTC
> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
> ex2     Strand: 1
> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
> overlap Strand: 1
> CAACGCAGACGTG
> 
> Is there something I am missing? Here is the script generating the output
> 
> Many thanks all...
> 
> Marco
> 
> 
> use strict;
> use warnings;
> use Bio::DB::GFF;
> 
> MAIN:{
> 
>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                 -dsn =>
> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                 -user => 'guest');
>     my $test_db = $db->segment('4');
>     
>     # Load up the exons into $exons_p
>     for my $gene ($test_db->features(-types => 'gene')){
> 
>         my $exons_p = extractExons($gene);
> 
>         cluster($exons_p) unless ($#{$exons_p} == -1);
> 
>     }
> }
> 
> sub extractExons {
>     my $gene = shift;
>     my %ex_list;
>     my @tcs = $gene->features(    -type =>'processed_transcript',
>                                     -attributes =>{Gene => $gene->group});
>                  
>     for my $tc (@tcs){
>         my @exons = $tc->features (-type => 'exon',
>                                      -attributes => {Parent => $tc->group}
> );        
>     
>         for (@exons){
>             my $ex_id    = $_->id;
>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
> 
>         }
>     
>     }
>     my @values = values %ex_list;
>     return(\@values);
> }
> 
> sub cluster {
>     my $exons_p = shift;
>     
>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>             my $exon1 = $exons_p->[$s];
>             my $exon2 = $exons_p->[$t];
>             
>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>             
>                 my $overlap = $exon1->intersection($exon2);
>                 
>                 print "===\n";;
>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>                         $exon1->seq, "\n";
>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>                         $exon2->seq, "\n";
>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>                         $overlap->seq, "\n";
>             }
>         }
>     }
> }
> 
> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
> 
>> Marco,
>> 
>> Yes, this is how intersection() is supposed to work. If both of the Range
>> objects have the same strand then the strand information is returned as part
>> of the result but if they aren't on the same strand then no strand
>> information is returned.
>> 
>> Brian O.
>> 
>> 
>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
>> 
>>> Dear all--
>>> 
>>> I have been trying to use the intersection function to extract overlapping
>>> regions from alternatively spliced exons, as in the following script. The
>>> object returned from 'my $overlap = $exon1->intersection($exon2);' is
>>> actually losing the strand of $exon1 if $exon1 is from the negative strand.
>>> Is this behavior expected? Should I check the strand of $exon1 before
>>> working on the object returned by any Bio::RangeI function?
>>> 
>>> Many thanks 
>>> 
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::DB::GFF;
>>> 
>>> MAIN:{
>>> 
>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>                                 -dsn =>
>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>                                 -user => 'guest');
>>>     my $test_db = $db->segment('4');
>>>     
>>>     # Load up the exons into $exons_p
>>>     for my $gene ($test_db->features(-types => 'gene')){
>>> 
>>>         my $exons_p = extractExons($gene);
>>> 
>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>> 
>>>     }
>>> }
>>> 
>>> sub extractExons {
>>>     my $gene = shift;
>>>     my %ex_list;
>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>>                                     -attributes =>{Gene => $gene->group});
>>>                
>>>     for my $tc (@tcs){
>>>         my @exons = $tc->features (-type => 'exon',
>>>                                      -attributes => {Parent => $tc->group}
>>> );        
>>>     
>>>         for (@exons){
>>>             my $ex_id    = $_->id;
>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>> 
>>>         }
>>>     
>>>     }
>>>     my @values = values %ex_list;
>>>     return(\@values);
>>> }
>>> 
>>> sub cluster {
>>>     my $exons_p = shift;
>>>     
>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>             my $exon1 = $exons_p->[$s];
>>>             my $exon2 = $exons_p->[$t];
>>>             
>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>             
>>>                 my $overlap = $exon1->intersection($exon2);
>>>                
>>>                 print "===\n";;
>>>                 print "ex1\n", $exon1->seq, "\n";
>>>                 print "ex2\n", $exon2->seq, "\n";
>>>                 print "overlap\n", $overlap->seq, "\n";
>>>             }
>>>         }
>>>     }
>>> }
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>> 
>>> mblanche at uclink.berkeley.edu
>>> 
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>> 
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>> 
>> 
> 
> ______________________________
> Marco Blanchette, Ph.D.
> 
> mblanche at uclink.berkeley.edu
> 
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
> 
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062


From mblanche at berkeley.edu  Tue May  2 22:31:44 2006
From: mblanche at berkeley.edu (Marco Blanchette)
Date: Tue, 02 May 2006 15:31:44 -0700
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D4C3D.84C4%osborne1@optonline.net>
Message-ID: <C07D2BE0.2196%mblanche@berkeley.edu>

Brian--

I checked out last week's version from CVS.

Silly question: how do I find out which version of BioPerl I am using? I have
never had to check a module/bundle version number before...

Marco


On 5/2/06 14:49, "Brian Osborne" <osborne1 at optonline.net> wrote:

> Marco,
> 
> Odd, because the intersection() code is quite simple and it's clear how it
> should behave. What version of Bioperl are you using? I'm looking at the
> latest, in bioperl-live...
> 
> Brian O.
> 
> 
> On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> 
>> Brian--
>> 
>> Even when both elements of intersection() are from the negative strand, the
>> return object is from the positive strand and $overlap is actually the
>> revervese complement of the intersection between the 2 exons. Here is part
>> of the output from the script below:
>> 
>> ===
>> ex1     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
>> ex2     Strand: -1
>> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
>> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
>> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
>> CAAATCG
>> overlap Strand: 1
>> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
>> TGCCGACTGCCATGTTCAACTAATAAACCGG
>> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
>> ...
>> 
>> If both are from the positive strand, the return object is positive as in:
>> 
>> ===
>> ex1     Strand: 1
>> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
>> TTTGTGCCTGTTTCAGTATAAATTAATTATG
>> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
>> AAATATACATATATGCAACATATATAACTTC
>> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
>> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
>> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
>> ex2     Strand: 1
>> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
>> overlap Strand: 1
>> CAACGCAGACGTG
>> 
>> Is there something I am missing? Here is the script generating the output
>> 
>> Many thanks all...
>> 
>> Marco
>> 
>> 
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> 
>> MAIN:{
>> 
>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>                                 -dsn =>
>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>                                 -user => 'guest');
>>     my $test_db = $db->segment('4');
>>     
>>     # Load up the exons into $exons_p
>>     for my $gene ($test_db->features(-types => 'gene')){
>> 
>>         my $exons_p = extractExons($gene);
>> 
>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>> 
>>     }
>> }
>> 
>> sub extractExons {
>>     my $gene = shift;
>>     my %ex_list;
>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>                                     -attributes =>{Gene => $gene->group});
>>                 
>>     for my $tc (@tcs){
>>         my @exons = $tc->features (-type => 'exon',
>>                                      -attributes => {Parent => $tc->group}
>> );        
>>     
>>         for (@exons){
>>             my $ex_id    = $_->id;
>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>> 
>>         }
>>     
>>     }
>>     my @values = values %ex_list;
>>     return(\@values);
>> }
>> 
>> sub cluster {
>>     my $exons_p = shift;
>>     
>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>             my $exon1 = $exons_p->[$s];
>>             my $exon2 = $exons_p->[$t];
>>             
>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>             
>>                 my $overlap = $exon1->intersection($exon2);
>>                 
>>                 print "===\n";;
>>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
>>                         $exon1->seq, "\n";
>>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
>>                         $exon2->seq, "\n";
>>                 print "overlap\tStrand: ", $overlap->strand, "\n",
>>                         $overlap->seq, "\n";
>>             }
>>         }
>>     }
>> }
>> 
>> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
>> 
>>> Marco,
>>> 
>>> Yes, this is how intersection() is supposed to work. If both of the Range
>>> objects have the same strand then the strand information is returned as part
>>> of the result but if they aren't on the same strand then no strand
>>> information is returned.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
>>> 
>>>> Dear all--
>>>> 
>>>> I have been trying to use the intersection function to extract overlapping
>>>> region from alternatively spliced exons as in the following script. The
>>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
>>>> actually loosing the strand of $exon1 if $exon1 is from the negative
>>>> strand.
>>>> Is this behavior expected? Should I check the strand of $exon1 before
>>>> working on the object return by any Bio::RangeI function?
>>>> 
>>>> Many thanks 
>>>> 
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use Bio::DB::GFF;
>>>> 
>>>> MAIN:{
>>>> 
>>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>>                                 -dsn =>
>>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>>>                                 -user => 'guest');
>>>>     my $test_db = $db->segment('4');
>>>>     
>>>>     # Load up the exons into $exons_p
>>>>     for my $gene ($test_db->features(-types => 'gene')){
>>>> 
>>>>         my $exons_p = extractExons($gene);
>>>> 
>>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
>>>> 
>>>>     }
>>>> }
>>>> 
>>>> sub extractExons {
>>>>     my $gene = shift;
>>>>     my %ex_list;
>>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
>>>>                                     -attributes =>{Gene => $gene->group});
>>>>               
>>>>     for my $tc (@tcs){
>>>>         my @exons = $tc->features (-type => 'exon',
>>>>                                      -attributes => {Parent => $tc->group}
>>>> );        
>>>>     
>>>>         for (@exons){
>>>>             my $ex_id    = $_->id;
>>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>>> 
>>>>         }
>>>>     
>>>>     }
>>>>     my @values = values %ex_list;
>>>>     return(\@values);
>>>> }
>>>> 
>>>> sub cluster {
>>>>     my $exons_p = shift;
>>>>     
>>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>>>             my $exon1 = $exons_p->[$s];
>>>>             my $exon2 = $exons_p->[$t];
>>>>             
>>>>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>>             
>>>>                 my $overlap = $exon1->intersection($exon2);
>>>>               
>>>>                 print "===\n";;
>>>>                 print "ex1\n", $exon1->seq, "\n";
>>>>                 print "ex2\n", $exon2->seq, "\n";
>>>>                 print "overlap\n", $overlap->seq, "\n";
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>> ______________________________
>>>> Marco Blanchette, Ph.D.
>>>> 
>>>> mblanche at uclink.berkeley.edu
>>>> 
>>>> Donald C. Rio's lab
>>>> Department of Molecular and Cell Biology
>>>> 16 Barker Hall
>>>> University of California
>>>> Berkeley, CA 94720-3204
>>>> 
>>>> Tel: (510) 642-1084
>>>> Cell: (510) 847-0996
>>>> Fax: (510) 642-6062
>>> 
>>> 
>> 
>> ______________________________
>> Marco Blanchette, Ph.D.
>> 
>> mblanche at uclink.berkeley.edu
>> 
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>> 
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
> 
> 

______________________________
Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
-- 


From arareko at campus.iztacala.unam.mx  Tue May  2 22:32:24 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Tue, 02 May 2006 17:32:24 -0500
Subject: [Bioperl-l] BioPerl-run in FreeBSD
Message-ID: <4457DDF8.4050005@campus.iztacala.unam.mx>

It's my great pleasure to announce the availability of the BioPerl-run 
packages (stable & developer releases) for the FreeBSD operating system.

For instructions on how to install BioPerl ports in FreeBSD, please take 
a look into the Getting Bioperl section of the BioPerl Wiki.

Regards,
Mauricio.
-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From heikki at sanbi.ac.za  Wed May  3 06:51:12 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 3 May 2006 08:51:12 +0200
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <200605030851.13007.heikki@sanbi.ac.za>

On Wednesday 03 May 2006 00:31, Marco Blanchette wrote:
> Brian--
>
> I checked out last week version from the CVS.
>
> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

It is not that silly. The syntax is not obvious:

	perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

You can use any module in bioperl, of course.
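
The same check can be made from inside a script. What follows is only a
minimal sketch, not part of BioPerl: module_version() is a hypothetical
helper name, and it relies on nothing beyond Perl's built-in require and the
universal VERSION method. It returns undef when the module cannot be loaded
or declares no version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the $VERSION of a module, or undef if the
# module cannot be loaded or declares no version.
sub module_version {
    my $module = shift;
    # Accept only plain package names before turning them into a file path.
    return unless defined $module && $module =~ /^\w+(?:::\w+)*$/;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

# Check the modules named on the command line, or Bio::Perl by default.
for my $module (@ARGV ? @ARGV : ('Bio::Perl')) {
    my $version = module_version($module);
    print "$module: ",
        (defined $version ? $version : 'not installed or no version'), "\n";
}
```

Run it with one or more module names as arguments, for example
Bio::SeqIO or Bio::DB::GFF.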

     -Heikki

> Marco
>
> On 5/2/06 14:49, "Brian Osborne" <osborne1 at optonline.net> wrote:
> > Marco,
> >
> > Odd, because the intersection() code is quite simple and it's clear how
> > it should behave. What version of Bioperl are you using? I'm looking at
> > the latest, in bioperl-live...
> >
> > Brian O.
> >
> > On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> >> Brian--
> >>
> >> Even when both elements of intersection() are from the negative strand,
> >> the return object is from the positive strand and $overlap is actually
> >> the revervese complement of the intersection between the 2 exons. Here
> >> is part of the output from the script below:
> >>
> >> ===
> >> ex1     Strand: -1
> >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA
> >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
> >> ex2     Strand: -1
> >> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAA
> >>AATA AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> >> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAAC
> >>CCGT CAAATCG
> >> overlap Strand: 1
> >> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTA
> >>TTTT TGCCGACTGCCATGTTCAACTAATAAACCGG
> >> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
> >> ...
> >>
> >> If both are from the positive strand, the return object is positive as
> >> in:
> >>
> >> ===
> >> ex1     Strand: 1
> >> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCT
> >>TTGT TTTGTGCCTGTTTCAGTATAAATTAATTATG
> >> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGAT
> >>GAAT AAATATACATATATGCAACATATATAACTTC
> >> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGG
> >>CAGA GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
> >> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
> >> ex2     Strand: 1
> >> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
> >> overlap Strand: 1
> >> CAACGCAGACGTG
> >>
> >> Is there something I am missing? Here is the script generating the
> >> output
> >>
> >> Many thanks all...
> >>
> >> Marco
> >>
> >>
> >> use strict;
> >> use warnings;
> >> use Bio::DB::GFF;
> >>
> >> MAIN:{
> >>
> >>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> >>                                 -dsn =>
> >> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
> >>                                 -user => 'guest');
> >>     my $test_db = $db->segment('4');
> >>
> >>     # Load up the exons into $exons_p
> >>     for my $gene ($test_db->features(-types => 'gene')){
> >>
> >>         my $exons_p = extractExons($gene);
> >>
> >>         cluster($exons_p) unless ($#{$exons_p} == -1);
> >>
> >>     }
> >> }
> >>
> >> sub extractExons {
> >>     my $gene = shift;
> >>     my %ex_list;
> >>     my @tcs = $gene->features(    -type =>'processed_transcript',
> >>                                     -attributes =>{Gene =>
> >> $gene->group});
> >>
> >>     for my $tc (@tcs){
> >>         my @exons = $tc->features (-type => 'exon',
> >>                                      -attributes => {Parent =>
> >> $tc->group} );
> >>
> >>         for (@exons){
> >>             my $ex_id    = $_->id;
> >>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
> >>
> >>         }
> >>
> >>     }
> >>     my @values = values %ex_list;
> >>     return(\@values);
> >> }
> >>
> >> sub cluster {
> >>     my $exons_p = shift;
> >>
> >>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
> >>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
> >>             my $exon1 = $exons_p->[$s];
> >>             my $exon2 = $exons_p->[$t];
> >>
> >>             if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
> >>
> >>                 my $overlap = $exon1->intersection($exon2);
> >>
> >>                 print "===\n";;
> >>                 print     "ex1\tStrand: ", $exon1->strand, "\n",
> >>                         $exon1->seq, "\n";
> >>                 print     "ex2\tStrand: ", $exon2->strand, "\n",
> >>                         $exon2->seq, "\n";
> >>                 print "overlap\tStrand: ", $overlap->strand, "\n",
> >>                         $overlap->seq, "\n";
> >>             }
> >>         }
> >>     }
> >> }
> >>
> >> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
> >>> Marco,
> >>>
> >>> Yes, this is how intersection() is supposed to work. If both of the
> >>> Range objects have the same strand then the strand information is
> >>> returned as part of the result but if they aren't on the same strand
> >>> then no strand information is returned.
> >>>
> >>> Brian O.
> >>>
> >>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> >>>> Dear all--
> >>>>
> >>>> I have been trying to use the intersection function to extract
> >>>> overlapping region from alternatively spliced exons as in the
> >>>> following script. The returned object from the 'my $overlap =
> >>>> $exon1->intersection($exon2);' is actually loosing the strand of
> >>>> $exon1 if $exon1 is from the negative strand.
> >>>> Is this behavior expected? Should I check the strand of $exon1 before
> >>>> working on the object return by any Bio::RangeI function?
> >>>>
> >>>> Many thanks
> >>>>
> >>>> #!/usr/bin/perl
> >>>> use strict;
> >>>> use warnings;
> >>>> use Bio::DB::GFF;
> >>>>
> >>>> MAIN:{
> >>>>
> >>>>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> >>>>                                 -dsn =>
> >>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
> >>>>                                 -user => 'guest');
> >>>>     my $test_db = $db->segment('4');
> >>>>
> >>>>     # Load up the exons into $exons_p
> >>>>     for my $gene ($test_db->features(-types => 'gene')){
> >>>>
> >>>>         my $exons_p = extractExons($gene);
> >>>>
> >>>>         cluster($exons_p) unless ($#{$exons_p} == -1);
> >>>>
> >>>>     }
> >>>> }
> >>>>
> >>>> sub extractExons {
> >>>>     my $gene = shift;
> >>>>     my %ex_list;
> >>>>     my @tcs = $gene->features(    -type =>'processed_transcript',
> >>>>                                     -attributes =>{Gene =>
> >>>> $gene->group});
> >>>>
> >>>>     for my $tc (@tcs){
> >>>>         my @exons = $tc->features (-type => 'exon',
> >>>>                                      -attributes => {Parent =>
> >>>> $tc->group} );
> >>>>
> >>>>         for (@exons){
> >>>>             my $ex_id    = $_->id;
> >>>>             $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
> >>>>
> >>>>         }
> >>>>
> >>>>     }
> >>>>     my @values = values %ex_list;
> >>>>     return(\@values);
> >>>> }
> >>>>
> >>>> sub cluster {
> >>>>     my $exons_p = shift;
> >>>>
> >>>>     for (my $s = 0; $s <= $#{$exons_p}; $s++){
> >>>>         for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
> >>>>             my $exon1 = $exons_p->[$s];
> >>>>             my $exon2 = $exons_p->[$t];
> >>>>
> >>>>             if (!($exon1->equals($exon2)) &&
> >>>> $exon1->overlaps($exon2)){
> >>>>
> >>>>                 my $overlap = $exon1->intersection($exon2);
> >>>>
> >>>>                 print "===\n";;
> >>>>                 print "ex1\n", $exon1->seq, "\n";
> >>>>                 print "ex2\n", $exon2->seq, "\n";
> >>>>                 print "overlap\n", $overlap->seq, "\n";
> >>>>             }
> >>>>         }
> >>>>     }
> >>>> }
> >>>> ______________________________
> >>>> Marco Blanchette, Ph.D.
> >>>>
> >>>> mblanche at uclink.berkeley.edu
> >>>>
> >>>> Donald C. Rio's lab
> >>>> Department of Molecular and Cell Biology
> >>>> 16 Barker Hall
> >>>> University of California
> >>>> Berkeley, CA 94720-3204
> >>>>
> >>>> Tel: (510) 642-1084
> >>>> Cell: (510) 847-0996
> >>>> Fax: (510) 642-6062
> >>
> >> ______________________________
> >> Marco Blanchette, Ph.D.
> >>
> >> mblanche at uclink.berkeley.edu
> >>
> >> Donald C. Rio's lab
> >> Department of Molecular and Cell Biology
> >> 16 Barker Hall
> >> University of California
> >> Berkeley, CA 94720-3204
> >>
> >> Tel: (510) 642-1084
> >> Cell: (510) 847-0996
> >> Fax: (510) 642-6062
>
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From nuclearn at gmail.com  Wed May  3 06:05:42 2006
From: nuclearn at gmail.com (Li Xiao)
Date: Wed, 3 May 2006 14:05:42 +0800
Subject: [Bioperl-l] about the frame and strand of a blastx report
Message-ID: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>

Hi everybody,

    I am parsing a blastx report with the BioPerl Bio::SearchIO modules.
The report was produced by NCBI-BLAST. How can I obtain the strand (+ or -)
of the query sequence relative to the hit protein? I tried the strand
function, but nothing was reported. I also tried the frame function, but it
only returns 0, 1 or 2, so it gives no information about the query strand
(+ or -).
  How do I obtain the strand of a query sequence?
--
*********************************************************************
Li Xiao
Sichuan Key Laboratory of Molecular Biology and Biotechnology
College of Life Science, Sichuan University
Chengdu, SiChuan, P.R.China
TEL:86-28-85470083 FAX:86-28-85412738
E-MAIL: nuclearn at gmail.com
URL: http://scbi.scu.edu.cn
**********************************************************************


From cjfields at uiuc.edu  Wed May  3 13:38:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 08:38:17 -0500
Subject: [Bioperl-l] about the frame and strand of a blastx report
In-Reply-To: <150864390605022305p5a04e743l24938386af12edf3@mail.gmail.com>
Message-ID: <000601c66eb6$d5d5f530$15327e82@pyrimidine>

$hsp->strand():

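use strict;
use warnings;
use Bio::SearchIO;

# Parse the BLAST report named on the command line.
my $parser = Bio::SearchIO->new (-file => shift @ARGV,
                                 -format => 'blast');

while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # With no argument, strand() reports the query strand.
            print $hsp->strand,"\n";
        }
    }
}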

This will give 1 or -1.
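
For blastx the frame matters as well as the strand. As far as I know, BioPerl
reports the frame GFF-style (0, 1 or 2), which would explain why frame() on
its own never shows a sign; combined with the strand it gives back the signed
frame printed in the BLAST report. A minimal sketch, assuming $strand comes
from $hsp->strand('query') and $frame from $hsp->query->frame (blast_frame is
a hypothetical helper, not a BioPerl method):

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the signed BLAST frame (+1..+3, -1..-3)
# from a strand (1 or -1) and a GFF-style frame (0, 1 or 2).
sub blast_frame {
    my ($strand, $frame) = @_;
    die "strand must be 1 or -1\n"
        unless defined $strand && $strand =~ /^-?1$/;
    die "frame must be 0, 1 or 2\n"
        unless defined $frame && $frame =~ /^[012]$/;
    return $strand * ($frame + 1);
}

# Stand-in values; in a real parser they would be read from each HSP.
for my $pair ([1, 0], [1, 2], [-1, 0], [-1, 1]) {
    my ($strand, $frame) = @$pair;
    printf "strand %2d, frame %d => BLAST frame %+d\n",
        $strand, $frame, blast_frame($strand, $frame);
}
```

Inside the innermost loop above, the call would then look like
blast_frame($hsp->strand('query'), $hsp->query->frame).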

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Li Xiao
> Sent: Wednesday, May 03, 2006 1:06 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] about the frame and strand of a blastx report
> 
> Hi, anybody,
> 
>     I am working to parse a blastx report by using BioPerl modules
> (Bio::SearchIO).
> The blastx result was created by NCBI-BLAST. How i can obtain the strand (
> +
> or -)
> of query sequence against the hited protein? I tried to use the strand
> function, but
> nothing were reported. And i used the frame funtion, the result usually
> display 0,1,2,
> so, the result can not give any information about the query strand( + o r-
> ).
>   How i obtain the strand of a query squence?
> --
> *********************************************************************
> Li Xiao
> Sichuan Key Laboratory of Molecular Biology and Biotechnology
> College of Life Science, Sichuan University
> Chengdu, SiChuan, P.R.China
> TEL:86-28-85470083 FAX:86-28-85412738
> E-MAIL: nuclearn at gmail.com
> URL: http://scbi.scu.edu.cn
> **********************************************************************
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Wed May  3 15:22:27 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 11:22:27 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
Message-ID: <C07E42F3.84E3%osborne1@optonline.net>

Mark,

So you're trying to get the information in the RC line from a Swissprot
format file?

Brian O.


On 5/2/06 7:41 AM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Hello all.
> 
> I have a recently donwloaded UniProt/TrEMBL flat file.  I am trying to
> make FASTA subset files for some bacterial strains.  I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
> 
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
> 
> I'm not especiialy good with Perl, and I'm definitely weak when it
> comes to OOP.
> 
> I have included some code I pasted together from various pages on the
> bioperl wiki.  In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html
> 
> The code I have so far reports the species but not the subspecies or
> variant.  I have also tried to walk through all of the feature,
> annotation and reference objects but I still can't seem to parse out
> the information I need.  (For brevity, the example I'm including below
> only lists the code I used for the annotation objects.)  Also, this
> code only prints the information...  I know that I'll have to write a
> FASTA sequence object seperately.
> 
> Any suggestions?
> 
> Thanks,
> Mark
> 
> ---   ---   ---
> 
> 
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> 
> my $usage = "getaccs.pl file format\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file",
>                             -format => $format );
> 
> while (my $seq = $inseq->next_seq) {
> 
>   my $species_object = $seq->species;
>   my $species_string = $species_object->species;
>   my $variant_string = $species_object->variant;
>   my $common_string = $species_object->common_name;
>   my $sub_string = $species_object->sub_species;
>   my $binomial = $species_object->binomial('FULL');
> 
>   print "display   ",$seq->display_id,"\n";
>   print "accession ",$seq->accession_number,"\n";
>   print "desc      ",$seq->desc,"\n";
> 
>   print "species   ",$species_string,"\n";
>   print "variant   ",$variant_string,"\n";
>   print "common    ",$common_string,"\n";
>   print "sub       ",$sub_string,"\n";
>   print "binomial  ",$binomial,"\n";
> 
>   print $seq->seq,"\n";
> 
>   my $anno_collection = $seq->annotation;
>   for my $key ( $anno_collection->get_all_annotation_keys ) {
>     my @annotations = $anno_collection->get_Annotations($key);
>     for my $value ( @annotations ) {
>       print "tagname : ", $value->tagname, "\n";
>       # $value is an Bio::Annotation, and has an "as_text" method
>       print "  annotation value: ", $value->as_text, "\n";
> 
>       if ($value->tagname eq "reference") {
>         my $hash_ref = $value->hash_tree;
>         for my $key (keys %{$hash_ref}) {
>           print $key,": ",$hash_ref->{$key},"\n";
>         }
>       }
>     }
>   }
>   print "\n";
> }
> exit;
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From MEC at stowers-institute.org  Wed May  3 15:09:04 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 3 May 2006 10:09:04 -0500
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Message-ID: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>

Marco,

It appears that your code assumes that the exons returned from the call
to Bio::DB::GFF::features are sorted by start; I don't think that is
guaranteed (at least not in the documentation I'm reading).  Also, I
think your code will not report overlap between two exons that have an
intervening overlapping exon.  Depending on what your application is,
you may care.  For example, e1, e2 and e3 all intersect pairwise, but your
code won't report on e1's overlap with e3.

e1 ---*******-------
e2 -----******------
e3 ------***--------
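
The three-exon case can be tried in isolation, without a database. The sketch
below is only an illustration: it uses coordinates read off the diagram
(e1 4..10, e2 6..11, e3 7..9) and plain hashes rather than Bio::DB::GFF
features, and ranges_overlap and overlapping_pairs are hypothetical names. It
compares every pair of ranges, so the result does not depend on the order in
which the exons come back.

```perl
use strict;
use warnings;

# Two closed ranges overlap when each one starts before the other ends.
sub ranges_overlap {
    my ($r1, $r2) = @_;
    return $r1->{start} <= $r2->{end} && $r2->{start} <= $r1->{end};
}

# Hypothetical helper: list every overlapping pair as [name1, name2].
sub overlapping_pairs {
    my @ranges = @_;
    my @pairs;
    for my $s (0 .. $#ranges) {
        for my $t ($s + 1 .. $#ranges) {
            push @pairs, [ $ranges[$s]{name}, $ranges[$t]{name} ]
                if ranges_overlap($ranges[$s], $ranges[$t]);
        }
    }
    return @pairs;
}

# Made-up coordinates taken from the diagram, deliberately unsorted.
my @exons = (
    { name => 'e3', start => 7, end => 9  },
    { name => 'e1', start => 4, end => 10 },
    { name => 'e2', start => 6, end => 11 },
);

print join('/', @$_), "\n" for overlapping_pairs(@exons);
```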

Out of curiosity, what is your application?  Designing primers for gene
resequencing?

Cheers,

Malcolm Cook
Database Applications Manager, Bioinformatics
Stowers Institute for Medical Research 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>Marco Blanchette
>Sent: Tuesday, May 02, 2006 2:31 PM
>To: bioperl-l at lists.open-bio.org
>Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
>
>Dear all--
>
>I have been trying to use the intersection function to extract 
>overlapping
>region from alternatively spliced exons as in the following script. The
>returned object from the 'my $overlap = 
>$exon1->intersection($exon2);' is
>actually loosing the strand of $exon1 if $exon1 is from the 
>negative strand.
>Is this behavior expected? Should I check the strand of $exon1 before
>working on the object return by any Bio::RangeI function?
>
>Many thanks 
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use Bio::DB::GFF;
>
>MAIN:{
>
>    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>                                -dsn =>
>'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>                                -user => 'guest');
>    my $test_db = $db->segment('4');
>    
>    # Load up the exons into $exons_p
>    for my $gene ($test_db->features(-types => 'gene')){
>
>        my $exons_p = extractExons($gene);
>
>        cluster($exons_p) unless ($#{$exons_p} == -1);
>
>    }
>}
>
>sub extractExons {
>    my $gene = shift;
>    my %ex_list;
>    my @tcs = $gene->features(    -type =>'processed_transcript',
>                                    -attributes =>{Gene => 
>$gene->group});
>                   
>    for my $tc (@tcs){
>        my @exons = $tc->features (-type => 'exon',
>                                     -attributes => {Parent => 
>$tc->group}
>);        
>    
>        for (@exons){
>            my $ex_id    = $_->id;
>            $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
>        }
>    
>    }
>    my @values = values %ex_list;
>    return(\@values);
>}
>
>sub cluster {
>    my $exons_p = shift;
>    
>    for (my $s = 0; $s <= $#{$exons_p}; $s++){
>        for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>            my $exon1 = $exons_p->[$s];
>            my $exon2 = $exons_p->[$t];
>            
>            if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>            
>                my $overlap = $exon1->intersection($exon2);
>                
>                print "===\n";;
>                print "ex1\n", $exon1->seq, "\n";
>                print "ex2\n", $exon2->seq, "\n";
>                print "overlap\n", $overlap->seq, "\n";
>            }
>        }
>    }
>}
>______________________________
>Marco Blanchette, Ph.D.
>
>mblanche at uclink.berkeley.edu
>
>Donald C. Rio's lab
>Department of Molecular and Cell Biology
>16 Barker Hall
>University of California
>Berkeley, CA 94720-3204
>
>Tel: (510) 642-1084
>Cell: (510) 847-0996
>Fax: (510) 642-6062
>-- 
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From sdavis2 at mail.nih.gov  Wed May  3 16:18:48 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 03 May 2006 12:18:48 -0400
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <CED81D34E37D5043A1211565277A51E504D2E369@exchkc02.stowers-institute.org>
Message-ID: <C07E5028.AF8A%sdavis2@mail.nih.gov>


On 5/3/06 11:09 AM, "Cook, Malcolm" <MEC at stowers-institute.org> wrote:

> Marco,
> 
> It appears that your code assumes that the exons returned from the call
> to Bio::DB::GFF::features are sorted by start; I don't think that is
> guaranteed (at least not in the documentation I'm reading).  Also, I
> think your code will not report overlap between two exons that have an
> intervening overlapping exon.  Depending on what your application is,
> you may care.  For example, e1, e2, e3 all intersect pairwise, but your
> code won't report on e1's overlap with e3.
> 
> e1 ---*******-------
> e2 -----******------
> e3 ------***--------

I think this can be done (looking for "superexons") via the UCSC table
browser or via Penn State University's Galaxy server (written in python and
downloadable) in case you want a quick solution to what I think is your
problem....

Sean
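
For readers following the thread: one way to make sure no overlapping pair
is missed, whatever order the exons come back in, is to compare every pair.
The following is only a minimal sketch in plain Perl (no BioPerl, no
database). It assumes each exon has been reduced to a hypothetical
[name, start, end] triple with inclusive coordinates; the sub name
overlapping_pairs and the e1..e3 coordinates are made up to mirror the
diagram quoted above.

```perl
use strict;
use warnings;

# Return every pair of intervals that share at least one position.
# Each interval is [ $name, $start, $end ] with inclusive coordinates.
# All pairs are compared, so the input does not need to be sorted.
sub overlapping_pairs {
    my @exons = @_;
    my @pairs;
    for my $s ( 0 .. $#exons ) {
        for my $t ( $s + 1 .. $#exons ) {
            my ( $name1, $start1, $end1 ) = @{ $exons[$s] };
            my ( $name2, $start2, $end2 ) = @{ $exons[$t] };

            # Two closed intervals overlap when each one starts
            # no later than the other one ends.
            push @pairs, [ $name1, $name2 ]
                if $start1 <= $end2 && $start2 <= $end1;
        }
    }
    return @pairs;
}

# Illustrative coordinates only (not taken from any real annotation).
my @exons = ( [ e1 => 4, 10 ], [ e2 => 6, 11 ], [ e3 => 7, 9 ] );
print join( ' overlaps ', @$_ ), "\n" for overlapping_pairs(@exons);
```

With these three made-up exons all three pairs are reported, including
e1/e3. The all-pairs comparison is quadratic in the number of exons, which
should be acceptable for the handful of exons in a single gene.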


From osborne1 at optonline.net  Wed May  3 20:22:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 03 May 2006 16:22:57 -0400
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <20060503193446.92476.qmail@web50412.mail.yahoo.com>
Message-ID: <C07E8961.84F2%osborne1@optonline.net>

Mark,

The RC line is part of the description of a reference; I'm guessing 'RC'
stands for Reference Comment. To get the attributes of a reference,
you'll first do something like:

my $anno_collection = $seq->annotation;
my @references = $anno_collection->get_Annotations('reference');

To get the comment field for a specific reference you can do:

$references[0]->comment;

See the Feature-Annotation HOWTO for more information on Annotations; the
Reference object is a kind of Annotation object.

Brian O.


On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:

> Yeah.  Do you have any experience with that?
> 
> Mark
> 
> --- Brian Osborne <osborne1 at optonline.net> wrote:
> 
>> Mark,
>> 
>> So you're trying to get the information in the RC line from a
>> Swissprot
>> format file?
>> 
>> Brian O.
> 
> 
> ---   ---   ---   ---   ---   ---   ---   ---
> 
> Mark A. Miller
> 


From cjfields at uiuc.edu  Wed May  3 21:09:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 16:09:36 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented in
	Bio::DB::GenBank/GenPept
Message-ID: <000601c66ef5$e3066d90$15327e82@pyrimidine>

Just wanted to let you guys know I have added a few bits and pieces to
Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
epost/efetch.  I didn't want to break anything too severely so you can only
use this at the moment using get_seq_stream (i.e. NOT through get_Stream*
methods yet).  I also added tests to DB.t, a few each for protein and
nucleotide retrieval using batch mode and so far they all pass fine.  

I haven't tested the upper sequence limit for this yet to see if it's at all
comparable to just using efetch but it seems a bit faster.  The eutils
coursebook states that one should only post ~500 at a time (I think you can
get a bit higher though).
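
As a rough illustration of the ~500-per-post guideline, here is a minimal
sketch in plain Perl that splits a list of IDs into batches before they are
posted. The sub name batches and the placeholder ID list are assumptions
for illustration only; this is not the code that was added to the BioPerl
modules, and the actual retrieval call is deliberately left out.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size elements each.
# Returns a list of array references, in the original order.
sub batches {
    my ( $size, @ids ) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    # splice removes up to $size IDs from the front on every pass.
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

my @gis = ( 1 .. 1200 );    # placeholder IDs, not real GI numbers
for my $batch ( batches( 500, @gis ) ) {
    # Each @$batch would be handed to the retrieval step here.
    printf "would post %d ids\n", scalar @$batch;
}
```

The batch size is a parameter so that the limit can be adjusted once the
real upper bound has been tested.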

Also, at the moment it only works for GIs (NOT accessions, which
apparently epost does not accept).  If we want to continue using this
method for retrieval then we may need a workaround for accessions.

CJF

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Wed May  3 21:44:48 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 04 May 2006 07:44:48 +1000
Subject: [Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
In-Reply-To: <C07D2BE0.2196%mblanche@berkeley.edu>
References: <C07D2BE0.2196%mblanche@berkeley.edu>
Message-ID: <1146692688.12571.1.camel@chauvel.csse.monash.edu.au>

Marco,

> Silly question: How do I get the version of BioPerl I am using... Never had
> to check a module/bundle version number before...

http://bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

-- 
Torsten Seemann <torsten.seemann at infotech.monash.edu.au>
Victorian Bioinformatics Consortium


From cjfields at uiuc.edu  Wed May  3 22:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 3 May 2006 17:08:37 -0500
Subject: [Bioperl-l] Batch retrieval partially implemented
	inBio::DB::GenBank/GenPept
In-Reply-To: <000601c66ef5$e3066d90$15327e82@pyrimidine>
Message-ID: <000001c66efe$21dbcf80$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Wednesday, May 03, 2006 4:10 PM
> To: 'Jason Stajich'; 'Brian Osborne'; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Batch retrieval partially implemented
> inBio::DB::GenBank/GenPept
> 
> Just wanted to let you guys know I have added a few bits and pieces to
> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
                     ^^^^^^^^^^^^^^^^^^^
                     Bio::DB::NCBIHelper
Fat fingers!

> epost/efetch.  [...]


From arareko at campus.iztacala.unam.mx  Wed May  3 22:24:23 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 03 May 2006 17:24:23 -0500
Subject: [Bioperl-l] Batch retrieval partially
	implemented	inBio::DB::GenBank/GenPept
In-Reply-To: <000001c66efe$21dbcf80$15327e82@pyrimidine>
References: <000001c66efe$21dbcf80$15327e82@pyrimidine>
Message-ID: <44592D97.6090906@campus.iztacala.unam.mx>

hehehe :)

Chris Fields wrote:
>> Bio::DB::Gen*  and BioLLDB::NCBIHelper for batch retrieval using
>                      ^^^^^^^^^^^^^^^^^^^
>                      Bio::DB::NCBIHelper
> Fat fingers!

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From fernan at iib.unsam.edu.ar  Thu May  4 00:38:07 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Wed, 3 May 2006 21:38:07 -0300
Subject: [Bioperl-l] BioPerl-run in FreeBSD
In-Reply-To: <4457DDF8.4050005@campus.iztacala.unam.mx>
References: <4457DDF8.4050005@campus.iztacala.unam.mx>
Message-ID: <20060504003807.GA86447@iib.unsam.edu.ar>

+----[ Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> (02.May.2006 19:49):
|
| It's my great pleasure to announce the availability of the BioPerl-run 
| packages (stable & developer releases) for the FreeBSD operating system.
| 
| For instructions on how to install BioPerl ports in FreeBSD, please take 
| a look into the Getting Bioperl section of the BioPerl Wiki.
| 
+----]

Great job Mauricio,

thanks for contributing this!

Fernan


From miker at biotiquesystems.com  Wed May  3 03:31:59 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Tue, 2 May 2006 20:31:59 -0700
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
Message-ID: <007b01c66e62$23161d20$c100a8c0@mike>


I've encountered a pretty serious bug in Bio::SeqIO when parsing certain genbank
files that contain CONTIG entries with gaps.  One such record is NW_925173.

When I try to parse this file using Bio::SeqIO::genbank, it will enter an
infinite loop and spin until it runs out of memory.  

I'm pretty certain it relates to this bug:
http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to indicate that
genbank records with CONTIG gaps are not valid and can't be parsed.  But this
bug actually claims to be fixed, which is strange, since looking at the code for
FTLocationFactory (where the loop is) it's still right there.  I assume that
this may be fixed in other contexts but is still not fixed in
Bio::SeqIO::genbank?  Or am I doing something wrong?

I think that this should probably be filed as an open bug.  I would think that
even if bioperl isn't interested in parsing this type of file via SeqIO,
certainly you'd want to ensure that no finite input file would send the parser
into an infinite loop.  Have others encountered this problem?  Is there any plan
to address it?

Thanks very much for any information or help!

-Mike

P.S.  I've played around with my version of FTLocationFactory and it seems to
actually work and parse the gaps.  I'm not sure if I've created other bugs or if
it works in all cases, but at least the parser doesn't die.  I also don't know
that my hacky code is appropriate for putting back in to BioPerl, but I'm happy
to provide it if someone wants to check it out and/or consider it for checkin.


From ULNJUJERYDIX at spammotel.com  Wed May  3 08:20:38 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 3 May 2006 16:20:38 +0800
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making with
	Bio::Graphics::Panel
Message-ID: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>

Help!
I can't figure out the instructions in the docs.

I want to create an image of short sequence matches against a longer
sequence, with a clickable imagemap area for each short sequence. I figure
I can do the drawing easily enough using the example script for parsing
BLAST output, but I need an example script to understand how to produce
the HTML code for the imagemap. I can find only rather cryptic references
to how this can be done (see below).

    $boxes = $panel->boxes
    @boxes = $panel->boxes
    The boxes() method returns a list of arrayrefs containing the
coordinates of each glyph.  The method is useful for constructing an
image map.  In a scalar context, boxes() returns an arrayref.  In a
list context, the method returns the list directly.

    Each member of the list is an arrayref of the following format:

      [ $feature, $x1, $y1, $x2, $y2, $track ]

    The first element is the feature object; either an
Ace::Sequence::Feature, a Das::Segment::Feature, or another Bioperl
Bio::SeqFeatureI object.  The coordinates are the topleft and
bottomright corners of the glyph, including any space allocated for
labels. The track is the Bio::Graphics::Glyph object corresponding to
the track that the feature is rendered inside.

    $position = $panel->track_position($track)
    After calling gd() or boxes(), you can learn the resulting Y
coordinate of a track by calling track_position() with the value
returned by add_track() or unshift_track().  This will return undef if
called before gd() or boxes() or with an invalid track.

    @pixel_coords = $panel->location2pixel(@feature_coords)
    Public routine to map feature coordinates (in base pairs) into pixel
coordinates relative to the left-hand edge of the picture. If you
define a -background callback, the callback may wish to invoke this
routine in order to translate base coordinates into pixel coordinates.

    $left   = $panel->left
    $right  = $panel->right
    $top    = $panel->top
    $bottom = $panel->bottom
    Return the pixel coordinates of the *drawing area* of the panel, that
is, exclusive of the padding.


I got this from http://docs.bioperl.org/bioperl-live/Bio/Graphics/Panel.html
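
For what it's worth, the box format described in that documentation
excerpt, [ $feature, $x1, $y1, $x2, $y2, $track ], maps fairly directly
onto HTML <area> tags. The following is only a minimal sketch in plain
Perl, under stated assumptions: the sub names esc and image_map, the map
name 'hits' and the details.cgi URL are all made up for illustration, and
plain strings stand in for the feature objects so that the example runs
without BioPerl installed.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe inside an HTML attribute value.
sub esc {
    my $s = shift;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build a client-side image map from a reference to a list of boxes.
# Each box is [ $feature, $x1, $y1, $x2, $y2, $track ].
# $link is a callback that maps a feature to ( $href, $title ).
sub image_map {
    my ( $map_name, $boxes, $link ) = @_;
    my $html = sprintf qq{<map name="%s">\n}, esc($map_name);
    for my $box (@$boxes) {
        my ( $feature, $x1, $y1, $x2, $y2 ) = @$box;
        my ( $href, $title ) = $link->($feature);
        $html .= sprintf
            qq{  <area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s">\n},
            $x1, $y1, $x2, $y2, esc($href), esc($title);
    }
    return $html . "</map>\n";
}

# Stand-in data: plain strings play the role of feature objects here.
my @boxes = (
    [ 'hit1', 10, 20, 110, 30, undef ],
    [ 'hit2', 50, 40, 90,  50, undef ],
);
print image_map( 'hits', \@boxes, sub { ( "details.cgi?id=$_[0]", $_[0] ) } );
```

With a real panel, the second argument would presumably be the arrayref
returned by boxes() in scalar context, and the callback would pull a name
out of the feature object instead of using the string directly. The image
itself then needs a matching usemap="#hits" attribute on its <img> tag,
and the image and the boxes should come from the same panel so that the
coordinates line up.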


From s.johri at imperial.ac.uk  Thu May  4 12:50:34 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Thu, 4 May 2006 13:50:34 +0100
Subject: [Bioperl-l] Fu and Li's D statistic - calculate
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AB3@icex5.ic.ac.uk>

Hi all,

I'm trying to calculate Fu and Li's D summary statistic for a group of
sequences. The function fu_and_li_D(@ingroup,$extmutations) takes two
arguments: the first is the ingroup (population) and the second is the
number of external mutations, which is calculated from an outgroup
sequence.

My question is: which function do I use to calculate the number of
external mutations? Would this be the singleton_count() function?
The singleton_count() function takes a PopGen object, which represents
a Clustal alignment file. Would I include the outgroup in a multiple
FASTA file for alignment with Clustal?

Any suggestions as to how to calculate the number of external mutations
would be much appreciated.
 
Thanks for your help!
 

Saurabh Johri
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
From hlapp at gmx.net  Thu May  4 16:30:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 12:30:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
References: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <C9D4D0CB-8340-4157-A603-3935C8F581E6@gmx.net>

Infinite loop on a file you can download (i.e., as opposed to a file  
you tinkered with) is never ok. Could you file this as a bug report?  
And ideally attach your patch?

Thanks,

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From saldroubi at yahoo.com  Thu May  4 17:03:00 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Thu, 4 May 2006 10:03:00 -0700 (PDT)
Subject: [Bioperl-l] Is webiste down?
Message-ID: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>

All,
  
  Is the bioperl website down?  I can't get to http://www.bioperl.org 
  
  
  Thank you. 
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From arareko at campus.iztacala.unam.mx  Thu May  4 18:22:52 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 04 May 2006 13:22:52 -0500
Subject: [Bioperl-l] Is webiste down?
In-Reply-To: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
References: <20060504170300.12178.qmail@web34301.mail.mud.yahoo.com>
Message-ID: <445A467C.4070700@campus.iztacala.unam.mx>

The website is OK; maybe your gateway can't look up the bioperl server at
the moment.

Regards,
Mauricio.

Sam Al-Droubi wrote:
> All,
>   
>   Is the bioperl website down?  I can't get to http://www.bioperl.org 
>   
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Thu May  4 18:40:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 13:40:32 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <007b01c66e62$23161d20$c100a8c0@mike>
Message-ID: <000001c66faa$3a25b130$15327e82@pyrimidine>

Are you using the CONTIG record or the full GenBank file?  I see
problems with both (using bioperl-live) which seem unrelated to one another.
The full file seems to be running a bit slow because the full GenBank record
is huge (~55 MB), but the CONTIG file does exactly what you said (runs out
of memory).

Chris



From j.abbott at imperial.ac.uk  Thu May  4 15:44:44 2006
From: j.abbott at imperial.ac.uk (James Abbott)
Date: Thu, 04 May 2006 16:44:44 +0100
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or
	RC	lines
In-Reply-To: <7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
References: <20060502114101.29745.qmail@web50409.mail.yahoo.com>
	<7B49D031-9F74-43C3-AA4F-2AE115BB843D@duke.edu>
Message-ID: <445A216C.7090108@imperial.ac.uk>

Jason Stajich wrote:
> I don't know if any of this has been resolved really so hopefully  
> James will speak up if he's implemented anything.
Not as yet, I'm afraid - $job is keeping me overly busy at the moment, 
but it's on my todo list....

Cheers,
James

-- 
Dr. James Abbott <j.abbott at imperial.ac.uk>
Bioinformatics Software Developer, Bioinformatics Support Service
Imperial College, London


From hubert.prielinger at gmx.at  Thu May  4 19:35:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 13:35:42 -0600
Subject: [Bioperl-l] can't parse blast file anymore
Message-ID: <445A578E.8050207@gmx.at>

Hi,
the following perl script worked fine until a few days ago....

==============================================================
#!/usr/bin/perl -w

use Bio::SearchIO;
use strict;
use DBI;
use Net::MySQL;

#use lib qw(/usr/local/lib/perl5/site_perl/5.8.6/i686-linux);

print "trying to connect to database \n";
my $database = 'antimicro_peptides';
my $host = 'ppc7.bio.ucalgary.ca';
my $user = 'Hubert';
my $password = 'Col00eng30';

my $mysql = Net::MySQL->new(
        hostname => $host,
        database => $database,
        user     => $user,
        password => $password,
    );
   

print "Connection established \n";

my $selectID = 0;
my $count = 0;


##output database results
#while (my @row = $sth->fetchrow_array)
#   { print "@row\n" }


print "start program\n";
my $directory = '/home/Hubert/test';
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
  if ($file =~ /txt$/)   {
      $count++;
    print "read file $file \n";
  

    $file = $directory . '/' . $file;

    my $search = new Bio::SearchIO (-format => 'blast',
                                       -file => $file);
    print "bioperl seems to work....\n";                           
    my $cutoff_len = 10;
                               
    #iterate over each query sequence
    print "try to enter while loop\n";
    while (my $result = $search->next_result) {
    print "entered 1st while loop\n";
   
      #iterate over each hit on the query sequence
      while (my $hit = $result->next_hit) {
      print "entered 2nd while loop\n";
       
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
        print "entered 3rd while loop\n";
           
          if ($hsp->length('sbjct') <= $cutoff_len) {
          #print $hsp->hit_string, "\n";
               
            for ($hsp->hit_string) {        #$hsp->hit_string
             print "count files....., $count ,\n";
.................

===================================================================

Output:

[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
trying to connect to database
Connection established
start program
opened directory
read file 40026.txt
bioperl seems to work....
try to enter while loop


but it doesn't enter the first while loop; it gets stuck there. At first I
thought it was a Linux problem, because I updated from FC4 to FC5, but it
isn't, because perl is working fine, and it seems bioperl is working fine
too, but it cannot parse the file anymore.....

regards
Hubert


From barry.moore at genetics.utah.edu  Thu May  4 21:22:51 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 15:22:51 -0600
Subject: [Bioperl-l] [BULK]   can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <BD1D97AA-99BD-451C-9835-4F22A59BCFDD@genetics.utah.edu>

Hubert,

My first suggestion would be to log onto your Calgary server and
change your password real quick (unless you intended to post your
password to the world).  Well, this isn't an answer, but it may help
you find one.  Use perl -d your_script.pl to run your script under
the debugger.  Type 'n' to step forward to the line where you start
the while loop.  Type 'x $search' to see that an object exists (it
should or you'd have gotten an error).  Type 's' to step into the
next_result call, and then continue to type 'n' and 's' as needed to
burrow down to see if you can find where you're hanging.

Barry



From cjfields at uiuc.edu  Thu May  4 22:27:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 4 May 2006 17:27:57 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66faa$3a25b130$15327e82@pyrimidine>
Message-ID: <000001c66fc9$fe7e5680$15327e82@pyrimidine>

Here's another odd bit.  This is what I get for the CONTIG line when I
passed a simple contig file (NW_925062, with one join) through Bio::SeqIO:

-----------------------------------
....
FEATURES             Location/Qualifiers
     source          1..8541
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /chromosome="11"
                     /organism="Homo sapiens"
CONTIG      AADB02014027.1:1..8541

//
-----------------------------------
Here's the original:
-----------------------------------
FEATURES             Location/Qualifiers
     source          1..8541
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014027.1:1..8541)
//
-----------------------------------

Looks like it lopped out the 'join' here as well.

Chris

> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at gmx.net  Thu May  4 22:39:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 4 May 2006 18:39:05 -0400
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
References: <000001c66fc9$fe7e5680$15327e82@pyrimidine>
Message-ID: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>

The two notations are equivalent and syntactically correct, or so I  
believe ... I don't think 100% verbatim preservation should be the  
goal. Or am I missing the point?
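
To make the equivalence concrete, here is a minimal sketch in plain Perl. It assumes a hypothetical helper, normalize_contig(), which is not part of BioPerl; it strips a join() wrapper that encloses a single plain element so the two CONTIG strings from the example compare equal. A single complement(...) inside join() is deliberately left alone.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl function: a join() around exactly one
# plain element carries no extra information, so strip it before comparing.
sub normalize_contig {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                      # ignore wrapping whitespace
    return $1 if $loc =~ /^join\(([^,()]+)\)$/;
    return $loc;                           # multi-element joins stay as-is
}

my $original  = 'join(AADB02014027.1:1..8541)';
my $roundtrip = 'AADB02014027.1:1..8541';
print "equivalent\n"
    if normalize_contig($original) eq normalize_contig($roundtrip);
```

Whether a downstream tool accepts the bare form is a separate question; the sketch only shows that the location information is the same.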

On May 4, 2006, at 6:27 PM, Chris Fields wrote:

> Here's another odd bit.  This is what I get for the CONTIG line when I
> passed a simple contig file (NW_925062, with one join) through  
> Bio::SeqIO:
>
> -----------------------------------
> ....
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /db_xref="taxon:9606"
>                      /mol_type="genomic DNA"
>                      /chromosome="11"
>                      /organism="Homo sapiens"
> CONTIG      AADB02014027.1:1..8541
>
> //
> -----------------------------------
> Here's the original:
> -----------------------------------
> FEATURES             Location/Qualifiers
>      source          1..8541
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014027.1:1..8541)
> //
> -----------------------------------
>
> Looks like it lopped out the 'join' here as well.
>
> Chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hubert.prielinger at gmx.at  Thu May  4 23:57:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 17:57:44 -0600
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A7449.1080607@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
Message-ID: <445A94F8.9000903@gmx.at>

Torsten Seemann wrote:
> Hubert
>
>> the following perl script worked fine until a few days ago....
>>
>>    #iterate over each query sequence
>>    print "try to enter while loop\n";
>>  
>>
> die "Bad BLAST report" if not defined $search;
>
>>    while (my $result = $search->next_result) {
>>    print "entered 1st while loop\n";
>>
>> Output:
>>
>> [Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>> try to enter while loop
>>
>> but it doesn't enter the first while loop, it stuck there, first I  
>>
> What is the value of $search before you start the WHILE loop ?
>
>


Hi,
$search is defined like this:

my $search = new Bio::SearchIO (-format => 'blast',
                                -file   => $file);


If I try it with the debugger, as Barry suggested, then I get the following:

 
DB<1> n
main::(Blast.pl:24):    print "Connection established \n";
  DB<1> n
Connection established
main::(Blast.pl:26):    my $selectID = 0;
  DB<1> n
main::(Blast.pl:27):    my $count = 0;
  DB<1> n
main::(Blast.pl:37):    print "start program\n";
  DB<1> n
start program
main::(Blast.pl:38):    my $directory = '/home/Hubert/test';
  DB<1> n
main::(Blast.pl:39):    opendir(DIR, $directory) || die("Cannot open directory");
  DB<1> n
main::(Blast.pl:40):    print "opened directory\n";
  DB<1> n
opened directory
main::(Blast.pl:42):    foreach my $file (readdir(DIR))  {
  DB<1> n
main::(Blast.pl:43):      if ($file =~ /txt$/)   {
  DB<1> n
main::(Blast.pl:44):            $count++;
  DB<1> n
main::(Blast.pl:45):        print "read file $file \n";
  DB<1> n
read file 40026.txt
main::(Blast.pl:48):        $file = $directory . '/' . $file;
  DB<1> n
main::(Blast.pl:50):        my $search = new Bio::SearchIO (-format => 'blast',
main::(Blast.pl:51):                                        -file => $file);
  DB<1> n
main::(Blast.pl:52):            print "bioperl seems to work....\n";
  DB<1> s $search
main::((eval 14)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $search;
  DB<<2>> n

  DB<2> n
bioperl seems to work....
main::(Blast.pl:53):        my $cutoff_len = 10;
  DB<2> n
main::(Blast.pl:56):        print "try to enter while loop\n";
  DB<2> n
try to enter while loop
main::(Blast.pl:57):        while (my $result = $search->next_result) {
  DB<2> s $result
main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
3:      $result;
  DB<<3>>


From torsten.seemann at infotech.monash.edu.au  Thu May  4 21:38:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 07:38:17 +1000
Subject: [Bioperl-l] can't parse blast file anymore
In-Reply-To: <445A578E.8050207@gmx.at>
References: <445A578E.8050207@gmx.at>
Message-ID: <445A7449.1080607@infotech.monash.edu.au>

Hubert

>the following perl script worked fine until a few days ago....
>
>    #iterate over each query sequence
>    print "try to enter while loop\n";
>  
>
die "Bad BLAST report" if not defined $search;

>    while (my $result = $search->next_result) {
>    print "entered 1st while loop\n";
>
>Output:
>
>[Hubert at ppc7 Database_Search]$ /usr/bin/perl Blast.pl
>try to enter while loop
>
>but it doesn't enter the first while loop, it stuck there, first I 
>  
>
What is the value of $search before you start the WHILE loop ?


From barry.moore at genetics.utah.edu  Fri May  5 00:39:57 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 May 2006 18:39:57 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445A94F8.9000903@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
Message-ID: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>

That should be 'x $result' and you should see the object dumped to
the screen.

Or just use 's' by itself on the while line, which will step you into
the next_result sub, so you can look around and watch what's happening.
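
Outside the debugger, a similar inspection can be done with the core Data::Dumper module. This is only a sketch: the two variables are stand-ins with made-up values, not real Bio::SearchIO results.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-ins for what next_result might hand back; illustrative values only.
my $nothing   = undef;
my $something = { query => 'example', hits => 3 };

print Dumper($nothing);     # an undefined value dumps as: $VAR1 = undef;
print Dumper($something);   # a reference dumps with its full structure
```

Dropping a Dumper call just before the while loop is a quick way to see what a variable holds without stepping through line by line.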

B

>   DB<2> s $result
> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
> 3:      $result;
>   DB<<3>>


From hubert.prielinger at gmx.at  Fri May  5 02:04:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 04 May 2006 20:04:20 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
Message-ID: <445AB2A4.7020405@gmx.at>

if I do so it returns:
0 undef


Barry Moore wrote:
> That should be 'x $resust' and you should see the object dumped to  
> the screen.
>
> or just 's' by itself which will step you into the sub on the while  
> line will step you into the next_result sub, and you can look around  
> and watch what's happening.
>
> B
>
>   
>>   DB<2> s $result
>> main::((eval 15)[/usr/lib/perl5/5.8.8/perl5db.pl:628]:3):
>> 3:      $result;
>>   DB<<3>>


From torsten.seemann at infotech.monash.edu.au  Fri May  5 04:40:34 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 05 May 2006 14:40:34 +1000
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AB2A4.7020405@gmx.at>
References: <445A578E.8050207@gmx.at> <445A7449.1080607@infotech.monash.edu.au>
	<445A94F8.9000903@gmx.at>
	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>
	<445AB2A4.7020405@gmx.at>
Message-ID: <445AD742.4070408@infotech.monash.edu.au>

Hubert Prielinger wrote:
> if I do so it returns:
> 0 undef

That means the value of $search was undef.
That means that it could not parse or open the BLAST report.
I repeat the line that I put in my earlier email which you ignored.

# your line
my $search = Bio::SearchIO->new( ..... );

# then check if it was successful!
die "could not open blast report" if not defined $search;

--Torsten
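
Before handing a file to the parser it can also help to rule out the trivial causes first. The following is a rough sketch in plain Perl with a hypothetical check_report() helper. It assumes that a plain-text report starts with a program name such as BLASTP or TBLASTN, which you should verify against your own files.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check, independent of BioPerl. The assumption that
# a plain-text report begins with a program name such as BLASTP or TBLASTN
# should be verified against your own files.
sub check_report {
    my ($file) = @_;
    return 'missing' unless -e $file;
    return 'empty'   unless -s $file;
    open my $fh, '<', $file or return "unreadable: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;           # skip leading blank lines
        close $fh;
        return $line =~ /^T?BLAST[NPX]\b/ ? 'ok' : 'unexpected first line';
    }
    close $fh;
    return 'blank';
}

# Report the status of every file named on the command line.
print "$_: ", check_report($_), "\n" for @ARGV;
```

A file that comes back as 'empty' or 'unexpected first line' points at the step that produced the report rather than at the parsing script.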


From jason.stajich at duke.edu  Fri May  5 13:21:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:21:38 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
Message-ID: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>

The space after the '>' is causing the problem, since we infer the ID
as everything after the '>' and BEFORE the first whitespace.  Get rid
of the space:
   $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE
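
To see why the ID comes out empty, here is a small sketch of that rule in plain Perl. The split_header() helper is illustrative only and is assumed to mirror the rule described above; it is not the parser's actual code.

```perl
use strict;
use warnings;

# Illustrative header split, assumed to mirror the rule described above
# (ID = text after '>' up to the first whitespace); not BioPerl's own code.
sub split_header {
    my ($line) = @_;
    my ($id, $desc) = $line =~ /^>(\S*)\s*(.*)$/;
    return ($id, $desc);
}

my $bad = '> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel';
(my $good = $bad) =~ s/^>\s+/>/;   # same substitution as the one-liner

printf "before: id=[%s]\n", (split_header($bad))[0];
printf "after:  id=[%s]\n", (split_header($good))[0];
```

With the leading space the ID is the empty string, which matches the "No sequence with name []" message in the exception.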

On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:

> contents of the input file has a single sequence:
>
>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNFS
> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
> ------------------------------------------
> this is the script that tries to parse it:
>
> use Bio::AlignIO;
> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>                            -file   => 'test.fasta');
> while( my $aln = $inseq->next_aln ) {
>      print "name: ", $aln->displayname;
>      print "length: ", $aln->length;
>      print "\n";
> }
>
> ------------------------------------------
> and this is the result of running that script on winxp
>
> D:\msa\NAK MUTANTS>perl parseFasta.pl
>
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name []
> STACK Bio::SimpleAlign::displayname
> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
> STACK toplevel parseFasta.pl:11
>
> --------------------------------------
> D:\msa\NAK MUTANTS>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From thoufek at pngg.org  Thu May  4 16:50:44 2006
From: thoufek at pngg.org (T.D. Houfek)
Date: Thu, 04 May 2006 12:50:44 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
Message-ID: <445A30E4.6070103@pngg.org>

Using Bioperl 1.5, having trouble with writing FASTA-style quality files 
using Bio::Seq::Quality.

I create the Bio::Seq::Quality object, giving its constructor an ID, a 
description, a nucleotide sequence, and a quality sequence. I then write 
the sequence FASTA and the quality FASTA. The description string will 
appear in the header line of the sequence FASTA, but not in the header 
line of the quality FASTA.

Can anybody help me figure out how to fix this? I've attached a sample 
script and output.

-T.D.

------------------- sample script follows ---------------------------------------

#!/usr/bin/perl
use strict;
use Bio::Seq::Quality;
use Bio::SeqIO;

my $id = "bogus_id";
my $desc = "bogus description";
my $seq = "ATTATTATTATTATT";
my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";

my $sequal_obj = Bio::Seq::Quality->new(
-display_id => $id,
-desc => $desc,
-seq => $seq,
-qual => $qual
);

my $qualout = Bio::SeqIO->new(
-file => ">myfile.qual",
-format => 'qual'
);
my $seqout = Bio::SeqIO->new(
-file => ">myfile.seq",
-format => 'Fasta'
);

$seqout->write_seq($sequal_obj);
$qualout->write_seq($sequal_obj);


------------------ sample output follows ---------------------------------------

tdhoufek@aether:~$ cat myfile.seq
>bogus_id bogus description
ATTATTATTATTATT
tdhoufek@aether:~$ cat myfile.qual
>bogus_id
10 20 30 10 20 30 10 20 30 10 20 30 10 20 30

--------------------------------------------------------------------------------------------------


-- 
T.D. Houfek
senior bioinformatics developer
plant nematode genetics group
north carolina state university
Email: thoufek at pngg.org
----------------------------------------------------------
use Bio::Seq; @a =qw/NNN CCT GAG CAT GCG TGT AAG AAC TAG/;
$u=seq;$r=Bio::Seq;sub c{$c=$r->new(-$u=>"@_[0]")->revcom;
$t=$c->$u;}map{m/\d/?$g=c($a[$_]):tr/a-i/1-9/&&($g=$a[$_])
;$x[$i++]=$g;} split //,"dgh5cb40ab120cdefb4";$z=$r->new(-
$u=>(join"",@x))->translate()->$u;$z =~s/X/ /g;print"$z\n"


From jason.stajich at duke.edu  Fri May  5 13:27:51 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 5 May 2006 09:27:51 -0400
Subject: [Bioperl-l] bioperl-AlignIO problems parsing fasta files
In-Reply-To: <B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
References: <5d35ac4d.b35ce863.8198d00@expms1.cites.uiuc.edu>
	<B55699C6-AA91-4D32-A288-69C64379A48C@duke.edu>
Message-ID: <0F79C9AD-DE36-4424-9E59-37ABE8B62A5E@duke.edu>

[replying to myself]

although if you are trying to just read a sequence not an alignment  
then you want to use Bio::SeqIO.

See the copious help on the HOWTO page at bioperl website including a  
sequence and feature howto and beginner's guide.
  http://bioperl.org/wiki/HOWTOs

-jason
On May 5, 2006, at 9:21 AM, Jason Stajich wrote:

> Space after the > is causing the problem since we infer the ID as the
> everything after the '>' BEFORE the first whitespace.  Get rid of the
> space.
>    $ perl -i.backup -p -e 's/^>\s+/>/' YOURFASALNFILE
>
> On May 4, 2006, at 7:00 PM, Gloria Rendon wrote:
>
>> contents of the input file has a single sequence:
>>
>>> gi|90108701|pdb|2AHZ|B Chain B, K+ Complex Of The Nak Channel
>> MLSFLLTLKRMLRACLRAWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFSVVTLTTVGDGNF 
>> S
>> PQTDFGKIFTILYIFIGIGLVFGFIHKLAVNVQLPSILSN
>> ------------------------------------------
>> this is the script that tries to parse it:
>>
>> use Bio::AlignIO;
>> my $inseq = Bio::AlignIO->new(-format => 'fasta',
>>                            -file   => 'test.fasta');
>> while( my $aln = $inseq->next_aln ) {
>>      print "name: ", $aln->displayname;
>>      print "length: ", $aln->length;
>>      print "\n";
>> }
>>
>> ------------------------------------------
>> and this is the result of running that script on winxp
>>
>> D:\msa\NAK MUTANTS>perl parseFasta.pl
>>
>>
>> ------------- EXCEPTION  -------------
>> MSG: No sequence with name []
>> STACK Bio::SimpleAlign::displayname
>> C:/Perl/site/lib/Bio/SimpleAlign.pm:2047
>> STACK toplevel parseFasta.pl:11
>>
>> --------------------------------------
>> D:\msa\NAK MUTANTS>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12/

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From osborne1 at optonline.net  Fri May  5 14:04:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 05 May 2006 10:04:02 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
Message-ID: <C080D392.8567%osborne1@optonline.net>

T.D.,

According to the documentation,
http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file looks
right. What are you trying to create?

Brian O.


On 5/4/06 12:50 PM, "T.D. Houfek" <thoufek at pngg.org> wrote:

> Using Bioperl 1.5, having trouble with writing FASTA-style quality files
> using Bio::Seq::Quality.
> 
> I create the Bio::Seq::Quality object, giving its constructor an ID, a
> description, a nucleotide sequence, and a quality sequence. I then write
> the sequence FASTA and the quality FASTA. The description string will
> appear in the header line of the sequence FASTA, but not in the header
> line of the quality FASTA.
> 
> Can anybody help me figure out how to fix this? I've attached a sample
> script and output.
> 
> -T.D.
> 
> ------------------- sample script follows
> ---------------------------------------
> 
> #!/usr/bin/perl
> use strict;
> use Bio::Seq::Quality;
> use Bio::SeqIO;
> 
> my $id = "bogus_id";
> my $desc = "bogus description";
> my $seq = "ATTATTATTATTATT";
> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";
> 
> my $sequal_obj = Bio::Seq::Quality->new(
> -display_id => $id,
> -desc => $desc,
> -seq => $seq,
> -qual => $qual
> );
> 
> my $qualout = Bio::SeqIO->new(
> -file => ">myfile.qual",
> -format => 'qual'
> );
> my $seqout = Bio::SeqIO->new(
> -file => ">myfile.seq",
> -format => 'Fasta'
> );
> 
> $seqout->write_seq($sequal_obj);
> $qualout->write_seq($sequal_obj);
> 
> 
> ------------------ sample output follows
> ---------------------------------------
> 
> tdhoufek at aether:~$ cat myfile.seq
>> bogus_id bogus description
> ATTATTATTATTATT
> tdhoufek at aether:~$ cat myfile.qual
>> bogus_id
> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30
> 
> ------------------------------------------------------------------------------
> --------------------
> 
> 
> 


From cjfields at uiuc.edu  Fri May  5 14:24:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 09:24:05 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <2E0D7723-FA6E-4812-8DBB-30FCD11FA85C@gmx.net>
Message-ID: <001701c6704f$90dbd090$15327e82@pyrimidine>

I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
from the longer file Michael used as an example here (NW_925173). I believe
the CONTIG line is currently handled like a feature so I think it goes
through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is;
I think it's getting beaten up in there somehow. I may see what happens if
it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
and just glob the whole mess together as is.


Chris

...
FEATURES             Location/Qualifiers
     source          1..44976370
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
            gap(441),AADB02014318.1:1..173584,gap(676),
            AADB02014319.1:1..377558,gap(20),
            complement(AADB02014320.1:1..431263),gap(20),
            AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
            gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
            gap(4611),AADB02014325.1:1..383881,gap(20),
            complement(AADB02014326.1:1..381633),gap(1930),
            complement(AADB02014327.1:1..460053),gap(20),
            AADB02014328.1:1..4186,gap(1587),
...
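
For what it's worth, a CONTIG join with gaps can be split into elements without any open-ended loop. The sketch below is plain Perl with a hypothetical parse_contig() helper; it is not the FTLocationFactory code and not Michael's fix. It only knows the three element shapes visible in the excerpt above (plain range, complement(...), gap(N)) and dies on anything else instead of spinning.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl code: split a CONTIG join() into elements.
sub parse_contig {
    my ($text) = @_;
    $text =~ s/\s+//g;                  # undo GenBank line wrapping
    $text =~ s/^join\((.*)\)$/$1/;      # drop the outer join(), if present
    my @parts;
    for my $elem (split /,/, $text) {   # elements never contain commas
        if ($elem =~ /^gap\((\d+)\)$/) {
            push @parts, { type => 'gap', length => $1 };
        }
        elsif ($elem =~ /^complement\(([\w.]+):(\d+)\.\.(\d+)\)$/) {
            push @parts, { type => 'seq', acc => $1,
                           start => $2, end => $3, strand => -1 };
        }
        elsif ($elem =~ /^([\w.]+):(\d+)\.\.(\d+)$/) {
            push @parts, { type => 'seq', acc => $1,
                           start => $2, end => $3, strand => 1 };
        }
        else {
            die "Unrecognized CONTIG element: $elem\n";   # fail, never loop
        }
    }
    return @parts;
}

my @parts = parse_contig(
    'join(AADB02014316.1:1..1482320,gap(67),
          complement(AADB02014320.1:1..431263))'
);
printf "%d elements, %d gap(s)\n",
    scalar @parts, scalar grep { $_->{type} eq 'gap' } @parts;
```

Because the loop walks a finite list produced by split, it terminates on any finite input, which is the property Michael was asking for.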

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Thursday, May 04, 2006 5:39 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> The two notations are equivalent and syntactically correct, or so I
> believe ... I don't think 100% verbatim preservation should be the
> goal. Or am I missing the point?
> 


From hlapp at gmx.net  Fri May  5 14:47:50 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 5 May 2006 10:47:50 -0400
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <C080D392.8567%osborne1@optonline.net>
References: <C080D392.8567%osborne1@optonline.net>
Message-ID: <2E1683FE-57E4-4D97-A958-1B529973E89E@gmx.net>

He wants the description on the description line, like for the  
sequence file.

Thomas, my guess is the code doesn't print the description to the  
line although I haven't made sure. Do you want to volunteer and  
check, add that print statement and post the patch?

	-hilmar

On May 5, 2006, at 10:04 AM, Brian Osborne wrote:

> T.D.,
>
> According to the documentation,
> http://www.bioperl.org/wiki/Qual_sequence_format, your *qual file  
> looks
> right. What are you trying to create?
>
> Brian O.
>
>
> On 5/4/06 12:50 PM, "T.D. Houfek" <thoufek at pngg.org> wrote:
>
>> Using Bioperl 1.5, having trouble with writing FASTA-style quality  
>> files
>> using Bio::Seq::Quality.
>>
>> I create the Bio::Seq::Quality object, giving its constructor an  
>> ID, a
>> description, a nucleotide sequence, and a quality sequence. I then  
>> write
>> the sequence FASTA and the quality FASTA. The description string will
>> appear in the header line of the sequence FASTA, but not in the  
>> header
>> line of the quality FASTA.
>>
>> Can anybody help me figure out how to fix this? I've attached a  
>> sample
>> script and output.
>>
>> -T.D.
>>
>> ------------------- sample script follows
>> ---------------------------------------
>>
>> #!/usr/bin/perl
>> use strict;
>> use Bio::Seq::Quality;
>> use Bio::SeqIO;
>>
>> my $id = "bogus_id";
>> my $desc = "bogus description";
>> my $seq = "ATTATTATTATTATT";
>> my $qual = "10 20 30 10 20 30 10 20 30 10 20 30 10 20 30";
>>
>> my $sequal_obj = Bio::Seq::Quality->new(
>> -display_id => $id,
>> -desc => $desc,
>> -seq => $seq,
>> -qual => $qual
>> );
>>
>> my $qualout = Bio::SeqIO->new(
>> -file => ">myfile.qual",
>> -format => 'qual'
>> );
>> my $seqout = Bio::SeqIO->new(
>> -file => ">myfile.seq",
>> -format => 'Fasta'
>> );
>>
>> $seqout->write_seq($sequal_obj);
>> $qualout->write_seq($sequal_obj);
>>
>>
>> ------------------- sample output follows -------------------
>>
>> tdhoufek at aether:~$ cat myfile.seq
>> >bogus_id bogus description
>> ATTATTATTATTATT
>> tdhoufek at aether:~$ cat myfile.qual
>> >bogus_id
>> 10 20 30 10 20 30 10 20 30 10 20 30 10 20 30
>>
>> --------------------------------------------------------------
>>
>>
>>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From dmessina at wustl.edu  Fri May  5 15:24:47 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 10:24:47 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <5A549C57-A310-4623-BC44-787AC8BFD6C2@wustl.edu>

Apologies if this is a repost -- mail troubles this morning.

Hilmar is correct.

 From a cursory walk through the code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     # append inside the block: $desc is only in scope here
     $header .= " $desc";
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");
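
The idea behind this patch can be tried in isolation. What follows is a
minimal sketch only, not the actual Bio::SeqIO::qual code: the helper name
build_qual_header is hypothetical, and it takes plain strings instead of a
sequence object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build a FASTA-style header
# line from an ID and an optional description.
sub build_qual_header {
    my ($id, $desc) = @_;
    my $header = $id;
    if (defined $desc and length $desc) {
        $desc =~ s/\n//g;        # the description must stay on one line
        $header .= " $desc";     # append only when there is a description
    }
    return ">$header";
}

print build_qual_header('bogus_id', 'bogus description'), "\n";
print build_qual_header('bogus_id'), "\n";
```

Appending the description inside the conditional keeps the header free of a
trailing space when no description is set.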

Dave


From dmessina at wustl.edu  Fri May  5 14:53:15 2006
From: dmessina at wustl.edu (David Messina)
Date: Fri, 5 May 2006 09:53:15 -0500
Subject: [Bioperl-l] Bio::Seq::Quality description line problem
In-Reply-To: <445A30E4.6070103@pngg.org>
References: <mailman.1030.1146673703.2090.bioperl-l@lists.open-bio.org>
	<445A30E4.6070103@pngg.org>
Message-ID: <DCF490F7-46CC-47B7-81A7-229BCC819980@wustl.edu>

T.D.,

 From a cursory walk through your code in a debugger, it looks like  
Bio::SeqIO::qual's write_seq method doesn't read the 'desc' out of  
the Bio::Seq::Quality object.

I think there should be something like this:
if ($source->can('desc') and my $desc = $source->desc()) {
     $desc =~ s/\n//g;
     # append inside the block: $desc is only in scope here
     $header .= " $desc";
}

before line 218 in Bio::SeqIO::qual (where the header is printed):
$self->_print (">$header \n");

Dave


From hubert.prielinger at gmx.at  Fri May  5 18:30:24 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 12:30:24 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445AD742.4070408@infotech.monash.edu.au>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au>
Message-ID: <445B99C0.6050407@gmx.at>

hi,
I did as you suggested and got this error message:

Can't call method "next_result" on an undefined value at....

Then I searched the internet and found a thread suggesting that adding
'use strict' solves the problem, but I'm already using 'use strict'.

thanks
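
Since an undefined parser object points at a report that could not be
opened, one way to narrow this down is to check the file itself before
handing it to Bio::SearchIO. The following is a minimal sketch using only
core Perl; report_problem is a hypothetical helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: return a short reason if the report file is
# unusable, or nothing (undef in scalar context) if it looks fine.
sub report_problem {
    my ($path) = @_;
    return "no file name given" unless defined $path and length $path;
    return "file does not exist: $path"  unless -e $path;
    return "not a regular file: $path"   unless -f $path;
    return "file is not readable: $path" unless -r $path;
    return "file is empty: $path"        unless -s $path;
    return;
}

# Usage: check the first command-line argument, if one was given.
if (@ARGV) {
    my $problem = report_problem($ARGV[0]);
    print defined $problem ? "$problem\n" : "$ARGV[0] looks usable\n";
}
```

If the file passes these checks and the parser is still undefined, the
cause is more likely the file contents or the installation than the path.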

Torsten Seemann wrote:
> Hubert Prielinger wrote:
>   
>> if I do so it returns:
>> 0 undef
>>     
>
> That means the value of $search was undef.
> That means that it could not parse or open the BLAST report.
> I repeat the line that I put in my earlier email which you ignored.
>
> # your line
> my $search = Bio::SearchIO->new( ..... );
>
> # then check if it was successful!
> die "could not open blast report" if not defined $search;
>
> --Torsten


From cjfields at uiuc.edu  Fri May  5 19:18:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:18:16 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000001c67078$a9a7ca10$15327e82@pyrimidine>

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From cjfields at uiuc.edu  Fri May  5 19:27:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 14:27:12 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
Message-ID: <000101c67079$e8c86a00$15327e82@pyrimidine>

Sorry, mail got sent before I finished it!  Here I go again...

What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping through
your files and performing a task on each one, such as parsing output.  It
changes into the working directory each time; you should be able to do
something like this:

use File::Find;
use Bio::SearchIO;

my @dirlist = ("/home/Hubert/test");

find (\&dir, @dirlist);

sub printdir {
    return unless /txt$/; 
    return if (-d);
    my $parser = Bio::SearchIO->new(-file => $_,
                                    -format => 'blast');	
    while (my $result = $parser->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }
}

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report.
> > I repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From barry.moore at genetics.utah.edu  Fri May  5 19:39:37 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 5 May 2006 13:39:37 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <445B99C0.6050407@gmx.at>
References: <445A578E.8050207@gmx.at>
	<445A7449.1080607@infotech.monash.edu.au>	<445A94F8.9000903@gmx.at>	<115146F1-E84D-418C-A1B9-CEC5CC4D21C4@genetics.utah.edu>	<445AB2A4.7020405@gmx.at>
	<445AD742.4070408@infotech.monash.edu.au> <445B99C0.6050407@gmx.at>
Message-ID: <7F3D73A6-392E-4728-ACB9-FD3BEDFD3C18@genetics.utah.edu>

Hubert-

If you want to send me your script and input file I'll try to have a  
look at it.

Barry

On May 5, 2006, at 12:30 PM, Hubert Prielinger wrote:

> hi,
> I have done, as you suggested and I got the error message:
>
> Can't call method "next_result" on an undefined value at....
>
> then I looked up at the internet and found a thread which suggested to
> use strict and then the problem is solved....
> but I'm already using use strict..
>
> thanks
>
> Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>
>>> if I do so it returns:
>>> 0 undef
>>>
>>
>> That means the value of $search was undef.
>> That means that it could not parse or open the BLAST report.
>> I repeat the line that I put in my earlier email which you ignored.
>>
>> # your line
>> my $search = Bio::SearchIO->new( ..... );
>>
>> # then check if it was successful!
>> die "could not open blast report" if not defined $search;
>>
>> --Torsten


From cjfields at uiuc.edu  Fri May  5 20:07:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 15:07:53 -0500
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000101c67079$e8c86a00$15327e82@pyrimidine>
Message-ID: <000201c6707f$97aaaba0$15327e82@pyrimidine>

Oops!  This is what happens when I copy and paste in a hurry.

> use File::Find;
> use Bio::SearchIO;
> 
> my @dirlist = ("/home/Hubert/test");
> 
> find (\&dir, @dirlist);
> 
> sub printdir {
  ^^^^^^^^^^^

Should be: sub dir {

>     return unless /txt$/;
>     return if (-d);
>     my $parser = Bio::SearchIO->new(-file => $_,
>                                     -format => 'blast');
>     while (my $result = $parser->next_result) {
>         while (my $hit = $result->next_hit) {
>             while (my $hsp = $hit->next_hsp) {
>                 # do stuff here
>             }
>         }
>     }
> }

Hubert, if the file you are parsing looks fine (i.e. valid BLAST output),
post it and your script on Bugzilla and let us take a look.  Leave out your
password though ; >
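
For reference, the File::Find part can be tried on its own, without any
parsing. This is a minimal sketch using only core modules; find_reports is
a hypothetical name, and the .txt suffix is taken from the example above.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect the full paths of all regular files ending
# in .txt below the given directories.
sub find_reports {
    my @dirs = @_;
    my @found;
    find(
        sub {
            return unless /\.txt$/;            # $_ holds the bare file name
            return if -d;                      # skip directories
            push @found, $File::Find::name;    # full path of the current file
        },
        @dirs
    );
    my @sorted = sort @found;
    return @sorted;
}

# Usage: list matching files below the directories given on the command line.
print "$_\n" for find_reports(grep { -d } @ARGV);
```

Passing an anonymous sub to find() avoids the mismatch between the
callback reference and the sub name that caused the slip above.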

Chris


From golharam at umdnj.edu  Fri May  5 19:58:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri, 05 May 2006 15:58:03 -0400
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <000001c67078$a9a7ca10$15327e82@pyrimidine>
Message-ID: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>

I'm not sure how applicable this is, but I've seen a problem with Perl
if the LANG environment variable contains UTF8 (e.g. LANG=en_US.UTF8).
I changed mine to en_US and lots of Perl string parsing problems went
away.
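
A quick way to see whether this applies is to look at the locale variables
from inside Perl. This is a minimal sketch; locale_uses_utf8 is a
hypothetical helper, and checking LC_ALL and LC_CTYPE in addition to LANG
is an assumption beyond what is described above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any of the usual locale variables in the
# given hash names a UTF-8 locale.
sub locale_uses_utf8 {
    my ($env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$var};
        return 1 if defined $value and $value =~ /utf-?8/i;
    }
    return 0;
}

print "UTF-8 locale in effect\n" if locale_uses_utf8(\%ENV);
```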

Also, what about running the bioperl tests on your installation (make
test).  What happens?


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Friday, May 05, 2006 3:18 PM
To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore


What happens if you add the verbose flag?

my $search = new Bio::SearchIO (-verbose => 1,
                                -format => 'blast',
                                -file => $file);

Added thought : you might want to look at File::Find for stepping
through your files and performing a task on each one, such as parsing
output.  It changes into the working directory each time; you should be
able to do something like this:

use File::Find;
use Bio::SearchIO;


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 1:30 PM
> To: Torsten Seemann; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
> 
> hi,
> I have done, as you suggested and I got the error message:
> 
> Can't call method "next_result" on an undefined value at....
> 
> then I looked up at the internet and found a thread which suggested to

> use strict and then the problem is solved.... but I'm already using 
> use strict..
> 
> thanks
> 
> Torsten Seemann wrote:
> > Hubert Prielinger wrote:
> >
> >> if I do so it returns:
> >> 0 undef
> >>
> >
> > That means the value of $search was undef.
> > That means that it could not parse or open the BLAST report. I 
> > repeat the line that I put in my earlier email which you ignored.
> >
> > # your line
> > my $search = Bio::SearchIO->new( ..... );
> >
> > # then check if it was successful!
> > die "could not open blast report" if not defined $search;
> >
> > --Torsten


From cjfields at uiuc.edu  Fri May  5 21:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 16:56:29 -0500
Subject: [Bioperl-l] Bug in genbank parsing:  CONTIG gaps
In-Reply-To: <001701c6704f$90dbd090$15327e82@pyrimidine>
Message-ID: <000901c6708e$c77442b0$15327e82@pyrimidine>

Okay, I have changed the way the CONTIG line is handled in
Bio::SeqIO::genbank.  It was being handled as a feature; it is now stored
as a Bio::Annotation::SimpleValue object whose value is the entire CONTIG
section.  It seems to pass tests fine, but I'm working on Windows and my
wife's iBook went to the great desktop in the sky (motherboard), so I can't
test it there.  Pulling the file using Bio::DB::GenBank (with the
no-redirect flag) works w/o crashing out.
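
With the CONTIG section kept as one string, anyone who needs the pieces can
split it up afterwards. The following is a minimal sketch and not BioPerl
code: parse_contig is a hypothetical helper, and it assumes a flat list of
accession ranges, complement(...) ranges and gap(N) entries with numeric
lengths, as in the NW_925173 excerpt. Anything else makes it die instead of
looping.

```perl
use strict;
use warnings;

# Hypothetical helper: split a CONTIG value into segments and gaps.
# Assumes a flat, non-nested list of elements.
sub parse_contig {
    my ($text) = @_;
    $text =~ s/\s+//g;                   # the value wraps across lines
    $text =~ s/^join\((.*)\)$/$1/;       # the outer join() may be absent
    my @parts;
    for my $item (split /,/, $text) {
        if ($item =~ /^gap\((\d+)\)$/) {
            push @parts, { type => 'gap', length => $1 };
        }
        elsif ($item =~ /^complement\(([\w.]+):(\d+)\.\.(\d+)\)$/) {
            push @parts, { type => 'segment', accession => $1,
                           start => $2, end => $3, strand => -1 };
        }
        elsif ($item =~ /^([\w.]+):(\d+)\.\.(\d+)$/) {
            push @parts, { type => 'segment', accession => $1,
                           start => $2, end => $3, strand => 1 };
        }
        else {
            die "unrecognised CONTIG element: $item\n";
        }
    }
    return @parts;
}

my @parts = parse_contig(
    'join(AADB02014319.1:1..377558,gap(20),complement(AADB02014320.1:1..431263))'
);
printf "%d elements\n", scalar @parts;
```

Because each element is matched once and unknown input is rejected, the
loop always ends on finite input.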

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 9:24 AM
> To: 'Hilmar Lapp'
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> 
> I'm not sure it's a valid CONTIG file w/o the join(...). This is a chunk
> from the longer file Michael used as an example here (NW_925173). I believe
> the CONTIG line is currently handled like a feature so I think it goes
> through Bio::SeqIO::FTHelper, which is where Michael mentions his bugfix is;
> I think it's getting beaten up in there somehow. I may see what happens if
> it's treated like a WGS line (like a Bio::Annotation::SimpleValue object)
> and just glob the whole mess together as is.
> 
> Chris
> 
> ...
> FEATURES             Location/Qualifiers
>      source          1..44976370
>                      /organism="Homo sapiens"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:9606"
>                      /chromosome="11"
> CONTIG      join(AADB02014316.1:1..1482320,gap(67),AADB02014317.1:1..577321,
>             gap(441),AADB02014318.1:1..173584,gap(676),
>             AADB02014319.1:1..377558,gap(20),
>             complement(AADB02014320.1:1..431263),gap(20),
>             AADB02014321.1:1..794957,gap(1241),AADB02014322.1:1..1366198,
>             gap(6446),AADB02014323.1:1..3366,gap(20),AADB02014324.1:1..4771,
>             gap(4611),AADB02014325.1:1..383881,gap(20),
>             complement(AADB02014326.1:1..381633),gap(1930),
>             complement(AADB02014327.1:1..460053),gap(20),
>             AADB02014328.1:1..4186,gap(1587),
> ...
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> > Sent: Thursday, May 04, 2006 5:39 PM
> > To: Chris Fields
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> >
> > The two notations are equivalent and syntactically correct, or so I
> > believe ... I don't think 100% verbatim preservation should be the
> > goal. Or am I missing the point?
> >
> > On May 4, 2006, at 6:27 PM, Chris Fields wrote:
> >
> > > Here's another odd bit.  This is what I get for the CONTIG line when I
> > > passed a simple contig file (NW_925062, with one join) through
> > > Bio::SeqIO:
> > >
> > > -----------------------------------
> > > ....
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /db_xref="taxon:9606"
> > >                      /mol_type="genomic DNA"
> > >                      /chromosome="11"
> > >                      /organism="Homo sapiens"
> > > CONTIG      AADB02014027.1:1..8541
> > >
> > > //
> > > -----------------------------------
> > > Here's the original:
> > > -----------------------------------
> > > FEATURES             Location/Qualifiers
> > >      source          1..8541
> > >                      /organism="Homo sapiens"
> > >                      /mol_type="genomic DNA"
> > >                      /db_xref="taxon:9606"
> > >                      /chromosome="11"
> > > CONTIG      join(AADB02014027.1:1..8541)
> > > //
> > > -----------------------------------
> > >
> > > Looks like it lopped out the 'join' here as well.
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > >> Sent: Thursday, May 04, 2006 1:41 PM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>
> > >> Are you using the CONTIG record or the full GenBank file?  I see
> > >> problems with both (using bioperl-live) which seem unrelated to one
> > >> another.  The full file seems to be running a bit slow b/c the full
> > >> GenBank record is huge (~55 MB) but the CONTIG file does exactly what
> > >> you said (runs out of memory).
> > >>
> > >> Chris
> > >>
> > >>> -----Original Message-----
> > >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>> bounces at lists.open-bio.org] On Behalf Of Michael Rogoff
> > >>> Sent: Tuesday, May 02, 2006 10:32 PM
> > >>> To: bioperl-l at lists.open-bio.org
> > >>> Subject: [Bioperl-l] Bug in genbank parsing: CONTIG gaps
> > >>>
> > >>>
> > >>> I've encountered a pretty serious bug in Bio::SeqIO when parsing
> > >>> certain genbank files that contain CONTIG entries with gaps.  One
> > >>> such record is NW_925173.
> > >>>
> > >>> When I try to parse this file using Bio::SeqIO::genbank, it will
> > >>> enter an infinite loop and spin until it runs out of memory.
> > >>>
> > >>> I'm pretty certain it relates to this bug:
> > >>> http://bugzilla.bioperl.org/show_bug.cgi?id=1319 which seems to
> > >>> indicate that genbank records with CONTIG gaps are not valid and
> > >>> can't be parsed.  But this bug actually claims to be fixed, which is
> > >>> strange, since looking at the code for FTLocationFactory (where the
> > >>> loop is) it's still right there.  I assume that this may be fixed in
> > >>> other contexts but is still not fixed in Bio::SeqIO::genbank?  Or am
> > >>> I doing something wrong?
> > >>>
> > >>> I think that this should probably be filed as an open bug.  I would
> > >>> think that even if bioperl isn't interested in parsing this type of
> > >>> file via SeqIO, certainly you'd want to ensure that no finite input
> > >>> file would send the parser into an infinite loop.  Have others
> > >>> encountered this problem?  Is there any plan to address it?
> > >>>
> > >>> Thanks very much for any information or help!
> > >>>
> > >>> -Mike
> > >>>
> > >>> P.S.  I've played around with my version of FTLocationFactory and it
> > >>> seems to actually work and parse the gaps.  I'm not sure if I've
> > >>> created other bugs or if it works in all cases, but at least the
> > >>> parser doesn't die.  I also don't know that my hacky code is
> > >>> appropriate for putting back in to BioPerl, but I'm happy to provide
> > >>> it if someone wants to check it out and/or consider it for checkin.
> > >>>
> > >>>
> > >>>
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================


From hubert.prielinger at gmx.at  Fri May  5 23:54:55 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 17:54:55 -0600
Subject: [Bioperl-l] [BULK]  Re:  can't parse blast file anymore
In-Reply-To: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
References: <02f101c6707e$39a03a30$2f01a8c0@GOLHARMOBILE1>
Message-ID: <445BE5CF.2000007@gmx.at>

hi Ryan,
nothing happened when I added the verbose flag.
And how can I test my bioperl installation?


Ryan Golhar wrote:
> I'm not sure how applicable this is, but I've seen a problem with Perl
> if the LANG environment variable contain UTF8 (ex LANG=en_US.UTF8).
> I've changed mine to en_US and lots of perl string parsing problems went
> away.
>
> Also, what about running the bioperl tests on your installation (make
> test).  What happens?
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, May 05, 2006 3:18 PM
> To: 'Hubert Prielinger'; 'Torsten Seemann'; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>
>
> What happens if you add the verbose flag?
>
> my $search = new Bio::SearchIO (-verbose => 1,
>                                 -format => 'blast',
>                                 -file => $file);
>
> Added thought : you might want to look at File::Find for stepping
> through your files and performing a task on each one, such as parsing
> output.  It changes into the working directory each time; you should be
> able to do something like this:
>
> use File::Find;
> use Bio::SearchIO;
>
>
>
>
> -----Original Message-----
>   
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- 
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Friday, May 05, 2006 1:30 PM
>> To: Torsten Seemann; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] [BULK] Re: can't parse blast file anymore
>>
>> hi,
>> I have done, as you suggested and I got the error message:
>>
>> Can't call method "next_result" on an undefined value at....
>>
>> then I looked up at the internet and found a thread which suggested to
>>     
>
>   
>> use strict and then the problem is solved.... but I'm already using 
>> use strict..
>>
>> thanks
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>
>>>       
>>>> if I do so it returns:
>>>> 0 undef
>>>>
>>>>         
>>> That means the value of $search was undef.
>>> That means that it could not parse or open the BLAST report. I 
>>> repeat the line that I put in my earlier email which you ignored.
>>>
>>> # your line
>>> my $search = Bio::SearchIO->new( ..... );
>>>
>>> # then check if it was successful!
>>> die "could not open blast report" if not defined $search;
>>>
>>> --Torsten


From hubert.prielinger at gmx.at  Sat May  6 00:01:11 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 18:01:11 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <445BE747.5020202@gmx.at>

hi
I have posted my script and the blast file to bugzilla......


From hubert.prielinger at gmx.at  Sat May  6 01:21:33 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 05 May 2006 19:21:33 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BE747.5020202@gmx.at>
References: <445BE747.5020202@gmx.at>
Message-ID: <445BFA1D.5060008@gmx.at>

The bugzilla posting didn't work. What is the exact email address for
bugzilla?

Hubert Prielinger wrote:
> hi
> I have posted my script and the blast file to bugzilla......


From cjfields at uiuc.edu  Sat May  6 01:38:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 20:38:47 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445BFA1D.5060008@gmx.at>
Message-ID: <000d01c670ad$d209f980$15327e82@pyrimidine>

Hubert,

Calm down.  Breathe in, breathe out.  Relax.......

Okay, here is the place to start.  Read the instructions there first.

http://www.bioperl.org/wiki/Bugs

Bugs are reported at this site:

http://bugzilla.bioperl.org/

Again, follow the instructions.  You will have to create a user name and
password to submit.  Once that is set up, click the "Submit a new bug" link
on the main bugzilla page.  On that page, fill out all information first and
a description of the error and hit 'commit'.   Add the BLAST report and some
sample script by clicking on the "Create a New Attachment" link (you'll have
to do this for each file).  Once you go back to the bug page you should see
two attachments and the bug report.  Any commits get sent through the
bioperl-guts-l mail list which most developers subscribe to, so they'll know
there's a new bug out there.  

I will not be able to get to it personally; our home computer died a slow
painful death today (RIP 2002-2006) but I can get to it next week.  If you
post the bug, somebody might be able to get to it sooner!

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, May 05, 2006 8:22 PM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore
> 
> The bugzilla posting didn't work. What is the exact email address for
> bugzilla?
> 
> Hubert Prielinger wrote:
> > hi
> > I have posted my script and the blast file to bugzilla......


From cjfields at uiuc.edu  Sat May  6 02:26:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 5 May 2006 21:26:35 -0500
Subject: [Bioperl-l] Changes to NCBIHelper (RE: CONTIG, genome files)
Message-ID: <000f01c670b4$7f22f760$15327e82@pyrimidine>

I committed a change to NCBIHelper that permits the downloading of CON
(contig) files and corrects an issue where no sequence features were saved
when rebuilding those files.  If you use Bio::DB::GenBank regularly to
download genome files, this likely will NOT affect your code unless you
explicitly set the format type to 'genbank', like so:

$factory = Bio::DB::GenBank->new(-format => 'gb'); # or 'genbank'

I believe most will not have that setting since the default was already
'gb'.  Now, the default is 'gbwithparts', which returns the full sequence
regardless.  If it is a file with a CONTIG line, the sequence is built on
NCBI's end (and will include seq features if they are present).  As Brian
said, we'll let NCBI do the work for us!

If you need the actual file w/o sequence, then you can set the format to
'genbank' (like above) and it will grab it for you.  There was an unrelated
problem with CONTIG line parsing that I also fixed, where I changed the
format over to a Bio::Annotation::SimpleValue as a workaround for now; for
some reason some CON files were misparsed and resulted in infinite loops or
missing 'join' statements.  

Chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From hubert.prielinger at gmx.at  Sat May  6 22:22:05 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 06 May 2006 16:22:05 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <000d01c670ad$d209f980$15327e82@pyrimidine>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
Message-ID: <445D218D.2030504@gmx.at>

ok, thanks
I have submitted the bug
bug #1994


Chris Fields wrote:
> Hubert,
>
> Calm down.  Breathe in, breathe out.  Relax.......
>
> Okay, here is the place to start.  Read the instructions there first.
>
> http://www.bioperl.org/wiki/Bugs
>
> Bugs are reported at this site:
>
> http://bugzilla.bioperl.org/
>
> Again, follow the instructions.  You will have to create a user name and
> password to submit.  Once that is set up, click the "Submit a new bug" link
> on the main bugzilla page.  On that page, fill out all information first and
> a description of the error and hit 'commit'.   Add the BLAST report and some
> sample script by clicking on the "Create a New Attachment" link (you'll have
> to do this for each file).  Once you go back to the bug page you should see
> two attachments and the bug report.  Any commits get sent through the
> bioperl-guts-l mail list which most developers subscribe to, so they'll know
> there's a new bug out there.  
>
> I will not be able to get to it personally; our home computer died a slow
> painful death today (RIP 2002-2006) but I can get to it next week.  If you
> post the bug, somebody might be able to get to it sooner!
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Friday, May 05, 2006 8:22 PM
>> To: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore
>>
>> they bugzilla posting didn't work, what is the exact email address for
>> bugzilla
>>
>> Hubert Prielinger wrote:
>>     
>>> hi
>>> I have posted my script and the blast file to bugzilla......
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>>       
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From torsten.seemann at infotech.monash.edu.au  Sun May  7 00:57:14 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 07 May 2006 10:57:14 +1000
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D218D.2030504@gmx.at>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at>
Message-ID: <445D45EA.8020804@infotech.monash.edu.au>

Hubert Prielinger wrote:
> ok, thanks
> I have submitted the bug
> bug #1994

This is a line from the script you sent to Bugzilla:

my $search = new Bio::SearchIO (
-verbose => 1,-format => 'blast', -file => $file)
or die "could not open blast report" if not defined my $search;

Although syntactically correct, I don't think it is doing what you want.
Please change it to this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die 
"could not open blast report";

or alternatively, this:

my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
if (not defined $search) {
   die "could not open blast report";
}

and let us know what happens.

All the example output you have supplied still suggests that Bio::SearchIO
cannot load or parse your blast report.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia


From mamillerpa at yahoo.com  Sat May  6 23:07:30 2006
From: mamillerpa at yahoo.com (Mark A. Miller)
Date: Sat, 6 May 2006 16:07:30 -0700 (PDT)
Subject: [Bioperl-l] Can't parse bacterial strain from EMBL OS or RC
	lines
In-Reply-To: <C07E8961.84F2%osborne1@optonline.net>
Message-ID: <20060506230730.56480.qmail@web50410.mail.yahoo.com>

Thanks for your responses, Jason and Brian.

Brian, your suggestion works great.  I had really hoped that by parsing
the OS line as well, I could be sure I wasn't missing any sequences
from my organisms.  Well, I gave up on that and just obtained the NCBI
taxonomy values.  I find it pretty easy to work with them in bioperl. 
Unfortunately, walking through all of Trembl takes a while, and I'm
getting this error:

  Can't call method "ncbi_taxid" on an undefined value at ./ga2.pl line
55, <GEN0> line 3253682.

When I try to extract annotations, etc., from entries like:

  DHE4_UNKP

with:

  my $species_object = $seq->species;
  my $taxid_string = $species_object->ncbi_taxid;

I guess I have to write an error handler for incomplete taxonomy
values.
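
A minimal sketch of such a guard, assuming the failure comes from
$seq->species returning undef for entries without usable taxonomy.  The
MockSeq and MockSpecies classes and the taxid_or_undef helper are
hypothetical stand-ins so that the sketch runs without BioPerl or a TrEMBL
file; they are not part of any library:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the sequence and species objects.
{
    package MockSpecies;
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub ncbi_taxid { return $_[0]{taxid} }
}
{
    package MockSeq;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub species { return $_[0]{species} }
}

# Return the taxid, or undef when the entry carries no species object.
sub taxid_or_undef {
    my ($seq) = @_;
    my $species_object = $seq->species;
    return unless defined $species_object;
    return $species_object->ncbi_taxid;
}

my $with    = MockSeq->new(species => MockSpecies->new(taxid => 562));
my $without = MockSeq->new;    # like an entry with incomplete taxonomy

for my $seq ($with, $without) {
    my $taxid = taxid_or_undef($seq);
    print defined $taxid ? "taxid $taxid\n" : "no taxonomy, skipping\n";
}
```

In a real loop over a sequence stream the same test would sit right after
the call to species, so that one incomplete entry does not end the run.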

Bye for now,
Mark


--- Brian Osborne <osborne1 at optonline.net> wrote:

> Mark,
> 
> The RC line is part of the description of a reference, I'm guessing
> 'RC'
> stands for Reference Comment. In order to get the attributes of a
> reference
> you'll first do something like:
> 
> my $anno_collection = $seq->annotation;
> my @references = $anno_collection->get_Annotations('reference');
> 
> To get the comment field for a specific reference you can do:
> 
> $references[0]->comment;
> 
> See the Feature-Annotation HOWTO for more information on Annotations,
> the
> Reference object is a kind of Annotation object.
> 
> Brian O.
> 
> 
> On 5/3/06 3:34 PM, "Mark A. Miller" <mamillerpa at yahoo.com> wrote:
> 
> > Yeah.  Do you have any experience with that?
> > 
> > Mark
> > 
> > --- Brian Osborne <osborne1 at optonline.net> wrote:
> > 
> >> Mark,
> >> 
> >> So you're trying to get the information in the RC line from a
> >> Swissprot
> >> format file?
> >> 
> >> Brian O.
> > 
> > 
> > ---   ---   ---   ---   ---   ---   ---   ---
> > 
> > Mark A. Miller
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com 
> 
> 
> 


---   ---   ---   ---   ---   ---   ---   ---

Mark A. Miller



From cjfields at uiuc.edu  Sun May  7 03:33:40 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sat, 6 May 2006 22:33:40 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>

The -verbose flag was my suggestion; it should output a ton of debugging info 
from SearchIO::blast; if you see anything there, then it means that it's at least 
attempting to parse the report.  

Of course I can't test this myself at the moment since my wife's computer died 
(along with the bioperl setup); I'm using a loaner computer at the moment.

Chris

---- Original message ----
>Date: Sun, 07 May 2006 10:57:14 +1000
>From: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>  
>Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore  
>To: Hubert Prielinger <hubert.prielinger at gmx.at>
>Cc: bioperl-l at bioperl.org
>
>Hubert Prielinger wrote:
>> ok, thanks
>> I have submitted the bug
>> bug #1994
>
>This is a line from the script you sent to Bugzilla:
>
>my $search = new Bio::SearchIO (
>-verbose => 1,-format => 'blast', -file => $file)
>or die "could not open blast report" if not defined my $search;
>
>Although syntactically correct, I don't think it is doing what you want.
>Please change it to this:
>
>my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or die 
>"could not open blast report";
>
>or alternatively, this:
>
>my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>if (not defined $search) {
>   die "could not open blast report";
>}
>
>and let us know what happens.
>
>all the example output you have supplied still suggests that Bio::SearchIO can 
>not load or parse your blast report.
>
>-- 
>Torsten Seemann
>Victorian Bioinformatics Consortium, Monash University, Australia
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From chen_li3 at yahoo.com  Sun May  7 07:34:55 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 00:34:55 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
Message-ID: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>

Hi all,

I use Bio::Tools::Run::Primer3 to design PCR primers.
I want to change some default values, for example, to
increase the PCR product size to 490-510 bp instead of
using the default value of 100-300 bp. What should I
do ?  


Thanks,

Li



From jason.stajich at duke.edu  Sun May  7 20:49:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 16:49:29 -0400
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
References: <65109dc1.b47d779e.81acb00@expms6.cites.uiuc.edu>
Message-ID: <F69897D1-C65F-47F3-8324-EC2E52A2ACCD@duke.edu>

The problem is in how SearchIO was being initialized, the code  
basically looked like this:

  my $x = new Foo() or die if not defined my $x;

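which is problematic for two reasons.
  1) if not defined my $x;
  The second "my $x" declares a new, undefined variable, so
  "defined my $x" will ALWAYS be false and the test says nothing
  about the object you just created.

  2) my $x = new Foo() or die ;
  Will cast the new object as a boolean.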

Whenever things aren't working, take a look at the code and try and  
walk through any shortcuts.  For clarity make it a two-step process
my $x = new Foo();
die "no valid $x" unless defined $x;

Please note that currently BioPerl WILL die (via throw) if you try  
and ask for an invalid file when you initialize a new IO object  --  
this is handled by code in Bio::Root::IO (line 313 in Bio/Root/IO.pm)  
which all the IO objects use, so you don't really need to do a test  
on the object after all.
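
A minimal sketch of the two-step pattern combined with catching a
constructor that dies.  The Foo class here is a hypothetical stand-in whose
new() dies on a missing file, mimicking the behaviour described above; it is
not BioPerl code, and open_report is just an illustrative helper name:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for an IO class whose constructor dies when the
# file does not exist.
{
    package Foo;
    sub new {
        my ($class, %args) = @_;
        die "Could not open $args{-file}\n" unless -e $args{-file};
        return bless {%args}, $class;
    }
}

# Step 1: construct inside eval so a die/throw can be caught.
# Step 2: test the result in a separate statement.
sub open_report {
    my ($file) = @_;
    my $x = eval { Foo->new(-file => $file) };
    warn "constructor failed: $@" unless defined $x;
    return $x;
}

my ($fh, $existing) = tempfile(UNLINK => 1);    # a file that does exist
my $ok  = open_report($existing);
my $bad = open_report('no_such_report.blast');  # assumed not to exist

print defined $ok  ? "first: got an object\n"  : "first: failed\n";
print defined $bad ? "second: got an object\n" : "second: failed\n";
```

Wrapping the constructor in eval is only needed if the script should carry
on after a bad file; otherwise letting the exception end the script is fine.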

--jason
On May 6, 2006, at 11:33 PM, Christopher Fields wrote:

> The -verbose flag was my suggestion; it should output a ton of  
> debugging info
> from SearchIO::blast; if you see anything there, then it means that  
> it's at least
> attempting to parse the report.
>
> Of course I can't test this myself at the moment since my wife's  
> computer died
> (along with the bioperl setup); I'm using a loaner computer at the  
> moment.
>
> Chris
>
> ---- Original message ----
>> Date: Sun, 07 May 2006 10:57:14 +1000
>> From: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>
>> Subject: Re: [Bioperl-l] [BULK]  can't parse blast file anymore
>> To: Hubert Prielinger <hubert.prielinger at gmx.at>
>> Cc: bioperl-l at bioperl.org
>>
>> Hubert Prielinger wrote:
>>> ok, thanks
>>> I have submitted the bug
>>> bug #1994
>>
>> This is a line from the script you sent to Bugzilla:
>>
>> my $search = new Bio::SearchIO (
>> -verbose => 1,-format => 'blast', -file => $file)
>> or die "could not open blast report" if not defined my $search;
>>
>> Although syntactically correct, I don't think it is doing what you  
>> want.
>> Please change it to this:
>>
>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file)  
>> or die
>> "could not open blast report";
>>
>> or alternatively, this:
>>
>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>> if (not defined $search) {
>>   die "could not open blast report";
>> }
>>
>> and let us know what happens.
>>
>> all the example output you have supplied still suggests that  
>> Bio::SearchIO can
>> not load or parse your blast report.
>>
>> -- 
>> Torsten Seemann
>> Victorian Bioinformatics Consortium, Monash University, Australia
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Sun May  7 21:01:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 17:01:29 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
References: <20060507073455.11849.qmail@web36815.mail.mud.yahoo.com>
Message-ID: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>

I put up some info on the wiki (and I encourage other people to do  
the same!)
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3

Set the command line parameters by just calling a function of the  
name of the parameter.  To get a list of the available options, this  
perl code will report it to you:

# what are the arguments, and what do they mean?
   my $args = $primer3->arguments;

   print "ARGUMENT\tMEANING\n";
   foreach my $key (keys %{$args}) {print "$key\t", $$args{$key}, "\n"}

The info for PRODUCT_SIZE_RANGE is:
   (size range list, default 100-300) space separated list of product  
sizes eg <a>-<b> <x>-<y>

I believe you can set the PCR product size with
   $primer3->primer_product_size_range("490-510");

-jason
On May 7, 2006, at 3:34 AM, chen li wrote:

> Hi all,
>
> I use Bio::Tools::Run::Primer3 to design PCR primers.
> I want to change some default values, for example, to
> increase the PCR product size to 490-510 bp instead of
> using the default value of 100-300 bp. What should I
> do ?
>
>
> Thanks,
>
> Li
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Mon May  8 01:18:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 7 May 2006 18:18:17 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <C9CE0912-9C48-4404-AB56-054A425FE3A0@duke.edu>
Message-ID: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>

Hi Jason,

I added the line
$primer3->primer_product_size_range("490-510");
to my script, but it doesn't work, nor does primer3
complain about it.

Li

--- Jason Stajich <jason.stajich at duke.edu> wrote:

> I put up some info on the wiki (and I encourage
> other people to do  
> the same!)
>
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3
> 
> Set the command line parameters by just calling a
> function of the  
> name of the parameter.  To get a list of the
> available options, this  
> perl code will report it to you:
> 
> # what are the arguments, and what do they mean?
>    my $args = $primer3->arguments;
> 
>    print "ARGUMENT\tMEANING\n";
>    foreach my $key (keys %{$args}) {print "$key\t",
> $$args{$key}, "\n"}
> 
> The info for PRODUCT_SIZE_RANGE is:
>    (size range list, default 100-300) space
> separated list of product  
> sizes eg <a>-<b> <x>-<y>
> 
> I believe you can set the PCR product size with
>    $primer3->primer_product_size_range("490-510");
> 
> -jason
> On May 7, 2006, at 3:34 AM, chen li wrote:
> 
> > Hi all,
> >
> > I use Bio::Tools::Run::Primer3 to design PCR
> primers.
> > I want to change some default values, for example,
> to
> > increase the PCR product size to 490-510 bp
> instead of
> > using the default value of 100-300 bp. What should
> I
> > do ?
> >
> >
> > Thanks,
> >
> > Li
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 




From hubert.prielinger at gmx.at  Mon May  8 01:41:14 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 07 May 2006 19:41:14 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <445D45EA.8020804@infotech.monash.edu.au>
References: <000d01c670ad$d209f980$15327e82@pyrimidine>
	<445D218D.2030504@gmx.at> <445D45EA.8020804@infotech.monash.edu.au>
Message-ID: <445EA1BA.9050301@gmx.at>

hi,
I have corrected that and now I finally got a few error messages:

blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
Madden, Alejandro A. Schäffer,
blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
David J. Lipman
blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
generation of
blast.pm: unrecognized line protein database search programs", Nucleic 
Acids Res. 25:3389-3402.
blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1

after that line it stops without terminating....


Torsten Seemann wrote:
> Hubert Prielinger wrote:
>> ok, thanks
>> I have submitted the bug
>> bug #1994
>
> This is a line from the script you sent to Bugzilla:
>
> my $search = new Bio::SearchIO (
> -verbose => 1,-format => 'blast', -file => $file)
> or die "could not open blast report" if not defined my $search;
>
> Although syntactically correct, I don't think it is doing what you want.
> Please change it to this:
>
> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or 
> die "could not open blast report";
>
> or alternatively, this:
>
> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
> if (not defined $search) {
>   die "could not open blast report";
> }
>
> and let us know what happens.
>
> all the example output you have supplied still suggests that 
> Bio::SearchIO can not load or parse your blast report.
>


From cjfields at uiuc.edu  Mon May  8 02:04:13 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 7 May 2006 21:04:13 -0500
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
Message-ID: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>

These are debugging lines (not errors); you still have the -verbose flag set.  

Did you follow Jason's advice?  I believe he's right on the money about the issue 
at hand...

Chris

---- Original message ----
>Date: Sun, 07 May 2006 19:41:14 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] [BULK] can't parse blast file anymore  
>To: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>, bioperl-
l at bioperl.org, Chris Fields <cjfields at uiuc.edu>, Jason Stajich 
<jason.stajich at duke.edu>
>
>hi,
>I have corrected that and now I finally I got a few error messages:
>
>blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
>Madden, Alejandro A. Schäffer,
>blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
>David J. Lipman
>blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
>generation of
>blast.pm: unrecognized line protein database search programs", Nucleic 
>Acids Res. 25:3389-3402.
>blast.pm: unrecognized line RID: 
1137529800-24476-151611170370.BLASTQ1
>
>after that line it stops without terminating....
>
>
>Torsten Seemann wrote:
>> Hubert Prielinger wrote:
>>> ok, thanks
>>> I have submitted the bug
>>> bug #1994
>>
>> This is a line from the script you sent to Bugzilla:
>>
>> my $search = new Bio::SearchIO (
>> -verbose => 1,-format => 'blast', -file => $file)
>> or die "could not open blast report" if not defined my $search;
>>
>> Although syntactically correct, I don't think it is doing what you want.
>> Please change it to this:
>>
>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or 
>> die "could not open blast report";
>>
>> or alternatively, this:
>>
>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>> if (not defined $search) {
>>   die "could not open blast report";
>> }
>>
>> and let us know what happens.
>>
>> all the example output you have supplied still suggests that 
>> Bio::SearchIO can not load or parse your blast report.
>>
>


From jason.stajich at duke.edu  Mon May  8 02:47:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 7 May 2006 22:47:00 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <430DE892-8EE8-4FC9-8BAC-7D344C876B72@duke.edu>

I'm not really familiar with the module beyond what the  
documentation says.  Did you try using the add_targets method to  
add arguments instead?  I had thought the AUTOLOAD method took care  
of access to the cmd line arguments as it does for the other Run  
modules but I am not really sure.  Perhaps folks on the list who use  
this module can provide better advice.

-jason
On May 7, 2006, at 9:18 PM, chen li wrote:

> Hi Jason,
>
> I add the line code
> $primer3->primer_product_size_range("490-510");
>  to my script. But it doesn't work nor primer3
> complains it.
>
> Li
>
> --- Jason Stajich <jason.stajich at duke.edu> wrote:
>
>> I put up some info on the wiki (and I encourage
>> other people to do
>> the same!)
>>
> http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3
>>
>> Set the command line parameters by just calling a
>> function of the
>> name of the parameter.  To get a list of the
>> available options, this
>> perl code will report it to you:
>>
>> # what are the arguments, and what do they mean?
>>    my $args = $primer3->arguments;
>>
>>    print "ARGUMENT\tMEANING\n";
>>    foreach my $key (keys %{$args}) {print "$key\t",
>> $$args{$key}, "\n"}
>>
>> The info for PRODUCT_SIZE_RANGE is:
>>    (size range list, default 100-300) space
>> separated list of product
>> sizes eg <a>-<b> <x>-<y>
>>
>> I believe you can set the PCR product size with
>>    $primer3->primer_product_size_range("490-510");
>>
>> -jason
>> On May 7, 2006, at 3:34 AM, chen li wrote:
>>
>>> Hi all,
>>>
>>> I use Bio::Tools::Run::Primer3 to design PCR
>> primers.
>>> I want to change some default values, for example,
>> to
>>> increase the PCR product size to 490-510 bp
>> instead of
>>> using the default value of 100-300 bp. What should
>> I
>>> do ?
>>>
>>>
>>> Thanks,
>>>
>>> Li
>>>
>>> __________________________________________________
>>> Do You Yahoo!?
>>> Tired of spam?  Yahoo! Mail has the best spam
>> protection around
>>> http://mail.yahoo.com
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>>
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Mon May  8 14:49:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 10:49:22 -0400
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <C084D2B2.85D7%osborne1@optonline.net>

Li,

Read the documentation, Bio::Tools::Run::Primer3. It shows examples of the
correct syntax. Also look at bioperl-run/t/Primer3.t.

Brian O.


On 5/7/06 9:18 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> Hi Jason,
> 
> I add the line code
> $primer3->primer_product_size_range("490-510");
>  to my script. But it doesn't work nor primer3
> complains it.
> 
> Li
> 
> --- Jason Stajich <jason.stajich at duke.edu> wrote:
> 
>> I put up some info on the wiki (and I encourage
>> other people to do
>> the same!)
>> 
> http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3
>> 
>> Set the command line parameters by just calling a
>> function of the 
>> name of the parameter.  To get a list of the
>> available options, this
>> perl code will report it to you:
>> 
>> # what are the arguments, and what do they mean?
>>    my $args = $primer3->arguments;
>> 
>>    print "ARGUMENT\tMEANING\n";
>>    foreach my $key (keys %{$args}) {print "$key\t",
>> $$args{$key}, "\n"}
>> 
>> The info for PRODUCT_SIZE_RANGE is:
>>    (size range list, default 100-300) space
>> separated list of product
>> sizes eg <a>-<b> <x>-<y>
>> 
>> I believe you can set the PCR product size with
>>    $primer3->primer_product_size_range("490-510");
>> 
>> -jason
>> On May 7, 2006, at 3:34 AM, chen li wrote:
>> 
>>> Hi all,
>>> 
>>> I use Bio::Tools::Run::Primer3 to design PCR
>> primers.
>>> I want to change some default values, for example,
>> to
>>> increase the PCR product size to 490-510 bp
>> instead of
>>> using the default value of 100-300 bp. What should
>> I
>>> do ?
>>> 
>>> 
>>> Thanks,
>>> 
>>> Li
>>> 
>>> __________________________________________________
>>> Do You Yahoo!?
>>> Tired of spam?  Yahoo! Mail has the best spam
>> protection around
>>> http://mail.yahoo.com
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> 
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>> 
>> 
>> 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From roy at colibase.bham.ac.uk  Mon May  8 11:12:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 08 May 2006 12:12:49 +0100
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
References: <20060508011817.15495.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <445F27B1.40501@colibase.bham.ac.uk>

Hi Li,

I think the syntax you need is:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

I guess you may also need to change the parameter PRIMER_PRODUCT_OPT_SIZE.

Incidentally, such a restricted product size range may mean that Primer3 
is unable to design any suitable primers. If I recall correctly, this 
doesn't cause an error, you just get a Bio::Tools::Primer3 object with 
no primers in it. I have had some success with testing for this, and if 
necessary relaxing some constraints on primer design and re-running 
Primer3.
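
A rough sketch of that relax-and-retry idea, kept independent of BioPerl so
it runs on its own.  The design_with_fallback helper and the toy designer
are hypothetical; in a real script the code reference would wrap the actual
Primer3 run and return whatever primer pairs it found:

```perl
use strict;
use warnings;

# Try each parameter set in turn, strictest first, and stop at the first
# one that yields primers.  $design is a caller-supplied code reference
# that returns a list of primer pairs, possibly empty.
sub design_with_fallback {
    my ($design, @param_sets) = @_;
    for my $params (@param_sets) {
        my @pairs = $design->($params);
        return ($params, @pairs) if @pairs;
    }
    return;    # nothing worked, even with the loosest constraints
}

# Toy designer: pretends primers can only be found when the allowed
# product size window is at least 50 bp wide.
my $toy_design = sub {
    my ($params) = @_;
    my ($min, $max) = split /-/, $params->{PRIMER_PRODUCT_SIZE_RANGE};
    return $max - $min >= 50 ? ('pair_1') : ();
};

my ($used, @pairs) = design_with_fallback(
    $toy_design,
    { PRIMER_PRODUCT_SIZE_RANGE => '490-510' },
    { PRIMER_PRODUCT_SIZE_RANGE => '450-550' },
);
print "found ", scalar(@pairs), " pair(s) with range ",
      $used->{PRIMER_PRODUCT_SIZE_RANGE}, "\n";
```

Listing the parameter sets from strictest to loosest keeps the preferred
constraints whenever they are satisfiable.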

Hope this helps.
Roy.

--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk

> Hi Jason,
> 
> I add the line code   
> $primer3->primer_product_size_range("490-510");
>  to my script. But it doesn't work nor primer3
> complains it.
> 
> Li
> 
> --- Jason Stajich <jason.stajich at duke.edu> wrote:
> 
>> > I put up some info on the wiki (and I encourage
>> > other people to do  
>> > the same!)
>> >
> http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3
>> > 
>> > Set the command line parameters by just calling a
>> > function of the  
>> > name of the parameter.  To get a list of the
>> > available options, this  
>> > perl code will report it to you:
>> > 
>> > # what are the arguments, and what do they mean?
>> >    my $args = $primer3->arguments;
>> > 
>> >    print "ARGUMENT\tMEANING\n";
>> >    foreach my $key (keys %{$args}) {print "$key\t",
>> > $$args{$key}, "\n"}
>> > 
>> > The info for PRODUCT_SIZE_RANGE is:
>> >    (size range list, default 100-300) space
>> > separated list of product  
>> > sizes eg <a>-<b> <x>-<y>
>> > 
>> > I believe you can set the PCR product size with
>> >    $primer3->primer_product_size_range("490-510");
>> > 
>> > -jason
>> > On May 7, 2006, at 3:34 AM, chen li wrote:
>> > 
>>> > > Hi all,
>>> > >
>>> > > I use Bio::Tools::Run::Primer3 to design PCR
>> > primers.
>>> > > I want to change some default values, for example,
>> > to
>>> > > increase the PCR product size to 490-510 bp
>> > instead of
>>> > > using the default value of 100-300 bp. What should
>> > I
>>> > > do ?
>>> > >
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Li
>>> > >
>>> > > __________________________________________________
>>> > > Do You Yahoo!?
>>> > > Tired of spam?  Yahoo! Mail has the best spam
>> > protection around
>>> > > http://mail.yahoo.com
>>> > > _______________________________________________
>>> > > Bioperl-l mailing list
>>> > > Bioperl-l at lists.open-bio.org
>>> > >
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> > 
>> > --
>> > Jason Stajich
>> > Duke University
>> > http://www.duke.edu/~jes12
>> > 
>> > 
>> > 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From chen_li3 at yahoo.com  Mon May  8 13:21:54 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 06:21:54 -0700 (PDT)
Subject: [Bioperl-l] primer parameters using primer3
In-Reply-To: <445F27B1.40501@colibase.bham.ac.uk>
Message-ID: <20060508132154.71440.qmail@web36802.mail.mud.yahoo.com>

I think Dr. Chaudhuri is correct. 

I added the following lines of code to my script (actually
copied from the documentation):

$primer3->add_targets(
PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

$primer3->add_targets('PRIMER_MIN_TM'=>60,
'PRIMER_MAX_TM'=>64);

to design primers with a product size of 490-510
bp and a primer annealing Tm of 60 to 64 C.

Here is part of the output in the file called
temp.out:

.......... original sequence.....
GTGGGCTGGTGTTGCTTGGAAAATTTCAAAATCCCAAAGTTTCAGGCTTCCCAAAGTTGGCTTGGAAAAATGTGATAGTCTCACCTGAGTCTAGACATGT
.................

PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_PAIR_PENALTY=0.1544
PRIMER_LEFT_PENALTY=0.081468
PRIMER_RIGHT_PENALTY=0.072951
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
...............................
PRIMER_PRODUCT_SIZE=501

..............

This is what I want. If you don't set special
parameters such as the annealing Tm, the program will use the
default ones. If you set your own parameters they will
show up after the sequence (see this output example).
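
A small sketch for checking such output automatically, assuming temp.out
holds plain KEY=VALUE lines as in the excerpt above.  The parse_boulder
helper is a hypothetical name, not part of Bio::Tools::Run::Primer3, and the
values are copied from the excerpt:

```perl
use strict;
use warnings;

# Parse Primer3's KEY=VALUE output lines into a hash reference.
sub parse_boulder {
    my ($text) = @_;
    my %record;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^([A-Z0-9_]+)=(.*)$/;
        $record{$1} = $2;
    }
    return \%record;
}

# Values copied from the temp.out excerpt.
my $output = <<'END';
PRIMER_PRODUCT_SIZE_RANGE=490-510
PRIMER_MIN_TM=60
PRIMER_MAX_TM=64
PRIMER_LEFT_SEQUENCE=CCAAAGTTGGCTTGGAAAAA
PRIMER_PRODUCT_SIZE=501
END

my $record = parse_boulder($output);
my ($min, $max) = split /-/, $record->{PRIMER_PRODUCT_SIZE_RANGE};
my $size     = $record->{PRIMER_PRODUCT_SIZE};
my $in_range = ($size >= $min && $size <= $max) ? 1 : 0;

print "product size $size is ", ($in_range ? "inside" : "outside"),
      " the requested range $min-$max\n";
```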

If one needs to set more parameters and wants to know
which parameters are available, just browse the BEGIN
section of the module's code.

Now I have another question: the program always prints
out the original sequence at the beginning. Is it
possible not to do that?

Thanks to all for joining this topic,

Li 


--- Roy Chaudhuri <roy at colibase.bham.ac.uk> wrote:

> Hi Li,
> 
> I think the syntax you need is:
> 
>
$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');
> 
> I guess you may also need to change the parameter
> PRIMER_PRODUCT_OPT_SIZE.
> 
> Incidentally, such a restricted product size range
> may mean that Primer3 
> is unable to design any suitable primers. If I
> recall correctly, this 
> doesn't cause an error, you just get a
> Bio::Tools::Primer3 object with 
> no primers in it. I have had some success with
> testing for this, and if 
> necessary relaxing some constraints on primer design
> and re-running 
> Primer3.
> 
> Hope this helps.
> Roy.
> 
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
> 
> http://xbase.bham.ac.uk
> 
> > Hi Jason,
> > 
> > I add the line code   
> > $primer3->primer_product_size_range("490-510");
> >  to my script. But it doesn't work nor primer3
> > complains it.
> > 
> > Li
> > 
> > --- Jason Stajich <jason.stajich at duke.edu> wrote:
> > 
> >> > I put up some info on the wiki (and I encourage
> >> > other people to do  
> >> > the same!)
> >> >
> >
>
http://bioperl.org/wiki/Module:Bio::Tools::Run::Primer3
> >> > 
> >> > Set the command line parameters by just calling
> a
> >> > function of the  
> >> > name of the parameter.  To get a list of the
> >> > available options, this  
> >> > perl code will report it to you:
> >> > 
> >> > # what are the arguments, and what do they
> mean?
> >> >    my $args = $primer3->arguments;
> >> > 
> >> >    print "ARGUMENT\tMEANING\n";
> >> >    foreach my $key (keys %{$args}) {print
> "$key\t",
> >> > $$args{$key}, "\n"}
> >> > 
> >> > The info for PRODUCT_SIZE_RANGE is:
> >> >    (size range list, default 100-300) space
> >> > separated list of product  
> >> > sizes eg <a>-<b> <x>-<y>
> >> > 
> >> > I believe you can set the PCR product size with
> >> >   
> $primer3->primer_product_size_range("490-510");
> >> > 
> >> > -jason
> >> > On May 7, 2006, at 3:34 AM, chen li wrote:
> >> > 
> >>> > > Hi all,
> >>> > >
> >>> > > I use Bio::Tools::Run::Primer3 to design PCR
> >> > primers.
> >>> > > I want to change some default values, for
> example,
> >> > to
> >>> > > increase the PCR product size to 490-510 bp
> >> > instead of
> >>> > > using the default value of 100-300 bp. What
> should
> >> > I
> >>> > > do ?
> >>> > >
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Li
> >>> > >
> >>> > >
> __________________________________________________
> >>> > > Do You Yahoo!?
> >>> > > Tired of spam?  Yahoo! Mail has the best
> spam
> >> > protection around
> >>> > > http://mail.yahoo.com
> >>> > >
> _______________________________________________
> >>> > > Bioperl-l mailing list
> >>> > > Bioperl-l at lists.open-bio.org
> >>> > >
> >> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> > 
> >> > --
> >> > Jason Stajich
> >> > Duke University
> >> > http://www.duke.edu/~jes12
> >> > 
> >> > 
> >> > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection around 
> > http://mail.yahoo.com 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


From hubert.prielinger at gmx.at  Mon May  8 19:09:29 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 08 May 2006 13:09:29 -0600
Subject: [Bioperl-l] [BULK]  can't parse blast file anymore
In-Reply-To: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
References: <42d52830.b4f91bfc.81e4600@expms6.cites.uiuc.edu>
Message-ID: <445F9769.70500@gmx.at>

Hi all,
I have solved the problem. I'm parsing BLAST 2.2.13 output and had 
installed an early BioPerl 1.5.1 in which bug 1934 wasn't fixed yet, 
so I had to replace the blast.pm file, and now it works properly.

thank you very much
Hubert

Christopher Fields wrote:
> These are debugging lines (not errors); you still have the -verbose flag set.  
>
> Did you follow Jason's advice?  I believe he's right on the money about the issue 
> at hand...
>
> Chris
>
> ---- Original message ----
>   
>> Date: Sun, 07 May 2006 19:41:14 -0600
>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>> Subject: Re: [Bioperl-l] [BULK]  can't parse blast file anymore  
>> To: Torsten Seemann <torsten.seemann at infotech.monash.edu.au>,
>>     bioperl-l at bioperl.org, Chris Fields <cjfields at uiuc.edu>,
>>     Jason Stajich <jason.stajich at duke.edu>
>>
>> hi,
>> I have corrected that and now I finally I got a few error messages:
>>
>> blast.pm: unrecognized line Reference: Altschul, Stephen F., Thomas L. 
>> Madden, Alejandro A. Schäffer,
>> blast.pm: unrecognized line Jinghui Zhang, Zheng Zhang, Webb Miller, and 
>> David J. Lipman
>> blast.pm: unrecognized line (1997), "Gapped BLAST and PSI-BLAST: a new 
>> generation of
>> blast.pm: unrecognized line protein database search programs", Nucleic 
>> Acids Res. 25:3389-3402.
>> blast.pm: unrecognized line RID: 1137529800-24476-151611170370.BLASTQ1
>>
>> after that line it stops without terminating....
>>
>>
>> Torsten Seemann wrote:
>>     
>>> Hubert Prielinger wrote:
>>>       
>>>> ok, thanks
>>>> I have submitted the bug
>>>> bug #1994
>>>>         
>>> This is a line from the script you sent to Bugzilla:
>>>
>>> my $search = new Bio::SearchIO (
>>> -verbose => 1,-format => 'blast', -file => $file)
>>> or die "could not open blast report" if not defined my $search;
>>>
>>> Although syntactically correct, I don't think it is doing what you want.
>>> Please change it to this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file) or 
>>> die "could not open blast report";
>>>
>>> or alternatively, this:
>>>
>>> my $search = new Bio::SearchIO(-format => 'blast', -file => $file);
>>> if (not defined $search) {
>>>   die "could not open blast report";
>>> }
>>>
>>> and let us know what happens.
>>>
>>> all the example output you have supplied still suggests that 
>>> Bio::SearchIO can not load or parse your blast report.
>>>
>>>       
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   


From s.johri at imperial.ac.uk  Mon May  8 15:38:13 2006
From: s.johri at imperial.ac.uk (Johri, Saurabh)
Date: Mon, 8 May 2006 16:38:13 +0100
Subject: [Bioperl-l] PAML + Codeml problem..
Message-ID: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>

Hi all,
 
I'm trying to use codeml from PAML to estimate Ka and Ks values from
sequences within a multi-FASTA file. I'm using the code that has been
posted on the BioPerl wiki.
 
However, when I run the code, I get the errors shown below.
 
I did a Google search to see if anyone had come across similar
problems. In those cases the problem seems to have been due to the
sequences not being a multiple of 3 in length.
In my code I check whether each sequence length is a multiple of 3 and,
if not, I alter the sequence until it is, but I still get the same
error messages.
 
Any suggestions as to why this could be happening?
 
Thanks!!!
 
Saurabh Johri
Tuberculosis Research Group
Centre for Molecular Microbiology & Infection
Imperial College London
SW7 2AZ

 
-------------------- WARNING ---------------------
MSG: There was an error - see error_string for the program output
---------------------------------------------------
 
------------- EXCEPTION Bio::Root::NotImplemented -------------
MSG: Unknown format of PAML output
STACK Bio::Tools::Phylo::PAML::_parse_summary
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
STACK Bio::Tools::Phylo::PAML::next_result
/sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
------------------------------------
 
>Rv3923c
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_cdc1551
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>Rv3923c_mtb_f11
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_c1
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mtb_210
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
>Rv3923c_mbovis
caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccgag
gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcgac
ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacggt
acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcaccgc
aaataagcccggtgttgcaatcaa
 
------------------------------------


From chen_li3 at yahoo.com  Tue May  9 00:21:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 8 May 2006 17:21:42 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple sequences
Message-ID: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>

Dear all,

The following is the script I use to design primers
for one sequence:

#!/cygdrive/c/Perl/bin/perl.exe

use warnings;
use strict;

use Bio::Tools::Run::Primer3;
use Bio::SeqIO;

my $file_in  = 'piwil2.fa';
my $file_out = 'temp.out';
my $seqio    = Bio::SeqIO->new(-file => $file_in);

my $seq     = $seqio->next_seq;
my $primer3 = Bio::Tools::Run::Primer3->new(
    -seq     => $seq,
    -outfile => $file_out,
    -path    => "c:/Perl/local/primer3_1.0.0/src/primer3.exe",
);

unless ($primer3->executable) {
    print "primer3 can not be found. Is it installed?\n";
    exit(-1);
}

# set your own parameters for the primers or product
$primer3->add_targets(
    'PRIMER_OPT_GC_PERCENT' => '50',
    'PRIMER_OPT_SIZE'       => '24',
    'PRIMER_OPT_TM'         => '60',
);

my $result = $primer3->run;

exit;

I tried to modify it for multiple sequences by using a
while loop as follows:

while ($seq = $seqio->next_seq) {
    my $primer3 = Bio::Tools::Run::Primer3->new();
    # design the primers
    ....
}

I get primers only for the last sequence. It seems the
earlier ones are overwritten.

Any ideas would be highly appreciated.

Li




From jason.stajich at duke.edu  Tue May  9 00:59:26 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 8 May 2006 20:59:26 -0400
Subject: [Bioperl-l] PAML + Codeml problem..
In-Reply-To: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
References: <4A98ACB8EC146149872BAC9A132A582C277AC4@icex5.ic.ac.uk>
Message-ID: <4796FE3D-9D14-4D93-B455-69EDFE2B2B62@duke.edu>

Saurabh -

a) These sequences are identical except for a difference in length, so
there aren't going to be any interesting values from PAML, but maybe
you are just providing an example?
b) I think you are missing the trailing gaps in the alignment of the
Rv3923c_mtb_cdc1551 sequence, as it is shorter. PAML requires aligned
sequences as input.
c) The sequences, in the reading frame you have provided (and using
the standard translation table), have stop codons in them. This will
cause failure as well.

Which code from the wiki are you running, the 'running PAML' part of  
the HOWTO?

Try looking at the actual output from PAML to figure out what is wrong.
Add this when initializing the Run object:
-save_tempfiles => 1,
-verbose => 1,

then open up the tempdir that is reported and look at the output  
files (mlc file).
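
The three checks above can be sketched in plain Perl before handing
sequences to codeml. This is only an illustration: the helper name
check_codon_alignment is invented here, it assumes the standard genetic
code (stop codons TAA, TAG, TGA), and it treats every sequence as
starting in frame 1.

```perl
use strict;
use warnings;

# Hypothetical helper: report problems that would likely upset codeml.
# Takes a hash ref of id => sequence string, returns a list of messages.
sub check_codon_alignment {
    my ($seqs) = @_;
    my @problems;
    my %lengths;
    for my $id (sort keys %$seqs) {
        my $seq = uc $seqs->{$id};
        my $len = length $seq;
        $lengths{$len}++;
        push @problems, "$id: length $len is not a multiple of 3"
            if $len % 3;
        # look for stop codons in frame 1, ignoring the final codon
        for (my $i = 0; $i + 3 < $len; $i += 3) {
            my $codon = substr($seq, $i, 3);
            if ($codon =~ /\A(?:TAA|TAG|TGA)\z/) {
                push @problems,
                    "$id: internal stop codon $codon at position " . ($i + 1);
                last;
            }
        }
    }
    push @problems, "sequences differ in length (not aligned?)"
        if keys(%lengths) > 1;
    return @problems;
}

my %toy = (
    seqA => 'ATGAAATGACCC',    # TGA is the third codon
    seqB => 'ATGAAACCC',       # shorter than seqA
);
print "$_\n" for check_codon_alignment(\%toy);
```

An empty return list only means these particular checks passed. The mlc
file is still the place to look if codeml fails.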

-jason

On May 8, 2006, at 11:38 AM, Johri, Saurabh wrote:

> Hi all,
>
> I'm trying to use codeml from PAML to estimate Ka, Ks values from
> sequences within a multi fasta file:
> i'm using the code which has been posted on the bioperl wiki...
>
> However, when I run the code, i get the following errors:
>
> I did a google search to see if anyone had come across similar
> problems.... in which case the problem seems to have been due to the
> sequences not being a multiple of 3,
> In my code I check if the sequence is a multiple of 3 and if  not, i
> alter the sequences until this is the case, although I still have the
> same error messages,
>
> Any suggestions as to why this could be happening?
>
> Thanks!!!
>
> Saurabh Johri
> Tuberculosis Research Group
> Centre for Molecular Microbiology & Infection
> Imperial College London
> SW7 2AZ
>
>
>
>
> -------------------- WARNING ---------------------
> MSG: There was an error - see error_string for the program output
> ---------------------------------------------------
>
> ------------- EXCEPTION Bio::Root::NotImplemented -------------
> MSG: Unknown format of PAML output
> STACK Bio::Tools::Phylo::PAML::_parse_summary
> /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:359
> STACK Bio::Tools::Phylo::PAML::next_result
> /sw/lib/perl5/5.8.6/Bio/Tools/Phylo/PAML.pm:224
> ------------------------------------
>
>> Rv3923c
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_cdc1551
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcac
>> Rv3923c_mtb_f11
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_c1
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mtb_210
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>> Rv3923c_mbovis
> caccgatcactacctgccagttcgacagccctccgcaagccgcatcgcagttgctgctccaaccgagccg 
> ag
> gagacatgccggctgctcggcagcgcgcggatcaccacatgatcggacgggtggagttctttgacgatcg 
> ac
> ccagccacgtgccgcagccgacgtgccacgcggtggcgttccacggccgaccccaccgacttggc
> gataatcagtccgacgcgcggcccaccgccactcccacgccaccaataaacgaccatgtcagaccgcacg 
> gt
> acgcatcccgtgcttcaccgttgtttcaaaatccgctgaccgcctcatgcggttgcgtgcacgaagcacc 
> gc
> aaataagcccggtgttgcaatcaa
>
> ------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From osborne1 at optonline.net  Tue May  9 01:17:22 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 08 May 2006 21:17:22 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <20060509002142.94880.qmail@web36806.mail.mud.yahoo.com>
Message-ID: <C08565E2.85FD%osborne1@optonline.net>

Li,

If you're analyzing multiple input sequences you're going to have to create
multiple output sequences.

Brian O.


On 5/8/06 8:21 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> I get primers only for the last sequence. It seems the
> earlier ones are overwritten.


From WiersmaP at AGR.GC.CA  Tue May  9 01:28:27 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Mon, 8 May 2006 21:28:27 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C41@onncrxms5.agr.gc.ca>

Hi Li,

 
When you execute $primer3->run with a Bio::Tools::Run::Primer3 object it
opens -outfile=>"filename" for writing and then closes.  That's why
putting it in a loop will overwrite your output file each time so you
only see the last one.  I suppose you could read in each output file
before looping to the next seq and append it to another file.

 
If you're doing a fair bit of work with this module it would be worth
looking at the Bio::Tools::Primer3 module.  The statement $result =
$primer3->run produces a Bio::Tools::Primer3 object which has all the
methods you need for customizing your output.
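
The append-after-each-run idea could look roughly like this in plain
Perl. The helper name append_file and the file name all_primers.out are
only placeholders for this sketch. The BioPerl part of the loop is shown
as comments because it needs BioPerl and a primer3 binary to run.

```perl
use strict;
use warnings;

# Hypothetical helper: append the contents of $src to the end of $dest.
sub append_file {
    my ($src, $dest) = @_;
    open my $in,  '<',  $src  or die "Cannot read $src: $!";
    open my $out, '>>', $dest or die "Cannot append to $dest: $!";
    print {$out} $_ while <$in>;
    close $in;
    close $out or die "Cannot close $dest: $!";
    return;
}

# Sketch of the loop around it (commented out, needs BioPerl and primer3):
# while (my $seq = $seqio->next_seq) {
#     my $primer3 = Bio::Tools::Run::Primer3->new(
#         -seq     => $seq,
#         -outfile => 'temp.out',
#     );
#     $primer3->run;                               # rewrites temp.out
#     append_file('temp.out', 'all_primers.out');  # keep a copy of this run
# }
```

Because the combined file is opened in append mode, it should be deleted
or renamed before a fresh batch run.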

 
Paul

 
Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC

wiersmap at agr.gc.ca

 
From simon_sask at yahoo.com  Tue May  9 08:06:04 2006
From: simon_sask at yahoo.com (Simon K. Chan)
Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <20060509080604.53621.qmail@web54104.mail.yahoo.com>

Hi Fellow Bioperl-ers,

bioperl-live/examples/searchio/rawwriter.pl is
supposed to show the raw alignments using
Bio::SearchIO.  The script is written to parse a
PSI-BLAST report.  I found an old email in the archive
from Jason stating that this should parse other
flavors of blast reports as well.  

What do I need to do to make this script parse non-PSI
blast reports?  I tried to just specify a file and
that the -format is 'blast', but I get an error
stating that the object method 'raw_hit_data' is not
defined in Bio::Search::Hit::BlastHit.

Basically, I want to obtain the raw alignment because
I'd like to get the size of the gaps, not just the
number.

Any help will be much appreciated.
Many thanks




From cjfields at uiuc.edu  Tue May  9 12:21:02 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 9 May 2006 07:21:02 -0500
Subject: [Bioperl-l] Raw Blast Alignment
Message-ID: <fe65cab2.b5b5696a.81acb00@expms6.cites.uiuc.edu>

You need to read the SearchIO HOWTO, which gives several examples:

http://www.bioperl.org/wiki/HOWTO:SearchIO

Chris

---- Original message ----
>Date: Tue, 9 May 2006 01:06:04 -0700 (PDT)
>From: "Simon K. Chan" <simon_sask at yahoo.com>  
>Subject: [Bioperl-l] Raw Blast Alignment  
>To: bioperl-l at lists.open-bio.org
>
>Hi Fellow Bioperl-ers,
>
>bioperl-live/examples/searchio/rawwriter.pl is
>supposed to show the raw alignments using
>Bio::SearchIO.  The script is written to parse a
>PSI-BLAST report.  I found an old email in the archive
>from Jason stating that this should parse other
>flavors of blast reports as well.  
>
>What do I need to do to make this script parse non-PSI
>blast reports?  I tried to just specify a file and
>that the -format is 'blast', but I get an error
>stating that the object method 'raw_hit_data' is not
>defined in Bio::Search::Hit::BlastHit.
>
>Basically, I want to obtain the raw alignment because
>I'd like to get the size of the gaps, not just the
>number.
>
>Any help will be much appreciated.
>Many thanks
>
>


From peterm at bioinf.uni-leipzig.de  Tue May  9 12:44:25 2006
From: peterm at bioinf.uni-leipzig.de (Peter Menzel)
Date: Tue, 09 May 2006 14:44:25 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <44608EA9.1030808@bioinf.uni-leipzig.de>

Hi all,
I am using the Bio::Graphics module to draw sequences and their features 
with Bio::SeqFeature::Generic.
The features I want to highlight are occurrences of transcription 
binding factors. Therefore I want to give every factor its own color, 
but I didn't see how to manage it. I can only colorize complete tracks.
Is there a known workaround?

Thanks, Peter


From Marc.Logghe at DEVGEN.com  Tue May  9 14:13:24 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:13:24 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D88@ANTARESIA.be.devgen.com>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Peter Menzel
> Sent: Tuesday, May 09, 2006 2:44 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] colorize features
> 
> Hi all,
> I am using the Bio::Graphics module to draw sequences and 
> their features with Bio::SeqFeature::Generic.
> The features I want to highlight are occurrences of 
> transcription binding factors. Therefore I want to give every 
> factor its own color, but i didn't see how to manage it. I 
> only can colorize complete tracks.
> Is there a known workaround?

Yes, instead of giving a hardcoded color value you can pass a subroutine
to the option.
-bgcolor => sub {
    my $feat = shift;
    # get your attribute on which you want to base your color
    my ($attr) = $feat->get_tag_values('my_attribute');

    return $attr > 10 ? 'red' : 'green'
}

Not sure about the method calls I am making here (could as well be
get_attributes()) but you get the idea.
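
For the one-color-per-factor case, the callback could look something
like the sketch below. It is plain Perl with a stand-in feature class so
that it runs without Bio::Graphics. The palette and the assumption that
the factor name is available from the feature's display_name method are
guesses for illustration. With a real Bio::SeqFeature::Generic you may
need get_tag_values on whatever tag holds the factor name instead.

```perl
use strict;
use warnings;

# Build a callback that gives each distinct factor name its own color.
# $name_of is a code ref that extracts the factor name from a feature.
# Colors are reused (wrapping around) if there are more names than colors.
sub make_color_picker {
    my ($name_of, @palette) = @_;
    my %assigned;
    my $next = 0;
    return sub {
        my $feat = shift;
        my $name = $name_of->($feat);
        $assigned{$name} //= $palette[ $next++ % @palette ];
        return $assigned{$name};
    };
}

# Minimal stand-in for a feature object, for demonstration only.
{
    package ToyFeature;
    sub new          { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub display_name { return $_[0]{name} }
}

my $picker = make_color_picker(
    sub { $_[0]->display_name },
    qw(red green blue orange),
);

for my $name (qw(SP1 GATA1 SP1 TBP)) {
    my $feat = ToyFeature->new($name);
    print "$name => ", $picker->($feat), "\n";
}
# In a real script the callback would be passed as:  -bgcolor => $picker
```

Keeping the name-to-color table inside the closure means the same
factor gets the same color on every track that shares the callback.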
Cheers,
Marc


From Marc.Logghe at DEVGEN.com  Tue May  9 14:47:06 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 9 May 2006 16:47:06 +0200
Subject: [Bioperl-l] colorize features
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746D89@ANTARESIA.be.devgen.com>

Hi Peter,
Actually it is explained much better in this howto:
http://bioperl.org/wiki/HOWTO:Graphics

The examples show the principle I mentioned in my previous post (e.g.
Example 4), but for the -label or -description options.
As said, you can apply this to (most of?) the other options as well.
Regards,
ML 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Marc Logghe
> Sent: Tuesday, May 09, 2006 4:13 PM
> To: Peter Menzel; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] colorize features
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Peter 
> > Menzel
> > Sent: Tuesday, May 09, 2006 2:44 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] colorize features
> > 
> > Hi all,
> > I am using the Bio::Graphics module to draw sequences and their 
> > features with Bio::SeqFeature::Generic.
> > The features I want to highlight are occurrences of transcription 
> > binding factors. Therefore I want to give every factor its 
> own color, 
> > but i didn't see how to manage it. I only can colorize complete 
> > tracks.
> > Is there a known workaround?
> 
> Yes, instead of giving a hardcoded color value you can pass a 
> subroutine to the option.
> -bgcolor => sub {
>     my $feat = shift;
>     # get your attribute on which you want to base your color
>     my ($attr) = $feat->get_tag_values('my_attribute');
> 
>     return $attr > 10 ? 'red' : 'green'
> }
> 
> Not sure about the method calls I am making here (could as well be
> get_attributes()) but you get the idea.
> Cheers,
> Marc
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From WiersmaP at AGR.GC.CA  Tue May  9 15:49:33 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 11:49:33 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>

Hi Li,

The line "my $result = $primer3->run" is already in the code you submitted.  In the Bio::Tools::Primer3 module the author uses "$p3" for the object.  If you change your line to "my $p3 = $primer3->run" you should be able to run the examples below. Process the results for each sequence and output the results before looping to the next sequence.

Examples from Bio::Tools::Primer3.pm:

 # how many results were there?
 my $num=$p3->number_of_results;
 print "There were $num results\n";

 # get all the results
 my $all_results=$p3->all_results;
 print "ALL the results\n";
 foreach my $key (keys %{$all_results}) {print "$key\t${$all_results}{$key}\n"}

 # get specific results
 my $result1=$p3->primer_results(1);
 print "The first primer is\n";
 foreach my $key (keys %{$result1}) {print "$key\t${$result1}{$key}\n"}
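
To print only selected fields rather than every key, one option is to
pick them out of the hash reference returned for a single result. The
sketch below uses a hand-written hash in place of a real primer_results
return value. The key names are copied from the primer3 output posted
earlier in this thread, and the helper name format_selected is invented
for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: format only the requested keys of one result hash.
# Keys missing from the result are reported as 'NA' instead of dying.
sub format_selected {
    my ($result, @wanted) = @_;
    return join "\n",
        map { "$_\t" . (defined $result->{$_} ? $result->{$_} : 'NA') } @wanted;
}

# Stand-in for what $p3->primer_results(1) might return.
my $result1 = {
    PRIMER_LEFT_SEQUENCE => 'CCAAAGTTGGCTTGGAAAAA',
    PRIMER_LEFT_PENALTY  => '0.081468',
    PRIMER_PRODUCT_SIZE  => 501,
};

print format_selected($result1,
    qw(PRIMER_LEFT_SEQUENCE PRIMER_RIGHT_SEQUENCE PRIMER_PRODUCT_SIZE)), "\n";
```

Check the keys of a real result first (for example with the foreach
loop above), since the exact key names depend on the primer3 version.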

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Monday, May 08, 2006 8:32 PM
To: Wiersma, Paul
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

I read both documents. What I understand is that
Bio::Tools::Run::Primer3 is for designing primers and
Bio::Tools::Primer3 is for parsing the results. When I
read the documents I do not see the line
 $result = $primer3->run in Bio::Tools::Primer3. I
wonder how you got this information.

Thanks,

Li 

--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
>  
> 
> When you execute $primer3->run with a
> Bio::Tools::Run::Primer3 object it
> opens -outfile=>"filename" for writing and then
> closes.  That's why
> putting it in a loop will overwrite your output file
> each time so you
> only see the last one.  I suppose you could read in
> each output file
> before looping to the next seq and append it to
> another file.
> 
>  
> 
> If you're doing a fair bit of work with this module
> it would be worth
> looking at the Bio::Tools::Primer3 module.  The
> statement $result =
> $primer3->run produces a Bio::Tools::Primer3 object
> which has all the
> methods you need for customizing your output.
> 
>  
> 
> Paul
> 
>  
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> 
> wiersmap at agr.gc.ca
> 
>  
> 
>  
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From chen_li3 at yahoo.com  Tue May  9 17:32:32 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 10:32:32 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C42@onncrxms5.agr.gc.ca>
Message-ID: <20060509173232.18843.qmail@web36802.mail.mud.yahoo.com>

Thanks Paul it REALLY works.

I have other questions:
1) When I run the script I use this line on the
command prompt

perl primer.pl >test

When I check the default output file(temp.out) used by
the script I only see the information about the last
sequence which is different from what is in the test
file. In test file I can get all the information for
all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly
to print out selected information such as the primer
sequence and the size of the PCR product? Or do I have
to parse the file myself?

After I get all this information I would like to post
the script for batch-designing PCR primers.


Thanks,

Li 


--- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> Hi Li,
> 
> The line "my $result = $primer3->run" is already in
> the code you submitted.  In the Bio::Tools::Primer3
> module the author uses "$p3" for the object.  If you
> change your line to "my $p3 = $primer3->run" you
> should be able to run the examples below. Process
> the results for each sequence and output the results
> before looping to the next sequence.
> 
> >From Bio::Tools::Primer3.pm:
> 
>  # how many results were there?
>  my $num=$p3->number_of_results;
>  print "There were $num results\n";
> 
>  # get all the results
>  my $all_results=$p3->all_results;
>  print "ALL the results\n";
>  foreach my $key (keys %{$all_results}) {print
> "$key\t${$all_results}{$key}\n"}
> 
>  # get specific results
>  my $result1=$p3->primer_results(1);
>  print "The first primer is\n";
>  foreach my $key (keys %{$result1}) {print
> "$key\t${$result1}{$key}\n"}
> 
> Paul
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 
> 
> 
> -----Original Message-----
> From: chen li [mailto:chen_li3 at yahoo.com] 
> Sent: Monday, May 08, 2006 8:32 PM
> To: Wiersma, Paul
> Subject: Re: [Bioperl-l] use primer3 to design
> primers with multiple sequences
> 
> Hi Paul,
> 
> I read both documents. What I understand is that
> Bio:Tools::Run:Primer3 is for designing primers and
> Bio:Tools::Primer3 is for parsing the results. When
> I
> read the documents I do not see this line
>  $result = $primer3->run in Bio:Tools::Primer3. I
> wonder how you get this infomration.
> 
> Thanks,
> 
> Li 
> 
> --- "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:
> 
> > Hi Li,
> > 
> >  
> > 
> > When you execute $primer3->run with a Bio::Tools::Run::Primer3
> > object it opens -outfile=>"filename" for writing and then closes.
> > That's why putting it in a loop will overwrite your output file
> > each time so you only see the last one.  I suppose you could read
> > in each output file before looping to the next seq and append it
> > to another file.
> > 
> > If you're doing a fair bit of work with this module it would be
> > worth looking at the Bio::Tools::Primer3 module.  The statement
> > $result = $primer3->run produces a Bio::Tools::Primer3 object
> > which has all the methods you need for customizing your output.
> > 
> >  
> > 
> > Paul
> > 
> >  
> > 
> > Paul A. Wiersma
> > Agriculture and Agri-Food Canada/Agriculture et
> > Agroalimentaire Canada
> > Summerland, BC
> > 
> > wiersmap at agr.gc.ca
> > 
> >  
> > 
> >  
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> 
> 




From WiersmaP at AGR.GC.CA  Tue May  9 17:59:20 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Tue, 9 May 2006 13:59:20 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>

Hi Li,

I've attached some code I used to explore basic functionality of Primer3.pm modules.  Hopefully you can see how I've picked out parts of the results for printing.  You can modify it as you need to output only some results.

>>>>>>>>
  # design the primers. This runs primer3 and returns a
  # Bio::Tools::Primer3 object with the results
my $results=$primer3->run;

  # see the Bio::Tools::Primer3 pod for
  # things that you can get from this. For example:

print "There were ", $results->number_of_results+1, " primers\n";

  # suffixes of the per-primer keys to print, e.g. PRIMER_LEFT_TM
my @out_keys_part = qw(	START
			LENGTH
			TM
			GC_PERCENT
			SELF_ANY
			SELF_END
			SEQUENCE );

for (my $i=0;$i <= $results->number_of_results;$i++){

	# get specific results
	my $result1=$results->primer_results($i);

	print "\n",$i+1;
	for my $key (qw(PRIMER_LEFT PRIMER_RIGHT)){

			# PRIMER_LEFT and PRIMER_RIGHT hold "start,length" pairs
			my ($start, $length) = split /,/, ${$result1}{$key};
			${$result1}{$key."_START"} = $start;
			${$result1}{$key."_LENGTH"} = $length;
			foreach my $partkey (@out_keys_part) {
				print "\t", ${$result1}{$key."_".$partkey};
			}
			print "\n";
	}
	print "\tPRODUCT SIZE: ", ${$result1}{'PRIMER_PRODUCT_SIZE'}, ", PAIR ANY COMPL: ",
				${$result1}{'PRIMER_PAIR_COMPL_ANY'};
	print ", PAIR 3' COMPL: ", ${$result1}{'PRIMER_PAIR_COMPL_END'}, "\n";
}
>>>>>>>>>>>>>>>
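
For the multi-sequence case discussed in this thread, one way to avoid losing earlier results is to format each primer pair as a line and collect the lines for all sequences in one place. The following is only a minimal sketch under stated assumptions: format_pair is a hypothetical helper (not part of BioPerl), and the hashes in %mock are made-up stand-ins for what primer_results() would return inside a per-sequence loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn one primer-pair hash,
# shaped like the hashes primer_results() returns, into a tab-delimited line.
# Keys that are absent are reported as 'NA'.
sub format_pair {
    my ($seq_id, $pair) = @_;
    my @keys   = qw(PRIMER_LEFT_SEQUENCE PRIMER_RIGHT_SEQUENCE PRIMER_PRODUCT_SIZE);
    my @values = map { defined $pair->{$_} ? $pair->{$_} : 'NA' } @keys;
    return join("\t", $seq_id, @values) . "\n";
}

# Made-up stand-in data; in a real script each hash would come from
# $primer3->run->primer_results($i) for the current sequence.
my %mock = (
    seqA => {
        PRIMER_LEFT_SEQUENCE  => 'ACGTACGTACGTACGTACGT',
        PRIMER_RIGHT_SEQUENCE => 'TTGCATGCATGCATGCATGC',
        PRIMER_PRODUCT_SIZE   => 201,
    },
    seqB => {
        PRIMER_LEFT_SEQUENCE => 'GGATCCGGATCCGGATCCGG',
        PRIMER_PRODUCT_SIZE  => 150,
    },
);

# Accumulate one line per sequence instead of overwriting a single file.
my $combined = '';
for my $seq_id (sort keys %mock) {
    $combined .= format_pair($seq_id, $mock{$seq_id});
}
print $combined;
```

In a real script the accumulated text could instead be written to a filehandle opened once, before the loop over sequences, so that nothing is overwritten between iterations.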

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Telephone/Téléphone: 250-494-6388
Facsimile/Télécopieur: 250-494-0755
Box 5000, 4200 Hwy 97
Summerland, BC
V0H 1Z0
wiersmap at agr.gc.ca
 


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Tuesday, May 09, 2006 10:33 AM
To: Wiersma, Paul
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] use primer3 to design primers with multiple sequences

Thanks Paul it REALLY works.

I have other questions:
1) When I run the script I use this line on the
command prompt

perl primer.pl >test

When I check the default output file(temp.out) used by
the script I only see the information about the last
sequence which is different from what is in the test
file. In test file I can get all the information for
all the sequences.

2) Is it possible to use Bio::Tools::Primer3 directly
to print out selected information such as the primer
sequence and the size of the PCR product?  Or do I have
to parse the file myself?

After I get all this information I would like to post
the script for batch-designing PCR primers.


Thanks,

Li 




From cjfields at uiuc.edu  Tue May  9 21:13:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 9 May 2006 16:13:43 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
Message-ID: <000601c673ad$74601c30$15327e82@pyrimidine>

I noticed an odd thing with SeqIO parsing of species lines (those
problematic bacterial tax names again).  I have a simple script that runs
output to STDOUT to generate a list of hits.  Here's what I get:

Bacterium: Corynebacterium glutamicum ATCC 13032
         hits: 4
Bacterium: Corynebacterium jeikeium K411 K411 <--
         hits: 1
Bacterium: Frankia sp. CcI3 CcI3 <--
         hits: 1
Bacterium: Frankia sp. EAN1pec EAN1pec <--
         hits: 1
Bacterium: Janibacter sp. HTCC2649 HTCC2649 <--
         hits: 1
Bacterium: Kineococcus radiotolerans SRS30216 SRS30216  <--
         hits: 1
Bacterium: Leifsonia xyli subsp. xyli str. CTCB07 xyli str. CTCB07 <--
         hits: 1
Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

...

Most (but not all) of the strain numbers get repeated (marked with arrows).
This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
(and thus passed through Bio::SeqIO).  Anyone seen this before?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From torsten.seemann at infotech.monash.edu.au  Tue May  9 23:42:29 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 10 May 2006 09:42:29 +1000
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <000601c673ad$74601c30$15327e82@pyrimidine>
References: <000601c673ad$74601c30$15327e82@pyrimidine>
Message-ID: <446128E5.1000908@infotech.monash.edu.au>

Chris,

> I noticed an odd thing with SeqIO parsing of species lines (those
> problematic bacterial tax names again).  I have a simple script that runs
> output to STDOUT to generate a list of hits.  Here's what I get:

> Bacterium: Mycobacterium avium subsp. paratuberculosis K-10 paratuberculosis K-10 <--

In this case,

Genus = Mycobacterium
Species = avium
Subspecies = paratuberculosis
Strain = K-10

which suggests that BioPerl is trying to handle something special, 
because the 'subsp.' is gone?

Here are the pertinent parts of the GenBank file:

LOCUS       NC_002944            4829781 bp    DNA     circular BCT 18-JAN-2006
DEFINITION  Mycobacterium avium subsp. paratuberculosis K-10, complete genome.
SOURCE      Mycobacterium avium subsp. paratuberculosis K-10
   ORGANISM  Mycobacterium avium subsp. paratuberculosis K-10
             Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
             Corynebacterineae; Mycobacteriaceae; Mycobacterium; Mycobacterium
             avium complex (MAC).

                      /organism="Mycobacterium avium subsp. paratuberculosis K-10"
                      /strain="K-10"
                      /sub_species="paratuberculosis"


> Most (but not all) of the strain numbers get repeated (marked with arrows).
> This is actually in the GenBank file itself, downloaded via Bio::DB::GenBank
> (and thus passed through Bio::SeqIO).  Anyone seen this before?

The problem is mentioned in the wiki so it must have come up before?
http://bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data

I also deal with Bacteria mainly, and should also look into this. I 
haven't been using the GenBank headers directly, only the features, so I 
never came across this.

Another thing which may crop up is when no Species has been allocated 
yet but the genus is known (or something like that). In that case the 
name is written as "Genus spp.", e.g. Gallibacterium spp.
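
The genus / species / subspecies / strain breakdown above can be illustrated with a naive string split. This is only a sketch under stated assumptions: split_organism is a hypothetical helper, it is not how Bio::SeqIO::genbank parses the ORGANISM line, and it assumes the simple "Genus species [subsp. X] [str.] strain" layout of the names in this thread. A placeholder such as "sp." or "spp." simply ends up in the species slot.

```perl
use strict;
use warnings;

# Hypothetical helper: naively split an organism name into its parts.
# Assumes "Genus species [subsp. NAME] [str.] STRAIN"; real names vary more.
sub split_organism {
    my ($name) = @_;
    my ($genus, $species, $rest) = split ' ', $name, 3;
    my %parts = (genus => $genus, species => $species);
    $rest = '' unless defined $rest;

    # Peel off an optional "subsp. NAME" prefix.
    if ($rest =~ s/^subsp\.\s+(\S+)\s*//) {
        $parts{sub_species} = $1;
    }

    # Drop an optional "str." marker; whatever is left is treated as the strain.
    $rest =~ s/^str\.\s+//;
    $parts{strain} = $rest if length $rest;

    return \%parts;
}

for my $name ('Mycobacterium avium subsp. paratuberculosis K-10',
              'Leifsonia xyli subsp. xyli str. CTCB07') {
    my $parts = split_organism($name);
    print "$name\n";
    for my $field (sort keys %{$parts}) {
        print "  $field = ", $parts->{$field}, "\n";
    }
}
```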

--Torsten


From chen_li3 at yahoo.com  Wed May 10 01:04:08 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 9 May 2006 18:04:08 -0700 (PDT)
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C47@onncrxms5.agr.gc.ca>
Message-ID: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>

Hi Paul,

Thank you very much.

Just as you pointed out in your latest email, I have now
figured out that the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first pair of primers.  1) Since it
is a hash I should be able to get the value for a
given key by telling Perl which key to look up.
2) Since it is a reference I should dereference it to
get the actual value.

I don't know much about OO or Perl, and your code looks
a little complicated to me. But I got the job done
by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


#get the value for the key in the hash reference

	   my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
	   print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";
 

There is one point I don't understand:

When I add these two lines to my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a
> primer pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the
> entry for product size.  The following are the available entries.
> All are single values or strings except PRIMER_RIGHT and
> PRIMER_LEFT which are start,length pairs (e.g. PRIMER_LEFT =>
> '60,20') which can be pulled out with split.
> 
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
> 
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 




From zhouyubio at gmail.com  Wed May 10 01:35:01 2006
From: zhouyubio at gmail.com (Yu ZHOU)
Date: Wed, 10 May 2006 01:35:01 +0000 (UTC)
Subject: [Bioperl-l] pubmed
References: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>
Message-ID: <loom.20060510T032601-573@post.gmane.org>

Qunfeng <qfdong <at> iastate.edu> writes:

> 
> Hi there,
> 
> http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html
> 
> I am not very familiar with BioPerl. I tried to follow the example showing 
> in the above page to retrieve pubmed ID under each Reference tag , i.e., 
> $value->pubmed(), but it doesn't work for me for the seq gi#56961711. The 
> authors() works for me.  Appreciate any suggestions.
> 
> Qunfeng 
> 


Hi, 

I have the same problem as you. Here is what I have done: I use a regular
expression to match the value of the 'location' tag, if there is one.

#------------------
my $ann = $seqobj->annotation(); # annotation object
foreach my $ref ( $ann->get_Annotations('reference') ) {
    print "Title: ", $ref->title,"\n";
    print "Location: ", $ref->location, "\n";
    if ($ref->location =~ /PUBMED\s+(\d+)/) {
	my $pmid = $1;
	print "PMID: ", $pmid, "\n";
    }
    print "Authors: ", $ref->authors, "\n";
}
#------------------
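
The regular-expression step can be tried on plain strings without fetching anything. This is a minimal sketch, assuming the location text looks like the made-up samples below; pmid_from_location is a hypothetical helper name, and the sample strings are invented for illustration, not taken from a real GenBank record.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a PubMed id out of a reference location string.
# Returns undef when no "PUBMED <digits>" token is present.
sub pmid_from_location {
    my ($location) = @_;
    return undef unless defined $location;
    return $location =~ /PUBMED\s+(\d+)/ ? $1 : undef;
}

# Made-up location strings, for illustration only.
my @samples = (
    'J. Example 1 (2), 3-4 (2004) PUBMED 12345678',
    'Unpublished',
);

for my $location (@samples) {
    my $pmid = pmid_from_location($location);
    print defined $pmid ? "PMID: $pmid\n" : "no PubMed id in: $location\n";
}
```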


From osborne1 at optonline.net  Wed May 10 03:01:49 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 09 May 2006 23:01:49 -0400
Subject: [Bioperl-l] pubmed
In-Reply-To: <loom.20060510T032601-573@post.gmane.org>
Message-ID: <C086CFDD.865A%osborne1@optonline.net>

Qunfeng,

I'm using bioperl-live, and I'm able to retrieve the single PubMed id found in the
56961711 entry using the pubmed() method. Note that there are 4 references,
only one of which has a PubMed id. Also, the authors() method prints out the
authors, not the PubMed id. If you have a problem please show your code and
tell us which version of Bioperl you're using.

Brian O.


use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::GenBank;

my $db = Bio::DB::GenBank->new;
my $seq = $db->get_Seq_by_id(56961711);
my $ann_coll = $seq->annotation;

foreach my $ann ($ann_coll->get_Annotations('reference')) {
  print "Author: ", $ann->authors, "\nPubmed id: ", $ann->pubmed, "\n";
}




From sb at mrc-dunn.cam.ac.uk  Wed May 10 09:30:59 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 10 May 2006 10:30:59 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
Message-ID: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>

Hi,
I'm a little confused as to how names are supposed to work in 
Bio::Taxonomy::Node.

In the bioperl versions that I've looked at a Node doesn't seem to store 
the most important information about itself - its scientific name - in 
an obvious place. bioperl 1.5.1 puts it at the start of the 
classification list. I'd have thought sticking it in -name would make 
more sense, but this is used only for the GenBank common name.

The Bio::Taxonomy docs still suggest:

my $node_species_sapiens = Bio::Taxonomy::Node->new(
   -object_id => 9606, # or -ncbi_taxid. Required tag
   -names => {
       'scientific' => ['sapiens'],
       'common_name' => ['human']
   },
   -rank => 'species'  # Required tag
);

and whilst Bio::Taxonomy::Node does not accept -names, it does have a 
'name' method which claims to work like:

$obj->name('scientific', 'sapiens');

This kind of thing would be really nice, but afaics 
Bio::Taxonomy::Node->new takes the -name value and makes a common name 
out of it, whilst the name() method passes any 'scientific' name to the 
scientific_name() method which is unable to set any value (and warns 
about this), only get.

It seems like the need to have this classification array work the same 
way as Bio::Species is causing some unnecessary restrictions. Can't the 
more sensible idea of having a dedicated storage spot for the 
ScientificName and other parameters be used, with the classification 
array either being generated just-in-time from the hash-stored data, or 
indeed being generated from the Lineage field?


Also, why does a node store the complete hierarchy on itself in the 
classification array? If we're going that far, why don't the 
Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a 
get_taxonomy() method instead of a get_Taxonomy_Node() method. 
get_taxonomy() could, from a single efetch.fcgi lookup, create a 
complete Bio::Taxonomy with all the nodes. Whilst most nodes would only 
have a minimum of information, if you could simply ask a node what its 
rank and scientific name was you could easily build a classification 
array, or ask what Kingdom your species was in etc.
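
To make the idea concrete, here is a minimal sketch of a node that stores only its own scientific name, rank and a link to its parent, and builds the classification list on demand. My::TaxNode is a hypothetical class written for illustration; it is not Bio::Taxonomy::Node and none of its methods are claimed to match the BioPerl API. The three-node lineage is deliberately truncated.

```perl
use strict;
use warnings;

package My::TaxNode;    # hypothetical class, not part of BioPerl

sub new {
    my ($class, %args) = @_;
    my $self = {
        name   => $args{name},
        rank   => $args{rank},
        parent => $args{parent},
    };
    return bless $self, $class;
}

sub scientific_name { $_[0]{name} }
sub rank            { $_[0]{rank} }
sub parent          { $_[0]{parent} }

# Build the classification just-in-time by walking up the parent links,
# most specific name first.
sub classification {
    my ($self) = @_;
    my @names;
    for (my $node = $self; $node; $node = $node->parent) {
        push @names, $node->scientific_name;
    }
    return @names;
}

# Find the node itself or the nearest ancestor with the given rank.
sub ancestor_of_rank {
    my ($self, $rank) = @_;
    for (my $node = $self; $node; $node = $node->parent) {
        return $node if $node->rank eq $rank;
    }
    return;
}

package main;

my $kingdom = My::TaxNode->new(name => 'Metazoa', rank => 'kingdom');
my $genus   = My::TaxNode->new(name => 'Homo',    rank => 'genus',   parent => $kingdom);
my $species = My::TaxNode->new(name => 'sapiens', rank => 'species', parent => $genus);

print join('; ', $species->classification), "\n";
print "Kingdom: ", $species->ancestor_of_rank('kingdom')->scientific_name, "\n";
```

With parent links in place, a question like "what Kingdom is this species in" becomes a walk up the tree rather than a lookup in a per-node copy of the whole hierarchy.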

Are there good reasons for Taxonomy working the way it does in 1.5.1, or 
would I not be wasting my time re-writing things to make more sense (to me)?


Cheers,
Sendu.


From osborne1 at optonline.net  Wed May 10 14:33:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 10 May 2006 10:33:18 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C43@onncrxms5.agr.gc.ca>
Message-ID: <C08771EE.866A%osborne1@optonline.net>

Paul,

I took your code, added some "run" code and made it into a script and added
this to CVS, examples/tools/run_primer3.pl. I hope this is OK with you.

Brian O.


On 5/9/06 1:59 PM, "Wiersma, Paul" <WiersmaP at agr.gc.ca> wrote:

> $results->number_of_results


From stoltzfu at umbi.umd.edu  Tue May  9 20:22:43 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Tue, 09 May 2006 16:22:43 -0400
Subject: [Bioperl-l] proposal: CDAT (character data and trees) integrative
	object
Message-ID: <D8EE6983-2123-45B3-967C-0E4982428CFA@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to  
facilitate comparative analysis
using evolutionary methods by 1) managing evolutionary relationships  
(by linking data to trees)
and 2) allowing coordinated analysis of different types of data (by  
implementing a generic concept
of "character-state" data).

Bio::CDAT would take advantage of existing BioPerl objects and would  
include the functionality
of Rutger Vos's Bio::Phylo.  It would provide the framework to  
develop interfaces to analysis tools
(phylogeny inference, evolutionary rate models, functional shift  
inference, etc), as well as to file
formats and visualization methods appropriate for such analyses.

A proposal is attached.  We would like to hear your thoughts (e.g.,  
see the section on "Questions
to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel
---------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDAT-proposal.pdf
Type: application/pdf
Size: 193701 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060509/48aeca4b/attachment-0004.pdf>
-------------- next part --------------


From zhouyubio at gmail.com  Wed May 10 08:55:46 2006
From: zhouyubio at gmail.com (Yu Zhou)
Date: Wed, 10 May 2006 16:55:46 +0800
Subject: [Bioperl-l] pubmed
In-Reply-To: <C086CFDD.865A%osborne1@optonline.net>
References: <loom.20060510T032601-573@post.gmane.org>
	<C086CFDD.865A%osborne1@optonline.net>
Message-ID: <613ffb490605100155w43a9ea4sca23818bc7fa4e33@mail.gmail.com>

Thanks!

I am using Bioperl-1.4, not bioperl-live. That may be the reason why
it does not work!




--
Best Wishes!

Yu


From cjfields at uiuc.edu  Wed May 10 15:46:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 10:46:27 -0500
Subject: [Bioperl-l] Oddness in Bio::SeqIO
In-Reply-To: <446128E5.1000908@infotech.monash.edu.au>
Message-ID: <000f01c67448$e63973b0$15327e82@pyrimidine>

This actually pops up when using $seq->species->common_name; using
$seq->species->binomial chops some of the strain designations off, so really
neither one works optimally for bacterial genus-species-strain taxonomy.
Hilmar made the suggestion that it's probably best to grab the NCBI TaxID
and parse it out that way by looking it up in the taxonomy database (using
Bio::DB::Taxonomy), but at the moment that's not what Bio::SeqIO::genbank
does.  

I wonder if we should be trying to shove most of this stuff into species
objects directly from the beginning; in other words, maybe we should try to
get the information in Bio::Annotation objects and then, after the
parsing/IO is finished, have a method to get the information into
Bio::Species objects when wanted/needed; a check could be added against the
NCBI Taxonomy database there.  

Anyway, I really haven't looked at how they are parsed out and don't have
the time at the moment.  I may look into this as well but not until I get
back from a conference (end of May).  Jason and Brian have been calling for a
refactoring of Bio::SeqIO::genbank for a while; maybe it's time to
do something about it...

Chris 



From cuiw at mail.nih.gov  Wed May 10 16:02:55 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 12:02:55 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences
In-Reply-To: <20060510010408.24494.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F4999@nihcesmlbx10.nih.gov>


'PRIMER_SEQUENCE_ID' is not a key in the Bio::Tools::Primer3 output
hash.

You can find all legal keys by "print keys %{$result1};"
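
A small sketch of that check, using a made-up hash in place of a real primer_results() return value: it lists the keys that are present and substitutes a placeholder for a missing one, which avoids the "uninitialized value" warning. value_or_missing is a hypothetical helper name.

```perl
use strict;
use warnings;

# Made-up stand-in for a primer_results() hash reference.
my $result1 = {
    PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT',
    PRIMER_LEFT_TM       => 59.8,
};

# Hypothetical helper: return the value stored under a key, or a
# placeholder string when the key is absent or undefined.
sub value_or_missing {
    my ($hash, $key) = @_;
    my $value = $hash->{$key};
    return defined $value ? $value : '(not present)';
}

# List the keys that actually exist in this result.
print "$_\n" for sort keys %{$result1};

# Look up one key that exists and one that does not.
for my $key (qw(PRIMER_LEFT_SEQUENCE PRIMER_SEQUENCE_ID)) {
    print "$key\t", value_or_missing($result1, $key), "\n";
}
```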


There is one point I don't understand:

When I add these two lines to my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

From WiersmaP at AGR.GC.CA  Wed May 10 16:08:37 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Wed, 10 May 2006 12:08:37 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiple
	sequences
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>

Brian, no problem with the code, thanks for asking.

Li, PRIMER_SEQUENCE_ID and SEQUENCE are not part of the individual results but only end up by default with $results->primer_results(0).  If you try to access them using $results->primer_results(1) (or anything but 0) you will get an error.
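
One way to work with this is to read the sequence ID once from result 0 and reuse it when printing every primer pair. The sketch below assumes the layout described above (only the hash at index 0 carries PRIMER_SEQUENCE_ID); label_pairs is a hypothetical helper and the data are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: label every primer pair with the sequence ID that is
# only stored in the first (index 0) result hash.
sub label_pairs {
    my @results = @_;
    return () unless @results;

    my $id = $results[0]{PRIMER_SEQUENCE_ID};
    $id = 'unknown' unless defined $id;

    my @lines;
    for my $i (0 .. $#results) {
        my $left = $results[$i]{PRIMER_LEFT_SEQUENCE};
        $left = 'NA' unless defined $left;
        push @lines, join("\t", $id, $i + 1, $left);
    }
    return @lines;
}

# Made-up stand-ins for primer_results(0) and primer_results(1).
my @mock_results = (
    { PRIMER_SEQUENCE_ID => 'seq1', PRIMER_LEFT_SEQUENCE => 'ACGTACGTACGTACGTACGT' },
    { PRIMER_LEFT_SEQUENCE => 'GGATCCGGATCCGGATCCGG' },
);

print "$_\n" for label_pairs(@mock_results);
```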

Paul

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 
?


-----Original Message-----
From: chen li [mailto:chen_li3 at yahoo.com] 
Sent: Tuesday, May 09, 2006 6:04 PM
To: Wiersma, Paul
Cc: bioperl-l at bioperl.org
Subject: RE: [Bioperl-l] use primer3 to design primers with multiple sequences

Hi Paul,

Thank you very much.

Just like you point out in your lastest email I now
figure out the line
"my $result1=$results->primer_results(1);"

returns a hash reference containing all the
information for the first pair of primer.  1)Since it
is a hash I should be able to get the specific value
for its corresponding  key by telling Perl which key
is the entry for the value. 2) Also it is a reference
I should deference it to get the so-called true value.

I don't know too much OO and Perl and your code looks
a little bit complicated to me. But I get the job done
by adding the following lines directly:

###############################################
#from Primer3 module to get all the information
#foreach my $key (sort keys %{$result1}) {
	   #print "$key\t${$result1}{$key}\n"}
##################################################


#get the value for the key in the hash reference
	   my $key_PRIMER_LEFT_SEQUENCE='PRIMER_LEFT_SEQUENCE';
	   print "$key_PRIMER_LEFT_SEQUENCE\t${$result1}{$key_PRIMER_LEFT_SEQUENCE}\n";
 

There is one point I don't understand:

When I add these two lines to my code (line 49 in my
code)

	   my $key_PRIMER_SEQUENCE_ID='PRIMER_SEQUENCE_ID';
	   print "$key_PRIMER_SEQUENCE_ID\t${$result1}{$key_PRIMER_SEQUENCE_ID}\n";

I don't get the PRIMER_SEQUENCE_ID. Perl complains
and says "Use of uninitialized value in concatenation
(.) or string at primer3-3 line 49."


Li
	   

--- "Wiersma, Paul" <WiersmaP at AGR.GC.CA> wrote:

> Hi Li,
> 
> Just a bit of clarification of the code that I sent earlier.
> The line "my $result1=$results->primer_results($i);" gives you a
> reference to a hash that contains all of the information for a primer
> pair.
> To access the entries you dereference the hash, i.e. the hash is
> %{$result1} and ${$result1}{'PRIMER_PRODUCT_SIZE'} gives you the entry
> for product size.  The following are the available entries. All are
> single values or strings except PRIMER_RIGHT and PRIMER_LEFT, which are
> start,length pairs (e.g. PRIMER_LEFT => '60,20') that can be pulled out
> with split:
> 
> my ($start, $length) = split /,/, ${$result1}{'PRIMER_LEFT'};
> my $right_Tm = ${$result1}{'PRIMER_RIGHT_TM'};
> 
> PRIMER_PRODUCT_SIZE
> PRIMER_PAIR_COMPL_ANY
> PRIMER_PAIR_COMPL_END
> PRIMER_PAIR_PENALTY
> 
> PRIMER_LEFT
> PRIMER_LEFT_END_STABILITY
> PRIMER_LEFT_PENALTY
> PRIMER_LEFT_TM
> PRIMER_LEFT_GC_PERCENT
> PRIMER_LEFT_SELF_ANY
> PRIMER_LEFT_SELF_END
> PRIMER_LEFT_SEQUENCE
> 
> PRIMER_RIGHT
> PRIMER_RIGHT_END_STABILITY
> PRIMER_RIGHT_PENALTY
> PRIMER_RIGHT_TM
> PRIMER_RIGHT_GC_PERCENT
> PRIMER_RIGHT_SELF_ANY
> PRIMER_RIGHT_SELF_END
> PRIMER_RIGHT_SEQUENCE
> 
> Paul A. Wiersma
> Agriculture and Agri-Food Canada/Agriculture et
> Agroalimentaire Canada
> Summerland, BC
> wiersmap at agr.gc.ca
>  
> 
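
The dereferencing and split steps described in the quoted message can be tried on their own with a hand-built hash reference. This is only a sketch: the values in $result1 are invented, the start_and_length helper is a hypothetical name, and only the key names come from the list above.

```perl
use strict;
use warnings;

# Invented example values; only the key names come from the thread.
my $result1 = {
    PRIMER_LEFT         => '60,20',
    PRIMER_RIGHT        => '559,20',
    PRIMER_RIGHT_TM     => '59.8',
    PRIMER_PRODUCT_SIZE => '500',
};

# Split a "start,length" string into its two parts.
sub start_and_length {
    my ($pair) = @_;
    my ($start, $length) = split /,/, $pair;
    return ($start, $length);
}

my ($left_start, $left_length) = start_and_length($result1->{PRIMER_LEFT});
my $right_tm = $result1->{PRIMER_RIGHT_TM};

# $result1->{KEY} and ${$result1}{KEY} are two spellings of the same lookup.
print "left primer starts at $left_start, length $left_length\n";
print "right Tm: $right_tm, product size: ${$result1}{PRIMER_PRODUCT_SIZE}\n";
```

The arrow form $result1->{KEY} is usually easier to read than ${$result1}{KEY}, but both reach the same hash entry.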




From cuiw at mail.nih.gov  Wed May 10 18:42:36 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 14:42:36 -0400
Subject: [Bioperl-l] use primer3 to design primers with multiplesequences:
	bug in code!
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C48@onncrxms5.agr.gc.ca>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B3F49A0@nihcesmlbx10.nih.gov>

Hope this works!

Bio::Tools::Primer3 line 264 should be:
 
$self->{seqobject}=Bio::Seq->new(-seq=>$value, -id=>$id);

Then you should be able to display PRIMER_SEQUENCE_ID with:

####read primer3 output file############
my $p3=Bio::Tools::Primer3->new(-file=>"data/primer3_output.txt");

########  print id###############
print $p3->seqobject->id;
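
If patching the module is not an option, the ID can also be read straight from the primer3 output, which is plain "TAG=VALUE" text with a lone "=" closing each record. The following is a rough sketch of that idea only: parse_boulder is a hypothetical helper, not part of BioPerl, and the sample records are invented.

```perl
use strict;
use warnings;

# Parse primer3-style "TAG=VALUE" text into a list of hash references,
# one per record; a line holding only "=" closes a record.
sub parse_boulder {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\n/, $text) {
        if ($line eq '=') {
            push @records, {%current};
            %current = ();
            next;
        }
        my ($tag, $value) = split /=/, $line, 2;
        $current{$tag} = $value if defined $value;
    }
    return @records;
}

# Invented sample output for illustration.
my $sample = join "\n",
    'PRIMER_SEQUENCE_ID=clone_1',
    'SEQUENCE=ACGTACGTACGTACGT',
    'PRIMER_LEFT_SEQUENCE=ACGTACGT',
    '=',
    'PRIMER_SEQUENCE_ID=clone_2',
    'SEQUENCE=TTGACCATGACCATGA',
    'PRIMER_LEFT_SEQUENCE=TTGACCAT',
    '=';

for my $record (parse_boulder($sample)) {
    print "$record->{PRIMER_SEQUENCE_ID}\t$record->{PRIMER_LEFT_SEQUENCE}\n";
}
```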

Wenwu Cui, PhD
NIH/NCI



_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Wed May 10 18:58:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 13:58:19 -0500
Subject: [Bioperl-l] ListSummaries for April 26-May 9
Message-ID: <001801c67463$b3c0a910$15327e82@pyrimidine>

ListSummaries for April 26-May 9 are up at the usual place:

http://www.bioperl.org/wiki/Mailing_list_summaries

Direct link:

http://www.bioperl.org/wiki/ListSummary:April_26-May_9%2C2006

It's a bit of a hurried one so don't be surprised to find a few spelling
errors here and there.  I'm getting ready for a conference in a couple weeks
so I may be off the radar a bit here and there.  The next ListSummary won't
be posted until May 26.  Enjoy!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From chen_li3 at yahoo.com  Thu May 11 00:27:34 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 10 May 2006 17:27:34 -0700 (PDT)
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
Message-ID: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>

First, thank you all for replying to my previous post
about primer3.

But now I am a little confused even after reading the
documentation: What is the relationship between these two
modules? What is the correct/standard way to use them for
batch primer design? What I do is use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results (because I don't
need all the information). But there is a little
trouble: PRIMER_SEQUENCE_ID can't be accessed using
this method. And Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE) without having the line

use Bio::Tools::Primer3 

in the script.  How can this be explained? Is it because of the
following line?

my $result=$primer3->run; 

The last question: which line of code invokes the
program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 


From jason.stajich at duke.edu  Thu May 11 00:41:31 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:41:31 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module and
	run-primer3 module?
In-Reply-To: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <B1D9C06A-09FF-4342-81E4-7D38AD66F4CA@duke.edu>

Bio::Tools::Run::XXX modules are for running applications...


--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Thu May 11 00:53:43 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 10 May 2006 20:53:43 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
Message-ID: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>

I would use the implementation that talks to the flatfile db as the
standard here.  Nodes are defined by the data from the taxonomy dump
dbs from NCBI.
The eutils interface is pretty worthless except for taxid->name or the
reverse; you can't get the full taxonomy (or couldn't when that
implementation was written).

The "name" method refers to the name of the node - each level in the  
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a  
Bio::Species and/or being able to read  a genbank file and the  
organism taxonomy data as a list and instantiating.  If we could rely  
on everything being in a DB of course this would be simpler.

Another problem is the depth of the taxonomy is not constant for  
every node so assuming that a fixed number of slots will be filled in  
to generate the taxonomy leads to problems.

Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code, as this is how I really wanted it to
work; the Bio::Species hacks are only there to shoehorn in data
retrieved from genbank files.  With the flatfile implementation
you have to walk all the way up the db hierarchy to get the kingdom
for a node, so you do have to build up the classification hierarchy, as
each node only stores data about itself.
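
To make the "walk up the hierarchy" idea concrete, here is a small self-contained sketch. It does not use Bio::DB::Taxonomy::flatfile; the %nodes table, classification and name_at_rank are hypothetical names, and the parent links are an assumption inferred from the order of the Frankia lineage in the Entrez record quoted elsewhere in this thread.

```perl
use strict;
use warnings;

# Each node knows only its own name, rank and parent id.
my %nodes = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854 },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712 },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013 },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037 },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003 },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760 },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2 },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef },
);

# Follow parent links from a node to the top, returning the nodes visited
# (the starting node first). The %seen check guards against cycles.
sub classification {
    my ($nodes, $taxid) = @_;
    my (@lineage, %seen);
    while (defined $taxid && exists $nodes->{$taxid}) {
        last if $seen{$taxid}++;
        push @lineage, $nodes->{$taxid};
        $taxid = $nodes->{$taxid}{parent};
    }
    return @lineage;
}

# Find the name of the ancestor (or the node itself) holding a given rank.
sub name_at_rank {
    my ($nodes, $taxid, $rank) = @_;
    for my $node (classification($nodes, $taxid)) {
        return $node->{name} if $node->{rank} eq $rank;
    }
    return;
}

print join('; ', map { $_->{name} } classification(\%nodes, 106370)), "\n";
print 'superkingdom: ', name_at_rank(\%nodes, 106370, 'superkingdom'), "\n";
```

Because the depth differs from node to node, the loop stops when it runs out of parents rather than assuming a fixed number of ranks.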

I'm not exactly sure what you are proposing to do, but would  
definitely enjoy another pair of hands, I don't really have time to  
mess with it any time soon.

-jason
On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at, a Node doesn't seem to
> store the most important information about itself - its scientific
> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>    -object_id => 9606, # or -ncbi_taxid. Required tag
>    -names => {
>        'scientific' => ['sapiens'],
>        'common_name' => ['human']
>    },
>    -rank => 'species'  # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to
> the scientific_name() method which is unable to set any value (and
> warns about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't
> the more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored data,
> or indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method?
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> only have a minimum of information, if you could simply ask a node
> what its rank and scientific name was you could easily build a
> classification array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in
> 1.5.1, or would I not be wasting my time re-writing things to make
> more sense (to me)?
>
>
> Cheers,
> Sendu.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cuiw at mail.nih.gov  Thu May 11 01:46:00 2006
From: cuiw at mail.nih.gov (Cui, Wenwu (NIH/NCI) [F])
Date: Wed, 10 May 2006 21:46:00 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
References: <20060511002734.12570.qmail@web36807.mail.mud.yahoo.com>
Message-ID: <E02CDC9015FB0847BCE4FC309519F39B07D391@nihcesmlbx10.nih.gov>

1. Bio::Tools::Primer3 is already included in the Bio::Tools::Run::Primer3 module, so you can parse the result file without a separate "use" line.
 
2. There is a bug in Bio::Tools::Primer3.pm line 264, as I mentioned. Once it is fixed, the module can output PRIMER_SEQUENCE_ID.
 
3. primer3.exe is called in the "run" function of Bio::Tools::Run::Primer3; please read the function definition.
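
As a general illustration of how a Perl wrapper can call an external program, here is a sketch of the usual pattern: write the input to a temporary file, start the program, and read its output through a pipe. This is not the actual code of the "run" function. The run_external helper is a hypothetical name, and the current Perl interpreter (echoing its input in upper case) stands in for primer3.exe so that the example runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the primer3 executable: the running Perl
# interpreter, told to print each input line in upper case.
my @command = ($^X, '-pe', '$_ = uc');

# Write $input to a temporary file, run the command on that file,
# and return the lines the command printed.
sub run_external {
    my ($input, @cmd) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    print {$tmp_fh} $input;
    close $tmp_fh or die "Cannot close $tmp_name: $!";

    # List-form pipe open: no shell is involved, so no quoting problems.
    open(my $out, '-|', @cmd, $tmp_name) or die "Cannot run $cmd[0]: $!";
    my @lines = <$out>;
    close $out or die "Command failed (exit status $?)";
    return @lines;
}

my @output = run_external("sequence_id=test1\n", @command);
print @output;
```

A wrapper module would then hand the captured output to a parser, which is the division of labour between the Run module and the parsing module discussed in this thread.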




From cjfields at uiuc.edu  Thu May 11 03:36:39 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 May 2006 22:36:39 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <000301c674ac$1d40f0f0$15327e82@pyrimidine>

I think you can get pretty much everything now, though I can definitely see
the use of a local database.  I ran a few tests, really unrelated to this,
using the powerscripting test page at NCBI for eutils (for the curious, at
http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was able to
retrieve XML-formatted taxonomic information; here's the bacterium Frankia
sp. CcI3 TaxID info, which looks like they have everything set up by rank.
It gives quite a bit of information. 
 
<?xml version="1.0"?>
<!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
<TaxaSet>

<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <Division>Bacteria</Division>
  <GeneticCode>
    <GCId>11</GCId>
    <GCName>Bacterial and Plant Plastid</GCName>
  </GeneticCode>
  <MitoGeneticCode>
    <MGCId>0</MGCId>
    <MGCName>Unspecified</MGCName>
  </MitoGeneticCode>
  <Lineage>cellular organisms; Bacteria; Actinobacteria; Actinobacteria
(class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
Frankia</Lineage>
  <LineageEx>
    <Taxon>
      <TaxId>131567</TaxId>
      <ScientificName>cellular organisms</ScientificName>
      <Rank>no rank</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>201174</TaxId>
      <ScientificName>Actinobacteria</ScientificName>
      <Rank>phylum</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1760</TaxId>
      <ScientificName>Actinobacteria (class)</ScientificName>
      <Rank>class</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85003</TaxId>
      <ScientificName>Actinobacteridae</ScientificName>
      <Rank>subclass</Rank>
    </Taxon>
    <Taxon>
      <TaxId>2037</TaxId>
      <ScientificName>Actinomycetales</ScientificName>
      <Rank>order</Rank>
    </Taxon>
    <Taxon>
      <TaxId>85013</TaxId>
      <ScientificName>Frankineae</ScientificName>
      <Rank>suborder</Rank>
    </Taxon>
    <Taxon>
      <TaxId>74712</TaxId>
      <ScientificName>Frankiaceae</ScientificName>
      <Rank>family</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
  <CreateDate>1999/10/22</CreateDate>
  <UpdateDate>2005/01/19</UpdateDate>
  <PubDate>2000/02/02</PubDate>
</Taxon>


Chris
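
For anyone who wants to experiment with this output, here is a rough sketch of pulling the rank-by-rank lineage out of the LineageEx block. It uses a regular expression purely to stay dependency-free and only copes with flat input like the record above; real code should use a proper XML parser. lineage_from_xml is a hypothetical helper name and the sample is a shortened excerpt of the record.

```perl
use strict;
use warnings;

# Extract (taxid, name, rank) entries from the <LineageEx> section.
# Regex sketch for flat, well-behaved input only.
sub lineage_from_xml {
    my ($xml) = @_;
    my ($lineage_ex) = $xml =~ m{<LineageEx>(.*?)</LineageEx>}s
        or return;
    my @lineage;
    while ($lineage_ex =~ m{<Taxon>\s*
                            <TaxId>(\d+)</TaxId>\s*
                            <ScientificName>([^<]*)</ScientificName>\s*
                            <Rank>([^<]*)</Rank>\s*
                            </Taxon>}gsx) {
        push @lineage, { taxid => $1, name => $2, rank => $3 };
    }
    return @lineage;
}

# Shortened excerpt of the record, for illustration.
my $sample = <<'XML';
<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <LineageEx>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
</Taxon>
XML

printf "%-14s %s (%s)\n", $_->{rank}, $_->{name}, $_->{taxid}
    for lineage_from_xml($sample);
```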



From jason.stajich at duke.edu  Thu May 11 12:04:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 08:04:54 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
Message-ID: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>

Great - now we just need someone to volunteer to actually work on this.

The current code grabs most of this, but I believe it expects
different XML.


On May 10, 2006, at 11:36 PM, Chris Fields wrote:

>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
>> Sent: Wednesday, May 10, 2006 7:54 PM
>> To: Sendu Bala
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
>>
>> I would use the implementation that talks to the flatfile db as the
>> standard here.  nodes are defined by the data in from taxonomy dump
>> dbs from ncbi.
>> the eutils is pretty worthless except for taxid->name or reverse, you
>> can't get the full taxonomy (or couldn't when that implementation was
>> written).
>>
>> The "name" method refers to the name of the node - each level in the
>> taxonomy can have a "name".
>>
>> The bits of hackiness relate to wrapping the node object as a
>> Bio::Species and/or being able to read  a genbank file and the
>> organism taxonomy data as a list and instantiating.  If we could rely
>> on everything being in a DB of course this would be simpler.
>>
>> Another problem is the depth of the taxonomy is not constant for
>> every node so assuming that a fixed number of slots will be filled in
>> to generate the taxonomy leads to problems.
>>
>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
>> best example of working code as this is how I really wanted it to
>> work, the Bio::Species hacks are only there to shoehorn data
>> retrieved from genbank files in.  With the flatfile implementation
>> you have to walk all the way up the db hierarchy to get the kingdom
>> for a node so you do have to build up the classification hierarchy as
>> each node only stores data about itsself.
>>
>> I'm not exactly sure what you are proposing to do, but would
>> definitely enjoy another pair of hands, I don't really have time to
>> mess with it any time soon.
>>
>> -jason
>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
>>
>>> Hi,
>>> I'm a little confused as to how names are supposed to work in
>>> Bio::Taxonomy::Node.
>>>
>>> In the bioperl versions that I've looked at a Node doesn't seem to
>>> store
>>> the most important information about itself - it's scientific name
>>> - in
>>> an obvious place. bioperl 1.5.1 puts it at the start of the
>>> classification list. I'd have thought sticking it in -name would  
>>> make
>>> more sense, but this is used only for the GenBank common name.
>>>
>>> The Bio::Taxonomy docs still suggests:
>>>
>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>>>    -object_id => 9606, # or -ncbi_taxid. Requird tag
>>>    -names => {
>>>        'scientific' => ['sapiens'],
>>>        'common_name' => ['human']
>>>    },
>>>    -rank => 'species'  # Required tag
>>> );
>>>
>>> and whilst Bio::Taxonomy::Node does not accept -names, it does  
>>> have a
>>> 'name' method which claims to work like:
>>>
>>> $obj->name('scientific', 'sapiens');
>>>
>>> This kind of thing would be really nice, but afaics
>>> Bio::Taxonomy::Node->new takes the -name value and makes a common  
>>> name
>>> out of it, whilst the name() method passes any 'scientific' name to
>>> the
>>> scientific_name() method which is unable to set any value (and warns
>>> about this), only get.
>>>
>>> It seems like the need to have this classification array work the  
>>> same
>>> way as Bio::Species is causing some unnecessary restrictions. Can't
>>> the
>>> more sensible idea of having a dedicated storage spot for the
>>> ScientificName and other parameters be used, with the classification
>>> array either being generated just-in-time from the hash-stored
>>> data, or
>>> indeed being generated from the Lineage field?
>>>
>>>
>>> Also, why does a node store the complete hierarchy on itself in the
>>> classification array? If we're going that far, why don't the
>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
>>> get_taxonomy() method instead of a get_Taxonomy_Node() method.
>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
>>> only
>>> have a minimum of information, if you could simply ask a node  
>>> what its
>>> rank and scientific name was you could easily build a classification
>>> array, or ask what Kingdom your species was in etc.
>>>
>>> Are there good reasons for Taxonomy working the way it does in
>>> 1.5.1, or
>>> would I not be wasting my time re-writing things to make more sense
>>> (to me)?
>>>
>>>
>>> Cheers,
>>> Sendu.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From sb at mrc-dunn.cam.ac.uk  Thu May 11 11:51:44 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 12:51:44 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
References: <4461B2D3.7010603@mrc-dunn.cam.ac.uk>
	<655F2803-8272-4A6C-A5C1-73D2C34303FA@duke.edu>
Message-ID: <44632550.3040603@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> I would use the implementation that talks to the flatfile db as the 
> standard here.  nodes are defined by the data in from taxonomy dump
> dbs from ncbi. the eutils is pretty worthless except for taxid->name
> or reverse, you can't get the full taxonomy (or couldn't when that
> implementation was written).

I'm not sure what you mean. In 1.5.1 you have access to the full
taxonomy because you're using efetch.fcgi. Indeed, you parse the full
taxonomy already to get the classification.


> The "name" method refers to the name of the node - each level in the
>  taxonomy can have a "name".

Yes, and to me the 'name of the node' is its scientific name (something
like 'sapiens'), not a 'common' name. So why is it stored as a
'common' name in the object? Why don't the DB::Taxonomy modules store
the actual common names (something like 'human')?


> The bits of hackiness relate to wrapping the node object as a 
> Bio::Species and/or being able to read  a genbank file and the
> organism taxonomy data as a list and instantiating.  If we could rely
> on everything being in a DB of course this would be simpler.

I think that Taxonomy stuff could be done in a 'pure' way, with a new
Bio::Species made as a wrapper around an appropriate Taxonomy module(s)
that cheated and made fake nodes from a genbank list and then made a
proper Bio::Taxonomy.


> With the flatfile implementation you have to walk all the way up the
> db hierarchy to get the kingdom for a node so you do have to build up
> the classification hierarchy as each node only stores data about
> itsself.

I'm still actually using bioperl 1.4 but I'm looking at 1.5.1 assuming
it is the latest available and I see that the flatfile implementation
works the same way as the entrez one. The requested node is fetched, but
then internally it walks the hierarchy purely so it can build a
classification list which is then stored on the object. If you're
already retrieving every node above the requested node, why not just
return every node? Why not just return a whole Bio::Taxonomy?


> I'm not exactly sure what you are proposing to do, but would
> definitely enjoy another pair of hands, I don't really have time to
> mess with it any time soon.

I shouldn't really be spending any time on it either, but I knocked up a
quick implementation for myself yesterday/today. I'm working on a bunch 
of modules that inherit from bioperl and then add/alter to suit my 
needs. In this regard they're a bit limited and kind of hard-coded to my 
way of thinking, but hopefully you can see my intent and perhaps use 
some of my implementation.

In my implementation:
# DB::Taxonomy::* return a Bio::Taxonomy equivalent with a single 
database lookup.
# The Taxonomy is implicitly a tree.
# The Taxonomy can have branches of different length from root to the
same rank level.
# The Taxonomy isn't told what ranks it has (isn't limited by some
supplied rank list); it has the ranks that its Nodes have and knows
(without being told) what order those ranks should be in.
# The Taxonomy is made of Nodes that truly only contain information
about themselves and have no classification array or anything like that.
# A Node can still be classified.
# We can have Nodes of rank 'no rank' that will be correctly ordered in
the classification.
# Nodes have a scientific name and common names
# You get parent and all children nodes without database lookups.
# There is a Bio::Species like thing that wraps around this and gives
easy access to what I really want to do:

my $human = TFBS::Species->new(-common_name => 'human');
# returns the array you'd expect from a normally created, fully
# classified Bio::Species
my @classification = $human->classification;
my $kingdom = $human->kingdom; # returns 'Metazoa'

# For genbank, we can still supply TFBS::Species a classification array

http://bix.sendu.me.uk/files/taxonomy_the_tfbs_way.tar.gz
(only tested inheriting from bioperl 1.4, but ideally that shouldn't 
make any difference!)
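
The node design listed above can be sketched in a few lines of core Perl.
This is only an illustration under stated assumptions, not the code in the
tarball: the package name Sketch::Node and all of its methods are
hypothetical, and the lineage is an abbreviated version of the Frankia
record quoted earlier in the thread. Each node holds only its own data plus
a parent link, and the classification is built on demand by walking upwards.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node class: stores only data about itself.
package Sketch::Node;

sub new {
    my ($class, %args) = @_;
    return bless {
        id              => $args{id},
        rank            => $args{rank},
        scientific_name => $args{scientific_name},
        common_names    => $args{common_names} || [],
        parent          => undef,
        children        => [],
    }, $class;
}

sub scientific_name { $_[0]{scientific_name} }
sub rank            { $_[0]{rank} }
sub parent          { $_[0]{parent} }

# Attach a child node and return it, so calls can be chained.
sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;
    Scalar::Util::weaken($child->{parent});   # avoid a reference cycle
    push @{ $self->{children} }, $child;
    return $child;
}

# Build the classification just-in-time, most specific name first.
sub classification {
    my ($self) = @_;
    my @names;
    for (my $node = $self; $node; $node = $node->parent) {
        push @names, $node->scientific_name;
    }
    return @names;
}

# Return the nearest node (self or ancestor) with the given rank.
sub ancestor_of_rank {
    my ($self, $rank) = @_;
    for (my $node = $self; $node; $node = $node->parent) {
        return $node if $node->rank eq $rank;
    }
    return;
}

package main;

# Abbreviated lineage: several intermediate ranks are left out.
my $root = Sketch::Node->new(id => 131567, rank => 'no rank',
                             scientific_name => 'cellular organisms');
my $bacteria = $root->add_child(Sketch::Node->new(
    id => 2, rank => 'superkingdom', scientific_name => 'Bacteria'));
my $genus = $bacteria->add_child(Sketch::Node->new(
    id => 1854, rank => 'genus', scientific_name => 'Frankia'));
my $species = $genus->add_child(Sketch::Node->new(
    id => 106370, rank => 'species', scientific_name => 'Frankia sp. CcI3'));

print join('; ', $species->classification), "\n";
print $species->ancestor_of_rank('superkingdom')->scientific_name, "\n";
```

Because rank lives on each node, a 'no rank' node needs no special slot: it
simply appears wherever it sits on the path to the root.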

Is there any scope for bioperl Taxonomy becoming more like this? Or are
there problems with my design (quite likely!)? Or are there good reasons
for maintaining the current way of working? Please feel free to shoot me
down/ discuss.


Cheers,
Sendu.


From sb at mrc-dunn.cam.ac.uk  Thu May 11 12:22:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 11 May 2006 13:22:53 +0100
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <44632C9D.4010408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> Great - now we just need someone to volunteer to actually work on this.

Now I'm really confused...


> The current code grabs most of this but I believe expects a different XML

No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez expects 
that XML, and parses it as fully as flatfile.pm does. Nothing more to 
do. Weren't you the person that wrote that parser?

I parse the same XML in my version of entrez.pm (see my previous email); 
the main difference being I make Nodes out of each Taxon instead of just 
adding each Taxon's ScientificName to the classification array.
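
To make that difference concrete, here is a rough sketch using core Perl
only. The regular-expression parsing is a stand-in chosen to keep the
example self-contained; it is not how entrez.pm reads the XML, and a real
implementation would use a proper XML parser. The XML is an abbreviated copy
of the Frankia record quoted earlier in the thread, and the hash keys and
the parse_fields helper are hypothetical.

```perl
use strict;
use warnings;

# Abbreviated efetch taxonomy XML (most ranks omitted for brevity).
my $xml = <<'XML';
<Taxon>
  <TaxId>106370</TaxId>
  <ScientificName>Frankia sp. CcI3</ScientificName>
  <ParentTaxId>1854</ParentTaxId>
  <Rank>species</Rank>
  <LineageEx>
    <Taxon>
      <TaxId>2</TaxId>
      <ScientificName>Bacteria</ScientificName>
      <Rank>superkingdom</Rank>
    </Taxon>
    <Taxon>
      <TaxId>1854</TaxId>
      <ScientificName>Frankia</ScientificName>
      <Rank>genus</Rank>
    </Taxon>
  </LineageEx>
</Taxon>
XML

# Extract id, scientific name and rank from one Taxon fragment.
sub parse_fields {
    my ($block) = @_;
    my %node;
    ($node{id})              = $block =~ m{<TaxId>(\d+)</TaxId>};
    ($node{scientific_name}) = $block =~ m{<ScientificName>([^<]+)</ScientificName>};
    ($node{rank})            = $block =~ m{<Rank>([^<]+)</Rank>};
    return \%node;
}

# One node per ancestor Taxon, root-most first (the order in the XML).
my ($lineage_ex) = $xml =~ m{<LineageEx>(.*?)</LineageEx>}s;
my @nodes;
while ($lineage_ex =~ m{<Taxon>(.*?)</Taxon>}sg) {
    push @nodes, parse_fields($1);
}

# The requested taxon itself: the outer record minus its LineageEx block.
(my $own = $xml) =~ s{<LineageEx>.*?</LineageEx>}{}s;
push @nodes, parse_fields($own);

# Each node's parent is the node listed just before it.
$nodes[$_]{parent_id} = $nodes[$_ - 1]{id} for 1 .. $#nodes;

# The classification array is then derived data, not stored data.
my @classification = reverse map { $_->{scientific_name} } @nodes;

printf "%-8s %-14s %s\n", $_->{id}, $_->{rank}, $_->{scientific_name} for @nodes;
print join('; ', @classification), "\n";
```

Keeping the per-Taxon hashes means rank and TaxId survive for every level,
whereas collecting only ScientificName throws them away.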


From jason.stajich at duke.edu  Thu May 11 13:53:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 09:53:56 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <44632C9D.4010408@mrc-dunn.cam.ac.uk>
References: <000301c674ac$1d40f0f0$15327e82@pyrimidine>
	<EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
	<44632C9D.4010408@mrc-dunn.cam.ac.uk>
Message-ID: <AAFFC5EC-8B54-4D87-BE38-CB90785AD4B5@duke.edu>

I guess so - I've long since forgotten what it supports, though, since I
don't regularly use it. Sorry.

On May 11, 2006, at 8:22 AM, Sendu Bala wrote:

> Jason Stajich wrote:
>> Great - now we just need someone to volunteer to actually work on  
>> this.
>
> Now I'm really confused...
>
>
>> The current code grabs most of this but I believe expects a  
>> different XML
>
> No, I think the code in bioperl 1.5.1 Bio::DB::Taxonomy::entrez  
> expects
> that XML, and parses it as fully as flatfile.pm does. Nothing more to
> do. Weren't you the person that wrote that parser?
>
> I parse the same XML in my version of entrez.pm (see my previous  
> email);
> the main difference being I make Nodes out of each Taxon instead of  
> just
> adding each Taxon's ScientificName to the classification array.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From cjfields at uiuc.edu  Thu May 11 14:57:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 09:57:20 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <EC4CF746-DF1E-4D87-BDE1-D8DE84DAD22C@duke.edu>
Message-ID: <000b01c6750b$33e95ea0$15327e82@pyrimidine>

Heh... 

To tell the truth, I haven't looked at Bio::DB::Taxonomy in any depth yet,
but I myself have seen issues with the way Bio::Species treats bacterial
strains (I guess this also involves Bio::Taxonomy::Node since that's what
Bio::Species delegates to).  Seems it likes to repeat some strain names when
using $seq->species->common_name.  Not a killer problem but annoying since
the correct name is in the source tag in the feature table!  I 'could' take
a look at it but I can't guarantee quick results.

Jason, I could add Taxonomy to the EUtilities overhaul I mentioned to you
previously but it'll take a while to get going.  I'm really more interested
in getting epost-esearch-efetch sequence retrieval up and running first with
the same API as Bio::DB::GenBank/Genpept and Bio::DB::Query::GenBank, then
donating the code (late summer/fall???) after working out namespace issues so
it doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I suppose I
could also look at Bio::DB::Taxonomy to see what's up in the next couple of
weeks (after conference), unless someone gets to it sooner.

Chris



From jason.stajich at duke.edu  Thu May 11 15:42:07 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 11 May 2006 11:42:07 -0400
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
References: <000b01c6750b$33e95ea0$15327e82@pyrimidine>
Message-ID: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>


I think you'll see it is different and mostly a limitation of the  
genbank format and the Bio::Species objects that you get from a  
genbank parse do represent the full capabilities of a Taxonomy::Node.

I am happy for someone to overhaul things, but it all boils down to
inferring which part of a list of names is the species versus
sub-species versus strain when none of the members of the list are
labeled.  These are some of the same problems we have for swissprot as
well.  I just don't think we can do it right only from the genbank
file data, so I don't see a lot of point in expecting Bio::Species to
provide more than a representation of what is in the file and just
return that array.


It has seemed like we need to special case things pretty heavily or  
do a lookup in the taxonomydb for something.

Can you guess what value is the strain versus sub-species?  What  
happens when there is a two part strain name (space separated) and a  
sub-species or variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Cervidae; Muntiacinae; Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321

Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales; Tremellaceae;
             Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
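
The ambiguity in the four ORGANISM lines above shows up as soon as you try
the obvious split. The sketch below is purely illustrative: naive_split is a
hypothetical helper, not anything in Bio::Species, and it deliberately makes
the simplest possible assumption (first word is the genus, second is the
species, everything else is left over).

```perl
use strict;
use warnings;

# Naive split: first word = genus, second = species, the rest unlabeled.
sub naive_split {
    my ($organism) = @_;
    my ($genus, $species, @rest) = split ' ', $organism;
    return {
        genus     => $genus,
        species   => $species,
        remainder => join(' ', @rest),
    };
}

for my $organism (
    'Staphylococcus haemolyticus JCSC1435',
    'Muntiacus muntjak vaginalis',
    'Aspergillus nidulans FGSC A4',
    'Cryptococcus neoformans var. grubii H99',
) {
    my $parts = naive_split($organism);
    printf "%-40s genus=%s species=%s remainder=[%s]\n",
        $organism, @{$parts}{qw(genus species remainder)};
}
```

The remainder is a strain in the first case, a sub-species in the second, a
two-word strain in the third, and a variety plus a strain in the fourth.
Nothing in the string itself says which, so telling them apart needs either
heavy special-casing or a lookup in the taxonomy database.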



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Thu May 11 17:04:01 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 13:04:01 -0400
Subject: [Bioperl-l] What is the relationship between primer3
	moduleandrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>

The bug that Wenwu referred to should only occur when reading a Primer3 output file;  the Bio::Tools::Run::Primer3->run method takes the results and directly transfers them to a Bio::Tools::Primer3 object without an intermediate file.  A Data::Dumper look at the Bio::Tools::Primer3 object shows the keys and results for PRIMER_SEQUENCE_ID and SEQUENCE in 'results' and then again in the 'results_by_number' hash but only in the '0' hash.

All of this doesn't really matter for Li's original concern.  If you want to include the id of the sequence along with the primer3 results, just take it from the seq object (i.e. $seq->display_id() ).  Since you are in a loop taking one sequence at a time, this $seq will be the one that was sent to primer3.

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Cui, Wenwu (NIH/NCI) [F]
Sent: Wednesday, May 10, 2006 6:46 PM
To: chen li; bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] What is the relationship between primer3 moduleandrun-primer3 module?

1. Bio::Tools::Primer3 is already included in Bio::Tools::Run::Primer3 module so that you can parse the result file.
 
2. There is a bug in Bio::Tools::Primer3.pm line 264 as I mentioned. Once fixed, it can output 
 
3. primer3.exe is called in the Bio::Tools::Run::Primer3  "run" function, please read the function definition.


From cjfields at uiuc.edu  Thu May 11 17:16:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 11 May 2006 12:16:19 -0500
Subject: [Bioperl-l] Bio::Taxonomy confusion
In-Reply-To: <0C1C2DAC-F388-465E-B6C2-7217A3B4CC6C@duke.edu>
Message-ID: <000f01c6751e$9e89d6a0$15327e82@pyrimidine>

> I think you'll see it is different and mostly a limitation of the
> genbank format and the Bio::Species objects that you get from a
> genbank parse do represent the full capabilities of a Taxonomy::Node.

I definitely see the rationale for using a TaxID lookup (I think Hilmar said
so as well), especially for local databases.  I wonder, though, if there is
a way that RichSeqs like GenBank, when passed through SeqIO, can just be
'short-circuited' using the sequence builder to just accept what's on the
SOURCE or ORGANISM line of a file as is, without forcing it into
Bio::Species/Bio::Taxonomy::Node.  Or maybe diminish the role of the
SOURCE/ORGANISM lines altogether to just simple Annotation objects and place
much greater emphasis on the TaxID itself, in effect decoupling the TaxID
(taxonomic information) from SOURCE/ORGANISM (annotation information).

In other words, have GenBank/EMBL classification lines and organism lines
essentially stay like they are in the input file (use simple objects).
Then, if one were really intent on getting the full name, classification,
etc., or one wanted to store their sequences in bioperl-db, they would be
required to either have a local db of NCBI Taxonomy or remote access to a
similar database (NCBI or something else) so a lookup could be accomplished
using the TaxID.  If they use BioSQL, then require them to preload their
BioSQL database with NCBI's taxonomy, something Hilmar already strongly
suggests.
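
To make the TaxID-lookup idea a bit more concrete, here is a rough sketch in
plain Perl (no BioPerl involved).  The %taxon hash is a hypothetical stand-in
for a local taxonomy database; the names and ranks are copied from the Frankia
XML quoted further down, and the parent links above the genus are my
assumption, taken from the order of the lineage.  The sub names are made up
for illustration and are not part of any BioPerl module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a local taxonomy database, keyed by TaxID.
# Each node only stores data about itself plus the TaxID of its parent.
my %taxon = (
    106370 => { name => 'Frankia sp. CcI3',       rank => 'species',      parent => 1854   },
    1854   => { name => 'Frankia',                rank => 'genus',        parent => 74712  },
    74712  => { name => 'Frankiaceae',            rank => 'family',       parent => 85013  },
    85013  => { name => 'Frankineae',             rank => 'suborder',     parent => 2037   },
    2037   => { name => 'Actinomycetales',        rank => 'order',        parent => 85003  },
    85003  => { name => 'Actinobacteridae',       rank => 'subclass',     parent => 1760   },
    1760   => { name => 'Actinobacteria (class)', rank => 'class',        parent => 201174 },
    201174 => { name => 'Actinobacteria',         rank => 'phylum',       parent => 2      },
    2      => { name => 'Bacteria',               rank => 'superkingdom', parent => 131567 },
    131567 => { name => 'cellular organisms',     rank => 'no rank',      parent => undef  },
);

# Walk from a node up to the root, collecting every node on the way.
sub classification {
    my ($db, $taxid) = @_;
    my @lineage;
    while (defined $taxid) {
        my $node = $db->{$taxid} or die "unknown TaxID $taxid\n";
        push @lineage, $node;
        $taxid = $node->{parent};
    }
    return @lineage;    # leaf first, root last
}

# Ask for the name at a given rank instead of guessing from list position.
sub name_at_rank {
    my ($db, $taxid, $rank) = @_;
    for my $node (classification($db, $taxid)) {
        return $node->{name} if $node->{rank} eq $rank;
    }
    return;
}

print join('; ', map { $_->{name} } classification(\%taxon, 106370)), "\n";
print name_at_rank(\%taxon, 106370, 'genus'), "\n";
```

The point is only that, with a TaxID and labeled ranks, nothing has to be
inferred from the free-text SOURCE/ORGANISM line.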

If anyone isn't interested in the taxonomic information or doesn't want to
bother grabbing the database or setting up remote access, tough luck; just
grab the Bio::Annotation/Bio::Species object and use that.  As the saying
goes, "you can't be all things to all people."  At some point you have to
throw your arms in the air, do the best you can, but give up trying to
please everyone.

> I am happy for someone to overhaul things, but it all boils down to
> inferring which part of a list of names is the species versus sub-
> species versus strain when none of the members of the list are
> labeled.  This is some of the same problems we have for swissprot as
> well.  I just don't think we can do it right only from the genbank
> file data so I don't see a lot of point of expecting Bio::Species to
> provide more than a representation of what is in the file and just
> return that array.
> 
> 
> It has seemed like we need to special case things pretty heavily or
> do a lookup in the taxonomydb for something.
> 
> Can you guess what value is the strain versus sub-species?  What
> happens when there is a two part strain name (space separated) and a
> sub-species or variety designation?
> 
> SOURCE      Staphylococcus haemolyticus JCSC1435
>    ORGANISM  Staphylococcus haemolyticus JCSC1435
>              Bacteria; Firmicutes; Bacillales; Staphylococcus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
> strain is JCSC1435
> 
> versus
> SOURCE      Muntiacus muntjak vaginalis
>    ORGANISM  Muntiacus muntjak vaginalis
>              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
> Euteleostomi;
>              Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla;
> Ruminantia;
>              Pecora; Cervidae; Muntiacinae; Muntiacus.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
> species is muntjak, sub-species vaginalis ?
> 
> versus
> SOURCE      Aspergillus nidulans FGSC A4
>    ORGANISM  Aspergillus nidulans FGSC A4
>              Eukaryota; Fungi; Ascomycota; Pezizomycotina;
> Eurotiomycetes;
>              Eurotiales; Trichocomaceae; Emericella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
> 
> Genus should be Aspergillus or Emericella ?
> 
> Strain and subspecies/variety in the same entry
> SOURCE      Cryptococcus neoformans var. grubii H99
>    ORGANISM  Cryptococcus neoformans var. grubii H99
>              Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
>              Heterobasidiomycetes; Tremellomycetidae; Tremellales;
> Tremellaceae;
>              Filobasidiella.
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443

Definitely tricky!  This really points out the problem here.  It used to be
a problem for only a few cases but with so many bacterial and fungal genomes
that's changed.  

The Frankia XML example has the scientific name set to "Frankia sp. CcI3",
which matches the SOURCE/ORGANISM line in NCBI's GenBank files and the OS
line in EMBL files.  It looks like the lines are parsed into and then built
from the ground-up in Bio::SeqIO::genbank using Bio::Species objects, which,
in my case with the strain designation, is where the problem lies.  They
could be placed in annotation objects with (-tagname => 'SOURCE', -value
=> 'Frankia sp. CcI3') or similar settings.  Or simplify Bio::Species to only
represent the information in the GenBank SOURCE/ORGANISM/CLASSIFICATION or
EMBL OS/OC lines and nothing more complex than that (no complex taxonomy;
for that you use the TaxID and local database). 

Okay,  I need to lay off the coffee now...

Chris

> On May 11, 2006, at 10:57 AM, Chris Fields wrote:
> 
> > Heh...
> >
> > To tell the truth, I haven't looked at Bio::DB::Taxonomy in any
> > depth yet,
> > but I myself have seen issues with the way Bio::Species treats
> > bacterial
> > strains (I guess this also involves Bio::Taxonomy::Node since
> > that's what
> > Bio::Species delegates to).  Seems it likes to repeat some strain
> > names when
> > using $seq->species->common_name.  Not a killer problem but
> > annoying since
> > the correct name is in the source tag in the feature table!  I
> > 'could' take
> > a look at it but I can't guarantee quick results.
> >
> > Jason, I could add Taxonomy to the EUtilities overhaul I mentioned
> > to you
> > previously but it'll take awhile to get going.  I'm really more
> > interested
> > in getting epost-esearch-efetch sequence retrieval up and running
> > first with
> > the same API as Bio::DB::GenBank/Genpept and
> > Bio::DB::Query::GenBank, donate
> > the code (late summer/fall???) after working out namespace issues
> > so it
> > doesn't conflict with current Bio::DB::WebDBSeqI inheritance.  I
> > suppose I
> > could also look at Bio::DB:Taxonomy to see what's up in the next
> > couple of
> > weeks (after conference), unless someone gets to it sooner.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> >> Sent: Thursday, May 11, 2006 7:05 AM
> >> To: Chris Fields
> >> Cc: bioperl-l at lists.open-bio.org; 'Sendu Bala'
> >> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> >>
> >> Great - now we just need someone to volunteer to actually work on
> >> this.
> >>
> >> The current code grabs most of this but I believe expects a different
> >> XML
> >>
> >>
> >> On May 10, 2006, at 11:36 PM, Chris Fields wrote:
> >>
> >>> I think you can get pretty much everything now, though I can
> >>> definitely see
> >>> the use of a local database.  I ran a few tests, really unrelated
> >>> to this,
> >>> using the powerscripting test page at NCBI for eutils (for the
> >>> curious, at
> >>> http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.cgi) and was
> >>> able to
> >>> retrieve XML-formatted taxonomic information; here's the bacterium
> >>> Frankia
> >>> sp. CcI3 TaxID info, which looks like they have everything set up
> >>> by rank.
> >>> It gives quite a bit of information.
> >>>
> >>> <?xml version="1.0"?>
> >>> <!DOCTYPE TaxaSet PUBLIC "-//NLM//DTD Taxon, 14th January 2002//EN"
> >>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/taxon.dtd">
> >>> <TaxaSet>
> >>>
> >>> <Taxon>
> >>>   <TaxId>106370</TaxId>
> >>>   <ScientificName>Frankia sp. CcI3</ScientificName>
> >>>   <ParentTaxId>1854</ParentTaxId>
> >>>   <Rank>species</Rank>
> >>>   <Division>Bacteria</Division>
> >>>   <GeneticCode>
> >>>     <GCId>11</GCId>
> >>>     <GCName>Bacterial and Plant Plastid</GCName>
> >>>   </GeneticCode>
> >>>   <MitoGeneticCode>
> >>>     <MGCId>0</MGCId>
> >>>     <MGCName>Unspecified</MGCName>
> >>>   </MitoGeneticCode>
> >>>   <Lineage>cellular organisms; Bacteria; Actinobacteria;
> >>> Actinobacteria
> >>> (class); Actinobacteridae; Actinomycetales; Frankineae; Frankiaceae;
> >>> Frankia</Lineage>
> >>>   <LineageEx>
> >>>     <Taxon>
> >>>       <TaxId>131567</TaxId>
> >>>       <ScientificName>cellular organisms</ScientificName>
> >>>       <Rank>no rank</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>2</TaxId>
> >>>       <ScientificName>Bacteria</ScientificName>
> >>>       <Rank>superkingdom</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>201174</TaxId>
> >>>       <ScientificName>Actinobacteria</ScientificName>
> >>>       <Rank>phylum</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>1760</TaxId>
> >>>       <ScientificName>Actinobacteria (class)</ScientificName>
> >>>       <Rank>class</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>85003</TaxId>
> >>>       <ScientificName>Actinobacteridae</ScientificName>
> >>>       <Rank>subclass</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>2037</TaxId>
> >>>       <ScientificName>Actinomycetales</ScientificName>
> >>>       <Rank>order</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>85013</TaxId>
> >>>       <ScientificName>Frankineae</ScientificName>
> >>>       <Rank>suborder</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>74712</TaxId>
> >>>       <ScientificName>Frankiaceae</ScientificName>
> >>>       <Rank>family</Rank>
> >>>     </Taxon>
> >>>     <Taxon>
> >>>       <TaxId>1854</TaxId>
> >>>       <ScientificName>Frankia</ScientificName>
> >>>       <Rank>genus</Rank>
> >>>     </Taxon>
> >>>   </LineageEx>
> >>>   <CreateDate>1999/10/22</CreateDate>
> >>>   <UpdateDate>2005/01/19</UpdateDate>
> >>>   <PubDate>2000/02/02</PubDate>
> >>> </Taxon>
> >>>
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> >>>> Sent: Wednesday, May 10, 2006 7:54 PM
> >>>> To: Sendu Bala
> >>>> Cc: bioperl-l at lists.open-bio.org
> >>>> Subject: Re: [Bioperl-l] Bio::Taxonomy confusion
> >>>>
> >>>> I would use the implementation that talks to the flatfile db as the
> >>>> standard here.  nodes are defined by the data in from taxonomy dump
> >>>> dbs from ncbi.
> >>>> the eutils is pretty worthless except for taxid->name or
> >>>> reverse, you
> >>>> can't get the full taxonomy (or couldn't when that
> >>>> implementation was
> >>>> written).
> >>>>
> >>>> The "name" method refers to the name of the node - each level in
> >>>> the
> >>>> taxonomy can have a "name".
> >>>>
> >>>> The bits of hackiness relate to wrapping the node object as a
> >>>> Bio::Species and/or being able to read  a genbank file and the
> >>>> organism taxonomy data as a list and instantiating.  If we could
> >>>> rely
> >>>> on everything being in a DB of course this would be simpler.
> >>>>
> >>>> Another problem is the depth of the taxonomy is not constant for
> >>>> every node so assuming that a fixed number of slots will be
> >>>> filled in
> >>>> to generate the taxonomy leads to problems.
> >>>>
> >>>> Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as
> >>>> the
> >>>> best example of working code as this is how I really wanted it to
> >>>> work, the Bio::Species hacks are only there to shoehorn data
> >>>> retrieved from genbank files in.  With the flatfile implementation
> >>>> you have to walk all the way up the db hierarchy to get the kingdom
> >>>> for a node so you do have to build up the classification
> >>>> hierarchy as
> >>>> each node only stores data about itsself.
> >>>>
> >>>> I'm not exactly sure what you are proposing to do, but would
> >>>> definitely enjoy another pair of hands, I don't really have time to
> >>>> mess with it any time soon.
> >>>>
> >>>> -jason
> >>>> On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
> >>>>
> >>>>> Hi,
> >>>>> I'm a little confused as to how names are supposed to work in
> >>>>> Bio::Taxonomy::Node.
> >>>>>
> >>>>> In the bioperl versions that I've looked at a Node doesn't seem to
> >>>>> store
> >>>>> the most important information about itself - it's scientific name
> >>>>> - in
> >>>>> an obvious place. bioperl 1.5.1 puts it at the start of the
> >>>>> classification list. I'd have thought sticking it in -name would
> >>>>> make
> >>>>> more sense, but this is used only for the GenBank common name.
> >>>>>
> >>>>> The Bio::Taxonomy docs still suggests:
> >>>>>
> >>>>> my $node_species_sapiens = Bio::Taxonomy::Node->new(
> >>>>>    -object_id => 9606, # or -ncbi_taxid. Requird tag
> >>>>>    -names => {
> >>>>>        'scientific' => ['sapiens'],
> >>>>>        'common_name' => ['human']
> >>>>>    },
> >>>>>    -rank => 'species'  # Required tag
> >>>>> );
> >>>>>
> >>>>> and whilst Bio::Taxonomy::Node does not accept -names, it does
> >>>>> have a
> >>>>> 'name' method which claims to work like:
> >>>>>
> >>>>> $obj->name('scientific', 'sapiens');
> >>>>>
> >>>>> This kind of thing would be really nice, but afaics
> >>>>> Bio::Taxonomy::Node->new takes the -name value and makes a common
> >>>>> name
> >>>>> out of it, whilst the name() method passes any 'scientific'
> >>>>> name to
> >>>>> the
> >>>>> scientific_name() method which is unable to set any value (and
> >>>>> warns
> >>>>> about this), only get.
> >>>>>
> >>>>> It seems like the need to have this classification array work the
> >>>>> same
> >>>>> way as Bio::Species is causing some unnecessary restrictions.
> >>>>> Can't
> >>>>> the
> >>>>> more sensible idea of having a dedicated storage spot for the
> >>>>> ScientificName and other parameters be used, with the
> >>>>> classification
> >>>>> array either being generated just-in-time from the hash-stored
> >>>>> data, or
> >>>>> indeed being generated from the Lineage field?
> >>>>>
> >>>>>
> >>>>> Also, why does a node store the complete hierarchy on itself in
> >>>>> the
> >>>>> classification array? If we're going that far, why don't the
> >>>>> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just
> >>>>> have a
> >>>>> get_taxonomy() method instead of a get_Taxonomy_Node() method.
> >>>>> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> >>>>> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> >>>>> only
> >>>>> have a minimum of information, if you could simply ask a node
> >>>>> what its
> >>>>> rank and scientific name was you could easily build a
> >>>>> classification
> >>>>> array, or ask what Kingdom your species was in etc.
> >>>>>
> >>>>> Are there good reasons for Taxonomy working the way it does in
> >>>>> 1.5.1, or
> >>>>> would I not be wasting my time re-writing things to make more
> >>>>> sense
> >>>>> (to me)?
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>> Sendu.
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>> --
> >>>> Jason Stajich
> >>>> Duke University
> >>>> http://www.duke.edu/~jes12
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12


From WiersmaP at AGR.GC.CA  Fri May 12 00:13:12 2006
From: WiersmaP at AGR.GC.CA (Wiersma, Paul)
Date: Thu, 11 May 2006 20:13:12 -0400
Subject: [Bioperl-l] What is the relationship between primer3 module
	andrun-primer3 module?
Message-ID: <5F0D2715D84F2842A9B857E8D7888F120C4C52@onncrxms5.agr.gc.ca>

Li,

If you are only "a little confused" by the OO concepts in the primer3 modules then you are doing well.

To expand a little on Wenwu's explanations: a Bio::Tools::Run::Primer3 object is a "wrapper" around the Primer3 program. All the commands and parameters that Primer3 needs for it to run are collected inside the object.  This includes a sequence (which you must supply as a sequence object) and parameters (most of which are already supplied by default but can be changed using the $primer3_object->add_targets method). Then, when everything is set the way you want it, you 'run' the Primer3 program by using $primer3_object->run.  The "wrapper" collects all the run parameters and sends them off to the Primer3 executable.  Primer3 does the analysis and outputs the results to "stdout" in boulder-io format.  By redirecting the output (i.e. perl p3run_script.pl > out.txt) you will get the Primer3 output directly in the boulder-io format ('tag'='value') stored in out.txt.  Because out.txt is not closed between the sequences processed by the script, you get all of the results concatenated in out.txt.  However, if you supply an output filename (-outfile=>$file_out) to the "wrapper", each line of output from Primer3 will be written to $file_out, and at the end of the Primer3 output the file will be closed.  If your script then loops to another sequence, it will open the same outfile again and overwrite it.

One last important detail for the "wrapper" object.  When Primer3 is executed the $primer3_object is designed to return a Bio::Tools::Primer3 object (the code is: my $results_object = $primer3_object->run).  $results_object is a Bio::Tools::Primer3 object and contains the results of your Primer3 run as well as having methods for getting at that information.  This includes finding out how many primer sets were found and the means to access the primer set results one at a time.  It does work as advertised.  Because all of the primer sets are based on the same sequence, Primer3 only outputs the SEQUENCE and PRIMER_SEQUENCE_ID one time instead of for each primer set.  That is why they only show up in $results_object as if they belonged with the first primer set (set '0') and they are not available for the other primer sets.
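
To illustrate why the per-sequence tags only show up with set '0', here is a small plain-Perl sketch.  It is not the code inside Bio::Tools::Primer3; the sub name group_by_set and the sample text are made up for illustration, and I am assuming the usual Primer3 convention that the first primer set carries no number in its tags while later sets do (e.g. PRIMER_LEFT_1_SEQUENCE).

```perl
use strict;
use warnings;

# Made-up boulder-io text in the 'tag=value' style described above; a lone
# '=' ends the record.  The sequences are invented, not real Primer3 output.
my $boulder = <<'END';
PRIMER_SEQUENCE_ID=example_seq
SEQUENCE=ACGTACGTACGTACGTACGT
PRIMER_LEFT_SEQUENCE=ACGTACGT
PRIMER_RIGHT_SEQUENCE=TTGGCCAA
PRIMER_LEFT_1_SEQUENCE=CGTACGTA
PRIMER_RIGHT_1_SEQUENCE=GGCCAATT
=
END

# Group tags by primer-set number.  Tags without a number are filed under
# set 0, so tags that Primer3 prints once per sequence land next to the
# first primer set and nowhere else.
sub group_by_set {
    my ($text) = @_;
    my %by_number;
    for my $line (split /\n/, $text) {
        last if $line eq '=';                      # end of record
        my ($tag, $value) = split /=/, $line, 2;
        next unless defined $value;
        my $set = $tag =~ s/_(\d+)_/_/ ? $1 : 0;   # strip the set number, if any
        $by_number{$set}{$tag} = $value;
    }
    return \%by_number;
}

my $sets = group_by_set($boulder);
for my $n (sort { $a <=> $b } keys %$sets) {
    my $id = $sets->{$n}{PRIMER_SEQUENCE_ID} // '(not stored with this set)';
    print "set $n: left=$sets->{$n}{PRIMER_LEFT_SEQUENCE} id=$id\n";
}
```

In a real script the simplest way around this is still to take the id from the sequence object you passed in, since you already have it in the loop.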

PAW

Paul A. Wiersma
Agriculture and Agri-Food Canada/Agriculture et Agroalimentaire Canada
Summerland, BC
wiersmap at agr.gc.ca
 


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
Sent: Wednesday, May 10, 2006 5:28 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] What is the relationship between primer3 module andrun-primer3 module?

First, thank you all for replying to my previous post
about primer3.

But now I am a little confused even after I read the
documents: What is the relationship between these two
modules? What is correct/standard way to use them to
do the batch-primer design? What I do is that I use
Bio::Tools::Run::Primer3 to design primers. Based on
Dr. Roy Chaudhuri's information I can set the
parameters using the following syntax:

$primer3->add_targets(PRIMER_PRODUCT_SIZE_RANGE=>'490-510');

Based on Paul A. Wiersma's explanation I can also
print out part of the primer results(because I don't
need all the information). But there is a little
trouble: PRIMER_SEQUENCE_ID can't be accessed using
this method. And Paul points out that
"PRIMER_SEQUENCE_ID and SEQUENCE are not part of the
individual 
results but only end up by default with
$results->primer_results(0)".  So it seems there is no
way to get around this problem using
Bio::Tools::Run::Primer3. And others suggest using
Bio::Tools::Primer3 to parse the results. So is it true
that Bio::Tools::Run::Primer3 is for primer design and
Bio::Tools::Primer3 is for parsing the results from
Bio::Tools::Run::Primer3? But what I find is that I
get almost all the results (except PRIMER_SEQUENCE_ID
and SEQUENCE ) without providing the line of code

use Bio::Tools::Primer3 

in the script.  How to explain this? Is it because of the
following line of code?

my $result=$primer3->run; 

The last question: which line of code is used to invoke
the program primer3.exe? How does the Perl script call
primer3.exe?

Once again thank you all very much,

Li 

 
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From torsten.seemann at infotech.monash.edu.au  Fri May 12 04:29:37 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:29:37 +1000
Subject: [Bioperl-l] Using bioperl to convert gene predictions to gff
In-Reply-To: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
References: <000301c6698f$b17a4d20$0202a8c0@GosinkFranklin>
Message-ID: <44640F31.6090702@infotech.monash.edu.au>

Mark,

> I'd like to reformat gene predictions from several different programs
> (genscan, glimmerhmm, fgenesh) to gff format. I know bioperl can parse the
> output from these and other predictors and that it can export into GFF. But
> I'm not clear on how to string the two together.
> Can anyone point me at any example code?

The parser modules for the gene predictions generally allow you to 
iterate through the predicted genes. Each prediction is usually returned 
as a Bio::SeqFeatureI-derived object. Those objects have a gff_string() 
method to print them as GFF.

So something as simple as this *may* work:

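use strict;
use warnings;
use Bio::Tools::Glimmer;

# parse the Glimmer output and print each predicted gene as one GFF line
my $parser = Bio::Tools::Glimmer->new(-file => 'glimmer.out');
while (my $gene = $parser->next_prediction) {
   print $gene->gff_string, "\n";
}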

If you want separate GFF lines for each exon, you'll have to do another 
loop over $gene->exons() etc each of which are luckily also 
Bio::SeqFeatures!

Or if you want to modify some of the GFF columns first, eg. the source tag, 
just do $gene->source_tag('mynewtag') before printing it.

Hope this helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Fri May 12 04:36:46 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 12 May 2006 14:36:46 +1000
Subject: [Bioperl-l] Bio::Graphics::Panel imagemap making
	with	Bio::Graphics::Panel
In-Reply-To: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
References: <5b6410e0605030120q31d1f554mbc4bf104deca48bf@mail.gmail.com>
Message-ID: <446410DE.7070305@infotech.monash.edu.au>

Kevin,

> I want to create an imagemap of short sequence matches with a longer one
> with clickable imagemaps for the short sequences. I figure I can do this
> easily enough using the example script for parsing blast output but I need
> an example script to understand how to produce the html code for the
> imagemap. I can find only rather cryptic references about how this can be
> done (see below).

The "blastGraphic" project probably has Perl code that could help you.

	http://www.gmod.org/blastGraphic.shtml

It is/was part of the GMOD project.
It produces pretty clickable image maps from BLAST reports.

Hope it helps,

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From brianjgilmartin at hotmail.com  Fri May 12 09:29:15 2006
From: brianjgilmartin at hotmail.com (brian gilmartin)
Date: Fri, 12 May 2006 10:29:15 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <BAY107-F354AD036A551D290A1874CBCAC0@phx.gbl>

please remove me from the list



From sb at mrc-dunn.cam.ac.uk  Fri May 12 10:24:39 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Fri, 12 May 2006 11:24:39 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <44646267.2000802@mrc-dunn.cam.ac.uk>

In bioperl up to at least 1.5.1, when one of the database modules comes 
across a species rank it does:

if ($rank eq 'species') {
   # get rid of genus from species name
   (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
}

However, even though the true scientific name is usually 'Genus species' in 
the database, note the 'usually': sometimes the species is a multiword 
item that does not include the genus, so we can't just do a simple split 
and take the second word.
The same applies to levels below species, eg. 'Avian erythroblastosis 
virus' is a variant of the species 'Avian leukosis virus' but 'Avian 
erythroblastosis virus (strain ES4)' is a variant of that variant...

My solution is to just remove whatever is the same between the current 
rank and the previous rank. Maybe even that's not so perfect, but it 
must be a lot better than turning the species 'Avian leukosis virus' 
into the species 'virus' (especially given that the genus here is 
'Alpharetrovirus')!

# we need to be going root(kingdom) -> leaf (species or lower) order
#
# we need to be storing untouched versions of the scientific name of
# the previous rank ($self->{_last_raw})
#
# probably only bother doing this once we get to genus
my $last_raw = $self->{_last_raw};
$self->{_last_raw} = $sci_name;
if ($last_raw) {
   # \Q...\E so that names containing regex metacharacters, such as
   # '(strain ES4)', are matched literally
   $sci_name =~ s/\Q$last_raw\E//;
   $sci_name =~ s/^\s+//;
}

Are there even more strange species (and lower) names that would still 
not work well with the above solution?
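
For anyone who wants to try the idea outside of the Bio::DB::Taxonomy
modules, here is a standalone sketch in plain Perl.  The sub name
strip_parent_names is made up for illustration, and the lineage is just the
virus example from above, listed root to leaf with the intermediate ranks
left out.

```perl
use strict;
use warnings;

# Walk a lineage from root to leaf and strip from each name whatever it
# shares with the raw (untouched) name of the rank above it.
sub strip_parent_names {
    my @raw_names = @_;
    my ($last_raw, @stripped);
    for my $raw (@raw_names) {
        my $name = $raw;
        if (defined $last_raw) {
            $name =~ s/\Q$last_raw\E//;   # \Q..\E: match the parent name literally
            $name =~ s/^\s+//;
        }
        $last_raw = $raw;                 # always remember the untouched name
        push @stripped, $name;
    }
    return @stripped;
}

my @lineage = (
    'Alpharetrovirus',
    'Avian leukosis virus',
    'Avian erythroblastosis virus',
    'Avian erythroblastosis virus (strain ES4)',
);
print "$_\n" for strip_parent_names(@lineage);

# the ordinary 'Genus species' case still works
print join(' | ', strip_parent_names('Homo', 'Homo sapiens')), "\n";
```

With this, 'Avian leukosis virus' stays intact, the strain becomes
'(strain ES4)', and 'Homo sapiens' is still reduced to 'sapiens'.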

Cheers,
Sendu.


From s_maheshwari84 at rediffmail.com  Fri May 12 13:55:49 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 12 May 2006 13:55:49 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060512135549.27106.qmail@webmail9.rediffmail.com>

  
hello
I am a student at the Centre for DNA Fingerprinting and Diagnostics (CDFD).
I am working on protein-protein interactions, but I am unable to use the protein interaction module, i.e. ProteinGraph.pm.
Actually I am facing lots of problems in the program I have written. Please help me; for the last four months I have not been able to solve the same problem.
I am pasting my program here and I am also attaching it.

#!/usr/bin/perl
use lib "/usr/local/bioxapps/bioperl/library/";
use strict;
use warnings;
# a script is not a class, so the @ISA assignments are not needed here
use Bio::Graph::SimpleGraph;
use Bio::Graph::IO;
use Bio::Graph::Edge;
use Bio::Graph::IO::dip;
use Bio::Graph::IO::psi_xml;
use Clone qw(clone);
use Bio::AnnotatableI;
use Bio::IdentifiableI;
use Bio::Graph::ProteinGraph;
use Class::AutoClass;
use Bio::Graph::SimpleGraph::Traversal;

my $graphio = Bio::Graph::IO->new(-file   => '/users/saurabh/perl_program/sample1.txt',-format => 'dip');
print "$graphio";
my $graph   = $graphio->next_network();
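# method calls are not interpolated inside double quotes, so call it outside
print join(",", map { $_->object_id() } $graph->nodes), "\t";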
$graph->remove_dup_edges();
my @un=$graph->unconnected_nodes();
print "\nthe unconnected nodes are =@un";
my @n=$graph->subgraph();
print "\nsubgraph=@n\n";
#print "Please the protein-id whose clusering coefficient is to be detemined\n";
#my $v=<STDIN>;
my $density = $graph->density();
print "\ngraph density=$density\n";
my @graphs = $graph->components();
# scalar(@graphs) is the number of components; $#graphs is only the last index
print "\nno of Connected components=", scalar(@graphs), "\n";
print "\nplease enter the protein-id whom you want to remove from the network\n";
chomp(my $no = <STDIN>);   # strip the trailing newline from the typed id
$graph->remove_nodes($graph->nodes_by_id($no));
my $count = $graph->edge_count();
print "\nno of edges=$count\n ";
my $ncount = $graph->node_count();
print "\nno of nodes=$ncount\n ";

print "\nenter the protein whose interactions are to be found "; 
chomp(my $x = <STDIN>);   # strip the trailing newline from the typed id
my $node = $graph->nodes_by_id($x);
#print " this is $node\n";
my @neighbors = $graph->neighbors($node); 
print "to check";
print join",",map{$_->object_id()} @neighbors;
my @nodes = $graph->nodes();
print "\nno of nodes = @nodes\t\n";
my @hubs;
foreach my $nodi (@nodes) 
 {
  if ($graph->neighbor_count($node) > 10) 
      {
       push @hubs, $nodi;
      }
  }
  
foreach my $r(@hubs)
  {
     my @y=@$r;
      print "the following proteins have > 10 interactors=@y\n";
  }
  #siblingual protein

 my @edgeref = $graph->articulation_points();
 print "no of articulation points=$#edgeref\n";
 print "please enter the protein whom you want to check for articulation point \n ";
 my $nod=<STDIN>;
  # make pathgen graph
  my $grap = Bio::Graph::IO->new(-file   => 'org.txt',-format => 'dip');
  my $gra   = $grap->next_network();
  $graph->remove_dup_edges();
  $graph->union($gra);
  my @duplicates = $graph->dup_edges();
  print "these interactions exist in cere and c.elegan\n=@duplicates";
  print "please enter the first protein for identifiaction of shortest path\n";
  my $p1=<STDIN>;
  print "please enter the second  protein for identifiaction of shortest path\n";
  my $p2=<STDIN>;
  
    my @a=$graph->shortest_paths();
 print "shortest path=@a\t\n";
    
  
with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060512/fe287972/attachment-0004.obj>

From chen_li3 at yahoo.com  Thu May 11 17:47:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Thu, 11 May 2006 10:47:33 -0700 (PDT)
Subject: [Bioperl-l] script for batch-primer design using primer3 module
In-Reply-To: <5F0D2715D84F2842A9B857E8D7888F120C4C4D@onncrxms5.agr.gc.ca>
Message-ID: <20060511174733.68836.qmail@web36812.mail.mud.yahoo.com>

Hi all,

With the valuable input from many of you, I finally
came up with a script for my personal needs:
1) batch primer design
2) set some of the parameters instead of using all the
default values
3) output only part of the information for the first
pair of primers rather than all of them (but you can
choose)
4) the results can be exported into Excel for my
convenience.

Enclosed are the script and the tested results.  I
also include some lines about how I figured out which
keys/entries are available to change. If you don't
want the sequence part, just add # to comment it out.

Any comments are welcome.

BTW the solution suggested by Dr. Cui and Paul doesn't
work for me.

Once again thank you very much,

Li  

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: primer3-5
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: result1.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060511/2358c5b7/attachment-0004.txt>

From Marc.Logghe at DEVGEN.com  Fri May 12 15:28:55 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Fri, 12 May 2006 17:28:55 +0200
Subject: [Bioperl-l] problem help me...........please
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAB@ANTARESIA.be.devgen.com>

Hi,
What is actually the problem ? Do you have errors ? Is the script not
behaving as you expect ?
You also might attach the input file sample1.txt so that people can try
it.
Regards,
Marc
  



From stoltzfu at umbi.umd.edu  Fri May 12 15:56:06 2006
From: stoltzfu at umbi.umd.edu (Arlin Stoltzfus)
Date: Fri, 12 May 2006 11:56:06 -0400
Subject: [Bioperl-l] proposal: Bio::CDAT (character data and trees)
Message-ID: <A52F256F-A851-4429-A5B1-D3162A344790@umbi.umd.edu>

Dear developers--

We propose a Bio::CDAT (Character Data And Trees) module to  
facilitate comparative analysis
using evolutionary methods by 1) managing evolutionary relationships  
(by linking data to trees)
and 2) allowing coordinated analysis of different types of data (by  
implementing a generic concept
of "character-state" data).  Bio::CDAT would leverage existing  
BioPerl objects and include the functionality
of Rutger Vos's Bio::Phylo.  It would provide the framework to  
develop interfaces to analysis tools
(phylogeny inference, evolutionary rate models, functional shift  
inference, etc), as well as to file
formats and visualization methods appropriate for such analyses.  A  
proposal is available at

   http://www.molevol.org/camel/projects/CDAT-proposal.pdf

We would like to hear your thoughts (e.g., see the section on  
"Questions to consider")!  Thanks

Arlin Stoltzfus
WeiGang Qiu
Rutger Vos
(with thanks to Justin Reese and Aaron Mackey)
------------------
Arlin Stoltzfus (stoltzfu at umbi.umd.edu)
CARB, 9600 Gudelsky Drive, Rockville, Maryland  20850
tel 240 314 6208, fax 240 314 6255, www.molevol.org/camel


From sdavis2 at mail.nih.gov  Fri May 12 15:54:57 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri, 12 May 2006 11:54:57 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060512135549.27106.qmail@webmail9.rediffmail.com>
Message-ID: <C08A2811.B6B5%sdavis2@mail.nih.gov>


On 5/12/06 9:55 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> hello
> I am a student at the Centre for DNA Fingerprinting and Diagnostics (CDFD).
> I am working on protein-protein interactions, but I am unable to use the protein
> interaction module, i.e. ProteinGraph.pm.
> I am facing a lot of problems in the program I have written. Please
> help me; I have not been able to solve this problem for the last four months.
> I am pasting my program here and also attaching it.

You haven't really told us what you are trying to do or what problems you
are having.

Sean


From cjfields at uiuc.edu  Fri May 12 17:08:11 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 12 May 2006 12:08:11 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44646267.2000802@mrc-dunn.cam.ac.uk>
Message-ID: <000f01c675e6$a61bde90$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, May 12, 2006 5:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> In bioperl up to at least 1.5.1, when one of the database modules comes
> across a species rank it does:
> 
> if ($rank eq 'species') {
>    # get rid of genus from species name
>    (undef,$taxon_name) = split(/\s+/,$taxon_name,2);
> }

The XML example from NCBI Taxonomy I mentioned previously seems to have
everything in the classification, from superkingdom down to species (no
strain unfortunately, and I'm not sure about subspecies); if it's missing
the rank then the designation doesn't exist or is tagged as 'no rank'.  Like
I mentioned before, I'm not intimately familiar with Bio::Taxonomy,
Bio::DB::Taxonomy, or Bio::Species, so I don't have a clue as to how
everything is parsed and plugged in to Bio::Taxonomy objects.  I do know
that XML::Twig is used for parsing through the data so it shouldn't be too
hard to change what you want.

I haven't tried using Bio::DB::Taxonomy directly yet, but I would have
thought that the binomial is just built from the XML twig 'LineageEx'
Rank=Genus + Rank=Species, that the genus comes from the tag 'Genus' and
species from 'Species', and that the scientific name is from the tag
'ScientificName'.  Guess not. 

> However even though true scientific name is usually 'Genus species' in
> the database, note the 'usually' - sometimes the species is a multiword
> item that does not include the Genus, so we can't do some simple split
> and take the second word.
> The same applies to levels below species, eg. 'Avian erythroblastosis
> virus' is a variant of the species 'Avian leukosis virus' but 'Avian
> erythroblastosis virus (strain ES4)' is a variant of that variant...
> 
> My solution is to just remove whatever is the same between the current
> rank and the previous rank. Maybe even that's not so perfect, but it
> must be a lot better than turning the species 'Avian leukosis virus'
> into the species 'virus' (especially given that the genus here is
> 'Alpharetrovirus')!
> 
> # we need to be going root(kingdom) -> leaf (species or lower) order
> #
> # we need to be storing untouched versions of the scientific name of
> # the previous rank ($self->{_last_raw})
> #
> # probably only bother start doing this when we get to genus
> my $last_raw = $self->{_last_raw} || undef;
> $self->{_last_raw} = $sci_name;
> if ($last_raw) {
>    $sci_name =~ s/$last_raw//;
>    $sci_name =~ s/^\s+//;
> }
> 
> Are there even more strange species (and lower) names that would still
> not work well with the above solution?
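
For reference, the prefix-stripping idea quoted above can be tried out in
plain Perl. This is only a sketch: strip_parent_name is a made-up helper
(it is not a BioPerl method), and unlike the quoted snippet it anchors the
match at the start of the name and escapes the previous rank's name with
\Q...\E, since a name such as 'Avian erythroblastosis virus (strain ES4)'
contains parentheses that would otherwise be read as regex syntax.

```perl
use strict;
use warnings;

# strip_parent_name is a hypothetical helper, not part of BioPerl.
# It removes the previous rank's name only when it is a leading prefix
# of the current scientific name.
sub strip_parent_name {
    my ($sci_name, $last_raw) = @_;
    return $sci_name unless defined $last_raw && length $last_raw;
    # \Q...\E stops characters such as '(' or '.' in the name from being
    # interpreted as regex metacharacters.
    (my $short = $sci_name) =~ s/^\Q$last_raw\E\s+//;
    return $short;
}

# Genus is a prefix of the species name, so it is removed.
print strip_parent_name('Homo sapiens', 'Homo'), "\n";

# Genus is not part of the species name, so the name is left alone.
print strip_parent_name('Avian leukosis virus', 'Alpharetrovirus'), "\n";

# A variant of a variant keeps only the part that is new at this level.
print strip_parent_name('Avian erythroblastosis virus (strain ES4)',
                        'Avian erythroblastosis virus'), "\n";
```

Whether a leftover such as '(strain ES4)' is what a Bio::Taxonomy node
should store is a separate question; the sketch only shows the string
handling.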

I don't think taking Genus/Species directly from the scientific name
(normally what is in the SOURCE or ORGANISM annotation for GenBank or OS for
EMBL) is the best way to go about it since it's really a best guess using
regex; Jason pointed out several examples where this falls apart, and being
a bacterial man I have found many examples myself.  I'm also not sure that
forcing a lookup for every TaxID in every sequence every time it's passed
through SeqIO is the best way to go either, though I think it should be
required for storing sequences.  It's a tricky balance.  

I still think that maybe we should absolve ourselves from using
SOURCE/ORGANISM or OS/OC information in GenBank files as anything more than
strictly annotation, or reconstruct Bio::Species to maybe a
Bio::Annotation::Species object to handle that annotation and either
deprecate Bio::Species or separate it completely from any Bio::Taxonomy
objects.  It would really simplify things.  Then, if anyone is interested in
taxonomy, either install a local database or use Entrez efetch, and then use
Bio::DB::Taxonomy (fixed of course) to grab the TaxID info.  Seems like
we're running more and more into exceptions to the rule as more genomes are
made available.

Anyway, using Bio::Species for GenBank is really screwy for bacterial names,
so currently I get around BioPerl issues with bacterial names by grabbing
the 'source' seqfeature and pulling the 'organism' tag out.  But it really
shouldn't be that obfuscated, right?

Chris

> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sdavis2 at mail.nih.gov  Sat May 13 12:19:21 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 13 May 2006 08:19:21 -0400
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513041853.16091.qmail@webmail31.rediffmail.com>
References: <20060513041853.16091.qmail@webmail31.rediffmail.com>
Message-ID: <4465CEC9.2010909@mail.nih.gov>

saurabh maheshwari wrote:
>  
> hello
> Thanks for your prompt reply.
> Actually I am trying to make a protein interaction graph from a DIP 
> file, but I am not able to do so. In my last mail I already attached 
> my program, which is giving some errors that I am not able to 
> troubleshoot. Please help.
> Thanks

I meant that since we don't know what error(s) you are getting, it is 
really not possible to determine what the problem is.  Also, someone 
else on the list offered to look at your code if you were to provide the 
input file.  I find it helpful to look at this webpage every now and 
then to remind myself what constitutes a useful question to email lists:

http://www.catb.org/~esr/faqs/smart-questions.html

Sean




From s_maheshwari84 at rediffmail.com  Sat May 13 05:17:58 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 13 May 2006 05:17:58 -0000
Subject: [Bioperl-l] problem help me...........please
Message-ID: <20060513051758.4610.qmail@webmail31.rediffmail.com>

  
Hello,
I am very happy to see the prompt replies from the group members.
As you all suggested, I have attached the required files.
There are three: first the input file, second an error file in which I saved the errors I was getting, and third the program file.
There is something in the error file that I would like to understand.
Here is one line from it:
## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
What does this stand for?
Second, I want to get the connected graph for my data.
Let me explain the type of connected graph by example.
Suppose there are five objects connected as follows:
A connected to B
A connected to C
B connected to C
D connected to C
E connected to A
I want to create a complete link between all five.


Please help me; I am not getting the result.


with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.dip
Type: application/octet-stream
Size: 5794 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0008.obj>
-------------- next part --------------
bash-2.05b$ perl from.pl
Bio::Graph::ProteinGraph=HASH(0x1182e70)
Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes
the unconnected nodes are =subgraph=Bio::Graph::SimpleGraph=HASH(0x11e2160)

graph density=0.00826446280991736

no of Connected components=60

please enter the protein-id whom you want to remove from the network
XMECF2

no of edges=61

no of nodes=122

enter the protein  whose interactions is to be find XMECF2
XMECF2
 interacts with map{->object_id()}

no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) Bio::Seq::RichSeq=HASH(0x11d1850
) Bio::Seq::RichSeq=HASH(0x11bd4c0) Bio::Seq::RichSeq=HASH(0x11c2fd0) Bio::Seq::
RichSeq=HASH(0x11aa7f0) Bio::Seq::RichSeq=HASH(0x1198340) Bio::Seq::RichSeq=HASH
(0x11d81a0) Bio::Seq::RichSeq=HASH(0x11ca320) Bio::Seq::RichSeq=HASH(0x11b5e40)
Bio::Seq::RichSeq=HASH(0x1190e00) Bio::Seq::RichSeq=HASH(0x11c1350) Bio::Seq::Ri
chSeq=HASH(0x11b2e20) Bio::Seq::RichSeq=HASH(0x11cb360) Bio::Seq::RichSeq=HASH(0
x1198250) Bio::Seq::RichSeq=HASH(0x11d0240) Bio::Seq::RichSeq=HASH(0x11c8f20) Bi
o::Seq::RichSeq=HASH(0x11b4ef0) Bio::Seq::RichSeq=HASH(0x119f7a0) Bio::Seq::Rich
Seq=HASH(0x11c2ee0) Bio::Seq::RichSeq=HASH(0x11dba20) Bio::Seq::RichSeq=HASH(0x1
1e2300) Bio::Seq::RichSeq=HASH(0x11b2f10) Bio::Seq::RichSeq=HASH(0x11b4b90) Bio:
:Seq::RichSeq=HASH(0x11d4df0) Bio::Seq::RichSeq=HASH(0x11d4b80) Bio::Seq::RichSe
q=HASH(0x11d8e70) Bio::Seq::RichSeq=HASH(0x11a1270) Bio::Seq::RichSeq=HASH(0x11c
b5d0) Bio::Seq::RichSeq=HASH(0x11d5cc0) Bio::Seq::RichSeq=HASH(0x11d32a0) Bio::S
eq::RichSeq=HASH(0x11b4c80) Bio::Seq::RichSeq=HASH(0x119e0c0) Bio::Seq::RichSeq=
HASH(0x11b7ed0) Bio::Seq::RichSeq=HASH(0x11ad490) Bio::Seq::RichSeq=HASH(0x1196e
60) Bio::Seq::RichSeq=HASH(0x119b7f0) Bio::Seq::RichSeq=HASH(0x11cef60) Bio::Seq
::RichSeq=HASH(0x11b7b70) Bio::Seq::RichSeq=HASH(0x11dd330) Bio::Seq::RichSeq=HA
SH(0x11da8c0) Bio::Seq::RichSeq=HASH(0x11a9f70) Bio::Seq::RichSeq=HASH(0x119b700
) Bio::Seq::RichSeq=HASH(0x119a550) Bio::Seq::RichSeq=HASH(0x11ba910) Bio::Seq::
RichSeq=HASH(0x11e0b30) Bio::Seq::RichSeq=HASH(0x11d3030) Bio::Seq::RichSeq=HASH
(0x11c62d0) Bio::Seq::RichSeq=HASH(0x11abb20) Bio::Seq::RichSeq=HASH(0x11d5bd0)
Bio::Seq::RichSeq=HASH(0x11b03c0) Bio::Seq::RichSeq=HASH(0x119e1b0) Bio::Seq::Ri
chSeq=HASH(0x11aa060) Bio::Seq::RichSeq=HASH(0x11a5700) Bio::Seq::RichSeq=HASH(0
x11a81e0) Bio::Seq::RichSeq=HASH(0x1196b00) Bio::Seq::RichSeq=HASH(0x11c1260) Bi
o::Seq::RichSeq=HASH(0x11a2800) Bio::Seq::RichSeq=HASH(0x11c63c0) Bio::Seq::Rich
Seq=HASH(0x11b60b0) Bio::Seq::RichSeq=HASH(0x11b93b0) Bio::Seq::RichSeq=HASH(0x1
1a4490) Bio::Seq::RichSeq=HASH(0x11ded50) Bio::Seq::RichSeq=HASH(0x11bbcd0) Bio:
:Seq::RichSeq=HASH(0x1194780) Bio::Seq::RichSeq=HASH(0x11aedd0) Bio::Seq::RichSe
q=HASH(0x11cd300) Bio::Seq::RichSeq=HASH(0x11a14e0) Bio::Seq::RichSeq=HASH(0x11c
4630) Bio::Seq::RichSeq=HASH(0x11a43a0) Bio::Seq::RichSeq=HASH(0x11a80f0) Bio::S
eq::RichSeq=HASH(0x11bbbe0) Bio::Seq::RichSeq=HASH(0x11d5960) Bio::Seq::RichSeq=
HASH(0x11c8e30) Bio::Seq::RichSeq=HASH(0x11cd3f0) Bio::Seq::RichSeq=HASH(0x11dd4
20) Bio::Seq::RichSeq=HASH(0x11cee70) Bio::Seq::RichSeq=HASH(0x11dbb10) Bio::Seq
::RichSeq=HASH(0x119a460) Bio::Seq::RichSeq=HASH(0x11aaa60) Bio::Seq::RichSeq=HA
SH(0x11d1760) Bio::Seq::RichSeq=HASH(0x11cb6c0) Bio::Seq::RichSeq=HASH(0x11c7530
) Bio::Seq::RichSeq=HASH(0x11deae0) Bio::Seq::RichSeq=HASH(0x11c4720) Bio::Seq::
RichSeq=HASH(0x119f890) Bio::Seq::RichSeq=HASH(0x11a6c40) Bio::Seq::RichSeq=HASH
(0x11ad130) Bio::Seq::RichSeq=HASH(0x11e23f0) Bio::Seq::RichSeq=HASH(0x11d2f40)
Bio::Seq::RichSeq=HASH(0x1194640) Bio::Seq::RichSeq=HASH(0x11d8f60) Bio::Seq::Ri
chSeq=HASH(0x11d0150) Bio::Seq::RichSeq=HASH(0x119d070) Bio::Seq::RichSeq=HASH(0
x11a5610) Bio::Seq::RichSeq=HASH(0x11aa2d0) Bio::Seq::RichSeq=HASH(0x11b94a0) Bi
o::Seq::RichSeq=HASH(0x11bd5b0) Bio::Seq::RichSeq=HASH(0x11c0ff0) Bio::Seq::Rich
Seq=HASH(0x11a6b50) Bio::Seq::RichSeq=HASH(0x119cf80) Bio::Seq::RichSeq=HASH(0x1
1baa00) Bio::Seq::RichSeq=HASH(0x11c7620) Bio::Seq::RichSeq=HASH(0x119fb00) Bio:
:Seq::RichSeq=HASH(0x11a2a70) Bio::Seq::RichSeq=HASH(0x11b1960) Bio::Seq::RichSe
q=HASH(0x11ab8b0) Bio::Seq::RichSeq=HASH(0x11e0c20) Bio::Seq::RichSeq=HASH(0x11a
d3a0) Bio::Seq::RichSeq=HASH(0x1197fe0) Bio::Seq::RichSeq=HASH(0x11b1870) Bio::S
eq::RichSeq=HASH(0x11a2b60) Bio::Seq::RichSeq=HASH(0x1192750) Bio::Seq::RichSeq=
HASH(0x11c9190) Bio::Seq::RichSeq=HASH(0x11e08c0) Bio::Seq::RichSeq=HASH(0x11dd6
90) Bio::Seq::RichSeq=HASH(0x11da7d0) Bio::Seq::RichSeq=HASH(0x11aece0) Bio::Seq
::RichSeq=HASH(0x11d80b0) Bio::Seq::RichSeq=HASH(0x11ca0b0) Bio::Seq::RichSeq=HA
SH(0x1196bf0) Bio::Seq::RichSeq=HASH(0x11b7de0) Bio::Seq::RichSeq=HASH(0x11b02d0
)
Can't call method "isa" on an undefined value at /usr/local/bioxapps/bioperl/lib
rary//Bio/Graph/ProteinGraph.pm line 477, <STDIN> line 2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: from.pl
Type: application/octet-stream
Size: 2723 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060513/77476ca5/attachment-0009.obj>

From cjfields at uiuc.edu  Sat May 13 18:18:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 13 May 2006 13:18:53 -0500
Subject: [Bioperl-l] problem help me...........please
In-Reply-To: <20060513051758.4610.qmail@webmail31.rediffmail.com>
Message-ID: <000901c676b9$b14479c0$15327e82@pyrimidine>

I really hate to break the bad news here, but I'm going to be brutally
honest.  I have not looked at any of the Bio::Graph modules and have no idea
how they are implemented, and I haven't looked at your input file, but I can
tell right off the bat your script has major logic problems.  I can also
pretty much  tell that you don't understand the object model we use here, at
all.  This is why I say that (from your last response):

> ## no of nodes = Bio::Seq::RichSeq=HASH(0x11aa700) ##
> What does this stand for?

Did you cut and paste from several other scripts hoping that it would work?
I say that b/c you mix styles quite frequently here, using objects correctly
(deref'ing with '->') and incorrectly (print "$object").  You also declare
(and redeclare) @ISA four times for a script (not needed unless you're
declaring a class and inheriting methods from other modules).  You also use
@ISA once with a misspelled module name (I don't think there is a module
named 'Expoerter').  So, I'm actually stunned that the script doesn't crash
at all.  Yikes!

Okay, brutal honesty time over.  Any time you see something like this:

Bio::Graph::ProteinGraph=HASH(0x1182e70)

means that what you are printing out is a reference to an object (it refers
to the object class and the location in memory) and is NOT what you want.
You should be doing something along the lines of $object->method, not 'print
$object', to get at the object data and methods.  You use this several times
in your script already; that should be a big hint as the areas where it
doesn't work do not use this syntax.  Read the documentation for the many
varied modules you use in your script.  Look at script examples.  Start
simply, then work your way up.  

Also, using the '->' dereferencing operator inside double quotes doesn't
work; you have to do something like:

print $graph->nodes,"\t";

not 

print "$graph->nodes\t";

That's why you get this in your output:

Bio::Graph::ProteinGraph=HASH(0x1182e70)->nodes

Which just prints the object reference with the string '->nodes'.
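
Here is a small stand-alone sketch of the difference. ToyGraph is a made-up
class that only stands in for the real graph object (it is not part of
BioPerl), so this runs with core Perl alone:

```perl
use strict;
use warnings;

# ToyGraph is a hypothetical stand-in for a graph object; not part of BioPerl.
package ToyGraph;
sub new   { my ($class, @nodes) = @_; return bless { nodes => [@nodes] }, $class }
sub nodes { my $self = shift; return @{ $self->{nodes} } }

package main;

my $graph = ToyGraph->new(qw(A B C));

# Inside double quotes only the variable is interpolated;
# the '->nodes' part stays as literal text.
my $wrong = "$graph->nodes";

# Calling the method outside the quotes returns the actual node list.
my $right = join ',', $graph->nodes;

print "$wrong\n";   # something like ToyGraph=HASH(0x...)->nodes
print "$right\n";   # A,B,C
```

With real graph objects the nodes are themselves objects, so each one still
needs a method call (for example an id accessor) before it prints as
anything readable.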

If any of what I just said doesn't make any sense, you really need to pick
up 'Learning Perl' and 'Intermediate Perl' by Schwartz et al and
'Programming Perl' by Wall et al.  I don't know if anyone can really help at
this point w/o completely writing the script for you.  We will fix problems
to a point but we, for the most part, will not do your work for you.

Chris



From hubert.prielinger at gmx.at  Sun May 14 03:45:58 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sat, 13 May 2006 21:45:58 -0600
Subject: [Bioperl-l] parsing output files from other tools
Message-ID: <4466A7F6.30204@gmx.at>

hi,
Is it possible to parse text output files other than BLAST output files, 
such as the text output files from the search tool mpSrch offered by
EBI? WU-BLAST output files can already be parsed with BioPerl.

thanks
Hubert


From arareko at campus.iztacala.unam.mx  Sun May 14 04:09:35 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 13 May 2006 23:09:35 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <4466AD7F.6050700@campus.iztacala.unam.mx>

I'm glad to announce the availability of the Deobfuscator interface at 
the BioPerl website. You can use it at the following URL:

http://bioperl.org/cgi-bin/deob_interface.cgi

Many thanks to Laura Kavanaugh and David Messina for this great 
contribution to the BioPerl project!

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Sun May 14 16:18:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 11:18:10 -0500
Subject: [Bioperl-l] parsing output files from other tools
In-Reply-To: <4466A7F6.30204@gmx.at>
Message-ID: <000301c67772$00b4e4f0$15327e82@pyrimidine>

These are the current report types parsed through SearchIO:

http://www.bioperl.org/wiki/Module:Bio::SearchIO

I don't see mpsrch among them.  If you want you could create a new plugin
module to parse those reports; the SearchIO HOWTO gives some pointers:

http://www.bioperl.org/wiki/HOWTO:SearchIO

You can always look at some of the current modules like blast, blastxml, or
fasta to get an idea of how it works.  Judging by the mpsrch output I'm
pretty sure you would have to build a custom plugin for it.  

A viable alternative: looking through the mail list it looks like mpsrch is
a multiprocessor implementation of ssearch, itself an implementation of the
Smith-Waterman algorithm for local alignments in the FASTA package of
programs:

http://www.bioperl.org/wiki/SSEARCH

You might be able to use SearchIO::fasta there...

Chris



From chen_li3 at yahoo.com  Sun May 14 17:14:30 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 10:14:30 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I need to get a reverse-complementary sequence out of a
FASTA sequence file. The Synopsis of Bio::Seq
indicates I can do it this way:

$revcom=$seqobj->revcom();

I used the following script to try to get the job done,
but it doesn't work. Then I read the documentation of
Bio::Seq and it looks like it doesn't contain a revcom
method.

Any idea will be appreciated.

Li 


###############################
Here is the code:

#!c:/perl/bin/perl.exe
use strict;
use warnings;

use Bio::Seq; 
use Bio::SeqIO;     
       
my $file='c:/perl/local/primer3_1.0.0/src/est.txt';   
 
    
my $seqIO=Bio::SeqIO->new(-file=>"<$file",
                            -format=>'fasta' );
                            
    my $seqobj=$seqIO->next_seq();#create object  
    
  print "what attributes/keys are available:\n";    
  for my $key (sort keys %$seqobj){
           my $value=$seqobj->{$key};
	    print "$key\t=>\t$value\n"
	    }
# These are the output on the screen	    
#primary_id =>      gi|54093|emb|X61809.1|
#primary_seq =>     Bio::PrimarySeq=HASH(0x10492848)

#based on these results primary_id can be 
#accessed right away
# as to primary_seq it is an object of
#Bio::PrimarySeq and it provides the following
#methods according to the documentation:
                #new   
		#seq 
		#validate_seq 
		#subseq 
		#length 
		#display_id
		#accession_number 
		#primary_id 
		#alphabet 
		#desc 
		#can_call_new
		#id 
		#is_circular 
		#object_id
		#version 
		#authority 
		#namespace 
		#display_name 
		#description 
    
print "primary_id=",$seqobj->primary_id, "\n\n";
print "id=",$seqobj->id, "\n\n"; 
print "revcom=",$seqobj->revcom,"\n\n";
                      
        my $now_time=localtime;
        print  $now_time, "\n\n";  
        exit;

 # This is the output on the screen:
	#primary_id=gi|54093|emb|X61809.1|
	#id=gi|54093|emb|X61809.1
	#revcom=Bio::Seq=HASH(0x10493304)
	#Sun May 14 12:45:20 2006

      


From cjfields at uiuc.edu  Sun May 14 17:39:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 14 May 2006 12:39:50 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <20060514171430.74846.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <000401c6777d$66ddb120$15327e82@pyrimidine>

This line should give you the hint:

	#revcom=Bio::Seq=HASH(0x10493304)

You're getting an object ref here.  The way to get the rev. comp as a string,
as stated on the wiki, is '$seq->revcom->seq', not '$seq->revcom'.

When I ran your script with that line changed to the wiki version I got (using
my test seq):

what attributes/keys are available:
primary_id      =>      test,
primary_seq     =>      Bio::PrimarySeq=HASH(0x1d47fe0)
primary_id=test,

id=test,

revcom=GGAACGAGATCTCCATGCCGCGCACCATCGGCCCGGGATGCAGCACGATCGCGCGGTCCGGCAGCATCG
CCTGGCGCTTCTCGGACAATCCGTAGCGCACCGAGTACTCACGCGCGGA
CGGGAAGAAACTGCCGTTCATGCGTTCGGCCTGCACGCGCAGCATGAGCACCGCGTCGGCCGCGGGCAGTTCGGCG
TCCAGGTCATAGGACACGGTCACCGGCCAGTTCTCGACGCCCCTGGGGA
GCAGCGTCGGTGGGGACACCAGCACCACCTCGGCCCCGAGGGTGTGCAGCAGCGTCACGTTGGAGCGGGCCACGCG
GCTGTGCAGCACGTCGCCGACGATCACCACGCGCTTGCCCTCGACGCTG

Sun May 14 17:34:45 2006

Chris



From chen_li3 at yahoo.com  Sun May 14 18:08:49 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 14 May 2006 11:08:49 -0700 (PDT)
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
In-Reply-To: <000401c6777d$66ddb120$15327e82@pyrimidine>
Message-ID: <20060514180849.55423.qmail@web36808.mail.mud.yahoo.com>

Hi Chris,

Thank you very much. But could you please give me the
link for this syntax: $seq->revcom->seq?

Li




From cjfields at uiuc.edu  Sun May 14 18:28:14 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Sun, 14 May 2006 13:28:14 -0500
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <b3ef767e.b86a2fe8.820dd00@expms6.cites.uiuc.edu>

I think the confusion lies in what revcom returns.  This page

http://www.bioperl.org/wiki/Getting_Started

shows a quick way of using revcom (which I mentioned previously), while this
page

http://www.bioperl.org/wiki/HOWTO:Beginners

explains what is returned when you use revcom.  '$seq_obj->revcom' returns a 
sequence object (not a sequence string):

http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object

which is why you need to use the 'seq' method to get the string.

Hence, '$seq_obj->revcom->seq'.
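
To make the difference concrete, here is a minimal sketch, assuming BioPerl is
installed. The six-base sequence, the 'demo' id and the variable names are made
up for illustration and are not taken from the script in this thread.

```perl
use strict;
use warnings;
use Bio::Seq;

# A made-up six-base DNA sequence, purely for illustration.
my $seq_obj = Bio::Seq->new(-id       => 'demo',
                            -seq      => 'ATGGCC',
                            -alphabet => 'dna');

# revcom returns a new sequence object, not a string ...
my $rc_obj = $seq_obj->revcom;
print "revcom returns an object of class: ", ref($rc_obj), "\n";

# ... so ask that object for its string with the seq method.
my $rc_string = $rc_obj->seq;
print "revcom->seq returns the string: $rc_string\n";
```

Printing $rc_obj directly would give something like the Bio::Seq=HASH(...)
line from the original script, while $rc_string holds the plain sequence
(GGCCAT for this input).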

Chris



From Marc.Logghe at DEVGEN.com  Sun May 14 20:28:34 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Sun, 14 May 2006 22:28:34 +0200
Subject: [Bioperl-l] no revcom method in Bio::Seq module?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746DAC@ANTARESIA.be.devgen.com>

Hi Li,
> doesn't work. Then I read documentation of Bio::Seq and it 
> looks like it doesn't contain revcom method.
Here the Deobfuscator interface that Mauricio announced earlier comes
in handy:
http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ASeq&sort_order=by+method&search_string=
If you look in the methods table, you will find out that the revcom
method is inherited from, and implemented by Bio::PrimarySeqI.
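
If you want to check this kind of thing from a script, one possible approach
is to walk the inheritance chain yourself. The sketch below uses only core
Perl; My::Seq and My::PrimarySeqI are toy stand-ins (hypothetical names, not
BioPerl classes) and implementing_class is a helper invented for this example.

```perl
use strict;
use warnings;
use mro ();

# Toy stand-ins for a class and the interface it inherits from.
{
    package My::PrimarySeqI;
    sub revcom { return 'revcom called' }
}
{
    package My::Seq;
    our @ISA = ('My::PrimarySeqI');
    sub seq { return 'seq called' }
}

# Return the first class in the method resolution order of $class
# that actually defines $method, or nothing if no class does.
sub implementing_class {
    my ($class, $method) = @_;
    no strict 'refs';
    for my $candidate (@{ mro::get_linear_isa($class) }) {
        return $candidate if defined &{"${candidate}::${method}"};
    }
    return;
}

for my $method (qw(seq revcom)) {
    my $where = implementing_class('My::Seq', $method) // 'nowhere';
    print "My::Seq->$method is implemented by $where\n";
}
```

The same helper should work on a real class once it has been loaded (for
example after 'require Bio::Seq'), which is why a method can be callable on
an object even though the documentation of its own class does not list it.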
HTH,
Marc 


From sb at mrc-dunn.cam.ac.uk  Mon May 15 08:18:11 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 09:18:11 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <000f01c675e6$a61bde90$15327e82@pyrimidine>
References: <000f01c675e6$a61bde90$15327e82@pyrimidine>
Message-ID: <44683943.5020307@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sendu Bala wrote:
>> In bioperl up to at least 1.5.1, when one of the database modules 
>> comes across a species rank it does:
>> 
>> if ($rank eq 'species') { # get rid of genus from species name 
>> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); }
> 
> The XML example from NCBI Taxonomy I mentioned previously seems to 
> have everything in the classification, from superkingdom down to 
> species (no strain unfortunately, and I'm not sure about subspecies);
> if it's missing the rank then the designation doesn't exist or is 
> tagged as 'no rank'.  Like I mentioned before I'm not intimately 
> familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I 
> don't have a clue as to how everything is parsed and plugged in to 
> Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> through the data so it shouldn't be too hard to change what you
> want.

Yes, that's all true, but I'm not sure what it has to do with what I was
saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
own implementation I change the rank of all 'no rank' Nodes below
species to 'variant'.


> I haven't tried using Bio::DB::Taxonomy directly yet, but I would 
> have thought that the binomial is just built from the XML twig 
> 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> tag 'Genus' and species from 'Species', and that the scientific name
> is from the tag 'ScientificName'.  Guess not.

No. See above for what it actually does. That is a copy/paste from the
code (there, $taxon_name == ScientificName). When it finds a species
rank it does that split because in the
ncbi taxonomy database the 'genus' rank for a human has a ScientificName
of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
sapiens', and the bioperl model (quite rightly, I think) wants the
'species' node to not have information of other nodes (well, except for
the classification array). So it removes the 'Homo' from 'Homo sapiens'
giving a species name of 'sapiens'. This then allows the binomial method
to return 'Homo sapiens' instead of 'Homo Homo sapiens'.

(though in a bizarre twist, and this is one of my problems with how
names are currently represented in the Taxonomy modules, 'Scientific
Name' and 'binomial' are synonymous)


[snip]
>> My solution is to just remove whatever is the same between the 
>> current rank and the previous rank. Maybe even that's not so 
>> perfect, but it must be a lot better than turning the species 
>> 'Avian leukosis virus' into the species 'virus' (especially given 
>> that the genus here is 'Alpharetrovirus')!
> 
> I don't think taking Genus/Species directly from the scientific 
> name (normally what is in the SOURCE or ORGANISM annotation for 
> GenBank or OS for EMBL) is the best way to go about it [snip]

Perhaps, but again I'm not sure what this has to do with what I was
saying. If you don't want your species name to contain your genus name
you have to do some kind of parsing. My post merely pointed out that the
parsing currently in bioperl does not work for viruses and possibly
other species. I'd like to think that someone cares about this error and
would do the simple fix I offered, or that they already know about the
problem and have done their own fix.
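
As a rough illustration of the difference, here is a core-Perl sketch.
strip_first_word applies the split quoted above on its own (outside bioperl),
while strip_genus_prefix is a hypothetical helper written for this example
that removes the genus only when the species name really starts with it. It is
a simplification of the idea, not the code from my implementation.

```perl
use strict;
use warnings;

# The split quoted above: drop the first whitespace-delimited word,
# whatever that word happens to be.
sub strip_first_word {
    my ($taxon_name) = @_;
    (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    return $taxon_name;
}

# Hypothetical alternative: remove the genus name only if the
# species name actually begins with it.
sub strip_genus_prefix {
    my ($species_name, $genus_name) = @_;
    $species_name =~ s/^\Q$genus_name\E\s+//;
    return $species_name;
}

for my $pair (['Homo', 'Homo sapiens'],
              ['Alpharetrovirus', 'Avian leukosis virus']) {
    my ($genus, $species) = @$pair;
    printf "%-22s first-word split: %-16s genus-prefix strip: %s\n",
        $species, strip_first_word($species),
        strip_genus_prefix($species, $genus);
}
```

For 'Homo sapiens' both give 'sapiens'. For 'Avian leukosis virus' the split
drops 'Avian', which is not the genus at all, while the prefix check leaves
the name untouched because it does not start with 'Alpharetrovirus'.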


> I'm also not sure that forcing a lookup for every TaxID in every 
> sequence every time it's passed through SeqIO is the best way to go 
> either, though I think it should be required for storing sequences. 
> It's a tricky balance.

In my own implementation any database lookups are cached, and you have
the option of not doing any database lookup at all and 'faking' a
taxonomy from the supplied list of names (so it works just like normal
Bio::Seq).
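
The caching part could look something like the sketch below. It is core Perl
only; slow_lookup is a hypothetical stand-in for whatever remote or on-disk
taxonomy query is really being made, and the ids are just example keys.

```perl
use strict;
use warnings;

my %cache;         # taxon id => record fetched earlier
my $lookups = 0;   # how many times the slow path was taken

# Hypothetical stand-in for a slow taxonomy database query.
sub slow_lookup {
    my ($taxid) = @_;
    $lookups++;
    return { id => $taxid, name => "taxon $taxid" };
}

# Fetch through the cache: only the first request for an id
# reaches slow_lookup, later requests reuse the stored record.
sub cached_lookup {
    my ($taxid) = @_;
    $cache{$taxid} //= slow_lookup($taxid);
    return $cache{$taxid};
}

cached_lookup(9606) for 1 .. 3;   # same id three times
cached_lookup(10090);             # a second, different id
print "slow lookups performed: $lookups\n";
```

Four requests result in only two slow lookups here, one per distinct id.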


> I still think that maybe we should absolve ourselves from using 
> SOURCE/ORGANISM or OS/OC information in GenBank files as anything 
> more than strictly annotation, or reconstruct Bio::Species to maybe a
>  Bio::Annotation::Species object to handle that annotation and either
>  deprecate Bio::Species or separate it completely from any 
> Bio::Taxonomy objects.  It would really simplify things.  Then, if 
> anyone is interested in taxonomy, either install a local database or
>  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
>  to grab the TaxID info.

My personal view is that having it as an annotation would serve no real
purpose. For me the whole point of any kind of species representation in
bioperl is to allow you to compare species in a biologically meaningful
way. If it's just some annotation then that means it's basically
free-form text and you have no guarantee that two sequences from the
same species are annotated exactly the same - no guarantee that your
code would identify that those sequences are from the same species.
The only other useful thing that a species object needs to do is let you
know how related two different species are - you need to be able to ask
what a species' class, kingdom etc. are. Again, not viable with an
annotation - you need something strict like a properly constructed Taxonomy.

I guess it comes down to the philosophy of parsing a file. Do you try
and reflect exactly what the file contains, letter for letter, so that
your resulting object can recreate that file letter for letter, or do
you parse the file and extract the correct /meaning/ in order to be more
useful?
I think there can be a choice by the user, and this is best done by
making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
as in my own implementation.


From s_maheshwari84 at rediffmail.com  Mon May 15 08:15:26 2006
From: s_maheshwari84 at rediffmail.com (saurabh maheshwari)
Date: 15 May 2006 08:15:26 -0000
Subject: [Bioperl-l] please help
Message-ID: <20060515081526.27270.qmail@webmail7.rediffmail.com>

  
Hello All
I sent this problem to the list earlier, but it is still unsolved, so I have restated it in another way. Could anybody please give me code to make a graph of some items that are listed in a text file in the following format?
Example:
item1 interacts with item2, and I want to make the graph and then, given any item as input, ask for all interactions of that item.

item 1      item 2 
A            B
A            C
C            B
D            B
D            E
A            F
G            A     

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI


From sdavis2 at mail.nih.gov  Mon May 15 10:26:53 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 06:26:53 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <20060515081526.27270.qmail@webmail7.rediffmail.com>
Message-ID: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>


On 5/15/06 4:15 AM, "saurabh maheshwari" <s_maheshwari84 at rediffmail.com>
wrote:

>   
> Hello All
> I sent this problem to the list earlier, but it is still unsolved, so I
> have restated it in another way. Could anybody please give me code to
> make a graph of some items that are listed in a text file in the following
> format?
> Example:
> item1 interacts with item2, and I want to make the graph and then, given
> any item as input, ask for all interactions of that item.
> 
> item 1      item 2
> A            B
> A            C
> C            B
> D            B
> D            E
> A            F
> G            A   

Not a bioperl answer, but in your case, I would suggest looking at using
cytoscape to do this.  Look here for details:

http://www.cytoscape.org/

Sean


From sdavis2 at mail.nih.gov  Mon May 15 11:03:28 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 15 May 2006 07:03:28 -0400
Subject: [Bioperl-l] please help
In-Reply-To: <C08DCFAD.B7D2%sdavis2@mail.nih.gov>
Message-ID: <C08DD840.B7DE%sdavis2@mail.nih.gov>


On 5/15/06 6:26 AM, "Sean Davis" <sdavis2 at mail.nih.gov> wrote:

> Not a bioperl answer, but in your case, I would suggest looking at using
> cytoscape to do this.  Look here for details:
> 
> http://www.cytoscape.org/

I forgot to mention, if you are looking for a perl solution, I would look at
the Graph module.

http://search.cpan.org/~jhi/Graph-0.69/lib/Graph.pod

You can create the graph according to the docs and then use the neighbors()
method (if I remember correctly) to get the nodes connected to the query
node.
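
To show the idea without installing anything, here is a small sketch that
uses a plain hash of hashes instead of the Graph module. It assumes the
interactions are undirected; interactions_of is a name made up for this
example, and the pairs are the ones from the original question.

```perl
use strict;
use warnings;

# The interaction pairs from the question, one pair per line.
my @lines = ('A B', 'A C', 'C B', 'D B', 'D E', 'A F', 'G A');

# Adjacency list: item => { neighbouring item => 1, ... }
my %neighbors;
for my $line (@lines) {
    my ($item1, $item2) = split ' ', $line;
    # Treat each interaction as undirected, so record both directions.
    $neighbors{$item1}{$item2} = 1;
    $neighbors{$item2}{$item1} = 1;
}

# Return the sorted list of items that interact with $item.
sub interactions_of {
    my ($item) = @_;
    my @found = sort keys %{ $neighbors{$item} || {} };
    return @found;
}

print "A interacts with: ", join(', ', interactions_of('A')), "\n";
```

With the data above this prints "A interacts with: B, C, F, G". In a real
script the pairs would be read from the text file rather than hard-coded.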

Sean


From akarger at CGR.Harvard.edu  Mon May 15 12:20:11 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 15 May 2006 08:20:11 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>

This tool is quite nice, and may save me a lot of perldoc'ing.

A couple of minor interface thoughts. 

1) There are quite a lot of methods for many of the classes. As such, I
think I'll often want to browse through what's available in a class. But
60% or so of the screen real estate is used for "Enter a search
string... OR select a class from the list". IMO, it would be better to
have two pages, a search page and a result page.   It only takes a click
on Back (or a "new search" button) to get to a new search, and now you
can use your whole screen for reading your results.

2) Please sort the "select a class from the list" alphabetically. I
guess I can enter a search term to get the right classes, but it would
be nice to be able to browse.
2a) if you want to be really fancy, make a javascript nested menu with
expandable submenus. OK, maybe not.

3) Minimalist is nice, but documentation is even nicer. It wasn't clear
to me that the search searches within class names rather than function
names. What I really want to know sometimes is which module has, say,
the revcom method in it. So, if it's not easy to include that within
this search, then at least tell me what my search space is.

4) When I search for something that's not found, I get a screen that
looks pretty familiar, with the extra text "No match to string found"
down at the bottom. It took me a while to even notice it. (Studies show
that most users don't read most of the text on a page.) Bold might be
nice here. Or put the error at the top of the screen. Or both.

5) I'll save my stupidest comment for last - please make the page title
"Bioperl Deobfuscator", so that when I bookmark it I'll know what the
bookmark stands for.

Thanks, Laura Kavanaugh and David Messina, for a neat AND useful tool.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From sb at mrc-dunn.cam.ac.uk  Mon May 15 13:08:32 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 14:08:32 +0100
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
References: <BB91E836FD4CDC4E830F628F39CDFF04D92E65@huls5.nucleus.harvard.edu>
Message-ID: <44687D50.6080306@mrc-dunn.cam.ac.uk>

Amir Karger wrote:
> This tool is quite nice, and may save me a lot of perldoc'ing.

Yes, many thanks to everyone involved.


> A couple of minor interface thoughts. 
> 
> 1) There are quite a lot of methods for many of the classes. As such, I
> think I'll often want to browse through what's available in a class. But
> 60% or so of the screen real estate is used for "Enter a search
> string... OR select a class from the list". IMO, it would be better to
> have two pages, a search page and a result page.   It only takes a click
> on Back (or a "new search" button) to get to a new search, and now you
> can use your whole screen for reading your results.

As the compromise it must be, I like the way it behaves. I don't like 
lots of windows. I especially don't like pop up windows. Right now when 
I'm using the bioperl docs I tend to have a whole bunch of tabs open to 
different class pages at once, so being able to see an overview all on 
one page in Deobfuscator is very nice.

Further to that, I'd love it if clicking on a method name caused an 
in-place css(&|javascript) reveal (similar to how a well implemented 
drop down menu works in a website) rather than opening a new window. 
Alternatively, just have more columns in the results table, i.e. usage, 
function, returns, args columns. I feel that opening a window for each 
method you want to understand is far too slow.

I'd also really like a link to the code for the method as well. The 
bioperl docs are rarely complete enough that you can really understand 
what every method is supposed to do without looking at the code.


> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> to me that the search searches within class names rather than function
> names. What I really want to know sometimes is which module has, say,
> the revcom method in it.

This would be a great feature to add.


Another minor interface thought:
6) Have a little more cell padding in all the tables. Things are just a 
little too cramped and things start to look messy/ run into each other.


From cjfields at uiuc.edu  Mon May 15 13:59:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 08:59:57 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <44687D50.6080306@mrc-dunn.cam.ac.uk>
Message-ID: <000901c67827$d99eabb0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 8:09 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Amir Karger wrote:
> > This tool is quite nice, and may save me a lot of perldoc'ing.
> 
> Yes, many thanks to everyone involved.

The Deobfuscator currently indexes bioperl-1.4, so it's not completely
up-to-date.  I believe Mauricio and Dave may be working on updating to the
newer versions and maybe bioperl-live, as well as getting the other bioperl
packages up and running.

For modules added after v1.4 I use the script from the FAQ question mentioned
on the Deobfuscator wiki page to get up-to-date methods, then grab the
ActiveState HTML'd perldocs generated when installing using PPM (I make a
custom PPM/PPD file and install it myself every once in a while):

#!/usr/bin/perl -w
use Class::Inspector;
$class = shift || die "Usage: methods perl_class_name\n";
eval "require $class";
# list all public methods, inherited ones included, as Class::method
print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}),
            "\n";

> > A couple of minor interface thoughts.
> >
> > 1) There are quite a lot of methods for many of the classes. As such, I
> > think I'll often want to browse through what's available in a class. But
> > 60% or so of the screen real estate is used for "Enter a search
> > string... OR select a class from the list". IMO, it would be better to
> > have two pages, a search page and a result page.   It only takes a click
> > on Back (or a "new search" button) to get to a new search, and now you
> > can use your whole screen for reading your results.
> 
> As the compromise it must be, I like the way it behaves. I don't like
> lots of windows. I especially don't like pop up windows. Right now when
> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
> different class pages at once, so being able to see an overview all on
> one page in Deobfuscator is very nice.
>
> Further to that, I'd love it if clicking on a method name caused an
> in-place css(&|javascript) reveal (similar to how a well implemented
> drop down menu works in a website) rather than opening a new window.
> Alternatively, just have more columns in the results table, i.e. usage,
> function, returns, args columns. I feel that opening a window for each
> method you want to understand is far too slow.

Agreed.

> I'd also really like a link to the code for the method as well. The
> bioperl docs are rarely complete enough that you can really understand
> what every method is supposed to do without looking at the code.

The methods that pop up are in columns along with the class module that
implements the method.  


If you click on that link you get PDOC documentation for the module which
includes most of the code (strangely, though Deobfuscator indexes bioperl
1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
something a bit more detailed?

> > 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
> > to me that the search searches within class names rather than function
> > names. What I really want to know sometimes is which module has, say,
> > the revcom method in it.

That's listed in the method results table (the next column has the module
with a link to the module's online docs).


Chris


> This would be a great feature to add.
> 
> 
> Another minor interface thought:
> 6) Have a little more cell padding in all the tables. Things are just a
> little too cramped and things start to look messy/ run into each other.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon May 15 16:08:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 11:08:30 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <44683943.5020307@mrc-dunn.cam.ac.uk>
Message-ID: <001601c67839$cf289490$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, May 15, 2006 3:18 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
> subspecies/variant names
> 
> Chris Fields wrote:
> > Sendu Bala wrote:
> >> In bioperl up to at least 1.5.1, when one of the database modules
> >> comes across a species rank it does:
> >>
> >> if ($rank eq 'species') { # get rid of genus from species name
> >> (undef,$taxon_name) = split(/\s+/,$taxon_name,2); }
> >
> > The XML example from NCBI Taxonomy I mentioned previously seems to
> > have everything in the classification, from superkingdom down to
> > species (no strain unfortunately, and I'm not sure about subspecies);
> > if it's missing the rank then the designation doesn't exist or is
> > tagged as 'no rank'.  Like I mentioned before I'm not intimately
> > familiar with Bio::Taxonomy, Bio::DB::Taxonomy, or Bio::Species, so I
> > don't have a clue as to how everything is parsed and plugged in to
> > Bio::Taxonomy objects.  I do know that XML::Twig is used for parsing
> > through the data so it shouldn't be too hard to change what you
> > want.
> 
> Yes, that's all true, but I'm not sure what it has to do with what I was
> saying. FYI, you do get a 'subspecies' rank but no 'variant' rank. In my
> own implementation I change the rank of all 'no rank' Nodes below
> species to 'variant'.

Sorry; wandered a bit off topic there.

> > I haven't tried using Bio::DB::Taxonomy directly yet, but I would
> > have thought that the binomial is just built from the XML twig
> > 'LineageEx' Rank=Genus + Rank=Species, that the genus comes from the
> > tag 'Genus' and species from 'Species', and that the scientific name
> > is from the tag 'ScientificName'.  Guess not.
> 
> No. See above for what it actually does. That is a copy/paste from the
> code (there, $taxon_name == ScientificName). When it finds a species
> rank it does that split because in the
> ncbi taxonomy database the 'genus' rank for a human has a ScientificName
> of 'Homo', whilst the 'species' rank has a ScientificName of 'Homo
> sapiens', and the bioperl model (quite rightly, I think) wants the
> 'species' node to not have information of other nodes (well, except for
> the classification array). So it removes the 'Homo' from 'Homo sapiens'
> giving a species name of 'sapiens'. This then allows the binomial method
> to return 'Homo sapiens' instead of 'Homo Homo sapiens'.
> 
> (though in a bizarre twist, and this is one of my problems with how
> names are currently represented in the Taxonomy modules, 'Scientific
> Name' and 'binomial' are synonymous)
 
Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
deal with it.  I noticed that the subspecies node also contains the entire
string:

    <Taxon>
      <TaxId>135461</TaxId>
      <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
      <Rank>subspecies</Rank>
    </Taxon>
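
To make the effect of the quoted split concrete, here is a small sketch that applies the same statement to a few names. The wrapper node_name is made up for this example, and the virus name is just one I picked for illustration; only the split itself comes from the quoted code.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the quoted BioPerl snippet: the genus is
# stripped only when the rank is exactly 'species'.
sub node_name {
    my ($rank, $taxon_name) = @_;
    if ($rank eq 'species') {    # get rid of genus from species name
        (undef, $taxon_name) = split(/\s+/, $taxon_name, 2);
    }
    return $taxon_name;
}

my @examples = (
    [ species    => 'Homo sapiens' ],
    [ species    => 'Human immunodeficiency virus 1' ],
    [ subspecies => 'Bacillus subtilis subsp. subtilis' ],
);

for my $example (@examples) {
    my ($rank, $name) = @$example;
    printf "%-11s %-36s => %s\n", $rank, $name, node_name($rank, $name);
}
```

The first case gives 'sapiens' as intended. The second loses 'Human', which is not a genus, and the subspecies name passes through untouched, genus and species included.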

As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
I almost never get the actual scientific name for the node (the one on the
GenBank ORGANISM line); I get the name with the strain chopped off instead,
and a number of times the names get mangled.  The regexes below only grab
from the topmost tags:

Script:
---------------------------------
#! perl
use strict;
use warnings;

use Bio::DB::Taxonomy;

# NCBI taxonomy XML file to read; defaults to tax.xml as in the original run
my $file = shift(@ARGV) || 'tax.xml';

print "\nNCBI XML output ScientificName tag for each node:\n";
my @taxid;
open(my $tax_fh, '<', $file) or die "Can't open file $file: $!\n";
while (<$tax_fh>) {
	# a two-space indent marks the tags of the top-level Taxon only
	if (/^\s{2}<TaxId>(\d+)<\/TaxId>/) {
		print "$1\t";
		push @taxid, $1;
	}
	print "$1\n" if /^\s{2}<ScientificName>(.*)<\/ScientificName>/;
}
close $tax_fh;

print "\nBio::DB::Taxonomy scientific_name:\n";
# one factory is enough; no need to rebuild it for every ID
my $factory = Bio::DB::Taxonomy->new(-source => 'entrez');
for my $id (@taxid) {
	my $node = $factory->get_Taxonomy_Node(-taxonid => $id);
	print $node->ncbi_taxid, "\t", $node->scientific_name, "\n";
}
---------------------------------

Output:
---------------------------------
NCBI XML output ScientificName tag for each node:
191218  Bacillus anthracis str. A2012
198094  Bacillus anthracis str. Ames
222523  Bacillus cereus ATCC 10987
224308  Bacillus subtilis subsp. subtilis str. 168
226186  Bacteroides thetaiotaomicron VPI-5482
226900  Bacillus cereus ATCC 14579
246194  Carboxydothermus hydrogenoformans Z-2901
260799  Bacillus anthracis str. Sterne
261594  Bacillus anthracis str. 'Ames Ancestor'
264462  Bdellovibrio bacteriovorus HD100
272558  Bacillus halodurans C-125
272559  Bacteroides fragilis NCTC 9343
279010  Bacillus licheniformis ATCC 14580
281309  Bacillus thuringiensis serovar konkukian str. 97-27
288681  Bacillus cereus E33L
295405  Bacteroides fragilis YCH46
66692   Bacillus clausii KSM-K16
76114   Azoarcus sp. EbN1

Bio::DB::Taxonomy scientific_name:
191218  Bacillus cereus group anthracis
198094  Bacillus cereus group anthracis
222523  Bacillus cereus group cereus
224308  subtilis Bacillus subtilis subsp. subtilis
226186  Bacteroides thetaiotaomicron
226900  Bacillus cereus group cereus
246194  Carboxydothermus hydrogenoformans
260799  Bacillus cereus group anthracis
261594  Bacillus cereus group anthracis
264462  Bdellovibrio bacteriovorus
272558  Bacillus halodurans
272559  Bacteroides fragilis
279010  Bacillus licheniformis
281309  Bacillus cereus group thuringiensis
288681  Bacillus cereus group cereus
295405  Bacteroides fragilis
66692   Bacillus clausii
76114   Azoarcus sp.
---------------------------------
Note Bacillus subtilis in the Bio::Tax output above.  Not one of those is
the scientific name as defined by NCBI (and most taxonomists for that
matter).

So, in a nutshell, there's a problem here.  I don't know if your fix works
for that, but I definitely don't think the 'scientific name' should be
assembled ad hoc; it should be taken from the ScientificName tag for that
node.  I am currently reduced to grabbing the feature whose primary_tag is
'source' and getting the value of its 'organism' tag.  I cannot stress
enough that it should NOT be that way.
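
For anyone who needs the same workaround, a minimal sketch of pulling the organism name from the 'source' feature might look like this. The function name organism_from_source is made up; the BioPerl calls are the usual Bio::SeqIO / Bio::SeqFeatureI ones, and the file-reading part only runs when a GenBank file name is passed on the command line.

```perl
use strict;
use warnings;

# Return the 'organism' tag value of the first 'source' feature, or
# nothing if there is none. $seq is expected to behave like a
# Bio::SeqI object (hypothetical helper, not part of BioPerl).
sub organism_from_source {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'source';
        next unless $feat->has_tag('organism');
        my ($organism) = $feat->get_tag_values('organism');
        return $organism;
    }
    return;
}

# Only runs when a GenBank file name is given on the command line.
if (my $file = shift @ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    while (my $seq = $in->next_seq) {
        my $organism = organism_from_source($seq);
        print $seq->accession_number, "\t",
            (defined $organism ? $organism : 'unknown'), "\n";
    }
}
```

This sidesteps Bio::Species entirely, so the name comes back exactly as it appears in the /organism qualifier of the flat file.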

As for 'binomial' == 'scientific_name', I agree; I see it as well and that
should be fixed.
 
...
> Perhaps, but again I'm not sure what this has to do with what I was
> saying. If you don't want your species name to contain your genus name
> you have to do some kind of parsing. My post merely pointed out that the
> parsing currently in bioperl does not work for viruses and possibly
> other species. I'd like to think that someone cares about this error and
> would do the simple fix I offered, or that they already know about the
> problem and have done their own fix.

Again me going off-topic, so my apologies; it's more to do with my
frustrations with Bio::Species (not Bio::DB::Taxonomy).  My point here was,
since there is no real way to surmise from a GenBank flatfile what the
taxonomic ranks are w/o guessing (which seems to break more often than not
when dealing with complex names), there shouldn't be any tie to Bio::Tax
objects, at least directly.  I guess methods could be incorporated into
Bio::Species for those who want to give it a try, but I would like to get a
GenBank file, for once, in which the scientific name/binomial name isn't
mangled by Bio::Species.

Back to Bio::DB::Taxonomy; I don't have a problem with implementing your
methods here; on the contrary, if they fix my problem above then I'll be
more than glad to.  I can't get to it immediately but maybe later
today/tomorrow.
 
> > I'm also not sure that forcing a lookup for every TaxID in every
> > sequence every time it's passed through SeqIO is the best way to go
> > either, though I think it should be required for storing sequences.
> > It's a tricky balance.
> 
> In my own implementation any database lookups are cached, and you have
> the option of not doing any database lookup at all and 'faking' a
> taxonomy from the supplied list of names (so it works just like normal
> Bio::Seq).
>
> 
> > I still think that maybe we should absolve ourselves from using
> > SOURCE/ORGANISM or OS/OC information in GenBank files as anything
> > more than strictly annotation, or reconstruct Bio::Species to maybe a
> >  Bio::Annotation::Species object to handle that annotation and either
> >  deprecate Bio::Species or separate it completely from any
> > Bio::Taxonomy objects.  It would really simplify things.  Then, if
> > anyone is interested in taxonomy, either install a local database or
> >  use Entrez efetch, and then use Bio::DB::Taxonomy (fixed of course)
> >  to grab the TaxID info.
> 
> My personal view is that having it as an annotation would serve no real
> purpose. For me the whole point of any kind of species representation in
> bioperl is to allow you to compare species in a biologically meaningful
> way. If it's just some annotation then that means it's basically
> free-form text and you have no guarantee that two sequences from the
> same species are annotated exactly the same - no guarantee that your
> code would identify that those sequences are from the same species.
> The only other useful thing that a species object needs to do it let you
> know how related two different species are - you need to be able to ask
> what a species' class, kingdom etc. are. Again, not viable with an
> annotation - you need something strict like a properly constructed
> Taxonomy.

My point is, a large number of users do NOT use, nor care about, taxonomic
information to the degree they need to know the entire classification of the
organism; many are just as happy about getting the scientific name only,
which is in the GenBank/EMBL file itself.  To take one extreme, it is not
productive to force every user to download the NCBI tax database and use
lookups just to convert sequences from EMBL format to GenBank format.  It's
not productive to allow users to spam the NCBI tax database remotely either,
so hardcoding lookups is, IMHO, a big mistake.  
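
To illustrate what I mean by optional lookups (combined with the caching mentioned above), here is a rough sketch. Everything in it is made up for the example: make_taxon_lookup, the 'enabled' and 'fetch' options, and the one-entry fake database that stands in for a real local or remote taxonomy query.

```perl
use strict;
use warnings;

# Build a lookup closure. 'fetch' is any code ref that maps a TaxID to a
# name; 'enabled' must be true for any lookup to happen at all.
sub make_taxon_lookup {
    my (%opt) = @_;
    my $fetch   = $opt{fetch};
    my $enabled = $opt{enabled};
    my %cache;
    return sub {
        my ($taxid) = @_;
        return unless $enabled && $fetch;    # lookups are opt-in
        $cache{$taxid} = $fetch->($taxid) unless exists $cache{$taxid};
        return $cache{$taxid};
    };
}

my $calls   = 0;
my %fake_db = (9606 => 'Homo sapiens');      # stand-in for a real database
my $lookup  = make_taxon_lookup(
    enabled => 1,
    fetch   => sub { $calls++; return $fake_db{ $_[0] } },
);

$lookup->(9606) for 1 .. 3;
print "name: ", $lookup->(9606), ", database hits: $calls\n";
```

Four requests for the same TaxID cost one database hit here, and with enabled set to a false value the fetch code is never called, so a plain format conversion never touches the taxonomy source.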

> I guess it comes down to the philosophy of parsing a file. Do you try
> and reflect exactly what the file contains, letter for letter, so that
> your resulting object can recreate that file letter for letter, or do
> you parse the file and extract the correct /meaning/ in order to be more
> useful?
> I think there can be a choice by the user, and this is best done by
> making Bio::Species a clever wrapper around an improved Bio::Taxonomy,
> as in my own implementation.

I understand both philosophies, but the latter implies that you know the
intention of the ones submitting the sequence.  99.9% of the time that's
fine, something I can live with.  However, when we mess up something as
simple as getting the scientific name for an organism when the information
is directly in the flat file (ORGANISM line) by trying to 'imply' what the
classification is, yes, I get frustrated.  Even more frustrating to me is
that Bio::DB::Taxonomy, which should return accurate information directly
from the Taxonomy database, still manages to screw up the scientific name.  

The NCBI definition in the sample record:

http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

states that the ORGANISM line contains the formal scientific name and its
lineage (no ranking).  If the lineage is very long it is abbreviated, so you
don't get the same thing as you would by using the TaxID.

So, in essence, I believe you are correct, that Bio::Species can be used as
a 'wrapper' for Bio::Taxonomy objects, but only up to a certain degree with
caveats or warnings for possible inaccuracies.  I also believe that lookups
should be allowed but optional, not required (i.e. left up to the user, as
you state).  

I just feel that it's somewhat misleading to imply, by delegating to
Bio::Taxonomy, that Bio::Species contains accurate taxonomic information
when NCBI themselves state that the GenBank flatfile classification can be
incomplete and does not supply rankings (genus, species) in the file.  It's
our best guess in most cases, and a best guess by definition is not very
accurate.  If you want taxonomic accuracy, use the TaxID and a local tax
database.  I feel that we shouldn't punish those who don't worry/care about
taxonomy by implementing Bio::Species with methods that mangle data that's
directly in the flat file they're parsing.

Okay, not to cut short this discussion, but I have to get back to $job.
I'll try adding your fixes in a bit later today/tomorrow; if they pass tests
I'll commit them in.

Chris


From hlapp at gmx.net  Mon May 15 16:59:06 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 12:59:06 -0400
Subject: [Bioperl-l] error loading uniprot release 49.6 into mysql
In-Reply-To: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
References: <051520061234.14794.446875470003415A000039CA21602807489D0A02970E9DD29C@att.net>
Message-ID: <C78E4724-CC95-483E-876B-69AF7C1CC6AF@gmx.net>

You found the right instance. Unfortunately with the way the bioperl  
swissprot parser works the group (RG) isn't promoted to author if  
there is no author in addition (in fact you may debate whether that  
would even be the best way of doing things), so it doesn't find it on  
second occurrence by unique key.

If you can live without this entry, or any other entry that causes a  
hiccup, just supply the flag --safe and it will gracefully move on to  
the next entry.

Fixing the issue would require either changing the bioperl swissprot  
parser (or Bio::Annotation::Reference) to stick the RG group into the  
author slot if there is no author, or extending Bioperl  
Bio::Annotation::Reference to also feature a group and biosql to use  
it in place of a missing author.

Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)  
should just use that in place of a missing author?

The downside is that upon round-tripping an entry, the RG annotation  
line will become an RA annotation line. How bad would that be?
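
A rough sketch of the fallback, using plain hashes as stand-ins for reference objects (real code would go through Bio::Annotation::Reference). The function name author_slot and the idea of returning the originating line type alongside the value are assumptions made for this example; keeping the origin around is one conceivable way to write RG back out as RG on a round trip.

```perl
use strict;
use warnings;

# Hypothetical helper: pick what goes into the author slot and remember
# which line type (RA or RG) it came from.
sub author_slot {
    my ($ref) = @_;
    return ($ref->{authors}, 'RA')
        if defined $ref->{authors} && length $ref->{authors};
    return ($ref->{rg}, 'RG')
        if defined $ref->{rg} && length $ref->{rg};
    return (undef, undef);
}

# Values taken from the entry quoted below; there is no RA line.
my %ref = (
    rg       => 'The German cDNA consortium',
    location => 'Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.',
);

my ($value, $origin) = author_slot(\%ref);
print "author slot: $value (from $origin line)\n";
```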

Any thoughts from anyone?

	-hilmar

On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:

> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for  
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID   1433T_PONPY    STANDARD;      PRT;   245 AA.
> AC   Q5RFJ2; Q5RDK2;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein theta.
> GN   Name=YWHAQ;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Brain cortex, and Kidney;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.   
> <======  Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID   1433G_PONPY    STANDARD;      PRT;   246 AA.
> AC   Q5RC20;
> DT   05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT   05-JUL-2005, sequence version 2.
> DT   18-APR-2006, entry version 13.
> DE   14-3-3 protein gamma.
> GN   Name=YWHAG;
> OS   Pongo pygmaeus (Orangutan).
> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC   Catarrhini; Hominidae; Pongo.
> OX   NCBI_TaxID=9600;
> RN   [1]
> RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC   TISSUE=Heart;
> RG   The German cDNA consortium;
> RL   Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.    
> <======  Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL  
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first  
> sequence
> ------+-------+---------+----------------------+
> |          139 |      NULL | Submitted (NOV-2004) to the EMBL/ 
> GenBank/DDBJ databases. | NULL  | NULL    | CRC-E7973FEA4B5611DC |
> +--------------+----------- 
> +----------------------------------------------------
>
> and the error when the script choked was
>
>  MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,  
> values were
>  ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
>  databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
>  Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing i'm not the first person to encounter this, but dont  
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon May 15 17:01:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 13:01:14 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <4466AD7F.6050700@campus.iztacala.unam.mx>
References: <4466AD7F.6050700@campus.iztacala.unam.mx>
Message-ID: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>

Hey, thanks to Laura & David for this interface.

Any idea why most of the Bio::Ontology::* modules show up without  
their leading Bio::Ontology? And clicking on those hyperlinks doesn't  
go anywhere either ... Anything different with those modules that I  
can fix?

	-hilmar

On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:

> I'm glad to announce the availability of the Deobfuscator interface at
> the BioPerl website. You can use it at the following URL:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> Many thanks to Laura Kavanaugh and David Messina for this great
> contribution to the BioPerl project!
>
> Mauricio.
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 17:22:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 12:22:13 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <068E49BD-2DE4-47BA-BD7C-D6FD487DF095@gmx.net>
Message-ID: <000301c67844$1b506280$15327e82@pyrimidine>

That's strange.  Clicking on the list gives me the results for that module.
When I click on the hyperlinks in the results section they open fine; the
method column links open a new page containing usage-function-returns-args
and the class column links open pdoc (same page) for bioperl-live.  I'm
using Firefox 1.5 on WinXP.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 12:01 PM
> To: Mauricio Herrera Cuadra
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Hey, thanks to Laura & David for this interface.
> 
> Any idea why most of the Bio::Ontology::* modules show up without
> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
> go anywhere either ... Anything different with those modules that I
> can fix?
> 
> 	-hilmar
> 
> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> 
> > I'm glad to announce the availability of the Deobfuscator interface at
> > the BioPerl website. You can use it at the following URL:
> >
> > http://bioperl.org/cgi-bin/deob_interface.cgi
> >
> > Many thanks to Laura Kavanaugh and David Messina for this great
> > contribution to the BioPerl project!
> >
> > Mauricio.
> >
> > --
> > MAURICIO HERRERA CUADRA
> > arareko at campus.iztacala.unam.mx
> > Laboratorio de Genética
> > Unidad de Morfofisiología y Función
> > Facultad de Estudios Superiores Iztacala, UNAM
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sb at mrc-dunn.cam.ac.uk  Mon May 15 18:00:15 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 15 May 2006 19:00:15 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
 subspecies/variant names
In-Reply-To: <001601c67839$cf289490$15327e82@pyrimidine>
References: <001601c67839$cf289490$15327e82@pyrimidine>
Message-ID: <4468C1AF.9080400@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> 
> Ah, now I see.  That's a bit screwy, but it's not on our end so we have to
> deal with it.  I also noticed that subspecies also contains the entire
> string:
> 
>     <Taxon>
>       <TaxId>135461</TaxId>
>       <ScientificName>Bacillus subtilis subsp. subtilis</ScientificName>
>       <Rank>subspecies</Rank>
>     </Taxon>

Yes, this is one of the problems I mentioned in the first post to this
thread.


> As for the 'scientific_name' method when accessed through Bio::DB::Taxonomy,
> I don't get the actual scientific name for the node (from the GenBank
> ORGANISM line) almost every time; I get the name with the strain chopped off
> instead and a number of times the names get mangled.

[snip, should be:]
> 224308  Bacillus subtilis subsp. subtilis str. 168
> 281309  Bacillus thuringiensis serovar konkukian str. 97-27

[snip, but Bio::DB::Taxonomy gives:]
> 224308  subtilis Bacillus subtilis subsp. subtilis
> 281309  Bacillus cereus group thuringiensis

[snip]
> So, in a nutshell, there's a problem here.  I don't know if your fix works
> for that, but I definitely don't think the 'scientific name' should be
> assembled ad hoc but should be taken from the tagname for that node.

Yes, my implementation will get you the correct answer, but not quite as
you say. My solution was to munge the actual ScientificName but 'ensure'
that the binomial would give you back the actual binomial name you
wanted - which is the intent of current Bio::DB::Taxonomy code.

my $species0 = TFBS::Species->new(-ncbi_taxid => 224308);
my $leaf_node = $species0->taxonomy->get_leaves();
print "sci_name of Node = '", $leaf_node->scientific_name, "'\n";
print "Species0 subspecies = '", $species0->subspecies, "'\n";
print "Species0 variants = '", scalar($species0->variant), "'\n";
print "Species0 binomial = '", $species0->binomial('FULL'), "'\n";

gives:
sci_name of Node = 'str. 168'
Species0 subspecies = 'subsp. subtilis'
Species0 variants = 'str. 168'
Species0 binomial = 'Bacillus subtilis subsp. subtilis str. 168'

and the same again for id 281309:

sci_name of Node = 'str. 97-27'
Species0 subspecies = ''
Species0 variants = 'serovar konkukian str. 97-27'
Species0 binomial = 'Bacillus thuringiensis serovar konkukian str. 97-27'

I've done it this way because even though strictly speaking the
ScientificName for 224308 (a 'no rank') is 'Bacillus subtilis subsp.
subtilis str. 168', when I ask for the variant I don't want that whole
string. I just want the bit that will be different when comparing other
strains of this subspecies of this species of Bacillus. I want 'str.
168'. Note that my objects never store the original ScientificName; it
is due to 'luck' (or as I like to think, a good implementation) that the
binomial method is able to reconstruct a string that is identical to
what the original ScientificName was.
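
For illustration only (this is not my actual module code), the idea of each node keeping just 'the bit that is different' can be sketched by stripping the parent's name from the front of each child's ScientificName. The function strip_lineage is made up for this example, and the lineage is shortened to the four nodes relevant here.

```perl
use strict;
use warnings;

# Strip each name's parent prefix so every node keeps only what is new
# at its level. Names that do not start with the parent's name are left
# unchanged.
sub strip_lineage {
    my @full = @_;
    my @short;
    my $parent = '';
    for my $name (@full) {
        my $part = $name;
        $part =~ s/^\Q$parent\E\s+// if length $parent;
        push @short, $part;
        $parent = $name;
    }
    return @short;
}

my @lineage = (
    'Bacillus',
    'Bacillus subtilis',
    'Bacillus subtilis subsp. subtilis',
    'Bacillus subtilis subsp. subtilis str. 168',
);

my @parts = strip_lineage(@lineage);
print join(' | ', @parts), "\n";    # the per-node names
print join(' ', @parts), "\n";      # the reassembled full name
```

Joining the stripped parts with spaces gives back the original full name in this case. A lineage containing a node such as 'Bacillus cereus group' would need extra handling, since the next name down does not start with it.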

If you'd like to see my code let me know. You can't just drop the code
snippet I posted in this thread into existing bioperl modules; quite a
bit else has to change as well. I'll have to make an updated
taxonomy_the_tfbs_way.tar.gz file available if you want an example
implementation; the current version of that file is now out of date - it
doesn't do any of what I describe above.


From hlapp at gmx.net  Mon May 15 18:08:49 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 14:08:49 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000301c67844$1b506280$15327e82@pyrimidine>
References: <000301c67844$1b506280$15327e82@pyrimidine>
Message-ID: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>

Safari or Firefox on MacOSX don't do this. Note that the appearance  
in the browsable list is already different (the prefix is missing),  
and the JavaScript link also lacks the prefix in the module name in  
contrast to others, e.g., Bio::Ontology::Ontology (which is one of  
the few Bio::Ontology exceptions that do work and do display correctly).

I suppose there is something peculiar about the code formatting of  
those modules? Some of the modules under Bio::OntologyIO are also  
affected BTW.

What happens is that after you click on the link the page appears to  
reload (i.e., gets submitted) but the second table that is supposed  
to open underneath the first doesn't appear. However, the sort-by drop  
down selector does appear.

	-hilmar

On May 15, 2006, at 1:22 PM, Chris Fields wrote:

> That's strange.  Clicking on the list gives me the results for that  
> module.
> When I click on the hyperlinks in the results section they open  
> fine; the
> method column links opens a new page containing usage-function- 
> returns-args
> and the class column links opens pdoc (same page) for bioperl- 
> live.  I'm
> using Firefox 1.5 on WinXP.
>
> Chris
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
>> Sent: Monday, May 15, 2006 12:01 PM
>> To: Mauricio Herrera Cuadra
>> Cc: bioperl-l
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Hey, thanks to Laura & David for this interface.
>>
>> Any idea why most of the Bio::Ontology::* modules show up without
>> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
>> go anywhere either ... Anything different with those modules that I
>> can fix?
>>
>> 	-hilmar
>>
>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
>>
>>> I'm glad to announce the availability of the Deobfuscator  
>>> interface at
>>> the BioPerl website. You can use it at the following URL:
>>>
>>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>>
>>> Many thanks to Laura Kavanaugh and David Messina for this great
>>> contribution to the BioPerl project!
>>>
>>> Mauricio.
>>>
>>> --
>>> MAURICIO HERRERA CUADRA
>>> arareko at campus.iztacala.unam.mx
>>> Laboratorio de Genética
>>> Unidad de Morfofisiología y Función
>>> Facultad de Estudios Superiores Iztacala, UNAM
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 19:07:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:07:59 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <F85F6F46-3AB7-4D42-825B-BAD4CA748FC8@gmx.net>
Message-ID: <000501c67852$e1bb55c0$15327e82@pyrimidine>

I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
which I can try it on).  I'll let you know what I find.  

This is what I get when I do a search for 'Bio::Ont*' using Firefox on WinXP
and this Deobfuscator link (http://bioperl.org/cgi-bin/deob_interface.cgi?);
all the classes have links that work (I added newline and tab to make it a
bit more readable) :

Bio::OntologyIO	
	Parser factory for Ontology formats
Bio::OntologyIO::Handlers::BaseSAXHandler	
	no short description available
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
	no short description available
Bio::Ontology::OntologyI
	Interface for an ontology implementation
Bio::Ontology::TermFactory
	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
Bio::Ontology::OntologyStore
	A repository of ontologies
Bio::Ontology::RelationshipFactory
	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
Bio::Ontology::Ontology
	standard implementation of an Ontology

So the names seem fine here.

When I click on a class (Bio::Ontology::Ontology) I get in the results
section:

Method                  Class                            Returns          Usage
add_relationship        Bio::Ontology::Ontology          Its argument.    add_relationship(RelationshipI relationship): RelationshipI
add_relationship_type   Bio::Ontology::OntologyEngineI   not documented   not documented
add_term                Bio::Ontology::Ontology          its argument.    add_term(TermI term): TermI

....and so on

Where each method is clickable and opens a new page containing a table:

Bio::Ontology::Ontology::add_relationship
Usage	add_relationship(RelationshipI relationship): RelationshipI
Function	Adds a relationship object to the ontology engine.
Returns	Its argument.
Args	A RelationshipI object.


Each class is also linked to the bioperl-live PDOC.  Clicking on class
Bio::Ontology::Ontology in the results table gets me this page (no new
page):

http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html


Chris

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Monday, May 15, 2006 1:09 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> Safari or Firefox on MacOSX don't do this. Note that the appearance
> in the browsable list is already different (the prefix is missing),
> and the JavaScript link also lacks the prefix in the module name in
> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> the few Bio::Ontology exceptions that do work and do display correctly).
> 
> I suppose there is something peculiar about the code formatting of
> those modules? Some of the modules under Bio::OntologyIO are also
> affected BTW.
> 
> What happens is after you click on the link the page appears to
> reload (i.e., gets submitted) but the second table that is supposed
> to open underneath the first doesn't appear. However, the sort-by
> drop-down selector does appear.
> 
> 	-hilmar
> 
> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> 
> > That's strange.  Clicking on the list gives me the results for that module.
> > When I click on the hyperlinks in the results section they open fine; the
> > method column links open a new page containing usage-function-returns-args
> > and the class column links open pdoc (same page) for bioperl-live.  I'm
> > using Firefox 1.5 on WinXP.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >> Sent: Monday, May 15, 2006 12:01 PM
> >> To: Mauricio Herrera Cuadra
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> Hey, thanks to Laura & David for this interface.
> >>
> >> Any idea why most of the Bio::Ontology::* modules show up without
> >> their leading Bio::Ontology? And clicking on those hyperlinks doesn't
> >> go anywhere either ... Anything different with those modules that I
> >> can fix?
> >>
> >> 	-hilmar
> >>
> >> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>
> >>> I'm glad to announce the availability of the Deobfuscator
> >>> interface at
> >>> the BioPerl website. You can use it at the following URL:
> >>>
> >>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>
> >>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>> contribution to the BioPerl project!
> >>>
> >>> Mauricio.
> >>>
> >>> --
> >>> MAURICIO HERRERA CUADRA
> >>> arareko at campus.iztacala.unam.mx
> >>> Laboratorio de Genética
> >>> Unidad de Morfofisiología y Función
> >>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From cjfields at uiuc.edu  Mon May 15 19:12:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 14:12:34 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <000601c67853$85d49cc0$15327e82@pyrimidine>

I just tried the same thing (links, search, etc) with Mac OS X v 10.3.9 and
Safari (no Firefox sorry) and it worked fine as well (all links, no missing
Bio::Ontology, etc).  Not sure what it could be...

Chris

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Monday, May 15, 2006 2:08 PM
> To: 'Hilmar Lapp'
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: RE: [Bioperl-l] Deobfuscator interface now available
> 
> I'll have to give it a try on Mac OS X (we have an ancient G4 in the lab
> which I can try it on).  I'll let you know what I find.


From arareko at campus.iztacala.unam.mx  Mon May 15 19:20:10 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 15 May 2006 14:20:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <4468D46A.8070203@campus.iztacala.unam.mx>

Laura and Dave would be very happy to see all of your 
comments/suggestions/enhancements/complaints summarized in the 
appropriate wiki page. Just be sure to sign them properly with your name 
and date:

http://bioperl.org/wiki/Deobfuscator

I think they'll have to discuss which features will be nice to implement 
and which won't, depending on the direction they want their project to 
go. But don't worry, they're extremely nice people who are open to all 
kinds of ideas. The best of all: the Deobfuscator is open-source so 
everyone is invited to contribute to it, just ask them for the code :)

On my side, I'm working on tweaking the code so it will be able to 
browse different BioPerl packages (core, run, ext) and their 
respective releases (stable, developer, cvs).

Regards,
Mauricio.

Chris Fields wrote:
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>> Sent: Monday, May 15, 2006 8:09 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
>>
>> Amir Karger wrote:
>>> This tool is quite nice, and may save me a lot of perldoc'ing.
>> Yes, many thanks to everyone involved.
> 
> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating to the
> newer versions and maybe bioperl-live, as well as getting the other bioperl
> packages up and running.
> 
> For modules added after v1.4 I use the script in the FAQ question mentioned
> on the Deobfuscator wiki page to get up-to-date methods, then grab the
> ActiveState HTML'd perldocs pumped out when installing using PPM (I make a
> custom PPM/PPD file and install myself every once in a while):
> 
> #!/usr/bin/perl -w
> use Class::Inspector;
> $class = shift || die "Usage: methods perl_class_name\n";
> eval "require $class";
> print join ("\n", sort @{Class::Inspector-
> 
>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be better to
>>> have two pages, a search page and a result page.   It only takes a click
>>> on Back (or a "new search" button) to get to a new search, and now you
>>> can use your whole screen for reading your results.
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs open to
>> different class pages at once, so being able to see an overview all on
>> one page in Deobfuscator is very nice.
>>
>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie. usage,
>> function, returns, args columns. I feel that opening a window for each
>> method you want to understand is far too slow.
> 
> Agreed.
> 
>> I'd also really like a link to the code for the method as well. The
>> bioperl docs are rarely complete enough that you can really understand
>> what every method is supposed to do without looking at the code.
> 
> The methods that pop up are in columns along with the class module that
> implements the method.  
> 
> 
> If you click on that link you get PDOC documentation for the module which
> includes most of the code (strangely, though Deobfuscator indexes bioperl
> 1.4, the PDOC corresponds to bioperl-live).  Is that what you meant, or
> something a bit more detailed?
> 
>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't clear
>>> to me that the search searches within class names rather than function
>>> names. What I really want to know sometimes is which module has, say,
>>> the revcom method in it.
> 
> That's listed in the method results table (the next column has the module
> with a link to the module's online docs).
> 
> 
> Chris
> 
> 
>> This would be a great feature to add.
>>
>>
>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are just a
>> little too cramped and things start to look messy/ run into each other.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Mon May 15 19:23:55 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 15:23:55 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000501c67852$e1bb55c0$15327e82@pyrimidine>
References: <000501c67852$e1bb55c0$15327e82@pyrimidine>
Message-ID: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>

I wasn't using the search. It's in the scrollable table for browsing.  
-hilmar

On May 15, 2006, at 3:07 PM, Chris Fields wrote:

> I'll have to give it a try on Mac OS X (we have an ancient G4 in  
> the lab
> which I can try it on).  I'll let you know what I find.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ClarkeW at AGR.GC.CA  Mon May 15 19:40:15 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Mon, 15 May 2006 15:40:15 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>

Hey everyone, 

 
I have been developing some code to download and parse BLAST reports
from a remote server using SOAP::Lite and to insert the results into
a MySQL database. The problem I am having is that my program seems to be
taking up a huge amount of RAM. For a single job of 10000 queries it
can consume as much as a couple hundred MB inside an hour. I realize
that a lot of work is being done, but this seems like way too much. This
leads me to the subject of my post. I think I may have traced the source
of the memory leak to Bio::SearchIO. I have used Devel::Size to track
the size of my variables and done other debugging steps, and have had no
luck resolving this very frustrating problem. My code is as
follows:

 
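    my $result = $connector->getQueryResult($query_id);

    # Read the report through a filehandle opened on the scalar itself
    open my $FH, "<", \$result
        or die "Cannot open in-memory filehandle: $!";

    my $searchio = Bio::SearchIO->new(-format => "blast",
                                      -fh     => $FH);

    while (my $o_blast = $searchio->next_result()) {
        my $clone_id  = $o_blast->query_name();
        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
        # (rest of the loop body left out of this post)
    }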

 
This is just the leading and trailing code surrounding the use of
Bio::SearchIO, since there is quite a lot of it. I am mostly just wondering
if anyone has ever had problems with SearchIO and its memory usage. I
looked at the source code for it but am afraid it is out of my league.
Any help/suggestions/questions would be great. Thanks
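
One way to narrow this down is to test the in-memory filehandle idiom on its own, without Bio::SearchIO. What follows is only a minimal sketch in core Perl. The report text and the '//' record separator are made up for illustration, and count_records is a hypothetical helper, not part of BioPerl or of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the text returned by getQueryResult():
# three made-up records, each terminated by a line containing only '//'.
my $result = join '', map { "Query= clone_$_\nsome report text\n//\n" } 1 .. 3;

# Count records by reading from a filehandle opened on the scalar itself.
sub count_records {
    my ($text_ref) = @_;
    open my $fh, '<', $text_ref
        or die "Cannot open in-memory filehandle: $!";
    local $/ = "//\n";          # read one record at a time
    my $count = 0;
    while (my $record = <$fh>) {
        $count++;               # $record goes out of scope on every pass
    }
    close $fh;
    return $count;
}

print count_records(\$result), " records\n";
```

With this idiom the whole report stays in $result for as long as that variable is alive, so memory use is at least proportional to the size of each downloaded report. If a loop like this stays flat in memory while the real one grows, the growth is more likely in the parsing or database code than in the filehandle.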


From dmessina at wustl.edu  Mon May 15 19:34:10 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 14:34:10 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000901c67827$d99eabb0$15327e82@pyrimidine>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
Message-ID: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>

Responding to:
>>> Amir Karger
>> Sendu Bala
>  Chris Fields


> The Deobfuscator currently indexes bioperl-1.4, so it's not completely
> up-to-date.  I believe Mauricio and Dave may be working on updating  
> to the
> newer versions and maybe bioperl-live, as well as getting the other  
> bioperl
> packages up and running.

That's correct -- Mauricio is currently working on a version that  
will allow you to search 1.4, 1.5.1, or bioperl-live. The  
Deobfuscator indexes will be updated (daily?) to keep them in sync  
with the CVS repository.


>>> A couple of minor interface thoughts.
>>>
>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>> think I'll often want to browse through what's available in a  
>>> class. But
>>> 60% or so of the screen real estate is used for "Enter a search
>>> string... OR select a class from the list". IMO, it would be  
>>> better to
>>> have two pages, a search page and a result page.   It only takes  
>>> a click
>>> on Back (or a "new search" button) to get to a new search, and  
>>> now you
>>> can use your whole screen for reading your results.
>>
>> As the compromise it must be, I like the way it behaves. I don't like
>> lots of windows. I especially don't like pop up windows. Right now  
>> when
>> I'm using the bioperl docs I tend to have a whole bunch of tabs  
>> open to
>> different class pages at once, so being able to see an overview  
>> all on
>> one page in Deobfuscator is very nice.

I think the current behavior makes sense as the default, but I like  
the idea of being able to view the search results in a separate  
window for easier browsing. Thanks for the suggestion; I'll add it to  
the list.


>> Further to that, I'd love it if clicking on a method name caused an
>> in-place css(&|javascript) reveal (similar to how a well implemented
>> drop down menu works in a website) rather than a new window opened.
>> Alternatively, just have more columns in the results table, ie.  
>> usage,
>> function, returns, args columns. I feel that opening a window for  
>> each
>> method you want to understand is far too slow.
>
> Agreed.

Yeah, the way it currently works is admittedly lame, and was done as  
a placeholder until we figured out a better way to do it. An in-place  
reveal sounds like a good solution.


>>> 2) Please sort the "select a class from the list" alphabetically. I
>>> guess I can enter a search term to get the right classes, but it  
>>> would
>>> be nice to be able to browse.

Agreed. I think we were doing this in an earlier test version, but I  
must have left it out of the release I handed off to Mauricio.


>>> 3) Minimalist is nice, but documentation is even nicer. It wasn't  
>>> clear
>>> to me that the search searches within class names rather than  
>>> function
>>> names. What I really want to know sometimes is which module has,  
>>> say,
>>> the revcom method in it.
>>
>> This would be a great feature to add.

That's a great idea.


>>> 4) When I search for something that's not found, I get a screen that
>>> looks pretty familiar, with the extra text "No match to string  
>>> found"
>>> down at the bottom. It took me a while to even notice it.  
>>> (Studies show
>>> that most users don't read most of the text on a page.) Bold  
>>> might be
>>> nice here. Or put the error at the top of the screen. Or both.

Added to the list.


>>> 5) I'll save my stupidest comment for last - please make the page  
>>> title
>>> "Bioperl Deobfuscator", so that when I bookmark it I'll know what  
>>> the
>>> bookmark stands for.

Added to the list. Not stupid, by the way -- much to my surprise,  
there are at least 2 or 3 other (obviously inferior :) )  
deobfuscators floating around out there.


>> Another minor interface thought:
>> 6) Have a little more cell padding in all the tables. Things are  
>> just a
>> little too cramped and things start to look messy/ run into each  
>> other.

Added to the list.


Thanks to all of you for taking the time to give such detailed  
feedback -- it's really helpful.

There is a wiki page on the BioPerl site for this project
(http://www.bioperl.org/wiki/Deobfuscator), so I'll be putting your comments
there for tracking and further discussion. Please feel free to add to it.


Dave


-- 
Dave Messina
WashU Genome Sequencing Center
dmessina at wustl.edu
314-286-1825


From faruque at ebi.ac.uk  Mon May 15 19:47:27 2006
From: faruque at ebi.ac.uk (Nadeem Faruque)
Date: Mon, 15 May 2006 20:47:27 +0100
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
Message-ID: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>

>> My personal view is that having it as an annotation would serve no  
>> real
>> purpose. For me the whole point of any kind of species  
>> representation in
>> bioperl is to allow you to compare species in a biologically  
>> meaningful
>> way. If it's just some annotation then that means it's basically

I understand the need to find the species name of entries, especially
now that so many complete genomes have been given their own
strain-specific tax nodes, and I also think it is a shame that the NCBI tax
dump does not give a rank to entries such as these (they cannot
easily be distinguished from unofficial ranks higher in the tree
without ascending the tree).
Would it be useful for the species name to be included within EMBL
file headers, e.g. in a line called OB (OB is a terrible suggestion
based on 'Organism Binomial', since OS is already in use)?

Here are two examples of the species 'Apple stem grooving virus', where the
second one would appear to be a different species without delving
into the tax tree or including an OB line.

AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.

AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
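
To make the proposal concrete, here is a minimal sketch in core Perl of how a script might use such a line. The OB line code is only the suggestion made above and is not part of the EMBL format. parse_header and same_species are hypothetical helpers written for this illustration, not BioPerl functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse EMBL-style header lines ("XX   value") into a hash of
# line code => value. Repeated codes (e.g. OC) are joined with a space.
sub parse_header {
    my ($text) = @_;
    my %field;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^([A-Z]{2})\s{3}(.*\S)\s*$/;
        my ($code, $value) = ($1, $2);
        $field{$code} = defined $field{$code} ? "$field{$code} $value" : $value;
    }
    return \%field;
}

# Two entries count as the same species if their (proposed) OB lines agree.
sub same_species {
    my ($x, $y) = @_;
    return defined $x->{OB} && defined $y->{OB} && $x->{OB} eq $y->{OB};
}

my $entry1 = parse_header(<<'END');
AC   D14995; S47260;
DE   Apple stem grooving virus genome, complete sequence.
OS   Apple stem grooving virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
END

my $entry2 = parse_header(<<'END');
AC   AY646511;
DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
OS   Citrus tatter leaf virus
OB   Apple stem grooving virus
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
OC   Capillovirus.
END

printf "OS lines agree: %s\n", $entry1->{OS} eq $entry2->{OS} ? 'yes' : 'no';
printf "OB lines agree: %s\n", same_species($entry1, $entry2) ? 'yes' : 'no';
```

The comparison needs no lookup in the tax tree, which is the point of carrying the binomial in the entry itself.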


> My point is, a large number of users do NOT use, nor care about,  
> taxonomic
> information to the degree they need to know the entire  
> classification of the
> organism; many are just as happy about getting the scientific name  
> only,
> which is in the GenBank/EMBL file itself.  To take one extreme, it  
> is not
> productive to force every user to download the NCBI tax database  
> and use
> lookups just to convert sequences from EMBL format to GenBank  
> format.  It's
> not productive to allow users to spam the NCBI tax database  
> remotely either,
> so hardcoding lookups is, IMHO, a big mistake.

I don't think you need to add any information to turn an EMBL-format  
file into a GenBank flatfile, but maybe I'm missing something obvious.

Nadeem


--
Dr S.M. Nadeem N. Faruque
9 Barley Court
Saffron Walden
Essex  CB11 3HG
01799 500 120


From dmessina at wustl.edu  Mon May 15 20:12:48 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 15 May 2006 15:12:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
Message-ID: <5A2309FD-8C6E-4349-99CC-B3EDA8B2F499@wustl.edu>

On May 15, 2006, at 2:23 PM, Hilmar Lapp wrote:

> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar

I'm seeing this too on OS X with Safari 2.0.3.

If you type 'goflat' (without the quotes) into the search box, you'll  
see the behavior. Chris, can you try it again this way just to  
confirm it's an OS/browser-specific thing?

Not sure what's going on, Hilmar -- I'll take a look.

Dave


From cjfields at uiuc.edu  Mon May 15 20:56:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 15:56:29 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <57326DCD-D72B-4CED-801D-9E25609BF57C@gmx.net>
Message-ID: <000a01c67862$0a00cab0$15327e82@pyrimidine>

Okay, I see what you mean.  Using the search term "Bio::Ont*" also explains
why I didn't see it ;P.  Yeah, the bug shows up here too (WinXP and Mac OS
X), and those links are broken like you said.  Could be something to do with
indexing.  

Using the methods script in the FAQ
(http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
I get this:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
Bio::OntologyIO::simplehierarchy::Dumper
Bio::OntologyIO::simplehierarchy::basename
Bio::OntologyIO::simplehierarchy::dirname
Bio::OntologyIO::simplehierarchy::fileparse
Bio::OntologyIO::simplehierarchy::fileparse_set_fstype

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 2:24 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> I wasn't using the search. It's in the scrollable table for browsing.
> -hilmar
> 
> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> 
> > I'll have to give it a try on Mac OS X (we have an ancient G4 in the
> > lab which I can try it on).  I'll let you know what I find.
> >
> > This is what I get when I do a search for 'Bio::Ont*' using Firefox on
> > WinXP and this Deobfuscator link
> > (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have
> > links that work (I added newline and tab to make it a bit more
> > readable):
> >
> > Bio::OntologyIO
> > 	Parser factory for Ontology formats
> > Bio::OntologyIO::Handlers::BaseSAXHandler
> > 	no short description available
> > Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> > 	no short description available
> > Bio::Ontology::OntologyI
> > 	Interface for an ontology implementation
> > Bio::Ontology::TermFactory
> > 	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
> > Bio::Ontology::OntologyStore
> > 	A repository of ontologies
> > Bio::Ontology::RelationshipFactory
> > 	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
> > Bio::Ontology::Ontology
> > 	standard implementation of an Ontology
> >
> > So the names seem fine here.
> >
> > When I click on a class (Bio::Ontology::Ontology) I get in the results
> > section:
> >
> > Method                 Class                           Returns         Usage
> > add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> > add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> > add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
> >
> > ....and so on
> >
> > Where each method is clickable and opens a new page containing a
> > table:
> >
> > Bio::Ontology::Ontology::add_relationship
> > Usage	add_relationship(RelationshipI relationship): RelationshipI
> > Function	Adds a relationship object to the ontology engine.
> > Returns	Its argument.
> > Args	A RelationshipI object.
> >
> >
> > Each class is also linked to the bioperl-live PDOC.  Clicking on class
> > Bio::Ontology::Ontology in the results table gets me this page (no new
> > page):
> >
> > http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Monday, May 15, 2006 1:09 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >> in the browsable list is already different (the prefix is missing),
> >> and the JavaScript link also lacks the prefix in the module name in
> >> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >> the few Bio::Ontology exceptions that do work and do display
> >> correctly).
> >>
> >> I suppose there is something peculiar about the code formatting of
> >> those modules? Some of the modules under Bio::OntologyIO are also
> >> affected BTW.
> >>
> >> What happens is after you click on the link the page appears to
> >> reload (i.e., gets submitted) but the second table that is supposed
> >> to open underneath the first doesn't appear. However, the sort-by
> >> drop-down selector does appear.
> >>
> >> 	-hilmar
> >>
> >> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>
> >>> That's strange.  Clicking on the list gives me the results for that
> >>> module.  When I click on the hyperlinks in the results section they
> >>> open fine; the method column links open a new page containing
> >>> usage-function-returns-args and the class column links open pdoc
> >>> (same page) for bioperl-live.  I'm using Firefox 1.5 on WinXP.
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>> To: Mauricio Herrera Cuadra
> >>>> Cc: bioperl-l
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Hey, thanks to Laura & David for this interface.
> >>>>
> >>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>> doesn't go anywhere either ... Anything different with those
> >>>> modules that I can fix?
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>
> >>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>> interface at
> >>>>> the BioPerl website. You can use it at the following URL:
> >>>>>
> >>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>
> >>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>> contribution to the BioPerl project!
> >>>>>
> >>>>> Mauricio.
> >>>>>
> >>>>> --
> >>>>> MAURICIO HERRERA CUADRA
> >>>>> arareko at campus.iztacala.unam.mx
> >>>>> Laboratorio de Genética
> >>>>> Unidad de Morfofisiología y Función
> >>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>
> >>>> --
> >>>> ===========================================================
> >>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>> ===========================================================
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l at lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon May 15 21:29:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 16:29:14 -0500
Subject: [Bioperl-l] Bio::DB::Taxonomy:: mishandles species,
	subspecies/variant names
In-Reply-To: <809AE0C7-A9B6-48A4-BF11-BA392C770CA9@ebi.ac.uk>
Message-ID: <000b01c67866$9dac2620$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nadeem Faruque
> Sent: Monday, May 15, 2006 2:47 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::DB::Taxonomy:: mishandles
> species,subspecies/variant names
> 
> >> My personal view is that having it as an annotation would serve no
> >> real
> >> purpose. For me the whole point of any kind of species
> >> representation in
> >> bioperl is to allow you to compare species in a biologically
> >> meaningful
> >> way. If it's just some annotation then that means it's basically
> 
> I understand the need to find the species name of entries, especially
> now that so many complete genomes have been given their own strain-
> specific tax nodes, and I also think it is a shame that the ncbi tax
> dump does not give a rank to entries such as these (they cannot
> easily be distinguished from unofficial ranks higher in the tree
> without ascending the tree).
> Would it be useful for the species name to be included within EMBL
> file headers, eg in a line called OB (OB is a terrible suggestion
> based on 'Organism Binomial' since OS is already in use)?
> 
> eg two examples of the species 'Apple stem grooving virus', where the
> second one would appear to be a different species without delving
> into the tax tree or the inclusion of an OB line.
> 
> AC   D14995; S47260;
> DE   Apple stem grooving virus genome, complete sequence.
> OS   Apple stem grooving virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.
> 
> AC   AY646511;
> DE   Citrus tatter leaf virus strain Kumquat 1, complete genome.
> OS   Citrus tatter leaf virus
> OB   Apple stem grooving virus
> OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flexiviridae;
> OC   Capillovirus.

Jason also mentions a few examples (see below).  The problem lies in the
fact that EMBL and GenBank flatfiles do not give hierarchy ranking for
taxonomy, so it's a best guess.  What I'm seeing is that the guess is wrong
more often than not when it comes to complex scientific names (viruses,
bacteria, etc).  Notice the doubling of the strain in the following GenBank
files passed through SeqIO (genbank->genbank conversion, BTW; haven't tried
EMBL):

SOURCE      Azoarcus sp. EbN1 EbN1
  ORGANISM  Azoarcus sp.
            Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales;
            Rhodocyclaceae; Azoarcus.

SOURCE      Mycobacterium sp. KMS KMS
  ORGANISM  Mycobacterium sp.
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium.

SOURCE      Mycobacterium tuberculosis C C
  ORGANISM  Mycobacterium tuberculosis
            Bacteria; Actinobacteria; Actinobacteridae; Actinomycetales;
            Corynebacterineae; Mycobacteriaceae; Mycobacterium;
            Mycobacterium; tuberculosis complex; Mycobacterium.

SOURCE      Bacillus subtilis subsp. subtilis str. 168 subtilis str. 168
  ORGANISM  Bacillus subtilis subsp.
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.

Here are Jason's examples, for posterity:

Can you guess what value is the strain versus sub-species?  What happens
when there is a two part strain name (space separated) and a sub-species or
variety designation?

SOURCE      Staphylococcus haemolyticus JCSC1435
   ORGANISM  Staphylococcus haemolyticus JCSC1435
             Bacteria; Firmicutes; Bacillales; Staphylococcus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=279808
strain is JCSC1435

versus
SOURCE      Muntiacus muntjak vaginalis
   ORGANISM  Muntiacus muntjak vaginalis
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
             Euteleostomi; Mammalia; Eutheria; Laurasiatheria;
             Cetartiodactyla; Ruminantia; Pecora; Cervidae; Muntiacinae;
             Muntiacus.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9887
species is muntjak, sub-species vaginalis ?

versus
SOURCE      Aspergillus nidulans FGSC A4
   ORGANISM  Aspergillus nidulans FGSC A4
             Eukaryota; Fungi; Ascomycota; Pezizomycotina; Eurotiomycetes;
             Eurotiales; Trichocomaceae; Emericella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=227321
Genus should be Aspergillus or Emericella ?

Strain and subspecies/variety in the same entry
SOURCE      Cryptococcus neoformans var. grubii H99
   ORGANISM  Cryptococcus neoformans var. grubii H99
             Eukaryota; Fungi; Basidiomycota; Hymenomycetes;
             Heterobasidiomycetes; Tremellomycetidae; Tremellales;
             Tremellaceae; Filobasidiella.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=235443
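
To make the ambiguity in these examples concrete, here is a minimal sketch
of a whitespace-based guess.  It is not the actual Bio::Species or SeqIO
parsing code; naive_split is a hypothetical helper that assumes the first
word is the genus and the second is the species, and keeps whatever is left
over as an unclassified remainder.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: treat the first word as the
# genus, the second as the species, and keep everything else as an
# unclassified remainder.
sub naive_split {
    my ($organism) = @_;
    my ( $genus, $species, @rest ) = split ' ', $organism;
    return {
        genus     => $genus,
        species   => $species,
        remainder => join( ' ', @rest ),
    };
}

# ORGANISM strings taken from the examples above.
for my $organism (
    'Staphylococcus haemolyticus JCSC1435',
    'Muntiacus muntjak vaginalis',
    'Aspergillus nidulans FGSC A4',
    'Cryptococcus neoformans var. grubii H99',
  )
{
    my $guess = naive_split($organism);
    printf "%-14s | %-12s | %s\n",
      $guess->{genus}, $guess->{species}, $guess->{remainder};
}
```

The remainder comes out as a strain in one case, a sub-species in another,
a two-part strain in the third, and a variety plus a strain in the last.
Nothing in the string itself says which is which, so any rule based on word
position alone will misfile some of them.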


> > My point is, a large number of users do NOT use, nor care about,
> > taxonomic information to the degree they need to know the entire
> > classification of the organism; many are just as happy about getting
> > the scientific name only, which is in the GenBank/EMBL file itself.
> > To take one extreme, it is not productive to force every user to
> > download the NCBI tax database and use lookups just to convert
> > sequences from EMBL format to GenBank format.  It's not productive to
> > allow users to spam the NCBI tax database remotely either, so
> > hardcoding lookups is, IMHO, a big mistake.
> 
> I don't think you need to add any information to turn an embl-format
> file into a Genbank flatfile, but maybe I'm missing something obvious.

The issue is the way the SOURCE and ORGANISM lines are handled (OS/OC lines
in EMBL, I believe), which is using a Bio::Species object.  The problem is,
like I mentioned above, that no hierarchical ranking is in the flat file,
just the order of the ranking.  We can try to make a best guess based on
that but it's obviously very tricky, particularly when dealing with
subspecies, strains, etc.

NCBI also states that the classification is often too long for a file and
so may be incomplete (I think they leave out nodes which have 'no rank'
tags, but I can't be completely sure), so there's another issue.

Anyway, this is where the lookup would come in, which would require a local
taxonomy database (we can't spam the NCBI remote database, that would just
be rude) which would give the complete taxonomic classification if it worked
properly.

So now we have three possible situations:

1) One extreme: we require a lookup to get it right (which, BTW, it
currently doesn't); this by default requires a local database.
2) Middle of the road: we try and guess the information as best as we can
with the information given (the current situation); this is breaking more
and more often now, so is becoming more unreliable.
3) Other extreme: we punt and absolve ourselves of even trying to parse the
data and just have a strict tagname->value or similar simple construct to
handle the data.

#3 as default with option to do #1 is probably best (least error prone with
option for most information), with caching to speed up lookups as Sendu Bala
does now.
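
A rough sketch of what "#3 as default with option to do #1" plus caching
could look like, in plain Perl.  Everything here is hypothetical and for
illustration only: organism_info, the record hash layout, and the stand-in
lookup are assumptions, not existing BioPerl API and not Sendu's code.

```perl
use strict;
use warnings;

my %lineage_cache;    # organism name => result of the optional lookup

# Hypothetical helper: by default return the flat-file values untouched
# (option 3); if the caller passes a lookup code ref, add its answer
# (option 1), asking at most once per organism name.
sub organism_info {
    my ( $record, $lookup ) = @_;
    my %info = (
        organism       => $record->{organism},
        classification => $record->{classification},
    );
    if ($lookup) {
        my $name = $record->{organism};
        $lineage_cache{$name} = $lookup->($name)
          unless exists $lineage_cache{$name};
        $info{lineage} = $lineage_cache{$name};
    }
    return \%info;
}

# Stand-in for a local taxonomy database; it only counts how often it is hit.
my $lookups       = 0;
my $fake_local_db = sub {
    my ($name) = @_;
    $lookups++;
    return "ranked lineage for $name";
};

my $record = {
    organism       => 'Azoarcus sp. EbN1',
    classification => 'Bacteria; Proteobacteria; Betaproteobacteria; '
      . 'Rhodocyclales; Rhodocyclaceae; Azoarcus.',
};

my $plain  = organism_info($record);                      # no lookup at all
my $first  = organism_info( $record, $fake_local_db );    # one lookup
my $second = organism_info( $record, $fake_local_db );    # served from cache
print "lookups performed: $lookups\n";
```

Users who only convert formats never touch the lookup, and those who ask
for the full lineage hit the database once per organism rather than once
per sequence.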

Chris

 


From hlapp at gmx.net  Mon May 15 21:37:56 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 15 May 2006 17:37:56 -0400
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <000a01c67862$0a00cab0$15327e82@pyrimidine>
References: <000a01c67862$0a00cab0$15327e82@pyrimidine>
Message-ID: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>

It does have the following line though (and a 'use' statement for
OntologyIO):

@ISA = qw( Bio::OntologyIO );

So what is it doing 'wrong' (there aren't any tests or so in which  
anything erroneous would show)?

	-hilmar

On May 15, 2006, at 4:56 PM, Chris Fields wrote:

> Okay, I see what you mean.  Using the search term "Bio::Ont*" also
> explains why I didn't see it ;P.  Yeah, the bug shows up here too
> (WinXP and Mac OS X), and those links are broken like you said.  Could
> be something to do with indexing.
>
> Using the methods script in the FAQ
> (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> I get this:
>
> C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> Bio::OntologyIO::simplehierarchy::Dumper
> Bio::OntologyIO::simplehierarchy::basename
> Bio::OntologyIO::simplehierarchy::dirname
> Bio::OntologyIO::simplehierarchy::fileparse
> Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
>
> Chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Mon May 15 22:03:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 15 May 2006 17:03:48 -0500
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <6CCA8112-651D-4154-94AE-88FE8EFBCD27@gmx.net>
Message-ID: <000d01c6786b$71c04e60$15327e82@pyrimidine>

And Bio::OntologyIO works on its own:

C:\Perl\Scripts>methods.pl Bio::OntologyIO
Bio::OntologyIO::DESTROY
Bio::OntologyIO::new
Bio::OntologyIO::next_ontology
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented

But when I try these:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::goflat


C:\Perl\Scripts>methods.pl Bio::OntologyIO::dagflat


I get nada.  It could be related to the way the methods are parsed using
Class::Inspector :

print join ("\n", sort
@{Class::Inspector->methods($class,'full','public')}), "\n";
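
For anyone who wants to see what this kind of introspection does under the
hood, here is a minimal sketch that uses only core Perl.  It is not the FAQ
script and not Class::Inspector's implementation; the Demo:: classes and
public_methods are made up for the example.  It walks the class's
inheritance chain and reports every sub found in each package's symbol
table.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Hypothetical demo classes, defined inline so this runs without BioPerl.
package Demo::Base;
sub new      { return bless {}, shift }
sub throw    { die "$_[1]\n" }
sub _private { return 1 }

package Demo::Parser;
use File::Basename;    # imports basename, dirname, fileparse, ... here
our @ISA = ('Demo::Base');
sub next_ontology { return }

package main;

# Return fully qualified public method names for $class, own and inherited.
sub public_methods {
    my ($class) = @_;
    my ( %seen, @found );
    no strict 'refs';    # symbol-table lookups by name need symbolic refs
    for my $pkg ( @{ mro::get_linear_isa($class) } ) {
        for my $name ( sort keys %{"${pkg}::"} ) {
            next unless $name =~ /^[A-Za-z]\w*$/;         # skip _private etc.
            next unless defined &{"${pkg}::${name}"};     # keep only subs
            next if $seen{$name}++;                       # nearest class wins
            push @found, "${pkg}::${name}";
        }
    }
    return sort @found;
}

print "$_\n" for public_methods('Demo::Parser');
```

Note that the functions imported from File::Basename are reported under
Demo::Parser as if they were methods, because an import simply installs the
sub in the importing package's symbol table.  That is the same kind of entry
as the Dumper, basename, dirname and fileparse lines in the simplehierarchy
listing quoted in this thread.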

I haven't tried it on all the weird Bio::Ontology-missing modules (don't
have time today).  It's not common to all of those modules though:

C:\Perl\Scripts>methods.pl Bio::OntologyIO::InterProParser
Bio::OntologyIO::DESTROY
Bio::OntologyIO::InterProParser::next_ontology
Bio::OntologyIO::InterProParser::parse
Bio::OntologyIO::InterProParser::secondary_accessions_map
Bio::OntologyIO::new
Bio::OntologyIO::term_factory
Bio::OntologyIO::unescape
Bio::Root::IO::catfile
Bio::Root::IO::close
Bio::Root::IO::dup
Bio::Root::IO::exists_exe
Bio::Root::IO::file
Bio::Root::IO::flush
Bio::Root::IO::gensym
Bio::Root::IO::mode
Bio::Root::IO::noclose
Bio::Root::IO::qualify
Bio::Root::IO::qualify_to_ref
Bio::Root::IO::rmtree
Bio::Root::IO::tempdir
Bio::Root::IO::tempfile
Bio::Root::IO::ungensym
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented


Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Monday, May 15, 2006 4:38 PM
> To: Chris Fields
> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> 
> It does have the following line though (and a 'use' statement for
> OntologyIO);
> 
> @ISA = qw( Bio::OntologyIO );
> 
> So what is it doing 'wrong' (there aren't any tests or so in which
> anything erroneous would show)?
> 
> 	-hilmar
> 
> On May 15, 2006, at 4:56 PM, Chris Fields wrote:
> 
> > Okay, I see what you mean.  Using the search term "Bio::Ont*" also
> > explains why I didn't see it ;P.  Yeah, the bug shows up here too
> > (WinXP and Mac OS X), and those links are broken like you said.  Could
> > be something to do with indexing.
> >
> > Using the methods script in the FAQ
> > (http://www.bioperl.org/wiki/FAQ#Why_can.27t_I_easily_get_a_list_of_all_the_methods_a_object_can_call.3F)
> > I get this:
> >
> > C:\Perl\Scripts>methods.pl Bio::OntologyIO::simplehierarchy
> > Bio::OntologyIO::simplehierarchy::Dumper
> > Bio::OntologyIO::simplehierarchy::basename
> > Bio::OntologyIO::simplehierarchy::dirname
> > Bio::OntologyIO::simplehierarchy::fileparse
> > Bio::OntologyIO::simplehierarchy::fileparse_set_fstype
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >> Sent: Monday, May 15, 2006 2:24 PM
> >> To: Chris Fields
> >> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>
> >> I wasn't using the search. It's in the scrollable table for browsing.
> >> -hilmar
> >>
> >> On May 15, 2006, at 3:07 PM, Chris Fields wrote:
> >>
> >>> I'll have to give it a try on Mac OS X (we have an ancient G4 in the
> >>> lab which I can try it on).  I'll let you know what I find.
> >>>
> >>> This is what I get when I do a search for 'Bio::Ont*' using Firefox on
> >>> WinXP and this Deobfuscator link
> >>> (http://bioperl.org/cgi-bin/deob_interface.cgi?); all the classes have
> >>> links that work (I added newline and tab to make it a bit more
> >>> readable):
> >>>
> >>> Bio::OntologyIO
> >>> 	Parser factory for Ontology formats
> >>> Bio::OntologyIO::Handlers::BaseSAXHandler
> >>> 	no short description available
> >>> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
> >>> 	no short description available
> >>> Bio::Ontology::OntologyI
> >>> 	Interface for an ontology implementation
> >>> Bio::Ontology::TermFactory
> >>> 	Instantiates a new Bio::Ontology::TermI (or derived class) through a factory
> >>> Bio::Ontology::OntologyStore
> >>> 	A repository of ontologies
> >>> Bio::Ontology::RelationshipFactory
> >>> 	Instantiates a new Bio::Ontology::RelationshipI (or derived class) through a factory
> >>> Bio::Ontology::Ontology
> >>> 	standard implementation of an Ontology
> >>>
> >>> So the names seem fine here.
> >>>
> >>> When I click on a class (Bio::Ontology::Ontology) I get in the results
> >>> section:
> >>>
> >>> Method                 Class                           Returns         Usage
> >>> add_relationship       Bio::Ontology::Ontology         Its argument.   add_relationship(RelationshipI relationship): RelationshipI
> >>> add_relationship_type  Bio::Ontology::OntologyEngineI  not documented  not documented
> >>> add_term               Bio::Ontology::Ontology         its argument.   add_term(TermI term): TermI
> >>>
> >>> ....and so on
> >>>
> >>> Where each method is clickable and opens a new page containing a
> >>> table:
> >>>
> >>> Bio::Ontology::Ontology::add_relationship
> >>> Usage	add_relationship(RelationshipI relationship): RelationshipI
> >>> Function	Adds a relationship object to the ontology engine.
> >>> Returns	Its argument.
> >>> Args	A RelationshipI object.
> >>>
> >>>
> >>> Each class is also linked to the bioperl-live PDOC.  Clicking on class
> >>> Bio::Ontology::Ontology in the results table gets me this page (no new
> >>> page):
> >>>
> >>> http://doc.bioperl.org/bioperl-live/Bio/Ontology/Ontology.html
> >>>
> >>>
> >>> Chris
> >>>
> >>>> -----Original Message-----
> >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >>>> Sent: Monday, May 15, 2006 1:09 PM
> >>>> To: Chris Fields
> >>>> Cc: 'Mauricio Herrera Cuadra'; 'bioperl-l'
> >>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>
> >>>> Safari or Firefox on MacOSX don't do this. Note that the appearance
> >>>> in the browsable list is already different (the prefix is missing),
> >>>> and the JavaScript link also lacks the prefix in the module name in
> >>>> contrast to others, e.g., Bio::Ontology::Ontology (which is one of
> >>>> the few Bio::Ontology exceptions that do work and do display
> >>>> correctly).
> >>>>
> >>>> I suppose there is something peculiar about the code formatting of
> >>>> those modules? Some of the modules under Bio::OntologyIO are also
> >>>> affected BTW.
> >>>>
> >>>> What happens is after you click on the link the page appears to
> >>>> reload (i.e., gets submitted) but the second table that is supposed
> >>>> to open underneath the first doesn't appear. However, the sort-by
> >>>> drop-down selector does appear.
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On May 15, 2006, at 1:22 PM, Chris Fields wrote:
> >>>>
> >>>>> That's strange.  Clicking on the list gives me the results for
> >>>>> that
> >>>>> module.
> >>>>> When I click on the hyperlinks in the results section they open
> >>>>> fine; the
> >>>>> method column links opens a new page containing usage-function-
> >>>>> returns-args
> >>>>> and the class column links opens pdoc (same page) for bioperl-
> >>>>> live.  I'm
> >>>>> using Firefox 1.5 on WinXP.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>> bounces at lists.open-bio.org] On Behalf Of Hilmar Lapp
> >>>>>> Sent: Monday, May 15, 2006 12:01 PM
> >>>>>> To: Mauricio Herrera Cuadra
> >>>>>> Cc: bioperl-l
> >>>>>> Subject: Re: [Bioperl-l] Deobfuscator interface now available
> >>>>>>
> >>>>>> Hey, thanks to Laura & David for this interface.
> >>>>>>
> >>>>>> Any idea why most of the Bio::Ontology::* modules show up without
> >>>>>> their leading Bio::Ontology? And clicking on those hyperlinks
> >>>>>> doesn't
> >>>>>> go anywhere either ... Anything different with those modules
> >>>>>> that I
> >>>>>> can fix?
> >>>>>>
> >>>>>> 	-hilmar
> >>>>>>
> >>>>>> On May 14, 2006, at 12:09 AM, Mauricio Herrera Cuadra wrote:
> >>>>>>
> >>>>>>> I'm glad to announce the availability of the Deobfuscator
> >>>>>>> interface at
> >>>>>>> the BioPerl website. You can use it at the following URL:
> >>>>>>>
> >>>>>>> http://bioperl.org/cgi-bin/deob_interface.cgi
> >>>>>>>
> >>>>>>> Many thanks to Laura Kavanaugh and David Messina for this great
> >>>>>>> contribution to the BioPerl project!
> >>>>>>>
> >>>>>>> Mauricio.
> >>>>>>>
> >>>>>>> --
> >>>>>>> MAURICIO HERRERA CUADRA
> >>>>>>> arareko at campus.iztacala.unam.mx
> >>>>>>> Laboratorio de Genética
> >>>>>>> Unidad de Morfofisiología y Función
> >>>>>>> Facultad de Estudios Superiores Iztacala, UNAM
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> ===========================================================
> >>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >>>>>> ===========================================================


From cjfields at uiuc.edu  Tue May 16 00:14:28 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Mon, 15 May 2006 19:14:28 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <a7d26051.b90db78f.81ac600@expms6.cites.uiuc.edu>

---- Original message ----
>Date: Mon, 15 May 2006 15:40:15 -0400
>From: "Clarke, Wayne" <ClarkeW at agr.gc.ca>  
>Subject: [Bioperl-l] Memory Leak in Bio::SearchIO  
>To: <bioperl-l at lists.open-bio.org>
>
>Hey everyone, 
>
> 
>
>I have been developing some code to download and parse blast reports
>from a remote server using Soap::Lite as well as insert the results into
>a mysql database. The problem I am having is that my program seems to be
>taking up a huge amount of RAM. For a single job of 10000 queries it
>can consume as much as a couple hundred Mb inside an hour. 

If you're parsing 10000 queries (10000 different BLAST reports, right?) then it's
not necessarily a memory leak so much as object creation.  Each report
generates hit objects, which in turn generate HSP objects.  I think Jason
recommends using the tabular output option (-m8 or -m9) for huge reports, as
it cuts down considerably on this.  If you are cycling through each report it
shouldn't be as much of a problem unless your BLAST reports are really huge.
Have you tried parsing a single report to see if the problem persists?

Now, if you are using Bioperl 1.5.1 with BLAST 2.2.13 or newer, you'll likely run
into an infinite loop caused by a change in NCBI's text
output.  You can try updating bioperl from CVS in either case to see if that helps.
Tabular output and XML output, AFAIK, are the same regardless of version;
this bug only affects text output of BLAST reports.
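
To illustrate why tabular output is cheaper to process, here is a minimal
sketch in plain Perl (no BioPerl needed) that streams -m8 style lines and
hands each hit to a callback without building hit/HSP objects.  The function
name each_top_hit, the column labels and the sample data are made up for
illustration; it also assumes that all lines for one query are contiguous,
as they normally are in a BLAST run.

```perl
use strict;
use warnings;

# The 12 tab-separated columns of NCBI BLAST tabular output (-m 8);
# -m 9 is the same plus '#' comment lines.  Labels are our own.
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Read tabular lines from $fh and call $callback with a hash ref for each
# line, passing on at most $max_hits lines per query.
sub each_top_hit {
    my ($fh, $max_hits, $callback) = @_;
    my ($current, $seen) = ('', 0);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blanks
        my %hit;
        @hit{@COLS} = split /\t/, $line;
        if ($hit{query} ne $current) {            # a new query starts here
            ($current, $seen) = ($hit{query}, 0);
        }
        next if ++$seen > $max_hits;
        $callback->(\%hit);
    }
    return;
}

# Made-up sample data standing in for a real report.
my $report = join '', map { join("\t", @$_) . "\n" } (
    [qw(q1 s1 98.5 200 3 0 1 200 5 204 1e-100 370)],
    [qw(q1 s2 91.0 180 16 0 1 180 9 188 2e-60 240)],
    [qw(q1 s3 85.2 150 22 1 10 159 3 152 4e-30 130)],
    [qw(q2 s7 99.0 100 1 0 1 100 1 100 3e-50 190)],
);
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
each_top_hit($fh, 2, sub {
    my ($hit) = @_;
    print join("\t", @{$hit}{qw(query subject evalue)}), "\n";
});
close $fh;
```

Only one line is held in memory at a time, so the footprint stays flat no
matter how many queries the file contains.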

> I realize
>that a lot of work is being done but this seems like way too much. This
>leads me to the subject of my post. I think I may have traced the source
>of the memory leak to Bio::SearchIO. I have used Devel::Size to track
>the size of my variables and done other debugging steps and have had no
>luck with resolving this very frustrating problem. My code is as
>follows:
>
> 
>
>    my $result = $connector->getQueryResult($query_id);
>
>    my $FH;
>    open $FH, "<", \$result;
>
>    my $searchio = new Bio::SearchIO(-format => "blast",
>                                     -fh     => $FH);
>
>    while (my $o_blast = $searchio->next_result()) {
>        my $clone_id  = $o_blast->query_name();
>        my $statement = $bdbi->form_push_SQL($o_blast, $clone_id, 5);
>
> 
>
>this is just the leading and tailing code surrounding the use of
>Bio::SearchIO since there is quite a lot. I am mostly just wondering if
>anyone has ever had problems with SearchIO and its memory usage. I
>looked at the source code for it but am afraid it is out of my league.
>Any help/suggestions/questions would be great. Thanks
>


From torsten.seemann at infotech.monash.edu.au  Tue May 16 00:18:44 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 16 May 2006 10:18:44 +1000
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
References: <320530F83FA47047823E57F110DDEAADB159EC@onncrxms4.agr.gc.ca>
Message-ID: <44691A64.8040607@infotech.monash.edu.au>

> taking up a huge amount of RAM. For a single job of 10000 queries it
> can consume as much as a couple hundred Mb inside an hour. I realize

>  my $result = $connector->getQueryResult($query_id);
>                 my $searchio = new Bio::SearchIO(-format => "blast",
>                 while (my $o_blast = $searchio->next_result()) {
>                         my $clone_id = $o_blast->query_name();
>                         my $statement = $bdbi->form_push_SQL ($o_blast, $clone_id, 5); }

Some comments:

Have you considered that whatever class/module $bdbi belongs to is 
causing the problem? ie. is it keeping a reference to $o_blast around?

Are you aware that Perl garbage collection does not necessarily return 
freed memory back to the OS? This may affect how you were measuring 
"memory usage".
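
A minimal sketch of the first point, in plain Perl.  Report, LeakyLoader and
CleanLoader are hypothetical stand-ins, not the poster's actual classes; the
only thing being shown is that an object which stores a reference to every
result it is handed keeps all of those results alive, even though the loop
variable goes out of scope on each pass.

```perl
use strict;
use warnings;

package Report;                 # hypothetical stand-in for a parsed result
our $live = 0;                  # how many Report objects currently exist
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    $live++;
    return bless $self, $class;
}
sub DESTROY { $live-- }         # runs when the last reference goes away

package LeakyLoader;            # hypothetical: stores every object it is given
sub new      { my ($class) = @_; my $self = { seen => [] }; return bless $self, $class }
sub push_sql { my ($self, $r) = @_; push @{ $self->{seen} }, $r; return 'INSERT ...' }

package CleanLoader;            # hypothetical: uses the object, keeps nothing
sub new      { my ($class) = @_; my $self = {}; return bless $self, $class }
sub push_sql { my ($self, $r) = @_; return 'INSERT ...' }

package main;

# Feed $n fresh objects to a loader; return how many are still alive after.
sub survivors {
    my ($loader, $n) = @_;
    my $before = $Report::live;
    for my $i (1 .. $n) {
        my $report = Report->new(name => "query$i");
        $loader->push_sql($report);
    }                           # $report goes out of scope on every pass
    return $Report::live - $before;
}

my $clean = CleanLoader->new;
my $leaky = LeakyLoader->new;
print "CleanLoader keeps ", survivors($clean, 1000), " objects alive\n";
print "LeakyLoader keeps ", survivors($leaky, 1000), " objects alive\n";
```

With the clean loader the count returns to zero after the loop; with the
leaky one all 1000 objects survive until the loader itself is destroyed.
Devel::Size on the loop variables alone would not reveal this, since the
growth sits inside the loader.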

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From kmdaily at indiana.edu  Mon May 15 21:00:12 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Mon, 15 May 2006 17:00:12 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <20528E699A515C499B80C222BDBEBC34043FF8@iu-mssg-mbx108.ads.iu.edu>

I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in Bio/SeqIO). How can I get this module?

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


From letondal at pasteur.fr  Tue May 16 06:06:19 2006
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue, 16 May 2006 08:06:19 +0200
Subject: [Bioperl-l] Deobfuscator interface now available
In-Reply-To: <C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
References: <000901c67827$d99eabb0$15327e82@pyrimidine>
	<C91BD1BF-6309-4321-8B4D-819AC0AEE332@wustl.edu>
Message-ID: <9c36140009c3d80bbb0d543376afa6e0@pasteur.fr>


On May 15, 2006, at 9:34 PM, David Messina wrote:

>>>> A couple of minor interface thoughts.
>>>>
>>>> 1)There's quite a lot of methods for many of the classes. As such, I
>>>> think I'll often want to browse through what's available in a
>>>> class. But
>>>> 60% or so of the screen real estate is used for "Enter a search
>>>> string... OR select a class from the list". IMO, it would be
>>>> better to
>>>> have two pages, a search page and a result page.   It only takes
>>>> a click
>>>> on Back (or a "new search" button) to get to a new search, and
>>>> now you
>>>> can use your whole screen for reading your results.
>>>
>>> As the compromise it must be, I like the way it behaves. I don't like
>>> lots of windows. I especially don't like pop up windows. Right now
>>> when
>>> I'm using the bioperl docs I tend to have a whole bunch of tabs
>>> open to
>>> different class pages at once, so being able to see an overview
>>> all on
>>> one page in Deobfuscator is very nice.
>
> I think the current behavior makes sense as the default, but I like
> the idea of being able to view the search results in a separate
> window for easier browsing. Thanks for the suggestion; I'll add it to
> the list.
>

First, thanks for this very useful Web interface!

There are examples (quite ajaxian ones) that reach a compromise between 
several windows for easily browsing large results and composing 
everything in one window to get an overview. The two examples that come 
to mind at the moment (not biology related) are:
	- http://montreal.mspace.fm/chi/sched/
	- http://www.live.com/
		(see the slider at the top right, which lets you squeeze or enlarge the 
results area)


--
Catherine Letondal -- Institut Pasteur


From cjfields at uiuc.edu  Tue May 16 11:38:42 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Tue, 16 May 2006 06:38:42 -0500
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
Message-ID: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>

You'll have to install from CVS.  I believe Brian added Entrezgene.pm after the last 
developer release (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in
>Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>


From bernd.web at gmail.com  Tue May 16 11:37:46 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 16 May 2006 13:37:46 +0200
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
Message-ID: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>

Hi all,

I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
found some issues and differences (bugs?) in behaviour wrt the pod.
Do these look familiar ?

Some example code:
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = Bio::DB::Query::GenBank->new
       (-query   => 'Lassa Virus[ORGN]',
        -reldate => '30',
        -db      => 'protein',
        -ids     => [195052,2981014,11127914],
        -maxids  => 30 );

# note the leading dash on -format; BioPerl named arguments need it
my $gb = Bio::DB::GenBank->new(-format => 'fasta');
my $seqio = $gb->get_Stream_by_query($query);
while (my $seq = $seqio->next_seq) {
       print $seq->desc,"\n"; }

The module states that if we provide -ids that:
       If you provide an array reference of IDs in -ids, the query will be
       ignored and the list of IDs will be used when the query is passed to a
       Bio::DB::GenBank object's get_Stream_by_query() method.

In the above case the query ('Lassa Virus[ORGN]') is actually passed,
not the IDs. Also, $query->query shows the original query. Am I doing
something wrong, or is the pod not reflecting the current behaviour of this
module?

I was also surprised that if the internet connection is down no warning is
thrown for $query->query or $query->count at all. Only the
get_Stream_by_query above will warn us if the site is unreachable (500
Internal Server Error).

$query->ids or $query->count will not throw a warning, and
@ids=$query->ids will just be an empty array. (I realize $query->count
is not initialized, so I am using this now to check for success, but a
warning from WebDBSeqI would be more appropriate, I think.)

Last, the example from the pod is not working, but no warnings are raised:
          # initialize the list yourself
          my $query =
Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);

$query->count returns zero w/o any warning. Of course this query did
not specify a DB. Only if we specify -db=>'nucleotide' is $query->count
3.
However, why is there no warning if we set -db=>'protein' or do not set it at all?

On the NCBI website, searching the Protein DB for 195052 returns:
      See Details. No items found.
      The following term(s) refer to a different DB:195052

But this is not reflected via Bio::DB::Query::GenBank.

Can I check for this situation in the code apart from checking on
$query->count == 0 ? Or would it indeed be better to check for these
situations in the module?

Regards,
Bernd


From chen_li3 at yahoo.com  Tue May 16 14:55:51 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 07:55:51 -0700 (PDT)
Subject: [Bioperl-l] module for 6 reading frames
Message-ID: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>

Hi all,

I wonder which module is available for translating DNA
sequence into 6 reading frames.

Thank you,

Li



From smarkel at scitegic.com  Tue May 16 15:10:35 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 08:10:35 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516145551.50370.qmail@web36802.mail.mud.yahoo.com>
Message-ID: <OF41BF3DF8.D7365B03-ON88257170.00534209-88257170.00535904@scitegic.com>

Li,

Use the translate() function in Bio::Tools::CodonTable.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


bioperl-l-bounces at lists.open-bio.org wrote on 16.05.2006 07:55:51:

> Hi all,
> 
> I wonder which module is available for translating DNA
> sequence into 6 reading frames.
> 
> Thank you,
> 
> Li


From golharam at umdnj.edu  Tue May 16 16:18:19 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:18:19 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>

I just updated my local copy of bioperl from cvs.  When I ran the
configure script, it says I need the external module
Bio::ASN1::EntrezGene.  Which package contains this module?

--
Ryan Golhar  -  golharam at umdnj.edu
The Informatics Institute of UMDNJ


From golharam at umdnj.edu  Tue May 16 16:24:03 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 16 May 2006 12:24:03 -0400
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
Message-ID: <002001c67905$2622a580$2f01a8c0@GOLHARMOBILE1>

Never mind.  I see it's on CPAN.



From cjfields at uiuc.edu  Tue May 16 17:27:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 12:27:32 -0500
Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
In-Reply-To: <001f01c67904$59b08ad0$2f01a8c0@GOLHARMOBILE1>
Message-ID: <002701c6790e$03d8f110$15327e82@pyrimidine>

It's actually not part of Bioperl currently; you can find it on CPAN:

http://search.cpan.org/~mingyiliu/Bio-ASN1-EntrezGene-1.091/lib/Bio/ASN1/EntrezGene.pm

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Tuesday, May 16, 2006 11:18 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Where is Bio::ASN1::EntrezGene?
> 
> I just updated my local copy of bioperl from cvs.  When I ran the
> configure script, it says I need the external module
> Bio::ASN1::EntrezGene.  Which package contains this module?
> 
> --
> Ryan Golhar  -  golharam at umdnj.edu
> The Informatics Institute of UMDNJ
> 


From ClarkeW at AGR.GC.CA  Tue May 16 20:57:13 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 16:57:13 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>


Thank you for the suggestions and comments. However, I think
I should clear a few things up. I am running bioperl v1.4, I am cycling
through the blast reports, which should not be of absurd size since they
only contain the top 5 hits, and I am using top to track (although I
realize fairly inaccurately) the memory usage. I have looked through the
code for both AAFCBLAST and BEAST_UPDATE but do not believe the
leak/problem is contained within them, since they almost
exclusively use method calls and those variables should be destroyed
upon leaving the scope of the method. I have used Devel::Size to check
the size of the variables $bdbi, $searchio and $connector, and on each
iteration these variables have the same size. Any other suggestions
would be greatly appreciated, as I have nearly gone insane trying to
track this problem down.

Thanks, Wayne 




From smarkel at scitegic.com  Tue May 16 20:52:05 2006
From: smarkel at scitegic.com (smarkel at scitegic.com)
Date: Tue, 16 May 2006 13:52:05 -0700
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <20060516200436.34908.qmail@web36812.mail.mud.yahoo.com>
Message-ID: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>

Li,

You can either do the substring, and reverse complement, yourself
or you can use the translate() function in Bio::PrimarySeq.  It
inherits from Bio::PrimarySeqI, so check there for the documentation.
That translate() function takes a "-frame" argument.
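
For the do-it-yourself route (substring plus reverse complement), a minimal
sketch in plain Perl follows.  It does not use BioPerl; the helper names
(revcom, translate_frame, six_frames) are made up here, the codon table is
the standard genetic code only, and frames are 0-based offsets as with the
"-frame" argument.  Labelling the reverse-strand frames -1, -2, -3 as
offsets 0, 1, 2 of the reverse complement is one common convention, not the
only one.

```perl
use strict;
use warnings;

# Standard genetic code: amino acids for all 64 codons, bases in TCAG order.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon2aa;
my $idx = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon2aa{"$b1$b2$b3"} = substr($aas, $idx++, 1);
        }
    }
}

# Reverse complement of a DNA string (unambiguous bases only).
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse uc $dna;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Translate $dna starting at offset $frame (0, 1 or 2); a trailing partial
# codon is ignored and any codon not in the table becomes 'X'.
sub translate_frame {
    my ($dna, $frame) = @_;
    my $protein = '';
    for (my $pos = $frame; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = uc substr($dna, $pos, 3);
        $protein .= ($codon2aa{$codon} // 'X');
    }
    return $protein;
}

# Return a hash ref keyed '+1'..'+3' and '-1'..'-3'.
sub six_frames {
    my ($dna) = @_;
    my $rc = revcom($dna);
    my %out;
    for my $frame (0 .. 2) {
        $out{ '+' . ($frame + 1) } = translate_frame($dna, $frame);
        $out{ '-' . ($frame + 1) } = translate_frame($rc,  $frame);
    }
    return \%out;
}

my $frames = six_frames('ATGGCCTAA');
print "$_\t$frames->{$_}\n" for sort keys %$frames;
```

The Bio::PrimarySeq route replaces translate_frame with translate() and
revcom with the object's own reverse-complement method, and also gives
access to non-standard codon tables.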

Scott

PS In future, please respond to the list.  That way others see
the questions and answers.

chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:

> Dear Dr. Markel,
> 
>     I browsed through the documentation of 
> Bio::Tools::CodonTable and found this line:
> 
> my $translation= $CodonTable->translate($seq);
> 
> I think this line does the translation. Here is my
> question: which line in the doc says how to translate
> the remaining frames 2, 3, and -1, -2, -3? 
> 
> 
> Thank you,
> 
> Li
> 


From cjfields at uiuc.edu  Tue May 16 21:15:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:15:10 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FB@onncrxms4.agr.gc.ca>
Message-ID: <000601c6792d$d0ab1500$15327e82@pyrimidine>

I mentioned two possibilities last time I posted: 1) that the BLAST file was
too large, or 2) that you are using an old version of bioperl in which SearchIO
is broken.  You seem to fit #2. 

The issue is that NCBI does not consider text BLAST output sacrosanct and
routinely makes changes to it that break parsing.  Due to this,
SearchIO::blast needs to be constantly updated, so much so that there are
normally a few updates a year to fix parsing issues in that module alone
compared to BioPerl as a whole.  And, BTW, although bioperl-1.4 is about 2
years old now, even bioperl-1.5.1 SearchIO is broken when it comes to the
latest NCBI BLAST (2.2.14 now).  I seriously suggest updating your local
bioperl distribution to the latest bioperl-live (from CVS).

Take one of those 10000 reports, just one, and try parsing it.  If you have
the same problem (a CPU spike and increasing memory usage) then it may be
fixed in CVS.

Chris



From ClarkeW at AGR.GC.CA  Tue May 16 21:24:51 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Tue, 16 May 2006 17:24:51 -0400
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
Message-ID: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>


Thanks Chris, 

I did forget to mention, however, that I did parse a single report and
found no problems; it finished fast and with no noticeable memory usage.
I will consider getting my SA to update bioperl from CVS as a precaution,
but he has already stated he prefers to wait for the release of v1.5.
Even a single job of 10000 will finish, but the problem is that I am
trying to loop through many jobs of 10000 and the memory use seems to be
additive for reasons I cannot determine. During testing I noticed that the RSS
reported by top decreased at around 80% MEM usage, but then the shared mem
increased. I am wondering if this is due to the perl garbage collector
freeing up memory but keeping it in its pool for reuse; if so, that is fine
as long as it does not then want to reach into swapped mem.

Thanks again, Wayne



From cjfields at uiuc.edu  Tue May 16 21:45:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 16:45:16 -0500
Subject: [Bioperl-l] Memory Leak in Bio::SearchIO
In-Reply-To: <320530F83FA47047823E57F110DDEAADB159FC@onncrxms4.agr.gc.ca>
Message-ID: <000801c67932$050dbd30$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne
> Sent: Tuesday, May 16, 2006 4:25 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Memory Leak in Bio::SearchIO
> 
> 
> Thanks Chris,
> 
> I did forget to mention however that I did parse one single report and
> found no problems, it finished fast and with no noticeable memory usage.
> I will consider getting my SA to update bioperl from CVS as a precaution
> but he has already stated he prefers to wait for the release of v1.5.

Um, you can tell him the last release was v.1.5.1 (last October).  It's
considered a developer release but is pretty stable; well, except for that
whole SearchIO quibble, and that's not our fault.

You could also install a local version in case he doesn't budge; see here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Chris

> Even a single job of 10000 will finish but the problem is that I am
> trying to loop through many jobs of 10000 and it seems to be additive
> for reasons I can not determine. During testing I noticed that the RSS
> on top decreased around 80% MEM usage, but then the shared mem
> increased. I am wondering if this is due to the perl garbage collector
> freeing up memory but keeping it in its pool for use; if so, that is fine
> as long as it does not then want to reach into swapped mem.
> 
> Thanks again, Wayne
> ...
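
One way to test the earlier suggestion in this thread (that something may be
keeping a reference to $o_blast around) without touching BioPerl at all is a
small core-Perl experiment. The sketch below is only an illustration:
Demo::Result and Demo::Pusher are hypothetical stand-ins for the result object
and for whatever class $bdbi belongs to, not the real code. It counts how many
result objects are still alive after a loop, once with a consumer that stores
each result and once with one that does not.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed result object; counts live instances.
package Demo::Result;
my $live = 0;
sub new        { my ($class, %args) = @_; $live++; return bless {%args}, $class }
sub DESTROY    { $live-- }
sub live_count { return $live }

# Hypothetical stand-in for the object that consumes results ($bdbi in the thread).
package Demo::Pusher;
sub new        { my ($class) = @_; return bless { seen => [] }, $class }
sub push_leaky { my ($self, $r) = @_; push @{ $self->{seen} }, $r; return }  # keeps a reference
sub push_clean { my ($self, $r) = @_; return }                                # keeps nothing

package main;

# Run $n iterations and report how many results are still alive
# while the consumer object is still in scope.
sub live_after_loop {
    my ($method, $n) = @_;
    my $pusher = Demo::Pusher->new;
    for my $i (1 .. $n) {
        my $result = Demo::Result->new(id => $i);
        $pusher->$method($result);
    }   # $result goes out of scope at the end of every pass
    return Demo::Result->live_count;
}

printf "leaky consumer: %d results still alive\n", live_after_loop('push_leaky', 1000);
printf "clean consumer: %d results still alive\n", live_after_loop('push_clean', 1000);
```

If the real consumer behaves like push_leaky, memory grows with every
iteration even though each loop variable is a lexical. Adding a similar
DESTROY counter to the real classes would be one way to find out.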


From cjfields at uiuc.edu  Tue May 16 22:20:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 16 May 2006 17:20:29 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
In-Reply-To: <716af09c0605160437tfcf824dxa514f38f6b94d423@mail.gmail.com>
Message-ID: <000901c67936$f0896990$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Tuesday, May 16, 2006 6:38 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::Query::GenBank checks
> 
> Hi all,
> 
> I was using Bio::DB::Query::GenBank to obtain only IDs from Entrez and
> found some issues and differences (bugs?) in behaviour wrt the pod.
> Do these look familiar ?
> 
> Some example code:
> my $query = Bio::DB::Query::GenBank->new
>        (-query   =>'Lassa Virus[ORGN]',
>         -reldate => '30',
>         -db      => 'protein',
>         -ids => [195052,2981014,11127914],
>         -maxids => 30 );
> 
> my $gb = Bio::DB::GenBank->new(-format=>'fasta');
> my $seqio = $gb->get_Stream_by_query($query);
> while (my $seq = $seqio->next_seq) {
>        print $seq->desc,"\n"; }
> 
> The module states that if we provide -ids that:
>        If you provide an array reference of IDs in -ids, the query will be
>        ignored and the list of IDs will be used when the query is passed
>        to a Bio::DB::GenBank object's get_Stream_by_query() method.
> 
> In the above case actually the query is passed ('Lassa Virus[ORGN]'),
> not the IDs. Also $query->query shows the original query. Am I doing
> something wrong or is the pod not reflecting current behaviour of this
> module?
> 
> I was also surprised that if internet is down no warning is thrown for
> $query->query or $query->count at all. Only the get_Stream_by_query
> above will warn us if the site is unreachable (500 Internal Server
> Error).

I believe this has to do with the difference in the objects and the way they
retrieve request data. Bio::DB::GenBank and Bio::DB::Query::GenBank use
different methods to retrieve IDs; Bio::DB::GenBank's get_Stream_by_query
method just makes it a bit easier to retrieve a list of UIDs directly
instead of saving them as an array and then reposting them using
get_Stream_by_id.  Not foolproof, but it works okay.

> $query->ids or $query->count will not throw a warning and
> @ids=$query->ids will just be an empty array. (I realize $query->count
> is not initialized, so I am using this now to check for success, but a
> warning from WebDBSeqI would be more appropriate I think).

WebDBSeqI would be the place to make general warnings (it's supposed to be an
interface for any web seq DB), but not eutils-specific warnings.

> Last, the example from the pod is not working, but no warnings are raised:
>           # initialize the list yourself
>           my $query =
> Bio::DB::Query::GenBank->new(-ids=>[195052,2981014,11127914]);
> 
> $query->count returns zero w/o any warning. Of course this query did
> not specify a DB. Only if we specify -db=>'nucleotide' $query->count
> is 3.
> However, why is there no warning if we set -db=>'protein' or if we did
> not set this?
>
>
> On the NCBI website searching Protein DB returns for 195052:
>       See Details. No items found.
>       The following term(s) refer to a different DB:195052
> 
> But this is not reflected via Bio::DB::Query::GenBank.
> 
> Can I check for this situation in the code apart from checking on
> $query->count == 0 ? Or would it indeed be better to check for these
> situations in the module?
> 
> Regards,
> Bernd

I can probably play around with adding a few things in tomorrow and clean up
the POD somewhat.  I'm planning a rewrite for EUtilities-based searches but
that's a ways off still...  Can't promise much; I'm pretty busy til next
week.

Chris
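
Until the module warns by itself, a caller-side check along the lines Bernd
describes could look like the sketch below. It uses only count(), which
Bio::DB::Query::GenBank provides. The behaviour it relies on (count stays
undefined when the request did not go through) is what was reported in this
thread and is taken here as an assumption, not something verified against the
module source. Stub::Query is a hypothetical class so the example runs
offline; in real use a query object would be passed in instead.

```perl
use strict;
use warnings;

# Classify the outcome of a query object that offers count().
# $expected is the number of IDs passed in via -ids, or undef for a
# free-text query.
sub query_status {
    my ($query, $expected) = @_;
    my $count = eval { $query->count };
    return 'failed'  if $@ || !defined $count;   # exception, or no reply (assumed: count stays undef)
    return 'empty'   if $count == 0;
    return 'partial' if defined $expected && $count < $expected;
    return 'ok';
}

# Hypothetical stub standing in for a real query object.
package Stub::Query;
sub new   { my ($class, $count) = @_; return bless { count => $count }, $class }
sub count { return $_[0]{count} }

package main;

for my $case ([undef, 3], [0, 3], [2, 3], [3, 3]) {
    my ($count, $expected) = @$case;
    printf "count=%-5s expected=%d -> %s\n",
        (defined $count ? $count : 'undef'), $expected,
        query_status(Stub::Query->new($count), $expected);
}
```

This cannot tell you which database the IDs actually belong to; it only
separates "nothing came back" from "fewer came back than I asked for".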


From chen_li3 at yahoo.com  Wed May 17 00:53:17 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 16 May 2006 17:53:17 -0700 (PDT)
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <20060517005317.3976.qmail@web36815.mail.mud.yahoo.com>

Hi all,

Thank you very much for the help.

I have some DNA sequences printed on the screen. But
the default output is longer than I expect.  I need 50
nucleotides/line. I searched CPAN but could not find the
right module.  Which bioperl module can do this job?

Li



From kmdaily at indiana.edu  Tue May 16 13:57:52 2006
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Tue, 16 May 2006 09:57:52 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
Message-ID: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>

OK, got that installed. But I still get an error:

Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.

I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".

Kenny Daily
IU School of Informatics
kmdaily at indiana.edu


-----Original Message-----
From: Christopher Fields [mailto:cjfields at uiuc.edu]
Sent: Tue 5/16/2006 7:38 AM
To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
 
You'll have to install from CVS.  I believe Brian added Entrezgene.pm after the last
developer release  (1.5.1):

http://www.bioperl.org/wiki/Installing_BioPerl

Chris

---- Original message ----
>Date: Mon, 15 May 2006 17:00:12 -0400
>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>To: <bioperl-l at lists.open-bio.org>
>
>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in 
Bio/SeqIO). How can I get this module?
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>


From skirov at utk.edu  Wed May 17 11:48:29 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Wed, 17 May 2006 07:48:29 -0400
Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
In-Reply-To: <20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
References: <36b52ba4.b94c5b79.8198c00@expms6.cites.uiuc.edu>
	<20528E699A515C499B80C222BDBEBC34043FFB@iu-mssg-mbx108.ads.iu.edu>
Message-ID: <446B0D8D.40901@utk.edu>

You are using an old Bio::Annotation::DBLink module. Did you download
only entrezgene.pm or the whole of bioperl? If the latter, what do the
tests tell you?
Stefan
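
A quick way to see which copy of a module Perl actually picks up is to load
it and look at %INC. The sketch below is plain core Perl; the helper name
module_report is made up for this example. It reports the file a module was
loaded from and whether it provides a given method (here the url() method
from the error message).

```perl
use strict;
use warnings;

# Load $module and report which file it came from and whether it
# provides $method. Returns a hash reference.
sub module_report {
    my ($module, $method) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;      # Bio::Foo -> Bio/Foo.pm
    my $loaded = eval { require $file; 1 };
    return { loaded => 0, path => undef, has_method => 0 } unless $loaded;
    return {
        loaded     => 1,
        path       => $INC{$file},               # the file that was really used
        has_method => $module->can($method) ? 1 : 0,
    };
}

my $report = module_report('Bio::Annotation::DBLink', 'url');
if ($report->{loaded}) {
    print "loaded from: $report->{path}\n";
    print "url() is ", ($report->{has_method} ? "available" : "missing"), "\n";
}
else {
    print "Bio::Annotation::DBLink not found in \@INC\n";
}
```

If the path printed is the admin's system-wide install rather than the CVS
checkout, the PERL5LIB setting is not taking effect for that script.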
 
Daily, Kenneth Michael wrote:

>OK, got that installed. But I still get an error:
>
>Can't locate object method "url" via package "Bio::Annotation::DBLink" at /home/kmdaily/src/bioperl/core/Bio/SeqIO/entrezgene.pm line 557.
>
>I am using this on a shared system, and an older version of Bioperl was installed by the admin. But the path to the one I downloaded via CVS is first in the list @INC, and PERL5LIB="/home/kmdaily/src/bioperl/core".
>
>Kenny Daily
>IU School of Informatics
>kmdaily at indiana.edu
>
>
>
>-----Original Message-----
>From: Christopher Fields [mailto:cjfields at uiuc.edu]
>Sent: Tue 5/16/2006 7:38 AM
>To: Daily, Kenneth Michael; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] entrezgene.pm not in Bio::SeqIO
> 
>You'll have to install from CVS.  I believe Brian added Entrezgene.pm after the last
>developer release  (1.5.1):
>
>http://www.bioperl.org/wiki/Installing_BioPerl
>
>Chris
>
>---- Original message ----
>  
>
>>Date: Mon, 15 May 2006 17:00:12 -0400
>>From: "Daily, Kenneth Michael" <kmdaily at indiana.edu>  
>>Subject: [Bioperl-l] entrezgene.pm not in Bio::SeqIO  
>>To: <bioperl-l at lists.open-bio.org>
>>
>>I just installed Bioperl 1.4, and entrezgene.pm is not included (should be in 
>>    
>>
>Bio/SeqIO). How can I get this module?
>  
>
>>Kenny Daily
>>IU School of Informatics
>>kmdaily at indiana.edu
>>
>>


From osborne1 at optonline.net  Wed May 17 00:46:00 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 16 May 2006 20:46:00 -0400
Subject: [Bioperl-l] module for 6 reading frames
In-Reply-To: <OFE1576D7B.C032BA7B-ON88257170.00721261-88257170.00729CCD@scitegic.com>
Message-ID: <C08FEA88.877A%osborne1@optonline.net>

Chen Li,

There's some documentation on translate() in bptutorial:

http://bioperl.org/Core/Latest/bptutorial.html

You could also use the translate_6frames() method of Bio::SeqUtils.
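
For anyone who wants to see what the "do the substring and reverse complement
yourself" route amounts to, here is a minimal core-Perl sketch. It is not how
Bio::SeqUtils or translate() are implemented; it assumes the standard genetic
code, plain ACGT input, and frame labels (+1..+3, -1..-3) chosen just for this
example. Incomplete trailing codons are dropped and unknown codons become X.

```perl
use strict;
use warnings;

# Standard genetic code, built from the usual TCAG-ordered amino acid string.
my %CODON;
{
    my @bases = qw(T C A G);
    my @aa    = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    for my $x (@bases) { for my $y (@bases) { for my $z (@bases) {
        $CODON{"$x$y$z"} = shift @aa;
    } } }
}

# Reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse uc $dna;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Translate $dna starting at 0-based $offset.
sub translate_frame {
    my ($dna, $offset) = @_;
    my $protein = '';
    for (my $i = $offset; $i + 3 <= length($dna); $i += 3) {
        my $aa = $CODON{ substr($dna, $i, 3) };
        $protein .= defined $aa ? $aa : 'X';
    }
    return $protein;
}

# Return a hash of frame label => translation for all six frames.
sub six_frames {
    my ($dna) = @_;
    $dna = uc $dna;
    my $rc = revcom($dna);
    my %frames;
    for my $f (0 .. 2) {
        $frames{ '+' . ($f + 1) } = translate_frame($dna, $f);
        $frames{ '-' . ($f + 1) } = translate_frame($rc,  $f);
    }
    return %frames;
}

my %frames = six_frames('ATGGCCTAA');
print "$_\t$frames{$_}\n" for sort keys %frames;
```

For real work the BioPerl methods mentioned in this thread are the better
choice, since they handle alternative codon tables and ambiguity codes.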


Brian O.


On 5/16/06 4:52 PM, "smarkel at scitegic.com" <smarkel at scitegic.com> wrote:

> Li,
> 
> You can either do the substring, and reverse complement, yourself
> or you can use the translate() function in Bio::PrimarySeq.  It
> inherits from Bio::PrimarySeqI, so check there for the documentation.
> That translate() function takes a "-frame" argument.
> 
> Scott
> 
> PS In future, please respond to the list.  That way others see
> the questions and answers.
> 
> chen li <chen_li3 at yahoo.com> wrote on 16.05.2006 13:04:36:
> 
>> Dear Dr. Markel,
>> 
>>     I browse through the document of
>> Bio:Tools::Codontable and find this line:
>> 
>> my $translation= $CodonTable->translate($seq);
>> 
>> I think this line is to do the translation. Here is my
>> question: which line in the doc says how to translate
>> the remaining frames 2,3, and -1, -2, -3?
>> 
>> 
>> Thank you,
>> 
>> Li
>> 
>> --- smarkel at scitegic.com wrote:
>> 
>>> Li,
>>> 
>>> Use the translate() function in
>>> Bio::Tools::CodonTable.
>>> 
>>> Scott
>>> 
>>> Scott Markel, Ph.D.
>>> Principal Bioinformatics Architect  email:
>>> smarkel at scitegic.com
>>> SciTegic Inc.                       mobile: +1 858
>>> 205 3653
>>> 10188 Telesis Court, Suite 100      voice:  +1 858
>>> 799 5603
>>> San Diego, CA 92121                 fax:    +1 858
>>> 279 8804
>>> USA                                 web:
>>> http://www.scitegic.com
>>> 
>>> 
>>> bioperl-l-bounces at lists.open-bio.org wrote on
>>> 16.05.2006 07:55:51:
>>> 
>>>> Hi all,
>>>> 
>>>> I wonder which module is available for translating
>>> DNA
>>>> sequence into 6 reading frames.
>>>> 
>>>> Thank you,
>>>> 
>>>> Li
>>> 
>> 
>> 
> 


From e-just at northwestern.edu  Wed May 17 15:03:41 2006
From: e-just at northwestern.edu (Eric Just)
Date: Wed, 17 May 2006 10:03:41 -0500
Subject: [Bioperl-l] Modware: a BioPerl based API for Chado
Message-ID: <6.1.1.1.2.20060517095821.13353920@hecky.it.northwestern.edu>

Hi Everyone,

We are announcing a new Sourceforge Project called Modware that may be of 
interest to you.   It is an object-oriented API written in Perl that 
creates BioPerl object representations of biological features stored in a 
Chado database. It basically creates a Bio::Seq object for chromosomes in 
Chado and creates Bio::SeqFeature::Gene objects for protein coding 
transcripts stored in Chado.  Things like contigs are represented as 
Bio::SeqFeature::Generic objects.  We also provide many methods for 
manipulating these objects once they are in memory.

For download please visit our Sourceforge project page:
http://sourceforge.net/projects/gmod-ware

For API documentation and some short examples of selected use cases visit 
our project home page:
http://gmod-ware.sourceforge.net/

This software is adapted from the production middleware code that dictyBase
uses.  Modware 0.1 requires that the latest stable GMOD release (0.003) be
installed.  We are currently calling it a release candidate; once we get
some feedback, and if there are no major install bugs, we will call it an
official release (we've installed it on only two different machines).  If you
would like a version that works on the latest CVS version of GMOD, let me
know and I'll expedite getting that out the door.

Lastly, please use the direct download version, we have not fully recovered 
from the recent Sourceforge CVS issues.

Please try the software out and let us know what you think!


Sincerely,
Eric Just and Sohel Merchant

e-just at northwestern.edu
s-merchant at northwestern.edu


============================================

Eric Just
e-just at northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 


From sb at mrc-dunn.cam.ac.uk  Wed May 17 17:46:45 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 17 May 2006 18:46:45 +0100
Subject: [Bioperl-l] Bio::Map:: enhancements
Message-ID: <446B6185.1000602@mrc-dunn.cam.ac.uk>

I added bug http://bugzilla.bioperl.org/show_bug.cgi?id=1998

I'm interested in what people have to say about the secondary 
enhancement I talk about there. Is it a sane thing to do? What are the 
better ways of doing that?
If it /is/ ok, I suppose I'd have to go back and alter 
Bio::Map::MappableI and Bio::Map::MarkerI as well, not just Marker.


Oh, on a side note, you'll see I had to override RangeI's intersection 
method to work on multiple ranges. Why is RangeI limited to an 
intersection of only two ranges?
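
For reference, the arithmetic for intersecting any number of ranges is small:
take the largest start and the smallest end. The sketch below uses plain
[start, end] pairs with inclusive coordinates and ignores strand; it is an
illustration of the idea, not the Bio::RangeI interface and not the code
attached to the bug.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Intersect any number of [start, end] pairs (inclusive coordinates).
# Returns a [start, end] pair, or undef if the ranges share no position
# (or if no ranges were given).
sub intersect_ranges {
    my @ranges = @_;
    return undef unless @ranges;
    my $start = max map { $_->[0] } @ranges;
    my $end   = min map { $_->[1] } @ranges;
    return $start <= $end ? [ $start, $end ] : undef;
}

my $common = intersect_ranges([1, 10], [5, 20], [7, 8]);
print defined $common ? "common region: $common->[0]..$common->[1]\n"
                      : "no common region\n";
```

The same result can be had by folding a two-range intersection over the list,
which may be the easier route if strand rules need to be respected.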

Cheers,
Sendu.


From David_Waner/San_Diego/Accelrys at scitegic.com  Thu May 18 19:30:46 2006
From: David_Waner/San_Diego/Accelrys at scitegic.com (David_Waner/San_Diego/Accelrys at scitegic.com)
Date: Thu, 18 May 2006 12:30:46 -0700
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
	Windows
Message-ID: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>

BioPerl Users/Developers,

In our testing we have found severe performance problems using BioPerl 
with Perl 5.8 on Windows (but not on Linux). They show up especially in 
SeqIO when reading or writing Fasta files containing large (~16 MB) 
sequences.  The same files that can be read in 1 or 2 seconds with Windows 
Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.

Although the fault is clearly with Perl, not with BioPerl, I have 
identified a couple of places where BioPerl could be modified in order to 
save Windows Perl 5.8 users a lot of time, while not affecting other 
users. 

For example, in my testing the following excerpt from 
Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a 
16 MB sequence):

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015?\012/\n/g;
        $line =~ s/\015/\n/g unless $ONMAC;
    }
 
whereas the following replacement code should be equivalent: 

    if( (!$param{-raw}) && (defined $line) ) {
        $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
        $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
    }
 
but executes in less than 1 second.
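
A small self-check of the "should be equivalent" claim, as a sketch only. It
wraps both versions in subs (the names normalize_regex and normalize_tr are
made up here) and compares them on a few hand-picked line endings. $ONMAC is
hard-coded to false, which assumes a platform where "\n" is "\012". On classic
Mac OS, if I read it correctly, the two versions would treat a bare "\012"
differently, so that case is deliberately not covered.

```perl
use strict;
use warnings;

my $ONMAC = 0;   # assumption: not classic Mac OS, so "\n" is "\012"

# Original code from Bio::Root::IO::_readline(), as quoted above.
sub normalize_regex {
    my ($line) = @_;
    $line =~ s/\015?\012/\n/g;
    $line =~ s/\015/\n/g unless $ONMAC;
    return $line;
}

# Proposed replacement.
sub normalize_tr {
    my ($line) = @_;
    $line =~ s/\015\012/\012/g;           # CR/LF pairs to LF
    $line =~ tr/\015/\n/ unless $ONMAC;   # remaining single CRs to newline
    return $line;
}

my @samples = (
    "ACGT\015\012",            # DOS line ending
    "ACGT\012",                # Unix line ending
    "ACGT\015",                # lone CR
    "A\015\012C\015G\012T",    # mixed endings inside one string
    "\015\015\012",            # CR followed by CR/LF
    "",                        # empty string
);

my $mismatches = grep { normalize_regex($_) ne normalize_tr($_) } @samples;
printf "%d of %d samples differ\n", $mismatches, scalar @samples;
```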

In addition, changing:

    defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
 
to:

    defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
 
in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
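
To compare the two whitespace-removal forms on your own platform, something
like the following could be used. It is a sketch with made-up test data
(about 1 MB of sequence in 60-column CR/LF lines) and uses the core Benchmark
module. One caveat: \s also matches form feed (and vertical tab on newer
perls), which the tr/// list does not, so the two forms only agree on input
that contains just spaces, tabs, CR and LF.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub strip_regex { my ($s) = @_; $s =~ s/\s//g;       return $s }
sub strip_tr    { my ($s) = @_; $s =~ tr/ \t\n\r//d; return $s }

# Roughly 1 MB of sequence in 60-column lines with CR/LF endings (made-up data).
my $text = join '', map { ('ACGT' x 15) . "\r\n" } 1 .. 16_000;

# Make sure both versions agree on this input before timing them.
die "results differ\n" unless strip_regex($text) eq strip_tr($text);

# Run each version for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { my $out = strip_regex($text) },
    tr    => sub { my $out = strip_tr($text) },
});
```

The numbers will depend heavily on the Perl build, which is the point of the
report above; on a build without the problem the gap may be much smaller.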

There are also problems in reading files with the <> operator when $/ is 
redefined to "\n>", where reading the first line of Fasta files containing 
large sequences takes ~50 seconds, but reading subsequent lines or files 
takes about 1 second. I don't have a work-around for this.
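
For readers unfamiliar with the idiom, this is roughly what reading Fasta
with $/ set to "\n>" looks like in plain Perl. The sketch only shows the
reading pattern; it does not reproduce or work around the slowdown, and the
helper name read_fasta_records is made up for this example. It reads from an
in-memory filehandle so it can be run as is.

```perl
use strict;
use warnings;

# Read Fasta records from a filehandle, one record per readline call.
# Returns a list of [header, sequence] pairs.
sub read_fasta_records {
    my ($fh) = @_;
    local $/ = "\n>";                       # a record ends where the next '>' line starts
    my @records;
    while (defined(my $chunk = <$fh>)) {
        chomp $chunk;                       # removes the trailing "\n>" if present
        $chunk =~ s/^>//;                   # the first record still carries its '>'
        my ($header, @lines) = split /\r?\n/, $chunk;
        next unless defined $header && length $header;
        push @records, [ $header, join('', @lines) ];
    }
    return @records;
}

my $fasta = ">seq1 first\nACGT\nAC\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
for my $record (read_fasta_records($fh)) {
    printf "%s: %d residues\n", $record->[0], length($record->[1]);
}
close $fh;
```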

I would like to ask the mailing list:

1. Has anyone else run into this problem? Any fixes?
2. Do you think BioPerl should incorporate these changes? 

I plan to submit a bug report to perlbug, but don't know when or if the 
problem will be fixed. 

- David


From cjfields at uiuc.edu  Thu May 18 20:07:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 18 May 2006 15:07:14 -0500
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
	onWindows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <002901c67ab6$a84c3140$15327e82@pyrimidine>

David,

I have seen some slowdowns with Bio::SeqIO associated with GenBank files,
which this could be related to.  I can't do anything about it (test or
commit changes) until next week but someone else using Windows might (though
we are few and far between, and I'm switching to Mac OS X in fall).  Would
be nice to try the changes and test it out on a few platforms.  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of
> David_Waner/San_Diego/Accelrys at scitegic.com
> Sent: Thursday, May 18, 2006 2:31 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8
> onWindows
> 
> BioPerl Users/Developers,
> 
> In our testing we have found severe performance problems using BioPerl
> with Perl 5.8 on Windows (but not on Linux). They show up especially in
> SeqIO when reading or writing Fasta files containing large (~16 MB)
> sequences.  The same files that can be read in 1 or 2 seconds with Windows
> Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.
> 
> Although the fault is clearly with Perl, not with BioPerl, I have
> identified a couple of places where BioPerl could be modified in order to
> save Windows Perl 5.8 users a lot of time, while not affecting other
> users.
> 
> For example, in my testing the following excerpt from
> Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a
> 16 MB sequence):
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015?\012/\n/g;
>         $line =~ s/\015/\n/g unless $ONMAC;
>     }
> 
> whereas the following replacement code should be equivalent:
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
>         $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
>     }
> 
> but executes in less than 1 second.
> 
> In addition, changing:
> 
>     defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
> 
> to:
> 
>     defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
> 
> in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
> 
> There are also problems in reading files with the <> operator when $/ is
> redefined to "\n>", where reading the first line of Fasta files containing
> large sequences takes ~50 seconds, but reading subsequent lines or files
> takes about 1 second. I don't have a work-around for this.
> 
> I would like to ask the mailing list:
> 
> 1. Has anyone else run into this problem? Any fixes?
> 2. Do you think BioPerl should incorporate these changes?
> 
> I plan to submit a bug report to perlbug, but don't know when or if the
> problem will be fixed.
> 
> - David
> 


From osborne1 at optonline.net  Thu May 18 20:27:57 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 16:27:57 -0400
Subject: [Bioperl-l] Performance problems with BioPerl and Perl 5.8 on
 Windows
In-Reply-To: <OF674B7DDA.992193E9-ON88257172.006A8C8A-88257172.006B3400@scitegic.com>
Message-ID: <C092510D.87EB%osborne1@optonline.net>

David,

What are the results from the relevant t/*t files before and after these
patches?

Brian O.


On 5/18/06 3:30 PM, "David_Waner/San_Diego/Accelrys at scitegic.com"
<David_Waner/San_Diego/Accelrys at scitegic.com> wrote:

> BioPerl Users/Developers,
> 
> In our testing we have found severe performance problems using BioPerl
> with Perl 5.8 on Windows (but not on Linux). They show up especially in
> SeqIO when reading or writing Fasta files containing large (~16 MB)
> sequences.  The same files that can be read in 1 or 2 seconds with Windows
> Perl 5.6 or Linux Perl 5.8, take minutes in Windows Perl 5.8.
> 
> Although the fault is clearly with Perl, not with BioPerl, I have
> identified a couple of places where BioPerl could be modified in order to
> save Windows Perl 5.8 users a lot of time, while not affecting other
> users. 
> 
> For example, in my testing the following excerpt from
> Bio::Root::IO::_readline() takes 50 seconds (!) to execute (when reading a
> 16 MB sequence):
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015?\012/\n/g;
>         $line =~ s/\015/\n/g unless $ONMAC;
>     }
>  
> whereas the following replacement code should be equivalent:
> 
>     if( (!$param{-raw}) && (defined $line) ) {
>         $line =~ s/\015\012/\012/g;           # Change all CR/LF pairs to LF
>         $line =~ tr/\015/\n/ unless $ONMAC;   # Change all single CRs to NEWLINE
>     }
>  
> but executes in less than 1 second.
> 
> In addition, changing:
> 
>     defined $sequence && $sequence =~ s/\s//g;        # Remove whitespace
>  
> to:
> 
>     defined $sequence && $sequence =~ tr/ \t\n\r//d;  # Remove whitespace
>  
> in Bio::SeqIO::fasta.pm saves an additional ~20 seconds.
> 
> There are also problems in reading files with the <> operator when $/ is
> redefined to "\n>", where reading the first line of Fasta files containing
> large sequences takes ~50 seconds, but reading subsequent lines or files
> takes about 1 second. I don't have a work-around for this.
> 
> I would like to ask the mailing list:
> 
> 1. Has anyone else run into this problem? Any fixes?
> 2. Do you think BioPerl should incorporate these changes?
> 
> I plan to submit a bug report to perlbug, but don't know when or if the
> problem will be fixed.
> 
> - David
> 


From hubert.prielinger at gmx.at  Thu May 18 20:41:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 18 May 2006 14:41:27 -0600
Subject: [Bioperl-l] parsing xml output
Message-ID: <446CDBF7.10908@gmx.at>

hi,
what is the best way to parse NCBI and WU-BLAST XML output? Is it possible
to parse both with the same parser, or does their XML output differ?

thanks


From staffa at niehs.nih.gov  Thu May 18 20:49:15 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Thu, 18 May 2006 16:49:15 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>

Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
Namely the six D.melanogaster sequences.  
Specifically to find gene entries and learn the gene name, begin and end and CDS.
Please point me to appropriate modules and documentation.


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From adamnkraut at gmail.com  Thu May 18 21:07:42 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Thu, 18 May 2006 17:07:42 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline C?
Message-ID: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>

I am currently using a pairwise alignment algorithm written in C (not by
me).  The program consists of a library of routines, structures, and
definitions which I do not want to spend a lot of time abstracting.  I
already have a hack method of writing the parameters and inputs I want from
perl, calling the c program with system( ), and then parsing the output in
Perl.  Any good programmer would probably smack me but I'm just an undergrad
and I needed to show my boss that this works in order to spend more time on
it.

So on to my question, what is the preferred method of extending Bioperl to
use this algorithm?  I have just read the XS tutorial and a bit about Inline
C.  Can I put the main function in my script using Inline, and then just
point Inline at the rest of the C library?  The program has several
C-structures that are semantically equivalent to Bioperl objects, so I just
need somewhere to start.  I will spend some more time so that I have a more
specific question; I just wanted a little feedback, as this is my first post
to the bioperl list.

Thanks,
Adam


From osborne1 at optonline.net  Thu May 18 21:54:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 18 May 2006 17:54:01 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <C0926539.87F5%osborne1@optonline.net>

Nick,

Have you read the Feature-Annotation HOWTO? This would be a good starting
point...

Brian O.
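
As a starting point alongside the HOWTO, here is a minimal sketch of pulling
gene and CDS coordinates out of a GenBank flat file with Bio::SeqIO. The
method calls are standard feature methods; the output layout, the file name
and the '(unnamed)' placeholder are just choices for this example. Bio::SeqIO
is loaded inside the sub so that the usage message still works on a machine
without BioPerl.

```perl
use strict;
use warnings;

# Print accession, feature type, gene name, start, end and strand for
# every gene and CDS feature in a GenBank flat file.
sub print_genes {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded at run time; needs BioPerl installed
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            my $type = $feat->primary_tag;
            next unless $type eq 'gene' || $type eq 'CDS';
            # /gene may be absent on some features, so check before reading it
            my ($name) = $feat->has_tag('gene')
                       ? $feat->get_tag_values('gene')
                       : ('(unnamed)');
            print join("\t", $seq->accession_number, $type, $name,
                       $feat->start, $feat->end, $feat->strand), "\n";
        }
    }
    return;
}

my $file = shift @ARGV;
if (defined $file) { print_genes($file) }
else               { print "usage: $0 file.gbk\n" }
```

For spliced genes, start and end give the outer bounds of the feature; the
individual exons are available through the feature's location object, which
the HOWTO covers.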


On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Would like a fairly simple way to extract certain information from Genbank
> Genomic File Annotations.
> Namely the six D.melanogaster sequences.
> Specifically to find gene entries and learn the gene name, begin and end and
> CDS.
> Please point me to appropriate modules and documentation.
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 


From jason.stajich at duke.edu  Thu May 18 22:22:32 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 18 May 2006 18:22:32 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>

we don't parse WU-BLAST XML at this time.  We'd welcome someone  
contributing this.

ncbi XML is parsed with blastxml format.

-jason
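
A minimal sketch of what parsing with the blastxml format looks like, for
anyone who has not used Bio::SearchIO before. The file name and the choice of
columns are only for illustration, and as far as I know this format needs
XML::SAX installed in addition to BioPerl. Bio::SearchIO is loaded inside the
sub so that the usage message works without it.

```perl
use strict;
use warnings;

# Print query name, hit name, E-value and percent identity of the first
# HSP for up to $max_hits hits per query in a BLAST XML report.
sub summarize_blastxml {
    my ($file, $max_hits) = @_;
    require Bio::SearchIO;    # loaded at run time; needs BioPerl installed
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    while (my $result = $in->next_result) {
        my $seen = 0;
        while (my $hit = $result->next_hit) {
            last if ++$seen > $max_hits;
            my $hsp = $hit->next_hsp or next;    # skip hits without HSPs
            print join("\t", $result->query_name, $hit->name,
                       $hsp->evalue, $hsp->percent_identity), "\n";
        }
    }
    return;
}

my $report = shift @ARGV;
if (defined $report) { summarize_blastxml($report, 5) }
else                 { print "usage: $0 report.xml\n" }
```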
On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:

> hi,
> what is the best way to parse NCBI- and WU- Blast XML output....
> and is it possible to parse both with the same parser, or differ their
> XML output...
>
> thanks

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From MEC at stowers-institute.org  Thu May 18 22:39:15 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 18 May 2006 17:39:15 -0500
Subject: [Bioperl-l] module for formating sequence output on the screen
Message-ID: <CED81D34E37D5043A1211565277A51E50563F496@exchkc02.stowers-institute.org>

Li,

Here's a one-liner that uses bioperl's Bio::SeqIO module to reformat
fasta on standard input to 50 char wide fasta on standard output.

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' 

You can call it like this, redirecting the input file to standard input:

perl -MBio::SeqIO -e 'select Bio::SeqIO->newFh(-format => "fasta",
-width => 50);  $in = Bio::SeqIO->newFh(-format => "fasta", -fh =>
\*STDIN); print while <$in>' < inputfile.fasta > outputfile.fasta

Does this help?
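
If the sequence is already sitting in a plain Perl string and you only need
to print it 50 per line, a few lines of core Perl will also do. This is a
sketch, not a BioPerl method; the name wrap_sequence is made up here.

```perl
use strict;
use warnings;

# Break a sequence string into lines of at most $width characters
# (default 50). Existing whitespace is removed first.
sub wrap_sequence {
    my ($seq, $width) = @_;
    $width ||= 50;
    $seq =~ tr/ \t\r\n//d;
    my @lines = $seq =~ /(.{1,$width})/g;
    return join "\n", @lines;
}

print wrap_sequence('ACGT' x 30, 50), "\n";
```

The SeqIO route above is still the better choice when the input is a file,
since it keeps the description lines and handles multiple records.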

--Malcolm Cook


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
>Sent: Tuesday, May 16, 2006 7:53 PM
>To: bioperl-l at bioperl.org
>Subject: [Bioperl-l] module for formating sequence output on the screen
>
>Hi all,
>
>Thank you very much for the help.
>
>I have some DNA sequences printed on the screen. But
>the default output is longer than I expect.  I need 50
>necleotides/line. I search CPAN but can not get the
>right module.  Which bioperl module can do this job?
>
>Li
>


From gish at watson.wustl.edu  Thu May 18 23:57:03 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Thu, 18 May 2006 18:57:03 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <EA7E8F20-2531-45B2-83CD-FDA216A18615@duke.edu>
Message-ID: <009f01c67ad6$c359a560$0d00a8c0@PM>

Just to clarify, the XML output from WU-BLAST conforms to the standard
NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
incompatible, but care was taken to ensure compatibility.  If someone
identifies a difference that prevents parsing or proper interpretation of
the WU-BLAST output, please let me know.
Regards,
--Warren 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Thursday, May 18, 2006 5:23 PM
> To: Hubert Prielinger
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] parsing xml output
> 
> we don't parse WU-BLAST XML at this time.  We'd welcome someone  
> contributing this.
> 
> ncbi XML is parsed with blastxml format.
> 
> -jason
> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
> 
> > hi,
> > what is the best way to parse NCBI- and WU- Blast XML output....
> > and is it possible to parse both with the same parser, or differ their
> > XML output...
> >
> > thanks


From cjfields at uiuc.edu  Fri May 19 01:10:50 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Thu, 18 May 2006 20:10:50 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <ca6b1c14.ba9e5f4f.81c0100@expms6.cites.uiuc.edu>

Just to make sure everybody knows, if you use bioperl v1.5.1, 
SearchIO::blastxml uses XML::Parser which should come with most recent perl 
distributions.   The bioperl-live version has switched over to XML::SAX for SAX2 
parsing and it is recommended that you install XML::SAX::ExpatXS as well for 
faster parsing. 

Chris

---- Original message ----
>Date: Thu, 18 May 2006 18:57:03 -0500
>From: "Warren Gish" <gish at watson.wustl.edu>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: "'Hubert Prielinger'" <hubert.prielinger at gmx.at>
>Cc: bioperl-l at bioperl.org
>
>Just to clarify, the XML output from WU-BLAST conforms to the standard
>NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
>incompatible, but care was taken to ensure compatibility.  If someone
>identifies a difference that prevents parsing or proper interpretation of
>the WU-BLAST output, please let me know.
>Regards,
>--Warren 
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> Jason Stajich
>> Sent: Thursday, May 18, 2006 5:23 PM
>> To: Hubert Prielinger
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] parsing xml output
>> 
>> we don't parse WU-BLAST XML at this time.  We'd welcome someone  
>> contributing this.
>> 
>> ncbi XML is parsed with blastxml format.
>> 
>> -jason
>> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>> 
>> > hi,
>> > what is the best way to parse NCBI- and WU- Blast XML output....
>> > and is it possible to parse both with the same parser, or differ their
>> > XML output...
>> >
>> > thanks


From jason.stajich at duke.edu  Fri May 19 12:52:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 08:52:13 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <009f01c67ad6$c359a560$0d00a8c0@PM>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
Message-ID: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>

Whoops - sorry Warren - for some reason I had it in my mind that it  
was different.  So the blastxml parser should work fine.  The WUBLAST  
tab-delimited output is different than NCBI's -m8/9 though, right?

-jason


On May 18, 2006, at 7:57 PM, Warren Gish wrote:

> Just to clarify, the XML output from WU-BLAST conforms to the standard
> NCBI_BlastOutput.dtd.  Technically, contents of data fields could still be
> incompatible, but care was taken to ensure compatibility.  If someone
> identifies a difference that prevents parsing or proper interpretation of
> the WU-BLAST output, please let me know.
> Regards,
> --Warren
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Jason Stajich
>> Sent: Thursday, May 18, 2006 5:23 PM
>> To: Hubert Prielinger
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] parsing xml output
>>
>> we don't parse WU-BLAST XML at this time.  We'd welcome someone
>> contributing this.
>>
>> ncbi XML is parsed with blastxml format.
>>
>> -jason
>> On May 18, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>
>>> hi,
>>> what is the best way to parse NCBI- and WU- Blast XML output....
>>> and is it possible to parse both with the same parser, or differ their
>>> XML output...
>>>
>>> thanks

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From torsten.seemann at infotech.monash.edu.au  Thu May 18 22:42:05 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:42:05 +1000
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446CDBF7.10908@gmx.at>
References: <446CDBF7.10908@gmx.at>
Message-ID: <446CF83D.60207@infotech.monash.edu.au>

> what is the best way to parse NCBI- and WU- Blast XML output....
> and is it possible to parse both with the same parser, or differ their 
> XML output...


For NCBI BLAST XML format, use
	Bio::SearchIO->new(-format=>'blastxml', ...)

I don't know if 'blastxml' will load WU-BLAST XML format.
http://www.bioperl.org/wiki/HOWTO:SearchIO does not mention it.

Why not try it, and report back the results to the bioperl list?
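
A minimal sketch of what that looks like in full, assuming the report
file name is passed on the command line. The sub name report_hits is
made up for illustration; the SearchIO calls are the usual
result/hit/HSP loop. The same loop would be the thing to try on a
WU-BLAST XML file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# report_hits is a hypothetical helper name. It walks one BLAST XML
# report and prints one tab-separated line per HSP:
# query name, hit name, E-value, percent identity.
sub report_hits {
    my ($file) = @_;
    my $searchio = Bio::SearchIO->new(-format => 'blastxml',
                                      -file   => $file);
    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t",
                    $result->query_name,
                    $hit->name,
                    $hsp->evalue,
                    $hsp->percent_identity,
                ), "\n";
            }
        }
    }
    return;
}

# Only parse when a report file name is given.
report_hits($ARGV[0]) if @ARGV;
```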

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From torsten.seemann at infotech.monash.edu.au  Thu May 18 22:37:17 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 19 May 2006 08:37:17 +1000
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D087BC@NIHCESMLBX6.nih.gov>
Message-ID: <446CF71D.2070207@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Would like a fairly simple way to extract certain information from Genbank Genomic File Annotations.
> Namely the six D.melanogaster sequences.  
> Specifically to find gene entries and learn the gene name, begin and end and CDS.
> Please point me to appropriate modules and documentation.

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/HOWTOs
-> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

http://www.bioperl.org/
-> http://www.bioperl.org/wiki/FAQ
-> http://www.bioperl.org/wiki/FAQ#Annotations_and_Features

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From gish at watson.wustl.edu  Fri May 19 14:50:08 2006
From: gish at watson.wustl.edu (Warren Gish)
Date: Fri, 19 May 2006 09:50:08 -0500
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>
	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
Message-ID: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>

Right, the WU-BLAST tabbed output contains more fields.  (See
http://blast.wustl.edu/blast/tabular.html).
--Warren

> Whoops - sorry Warren - for some reason I had it in my mind that it  
> was different.  So the blastxml parser should work fine.  The  
> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
> right?
>
> -jason


From adamnkraut at gmail.com  Fri May 19 15:04:01 2006
From: adamnkraut at gmail.com (Adam Kraut)
Date: Fri, 19 May 2006 11:04:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
In-Reply-To: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
References: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
	<OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>
Message-ID: <134ede0b0605190804i60ee5ce1v984a33e0c91adf52@mail.gmail.com>

The program generates an ensemble of weighted suboptimal alignments by use
of a partition function and stochastic backtracking.  The algorithm is quite
novel and it's really only part of a larger multi-scale comparative modeling
project. The documentation is here:

http://www.tbi.univie.ac.at/~ulim/probA/probA_lib.html

While I think this would be useful to the bioperl community if it were fully
abstracted/extended, I would at the least like to be able to pass in any two
sequences and get back SimpleAlign objects for our internal uses first.  I
have a good idea on how to get started.  I will be sure to post when I get
into trouble.


On 5/19/06, aaron.j.mackey at gsk.com <aaron.j.mackey at gsk.com> wrote:
>
> bioperl-ext is the package in which alignment algorithms and/or BioPerl
> "wrapped" external C libraries live.  Subprojects in bioperl-ext use both
> XS and Inline::C, that's up to you.
>
> You'll need to get your C code compiled to a dynamically loaded library
> (.so) to use either XS or Inline::C; this precludes any reuse of the C
> main() function (although your Inline::C wrapper might recapitulate/copy
> the main() function code).
>
> Out of curiosity, what pairwise alignment algorithm are you using?  This
> is a heavily beaten path, you might want to dig around first to see if
> someone else already has what you need.
>
> -Aaron
>
>


From slenk at emich.edu  Fri May 19 14:42:41 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Fri, 19 May 2006 10:42:41 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
	C?
Message-ID: <f141831f144a37.f144a37f141831@emich.edu>

There is nothing wrong with a reasonable way that works - better not 
to put yourself down.

Inline is good if you can get it to work for you - I have had issues 
with linking Inline to dynamic libraries. I believe Inline makes a 
file that has linkage characteristics specified. Try it and see, then 
tell people how you did it. My two cents.

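Another way to use external executables is open3 (from IPC::Open3),
then reading from and writing to the pipes. I use it (primer3 and
local lab automation code) - snippet follows:

use IPC::Open3;    # provides open3()
use IO::Handle;    # provides autoflush() on filehandles

my $pid     = 0;
my $cancmd  = 'cancmd.exe';    # the external executable to drive
my $write   = 0;
my $read    = 0;

sub new {

    my $c = {};

    # WTRFH feeds the child's stdin; its stdout and stderr both
    # come back on RDRFH
    $pid   = open3(\*WTRFH, \*RDRFH, \*RDRFH, $cancmd);

    $write = *WTRFH;
    $read  = *RDRFH;

    # send each request to the child immediately, unbuffered
    $write->autoflush();

    bless $c;
    return $c;
}

Just write your request, then read the reply back - I make sure that
each request and each reply is a newline-terminated text line - and be
sure you reap the child pid (waitpid) when you are done.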
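
The whole round trip (start, write one line, read one line, reap) can
be sketched like this. It is only an illustration: the sub names are
made up, and a child perl that upper-cases its input stands in for the
real external program, so the sketch runs without primer3 or
cancmd.exe.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open3;
use IO::Handle;

# Stand-in for the external tool (an assumption for this sketch):
# a child perl that echoes every input line back in upper case.
my @cmd = ($^X, '-e', '$| = 1; while (<STDIN>) { print uc }');

# Start the child; its stdout and stderr share one read handle.
sub start_child {
    my @command = @_;
    my ($to_child, $from_child);
    my $pid = open3($to_child, $from_child, undef, @command);
    $to_child->autoflush(1);
    return ($pid, $to_child, $from_child);
}

# Send one newline-terminated request and read one reply line.
sub ask {
    my ($to_child, $from_child, $request) = @_;
    print {$to_child} "$request\n";
    my $reply = <$from_child>;
    chomp $reply if defined $reply;
    return $reply;
}

# Close the pipes, reap the child, and return its exit status.
sub stop_child {
    my ($pid, $to_child, $from_child) = @_;
    close $to_child;
    close $from_child;
    waitpid($pid, 0);
    return $? >> 8;
}

my ($pid, $to, $from) = start_child(@cmd);
print ask($to, $from, 'acgt'), "\n";
my $status = stop_child($pid, $to, $from);
print "child exited with status $status\n";
```

One line per request and one line per reply keeps the two sides in
step; a child that buffers its output or answers with several lines
would need a different read loop.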


----- Original Message -----
From: Adam Kraut <adamnkraut at gmail.com>
Date: Thursday, May 18, 2006 5:07 pm
Subject: [Bioperl-l] writing a pairwise alignment module: XS and 
Inline C?

> I am currently using a pairwise alignment algorithm written in C 
> (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time 
> abstracting.  I
> already have a hack method of writing the parameters and inputs I 
> want from
> perl, calling the c program with system( ), and then parsing the 
> output in
> Perl.  Any good programmer would probably smack me but I'm just an 
> undergradand I needed to show my boss that this works in order to 
> spend more time on
> it.
> 
> So on to my question, what is the preferred method of extending 
> Bioperl to
> use this algorithm?  I have just read the XS tutorial and a bit 
> about Inline
> C.  Can I put the main function in my script using Inline, and 
> then just
> point Inline at the rest of the C library?  The program has several
> C-structures that are semantically equivalent to Bioperl objects, 
> so just
> need somewhere to start.  I will spend some more time so that I 
> have a more
> specific question, I just wanted a little feedback, this is my 
> first post to
> the bioperl list.
> 
> Thanks,
> Adam
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From hubert.prielinger at gmx.at  Fri May 19 16:52:28 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 10:52:28 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
Message-ID: <446DF7CC.5060509@gmx.at>

hi,
I wondered whether it is also possible in the XML output (either WU- or
NCBI-BLAST) to get the species (taxonomy) for every hit, if I do a
general search.
regards

Warren Gish wrote:
> Right, the WU-BLAST tabbed output contains more fields.  (See
> http://blast.wustl.edu/blast/tabular.html).
> --Warren
>
>   
>> Whoops - sorry Warren - for some reason I had it in my mind that it  
>> was different.  So the blastxml parser should work fine.  The  
>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
>> right?
>>
>> -jason
>>     
>


From staffa at niehs.nih.gov  Fri May 19 18:12:47 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Fri, 19 May 2006 14:12:47 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <C0926539.87F5%osborne1@optonline.net>
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>

Specifically: 
I have the document to which you refer,
but have not seen this one thing I need in the printout of tags etc.:
the values in this line;
     mRNA            join(380..509,578..1913,7784..8649,9439..10200)
Is that a  location object?


Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


> ----------
> From: 	Brian Osborne
> Sent: 	Thursday, May 18, 2006 5:54 PM
> To: 	Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> Subject: 	Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> 
> Nick,
> 
> Have you read the Feature-Annotation HOWTO? This would be a good starting
> point...
> 
> Brian O.
> 
> 
> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
> wrote:
> 
> > Would like a fairly simple way to extract certain information from Genbank
> > Genomic File Annotations.
> > Namely the six D.melanogaster sequences.
> > Specifically to find gene entries and learn the gene name, begin and end and
> > CDS.
> > Please point me to appropriate modules and documentation.
> > 
> > 


From chandan.kr.singh at gmail.com  Fri May 19 18:37:26 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Sat, 20 May 2006 00:07:26 +0530
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
References: <C0926539.87F5%osborne1@optonline.net>
	<7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <2d4f320605191137n11017ec0xe41a632a3c7ea9a9@mail.gmail.com>

On 5/19/06, Staffa, Nick (NIH/NIEHS) [C] <staffa at niehs.nih.gov> wrote:
>
> Specifically:
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?


Yes, it is a location object. If you want it as a string (which is what
it seems from your mail), you just have to do this:

$loc = $fet->location();

$loc_str = $loc->to_FTstring() ;
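
Put into a complete script, that could look roughly like the sketch
below, which lists the gene name, start, end and location string of
every mRNA feature in a GenBank file. The sub name feature_locations
is made up for illustration, and it assumes the gene name sits in a
/gene qualifier on the feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# feature_locations is a hypothetical helper name. It reads GenBank
# records from $fh and returns one [gene, start, end, location string]
# row for every feature whose primary tag equals $wanted_tag.
sub feature_locations {
    my ($fh, $wanted_tag) = @_;
    my $in = Bio::SeqIO->new(-fh => $fh, -format => 'genbank');
    my @rows;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq $wanted_tag;
            # Assumption: the gene name, if any, is in /gene="..."
            my ($gene) = $feat->has_tag('gene')
                       ? $feat->get_tag_values('gene')
                       : ('');
            push @rows, [ $gene, $feat->start, $feat->end,
                          $feat->location->to_FTstring ];
        }
    }
    return @rows;
}

# Only do any work when a GenBank file name is given.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print join("\t", @$_), "\n" for feature_locations($fh, 'mRNA');
}
```

Passing 'CDS' or 'gene' instead of 'mRNA' would list those features
the same way.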

Hope it helps.
Chandan

> > ----------
> > From:         Brian Osborne
> > Sent:         Thursday, May 18, 2006 5:54 PM
> > To:   Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
> > Subject:      Re: [Bioperl-l] Reading GenBank Genomic File Annotation
> >
> > Nick,
> >
> > Have you read the Feature-Annotation HOWTO? This would be a good starting
> > point...
> >
> > Brian O.
> >
> >
> > On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
> > wrote:
> >
> > > Would like a fairly simple way to extract certain information from Genbank
> > > Genomic File Annotations.
> > > Namely the six D.melanogaster sequences.
> > > Specifically to find gene entries and learn the gene name, begin and end and
> > > CDS.
> > > Please point me to appropriate modules and documentation.


From osborne1 at optonline.net  Fri May 19 19:39:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 19 May 2006 15:39:36 -0400
Subject: [Bioperl-l] Reading GenBank Genomic File Annotation
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D087D3@NIHCESMLBX6.nih.gov>
Message-ID: <C0939738.8849%osborne1@optonline.net>

Nick,

This is from the HOWTO:

Another way of describing a feature in Genbank involves multiple start and
end positions. These could be called "split" locations, and a very common
example is the join statement in the CDS feature found in Genbank entries
(e.g. join(45..122,233..267)). This calls for a specialized object,
Bio::Location::SplitLocationI, which is a container for Location objects:

      for my $feature ($seqobj->top_SeqFeatures) {
          if ( $feature->location->isa('Bio::Location::SplitLocationI')
               && $feature->primary_tag eq 'CDS' ) {
              for my $location ( $feature->location->sub_Location ) {
                  print $location->start . ".." . $location->end . "\n";
              }
          }
      }


Brian O.


On 5/19/06 2:12 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
wrote:

> Specifically: 
> I have the document to which you refer,
> but have not seen this one thing I need in the printout of tags etc.:
> the values in this line;
>      mRNA            join(380..509,578..1913,7784..8649,9439..10200)
> Is that a  location object?
> 
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
>> ----------
>> From:  Brian Osborne
>> Sent:  Thursday, May 18, 2006 5:54 PM
>> To:  Staffa, Nick (NIH/NIEHS) [C]; bioperl-l at lists.open-bio.org
>> Subject:  Re: [Bioperl-l] Reading GenBank Genomic File Annotation
>> 
>> Nick,
>> 
>> Have you read the Feature-Annotation HOWTO? This would be a good starting
>> point...
>> 
>> Brian O.
>> 
>> 
>> On 5/18/06 4:49 PM, "Staffa, Nick (NIH/NIEHS) [C]" <staffa at niehs.nih.gov>
>> wrote:
>> 
>>> Would like a fairly simple way to extract certain information from Genbank
>>> Genomic File Annotations.
>>> Namely the six D.melanogaster sequences.
>>> Specifically to find gene entries and learn the gene name, begin and end and
>>> CDS.
>>> Please point me to appropriate modules and documentation.
>>> 
>>> 


From hubert.prielinger at gmx.at  Fri May 19 20:42:09 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 14:42:09 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
References: <009f01c67ad6$c359a560$0d00a8c0@PM>	<360BCB49-FF11-4413-92CD-97CFC6E8668A@duke.edu>
	<D525CC49-44E3-4B3F-9300-4084F74DC521@watson.wustl.edu>
	<446DF7CC.5060509@gmx.at>
	<F5CA1CDF-B22E-4DFD-9CC1-7CEC7FF6FD75@watson.wustl.edu>
Message-ID: <446E2DA1.1050503@gmx.at>

hi warren,
that means if I alter the DTD (if that is possible) by adding the
taxonomic id to it, then I should (theoretically) have the taxonomic id
tag in the XML file.
But I guess this is only possible with a local search (blastall), not
with an online search.

greetings

Warren Gish wrote:
>
> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>
>> hi,
>> I wondered whether is it also possible in the xml output (either WU 
>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>> do a general search.
>> regards
>>
> The taxonomic id is not an entity in the NCBI XML DTD.  If the
> information were embedded in deflines, one could conceivably parse for
> it, but I believe the NCBI only distributes taxids in their ASN.1 data
> and in their pre-formatted BLAST databases, and NCBI BLAST only reports
> taxids in its ASN.1 output format, where taxid is available as an entity.
>
> --Warren
>
>


From cjfields at uiuc.edu  Fri May 19 20:56:56 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:56:56 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>

You'll have to pull the GI or accession from each hit and do a lookup by either 
grabbing the sequence and using Bio::Species or use Bio::DB::Taxonomy; there 
isn't any tax information directly incorporated into BLAST reports AFAIK.
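
A rough sketch of the first route (fetch the record, then ask it for
its species), assuming protein accessions and network access to NCBI.
The sub name species_for_accession is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenPept;

# species_for_accession is a hypothetical helper name. It fetches the
# full record for one accession through the given database object and
# returns the binomial species name, or undef when the record or its
# species is missing.
sub species_for_accession {
    my ($db, $accession) = @_;
    my $seq     = $db->get_Seq_by_acc($accession) or return;
    my $species = $seq->species                   or return;
    return $species->binomial;
}

# Only contact NCBI when accessions are given on the command line.
if (@ARGV) {
    my $db = Bio::DB::GenPept->new;    # needs network access to NCBI
    for my $acc (@ARGV) {
        my $name = species_for_accession($db, $acc);
        print "$acc\t", (defined $name ? $name : 'unknown'), "\n";
    }
}
```

This fetches one full record per hit, so for a long hit list it will
be slow.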

Chris

---- Original message ----
>Date: Fri, 19 May 2006 10:52:28 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>
>hi,
>I wondered whether is it also possible in the xml output (either WU or 
>NCBI - Blast) to get the species (taxononmy) for every hit, if I do a 
>general search.
>regards
>
>Warren Gish wrote:
>> Right, the WU-BLAST tabbed output contains more fields.  (See
>> http://blast.wustl.edu/blast/tabular.html).
>> --Warren
>>
>>   
>>> Whoops - sorry Warren - for some reason I had it in my mind that it  
>>> was different.  So the blastxml parser should work fine.  The  
>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 though,  
>>> right?
>>>
>>> -jason
>>>     
>>


From cjfields at uiuc.edu  Fri May 19 20:59:35 2006
From: cjfields at uiuc.edu (Christopher Fields)
Date: Fri, 19 May 2006 15:59:35 -0500
Subject: [Bioperl-l] parsing xml output
Message-ID: <65932c77.bb0b33b0.8253400@expms6.cites.uiuc.edu>

Um, I don't think it works that way.  I'm pretty sure the XML is generated from
the ASN.1 output.  I don't think (as Warren says) that you can directly get to the
tax information.  Indirectly is another matter...

Chris

---- Original message ----
>Date: Fri, 19 May 2006 14:42:09 -0600
>From: Hubert Prielinger <hubert.prielinger at gmx.at>  
>Subject: Re: [Bioperl-l] parsing xml output  
>To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>
>hi warren,
>that means if I alter the DTD (if that is possible) by adding the 
>taxonomic id to the DTD..... then I should have the taxonomic id tag in 
>the xml file (theoretically)
>but I guess this is only possible with a local search (blastall) but not 
>with an online search.
>
>greetings
>
>Warren Gish wrote:
>>
>> On May 19, 2006, at 11:52 AM, Hubert Prielinger wrote:
>>
>>> hi,
>>> I wondered whether is it also possible in the xml output (either WU 
>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>>> do a general search.
>>> regards
>>>
>> The taxonomic id is not an entity in the NCBI XML DTD.  If the
>> information were embedded in deflines, one could conceivably parse for
>> it, but I believe the NCBI only distributes taxids in their ASN.1 data
>> and in their pre-formatted BLAST databases, and NCBI BLAST only reports
>> taxids in its ASN.1 output format, where taxid is available as an entity.
>>
>> --Warren
>>
>>
>


From hubert.prielinger at gmx.at  Fri May 19 21:30:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 19 May 2006 15:30:20 -0600
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E3854.5010708@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at>
Message-ID: <446E38EC.9020100@gmx.at>

ok, thanks,
it appears that I only need the species that the protein is derived
from, so I guess Bio::Species would be enough for me?
Would it work to just pull the accession from the BLAST output file,
pass in the accession code, and get the species name back as the
return value?
Is it possible to pass in just the accession code? The documentation I
looked up always talked about the entire file.

regards
>
>
> Christopher Fields wrote:
>> You'll have to pull the GI or accession from each hit and do a lookup 
>> by either grabbing the sequence and using Bio::Species or use 
>> Bio::DB::Taxonomy; there isn't any tax information directly 
>> incorporated into BLAST reports AFAIK.
>>
>> Chris
>>
>> ---- Original message ----
>>  
>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>  Subject: Re: 
>>> [Bioperl-l] parsing xml output  To: Warren Gish 
>>> <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>
>>> hi,
>>> I wondered whether is it also possible in the xml output (either WU 
>>> or NCBI - Blast) to get the species (taxononmy) for every hit, if I 
>>> do a general search.
>>> regards
>>>
>>> Warren Gish wrote:
>>>    
>>>> Right, the WU-BLAST tabbed output contains more fields.  (See
>>>> http://blast.wustl.edu/blast/tabular.html).
>>>> --Warren
>>>>
>>>>        
>>>>> Whoops - sorry Warren - for some reason I had it in my mind that 
>>>>> it  was different.  So the blastxml parser should work fine.  The  
>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9 
>>>>> though,  right?
>>>>>
>>>>> -jason
>>>>>             
>


From jason.stajich at duke.edu  Fri May 19 22:40:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 19 May 2006 18:40:54 -0400
Subject: [Bioperl-l] parsing xml output
In-Reply-To: <446E38EC.9020100@gmx.at>
References: <5c1c5a79.bb0af5aa.8198d00@expms6.cites.uiuc.edu>
	<446E3854.5010708@gmx.at> <446E38EC.9020100@gmx.at>
Message-ID: <FAE3151B-301F-4A42-9EFD-D1F8D3CBE752@duke.edu>

There is a gi2taxid table in the /pub/taxonomy part of the NCBI FTP site
(ftp.ncbi.nih.gov) -- I have used this to take GI numbers from a report
and get the taxonomy for overall classification. I think something like
this exists in the scripts or examples directory in the bioperl
distro. I know I posted about it when I wrote about it a while ago.
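
The lookup described above can be sketched roughly as follows. This is a minimal sketch, not the script from the bioperl distro: it assumes the table is a plain two-column, tab-delimited file (GI, then taxid), and load_gi2taxid is a hypothetical helper name. The GIs and taxids in the sample data are purely illustrative; check the FTP directory for the actual file name and layout.

```perl
use strict;
use warnings;

# Build a GI -> taxid lookup from a two-column, tab-delimited table
# (assumed layout: GI <TAB> taxid, one pair per line).
# Only the GIs listed in %$wanted are kept, so memory use stays small
# even when the table itself is very large.
sub load_gi2taxid {
    my ($fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && exists $wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Illustrative data only; in practice $fh would be opened on the
# table downloaded from the NCBI FTP site.
my $table = "101\t9606\n102\t10090\n103\t4932\n";
open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $taxid_for = load_gi2taxid($fh, { 101 => 1, 103 => 1 });
close $fh;

print "$_\t$taxid_for->{$_}\n" for sort { $a <=> $b } keys %$taxid_for;
```

The GIs pulled from the BLAST hits go into the $wanted hash; the resulting taxids can then be handed to something like Bio::DB::Taxonomy to get the species name.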

-jason
On May 19, 2006, at 5:30 PM, Hubert Prielinger wrote:

> ok, thanks,
> it appears that I only need the species where the Protein is derived
> from, so I guess Bio::Species would satisfy me, or?
> and it would work that I just pull off the accession from the blast
> output file and then assign the accession code and get as return value
> the  species name.
> is it possible to just assign the accession code, because I looked up
> but they were always talking of the entire file.
>
> regards
>>
>>
>> Christopher Fields wrote:
>>> You'll have to pull the GI or accession from each hit and do a  
>>> lookup
>>> by either grabbing the sequence and using Bio::Species or use
>>> Bio::DB::Taxonomy; there isn't any tax information directly
>>> incorporated into BLAST reports AFAIK.
>>>
>>> Chris
>>>
>>> ---- Original message ----
>>>
>>>> Date: Fri, 19 May 2006 10:52:28 -0600
>>>> From: Hubert Prielinger <hubert.prielinger at gmx.at>
>>>> Subject: Re: [Bioperl-l] parsing xml output
>>>> To: Warren Gish <gish at watson.wustl.edu>, bioperl-l at bioperl.org
>>>>
>>>> hi,
>>>> I wondered whether it is also possible in the xml output (either WU
>>>> or NCBI - Blast) to get the species (taxonomy) for every hit, if I
>>>> do a general search.
>>>> regards
>>>>
>>>> Warren Gish wrote:
>>>>
>>>>> Right, the WU-BLAST tabbed output contains more fields.  (See
>>>>> http://blast.wustl.edu/blast/tabular.html).
>>>>> --Warren
>>>>>
>>>>>
>>>>>> Whoops - sorry Warren - for some reason I had it in my mind that
>>>>>> it  was different.  So the blastxml parser should work fine.  The
>>>>>> WUBLAST tab-delimited output is different than NCBI's -m8/9
>>>>>> though,  right?
>>>>>>
>>>>>> -jason
>>>>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/


From ewijaya at i2r.a-star.edu.sg  Sat May 20 12:36:44 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 20 May 2006 20:36:44 +0800
Subject: [Bioperl-l] Method for checking Sequence type of a file
Message-ID: <30362db229c.446f7ddc@i2r.a-star.edu.sg>


Dear expert,

Is there any Bioperl method that allows
you to verify the sequence type of a file?

For example, given a file, we wish
to check (returning true or false) whether
it is in FASTA format, GenBank format, etc.

Such a method would be useful in a web application
as a taint-checking procedure.

Regards,
Edward WIJAYA
SINGAPORE




From aaron.j.mackey at gsk.com  Fri May 19 13:33:01 2006
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 19 May 2006 09:33:01 -0400
Subject: [Bioperl-l] writing a pairwise alignment module: XS and Inline
 C?
In-Reply-To: <134ede0b0605181407l52d1c2c3x79dd7f177ae7b828@mail.gmail.com>
Message-ID: <OF8F6BEDD1.4ACB4532-ON85257173.004A1117-85257173.004A6FAF@gsk.com>

bioperl-ext is the package in which alignment algorithms and/or BioPerl
"wrapped" external C libraries live.  Subprojects in bioperl-ext use both
XS and Inline::C; which one you use is up to you.

You'll need to get your C code compiled to a dynamically loaded library 
(.so) to use either XS or Inline::C; this precludes any reuse of the C 
main() function (although your Inline::C wrapper might recapitulate/copy 
the main() function code).

Out of curiosity, what pairwise alignment algorithm are you using?  This 
is a heavily beaten path, you might want to dig around first to see if 
someone else already has what you need.

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 05/18/2006 05:07:42 PM:

> I am currently using a pairwise alignment algorithm written in C (not by
> me).  The program consists of a library of routines, structures, and
> definitions which I do not want to spend a lot of time abstracting.  I
> already have a hack method of writing the parameters and inputs I want
> from perl, calling the c program with system( ), and then parsing the
> output in Perl.  Any good programmer would probably smack me but I'm just
> an undergrad and I needed to show my boss that this works in order to
> spend more time on it.
>
> So on to my question, what is the preferred method of extending Bioperl
> to use this algorithm?  I have just read the XS tutorial and a bit about
> Inline C.  Can I put the main function in my script using Inline, and then
> just point Inline at the rest of the C library?  The program has several
> C-structures that are semantically equivalent to Bioperl objects, so just
> need somewhere to start.  I will spend some more time so that I have a
> more specific question, I just wanted a little feedback, this is my first
> post to the bioperl list.
>
> Thanks,
> Adam


From jason.stajich at duke.edu  Sat May 20 14:50:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat, 20 May 2006 10:50:17 -0400
Subject: [Bioperl-l] Method for checking Sequence type of a file
In-Reply-To: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
References: <30362db229c.446f7ddc@i2r.a-star.edu.sg>
Message-ID: <F42D42CC-B609-48DF-B291-E0CE803D527C@duke.edu>

Try Bio::Tools::GuessSeqFormat

On May 20, 2006, at 8:36 AM, Wijaya Edward wrote:

>
> Dear expert,
>
> Is there any Bioperl method that allows
> you to check verify sequence type in a file?
>
> For example, given a file we wish
> to check (return true  or false) whether
> it is in FASTA format, GENBANK format, etc.
>
> This method is useful in web application
> as taint checking procedure.
>
> Regards,
> Edward WIJAYA
> SINGAPORE
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From chen_li3 at yahoo.com  Sun May 21 00:15:01 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sat, 20 May 2006 17:15:01 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>

Dear all,


I tried one script from the Graphics HOWTO under the Cygwin
environment (GD and libpng already installed). I typed
this line in a Cygwin X window:


$ perl render_blast1.pl data1.txt | display -

And here is the result:

display: no decode delegate for this image format
`/tmp/magick-qKiRPDRS'.

Any idea?


Thank you very much,

Li




From osborne1 at optonline.net  Sun May 21 00:59:06 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sat, 20 May 2006 20:59:06 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <C095339A.886C%osborne1@optonline.net>

Chen,

Not sure. However, whenever I see a new or incomprehensible error message
like "display: no decode delegate for this image format" I Google it.

Brian O.


On 5/20/06 8:15 PM, "chen li" <chen_li3 at yahoo.com> wrote:

> Dear all,
> 
> 
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> 
> 
> $ perl render_blast1.pl data1.txt | display -
> 
> And here is the result:
> 
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
> 
> Any idea?
> 
> 
> Thank you very much,
> 
> Li
> 
> 


From n.saunders at uq.edu.au  Sun May 21 22:17:44 2006
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Mon, 22 May 2006 08:17:44 +1000
Subject: [Bioperl-l] problems with Bio::Graph
Message-ID: <4470E708.3070402@uq.edu.au>

dear all,

I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0 
RC1 with Ubuntu 5.10 i686.

I would like to parse files in PSI MI XML 2.5 format and for selected proteins, 
get the Uniprot accession of interacting partners (this is outlined in the 
documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test script 
and ran it on a selection of XML files.  The script is simply:

----------------------------------------------------------------
use strict;
use Bio::Graph::IO;

my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
		  		  '-format' => 'psi_xml');
my $gr = $graphio->next_network;
----------------------------------------------------------------

Here's a summary of the error messages with some sample files (I tried PSI MI 
XML versions 1 and 2.5):

1.  MINT database 9707552_small.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

2. IntAct database yeast_small-11.xml (PSI 2.5)
Can't call method "att" on an undefined value at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.

3. IntAct database yeast_small-11.xml (PSI 1)
Use of uninitialized value in string eq at 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.

4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
These give no errors

5. DIP file dip20060402.mif (PSI 1, complete dataset)
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
STACK: Bio::Species::validate_species_name 
/usr/local/share/perl/5.8.7/Bio/Species.pm:340
STACK: Bio::Species::classification /usr/local/share/perl/5.8.7/Bio/Species.pm:170
STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
STACK: Bio::Graph::IO::psi_xml::_proteinInteractor 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
STACK: Bio::Graph::IO::psi_xml::next_network 
/usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
STACK: ./biograph.pl:18
-----------------------------------------------------------


Looking at the module code, it seems that the first 2 errors relate to a
parameter "proteinInteractorRef", found in PSI MI version 1 but not version 2.5.
Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single
species seems OK, but it seems there are species names in the complete dataset
that cause problems (error 5).


Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there 
plans to get it to work with version 2.5 files from all sources (MINT and 
IntAct) ?  Googling and checking the list archives didn't give a lot of hits 
which made me think it's not a widely-used module.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://psychro.bioinformatics.unsw.edu.au/neil


From torsten.seemann at infotech.monash.edu.au  Mon May 22 01:31:56 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 22 May 2006 11:31:56 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <4471148C.5090404@infotech.monash.edu.au>

> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
> $ perl render_blast1.pl data1.txt | display -
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.

You are piping the output of the Perl script (which is a GIF/PNG image) 
into the input of a program called "display". This program is part of 
the ImageMagick toolkit, standard on most Linux installations. Because 
you are using Windows you probably don't have it installed! Try this:

$ perl render_blast1.pl data1.txt > image.gif

Then load 'image.gif' into whatever your favourite image viewer is.
	
-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From darin.london at duke.edu  Mon May 22 15:29:45 2006
From: darin.london at duke.edu (Darin London)
Date: Mon, 22 May 2006 11:29:45 -0400
Subject: [Bioperl-l] BOSC 2006 2nd Call for Papers
In-Reply-To: <4471CE49.80109@duke.edu>
References: <44294B65.4050207@duke.edu> <4471CE49.80109@duke.edu>
Message-ID: <4471D8E9.8090109@duke.edu>

2nd CALL FOR SPEAKERS

This is the second and last official call for speakers to submit their
abstracts to speak at BOSC 2006 in Fortaleza, Brasil.  In order to be
considered as a potential speaker, an abstract must be received by
Monday, June 5th, 2006.  We look forward to a great conference this
year. Please consult the Official BOSC 2006 Website at:

http://www.open-bio.org/wiki/BOSC_2006

for more details and information.

In addition, a BOSC weblog has been set up to make it easier to
disseminate all BOSC related announcements:

http://wiki.open-bio.org/boscblog/

And if you have an ICAL compatible Calendar, there is an EventDB
calendar set up with all BOSC related deadlines.

http://eventful.com/groups/G0-001-000014747-0

More information about ISMB can be found at the Official ISMB 2006 Website:

http://ismb2006.cbi.cnptia.embrapa.br/

Thank You, and we look forward to seeing you all,

The BOSC Organizing Committee.


From osborne1 at optonline.net  Mon May 22 21:37:50 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 22 May 2006 17:37:50 -0400
Subject: [Bioperl-l] problems with Bio::Graph
In-Reply-To: <4470E708.3070402@uq.edu.au>
Message-ID: <C097A76E.88A9%osborne1@optonline.net>

Neil,

Let me propose an alternative. In the past few months I've been working on a
Bioperl package for handling protein interaction networks, it is called
bioperl-network. It's similar to the Bio::Graph modules, except for the
following:

- It does not use Nat Goodman's SimpleGraph, it uses Perl's Graph. The
advantage is that we are not responsible for maintaining the algorithm code,
the disadvantage is that Graph has some bugs but Jarkko Hietaniemi has been
working on these and has fixed some significant ones recently.

- It uses names and concepts from Graph. It also has separate notions of
edge and interaction, where one edge can have one or more interactions.

- It uses more method names and conventions borrowed from interaction
databases and PSI MI. For example, a node can be a protein complex composed
of multiple Seq objects, not just a protein.

This package is a makeover of Bio::Graph, therefore Nat Goodman and Richard
Adams are major contributors to it. It's also worth mentioning that it's not
complete, meaning it won't parse all fields from PSI MI 2 or 2.5 but I think
it should be able to handle the code you've shown (and if it cannot then
I'll see that it's fixed). I don't know about PSI MI version 1 but if I'm
not mistaken there's a version 1 -> version 2 converter.

I'm about to put this into CVS so you can take a look, should you choose to.

Brian O.


On 5/21/06 6:17 PM, "Neil Saunders" <n.saunders at uq.edu.au> wrote:

> dear all,
> 
> I am having some problems with the Bio::Graph modules.  Running Bioperl 1.5.0
> RC1 with Ubuntu 5.10 i686.
> 
> I would like to parse files in PSI MI XML 2.5 format and for selected
> proteins, 
> get the Uniprot accession of interacting partners (this is outlined in the
> documentation for Bio::Graph::ProteinGraph).  I wrote a very simple test
> script 
> and ran it on a selection of XML files.  The script is simply:
> 
> ----------------------------------------------------------------
> use strict;
> use Bio::Graph::IO;
> 
> my $mifile = shift || die("Usage = biograph.pl <MI datafile>\n");
> my $graphio = Bio::Graph::IO->new('-file'   => $mifile,
>  '-format' => 'psi_xml');
> my $gr = $graphio->next_network;
> ----------------------------------------------------------------
> 
> Here's a summary of the error messages with some sample files (I tried PSI MI
> XML versions 1 and 2.5):
> 
> 1.  MINT database 9707552_small.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 2. IntAct database yeast_small-11.xml (PSI 2.5)
> Can't call method "att" on an undefined value at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 173.
> 
> 3. IntAct database yeast_small-11.xml (PSI 1)
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm line 126.
> 
> 4. DIP files Scere20060402.mif, Ecoli20060402.mif (PSI 1)
> These give no errors
> 
> 5. DIP file dip20060402.mif (PSI 1, complete dataset)
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Invalid species name 'immunodeficiency virus type 1, HIV-1'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/share/perl/5.8.7/Bio/Root/Root.pm:328
> STACK: Bio::Species::validate_species_name
> /usr/local/share/perl/5.8.7/Bio/Species.pm:340
> STACK: Bio::Species::classification
> /usr/local/share/perl/5.8.7/Bio/Species.pm:170
> STACK: Bio::Species::new /usr/local/share/perl/5.8.7/Bio/Species.pm:118
> STACK: Bio::Graph::IO::psi_xml::_proteinInteractor
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:105
> STACK: XML::Twig::_twig_end /usr/share/perl5/XML/Twig.pm:1473
> STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
> STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
> STACK: XML::Parser::parsefile /usr/lib/perl5/XML/Parser.pm:233
> STACK: Bio::Graph::IO::psi_xml::next_network
> /usr/local/share/perl/5.8.7/Bio/Graph/IO/psi_xml.pm:79
> STACK: ./biograph.pl:18
> -----------------------------------------------------------
> 
> 
> Looking at the module code, it seems that the first 2 errors relate to a
> parameter "proteinInteractorRef", found in PSI MI version 1 but not version
> 2.5. 
>   Error 3 I haven't yet figured out.  DIP PSI MI XML version 1 for single
> species seems OK, but it seems there are species names in the complete dataset
> that cause problems (error 5).
> 
> 
> Is the CVS version of Bio::Graph any better at handling PSI MI XML?  Are there
> plans to get it to work with version 2.5 files from all sources (MINT and
> IntAct) ?  Googling and checking the list archives didn't give a lot of hits
> which made me think it's not a widely-used module.
> 
> thanks,
> Neil


From torsten.seemann at infotech.monash.edu.au  Mon May 22 21:53:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 23 May 2006 07:53:02 +1000
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
References: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>
Message-ID: <447232BE.1080001@infotech.monash.edu.au>

Chen Li

>  perl render_blast1.pl data1.txt >im.png

Based on http://bioperl.org/wiki/HOWTO:Graphics I believe the example 
script is creating a PNG image. The last line is:
print $panel->png;

> and Perl runs without any problem. I use adobe
> photoshop to open them and Adobe can't recognize them.
> If I use ACDSee to open them I only get a black
> background. If I issue this line under Cygwin X window
> display im.png  or display im.gif
> Cygwin says:
> display: Improper image header `im.png'.
> It seems Perl can't produce an image with right
> format.

Are you sure Perl is producing a PNG file at all?
How many bytes does im.png use? Zero?

Did you notice this in http://bioperl.org/wiki/HOWTO:Graphics ?

It says: "If you are on a Windows platform, you need to put STDOUT into 
binary mode so that the PNG file does not go through Window's carriage 
return/linefeed transformations. Before the final print statement, put 
the statement binmode(STDOUT)."

ie. your script should have

binmode(STDOUT);
print $panel->png;

as the last 2 lines.
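
Both questions above (how big is the file, and is it a PNG at all) can be checked with a few lines of plain Perl. This is a minimal sketch, not something taken from the HOWTO: looks_like_png is a hypothetical helper name, and the only fact it relies on is that every PNG file starts with the same fixed 8-byte signature. If CR/LF translation has touched the output, that signature is among the first things to break.

```perl
use strict;
use warnings;

# The 8-byte signature that every PNG file starts with.
my $PNG_SIGNATURE = "\x89PNG\x0d\x0a\x1a\x0a";

# Return true if the file at $path begins with the PNG signature.
sub looks_like_png {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                       # read raw bytes, no CR/LF translation
    my $got = read($fh, my $header, 8);
    close $fh;
    return defined $got && $got == 8 && $header eq $PNG_SIGNATURE;
}

# Usage: perl check_png.pl im.png
if (@ARGV) {
    my $path = shift @ARGV;
    printf "%s: %d bytes, %s\n", $path, (-s $path) || 0,
        looks_like_png($path) ? 'PNG signature found' : 'no PNG signature';
}
```

If the signature is missing even with binmode(STDOUT) in place, the problem is more likely in how the image is generated than in how it is written out.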

> Do you experience the same problem before?

No.

--Torsten


From chen_li3 at yahoo.com  Mon May 22 13:25:53 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 06:25:53 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <4471148C.5090404@infotech.monash.edu.au>
Message-ID: <20060522132553.21896.qmail@web36804.mail.mud.yahoo.com>

Dear Dr. Seemann,


Thank you very much for the reply.

I issued this line:
 perl render_blast1.pl data1.txt >im.gif
or 
 perl render_blast1.pl data1.txt >im.png

and Perl ran without any problem. I used Adobe
Photoshop to open the files, but it can't recognize them.
If I use ACDSee to open them I only get a black
background. If I issue this line in a Cygwin X window

display im.png  or display im.gif

Cygwin says:

display: Improper image header `im.png'.

or display: Improper image header `im.gif'.

It seems Perl can't produce an image in the right
format.


Have you experienced the same problem before?

Li


--- Torsten Seemann
<torsten.seemann at infotech.monash.edu.au> wrote:

> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I
> type
> > this line in Cygwin X window:
> > $ perl render_blast1.pl data1.txt | display -
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> 
> You are piping the output of the Perl script (which
> is a GIF/PNG image) 
> into the input of a program called "display". This
> program is part of 
> the ImageMagick toolkit, standard on most Linux
> installations. Because 
> you are using Windows you probably don't have it
> installed! Try this:
> 
> $ perl render_blast1.pl data1.txt > image.gif
> 
> Then load 'image.gif' into whatever your favourite
> image viewer is.
> 	
> -- 
> Dr Torsten Seemann              
> http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash
> University, Australia
> 
> 




From chen_li3 at yahoo.com  Mon May 22 22:57:42 2006
From: chen_li3 at yahoo.com (chen li)
Date: Mon, 22 May 2006 15:57:42 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <447232BE.1080001@infotech.monash.edu.au>
Message-ID: <20060522225742.78245.qmail@web36804.mail.mud.yahoo.com>

Hi,

I tried both, with and without the statement
 binmode(STDOUT) before the last line, print
$panel->png; but there is no difference. I get a file
of 2432 bytes.

Li


> Chen Li
> 
> >  perl render_blast1.pl data1.txt >im.png
> 
> Based on http://bioperl.org/wiki/HOWTO:Graphics I
> believe the example 
> script is creating a PNG image. The last line is:
> print $panel->png;
> 
> > and Perl runs without any problem. I use adobe
> > photoshop to open them and Adobe can't recognize
> them.
> > If I use ACDSee to open them I only get a black
> > background. If I issue this line under Cygwin X
> window
> > display im.png  or display im.gif
> > Cygwin says:
> > display: Improper image header `im.png'.
> > It seems Perl can't produce an image with right
> > format.
> 
> Are you sure Perl is producing a PNG file at all?
> How many bytes does im.png use? Zero?
> 
> Did you notice this in
> http://bioperl.org/wiki/HOWTO:Graphics ?
> 
> It says: "If you are on a Windows platform, you need
> to put STDOUT into 
> binary mode so that the PNG file does not go through
> Window's carriage 
> return/linefeed transformations. Before the final
> print statement, put 
> the statement binmode(STDOUT)."
> 
> ie. your script should have
> 
> binmode(STDOUT);
> print $panel->png;
> 
> as the last 2 lines.
> 
> > Do you experience the same problem before?
> 
> No.
> 
> --Torsten
> 




From barry.moore at genetics.utah.edu  Tue May 23 01:00:06 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 22 May 2006 19:00:06 -0600
Subject: [Bioperl-l] Problems with Unflattener.pm
Message-ID: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>

Hi All,

NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into
an infinite recursive loop.  The trouble occurs in the method
find_best_matches between lines 2258 and 2281, and in particular the
loop is perpetuated by line 2273.  NT_113910 has a fairly complex
features table, but I have as yet been unable to figure out why
this loop is not exiting properly.  This has been submitted to
bugzilla, but I'll post here so it gets documented on the list also.
Any suggestions from Chris or others would be greatly appreciated.

This problem can be recreated as follows:

Grab NT_113910 from genbank.
bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk

Pass NT_113910.gbk on the command line to the attached script.


#!/usr/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

# GenBank file to read, e.g. NT_113910.gbk
my $file = shift or die "Usage: $0 <GenBank file>\n";

# generate an Unflattener object
my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
#$unflattener->verbose(1);

# read GenBank SeqI objects from the input file
my $seqio =
     Bio::SeqIO->new(-file   => $file,
                     -format => 'GenBank');
# write each sequence's feature hierarchy as an ASCII tree
my $out =
     Bio::SeqIO->new(-format => 'asciitree');
while (my $seq = $seqio->next_seq()) {

         # get top level unflattened SeqFeatureI objects
         $unflattener->unflatten_seq(-seq       => $seq,
                                     -use_magic => 1);
         $out->write_seq($seq);
}


From miker at biotiquesystems.com  Mon May 22 23:56:52 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 16:56:52 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
Message-ID: <002a01c67dfb$663cc600$c100a8c0@mike>


As best as I can tell, using Bio::SeqIO to parse a uniprot file ignores the
sequence version, and calling seq_version() on the resulting RichSeq object
returns undef.

It looks like swiss.pm is trying to parse the version out of the SV line, which
apparently doesn't exist any more?  The sequence version(s) are now specified as
part of the Date (DT) lines.  

Is this not a bug?  Is swiss.pm not designed to parse uniprot files?

Thanks for any help ...


From jason.stajich at duke.edu  Tue May 23 01:37:13 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 21:37:13 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <002a01c67dfb$663cc600$c100a8c0@mike>
References: <002a01c67dfb$663cc600$c100a8c0@mike>
Message-ID: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>

Sounds like a "missing feature" =)

AFAIK the module was only written for swissprot files.  It is  
possible there have been changes in the format that have not been  
tracked to the current code.  We'd certainly appreciate someone  
testing it out as versions evolve.  If you submit a bug to bugzilla  
with version of bioperl and example files you can track when a fix is  
in.  We of course appreciate anyone's efforts to provide a patch as  
most bugs get fixed of late when someone gets "itchy" enough to fix  
them.

-jason

On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:

>
> As best as I can tell, using Bio::SeqIO to parse a uniprot file  
> ignores the
> sequence version, and calling seq_version() on the resulting  
> RichSeq object
> returns undef.
>
> It looks like swiss.pm is trying to parse the version out of the SV  
> line, which
> apparently doesn't exist any more?  The sequence version(s) are now  
> specified as
> part of the Date (DT) lines.
>
> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>
> Thanks for any help ...
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From jason.stajich at duke.edu  Tue May 23 02:04:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon, 22 May 2006 22:04:17 -0400
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>

We ask that people post patches to bugzilla as attachments, so that we  
can track what the bug was and why the patch fixes it.

I am not totally sure this patch works because it seems like we need  
to strip out more information now from the DT line if the $date  
actually contains more information than just the date.
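
To illustrate the concern above, here is a minimal sketch of separating the
bare date from whatever else now follows it on a DT line. It is not the
swiss.pm code: split_dt_payload is a hypothetical helper, and the sample
payload is taken from the new-style DT format announced by UniProtKB.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swiss.pm): split the text that follows
# "DT   " into the bare date and any trailing annotation.
sub split_dt_payload {
    my ($payload) = @_;
    $payload =~ s/\s+$//;     # drop trailing whitespace
    $payload =~ s/[.;]$//;    # drop a terminating '.' or ';'
    if ($payload =~ /^(\d{2}-[A-Z]{3}-\d{4})\s*,?\s*(.*)$/) {
        return ($1, $2);      # (date, annotation)
    }
    return ($payload, '');    # unrecognised layout: pass it through unchanged
}

my ($date, $note) = split_dt_payload('15-OCT-2001, sequence version 3.');
print "date=$date note=$note\n";
```

With something along these lines, only the date part would be pushed onto
-dates, and the annotation could be inspected separately for a version number.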

If you would go ahead and create a bug in bugzilla for this  
(http://bugzilla.open-bio.org), this sort of conversation can be tracked to  
the bug.

If any of this is unclear please let us know - I thought we had put  
some pages up about this sort of thing on the wiki but maybe they  
need to be expanded.

-jason
On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>> Sent: Monday, May 22, 2006 6:37 PM
>> To: Michael Rogoff
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
>>
>>
>> Sounds like a "missing feature" =)
>>
>> AFAIK the module was only written for swissprot files.  It is
>> possible there have been changes in the format that have not been
>> tracked to the current code.  We'd certainly appreciate someone
>> testing it out as versions evolve.  If you submit a bug to bugzilla
>> with version of bioperl and example files you can track when
>> a fix is
>> in.  We of course appreciate anyone's efforts to provide a patch as
>> most bugs get fixed of late when someone gets "itchy" enough to fix
>> them.
>>
>> -jason
>>
>> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
>>
>>>
>>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
>>> ignores the
>>> sequence version, and calling seq_version() on the resulting
>>> RichSeq object
>>> returns undef.
>>>
>>> It looks like swiss.pm is trying to parse the version out
>> of the SV
>>> line, which
>>> apparently doesn't exist any more?  The sequence version(s)
>> are now
>>> specified as
>>> part of the Date (DT) lines.
>>>
>>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
>>>
>>> Thanks for any help ...
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From Marc.Logghe at DEVGEN.com  Tue May 23 07:08:37 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Tue, 23 May 2006 09:08:37 +0200
Subject: [Bioperl-l] problems iwth Bio::graphics module
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>

Hi Li,
Did you check your script for any other print statements (to STDOUT,
that is) that potentially could contaminate your png stream ?

Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of chen li
> Sent: Tuesday, May 23, 2006 12:58 AM
> To: Torsten Seemann
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] problems iwth Bio::graphics module
> 
> Hi,
> 
> I try both: either with or without this statement
>  binmode(STDOUT) before the last line print $panel->png; But 
> there are no differences. I get a file of 2432 bytes.
> 
> Li
> 
> 
> 
> > Chen Li
> > 
> > >  perl render_blast1.pl data1.txt >im.png
> > 
> > Based on http://bioperl.org/wiki/HOWTO:Graphics I believe 
> the example 
> > script is creating a PNG image. The last line is:
> > print $panel->png;
> > 
> > > and Perl runs without any problem. I use adobe photoshop to open 
> > > them and Adobe can't recognize
> > them.
> > > If I use ACDSee to open them I only get a black background. If I 
> > > issue this line under Cygwin X
> > window
> > > display im.png  or display im.gif
> > > Cygwin says:
> > > display: Improper image header `im.png'.
> > > It seems Perl can't produce an image with right format.
> > 
> > Are you sure Perl is producing a PNG file at all?
> > How many bytes does im.png use? Zero?
> > 
> > Did you notice this in
> > http://bioperl.org/wiki/HOWTO:Graphics ?
> > 
> > It says: "If you are on a Windows platform, you need to put STDOUT 
> > into binary mode so that the PNG file does not go through Window's 
> > carriage return/linefeed transformations. Before the final print 
> > statement, put the statement binmode(STDOUT)."
> > 
> > ie. your script should have
> > 
> > binmode(STDOUT);
> > print $panel->png;
> > 
> > as the last 2 lines.
> > 
> > > Do you experience the same problem before?
> > 
> > No.
> > 
> > --Torsten
> > 
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection 
> around http://mail.yahoo.com 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From chen_li3 at yahoo.com  Tue May 23 13:27:06 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 06:27:06 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746E4E@ANTARESIA.be.devgen.com>
Message-ID: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>

Dear Dr. Logghe,

Thank you so much. After following your suggestion I got the
script working under Cygwin. Here are the last two lines:

either binmode (STDOUT);
       print STDOUT $panel->png;

or only print STDOUT $panel->png;

They both work for me. I know that print writes to the
screen (STDOUT) by default, so I don't know why it works
once STDOUT is added after print. Could you explain it?

BTW, I copied this script from the Graphics HOWTO on the
BioPerl website, and only one line contains a print
statement, which is 'print $panel->png'.

Once again thank you so much,

Li

--- Marc Logghe <Marc.Logghe at devgen.com> wrote:

> Hi Li,
> Did you check your script for any other print
> statements (to STDOUT,
> that is) that potentially could contaminate your png
> stream ?
> 
> Marc
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org 
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On
> Behalf Of chen li
> > Sent: Tuesday, May 23, 2006 12:58 AM
> > To: Torsten Seemann
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] problems iwth
> Bio::graphics module
> > 
> > Hi,
> > 
> > I try both: either with or without this statement
> >  binmode(STDOUT) before the last line print
> $panel->png; But 
> > there are no differences. I get a file of 2432
> bytes.
> > 
> > Li
> > 
> > 
> > 
> > > Chen Li
> > > 
> > > >  perl render_blast1.pl data1.txt >im.png
> > > 
> > > Based on http://bioperl.org/wiki/HOWTO:Graphics
> I believe 
> > the example 
> > > script is creating a PNG image. The last line
> is:
> > > print $panel->png;
> > > 
> > > > and Perl runs without any problem. I use adobe
> photoshop to open 
> > > > them and Adobe can't recognize
> > > them.
> > > > If I use ACDSee to open them I only get a
> black background. If I 
> > > > issue this line under Cygwin X
> > > window
> > > > display im.png  or display im.gif
> > > > Cygwin says:
> > > > display: Improper image header `im.png'.
> > > > It seems Perl can't produce an image with
> right format.
> > > 
> > > Are you sure Perl is producing a PNG file at
> all?
> > > How many bytes does im.png use? Zero?
> > > 
> > > Did you notice this in
> > > http://bioperl.org/wiki/HOWTO:Graphics ?
> > > 
> > > It says: "If you are on a Windows platform, you
> need to put STDOUT 
> > > into binary mode so that the PNG file does not
> go through Window's 
> > > carriage return/linefeed transformations. Before
> the final print 
> > > statement, put the statement binmode(STDOUT)."
> > > 
> > > ie. your script should have
> > > 
> > > binmode(STDOUT);
> > > print $panel->png;
> > > 
> > > as the last 2 lines.
> > > 
> > > > Do you experience the same problem before?
> > > 
> > > No.
> > > 
> > > --Torsten
> > > 
> > 
> > 
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> protection 
> > around http://mail.yahoo.com 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From lstein at cshl.edu  Tue May 23 14:06:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 10:06:27 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
Message-ID: <200605231006.28392.lstein@cshl.edu>

Hi,

It is possible that your version of display can't handle PNG images. Try 
saving the output as a file and then opening it in another image program:

	perl render_blast1.pl data1.txt > data1.png

Another thing to watch out for is that, depending on what version of Perl 
you're using, you may have to insert this statement into the render_blast1.pl 
script (somewhere near the top):

	binmode STDOUT;
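
A small sketch of why binmode matters for binary output such as PNG. The
:crlf layer is used here only to imitate a text-mode handle on any platform,
and written_size is a hypothetical helper, not something from the HOWTO
script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The 8-byte PNG signature contains both "\r\n" and a bare "\n".
my $png_signature = "\x89PNG\r\n\x1a\n";

# Write $bytes to a fresh temporary file through a handle opened with the
# :crlf layer (imitating a text-mode handle) and return the file size.
sub written_size {
    my ($bytes, $use_binmode) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;
    open my $fh, '>:crlf', $path or die "Cannot open $path: $!";
    binmode $fh if $use_binmode;    # switch off newline translation
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return -s $path;
}

printf "text mode: %d bytes, binary mode: %d bytes\n",
    written_size($png_signature, 0), written_size($png_signature, 1);
```

In text mode every "\n" is expanded to "\r\n", so the file grows and the PNG
header no longer matches what image viewers expect; with binmode the bytes
are written unchanged.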

Lincoln


On Saturday 20 May 2006 20:15, chen li wrote:
> Dear all,
>
>
> I try one script from GraphicsHowTo under Cygwin
> environment(GD and libpng already installed). I type
> this line in Cygwin X window:
>
>
> $ perl render_blast1.pl data1.txt | display -
>
> And here is the result:
>
> display: no decode delegate for this image format
> `/tmp/magick-qKiRPDRS'.
>
> Any idea?
>
>
> Thank you very much,
>
> Li
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From Derek.Fairley at bll.n-i.nhs.uk  Tue May 23 14:39:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Tue, 23 May 2006 15:39:16 +0100
Subject: [Bioperl-l] Bio::Restriction::IO query
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C04019F@bllmail.bll.n-i.nhs.uk>

Hi folks,

I'm new to BioPerl, and struggling to make the Bio::Restriction::*
modules work (using BioPerl-1.4; Perl-5.8.1; Linux-2.4). Specifically,
I'm having some trouble understanding the behaviour of the
Bio::Restriction::IO module. I'm trying to use this to create a
Bio::Restriction::EnzymeCollection object from a local REBASE file
(which is in bairoch-format); this will in turn be passed to a
Bio::Restriction::Analysis object.

The following test script (derived from the Bio::Restriction::IO
perldoc) runs fine:

#! /usr/bin/perl -w
use strict;
use warnings;
use Bio::Restriction::IO;

my $in = Bio::Restriction::IO->new(	-file => "REBASE_file",
						-format =>'Bairoch');
my $collection = $in->read();
print "Number of REs in the collection: ", scalar
$collection->each_enzyme, "\n";

# Note that using -format=>'bairoch' without capitalisation (as shown in the
# perldoc synopsis) throws an exception: Failed to load module
# Bio::Restriction::IO::bairoch...

However... the test script returns the number 532 - the number of
enzymes in the default enzyme set - regardless of the number of enzymes
in the file. A default Bio::Restriction::EnzymeCollection object has
presumably been created (as the 'read()' and 'each_enzyme' methods are
available) but it didn't come from the local file. The result is the
same if the Bio::Restriction::IO->new() method is called with no
arguments - a default EnzymeCollection object is created. It's not clear
to me where this has come from.

My (mis?)understanding was that the default set of enzymes would be
loaded on creation of a new Bio::Restriction::Analysis object (in the
absence of a -enzymes=>... argument). Presumably this is down to my poor
understanding of the BioPerl object model... ;-)

So: how should I create an EnzymeCollection object from file?

Any help or advice would be gratefully received.

PS. Congratulations to the development team for creating a very
impressive and useful open source toolkit.

Derek.


-----------------------------------------
Derek Fairley, Ph.D.
Regional Virus Laboratory,
Kelvin Building,
Royal Victoria Hospital, 
Grosvenor Road,
Belfast,
N. Ireland.
BT12 6BA

Tel. +44 (0)2890 635303


From rowan.mitchell at bbsrc.ac.uk  Tue May 23 14:53:42 2006
From: rowan.mitchell at bbsrc.ac.uk (rowan mitchell (RRes-Roth))
Date: Tue, 23 May 2006 15:53:42 +0100
Subject: [Bioperl-l] Assembly::IO ace output
Message-ID: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>

Hi 

I am very interested in writing ace format files and had assumed that I
would be able to do this with Assembly::IO until I tried it! I see there
has been some correspondence last year on this, but as far as I can see
this is still not implemented in 1.5.1. Is this correct ? Is it planned
to be included; are there modules under development available ?

many thanks

Rowan

===============================================
Dr Rowan Mitchell
Rothamsted Research
Harpenden
Herts AL5 2JQ UK

Tel: +44 (0)1582 763133 x2469
Fax: +44 (0)1582 763010
E-mail: rowan.mitchell at bbsrc.ac.uk
WWW: http://www.rothamsted.bbsrc.ac.uk/
===============================================
Rothamsted Research is a company limited by guarantee, registered in
England under the registration number 2393175 and a not for profit
charity number 802038.


From rfsouza at cecm.usp.br  Tue May 23 20:17:36 2006
From: rfsouza at cecm.usp.br (Robson Francisco de Souza {S})
Date: Tue, 23 May 2006 17:17:36 -0300
Subject: [Bioperl-l] Assembly::IO ace output
In-Reply-To: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
References: <EFDAAE7F4B83D243868A2F25AD8A4B38056578A8@rothe2ksrv1.rothamsted.bbsrc.ac.uk>
Message-ID: <20060523201736.GA28401@cecm.usp.br>

Hi Rowan,

On Tue, May 23, 2006 at 03:53:42PM +0100, rowan mitchell (RRes-Roth) wrote:
> Hi 
> 
> I am very interested in writing ace format files and had assumed that I
> would be able to do this with Assembly::IO until I tried it! I see there
> has been some correspondence last year on this, but as far as I can see
> this is still not implemented in 1.5.1. Is this correct ? Is it planned
> to be included; are there modules under development available ?

As far as I know, there are no plans to add write support to
Bio::Assembly::IO. When I wrote the original modules there was no need
for this so I left it aside.

Best regards,
Robson

> many thanks
> 
> Rowan
> 
> ===============================================
> Dr Rowan Mitchell
> Rothamsted Research
> Harpenden
> Herts AL5 2JQ UK
> 
> Tel: +44 (0)1582 763133 x2469
> Fax: +44 (0)1582 763010
> E-mail: rowan.mitchell at bbsrc.ac.uk
> WWW: http://www.rothamsted.bbsrc.ac.uk/
> ===============================================
> Rothamsted Research is a company limited by guarantee, registered in
> England under the registration number 2393175 and a not for profit
> charity number 802038.
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From lstein at cshl.edu  Tue May 23 20:53:34 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 23 May 2006 16:53:34 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
References: <20060521001501.59274.qmail@web36808.mail.mud.yahoo.com>
	<200605231006.28392.lstein@cshl.edu>
Message-ID: <200605231653.36087.lstein@cshl.edu>

Hi Chen,

It looks to me like you cut and paste the data1.txt file from the web site, 
consequently replacing the tabs with spaces. Please get table1.txt from the 
BioPerl distribution, as instructed in the tutorial.
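
A quick way to check a data file for this problem is sketched below. It
assumes, as the advice above implies, that the fields are meant to be
tab-separated, and that '#' starts a comment line. lines_without_tabs is a
hypothetical helper and the sample rows are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical checker: return the numbers of the data lines that contain no
# tab character. Comment lines (starting with '#') and blank lines are skipped.
sub lines_without_tabs {
    my (@lines) = @_;
    my @bad;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        push @bad, $n unless $line =~ /\t/;
    }
    return @bad;
}

my @sample = (
    "# hit\tscore\tstart\tend",
    "hsp1\t381\t2\t200",         # tab-separated: fine
    "hsp2    210    2    210",   # tabs lost in copy and paste
);
print "Lines without tabs: @{[ lines_without_tabs(@sample) ]}\n";
```

To check a real file, read its lines into an array and pass them in; any line
number reported probably lost its tabs when the text was copied from a web
page.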

Best,

Lincoln

On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> Hi,
>
> It is possible that your version of display can't handle PNG images. Try
> saving the output as a file and then opening it in another image program:
>
> 	perl render_blast1.pl data1.txt > data1.png
>
> Another thing to watch out for is that, depending on what version of Perl
> you're using, you may have to insert this statement into the
> render_blast1.pl script (somewhere near the top):
>
> 	binmode STDOUT;
>
> Lincoln
>
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment(GD and libpng already installed). I type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From chen_li3 at yahoo.com  Tue May 23 21:46:16 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 14:46:16 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231653.36087.lstein@cshl.edu>
Message-ID: <20060523214616.15131.qmail@web36813.mail.mud.yahoo.com>

Dear Dr. Stein,

Thank you so much. I followed your suggestions and
downloaded the code from the BioPerl CVS website. Now
everything is working.


Li 


--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi Chen,
> 
> It looks to me like you cut and paste the data1.txt
> file from the web site, 
> consequently replacing the tabs with spaces. Please
> get table1.txt from the 
> BioPerl distribution, as instructed in the tutorial.
> 
> Best,
> 
> Lincoln
> 
> On Tuesday 23 May 2006 10:06, Lincoln Stein wrote:
> > Hi,
> >
> > It is possible that your version of display can't
> handle PNG images. Try
> > saving the output as a file and then opening it in
> another image program:
> >
> > 	perl render_blast1.pl data1.txt > data1.png
> >
> > Another thing to watch out for is that, depending
> on what version of Perl
> > you're using, you may have to insert this
> statement into the
> > render_blast1.pl script (somewhere near the top):
> >
> > 	binmode STDOUT;
> >
> > Lincoln
> >
> > On Saturday 20 May 2006 20:15, chen li wrote:
> > > Dear all,
> > >
> > >
> > > I try one script from GraphicsHowTo under Cygwin
> > > environment(GD and libpng already installed). I
> type
> > > this line in Cygwin X window:
> > >
> > >
> > > $ perl render_blast1.pl data1.txt | display -
> > >
> > > And here is the result:
> > >
> > > display: no decode delegate for this image
> format
> > > `/tmp/magick-qKiRPDRS'.
> > >
> > > Any idea?
> > >
> > >
> > > Thank you very much,
> > >
> > > Li
> > >
> > >
> > >
> __________________________________________________
> > > Do You Yahoo!?
> > > Tired of spam?  Yahoo! Mail has the best spam
> protection around
> > > http://mail.yahoo.com
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > >
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 




From chen_li3 at yahoo.com  Tue May 23 22:59:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 15:59:46 -0700 (PDT)
Subject: [Bioperl-l] How to download sequence files either in EMBL format
Message-ID: <20060523225946.2118.qmail@web36805.mail.mud.yahoo.com>

Hi all,

I need to download the sequence for a gene. I went to the
NCBI website, found the gene of interest, and downloaded the
file in GenBank format (saved as sequence.genbank). But
to my surprise this so-called GenBank format file
doesn't contain many features, such as exons, compared
to the one in Ensembl.

My question: where can I download this sequence file
in EMBL format? It looks like the one in EMBL might
contain other information, such as exons.

Thank you very much,

Li



From osborne1 at optonline.net  Wed May 24 14:33:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 24 May 2006 10:33:16 -0400
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <20060523132706.57245.qmail@web36811.mail.mud.yahoo.com>
Message-ID: <C099E6EC.88F0%osborne1@optonline.net>

Li,

The Graphics HOWTO talks about this Windows workaround in _four_ different
places, it's impossible to miss if you read it from start to finish. This is
what one should do if one wants to use these modules and one is a novice.
Example:

Important! Remember that if you are on a Windows platform, you need to put
STDOUT into binary mode so that the PNG file does not go through Window's
carriage return/linefeed transformations. Before the final print statement,
write binmode(STDOUT).

Brian O.


On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com> wrote:

> BTW I copy  this script from GraphicsHowTo on Bioperl
> website and only one line contains print statement,
> which is 'print $panel->png'. 


From chen_li3 at yahoo.com  Wed May 24 16:17:15 2006
From: chen_li3 at yahoo.com (chen li)
Date: Wed, 24 May 2006 09:17:15 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <C099E6EC.88F0%osborne1@optonline.net>
Message-ID: <20060524161715.45141.qmail@web36807.mail.mud.yahoo.com>

Thanks, but Dr. Stein already helped me figure out
what is going on: I should have copied the source
code for the examples from CVS instead of cutting and
pasting from the HOWTO tutorial. Sorry for any
inconvenience.

Li

--- Brian Osborne <osborne1 at optonline.net> wrote:

> Li,
> 
> The Graphics HOWTO talks about this Windows
> workaround in _four_ different
> places, it's impossible to miss if you read it from
> start to finish. This is
> what one should do if one wants to use these modules
> and one is a novice.
> Example:
> 
> Important! Remember that if you are on a Windows
> platform, you need to put
> STDOUT into binary mode so that the PNG file does
> not go through Window's
> carriage return/linefeed transformations. Before the
> final print statement,
> write binmode(STDOUT).
> 
> Brian O.
> 
> 
> On 5/23/06 9:27 AM, "chen li" <chen_li3 at yahoo.com>
> wrote:
> 
> > BTW I copy  this script from GraphicsHowTo on
> Bioperl
> > website and only one line contains print
> statement,
> > which is 'print $panel->png'. 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From ULNJUJERYDIX at spammotel.com  Thu May 25 01:59:36 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 25 May 2006 09:59:36 +0800
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
Message-ID: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>

Hi
Thanks for the help offered thus far!
I am trying to annotate TFBS on a -1000 to +50 bp promoter sequence using
BioPerl, so I was asked to make the ruler numbering negative (e.g. -1000). Is
there any way at all to do this in BioPerl without changing the .pm file?

thanks guys..
kevin


From cjfields at uiuc.edu  Thu May 25 16:43:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 11:43:37 -0500
Subject: [Bioperl-l] Problems with Unflattener.pm
In-Reply-To: <729FFBBD-955B-4689-8A27-66733E81431C@genetics.utah.edu>
Message-ID: <009d01c6801a$5f75d2a0$15327e82@pyrimidine>

I was able to reproduce this using WinXP and bioperl-live.  Seems to get
caught up in the loop during recursion: debugging shows it is unable to get
past 'find_best_matches: (/15)'.  There are lots of unmatched pairs here
with this sequence, so could that be the problem?  I'm not terribly familiar
with Unflattener...

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Barry Moore
> Sent: Monday, May 22, 2006 8:00 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Problems with Unflattener.pm
> 
> Hi All,
> 
> NT_113910 appears to throw Bio::SeqFeature::Tools::Unflattener into
> an infinite recursive loop.  The trouble occurs in the method
> find_best_matches between lines 2258 and 2281, and in particular the
> loop is perpetuated by line 2273.   NT_113910 has a fairly complex
> feature table, but I have as yet been unable to figure out why
> this loop is not exiting properly.  This has been submitted to
> bugzilla, but I'll post here so it gets documented on the list also.
> Any suggestions from Chris or others would be greatly appreciated.
> 
> This problem can be recreated as follows:
> 
> Grab NT_113910 from genbank.
> bp_fetch.pl -fmt genbank net::genbank:NT_113910 > NT_113910.gbk
> 
> Pass NT_113910.gbk on the command line to the attached script.
> 
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> use Bio::SeqIO;
> use Bio::SeqFeature::Tools::Unflattener;
> 
> my $file = shift;
> 
> # generate an Unflattener object
> my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;
> #$unflattener->verbose(1);
> 
> # first fetch a genbank SeqI object
> my $seqio =
>      Bio::SeqIO->new(-file   => $file,
>                      -format => 'GenBank');
> my $out =
>      Bio::SeqIO->new(-format => 'asciitree');
> while (my $seq = $seqio->next_seq()) {
> 
>          # get top level unflattened SeqFeatureI objects
>          $unflattener->unflatten_seq(-seq       => $seq,
>                                      -use_magic => 1);
>          $out->write_seq($seq);
> }
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu May 25 19:44:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 14:44:01 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <3607997C-DAD4-4E0E-A919-7D9212AC6D50@duke.edu>
Message-ID: <00a101c68033$95606dd0$15327e82@pyrimidine>

This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this).  

The UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html) has
this tidbit:
----------------------------------------------------------
 UniProtKB release 7.0 of 07-Feb-2006

    Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.

----------------------------------------------------------
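
As a rough sketch of what reading the new layout might involve, the three DT
lines quoted above can be picked apart as follows. This is not the swiss.pm
implementation: parse_dt_lines and its hash keys are hypothetical names, and
the input is the Swiss-Prot example from the release notes.

```perl
use strict;
use warnings;

# Hypothetical helper, not the swiss.pm code: collect the fields carried by
# the three new-style DT lines of one entry into a hash reference.
sub parse_dt_lines {
    my (@lines) = @_;
    my %info;
    for my $line (@lines) {
        # DT   DD-MMM-YYYY, <description>.
        next unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4}),\s+(.+?)\.?\s*$/;
        my ($date, $what) = ($1, $2);
        if ($what =~ /^integrated into (\S+)$/) {
            @info{qw(integrated_date database)} = ($date, $1);
        }
        elsif ($what =~ /^sequence version (\d+)$/) {
            @info{qw(sequence_date seq_version)} = ($date, $1);
        }
        elsif ($what =~ /^entry version (\d+)$/) {
            @info{qw(entry_date entry_version)} = ($date, $1);
        }
    }
    return \%info;
}

my $info = parse_dt_lines(
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
);
print "sequence version $info->{seq_version}, ",
      "entry version $info->{entry_version}\n";
```

Old-style DT lines do not match the pattern and are simply skipped here, so a
real parser would still need a fallback for them.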

We should probably explain on the swissprot wiki page that the format is in a
state of flux at the moment.  I've added this tidbit to the bug page (#2003)
as well.

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Monday, May 22, 2006 9:04 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> We ask that people post patches to the bugzilla as an attachment to
> the bugzilla so we can track what and why the bug was that the patch
> fixes.
> 
> I am not totally sure this patch works because it seems like we need
> to strip out more information now from the DT line if the $date
> actually contains more information than just the date.
> 
> If you would go ahead and create a bug in bugzilla for  this (http://
> bugzilla.open-bio.org) this sort of conversation can be tracked to
> the bug.
> 
> If any of this is unclear please let us know - I thought we had put
> some pages up about this sort of thing on the wiki but maybe they
> need to be expanded.
> 
> -jason
> On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:
> 
> > I have a patch that seems to work but I'm not familiar with the
> > proper method to
> > "provide" it.  How do I go about that?
> >
> > The patch is pretty simple, it just parses the sequence version out
> > of the date
> > line where it now hides:
> >
> >          #date
> >          elsif( /^DT\s+(.*)/ ) {
> >            my $date = $1;
> > +
> > +          if ($date =~ /sequence version (\d+)/i) {
> > +              $params{'-seq_version'} ||= $1;
> > +          }
> > +
> >            $date =~ s/\;//;
> >            $date =~ s/\s+$//;
> >            push @{$params{'-dates'}}, $date;
> >          }
> >
> > By the way, what is the difference between Bio::Seq::version and
> > Bio::Seq::RichSeq::seq_version?
> >
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From miker at biotiquesystems.com  Tue May 23 01:51:10 2006
From: miker at biotiquesystems.com (Michael Rogoff)
Date: Mon, 22 May 2006 18:51:10 -0700
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <B62A5429-083F-4B93-87EF-0F5DCD4033FE@duke.edu>
Message-ID: <003301c67e0b$5dd44410$c100a8c0@mike>

I have a patch that seems to work but I'm not familiar with the proper method to
"provide" it.  How do I go about that?

The patch is pretty simple, it just parses the sequence version out of the date
line where it now hides:

         #date
         elsif( /^DT\s+(.*)/ ) {
           my $date = $1;
+
+          if ($date =~ /sequence version (\d+)/i) {
+              $params{'-seq_version'} ||= $1;
+          }
+
           $date =~ s/\;//;
           $date =~ s/\s+$//;
           push @{$params{'-dates'}}, $date;
         }

By the way, what is the difference between Bio::Seq::version and
Bio::Seq::RichSeq::seq_version?


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Monday, May 22, 2006 6:37 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> 
> Sounds like a "missing feature" =)
> 
> AFAIK the module was only written for swissprot files.  It is
> possible there have been changes in the format that have not been
> tracked to the current code.  We'd certainly appreciate someone
> testing it out as versions evolve.  If you submit a bug to bugzilla
> with version of bioperl and example files you can track when a fix is
> in.  We of course appreciate anyone's efforts to provide a patch as
> most bugs get fixed of late when someone gets "itchy" enough to fix
> them.
> 
> -jason
> 
> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> 
> >
> > As best as I can tell, using Bio::SeqIO to parse a uniprot file
> > ignores the sequence version, and calling seq_version() on the
> > resulting RichSeq object returns undef.
> >
> > It looks like swiss.pm is trying to parse the version out of the SV
> > line, which apparently doesn't exist any more?  The sequence
> > version(s) are now specified as part of the Date (DT) lines.
> >
> > Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >
> > Thanks for any help ...
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 


From chen_li3 at yahoo.com  Tue May 23 15:48:46 2006
From: chen_li3 at yahoo.com (chen li)
Date: Tue, 23 May 2006 08:48:46 -0700 (PDT)
Subject: [Bioperl-l] problems iwth Bio::graphics module
In-Reply-To: <200605231006.28392.lstein@cshl.edu>
Message-ID: <20060523154846.70831.qmail@web36815.mail.mud.yahoo.com>

Dear Dr. Stein,

I have it partially working after adding this line (under Cygwin):

print STDOUT $panel->png;

I can now produce an image that other programs can open, but it is only
partially working because the image is not exactly the same as the one
shown on the website. Enclosed is the image I get.
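
For reference, the two suggestions from Lincoln quoted below (write the
output to a file, and call binmode on the handle) can be sketched roughly
as follows. This is only an illustration in core Perl: the byte string is
a stand-in for what $panel->png would return, and the file name
binmode_demo.png is made up for the example.

```perl
use strict;
use warnings;

# Stand-in for $panel->png: the 8-byte PNG signature. It contains
# "\r\n" and "\n" bytes, which a text-mode filehandle may rewrite on
# platforms that translate line endings.
my $png_bytes = "\x89PNG\r\n\x1a\n";

my $file = 'binmode_demo.png';    # hypothetical output name
open(my $out, '>', $file) or die "Cannot write $file: $!";
binmode $out;                     # turn off newline translation
print {$out} $png_bytes;
close($out) or die "Cannot close $file: $!";

# Read the file back in binary mode and compare it with what we wrote.
open(my $in, '<', $file) or die "Cannot read $file: $!";
binmode $in;
my $read_back = do { local $/; <$in> };
close $in;

print $read_back eq $png_bytes ? "bytes intact\n" : "bytes altered\n";
```

When the script prints to STDOUT instead of a named file, the
equivalent step is the "binmode STDOUT;" line Lincoln mentions.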

Thank you,

Li

--- Lincoln Stein <lstein at cshl.edu> wrote:

> Hi,
> 
> It is possible that your version of display can't
> handle PNG images. Try 
> saving the output as a file and then opening it in
> another image program:
> 
> 	perl render_blast1.pl data1.txt > data1.png
> 
> Another thing to watch out for is that, depending on
> what version of Perl 
> you're using, you may have to insert this statement
> into the render_blast1.pl 
> script (somewhere near the top):
> 
> 	binmode STDOUT;
> 
> Lincoln
> 
> 
> On Saturday 20 May 2006 20:15, chen li wrote:
> > Dear all,
> >
> >
> > I try one script from GraphicsHowTo under Cygwin
> > environment (GD and libpng already installed). I type
> > this line in Cygwin X window:
> >
> >
> > $ perl render_blast1.pl data1.txt | display -
> >
> > And here is the result:
> >
> > display: no decode delegate for this image format
> > `/tmp/magick-qKiRPDRS'.
> >
> > Any idea?
> >
> >
> > Thank you very much,
> >
> > Li
> >
> >
> 
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING, 
> PLEASE CONTACT MY ASSISTANT, 
> SANDRA MICHELSEN, AT michelse at cshl.edu
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: im1
Type: image/x-png
Size: 2423 bytes
Desc: 2615755531-im1
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060523/6870f840/attachment-0004.bin>

From cjfields at uiuc.edu  Fri May 26 01:28:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 25 May 2006 20:28:14 -0500
Subject: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
In-Reply-To: <003301c67e0b$5dd44410$c100a8c0@mike>
References: <003301c67e0b$5dd44410$c100a8c0@mike>
Message-ID: <D422B7D5-C92D-436A-8385-01CFD306DFA8@uiuc.edu>

This patch works only for the recent change in swissprot seq format  
for sequence versions on the DT line.  I checked it out vs the test  
data provided with bioperl (t\data\swiss.dat).  I did manage to get  
it working for both old and new using a modification to your patch  
but there's another issue; using $seq->get_dates, which should only  
show dates, shows the entire line (date and version info).  Jason  
mentioned that there needs to be a better way to address this which  
I'm looking into.
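
To make the two issues concrete, here is a minimal sketch of splitting a
DT line into the date and the sequence version, so that the date can be
stored on its own. This is not the code in swiss.pm or the modified
patch; parse_dt_line is a hypothetical helper, and the sample DT lines
are illustrative examples of the old and new layouts rather than lines
taken from t\data\swiss.dat.

```perl
use strict;
use warnings;

# Split a DT line into (date, sequence version). The version is undef
# when the line does not carry one (old layout, or an "entry version"
# line in the new layout).
sub parse_dt_line {
    my ($line) = @_;
    return unless $line =~ /^DT\s+(.*)/;
    my $rest = $1;
    $rest =~ s/\s+$//;

    # The date is the leading DD-MON-YYYY token; everything after it
    # is release or version information.
    my ($date)        = $rest =~ /^(\d{2}-[A-Z]{3}-\d{4})/;
    my ($seq_version) = $rest =~ /sequence version (\d+)/i;
    return ($date, $seq_version);
}

for my $line (
    'DT   01-NOV-1997, sequence version 2.',             # new layout
    'DT   15-DEC-2004, entry version 5.',                # new layout
    'DT   01-NOV-1997 (Rel. 35, Last sequence update)',  # old layout
) {
    my ($date, $sv) = parse_dt_line($line);
    printf "%-11s seq_version=%s\n", $date, defined $sv ? $sv : 'undef';
}
```

Keeping only the captured date for the dates list would address the
get_dates symptom described above. How the old layout should map to a
sequence version is left open here.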

Chris

On May 22, 2006, at 8:51 PM, Michael Rogoff wrote:

> I have a patch that seems to work but I'm not familiar with the  
> proper method to
> "provide" it.  How do I go about that?
>
> The patch is pretty simple, it just parses the sequence version out  
> of the date
> line where it now hides:
>
>          #date
>          elsif( /^DT\s+(.*)/ ) {
>            my $date = $1;
> +
> +          if ($date =~ /sequence version (\d+)/i) {
> +              $params{'-seq_version'} ||= $1;
> +          }
> +
>            $date =~ s/\;//;
>            $date =~ s/\s+$//;
>            push @{$params{'-dates'}}, $date;
>          }
>
> By the way, what is the difference between Bio::Seq::version and
> Bio::Seq::RichSeq::seq_version?
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri May 26 14:38:29 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 26 May 2006 10:38:29 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler have
	negative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <200605261038.30380.lstein@cshl.edu>

Hi,

For some reason I didn't see the first posting on this. In current bioperl 
live, the ruler can have negative numberings - I use this routinely. You need 
to create a feature that starts in negative coordinates. What is happening to 
you when you try this?
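
As a rough illustration of the coordinate bookkeeping only (not of the
Bio::Graphics API), the sketch below converts positions in a promoter
sequence to coordinates relative to a reference point. It assumes that
sequence index 1 corresponds to -1000 and that position 0 is counted;
to_relative and to_index are hypothetical helper names, and the example
binding-site positions are made up.

```perl
use strict;
use warnings;

my $UPSTREAM = 1000;    # bases upstream of the reference point (assumed)

# 1-based index in the promoter sequence -> coordinate relative to the
# reference point (index 1 becomes -1000).
sub to_relative {
    my ($index) = @_;
    return $index - $UPSTREAM - 1;
}

# Inverse mapping: relative coordinate -> 1-based sequence index.
sub to_index {
    my ($relative) = @_;
    return $relative + $UPSTREAM + 1;
}

# A hypothetical binding site at sequence positions 251..262.
my ($start, $end) = map { to_relative($_) } (251, 262);
print "feature start=$start end=$end\n";
```

A feature created with a negative start and end computed this way is
the kind of feature described above as giving negative ruler numbering.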

Lincoln

On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> Hi
> thanks for the help offered thus far!
> sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> bioperl. therefore i was asked to make the numberings as such (-1000) is
> there any way at all to do this in bioperl without changing the .pm file?
>
> thanks guys..
> kevin
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jelenaob at gmail.com  Fri May 26 16:47:05 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 09:47:05 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
Message-ID: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>

Hi there,

I have tried loading an enzyme list from the REBASE file bairoch.605 using
Bio::Restriction::IO.

1. For some reason the number of enzymes in the list is always 532,
which is the default set of enzymes in the enzyme collection.

Is there any known issue with this module or a workaround?

And here is the code I have been using:

my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-format=>"Bairoch")
|| die "can't load the file bairoch.605: $!";
my $enzymes = $re_in->read;
print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";

2. The other problem is that using the lower-case format name throws
an exception, but when the "B" is capitalized it is OK.
I assume it cannot load the file and does not initialize the enzyme
collection properly.

Can't call method "each_enzyme" on an undefined value at
.../cgi-bin/seq-load.pl line 51.

Any thoughts?


Thanks in advance,


Jelena Obradovic
jelenaob at gmail.com


From cjfields at uiuc.edu  Fri May 26 19:27:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 14:27:13 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
Message-ID: <002601c680fa$644635a0$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
> Sent: Friday, May 26, 2006 11:47 AM
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Hi there,
> 
> I have tried loading enzyme list from a file REBASE bairoch.605 using
> Bio::Restriction::IO;
> 
> 1. But for some reason the number of enzymes in the list is always 532
> which is a default set of enzymes in enzyme collection.
> 
> Is there any known issue with this module or a workaround?
> 
> And here is the code I have been using:
> 
> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
> format=>"Bairoch")
> || die "can't load the file bairoch.605: $!";
> my $enzymes = $re_in->read;
> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
 
my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "Bairoch");

should be 

my $re_in = Bio::Restriction::IO->new(-file   => "bairoch_605.dat",
                                      -format => "bairoch");

Note the case change for the format; this is noted in the bug report you
submitted earlier.  Bio::Restriction::IO works similarly to Bio::SeqIO (i.e.
requires a specific format, which I believe is case-sensitive).  Judging by
the modules in Bio/Restriction/IO directory, looks like the
Bio::Restriction::IO format should match one of the following formats:
bairoch, itype2, withrefm, and you can also build your own if needed using
the previous as examples and implementing Bio::Restriction::IO::base.

> 2. The other problem is when trying to use format that is lower-case
> it throws an exception, but when "B" is capitalized it is ok.
> I assume it cannot load a file and does not initialize enzyme
> collection properly.
> 
> Can't call method "each_enzyme" on an undefined value at
> .../cgi-bin/seq-load.pl line 51.

My guess?  The reason it works with an uppercase ('Bairoch') is that it
can't find the module and uses the default set of enzymes as a fallback.
The exception that you reported when you use lowercase ('bairoch') is real
and I reported it as a bug (there are a few I found in that module).

You might want to try using one of the other formats if you can get the
files in the right format from REBASE.  I'm looking into the bugs
specifically associated with Bio::Restriction::IO::bairoch.

> Any thoughts?
> 
> 
> Thanks in advance,
> 
> 
> Jelena Obradovic
> jelenaob at gmail.com
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From osborne1 at optonline.net  Fri May 26 19:43:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 26 May 2006 15:43:18 -0400
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <C09CD296.8961%osborne1@optonline.net>

Chris,

SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
should work). This is what the documentation says and what the code seems to
suggest. This is probably what the Restriction modules should do as well.
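
A minimal sketch of what case-insensitive handling could look like is
below. It is not the actual Bio::SeqIO or Bio::Restriction::IO code:
driver_class_for is a hypothetical helper, and the list of formats is
simply the one named earlier in this thread (bairoch, itype2, withrefm).

```perl
use strict;
use warnings;

# Formats mentioned in this thread as having driver modules.
my %known_format = map { $_ => 1 } qw(bairoch itype2 withrefm);

# Map a user-supplied format name to a driver class name, ignoring
# case, and fail loudly on anything unknown instead of falling back.
sub driver_class_for {
    my ($format) = @_;
    my $normalized = lc $format;
    die "Unknown format '$format'\n" unless $known_format{$normalized};
    return "Bio::Restriction::IO::$normalized";
}

print driver_class_for($_), "\n" for qw(bairoch Bairoch BAIROCH);
```

Lower-casing the name before looking up the module means 'Bairoch' and
'bairoch' resolve to the same driver, and dying on an unknown name
avoids silently ending up with a default enzyme set.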

Brian O.


From cjfields at uiuc.edu  Fri May 26 20:21:03 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 15:21:03 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <C09CD296.8961%osborne1@optonline.net>
Message-ID: <002701c68101$e9432540$15327e82@pyrimidine>

Okay, my bad.  Having the format be case-insensitive makes sense and is
probably an easy fix, but there seem to be more serious issues with the
Bio::Restriction::IO modules at the moment.  None have implemented write
methods though POD implies they work:

SYNOPSIS

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

and no tests exist for Bio::Restriction::IO::bairoch yet.  In fact, the
tests are pretty confusing; when did we allow this syntax: '-format => 8'?
Anyway, I'm muddling my way through this and will probably write something
up for the project priority list if I can't work this bug out.  

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Friday, May 26, 2006 2:43 PM
> To: Chris Fields; 'Jelena Obradovic'; Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file
> 
> Chris,
> 
> SeqIO's arguments are case-insensitive (e.g. 'fasta', 'Fasta', 'FASTA'
> should work). This is what the documentation says and what the code seems
> to
> suggest. This is probably what the Restriction modules should do as well.
> 
> Brian O.
> 
> 


From andreas.bender at complife.org  Fri May 26 14:50:03 2006
From: andreas.bender at complife.org (Andreas Bender (CompLife'06))
Date: Fri, 26 May 2006 10:50:03 -0400
Subject: [Bioperl-l] Bioperl-based Applications for "Free Software" Session?
Message-ID: <e83118520605260750w3e66286bmbd6a14be3d2299d6@mail.gmail.com>

Dear All,

Did anyone of you implement some cool programs/tools using Bioperl? Or
is there someone from the Bioperl core team who wants to present
Bioperl itself at our conference? We are holding a "free software"
session (free at least as in free beer, ideally also open source, some
GNU-type license) at our "Computational Life Sciences" Conference in
Cambridge/UK later this year and you are warmly welcome to present
your software there. Please contact me directly or visit the website
in case of any questions.

Enjoy the weekend,
Andreas


                                  Call for Contributions
==================================================
               LIFE SCIENCE FREE SOFTWARE SESSION

          held at CompLife 2006 (http://www.complife.org)
     in Cambridge, United Kingdom, on September 27 - 29, 2006
==================================================
In the last years more and more free and open source software has been
developed for chemo- and bioinformatics, molecular modelling or other
Life Science applications, but many of the programs are not well
known. During the CompLife 2006 conference we will organize a special
session dedicated to this type of free software. The demo session will
be preceded by a short session having room for brief introductory
presentations whereas the demo session itself will allow attendees to
see the tools in action. Authors of free software will have the
opportunity to present their program to the CompLife audience which
will consist of researchers and users from computer science, biology,
chemistry and everything in between.

In case you are interested in the free software session, send us an
email at fss at complife.org and briefly describe your program and how
you intend to present it at the conference (1-2 pages max - please
include URL to downloadable version where available). The only
restrictions are that the program must be freely available for
everyone or even open source and that it must be related to Life
Science applications. The deadline for these proposals is June 16th,
2006. In mid-July we will notify you if your software demo was
accepted.
************************

-- 
Computational Life Sciences '06 Cambridge/UK, 27-29 September 2006:
Visit http://www.complife.org for more information!

Andreas Kieron Patrick Bender - http://www.andreasbender.de
Novartis Institutes for BioMedical Research, Cambridge/MA


From cjfields at uiuc.edu  Fri May 26 21:19:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 26 May 2006 16:19:08 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <002b01c6810a$06642400$15327e82@pyrimidine>

The POD documentation is a bit misleading for Bio::Restriction::IO.  Brian's
right, there needs to be more flexibility with the case for the formats
used.  I found a few other odd things as well which I may file bug reports
for.  Looks like another post for the project priority list.

 
Chris

 
  _____  

From: Jelena Obradovic [mailto:jobradovic at gmail.com] 
Sent: Friday, May 26, 2006 3:56 PM
To: Chris Fields
Cc: Jelena Obradovic; Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Restriction::IO and REBASE file

 
Hi guys, I tried with the other formats, and it works fine with "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena

-- 
Jelena Obradovic
Email: jobradovic at gmail.com


From jay at jays.net  Sat May 27 16:47:27 2006
From: jay at jays.net (Jay Hannah)
Date: Sat, 27 May 2006 11:47:27 -0500
Subject: [Bioperl-l] "Project OpenLab" (working title)
Message-ID: <4478829F.5030508@jays.net>

Hola --

We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)

   "Project OpenLab":
   http://omaha.pm.org/kwiki/?BioPerl

- Does any such project already exist? 
- If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
- I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
- I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
- I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
- I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.

Thanks for your time,

j


From fernan at iib.unsam.edu.ar  Sat May 27 22:30:44 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sat, 27 May 2006 19:30:44 -0300
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <20060527223044.GA40583@iib.unsam.edu.ar>

+----[ Jay Hannah <jay at jays.net> (27.May.2006 15:15):
|
| Hola --

Hola!

| We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
| 
|    "Project OpenLab":
|    http://omaha.pm.org/kwiki/?BioPerl
| 
| - Does any such project already exist? 

mmm ... maybe ... both GUS (Genomics Unified Schema:
gusdb.org, though not developed around bioperl) and GMOD
(Generic Model Organism Database: gmod.org) provide you with 
i) RDBMS storage
ii) a Perl object layer
iii) a web app framework

Though certainly overkill for the needs you describe
in the wiki, they can be customized to work in the way you
describe or at least serve as a guide.

| - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 

Have you considered Perl Catalyst? It has the benefits of
allowing you to work with bioperl modules naturally (it's
Perl!), gives you a choice of templating toolkits (Template
Toolkit, Mason, among others), and will provide you with an
almost ready-to-go controller/URL dispatcher.

| - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
| - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
| - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
| - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
| 
| Thanks for your time,
| 
| j
|
+----]

Good luck,

Fernan


From epsteinj at mail.nih.gov  Fri May 26 18:46:32 2006
From: epsteinj at mail.nih.gov (Epstein, Jonathan A (NIH/NICHD) [E])
Date: Fri, 26 May 2006 14:46:32 -0400
Subject: [Bioperl-l] URGENT: Bio::Graphics::Panel make the ruler
	havenegative (-) position numbering imagemap making
In-Reply-To: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
Message-ID: <42504F69898FE546B3F0238C9BD032750915F8@NIHCESMLBX7.nih.gov>

While this is being discussed and we have Lincoln's attention; in example 4 on the Biographics Howto:
   http://stein.cshl.org/genome_informatics/BioGraphics/Graphics-HOWTO.html
how can one assign directional arrows to the graded segments which represent the BLAST hits?  I.e., is there a glyph type which is both an 'arrow' and a 'graded_segment'?  What other techniques do you recommend for associating directionality with these hits?

Thanks&regards,

Jonathan


From jobradovic at gmail.com  Fri May 26 20:55:35 2006
From: jobradovic at gmail.com (Jelena Obradovic)
Date: Fri, 26 May 2006 13:55:35 -0700
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <002601c680fa$644635a0$15327e82@pyrimidine>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
Message-ID: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>

Hi guys, I tried with the other formats, and it works fine with "withrefm"
format but not with "withref".

Thanks a lot for your response.

Cheers,

Jelena


-- 
Jelena Obradovic
Email: jobradovic at gmail.com


From gad14 at cornell.edu  Fri May 26 20:02:33 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Fri, 26 May 2006 16:02:33 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
Message-ID: <44775ED9.4020208@cornell.edu>

Hi,

I'm running local blast with Bio::Tools::Run::StandAloneBlast. 
Everything seems to work ok up to the point of accessing the results. I 
am able to print the results but when I try to do more than one thing 
with the result, nothing is returned for the second activity..

I'd like to first sort the results into groups of results that hit the 
db seq once, twice, three times, etc - where the results are stored as 
SeqFeature objects in temporary arrays whose contents are printed 
sequentially to stdout when the whole sort is complete.

Secondly, I need to print the results in Hit Table (i.e. -m 8) format to 
stdout.

If I've sorted the results the sorted-results will print to screen, 
however when I try to print the Hit Table results nothing is returned, 
as if the blast results have evaporated... and vice versa: if I comment 
out the part where I point my sorting subroutine to the blast results 
reference, my hit table results suddenly print to screen. It's almost 
like the reference to the SearchIO obj that holds the StandAloneBlast 
results is lost after one use?? (I'm beginning to think there is 
something naive about the way I'm using references?..)


Here's an abbreviated version of my code:


my $ref_seq_objs; # ref to array of Sequence obj's
my $genome_seq; # fasta containing 1 genomic sequence

my @params = ('program' => 'blastn',
	       'database' => $genome_seq,
                 );
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

my $blast_report = $factory->blastall($ref_seq_objs); #OK

#######
### the following 2 actions seem to be mutually exclusive.
# 1) sort results into 1-hitter, 2-hitter, etc. groups of
# SeqFeature objs stored in arrays. arrays are then printed
# to stdout
&sort_results($blast_report);

# 2) print blast results
&print_blast_results($blast_report);
#######


sub print_blast_results{
   my $report = shift;
   while(my $result = $report->next_result()){
     while(my $hit = $result->next_hit()){
       while(my $hsp = $hit->next_hsp()){
         my $q_name = $result->query_name;   # query ID from the enclosing result
         print join(", ",$q_name,$hit->name,$hsp->bits)."\n";
       }
     }
   }
}


I'm about to lose my mind on this... any assistance appreciated!

Thanks,
Genevieve


From rvosa at sfu.ca  Sun May 28 07:43:23 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sun, 28 May 2006 00:43:23 -0700
Subject: [Bioperl-l] "Project OpenLab" (working title)
In-Reply-To: <4478829F.5030508@jays.net>
References: <4478829F.5030508@jays.net>
Message-ID: <4479549B.5030202@sfu.ca>

The TreeBaseII team (part of the cipres project: http://www.phylo.org) 
are working on a lab database system for storage of intermediate 
calculation results and data (sequence alignments, trees, taxon sets). I 
think what you're discussing is a bit more molecular and less 
phylogenetic, but it does sound similar in spirit.

Rutger

Jay Hannah wrote:
> Hola --
>
> We've been kicking around this idea for a few months now. I'm threatening to start coding. Once I do I might not sleep for a few weeks so I thought I'd solicit feedback now. :)
>
>    "Project OpenLab":
>    http://omaha.pm.org/kwiki/?BioPerl
>
> - Does any such project already exist? 
> - If there's no other obvious choice already bent to BioPerl / BioPerl DB / BioSQL, I'll probably be writing the web framework in Perl's Template Toolkit. The server is Linux, Apache, mySQL (BioPerl DBs). 
> - I'll be using BioPerl objects for the persistence layer as much as possible. Where not possible I'll ask this list about my patches/additions/ugly hackery.
> - I'll be discussing my back office tables like "users" that don't belong in bioperl-db; and my questions about new tables that might belong there on the BioSQL-l mailing list.
> - I'm not a computer language zealot (usually), so I'm open to out-of-the-box ideas from anyone.
> - I'm a biology newb with a long Perl/database/web/e-commerce background, so please feel free to point out any bio idiocy I engage in.
>
> Thanks for your time,
>
> j
>
>
>
>
>
>   

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Sun May 28 13:55:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 28 May 2006 08:55:47 -0500
Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
In-Reply-To: <286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
References: <5042a62b0605260947t486447adt2720e8ef8a464e2a@mail.gmail.com>
	<002601c680fa$644635a0$15327e82@pyrimidine>
	<286f332a0605261355o5a1ff9bas555fdd3913e1cd75@mail.gmail.com>
Message-ID: <EA78F27A-074E-4C9D-AC70-27D4CC20F8C4@uiuc.edu>

Again, it's b/c 'withrefm' is a valid Restriction::IO module and  
'withref' is not.  Similar to the case issue you saw before with  
'bairoch.'  Making this more lenient would help but there are more  
serious issues with these modules that need to be addressed...

http://www.bioperl.org/wiki/Project_priority_list#Restriction_Enzymes
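
Until the format lookup is made more lenient, one way to avoid the silent fallback is to check the format name before handing it to the constructor. This is only a sketch: the helper `checked_format` is hypothetical, the list of names is taken from this thread rather than from the module itself, and the file name in the usage comment is made up.

```perl
use strict;
use warnings;

# Format names mentioned in this thread; treat the list as an assumption
# about which parser modules exist in Bio/Restriction/IO in your install.
my %KNOWN_FORMAT = map { $_ => 1 } qw(base bairoch itype2 withrefm);

# Hypothetical helper: returns the lowercased format name, or dies with
# a readable message instead of letting an unknown name slip through.
sub checked_format {
    my ($format) = @_;
    my $name = lc $format;
    die "Unknown Bio::Restriction::IO format '$format'\n"
        unless $KNOWN_FORMAT{$name};
    return $name;
}

# Usage (needs BioPerl, so it is left as a comment here):
#   use Bio::Restriction::IO;
#   my $re_in = Bio::Restriction::IO->new(
#       -file   => 'withrefm.605',              # hypothetical file name
#       -format => checked_format('withrefm'),
#   );
#   my $enzymes = $re_in->read;
```

With this in place, 'withref' dies immediately with a clear message rather than producing the default enzyme set.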

Chris

On May 26, 2006, at 3:55 PM, Jelena Obradovic wrote:

> Hi guys, I tried with the other formats, and it works fine with  
> "withrefm"
> format but not with "withref".
>
> Thanks a lot for your reponse.
>
> Cheers,
>
> Jelena
>
> On 5/26/06, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Jelena Obradovic
>>> Sent: Friday, May 26, 2006 11:47 AM
>>> To: Bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Bio::Restriction::IO and REBASE file
>>>
>>> Hi there,
>>>
>>> I have tried loading enzyme list from a file REBASE bairoch.605  
>>> using
>>> Bio::Restriction::IO;
>>>
>>> 1. But for some reason the number of enzymes in the list is  
>>> always 532
>>> which is a default set of enzymes in enzyme collection.
>>>
>>> Is there any known issue with this module or a workaround?
>>>
>>> And here is the code I have been using:
>>>
>>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
>>> format=>"Bairoch")
>>> || die "can't load the file bairoch.605: $!";
>>> my $enzymes = $re_in->read;
>>> print "\nNo of enzymes: ", scalar $enzymes->each_enzyme, "\n";
>>
>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
>>                                    format=>"Bairoch");
>>
>> should be
>>
>> my $re_in=Bio::Restriction::IO->new(-file=>"bairoch_605.dat",-
>>                                    format=>"bairoch");
>>
>> Note the case change for the format; this is noted in the bug  
>> report you
>> submitted earlier.  Bio::Restriction::IO works similarly to  
>> Bio::SeqIO (
>> i.e.
>> requires a specific format, which I believe is case-sensitive).   
>> Judging
>> by
>> the modules in Bio/Restriction/IO directory, looks like the
>> Bio::Restriction::IO format should match one of the following  
>> formats:
>> bairoch, itype2, withrefm, and you can also build your own if  
>> needed using
>> the previous as examples and implementing Bio::Restriction::IO::base.
>>
>>> 2. The other problem is when trying to use format that is lower-case
>>> it throws an exception, but when "B" is capitalized it is ok.
>>> I assume it cannot load a file and does not initilize enzyme
>>> collection properly.
>>>
>>> Can't call method "each_enzyme" on an undefined value at
>>> .../cgi-bin/seq-load.pl line 51.
>>
>> My guess?  The reason it works with an uppercase ('Bairoch') is  
>> that it
>> can't find the module and uses the default set of enzymes as a  
>> fallback.
>> The exception that you reported when you use lowercase ('bairoch')  
>> is real
>> and I reported it as a bug (there are a few I found in that module).
>>
>> You might want to try using one of the other formats if you can  
>> get the
>> files in the right format from REBASE.  I'm looking into the bugs
>> specifically associated with Bio::Restriction::IO::bairoch.
>>
>>> Any thoughts?
>>>
>>>
>>> Thanks in advance,
>>>
>>>
>>> Jelena Obradovic
>>> jelenaob at gmail.com
>>>
>>
>>
>
>
> -- 
> Jelena Obradovic
> Email: jobradovic at gmail.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From osborne1 at optonline.net  Sun May 28 15:03:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 28 May 2006 11:03:37 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
Message-ID: <C09F3409.8992%osborne1@optonline.net>

Genevieve,

Does this simplified code, without the &sort_results($blast_report) line,
work?

By the way, no one can really help you here because you haven't shown us all
of the code. The code you are showing certainly looks OK.


Brian O.


On 5/26/06 4:02 PM, "Genevieve DeClerck" <gad14 at cornell.edu> wrote:

> &sort_results($blast_report);


From simon.rayner.mlist at gmail.com  Mon May 29 07:37:24 2006
From: simon.rayner.mlist at gmail.com (mailing lists)
Date: Mon, 29 May 2006 15:37:24 +0800
Subject: [Bioperl-l] installation problems with bioperl-ext on x86_64
	running SuSE linux
Message-ID: <f73437f70605290037q3c7637e4h29faa3aed16ec77a@mail.gmail.com>

Hello,

I'm having a problem trying to install the bioperl-ext package on my
system.

biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # perl Makefile.PL
Writing Makefile for Bio::Ext::Align
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align # make
cc -c  -I./libs -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS
-DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -fPIC -O2 -fmessage-length=0 -Wall
-D_FORTIFY_SOURCE=2 -g -Wall -pipe   -DVERSION=\"0.1\" -DXS_VERSION=
\"0.1\" -fPIC "-I/usr/lib/perl5/5.8.7/x86_64-linux-thread-multi/CORE"
-DPOSIX -DNOERROR Align.c
In file included from Align.xs:12:
./libs/sw.h:1360:1: warning: "/*" within comment
.
.
.
Running Mkbootstrap for Bio::Ext::Align ()
chmod 644 Align.bs
rm -f blib/arch/auto/Bio/Ext/Align/Align.so
LD_RUN_PATH="" cc  -shared -L/usr/local/lib64 Align.o  -o
blib/arch/auto/Bio/Ext/Align/Align.so libs/libsw.a  -lm
/usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld:
libs/libsw.a(aln.o): relocation R_X86_64_32 against `a local symbol' can not
be used when making a shared object; recompile with -fPIC
libs/libsw.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [blib/arch/auto/Bio/Ext/Align/Align.so] Error 1
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #
biowiv:~/.cpan/build/bioperl-ext-1.4/Bio/Ext/Align #

the -fPIC flag is already set in the makefile.

I found a similar problem in an earlier posting with the following
suggestions....


  From: Aaron J. Mackey <amackey <at> pcbi.upenn.edu>
  Subject: Re: compiling bioperl-ext
  Newsgroups: gmane.comp.lang.perl.bio.general
  Date: 2004-06-09 20:46:05 GMT (1 year, 50 weeks, 3 days, 3 hours and 50
  minutes ago)

  1) Are you starting with a clean build directory?

  2) Does installing other compiled Perl modules work for you (e.g.
  Data::Dumper or Storable)?

  That's a pretty arcane error, and if the answer to #2 is "no", then I
  don't think we can help you.

  -Aaron


....In my case, both 1) and 2) are true.  I installed Data::Dumper without
any problems.


I've found plenty of similar reports for other software, and the problem
seems to relate to 32/64-bit issues.

Does anyone have any suggestions about how to get around this?

thanks

Simon Rayner


From ULNJUJERYDIX at spammotel.com  Mon May 29 09:46:21 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Mon, 29 May 2006 17:46:21 +0800
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <200605261038.30380.lstein@cshl.edu>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
Message-ID: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>

Hi!
oh it was in a slightly different header asking about the create image map
feature.
I am using the stable version 1.4 of bioperl now. In any case I have not
added the sequence as a feature annotated seq. as I already have the bp
where the TF binds (in 1-1050 numberings) so what I did was to just add
graded segments based on the position.
I saw that there is a scale function for the arrow glyph; however, it is a
multiply function. Can it be hacked to take an offset value (i.e. subtract
1000 from the scale)?

cheers
kevin


Hi,
>
> For some reason I didn't see the first posting on this. In current bioperl
> live, the ruler can have negative numberings - I use this routinely. You
> need
> to create a feature that starts in negative coordinates. What is happening
> to
> you when you try this?
>
> Lincoln
>
> On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > Hi
> > thanks for the help offered thus far!
> > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq
> using
> > bioperl. therefore i was asked to make the numberings as such (-1000) is
> > there any way at all to do this in bioperl without changing the .pm
> file?
> >
> > thanks guys..
> > kevin
> >
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>


From shameer at ncbs.res.in  Mon May 29 10:07:17 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 29 May 2006 15:37:17 +0530 (IST)
Subject: [Bioperl-l] Reg. Integrated Server / CGI to pass PDB to multiple
	Servers
Message-ID: <49187.192.168.1.1.1148897237.squirrel@192.168.1.1>

Dear All,

My query may not be directly related to BioPerl, but I am sure I will get
some idea of how to move on. Some possibilities may be available from Pise or
related modules.

Query :
---------
We have several public servers(say a,b,c). All of them will take a
pdb-file as an input and process it and displays it. Now, I need to create
a web page(a meta-server/integrated web-server) with three radio
buttons(a,b,c) and a single input form(to accept pdb file from the users
...:( - File passing as an argument seems to be some what impossible to
me). I need output as 3 links in next page.

Is there any Bio-PERL module / CGI / Perl tricks to do it ?

Thanks in advance,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675
W - http://caps.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."


From torsten.seemann at infotech.monash.edu.au  Tue May 30 06:41:31 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 16:41:31 +1000
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BE91B.30001@infotech.monash.edu.au>

> my $ref_seq_objs; # ref to array of Sequence obj's
> my $genome_seq; # fasta containing 1 genomic sequence
> my @params = ('program' => 'blastn',
> 	       'database' => $genome_seq,
 >                  );

The database parameter needs to be the same thing you would pass to the 
"-d" option in "blastall". I don't think you can pass a perl string 
here. ie. there needs to be a properly formatted set of blast indices 
for your genome sequence on the disk in the appropriate place.
See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html
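
A quick way to test this before calling blastall is to look for the index files on disk. The sketch below assumes a small nucleotide database formatted with the legacy `formatdb` tool, which writes `.nhr`, `.nin` and `.nsq` files next to the FASTA file; the helper name and the path are hypothetical, and large multi-volume databases use different file names.

```perl
use strict;
use warnings;

# Returns true if all three nucleotide index files that formatdb
# writes next to the FASTA file are present.
sub blast_db_is_formatted {
    my ($db) = @_;
    for my $ext (qw(nhr nin nsq)) {
        return 0 unless -e "$db.$ext";
    }
    return 1;
}

my $genome_fasta = 'genome.fasta';    # hypothetical path
unless (blast_db_is_formatted($genome_fasta)) {
    warn "No BLAST indices for $genome_fasta; run something like:\n",
         "  formatdb -i $genome_fasta -p F\n";
}
```

If the check passes, the same path is what goes into the 'database' parameter.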

> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
> my $blast_report = $factory->blastall($ref_seq_objs); #OK

But I could be wrong, and $blast_report here contains a valid BLAST report.

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sb at mrc-dunn.cam.ac.uk  Tue May 30 07:59:28 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Tue, 30 May 2006 08:59:28 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44775ED9.4020208@cornell.edu>
References: <44775ED9.4020208@cornell.edu>
Message-ID: <447BFB60.4000006@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Hi,
[snip]
> If I've sorted the results the sorted-results will print to screen, 
> however when I try to print the Hit Table results nothing is returned, 
> as if the blast results have evaporated.... and visa versa, if i comment 
> out the part where i point my sorting subroutine to the blast results 
> reference,  my hit table results suddenly prints to screen.
[snip]
> Here's an abbreviated version of my code:
[snip]
> #######
> ### the following 2 actions seem to be mutually exclusive.
> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
> # SeqFeature objs stored in arrays. arrays are then printed
> # to stdout
> &sort_results($blast_report);
> 
> # 2) print blast results
> &print_blast_results($blast_report);

> sub print_blast_results{
>    my $report = shift;
>    while(my $result = $report->next_result()){
[snip]

You didn't give us your sort_results subroutine, but is it as simple as
they both use $report->next_result (and/or $result->next_hit), but you
don't reset the internal counter back to the start, so the second
subroutine tries to get the next_result and finds the first subroutine
has already looked at the last result and so next_result returns false?

 From a quick look it wasn't obvious how to reset the counter. Hopefully
this can be done and someone else knows how.
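
If that is the cause, one workaround that needs no reset at all is to drain the iterator once into an array and hand the array to both subroutines. The sketch below uses a tiny stand-in class, `OnePassReport`, which is not a BioPerl class; it only mimics a `next_result` that runs dry after one pass.

```perl
use strict;
use warnings;

# Minimal stand-in for a one-pass report object; NOT a BioPerl class.
package OnePassReport;
sub new         { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub next_result { my ($self) = @_; return shift @{ $self->{items} } }

package main;

# Drain the iterator once and keep everything in an array.
sub collect_results {
    my ($report) = @_;
    my @results;
    while (my $result = $report->next_result) {
        push @results, $result;
    }
    return @results;
}

my $report  = OnePassReport->new('result1', 'result2');
my @results = collect_results($report);

# Both consumers walk the same array, so neither starves the other.
my @sorted = sort @results;       # stands in for sort_results(\@results)
print "$_\n" for @results;        # stands in for print_blast_results(\@results)

# A second pass over the exhausted iterator finds nothing.
my @second_pass = collect_results($report);
print "second pass saw ", scalar(@second_pass), " results\n";
```

With real BioPerl objects the hits and HSPs inside each result are iterators too, so the same caching would be needed at each level; as far as I know the result and hit interfaces also offer list-returning `hits` and `hsps` methods, but check that against your installed version.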


From torsten.seemann at infotech.monash.edu.au  Tue May 30 08:18:45 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 30 May 2006 18:18:45 +1000
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return
	undef"
Message-ID: <447BFFE5.8010508@infotech.monash.edu.au>

FYI Bioperl developers:

I just audited the bioperl-live CVS and found about 450 occurrences of 
"return undef".

Page 199 of "Perl Best Practices" by Damian Conway, and this URL
http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:

"Use return; instead of return undef; if you want to return nothing. If 
someone assigns the return value to an array, the latter creates an 
array of one value (undef), which evaluates to true. The former will 
correctly handle all contexts."

So I'm guessing at least some of these 450 occurrences *could* result in 
bugs and should probably be changed.
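
The difference is easy to see in a few lines of plain Perl. The two subroutine names below are made up for the illustration and do not come from bioperl-live.

```perl
use strict;
use warnings;

sub find_with_undef { return undef }   # one-element list in list context
sub find_with_bare  { return }         # empty list in list context, undef in scalar

my @a = find_with_undef();
my @b = find_with_bare();
my $s = find_with_bare();

print "return undef: ", scalar(@a), " element(s), array is ",
      (@a ? "true" : "false"), "\n";
print "bare return:  ", scalar(@b), " element(s), array is ",
      (@b ? "true" : "false"), "\n";
print "bare return in scalar context is ",
      (defined $s ? "defined" : "undef"), "\n";
```

One caveat when changing these in bulk: a bare return inside a list such as `( key => f() )` contributes nothing, which shifts the following elements, so each occurrence needs a look at how callers use it.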

Your opinion may differ :-)

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From cjfields at uiuc.edu  Tue May 30 14:07:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:07:45 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with
	"returnundef"
In-Reply-To: <447BFFE5.8010508@infotech.monash.edu.au>
Message-ID: <000c01c683f2$6ca62570$15327e82@pyrimidine>

Torsten,

Any way you can post a list of some/all of the offending lines or modules?
Sounds like something to consider, but if the list is as large as you say we
may need something (bugzilla? wiki?) to track the changes and make sure
they pass tests; I'm sure a large majority will.  

I'm guessing Jason would want this somewhere on the project priority list or
bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
page on the wiki for proposed code changes?

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> Sent: Tuesday, May 30, 2006 3:19 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> "returnundef"
> 
> FYI Bioperl developers:
> 
> I just audited the bioperl-live CVS and found about 450 occurrences of
> "return undef".
> 
> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:
> 
> "Use return; instead of return undef; if you want to return nothing. If
> someone assigns the return value to an array, the latter creates an
> array of one value (undef), which evaluates to true. The former will
> correctly handle all contexts."
> 
> So I'm guessing at least some of these 450 occurrences *could* result in
> bugs and should probably be changed.
> 
> Your opinion may differ :-)
> 
> --
> Dr Torsten Seemann               http://www.vicbioinformatics.com
> Victorian Bioinformatics Consortium, Monash University, Australia
> 


From lstein at cshl.edu  Tue May 30 14:47:48 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 30 May 2006 10:47:48 -0400
Subject: [Bioperl-l] **Fwd: Re: URGENT: Bio::Graphics::Panel make the
	ruler have
In-Reply-To: <5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
References: <5b6410e0605241859t75d412bap45718d3cfe4bb77b@mail.gmail.com>
	<200605261038.30380.lstein@cshl.edu>
	<5b6410e0605290246p8875c78n286caa672a55b4de@mail.gmail.com>
Message-ID: <200605301047.49127.lstein@cshl.edu>

Hi Kevin,

I'm afraid that there is no offset value. You'll need the 1.51 version of 
bioperl to handle negative numbers properly. I understand your reluctance to 
upgrade just to get the Bio::Graphics functionality. You might consider 
checking out just the Bio/Graphics subtree and installing that. It should 
work on top of 1.4
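
Once negative coordinates are available, the existing 1-1050 positions only need a constant shift before the features are created. A minimal sketch, assuming a plain linear shift in which position 1 becomes -1000 and position 1050 becomes +49; if your numbering skips 0 (so that the last base is +50), add 1 to every non-negative value. The helper name and the example site positions are hypothetical.

```perl
use strict;
use warnings;

# Shift a 1-based position on the fragment so that the first base
# becomes -$upstream. $upstream is the number of bases before position 0.
sub to_promoter_coord {
    my ($pos, $upstream) = @_;
    return $pos - $upstream - 1;
}

my $upstream = 1000;
for my $site ([120, 131], [990, 1004]) {    # hypothetical TFBS positions
    my ($start, $end) = map { to_promoter_coord($_, $upstream) } @$site;
    print "$site->[0]..$site->[1] -> $start..$end\n";
}
```

The shifted start and end values are what would go into each feature before it is added to the panel.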

Lincoln

On Monday 29 May 2006 05:46, Kevin Lam Koiyau wrote:
> Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyp however, it is a
> multiply function, can it be hacked to take in a offset value (ie minus the
> scale by 1000?)
>
> cheers
> kevin
>
>
> Hi,
>
> > For some reason I didn't see the first posting on this. In current
> > bioperl live, the ruler can have negative numberings - I use this
> > routinely. You need
> > to create a feature that starts in negative coordinates. What is
> > happening to
> > you when you try this?
> >
> > Lincoln
> >
> > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > Hi
> > > thanks for the help offered thus far!
> > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq
> >
> > using
> >
> > > bioperl. therefore i was asked to make the numberings as such (-1000)
> > > is there any way at all to do this in bioperl without changing the .pm
> >
> > file?
> >
> > > thanks guys..
> > > kevin
> > >
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Tue May 30 14:50:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 09:50:06 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
Message-ID: <000f01c683f8$5771ed50$15327e82@pyrimidine>

Jason, Brian, et al,

I found several major issues with Bio::Restriction::IO (this popped up while
bug squashing).  In particular, the POD is pretty misleading.  It states
(directly from perldoc):

SYNOPSIS
        use Bio::Restriction::IO;

        $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                         -format => 'withrefm');
        $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                         -format => 'bairoch');
        my $res = $in->read; # a Bio::Restriction::EnzymeCollection
        $out->write($res);

      # or

      #    use Bio::Restriction::IO;
      #
      #    #input file format can be read from the file extension (dat|xml)
      #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
      #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
      #
      #    # World's shortest flat<->xml format converter:
      #    print $out $_ while <$in>;

So, I have found several problems with these modules.  I really hate to
criticize code here, as my own is pretty hacky, but I think these are things
to seriously mull over: 

1)	Note that, though some of the lines above are commented they are
still there in POD and thus present in perldoc/pod2html etc.  So, judging
from the above, it suggests using the script above should read in from one
format and write out to another (like SeqIO).  However, NONE of the current
write() methods are implemented for any of the IO modules (withref, base,
itype2, bairoch), so this does not happen as expected.  You get the nasty
thrown 'method not implemented error' instead when writing.
2)	The commented statements in POD above also suggest that REBASE XML
format is supported when there is no XML module.  
3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
made it unusable until I added a few small changes; it still can't handle
multisite/multicut enzymes properly, so in essence it is useless until that
is addressed.
4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
up its own methods?  
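
Until write() is implemented, a script that follows the SYNOPSIS can at least trap the exception rather than die halfway through. The sketch below is generic Perl: `try_write` is a hypothetical helper, and `UnimplementedWriter` is a stand-in that is not part of BioPerl and only imitates a write() that throws.

```perl
use strict;
use warnings;

# Call write() on any writer object and report failure instead of dying.
# Returns (1, '') on success or (0, $error_text) on failure.
sub try_write {
    my ($out, $collection) = @_;
    my $ok = eval { $out->write($collection); 1 };
    return $ok ? (1, '') : (0, "$@");
}

# Stand-in writer that behaves like an unimplemented write(); NOT BioPerl.
package UnimplementedWriter;
sub new   { return bless {}, shift }
sub write { die "method not implemented\n" }

package main;

my ($ok, $err) = try_write(UnimplementedWriter->new, undef);
print $ok ? "written\n" : "write failed: $err";
```

In a real script the first argument would be the Bio::Restriction::IO output object and the second the enzyme collection returned by read().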

I'm working on at least getting the 'bairoch' input format up and running
(so at least it gets the enzymes into a
Bio::Restriction::Enzyme::Collection).  From this point I'm not sure where
to proceed.  The POD obviously needs to be corrected to reflect that writing
formats is not implemented (and the bit about XML should be taken out
completely); that's the easy part, which I am working on and plan on committing
today.  However, these modules don't seem to be used too frequently so I'm
not sure whether it's worth spending too much time getting these up to speed
at the moment (adding write methods, switching to Bio::Root::Root, etc); I
have other priorities at the moment (including a way overdue ListSummary).
I'm also not sure who else is (using|working) on these so I don't want to
(make too many changes|step on someone else's toes), but these are, IMHO,
pretty serious problems.  

Any thoughts?

Chris


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Tue May 30 16:34:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 11:34:18 -0500
Subject: [Bioperl-l] Bio::Restriction::IO changes
Message-ID: <001401c68406$e71e9850$15327e82@pyrimidine>

Jason, Brian, et al:

I have made changes to the Bio::Restriction::IO POD to remove any reference
to write functions since almost none have been implemented yet, so including
this into POD is a bit misleading.  At the moment, you can't write to any
REBASE format except for 'base', which I found is the only one that works.
And, upon further checking, even that one has issues: it looks like there
are problems with multicut/multisite enzymes when writing in 'base' format
which I'm not delving into ('TaqII' only displays one site when writing when
it has two cut sites).  I'll add this to the wiki and a bug report
(enhancement) for this module.

I am also removing mention of XML and 'bairoch' formats (the former isn't
present and the latter is broken at the moment) and added a few things to
the POD TO DO section.  

Rob (if you're out there somewhere in the ether), have you made any more
changes to these modules that need to be committed?  Didn't know if any of
these issues have already been addressed/changed etc.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From jelenaob at gmail.com  Tue May 30 04:58:35 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Mon, 29 May 2006 21:58:35 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
Message-ID: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>

Hello everybody,

does anybody know how to remove the background color of the Panel?
Currently I am not adding anything to it, so that I can troubleshoot the
problem, and I have tried setting every color attribute I could find on the
panel, but no luck. Whatever I do,
I get the BLUE border of the panel.

Has anybody faced the same problem?

Thanks in advance,

Jelena

And here is the code I am currently using:

-----------------------------------------------------------------------------------------------------------
my $panel =
    Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200,
                              -width => 800,
                              -pad_left => 10,
                              -pad_right => 10,
                              -key_color => 'white',
                              -bgcolor => 'white',
                              -gridcolor=>'black',
                              -fgcolor => 'black',
                              -grid => 0,
                              );
   my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
     -url  => '/tmpimages');
   #make clickable image
   print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
   print $map;

-----------------------------------------------------------------------------------------------------------


From luciap at sas.upenn.edu  Tue May 30 18:49:48 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 14:49:48 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
Message-ID: <1149014988.447c93cc01761@128.91.55.38>

Hi

I am here again, I finally got to write the "collapse nodes" function and have a
couple of questions.

In order to collapse any node $node, I first have to get the parent
which I can do as $parent=$node->ancestor

and then the children as:
@children=$node->get_all_Descendents (or should I use each descendent?)

Then before deleting $node I have to assign all its children to $parent,
and here is where I am kind of confused.
Can I use the add_Descendent function for this?
I've been trying to write something like this:
foreach $child (@children){
         $parent=add_Descendent->$child;
}
but this doesn't work, and I think it is because I don't have any idea of what I
am doing.
Any suggestions?

thanks


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From rvosa at sfu.ca  Tue May 30 18:52:52 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 11:52:52 -0700
Subject: [Bioperl-l] For CVS developers - potential pitfall
	with	"returnundef"
In-Reply-To: <000c01c683f2$6ca62570$15327e82@pyrimidine>
References: <000c01c683f2$6ca62570$15327e82@pyrimidine>
Message-ID: <447C9484.9030102@sfu.ca>

Although I agree with the sentiment of following PBP, I'm not so sure 
changing 'return undef' to 'return' *now* will fix any bugs without 
introducing new, subtle ones.

Chris Fields wrote:
> Torsten,
>
> Any way you can post a list of some/all of the offending lines or modules?
> Sounds like something to consider, but if the list is as large as you say we
> may need something (bugzilla? wiki?) to track the changes and make sure
> they pass tests; I'm sure a large majority will.  
>
> I'm guessing Jason would want this somewhere on the project priority list or
> bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
> page on the wiki for proposed code changes?
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>> Sent: Tuesday, May 30, 2006 3:19 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>> "returnundef"
>>
>> FYI Bioperl developers:
>>
>> I just audited the bioperl-live CVS and found about 450 occurrences of
>> "return undef".
>>
>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html suggest:
>>
>> "Use return; instead of return undef; if you want to return nothing. If
>> someone assigns the return value to an array, the latter creates an
>> array of one value (undef), which evaluates to true. The former will
>> correctly handle all contexts."
>>
>> So I'm guessing at least some of these 450 occurrences *could* result in
>> bugs and should probably be changed.
>>
>> Your opinion may differ :-)
>>
>> --
>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>> Victorian Bioinformatics Consortium, Monash University, Australia
>>

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++
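
The pitfall quoted above from "Perl Best Practices" can be reproduced in a
few lines of plain Perl. This is only a sketch: find_undef and find_bare are
hypothetical helpers invented for illustration, not code taken from
bioperl-live.

```perl
use strict;
use warnings;

# Hypothetical lookup helpers (not BioPerl code): both mean "nothing found".
sub find_undef { return undef; }   # one-element list (undef) in list context
sub find_bare  { return; }         # empty list in list context, undef in scalar

my @from_undef = find_undef();
my @from_bare  = find_bare();

my $count_undef = scalar @from_undef;
my $count_bare  = scalar @from_bare;

print "return undef: $count_undef element(s), ",
      (@from_undef ? "true" : "false"), " in boolean context\n";
print "return:       $count_bare element(s), ",
      (@from_bare ? "true" : "false"), " in boolean context\n";

# In scalar context the two forms cannot be told apart.
my $scalar_undef = find_undef();
my $scalar_bare  = find_bare();
print "scalar context: both undefined\n"
    if !defined $scalar_undef && !defined $scalar_bare;
```

So only callers that use the result in list context see a difference, which
is why each occurrence would need to be checked against how it is called.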


From luciap at sas.upenn.edu  Tue May 30 20:11:52 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Tue, 30 May 2006 16:11:52 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
Message-ID: <1149019912.447ca7085124e@128.91.55.38>

Hi
OK that was silly, but what I have in my code is what you just wrote.
But the problem is that if I write

$parent->add_Descendent($child)

it tells me that I am calling the method "add_Descendent" on an undefined value
(but I did define $parent before??)

So here is the code so far:

use Bio::TreeIO;
 my $in = new Bio::TreeIO(-file => 'Test2.tre',
                          -format => 'newick');
 my $out = new Bio::TreeIO(-file => '>mytree.out',
                           -format => 'newick');
 while( my $tree = $in->next_tree ) {
    foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
    my $bootstrap=$node->_creation_id;

    if ($bootstrap < 70 ){
            my $parent = $node->ancestor;
            my @children=$node->get_all_Descendents;
            foreach my $child (@children){
                $parent->add_Descendent($child);
            }

........

eventually I'll add (once I have assigned the children to the parent successfully):
$tree->remove_Node($node);

        }
    }
    $out->write_tree($tree);
}

Quoting aaron.j.mackey at gsk.com:

> > foreach $child (@children){
> >          $parent=add_Descendent->$child;
> > }
>
> I think what you want is $parent->add_Descendent($child)
>
> -Aaron
>


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From jason.stajich at duke.edu  Tue May 30 20:30:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 30 May 2006 16:30:56 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <1149019912.447ca7085124e@128.91.55.38>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
Message-ID: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>

You need to special-case the root; it won't have an ancestor.  Just
protect the my $parent = $node->ancestor with an if statement, as I
did below.

On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:

> Hi
> OK that was silly, but what I have in my code is what you just wrote
> But the problem is that if I write
>
> $parent->add_Descendent($child)
>
> it tells me that I am calling the method "add_Descendent" on an
> undefined value
> (but I did define $parent before??)
>
> So here it goes the code so far:
>
> use Bio::TreeIO;
>  my $in = new Bio::TreeIO(-file => 'Test2.tre',
>                           -format => 'newick');
>  my $out = new Bio::TreeIO(-file => '>mytree.out',
>                            -format => 'newick');
>  while( my $tree = $in->next_tree ) {
>     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
>     my $bootstrap=$node->_creation_id;
>
>     if ($bootstrap < 70 ){
>    >>> if(        my $parent = $node->ancestor ) {
>               my @children=$node->get_all_Descendents;
>               foreach my $child (@children){
>                  $parent->add_Descendent($child);
>               }
         }
>
> ........
>
> eventually I'll add (once I assigned the children to the parent  
> successfully):
> $tree->remove_Node($node);
>
>         }
>     }
>     $out->write_tree($tree);
> }
>
> Quoting aaron.j.mackey at gsk.com:
>
>>> foreach $child (@children){
>>>          $parent=add_Descendent->$child;
>>> }
>>
>> I think what you want is $parent->add_Descendent($child)
>>
>> -Aaron
>>
>
>
> Lucia Peixoto
> Department of Biology,SAS
> University of Pennsylvania
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
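
The collapse logic discussed in this thread (skip the root, hand a node's
children to its parent, then drop the node) can be sketched without BioPerl.
The tree below is built from plain hashes; make_node, add_child,
collapse_node, all_nodes and to_newick are hypothetical helpers, not the
Bio::Tree::Node API, and the cutoff of 70 is simply the value used in the
code above. Note that only the immediate children are moved; as far as I
know, in BioPerl that corresponds to each_Descendent, whereas
get_all_Descendents returns the whole subtree below a node.

```perl
use strict;
use warnings;

# Toy tree made of plain hashes; this is NOT the Bio::Tree::Node API.
sub make_node {
    my (%args) = @_;
    return { name => $args{name}, support => $args{support},
             parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return $child;
}

# Hand the immediate children to the parent, then detach the node.
# The root has no parent, so it is left alone.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->{parent} or return 0;
    add_child($parent, $_) for @{ $node->{children} };
    $node->{children}   = [];
    $parent->{children} = [ grep { $_ != $node } @{ $parent->{children} } ];
    $node->{parent}     = undef;
    return 1;
}

# Pre-order list of every node at and below $node.
sub all_nodes {
    my ($node) = @_;
    return ($node, map { all_nodes($_) } @{ $node->{children} });
}

# Topology-only Newick string (no labels on internal nodes).
sub to_newick {
    my ($node) = @_;
    return $node->{name} unless @{ $node->{children} };
    return '(' . join(',', map { to_newick($_) } @{ $node->{children} }) . ')';
}

# ((A,B)60,C): the inner node has support 60, below the cutoff of 70.
my $root  = make_node(name => 'root');
my $inner = add_child($root, make_node(name => 'inner', support => 60));
add_child($inner, make_node(name => 'A'));
add_child($inner, make_node(name => 'B'));
add_child($root,  make_node(name => 'C'));

my $before = to_newick($root);

# The node list is built before the loop starts, so editing the tree
# inside the loop does not disturb the iteration.
for my $node (all_nodes($root)) {
    next unless @{ $node->{children} };                       # skip leaves
    next unless defined $node->{support} && $node->{support} < 70;
    collapse_node($node);
}

my $after = to_newick($root);
print "$before -> $after\n";
```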


From cjfields at uiuc.edu  Tue May 30 21:40:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 16:40:18 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith	"returnundef"
In-Reply-To: <447C9484.9030102@sfu.ca>
Message-ID: <001801c68431$a586b2d0$15327e82@pyrimidine>

Agreed, though I think these changes should be implemented at some point
(Conway's argument here makes sense and it is nice for Torsten to check this
out).  If proper tests are written then any changes resulting in errors
should be picked up by checking the appropriate test suite, though I know it
doesn't absolutely guarantee it.  ; P  

Chris
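
One way the change could introduce a new, subtle bug, as a small sketch:
when a call sits inside a list of key/value pairs, a bare return contributes
nothing to the list and the remaining elements shift. score_undef and
score_bare are hypothetical accessors made up for this example; whether any
of the 450 occurrences are actually called this way is not something the
thread establishes.

```perl
use strict;
use warnings;

# Hypothetical accessors standing in for a method whose value may be missing.
sub score_undef { return undef; }
sub score_bare  { return; }

# Argument lists are often built directly from method calls.
my @args_undef = (-score => score_undef(), -name => 'hit1');
my @args_bare  = (-score => score_bare(),  -name => 'hit1');

my $n_undef = scalar @args_undef;   # -score, undef, -name, 'hit1'
my $n_bare  = scalar @args_bare;    # -score, -name, 'hit1'

# With the bare return, '-name' has slid into the value slot for '-score'.
my $second_bare = $args_bare[1];

print "return undef keeps $n_undef elements\n";
print "bare return leaves $n_bare elements; second element is '$second_bare'\n";
```

A test that passes such a list to a constructor would be the kind of check
that catches this.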

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 1:53 PM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> "returnundef"
> 
> Although I agree with the sentiment of following PBP, I'm not so sure
> changing 'return undef' to 'return' *now* will fix any bugs without
> introducing new, subtle ones.
> 


From rvosa at sfu.ca  Tue May 30 21:58:25 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 14:58:25 -0700
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <001901c68433$026b1ad0$15327e82@pyrimidine>
References: <001901c68433$026b1ad0$15327e82@pyrimidine>
Message-ID: <447CC001.4050000@sfu.ca>

I've been following the perl6 mailing lists for a while now. I think 
this time around it won't really take that long (one year?) for 
pugs/perl6 stacks to become more than just toys. I think especially 
large projects, like bioperl, will really benefit from the improved OO 
implementation in perl6, so it might be of interest to at least 
fantasize about it.

Chris Fields wrote:
> Ha!  Or may be the 'nonexistent' bioperl-experimental.  Wonder what'll
> happen once Perl6 comes to term?
>
> -CJF
>
>   
>> -----Original Message-----
>> From: Rutger Vos [mailto:rvosa at sfu.ca]
>> Sent: Tuesday, May 30, 2006 4:48 PM
>> To: Chris Fields
>> Subject: Re: [Bioperl-l] For CVS developers - potential
>> pitfallwith"returnundef"
>>
>> Surely this will all sort itself out in bioperl6 ;-)
>>

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++


From cjfields at uiuc.edu  Tue May 30 22:08:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 30 May 2006 17:08:26 -0500
Subject: [Bioperl-l] For CVS developers -
	potentialpitfallwith"returnundef"
In-Reply-To: <447CC001.4050000@sfu.ca>
Message-ID: <001a01c68435$93135a50$15327e82@pyrimidine>

Agreed.  I would say that in probably 6-12 months' time it might be a good
idea to try getting something actually started, maybe under the
'bioperl-experimental' title Jason has mentioned.  One could always try
getting a Bio::Root-like object going in Pugs/Perl6 as a starter and work up
from there, with emphasis on key areas (seq. parsing and so on).

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Tuesday, May 30, 2006 4:58 PM
> To: bioperl list
> Subject: Re: [Bioperl-l] For CVS developers -
> potentialpitfallwith"returnundef"
> 
> I've been following the perl6 mailing lists for a while now. I think
> this time around it won't really take that long (one year?) for
> pugs/perl6 stacks to become more than just toys. I think especially
> large projects, like bioperl, will really benefit from the improved OO
> implementation in perl6, so it might be of interest to at least
> fantasize about it.
> 


From ULNJUJERYDIX at spammotel.com  Wed May 31 03:45:12 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 31 May 2006 11:45:12 +0800
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>

I am so sorry for the truncated email; I accidentally hit reply.
If anyone is interested, I have opted to change line 161 of arrow.pm
(Perl/site/lib/Bio/Graphics/Glyph/arrow.pm; on Linux it is
/usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm) from


      $gd->string($font,$middle,$center+$a2-1,$label,$font_color)

to

      $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)

just  for this one-off use.


Strangely, at line 112 of arrow.pm in bioperl ver 1.51 I found a hidden
option for a coords offset:
    my $relative_coords_offset = $self->option('relative_coords_offset');
    $relative_coords_offset    = 1 unless defined $relative_coords_offset;
but passing the option -relative_coords_offset=>1000 to the arrow glyphs
didn't do anything...


Hi!
> oh it was in a slightly different header asking about the create image map
> feature.
> I am using the stable version 1.4 of bioperl now. In any case I have not
> added the sequence as a feature annotated seq. as I already have the bp
> where the TF binds (in 1-1050 numberings) so what I did was to just add
> graded segments based on the position.
> I saw that there is a scale function for the arrow glyph; however, it is a
> multiply function. Can it be hacked to take an offset value (i.e. subtract
> 1000 from the scale)?
>
> cheers
> kevin
>
>
> Hi,
> >
> > For some reason I didn't see the first posting on this. In current bioperl
> > live, the ruler can have negative numberings - I use this routinely. You need
> > to create a feature that starts in negative coordinates. What is happening to
> > you when you try this?
> >
> > Lincoln
> >
> > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > Hi
> > > thanks for the help offered thus far!
> > > Sigh, I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> > > bioperl, so I was asked to number the positions accordingly (from -1000).
> > > Is there any way at all to do this in bioperl without changing the .pm file?
> > >
> > > thanks guys..
> > > kevin
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
> >
>
>
>


From sb at mrc-dunn.cam.ac.uk  Wed May 31 08:40:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 09:40:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447C7985.9000404@cornell.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
	<447C7985.9000404@cornell.edu>
Message-ID: <447D5668.7070500@mrc-dunn.cam.ac.uk>

Genevieve DeClerck wrote:
> Thanks for your comment Sendu, it was very helpful. I think this must be 
> what's going on.. I am using $blast_report->next_result in both 
> subroutines. It appears that analyzing the blast results first w/ my 
> sort subroutine empties (?) the $blast_result object so that when I try 
> to print, there is nothing left to print. (and vice versa when I print 
> first then try to sort).
> So, from the looks of things, using next_result has the effect of 
> popping the Bio::Search::Result::ResultI objects off of the SearchIO 
> blast report object??

Not quite. It's more or less exactly like opening a file and then trying 
to read it all twice like this:
open(FILE, "file");
while (<FILE>) {
     print; # prints each line in the file
}
while (<FILE>) {
     print; # never happens, we never enter this while loop
}

To get the second while loop to print anything we need to say seek(FILE, 
0, 0) before it. Or in the first while loop store each line in an array, 
and then make the second loop a foreach through that array.
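
A minimal sketch of both fixes, assuming a throwaway temporary file (created 
here with File::Temp purely for illustration) in place of the real input. 
It reads the file once, shows that a second read gets nothing, and then 
recovers the lines either by rewinding or by using the stored copy:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small throwaway file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line 1\nline 2\nline 3\n";
close $tmp_fh or die "close failed: $!";

open(my $fh, '<', $tmp_name) or die "Cannot open $tmp_name: $!";

# First pass: read to end of file, keeping a copy of every line.
my @lines;
while (my $line = <$fh>) {
    push @lines, $line;
}

# Without a rewind the handle is still at EOF, so nothing more is read.
my @second_pass = <$fh>;

# Fix 1: rewind to byte 0, after which the file can be read again.
seek($fh, 0, 0) or die "seek failed: $!";
my @third_pass = <$fh>;
close $fh;

# Fix 2: skip the handle entirely and loop over the stored copy.
my $from_array = 0;
$from_array++ foreach @lines;

printf "first=%d second=%d third=%d array=%d\n",
    scalar @lines, scalar @second_pass, scalar @third_pass, $from_array;
```

Storing the lines costs memory but works for any input; seek only works 
when the handle is attached to something seekable, such as a regular file.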


> It seems I could get around this by making a copy of the blast report by 
> setting it to another new variable...(not the most elegant solution) but 
> I'm having trouble with this...
> 
> If I do:
> 
>     my $blast_report_copy = $blast_report;
> 
> I'm just copying the reference to the SearchIO blast result, so it 
> doesn't help me. How can I make another physical copy of this blast 
> result object? Seems like a simple thing but how to do it is escaping me.

Not really a good idea, and it may not work anyway if the object 
contains a filehandle. But for a simple object you might recursively 
loop through the data structure and copy each element out into a similar 
data structure.


> But better yet, the way to go is to 'reset the counter,' or to find a 
> way to look at/print/sort the results without removing data from the 
> blast result object. How is this done though??

It would be rather nice if this worked:
my $blast_report = $factory->blastall($ref_seq_objs);
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
     # $_ is a ResultI object, use as normal
}
seek($blast_fh, 0, 0); # this would be great, but does it work?
while (<$blast_fh>) {
     # go through the results again in your second subroutine
}

An alternative hacky way of doing it, which may also not work, would be 
to go through your $blast_report as normal, but then before going 
through it a second time, say
my $fh = $blast_report->_fh;
seek($fh, 0, 0);

Finally, the most sensible way (assuming bioperl provides no methods of 
its own for this) of solving the problem is, the first time you go 
through each next_result, next_hit and next_hsp, just store the returned 
objects in an array of arrays of arrays. Then the second time get the 
objects from your array structure instead of with the method calls.
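
A rough sketch of that last approach. To keep it runnable without BioPerl, 
a hypothetical make_iterator() closure stands in for the report object; it 
hands out each item once and then returns undef, which is how next_result 
behaves. The strings are placeholders for the real result objects:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SearchIO report: a one-shot iterator that
# hands out each item once and then returns undef, like next_result does.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

my $next_result = make_iterator('result A', 'result B');

# First (and only) pass over the iterator: keep every object we are given.
my @results;
while (defined(my $result = $next_result->())) {
    push @results, $result;
}

# Later passes use the stored copies, in any order, as often as needed.
my @sorted   = sort { $b cmp $a } @results;   # e.g. the "sort" subroutine
my @to_print = @results;                      # e.g. the "print" subroutine
print "$_\n" for @to_print;
```

With real objects the same pattern nests: inside the loop, call next_hit 
on each result and next_hsp on each hit, pushing onto inner arrays.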


From heikki at sanbi.ac.za  Wed May 31 10:55:18 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:55:18 +0200
Subject: [Bioperl-l]
	=?iso-8859-1?q?For_CVS_developers_-_potential_pitfall?=
	=?iso-8859-1?q?with_=22returnundef=22?=
In-Reply-To: <001801c68431$a586b2d0$15327e82@pyrimidine>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
Message-ID: <200605311255.19166.heikki@sanbi.ac.za>

In my opinion the sooner the bugs get exposed the better. It is much more 
likely that there is a well-hidden bug caused by accidentally assigning undef 
into a one-element array than someone intentionally writing code that 
expects that behaviour!

I removed (but did not commit yet) all undefs from my old Bio::Variation code 
and could not see any differences in the test output. 

Let's remove them!

	-Heikki

On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> Agreed, though I think these changes should be implemented at some point
> (Conway's argument here makes sense and it is nice for Torsten to check
> this out).  If proper tests are written then any changes resulting in
> errors should be picked up by checking the appropriate test suite, though I
> know it doesn't absolutely guarantee it.  ; P
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > Sent: Tuesday, May 30, 2006 1:53 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > "returnundef"
> >
> > Although I agree with the sentiment of following PBP, I'm not so sure
> > changing 'return undef' to 'return' *now* will fix any bugs without
> > introducing new, subtle ones.
> >
> > Chris Fields wrote:
> > > Torsten,
> > >
> > > Any way you can post a list of some/all of the offending lines or modules?
> > > Sounds like something to consider, but if the list is as large as you say we
> > > may need something (bugzilla? wiki?) to track the changes and make
> > > sure they pass tests; I'm sure a large majority will.
> > >
> > > I'm guessing Jason would want this somewhere on the project priority list or
> > > bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
> > > page on the wiki for proposed code changes?
> > >
> > > Chris
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > >> To: bioperl-l at lists.open-bio.org
> > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > >> "returnundef"
> > >>
> > >> FYI Bioperl developers:
> > >>
> > >> I just audited the bioperl-live CVS and found about 450 occurrences of
> > >> "return undef".
> > >>
> > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > >> suggest:
> > >>
> > >> "Use return; instead of return undef; if you want to return nothing.
> > >> If someone assigns the return value to an array, the latter creates an
> > >> array of one value (undef), which evaluates to true. The former will
> > >> correctly handle all contexts."
> > >>
> > >> So I'm guessing at least some of these 450 occurrences *could* result in
> > >> bugs and should probably be changed.
> > >>
> > >> Your opinion may differ :-)
> > >>
> > >> --
> > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > >>
> >
> > --
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Rutger Vos, PhD. candidate
> > Department of Biological Sciences
> > Simon Fraser University
> > 8888 University Drive
> > Burnaby, BC, V5A1S6
> > Phone: 604-291-5625
> > Fax: 604-291-3496
> > Personal site: http://www.sfu.ca/~rvosa
> > FAB* lab: http://www.sfu.ca/~fabstar
> > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Wed May 31 10:44:28 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 31 May 2006 12:44:28 +0200
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <000f01c683f8$5771ed50$15327e82@pyrimidine>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
Message-ID: <200605311244.29187.heikki@sanbi.ac.za>


Chris,

Thanks for stepping in. I feel partly responsible here because I originally 
changed some of Rob's code but have not followed up since.

There has not been any active development on these modules, so do not worry 
about stepping on anyone's toes.

   -Heikki

On Tuesday 30 May 2006 16:50, Chris Fields wrote:
> Jason, Brian, et al,
>
> I found several major issues with Bio::Restriction::IO (this popped up
> while bug squashing).  In particular, the POD is pretty misleading.  It
> states (directly from perldoc):
>
> SYNOPSIS
>         use Bio::Restriction::IO;
>
>         $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                          -format => 'withrefm');
>         $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                          -format => 'bairoch');
>         my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>         $out->write($res);
>
>       # or
>
>       #    use Bio::Restriction::IO;
>       #
>       #    #input file format can be read from the file extension (dat|xml)
>       #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>       #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>       #
>       #    # World's shortest flat<->xml format converter:
>       #    print $out $_ while <$in>;
>
> So, I have found several problems with these modules.  I really hate to
> criticize code here, as my own is pretty hacky, but I think these are
> things to seriously mull over:
>
> 1)	Note that, though some of the lines above are commented they are
> still there in POD and thus present in perldoc/pod2html etc.  So, judging
> from the above, it suggests using the script above should read in from one
> format and write out to another (like SeqIO).  However, NONE of the current
> write() methods are implemented for any of the IO modules (withref, base,
> itype2, bairoch), so this does not happen as expected.  You get the nasty
> thrown 'method not implemented error' instead when writing.
> 2)	The commented statements in POD above also suggest that REBASE XML
> format is supported when there is no XML module.
> 3)	The Bio::Restriction::IO::bairoch module had multiple bugs which
> made it unusable until I added a few small changes; it still can't handle
> multisite/multicut enzymes properly, so in essence it is useless until that
> is addressed.
> 4)	Bio::Restriction::IO inherits from Bio::SeqIO, though I'm not sure
> why.  Shouldn't it just inherit from Bio::Root::Root/Bio::Root::IO and make
> up its own methods?
>
> I'm working on at least getting the 'bairoch' input format up and running
> (so at least it gets the enzymes into a
> Bio::Restriction::EnzymeCollection).  From this point I'm not sure where
> to proceed.  The POD obviously needs to be corrected to reflect that
> writing formats is not implemented (and the bit about XML should be taken
> out completely); that's the easy part which I am working on and plan
> on committing today.  However, these modules don't seem to be used too
> frequently so I'm not sure whether it's worth spending too much time
> getting these up to speed at the moment (adding write methods, switching to
> Bio::Root::Root, etc); I have other priorities at the moment (including a
> way overdue ListSummary). I'm also not sure who else is (using|working) on
> these so I don't want to (make too many changes|step on someone else's
> toes), but these are, IMHO, pretty serious problems.
>
> Any thoughts?
>
> Chris
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of the Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Wed May 31 13:10:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 08:10:00 -0500
Subject: [Bioperl-l] Bio::Restriction::IO issues
In-Reply-To: <200605311244.29187.heikki@sanbi.ac.za>
References: <000f01c683f8$5771ed50$15327e82@pyrimidine>
	<200605311244.29187.heikki@sanbi.ac.za>
Message-ID: <C8B60E1D-D5A5-4661-AA2B-CEE1E5B5D758@uiuc.edu>

Heikki,

I mainly just changed a few things so no one would get the wrong
idea from the POD (that the modules write formats as well) and added a
few things to the TO DO.  I also added a warning to
Bio::Restriction::IO::bairoch for the multisite/multicut issue.
Besides that I haven't done much to them.  I also added a bit to the
Project Priority List in case someone wants to take it up.  I may
tinker with it but it's not really high on my priority list.  I've
been pretty busy getting the ListSummaries back up to speed (very
busy mail lists since the last one) and am writing/testing a new
interface to NCBI EUtilities which I may donate at some point in the
next few months or so.

Chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Wed May 31 13:07:10 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 08:07:10 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
Message-ID: <447D94FE.8090305@jays.net>

http://www.bioperl.org/wiki/Bptutorial.pl

I think I just partially fulfilled this TODO:

  TODO: check if the POD is in the Wiki yet, and if not, put it here? 

I used Pod::Simple::Wiki (format 'mediawiki') to burn bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it into the wiki page via my web browser. (Is that proper procedure? Is the plan to just do that manually from time to time as the document changes?)

Now what?

Should there be a new link on the far left of bioperl.org called "Tutorial"? 

It's an amazing document. IMHO it should be listed prominently on bioperl.org.

HTH,

j


From osborne1 at optonline.net  Wed May 31 13:58:01 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 31 May 2006 09:58:01 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447D94FE.8090305@jays.net>
Message-ID: <C0A31929.89F9%osborne1@optonline.net>

Jay,

Excellent! Now we need to answer a few more questions for ourselves:

- Do we remove the file bptutorial.pl from the package now? I'd say yes, we
don't want to have to maintain two bptutorials.

- What do we do with the script part of bptutorial.pl? It certainly could be
excised and put into the examples/ directory, for example, but this would
break a few of the paths that are being used.

- A link to bptutorial? Or a link to the existing tutorials page?
http://www.bioperl.org/wiki/Tutorials.

Any thoughts on these?


Brian O.




From luciap at sas.upenn.edu  Wed May 31 14:06:13 2006
From: luciap at sas.upenn.edu (Lucia Peixoto)
Date: Wed, 31 May 2006 10:06:13 -0400
Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function
In-Reply-To: <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
References: <OF08921C7C.2098C56E-ON8525717E.006A8FE9-8525717E.006A98C7@gsk.com>
	<1149019912.447ca7085124e@128.91.55.38>
	<6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu>
Message-ID: <1149084373.447da2d5c5339@128.91.55.38>

Hi
Thanks
a couple more questions
why is the bootstrap value stored as the node id? Is that right?

also, in the add_descendant method, how do you set the $ignoreoverwrite
parameter to true?

Lucia

Quoting Jason Stajich <jason.stajich at duke.edu>:

> you need to special case the root - it won't have an ancestor.  just
> protect the my $parent = $node->ancestor with an if statement as I
> did below
>
> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote:
>
> > Hi
> > OK that was silly, but what I have in my code is what you just wrote
> > But the problem is that if I write
> >
> > $parent->add_Descendent($child)
> >
> > it tells me that I am calling the method "add_Descendent" on an
> > undefined value
> > (but I did define $parent before??)
> >
> > So here it goes the code so far:
> >
> > use Bio::TreeIO;
> >  my $in = new Bio::TreeIO(-file => 'Test2.tre',
> >                           -format => 'newick');
> >  my $out = new Bio::TreeIO(-file => '>mytree.out',
> >                            -format => 'newick');
> >  while( my $tree = $in->next_tree ) {
> >     foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes() ) {
> >     my $bootstrap=$node->_creation_id;
> >
> >     if ($bootstrap < 70 ){
> >    >>> if(        my $parent = $node->ancestor ) {
> >               my @children=$node->get_all_Descendents;
> >               foreach my $child (@children){
> >                  $parent->add_Descendent($child);
> >               }
>          }
> >
> > ........
> >
> > eventually I'll add (once I have assigned the children to the parent
> > successfully):
> > $tree->remove_Node($node);
> >
> >         }
> >     }
> >     $out->write_tree($tree);
> > }
> >
> > Quoting aaron.j.mackey at gsk.com:
> >
> >>> foreach $child (@children){
> >>>          $parent=add_Descendent->$child;
> >>> }
> >>
> >> I think what you want is $parent->add_Descendent($child)
> >>
> >> -Aaron
> >>
> >
> >
> > Lucia Peixoto
> > Department of Biology,SAS
> > University of Pennsylvania
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
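
The root special case from the quoted advice can be sketched without 
BioPerl. In this sketch plain hashes stand in for tree nodes; new_node, 
add_child and collapse_node are hypothetical helpers, not Bio::Tree::Node 
methods, and 70 is the bootstrap cutoff from the code above:

```perl
use strict;
use warnings;

# Plain hashes stand in for tree nodes (hypothetical structure, not the
# Bio::Tree::Node API): each node has a support value, a parent and children.
sub new_node {
    my ($name, $support) = @_;
    return { name => $name, support => $support, parent => undef, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    push @{ $parent->{children} }, $child;
    return $child;
}

# Collapse one internal node: hand its direct children to its parent.
# The root has no parent, so it is skipped rather than dereferenced.
sub collapse_node {
    my ($node) = @_;
    my $parent = $node->{parent};
    return 0 unless defined $parent;          # special-case the root
    my @children = @{ $node->{children} };
    $parent->{children} = [ grep { $_ != $node } @{ $parent->{children} } ];
    add_child($parent, $_) for @children;
    $node->{children} = [];
    $node->{parent}   = undef;
    return 1;
}

my $root  = new_node('root', 0);
my $weak  = add_child($root, new_node('weak', 45));   # below the 70 cutoff
my $leafA = add_child($weak, new_node('A'));
my $leafB = add_child($weak, new_node('B'));
my $leafC = add_child($root, new_node('C'));

my $root_collapsed = collapse_node($root);   # no-op: returns 0 instead of dying
collapse_node($weak) if $weak->{support} < 70;

print join(',', map { $_->{name} } @{ $root->{children} }), "\n";
```

Only the direct children are moved here. In Bio::Tree::Node terms that 
corresponds to each_Descendent; get_all_Descendents returns the whole 
subtree, which would also re-parent grandchildren.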


Lucia Peixoto
Department of Biology,SAS
University of Pennsylvania


From sb at mrc-dunn.cam.ac.uk  Wed May 31 14:56:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 31 May 2006 15:56:49 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>

Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more 
> likely that there is a well-hidden bug caused by accidentally assigning undef 
> into a one-element array than someone intentionally writing code that 
> expects that behaviour!
> 
> I removed (but did not commit yet) all undefs from my old Bio::Variation code 
> and could not see any differences in the test output. 
> 
> Let's remove them!

Just looking for all return undef;s isn't enough. It's entirely possible 
to do something like:

my $return_value;
{
   # do something that assigns to return_value on success
   # on failure, just do nothing
}
return $return_value;

The bioperl docs will typically explicitly state that undef is returned, 
and under what circumstance. If a user suffers from the 
undef-into-array-problem, yes it can be slightly unexpected, but lots of 
unexpected things will happen when you don't use a method correctly, as 
per the docs!
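
To make that concrete, here is a small sketch with a hypothetical lookup() 
sub (not taken from any BioPerl module). It contains no literal 
"return undef", yet it behaves exactly the same way on failure:

```perl
use strict;
use warnings;

# No literal "return undef" here, so a text search for it finds nothing...
sub lookup {
    my ($key) = @_;
    my %table = (known => 'value');
    my $return_value;
    $return_value = $table{$key} if exists $table{$key};
    return $return_value;    # ...yet on failure this returns one undef
}

# Called as documented (scalar assignment), failure is easy to test for.
my $scalar = lookup('missing');

# Assigned to an array, the failure becomes a one-element list.
my @list = lookup('missing');

print "scalar is ", (defined $scalar ? "defined" : "undef"), "\n";
print "array holds ", scalar(@list), " element(s)\n";
```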

Fixing the return of undef is either a job that shouldn't be done, or a 
much harder job than expected.


From bernd.web at gmail.com  Wed May 31 14:30:30 2006
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 31 May 2006 16:30:30 +0200
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
References: <447D94FE.8090305@jays.net> <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <716af09c0605310730o7de20489m674a07b5a928039d@mail.gmail.com>

Hi,

I am not sure to what extent bptutorial will be removed, but
I actually like having bptutorial.pl in my BioPerl base for reference.

regards,
Bernd



From lstein at cshl.edu  Wed May 31 16:03:13 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:03:13 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <200605311203.13922.lstein@cshl.edu>

I'm afraid that everything depends on the context. If the subroutine is 
documented to return a single scalar, then returning undef is appropriate. If 
the subroutine is documented to return "false" on failure, then one must call 
return (or "return ()" ).

Changing all the return undefs to return is going to expose hidden bugs in the 
code written by people who are using BioPerl. While I agree wholeheartedly 
with the proposed audit, I think we need to expect that people are going to 
complain.
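
A short sketch of the kind of breakage I would expect, using two 
hypothetical subs, old_style() and new_style(), rather than real BioPerl 
methods. Both behave the same in scalar context; they differ as soon as 
the caller puts the call inside a list, such as a hash constructor:

```perl
use strict;
use warnings;

sub old_style { return undef }   # always exactly one value
sub new_style { return }         # undef in scalar context, () in list context

# Scalar context: both behave the same.
my $old_scalar = old_style();
my $new_scalar = new_style();

# List context: only the bare return gives a false (empty) array.
my @old_list = old_style();
my @new_list = new_style();

# The catch: calling code that embeds the call in a list silently shifts.
my %with_old = (name => old_style(), length => 10);
my %with_new;
{
    no warnings 'misc';   # silence "Odd number of elements in hash assignment"
    %with_new = (name => new_style(), length => 10);
}

printf "old name: %s\n", defined $with_old{name} ? $with_old{name} : 'undef';
printf "new name: %s\n", defined $with_new{name} ? $with_new{name} : 'undef';
```

In the second hash the empty list vanishes, so 'length' slides into the 
value slot for 'name'. User code written this way works today and breaks 
after the change, even though the library is the one that got "fixed".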

Lincoln


On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well-hidden bug caused by accidentally assigning
> undef into a one-element array than someone intentionally writing code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > Agreed, though I think these changes should be implemented at some point
> > (Conway's argument here makes sense and it is nice for Torsten to check
> > this out).  If proper tests are written then any changes resulting in
> > errors should be picked up by checking the appropriate test suite, though
> > I know it doesn't absolutely guarantee it.  ; P
> >
> > Chris
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > "returnundef"
> > >
> > > Although I agree with the sentiment of following PBP, I'm not so sure
> > > changing 'return undef' to 'return' *now* will fix any bugs without
> > > introducing new, subtle ones.
> > >
> > > Chris Fields wrote:
> > > > Torsten,
> > > >
> > > > Any way you can post a list of some/all of the offending lines or modules?
> > > > Sounds like something to consider, but if the list is as large as you say we
> > > > may need something (bugzilla? wiki?) to track the changes and make
> > > > sure they pass tests; I'm sure a large majority will.
> > > >
> > > > I'm guessing Jason would want this somewhere on the project priority list or
> > > > bugzilla, with a link to the actual list, but I'm not sure.  Maybe start a
> > > > page on the wiki for proposed code changes?
> > > >
> > > > Chris
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > >> To: bioperl-l at lists.open-bio.org
> > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > >> "returnundef"
> > > >>
> > > >> FYI Bioperl developers:
> > > >>
> > > >> I just audited the bioperl-live CVS and found about 450 occurrences
> > > >> of "return undef".
> > > >>
> > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > >> suggest:
> > > >>
> > > >> "Use return; instead of return undef; if you want to return nothing.
> > > >> If someone assigns the return value to an array, the latter creates
> > > >> an array of one value (undef), which evaluates to true. The former
> > > >> will correctly handle all contexts."
> > > >>
> > > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > > >> result in bugs and should probably be changed.
> > > >>
> > > >> Your opinion may differ :-)
> > > >>
> > > >> --
> > > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > > >>
> > > >> _______________________________________________
> > > >> Bioperl-l mailing list
> > > >> Bioperl-l at lists.open-bio.org
> > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> > > --
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Rutger Vos, PhD. candidate
> > > Department of Biological Sciences
> > > Simon Fraser University
> > > 8888 University Drive
> > > Burnaby, BC, V5A1S6
> > > Phone: 604-291-5625
> > > Fax: 604-291-3496
> > > Personal site: http://www.sfu.ca/~rvosa
> > > FAB* lab: http://www.sfu.ca/~fabstar
> > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed May 31 16:34:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:34:54 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <C0A31929.89F9%osborne1@optonline.net>
Message-ID: <001201c684d0$263c5530$15327e82@pyrimidine>

Brian, Jay,

I think it would be nice to have the tutorial prominently displayed somehow
(Jay's suggestion), with a link provided via the tutorials page.  Hopefully
this will help with the bioperl newbies.

Jay, looks like there are still some weird formatting issues with the
bptutorial wiki page, something which I ran into before when getting the
Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
spaces preceding a line denotes code for some reason).  Not much you can do
in these cases except remove the extra spaces in those spots.  Looking good
though!  

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Wednesday, May 31, 2006 8:58 AM
> To: Jay Hannah; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
> 
> Jay,
> 
> Excellent! Now we need to answer a few more questions for ourselves:
> 
> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
> we don't want to have to maintain two bptutorials.
> 
> - What do we do with the script part of bptutorial.pl? It certainly could
> be excised and put into the examples/ directory, for example, but this
> would break a few of the paths that are being used.
> 
> - A link to bptutorial? Or a link to the existing tutorials page?
> http://www.bioperl.org/wiki/Tutorials.
> 
> Any thoughts on these?
> 
> 
> Brian O.
> 
> 
> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
> 
> > http://www.bioperl.org/wiki/Bptutorial.pl
> >
> > I think I just partially fulfilled this TODO:
> >
> >   TODO: check if the POD is in the Wiki yet, and if not, put it here?
> >
> > I used Pod::Simple::Wiki (format 'mediawiki') to burn
> > bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
> > into the wiki page via my web browser. (Is that proper procedure? Is the
> > plan to just do that manually from time to time as the document changes?)
> >
> > Now what?
> >
> > Should there be a new link on the far left of bioperl.org called
> > "Tutorial"?
> >
> > It's an amazing document. IMHO it should be listed prominently on
> > bioperl.org.
> >
> > HTH,
> >
> > j


From cjfields at uiuc.edu  Wed May 31 16:44:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 11:44:31 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
Message-ID: <001301c684d1$7e849fd0$15327e82@pyrimidine>

My feeling is the test suite 'should' pick up a large majority of problems
if changes are made to these lines, the quotes there indicating the utopian
idea that the tests are all written well (I believe 99% of the tests are,
BTW).  You can always try the changes (wholesale or on smaller chunks of
code), see if they pass tests on different OS's using 'make/nmake test',
revert the ones that didn't pass, etc.  It's a matter of someone willing to
try it out.

I think the original argument proposed here (originating from Damian Conway
and 'Perl Best Practices') is that using 'return undef' is something we
shouldn't be doing, since it can itself lead to subtle errors.  Not that
everything we do is considered 'a good practice' by any means.  If I
remember correctly from 'OOPerl', Conway doesn't like combined get/setters
either (he prefers separate getters and setters); we use the 'bad' combined
version predominantly in Bioperl.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 11:03 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> I'm afraid that everything depends on the context. If the subroutine is
> documented to return a single scalar, then returning undef is appropriate.
> If the subroutine is documented to return "false" on failure, then one must
> call return (or "return ()" ).
> 
> Changing all the return undefs to return is going to expose hidden bugs in
> the code written by people who are using BioPerl. While I agree
> wholeheartedly with the proposed audit, I think we need to expect that
> people are going to complain.
> 
> Lincoln
> 
> 
> On Wednesday 31 May 2006 06:55, Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much more
> > likely that there is a well hidden bug caused by accidentally assigning
> > undef into a one-element array than someone intentionally writing code
> > that expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> > code and could not see any differences in the test output.
> >
> > Let's remove them!
> >
> > 	-Heikki
> >
> > On Tuesday 30 May 2006 23:40, Chris Fields wrote:
> > > Agreed, though I think these changes should be implemented at some
> > > point (Conway's argument here makes sense and it is nice for Torsten
> > > to check this out).  If proper tests are written then any changes
> > > resulting in errors should be picked up by checking the appropriate
> > > test suite, though I know it doesn't absolutely guarantee it.  ; P
> > >
> > > Chris
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> > > > Sent: Tuesday, May 30, 2006 1:53 PM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
> > > > "returnundef"
> > > >
> > > > > Although I agree with the sentiment of following PBP, I'm not so
> > > > > sure changing 'return undef' to 'return' *now* will fix any bugs
> > > > > without introducing new, subtle ones.
> > > >
> > > > Chris Fields wrote:
> > > > > Torsten,
> > > > >
> > > > > > Any way you can post a list of some/all of the offending lines or
> > > > > > modules?
> > > > > > Sounds like something to consider, but if the list is as large as
> > > > > > you say we may need something (bugzilla? wiki?) to track the
> > > > > > changes and make sure they pass tests; I'm sure a large majority
> > > > > > will.
> > > > > >
> > > > > > I'm guessing Jason would want this somewhere on the project
> > > > > > priority list or bugzilla, with a link to the actual list, but I'm
> > > > > > not sure.  Maybe start a page on the wiki for proposed code
> > > > > > changes?
> > > > >
> > > > > Chris
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > >> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
> > > > >> Sent: Tuesday, May 30, 2006 3:19 AM
> > > > >> To: bioperl-l at lists.open-bio.org
> > > > >> Subject: [Bioperl-l] For CVS developers - potential pitfall with
> > > > >> "returnundef"
> > > > >>
> > > > >> FYI Bioperl developers:
> > > > >>
> > > > > >> I just audited the bioperl-live CVS and found about 450
> > > > > >> occurrences of "return undef".
> > > > > >>
> > > > > >> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
> > > > > >> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
> > > > > >> suggest:
> > > > > >>
> > > > > >> "Use return; instead of return undef; if you want to return
> > > > > >> nothing. If someone assigns the return value to an array, the
> > > > > >> latter creates an array of one value (undef), which evaluates to
> > > > > >> true. The former will correctly handle all contexts."
> > > > > >>
> > > > > >> So I'm guessing at least some of these 450 occurrences *could*
> > > > > >> result in bugs and should probably be changed.
> > > > >>
> > > > >> Your opinion may differ :-)
> > > > >>
> > > > >> --
> > > > >> Dr Torsten Seemann               http://www.vicbioinformatics.com
> > > > >> Victorian Bioinformatics Consortium, Monash University, Australia
> > > > >>
> > > >
> > > > --
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Rutger Vos, PhD. candidate
> > > > Department of Biological Sciences
> > > > Simon Fraser University
> > > > 8888 University Drive
> > > > Burnaby, BC, V5A1S6
> > > > Phone: 604-291-5625
> > > > Fax: 604-291-3496
> > > > Personal site: http://www.sfu.ca/~rvosa
> > > > FAB* lab: http://www.sfu.ca/~fabstar
> > > > Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >
> > > >
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 14:59:53 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 10:59:53 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311255.19166.heikki@sanbi.ac.za>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
Message-ID: <949F348A-391B-495D-ABCE-30BABC37FF05@gmx.net>

I agree. Thanks to Torsten for the audit and Chris for stepping up.

	-hilmar

On May 31, 2006, at 6:55 AM, Heikki Lehvaslaiho wrote:

> In my opinion the sooner the bugs get exposed the better. It is much more
> likely that there is a well hidden bug caused by accidentally assigning
> undef into a one-element array than someone intentionally writing code
> that expects that behaviour!
>
> I removed (but did not commit yet) all undefs from my old Bio::Variation
> code and could not see any differences in the test output.
>
> Let's remove them!
>
> 	-Heikki
>
> On Tuesday 30 May 2006 23:40, Chris Fields wrote:
>> Agreed, though I think these changes should be implemented at some point
>> (Conway's argument here makes sense and it is nice for Torsten to check
>> this out).  If proper tests are written then any changes resulting in
>> errors should be picked up by checking the appropriate test suite, though
>> I know it doesn't absolutely guarantee it.  ; P
>>
>> Chris
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Rutger Vos
>>> Sent: Tuesday, May 30, 2006 1:53 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] For CVS developers - potential pitfallwith
>>> "returnundef"
>>>
>>> Although I agree with the sentiment of following PBP, I'm not so sure
>>> changing 'return undef' to 'return' *now* will fix any bugs without
>>> introducing new, subtle ones.
>>>
>>> Chris Fields wrote:
>>>> Torsten,
>>>>
>>>> Any way you can post a list of some/all of the offending lines or
>>>> modules?
>>>> Sounds like something to consider, but if the list is as large as you
>>>> say we may need something (bugzilla? wiki?) to track the changes and
>>>> make sure they pass tests; I'm sure a large majority will.
>>>>
>>>> I'm guessing Jason would want this somewhere on the project priority
>>>> list or bugzilla, with a link to the actual list, but I'm not sure.
>>>> Maybe start a page on the wiki for proposed code changes?
>>>>
>>>> Chris
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Torsten Seemann
>>>>> Sent: Tuesday, May 30, 2006 3:19 AM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] For CVS developers - potential pitfall with
>>>>> "returnundef"
>>>>>
>>>>> FYI Bioperl developers:
>>>>>
>>>>> I just audited the bioperl-live CVS and found about 450 occurrences of
>>>>> "return undef".
>>>>>
>>>>> Page 199 of "Perl Best Practices" by Damian Conway, and this URL
>>>>> http://www.perl.com/lpt/a/2006/02/23/advanced_subroutines.html
>>>>> suggest:
>>>>>
>>>>> "Use return; instead of return undef; if you want to return nothing.
>>>>> If someone assigns the return value to an array, the latter creates an
>>>>> array of one value (undef), which evaluates to true. The former will
>>>>> correctly handle all contexts."
>>>>>
>>>>> So I'm guessing at least some of these 450 occurrences *could* result
>>>>> in bugs and should probably be changed.
>>>>>
>>>>> Your opinion may differ :-)
>>>>>
>>>>> --
>>>>> Dr Torsten Seemann               http://www.vicbioinformatics.com
>>>>> Victorian Bioinformatics Consortium, Monash University, Australia
>>>>>
>>>
>>> --
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Rutger Vos, PhD. candidate
>>> Department of Biological Sciences
>>> Simon Fraser University
>>> 8888 University Drive
>>> Burnaby, BC, V5A1S6
>>> Phone: 604-291-5625
>>> Fax: 604-291-3496
>>> Personal site: http://www.sfu.ca/~rvosa
>>> FAB* lab: http://www.sfu.ca/~fabstar
>>> Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of the Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 18:08:43 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:08:43 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311203.13922.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311255.19166.heikki@sanbi.ac.za>
	<200605311203.13922.lstein@cshl.edu>
Message-ID: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>


On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:

> If the subroutine is documented to return "false" on failure, then  
> one must call
> return (or "return ()" ).

The problem seems to be that 'a value that evaluates to either true  
or false' and 'a [meaningful] value or undef' and 'a value or  
false' ('a value or no value') are not the same in perl. And what  
would/should one expect if the doc states 'true on success and false  
otherwise'?

Maybe the documentation should also be fixed to avoid any ambiguity.  
I.e., avoid documenting 'a value or false' because it may be  
ambiguous (not only) to the less proficient. 'True or false' should  
imply a value being returned.

Comments?

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lstein at cshl.edu  Wed May 31 18:14:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 14:14:59 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
Message-ID: <200605311415.00414.lstein@cshl.edu>

If the documentation says "returns false" then I expect to be able to do this:

	@result = foo();
	die "foo() failed" unless @result;

If the documentation says "returns undef" then I expect this:

	@result = foo();
	die "foo() failed" unless $result[0];
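
A minimal sketch of why the two cases differ, assuming two made-up subs
(fail_with_undef and fail_with_return are hypothetical names, not BioPerl
methods). It only prints what each style yields in list and scalar context:

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only (not BioPerl code).
sub fail_with_undef  { return undef }  # list context: one-element list (undef)
sub fail_with_return { return }        # list context: empty list; scalar: undef

my @a = fail_with_undef();
my @b = fail_with_return();
my $s = fail_with_return();

# An array holding a single undef is still "true" because it has one element.
printf "return undef: %d element(s), array is %s\n",
    scalar(@a), (@a ? 'true' : 'false');
printf "return:       %d element(s), array is %s\n",
    scalar(@b), (@b ? 'true' : 'false');
printf "scalar context: %s\n", defined $s ? 'defined' : 'undef';
```

So "unless @result" only catches the failure for the bare return, while
"unless $result[0]" catches both.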

Lincoln


On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > If the subroutine is documented to return "false" on failure, then
> > one must call
> > return (or "return ()" ).
>
> The problem seems to be that 'a value that evaluates to either true
> or false' and 'a [meaningful] value or undef' and 'a value or
> false' ('a value or no value') are not the same in perl. And what
> would/should one expect if the doc states 'true on success and false
> otherwise'?
>
> Maybe the documentation should also be fixed to avoid any ambiguity.
> I.e., avoid documenting 'a value or false' because it may be
> ambiguous (not only) to the less proficient. 'True or false' should
> imply a value being returned.
>
> Comments?
>
> 	-hilmar

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 18:31:21 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 14:31:21 -0400
Subject: [Bioperl-l] For CVS developers - potential pitfallwith
	"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
References: <001801c68431$a586b2d0$15327e82@pyrimidine>
	<200605311203.13922.lstein@cshl.edu>
	<FE6E523B-ACCF-4BDC-BBB5-6A7A4D6DA62B@gmx.net>
	<200605311415.00414.lstein@cshl.edu>
Message-ID: <241E77AE-8D1E-4708-9C4C-8A9619822DB4@gmx.net>


On May 31, 2006, at 2:14 PM, Lincoln Stein wrote:

> If the documentation says "returns false" then I expect to be able  
> to do this:
>
> 	@result = foo();
> 	die "foo() failed" unless @result;

Except if the alternative to 'false' would be a scalar, you normally  
wouldn't assign it to an array, would you?

I.e., I wouldn't expect this strict of a behavior from an open-source  
package written largely by people whose job is biological science,  
not programming perl knowing and following DC to the letter ... I'd  
rather be on the safe side and assign to a scalar.

Just my $0.02 ...

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 18:50:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 13:50:30 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <001801c684e3$16e33730$15327e82@pyrimidine>


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Wednesday, May 31, 2006 9:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> Heikki Lehvaslaiho wrote:
> > In my opinion the sooner the bugs get exposed the better. It is much more
> > likely that there is a well hidden bug caused by accidentally assigning
> > undef into a one-element array than someone intentionally writing code
> > that expects that behaviour!
> >
> > I removed (but did not commit yet) all undefs from my old Bio::Variation
> > code and could not see any differences in the test output.
> >
> > Let's remove them!
> 
> Just looking for all return undef;s isn't enough. It's entirely possible
> to do something like:
> 
> my $return_value;
> {
>    # do something that assigns to return_value on success
>    # on failure, just do nothing
> }
> return $return_value;

Agreed, though looking for these is obviously much harder.  

The way to get around those is:

return $return_value if $return_value;
return;

which I've seen used in a number of get/set methods. 
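
One caveat with that idiom, as a rough sketch: a plain truth test also drops
legitimate false values such as 0 or the empty string, so a "defined" test
may be closer to what is wanted. The sub names below (by_truth, by_defined)
are made up for illustration and are not taken from any BioPerl module:

```perl
use strict;
use warnings;

# Hypothetical helpers; names are invented for this example.
sub by_truth {
    my ($value) = @_;
    return $value if $value;          # 0 and '' fall through to the bare return
    return;
}

sub by_defined {
    my ($value) = @_;
    return $value if defined $value;  # only undef falls through
    return;
}

my @t = by_truth(0);
my @d = by_defined(0);
my @u = by_defined(undef);

printf "by_truth(0):       %d element(s)\n", scalar @t;
printf "by_defined(0):     %d element(s)\n", scalar @d;
printf "by_defined(undef): %d element(s)\n", scalar @u;
```

Which test is right depends on whether 0 or '' can be a valid stored value
for the method in question.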

> The bioperl docs will typically explicitly state that undef is returned,
> and under what circumstance. If a user suffers from the
> undef-into-array-problem, yes it can be slightly unexpected, but lots of
> unexpected things will happen when you don't use a method correctly, as
> per the docs!

Right, but the argument you make is that code will always work as expected
from the perldoc examples.  My recent experiences with the
Bio::Restriction::IO and Bio::Species classes show that the docs are not
always up-to-date and may indicate the unimplemented intent of the author
more than the actual implementation.  Again, I believe a large majority of
the docs are fine, but it's those few errors that made a devil's advocate of
me...

> Fixing the return of undef is either a job that shouldn't be done, or a
> much harder job than expected.

I don't think ignoring the problem is the best answer here, though I agree
the problem is more complicated than it looks at first glance.  Judging from
code I've trawled through a bit lately, I've seen a lot of methods (mainly
get/setters) that are essentially copied multiple times in the same or
across similar modules to save time.  You could see a scenario where, in
those instances, so-called 'bad code' would spread quite quickly.

I think adding a wiki page to address some of these issues would be nice,
something separate from the Project Priority List.

Chris


From forward at hongyu.org  Wed May 31 18:03:46 2006
From: forward at hongyu.org (Hongyu Zhang)
Date: Wed, 31 May 2006 11:03:46 -0700
Subject: [Bioperl-l] New functions for SimpleAlign.pm
Message-ID: <20060531110346.78xod658td8o0w0w@hongyu.org>

Greetings,

I am a new member in this mailing list. Nice to be here.

I wrote two more functions for the alignment module SimpleAlign.pm  
that calculate the percentage of identity based on the shortest and  
longest sequence length, respectively. I also found an error in the  
no_residues() function that calculates the number of residues in the  
alignment.

I am wondering whether they can be added to the official bioperl  
package. I've contacted the original author of this module, Heikki  
Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.

Thanks.

-- 
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.


From cjfields at uiuc.edu  Wed May 31 19:39:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 14:39:26 -0500
Subject: [Bioperl-l] New functions for SimpleAlign.pm
In-Reply-To: <20060531110346.78xod658td8o0w0w@hongyu.org>
Message-ID: <001901c684e9$ed4a1720$15327e82@pyrimidine>

I added a bit to the FAQ about this:

http://www.bioperl.org/wiki/FAQ#How_do_I_submit_a_patch_or_enhancement_to_BioPerl.3F

and the HOWTO explains things a bit more directly:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

In brief, these need to be submitted to Bugzilla as either code enhancements
(for your added methods) or bugs with the patch to the relevant code.  Code
enhancements probably should include some code and test cases to demonstrate
usage.  Patches to buggy code are checked to make sure they pass relevant
tests by the core developers.  Submitting it to the mail list is definitely
the first step, though, so you're on the right path.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hongyu Zhang
> Sent: Wednesday, May 31, 2006 1:04 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] New functions for SimpleAlign.pm
> 
> Greetings,
> 
> I am a new member in this mailing list. Nice to be here.
> 
> I wrote two more functions for the alignment module SimpleAlign.pm
> that calculate the percentage of identity based on the shortest and
> longest sequence length, respectively. I also found an error in the
> no_residues() function that calculates the number of residues in the
> alignment.
> 
> I am wondering whether they can be added to the official bioperl
> package. I've contacted the original author of this module, Heikki
> Lehvaslaiho, a couple of weeks ago, but haven't heard from him yet.
> 
> Thanks.
> 
> --
> Hongyu Zhang, Ph.D.
> Computational biologist
> Ceres Inc.
> 
> 
> 
> 
> 


From cjfields at uiuc.edu  Wed May 31 20:40:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 15:40:19 -0500
Subject: [Bioperl-l] For CVS developers - potential
	pitfallwith"returnundef"
In-Reply-To: <200605311415.00414.lstein@cshl.edu>
Message-ID: <002001c684f2$6fb7daf0$15327e82@pyrimidine>

What about modules that have 'throw_not_implemented' statements present?
Here's a list with the total for each.  Some of these are interfaces (I got
rid of a number that ended in 'I' or 'IO' to remove the I/IO interfaces but
it misses a few).  There are a number here that are implementations, though
(Bio::AlignIO::maf, Bio::Restriction::IO::*), so they are technically
incomplete:

Instances: 1	Module : Bio::AlignIO::maf
Instances: 25	Module : Bio::Assembly::Contig
Instances: 2	Module : Bio::Assembly::ContigAnalysis
Instances: 2	Module : Bio::Biblio::BiblioBase
Instances: 4	Module : Bio::DB::Expression
Instances: 2	Module : Bio::DB::Expression::geo
Instances: 5	Module : Bio::DB::Flat
Instances: 2	Module : Bio::DB::Query::WebQuery
Instances: 17	Module : Bio::DB::SeqFeature::Store
Instances: 2	Module : Bio::DB::SeqVersion
Instances: 3	Module : Bio::DB::Taxonomy
Instances: 1	Module : Bio::FeatureIO::bed
Instances: 1	Module : Bio::Map::Marker
Instances: 1	Module : Bio::MapIO::fpc
Instances: 1	Module : Bio::MapIO::mapmaker
Instances: 1	Module : Bio::Restriction::IO::bairoch
Instances: 1	Module : Bio::Restriction::IO::itype2
Instances: 1	Module : Bio::Restriction::IO::withrefm
Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
Instances: 3	Module : Bio::Tools::Run::WrapperBase

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> Sent: Wednesday, May 31, 2006 1:15 PM
> To: Hilmar Lapp
> Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> Subject: Re: [Bioperl-l] For CVS developers - potential
> pitfallwith"returnundef"
> 
> If the documentation says "returns false" then I expect to be able to do
> this:
> 
> 	@result = foo();
> 	die "foo() failed" unless @result;
> 
> If the documentation says "returns undef" then I expect this:
> 
> 	@result = foo();
> 	die "foo() failed" unless $result[0];
> 
> Lincoln
> 
> 
> On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > If the subroutine is documented to return "false" on failure, then
> > > one must call
> > > return (or "return ()" ).
> >
> > The problem seems to be that 'a value that evaluates to either true
> > or false' and 'a [meaningful] value or undef' and 'a value or
> > false' ('a value or no value') are not the same in perl. And what
> > would/should one expect if the doc states 'true on success and false
> > otherwise'?
> >
> > Maybe the documentation should also be fixed to avoid any ambiguity.
> > I.e., avoid documenting 'a value or false' because it may be
> > ambiguous (not only) to the less proficient. 'True or false' should
> > imply a value being returned.
> >
> > Comments?
> >
> > 	-hilmar
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From lstein at cshl.edu  Wed May 31 21:07:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 17:07:06 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <200605311707.08196.lstein@cshl.edu>


> Instances: 17	Module : Bio::DB::SeqFeature::Store

This is intentional. Bio::DB::SeqFeature::Store is intended to be a virtual 
base class. The throw_not_implemented() calls are there to force developers 
to override the needed interface methods.
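
As a rough illustration of the pattern described above, here is a minimal
sketch in plain Perl. It does not use BioPerl: My::Store, My::Store::Memory
and the local throw_not_implemented() are hypothetical stand-ins, assumed
only for this example, for a virtual base class and the exception helper.
The base class declares the interface, and any method a subclass forgets to
override dies with a message naming both the method and the offending class.

```perl
use strict;
use warnings;

# Hypothetical virtual base class; a stand-in, not BioPerl code.
package My::Store;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Stand-in for an exception helper: report which abstract method was
# reached and which concrete class failed to override it.
sub throw_not_implemented {
    my $self   = shift;
    my $method = (caller(1))[3];    # fully qualified name of the calling sub
    die "Abstract method '$method' is not implemented by " . ref($self) . "\n";
}

# Interface methods that every concrete subclass is expected to override.
sub store_feature { shift->throw_not_implemented }
sub fetch_feature { shift->throw_not_implemented }

# Concrete subclass that overrides only one of the two methods.
package My::Store::Memory;
our @ISA = ('My::Store');

sub store_feature {
    my ($self, $id, $feature) = @_;
    $self->{features}{$id} = $feature;
    return 1;
}

package main;

my $store  = My::Store::Memory->new;
my $stored = $store->store_feature(f1 => 'exon');

# fetch_feature() was never overridden, so the base class stub throws.
my $fetched = eval { $store->fetch_feature('f1') };
my $error   = $@;

print "store_feature returned $stored\n";
print "fetch_feature died: $error";
```

The point of the design is that the failure is loud and happens at the call
site, so a developer writing a new adaptor finds out immediately which
interface methods are still missing.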

If this is not the right way to do it, let me know and I'll fix it.

Lincoln


> Instances: 2	Module : Bio::DB::SeqVersion
> Instances: 3	Module : Bio::DB::Taxonomy
> Instances: 1	Module : Bio::FeatureIO::bed
> Instances: 1	Module : Bio::Map::Marker
> Instances: 1	Module : Bio::MapIO::fpc
> Instances: 1	Module : Bio::MapIO::mapmaker
> Instances: 1	Module : Bio::Restriction::IO::bairoch
> Instances: 1	Module : Bio::Restriction::IO::itype2
> Instances: 1	Module : Bio::Restriction::IO::withrefm
> Instances: 1	Module : Bio::Tools::Analysis::SimpleAnalysisBase
> Instances: 3	Module : Bio::Tools::Run::WrapperBase
>
> Chris
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Lincoln Stein
> > Sent: Wednesday, May 31, 2006 1:15 PM
> > To: Hilmar Lapp
> > Cc: bioperl-l at lists.open-bio.org; Heikki Lehvaslaiho
> > Subject: Re: [Bioperl-l] For CVS developers - potential
> > pitfallwith"returnundef"
> >
> > If the documentation says "returns false" then I expect to be able to do
> > this:
> >
> > 	@result = foo();
> > 	die "foo() failed" unless @result;
> >
> > If the documentation says "returns undef" then I expect this:
> >
> > 	@result = foo();
> > 	die "foo() failed" unless $result[0];
> >
> > Lincoln
> >
> > On Wednesday 31 May 2006 14:08, Hilmar Lapp wrote:
> > > On May 31, 2006, at 12:03 PM, Lincoln Stein wrote:
> > > > If the subroutine is documented to return "false" on failure, then
> > > > one must call
> > > > return (or "return ()" ).
> > >
> > > The problem seems to be that 'a value that evaluates to either true
> > > or false' and 'a [meaningful] value or undef' and 'a value or
> > > > false' ('a value or no value') are not the same in perl. And what
> > > would/should one expect if the doc states 'true on success and false
> > > otherwise'?
> > >
> > > Maybe the documentation should also be fixed to avoid any ambiguity.
> > > I.e., avoid documenting 'a value or false' because it may be
> > > ambiguous (not only) to the less proficient. 'True or false' should
> > > imply a value being returned.
> > >
> > > Comments?
> > >
> > > 	-hilmar
> >
> > --
> > Lincoln D. Stein
> > Cold Spring Harbor Laboratory
> > 1 Bungtown Road
> > Cold Spring Harbor, NY 11724
> > (516) 367-8380 (voice)
> > (516) 367-8389 (fax)
> > FOR URGENT MESSAGES & SCHEDULING,
> > PLEASE CONTACT MY ASSISTANT,
> > SANDRA MICHELSEN, AT michelse at cshl.edu

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From hlapp at gmx.net  Wed May 31 21:21:57 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:21:57 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
References: <002001c684f2$6fb7daf0$15327e82@pyrimidine>
Message-ID: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>


On May 31, 2006, at 4:40 PM, Chris Fields wrote:

> What about modules that have 'throw_not_implemented' statements  
> present?

Those are often if not always legitimate - the problem is the modules
that don't have them but fail to override an inherited interface or
abstract method.

If something is not implemented, what better way is there to express
this than throwing an exception? (And if it's not an interface or
abstract base class, saying so in the documentation.)

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed May 31 21:25:48 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:25:48 -0400
Subject: [Bioperl-l] For CVS developers - potential
	pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <8AA04BF0-FA79-43CF-9FBB-310314FECD91@gmx.net>


On May 31, 2006, at 2:50 PM, Chris Fields wrote:

> I've seen a lot of methods (mainly get/setters)
> that are essentially copied multiple times in the same or across  
> similar
> modules to save time.  You could see a scenario where, in those  
> instances,
> so-called 'bad code' would spread quite quickly.

This will usually be code generated by macros, e.g. the emacs macros  
for getter/setter generation for properties.

If the macro generates wrong code, that's indeed pretty bad. (We've  
had that.) OTOH it should be spotted quickly as well. And macro  
changes or new macros should probably be scrutinized by all eyes  
watching ...
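
For readers joining the thread here, the pitfall a copied or generated
getter can spread is easy to reproduce. The sketch below is only an
illustration: the two subroutines are hypothetical and are not taken from
any BioPerl module or emacs macro. It contrasts "return undef" with a bare
"return" when the caller assigns the result to an array.

```perl
use strict;
use warnings;

# Hypothetical getters, written only to contrast the two return styles.
sub names_with_undef {
    my $self = shift;
    return undef unless $self->{names};    # explicit undef on failure
    return @{ $self->{names} };
}

sub names_with_bare_return {
    my $self = shift;
    return unless $self->{names};          # bare return on failure
    return @{ $self->{names} };
}

my $empty = {};    # an "object" with no names set

# List context: "return undef" yields a one-element list holding undef,
# while a bare return yields the empty list.
my @from_undef = names_with_undef($empty);
my @from_bare  = names_with_bare_return($empty);

printf "return undef: %d element(s)\n", scalar @from_undef;
printf "bare return:  %d element(s)\n", scalar @from_bare;

# Scalar context: both forms give undef, so the styles differ only
# when the caller collects the result in a list.
my $scalar_undef = names_with_undef($empty);
my $scalar_bare  = names_with_bare_return($empty);
```

A caller that tests "unless @result" sees the first getter as having
succeeded, because a list containing a single undef is not empty.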

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed May 31 21:40:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 31 May 2006 16:40:22 -0500
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <A5EEA3BE-DEC6-42F2-AC44-D54F6C49DD8E@gmx.net>
Message-ID: <002401c684fa$d28e7640$15327e82@pyrimidine>

I think, as long as it's reflected in the docs that something doesn't work
(hasn't been implemented) then there's no problem.  It's when the docs are
misleading that we run into problems.  

The sticking point lies with some classes, such as IO classes (like SeqIO,
or Restriction::IO, with read and write methods) where the IO base class
specifies that it is possible to read and write a particular format but the
actual behavior depends on whether or not the derived class overrides the
base or interface method (in other words, 'doesn't work as advertised' only
in specific circumstances).  I don't know how to solve this issue except to
note in the docs which specific formats don't implement write() methods.
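
One way to find such gaps mechanically, rather than by reading each module,
is to compare what can() returns for the derived class with what it returns
for the base class. The following is a minimal sketch under stated
assumptions: My::IO and its two format classes are hypothetical stand-ins
and not BioPerl modules, and the base-class stubs simply die.

```perl
use strict;
use warnings;

# Hypothetical base class whose read()/write() are unimplemented stubs.
package My::IO;
sub new   { bless {}, shift }
sub read  { die "read() not implemented by "  . ref(shift) . "\n" }
sub write { die "write() not implemented by " . ref(shift) . "\n" }

# Format class that overrides read() only.
package My::IO::readonly_format;
our @ISA = ('My::IO');
sub read { return 'parsed record' }

# Format class that overrides both methods.
package My::IO::full_format;
our @ISA = ('My::IO');
sub read  { return 'parsed record' }
sub write { return 1 }

package main;

# Returns 1 when $class supplies its own $method instead of inheriting
# the base-class stub. can() returns a code reference, so two different
# references mean the method was overridden.
sub overrides {
    my ($class, $method) = @_;
    my $own  = $class->can($method);
    my $base = My::IO->can($method);
    return ($own && $base && $own != $base) ? 1 : 0;
}

for my $class (qw(My::IO::readonly_format My::IO::full_format)) {
    printf "%-26s read=%d write=%d\n",
        $class, overrides($class, 'read'), overrides($class, 'write');
}
```

A report like this could be used to keep the documentation honest about
which formats support reading, writing, or both.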

Personally, I haven't had an issue with it and it probably makes no
difference, but I think it needs to be pointed out.  The most extreme I ran
into was Bio::Restriction::IO, which had 3 out of 4 plugin modules that
didn't implement the write() method but left this in the synopsis in POD:

    use Bio::Restriction::IO;

    $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
                                     -format => 'withrefm');
    $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
                                     -format => 'bairoch');
    my $res = $in->read; # a Bio::Restriction::EnzymeCollection
    $out->write($res);

  # or

  #    use Bio::Restriction::IO;
  #
  #    #input file format can be read from the file extension (dat|xml)
  #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
  #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
  #
  #    # World's shortest flat<->xml format converter:
  #    print $out $_ while <$in>;

None of this code works; in fact, no XML parser even exists for these IO
classes!  Bio::AlignIO has a few such cases as well (the maf and Stockholm
formats don't write).

Chris


> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Wednesday, May 31, 2006 4:22 PM
> To: Chris Fields
> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki Lehvaslaiho'
> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
> 
> 
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> 
> > What about modules that have 'throw_not_implemented' statements
> > present?
> 
> Those are often if not always legitimate - the problem are those that
> don't have them but fail to override an inherited interface or
> abstract method.
> 
> If something is not implemented what is the better way to express
> this other than throwing an exception? (and if it's not an interface
> or abstract base class, saying so in the documentation)
> 
> 	-hilmar
> 
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 


From hlapp at gmx.net  Wed May 31 21:55:37 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 31 May 2006 17:55:37 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
In-Reply-To: <002401c684fa$d28e7640$15327e82@pyrimidine>
References: <002401c684fa$d28e7640$15327e82@pyrimidine>
Message-ID: <CB29173C-0BFC-43CA-A620-519084AFEE04@gmx.net>

This is documentation cruft resulting from copy&paste w/o later  
fixing it. (which isn't a justification)

Note that not implementing the write is as legitimate as not  
implementing the read method ... It should be pointed out in the  
documentation though that it will depend on the actual implementation  
of the format whether it supports reading or writing or both.

	-hilmar

On May 31, 2006, at 5:40 PM, Chris Fields wrote:

> I think, as long as it's reflected in the docs that something  
> doesn't work
> (hasn't been implemented) then there's no problem.  It's when the  
> docs are
> misleading that we run into problems.
>
> The sticking point lies with some classes, such as IO classes (like  
> SeqIO,
> or Restrict::IO, with read and write methods) where the IO base class
> specifies that it is possible to read and write a particular format  
> but the
> actual implementation varies according to whether or not the  
> derived class
> overrides the base or interface method (in other words, 'doesn't  
> work as
> advertised' only in specific circumstances).  I don't know how to  
> solve this
> issue except to add in the docs that specific formats don't implement
> write() methods.
>
> Personally, I haven't had an issue with it and it probably makes no
> difference, but I think it needs to be pointed out.  The most  
> extreme I ran
> into was Bio::Restriction::IO, which had 3 out of 4 plugin modules  
> that
> didn't implement the write() method but left this in the synopsis  
> in POD:
>
>     use Bio::Restriction::IO;
>
>     $in  = Bio::Restriction::IO->new(-file => "inputfilename" ,
>                                      -format => 'withrefm');
>     $out = Bio::Restriction::IO->new(-file => ">outputfilename" ,
>                                      -format => 'bairoch');
>     my $res = $in->read; # a Bio::Restriction::EnzymeCollection
>     $out->write($res);
>
>   # or
>
>   #    use Bio::Restriction::IO;
>   #
>   #    #input file format can be read from the file extension (dat| 
> xml)
>   #    $in  = Bio::Restriction::IO->newFh(-file => "inputfilename");
>   #    $out = Bio::Restriction::IO->newFh('-format' => 'xml');
>   #
>   #    # World's shortest flat<->xml format converter:
>   #    print $out $_ while <$in>;
>
> None of this code works; in fact, no XML parser even exists for  
> these IO
> classes!  Bio::AlignIO also has a few as well (maf and Stockholm  
> formats
> don't write).
>
> Chris
>
>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Wednesday, May 31, 2006 4:22 PM
>> To: Chris Fields
>> Cc: lstein at cshl.edu; bioperl-l at lists.open-bio.org; 'Heikki  
>> Lehvaslaiho'
>> Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented
>>
>>
>> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
>>
>>> What about modules that have 'throw_not_implemented' statements
>>> present?
>>
>> Those are often if not always legitimate - the problem are those that
>> don't have them but fail to override an inherited interface or
>> abstract method.
>>
>> If something is not implemented what is the better way to express
>> this other than throwing an exception? (and if it's not an interface
>> or abstract base class, saying so in the documentation)
>>
>> 	-hilmar
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From slenk at emich.edu  Wed May 31 21:52:13 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Wed, 31 May 2006 17:52:13 -0400
Subject: [Bioperl-l] For CVS developers - throw_not_implemented
Message-ID: <100682f110067a83.10067a83100682f1@emich.edu>


Isn't it fairly standard in OO schemes/languages to have an exception
thrown if a method can't be found at the end of a search up the class
hierarchy? I recall being very mad at Smalltalk because "method not found"
kept biting me. C++ has abstract base classes (classes with pure virtual
functions) that do not allow objects to be instantiated directly; they are
meant to be inherited and then implemented.

Perl 6 was mentioned a bit back. Is this issue addressed there? Should it
be? Do the Bioperl people feed their needs into Perl 6, so that all the code
effort that went into Bio::Root is handled for them by Perl 6 itself? Make
the Perl 6 people solve these issues with your input, then you will not have
to deal with implementing it yourselves. I'll just bet that you are not the
only potential users of Perl 6 who will have to solve these issues
eventually.


----- Original Message -----
From: Hilmar Lapp <hlapp at gmx.net>
Date: Wednesday, May 31, 2006 5:21 pm
Subject: Re: [Bioperl-l] For CVS developers - throw_not_implemented

> 
> On May 31, 2006, at 4:40 PM, Chris Fields wrote:
> 
> > What about modules that have 'throw_not_implemented' statements  
> > present?
> 
> Those are often if not always legitimate - the problem are those 
> that  
> don't have them but fail to override an inherited interface or  
> abstract method.
> 
> If something is not implemented what is the better way to express  
> this other than throwing an exception? (and if it's not an 
> interface  
> or abstract base class, saying so in the documentation)
> 
> 	-hilmar
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 
> 
> 
> 
> 
> 


From arareko at campus.iztacala.unam.mx  Wed May 31 22:49:03 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 31 May 2006 17:49:03 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <001201c684d0$263c5530$15327e82@pyrimidine>
References: <001201c684d0$263c5530$15327e82@pyrimidine>
Message-ID: <447E1D5F.1050807@campus.iztacala.unam.mx>

Brian, Jay, Chris,

I agree with what Bernd Web said in another reply. For some people it will
be nice to still be able to run the script from the codebase and
interact with it.

I don't think it should be much of a problem to maintain both tutorials,
as long as the 'main' one is the one in the CVS tree. From reading what
Jay did to convert it into mediawiki format, I suppose this can
be easily done again for each new change to the script (again, this is
just my guess). Besides, as far as I've seen, commits to the script
are infrequent.

I've added a link in the left menu of the wiki. If you think it should
point to the Tutorials page instead of the Bptutorial.pl page, please let
me know.

Regards,
Mauricio.

Chris Fields wrote:
> Brian, Jay,
> 
> I think it would be nice to have the tutorial prominently displayed somehow
> (Jay's suggestion), with a link provided via the tutorials page.  Hopefully
> this will help with the bioperl newbies.
> 
> Jay, looks like there are still some weird formatting issues with the
> bptutorial wiki page, something which I ran into before when getting the
> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more
> spaces preceding a line denotes code for some reason).  Not much you can do
> in these cases except remove the extra spaces in those spots.  Looking good
> though!  
> 
> Chris
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
>> Sent: Wednesday, May 31, 2006 8:58 AM
>> To: Jay Hannah; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>>
>> Jay,
>>
>> Excellent! Now we need to answer a few more questions for ourselves:
>>
>> - Do we remove the file bptutorial.pl from the package now? I'd say yes,
>> we
>> don't want to have to maintain two bptutorials.
>>
>> - What do we do with the script part of bptutorial.pl? It certainly could
>> be
>> excised and put into the examples/ directory, for example, but this would
>> break a few of the paths that are being used.
>>
>> - A link to bptutorial? Or a link to the existing tutorials page?
>> http://www.bioperl.org/wiki/Tutorials.
>>
>> Any thoughts on these?
>>
>>
>> Brian O.
>>
>>
>> On 5/31/06 9:07 AM, "Jay Hannah" <jay at jays.net> wrote:
>>
>>> http://www.bioperl.org/wiki/Bptutorial.pl
>>>
>>> I think I just partially fulfilled this TODO:
>>>
>>>   TODO: check if the POD is in the Wiki yet, and if not, put it here?
>>>
>>> I used Pod::Simple::Wiki (format 'mediawiki') to burn
>>> bioperl-live/bptutorial.pl POD into mediawiki format. I then pasted it
>> the
>>> wiki page via my web browser. (Is that proper procedure? Is the plan to
>> just
>>> do that manually from time to time as the document changes?)
>>>
>>> Now what?
>>>
>>> Should there be a new link on the far left of bioperl.org called
>> "Tutorial"?
>>> It's an amazing document. IMHO it should be listed prominently on
>> bioperl.org.
>>> HTH,
>>>
>>> j

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From gad14 at cornell.edu  Tue May 30 16:57:41 2006
From: gad14 at cornell.edu (Genevieve DeClerck)
Date: Tue, 30 May 2006 12:57:41 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447BFB20.40501@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk>
Message-ID: <447C7985.9000404@cornell.edu>

Thanks for your comment Sendu, it was very helpful. I think this must be
what's going on: I am using $blast_report->next_result in both
subroutines. It appears that analyzing the blast results first with my
sort subroutine empties (?) the $blast_report object so that when I try
to print, there is nothing left to print (and vice versa when I print
first and then try to sort).
So, from the looks of things, using next_result has the effect of
popping the Bio::Search::Result::ResultI objects off of the SearchIO
blast report object??

It seems I could get around this by making a copy of the blast report by 
setting it to another new variable...(not the most elegant solution) but 
I'm having trouble with this...

If I do:

	my $blast_report_copy = $blast_report;

I'm just copying the reference to the SearchIO blast result, so it 
doesn't help me. How can I make another physical copy of this blast 
result object? Seems like a simple thing but how to do it is escaping me.

But better yet, the way to go is to 'reset the counter,' or to find a 
way to look at/print/sort the results without removing data from the 
blast result object. How is this done though??
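
The behaviour described above, and one common workaround, can be sketched
without BioPerl. In the example below MockReport is a hypothetical stand-in
for a streaming report object, assumed only for illustration: its
next_result() hands out each item once, much like reading from a
filehandle. Draining the stream once into an ordinary array lets any number
of subroutines look at the same results afterwards.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a streaming report: next_result() returns
# each item once and then nothing, like reading lines from a filehandle.
package MockReport;
sub new         { my ($class, @items) = @_; return bless { queue => [@items] }, $class }
sub next_result { my $self = shift; return shift @{ $self->{queue} } }

package main;

my $report = MockReport->new(qw(result_A result_B result_C));

# Drain the stream once and keep every result in an ordinary array.
my @results;
while (my $result = $report->next_result) {
    push @results, $result;
}

# A second pass over the report object itself now finds nothing ...
my $leftover = $report->next_result;

# ... but the cached array can be handed to any number of subroutines.
sub sort_results  { my @r = @_; return sort { $b cmp $a } @r }
sub print_results { my @r = @_; print "$_\n" for @r; return scalar @r }

my @sorted  = sort_results(@results);
my $printed = print_results(@results);
```

With a real report the same loop would push the result objects instead of
strings, at the cost of holding all of them in memory. Creating a second
Bio::SearchIO object on the same report file is the other straightforward
option when the report is too large to cache.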

Sendu and Brian, I didn't post the sort_results subroutine because it is
sprawling, as is a lot of my code. The code I provided was more of an
aid for my explanation of the problem; it doesn't actually run. Sorry
for the confusion, I should have been more clear on that.  The important
thing to know perhaps is that both sort_results and print_blast_results
contain a loop where I am using the 'next_result' method to
view blast results. (And to clarify for Torsten, the blastall() is
working just fine - the analysis/viewing of the results object is where
I am encountering the problem.)


Any other ideas would be greatly appreciated...

Thank you,
Genevieve


Sendu Bala wrote:

> Genevieve DeClerck wrote:
> 
>> Hi,
> 
> [snip]
> 
>> If I've sorted the results the sorted-results will print to screen, 
>> however when I try to print the Hit Table results nothing is returned, 
>> as if the blast results have evaporated.... and vice versa, if i 
>> comment out the part where i point my sorting subroutine to the blast 
>> results reference,  my hit table results suddenly prints to screen.
> 
> [snip]
> 
>> Here's an abbreviated version of my code:
> 
> [snip]
> 
>> #######
>> ### the following 2 actions seem to be mutually exclusive.
>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of
>> # SeqFeature objs stored in arrays. arrays are then printed
>> # to stdout
>> &sort_results($blast_report);
>>
>> # 2) print blast results
>> &print_blast_results($blast_report);
> 
> 
>> sub print_blast_results{
>>    my $report = shift;
>>    while(my $result = $report->next_result()){
> 
> [snip]
> 
> You didn't give us your sort_results subroutine, but is it as simple as 
> they both use $report->next_result (and/or $result->next_hit), but you 
> don't reset the internal counter back to the start, so the second 
> subroutine tries to get the next_result and finds the first subroutine 
> has already looked at the last result and so next_result returns false?
> 
>  From a quick look it wasn't obvious how to reset the counter. Hopefully 
> this can be done and someone else knows how.
> 


From lstein at cshl.edu  Wed May 31 15:17:39 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 11:17:39 -0400
Subject: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg
	values
In-Reply-To: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
References: <5b6410e0605302045x5c420674x6f898a8a2973991a@mail.gmail.com>
Message-ID: <200605311117.41479.lstein@cshl.edu>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead 
and use the current Bio::Graphics development tree? Since 1.5.1 it supports 
negative coordinates. Here's an illustration:

 #!/usr/bin/perl

 use strict;

 use Bio::Graphics;
 use Bio::Graphics::Feature;

 my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
 my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,
                                           -strand=>+1);
 my $panel   = Bio::Graphics::Panel->new(-start=> -200,
					 -end  => +200,
					 -width=>800,
					 -pad_left=>10,
					 -pad_right=>10);
 $panel->add_track($whole,
		   -glyph=>'arrow',
		   -double=>1,
		   -tick=>2);
 $panel->add_track($feature,
	 	  -glyph=>'box',
		   -stranded=>1);
 print $panel->png;

 exit 0;

The resulting image is attached.

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
>       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
>       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just  for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
>     my $relative_coords_offset = $self->option('relative_coords_offset');
>     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
>
> Hi!
>
> > oh it was in a slightly different header asking about the create image
> > map feature.
> > I am using the stable version 1.4 of bioperl now. In any case I have not
> > added the sequence as a feature annotated seq. as I already have the bp
> > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > graded segments based on the position.
> > I saw that there is a scale function for the arrow glyp however, it is a
> > multiply function, can it be hacked to take in a offset value (ie minus
> > the
> > scale by 1000?)
> >
> > cheers
> > kevin
> >
> >
> > Hi,
> >
> > > For some reason I didn't see the first posting on this. In current
> >
> > bioperl
> >
> > > live, the ruler can have negative numberings - I use this routinely.
> > > You need
> > > to create a feature that starts in negative coordinates. What is
> >
> > happening
> >
> > > to
> > > you when you try this?
> > >
> > > Lincoln
> > >
> > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > Hi
> > > > thanks for the help offered thus far!
> > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promtoer seq
> > >
> > > using
> > >
> > > > bioperl. therefore i was asked to make the numberings as such (-1000)
> >
> > is
> >
> > > > there any way at all to do this in bioperl without changing the .pm
> > >
> > > file?
> > >
> > > > thanks guys..
> > > > kevin
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/eaeb5e28/attachment-0004.png>

From lstein at cshl.edu  Wed May 31 16:05:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 31 May 2006 12:05:47 -0400
Subject: [Bioperl-l] Fwd: Re: SOLVED Bio::Graphics::Panel make ruler have
	neg values
Message-ID: <200605311205.48122.lstein@cshl.edu>

Oddly, bioperl-l listserver is holding this mail because it has "a suspicious 
header". I took out Kevin's email address in case it is the "spammotel" 
header that is bothering it.

Lincoln

----------  Forwarded Message  ----------

Subject: Re: [Bioperl-l] SOLVED Bio::Graphics::Panel make ruler have neg 
values
Date: Wednesday 31 May 2006 11:17
From: Lincoln Stein <lstein at cshl.edu>
To: bioperl-l at lists.open-bio.org
Cc: "Kevin Lam Koiyau" <ULNJUJERYDIX at spammotel.com>

Hi Kevin,

Since you are modifying the Panel.pm source code, why don't you just go ahead
and use the current Bio::Graphics development tree? Since 1.5.1 it supports
negative coordinates. Here's an illustration:

 #!/usr/bin/perl

 use strict;

 use Bio::Graphics;
 use Bio::Graphics::Feature;

 my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
 my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,
                                           -strand=>+1);
 my $panel   = Bio::Graphics::Panel->new(-start=> -200,
					 -end  => +200,
					 -width=>800,
					 -pad_left=>10,
					 -pad_right=>10);
 $panel->add_track($whole,
		   -glyph=>'arrow',
		   -double=>1,
		   -tick=>2);
 $panel->add_track($feature,
	 	  -glyph=>'box',
		   -stranded=>1);
 print $panel->png;

 exit 0;

The resulting image is attached.

Lincoln

On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> I am so sorry for the truncated email accidentally hit reply.
> if anyone is interested i have opted to change
>
> change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> in linux its
> /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
>
>
>       $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
>
> to
>
>       $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
>
> just  for this one-off use.
>
>
>
> strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> option for coords offset?
>     my $relative_coords_offset = $self->option('relative_coords_offset');
>     $relative_coords_offset    = 1 unless defined $relative_coords_offset;
> but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> didn't do anything...
>
>
>
> Hi!
>
> > Oh, it was under a slightly different subject line, asking about the
> > create image map feature.
> > I am using the stable version 1.4 of bioperl now. In any case, I have not
> > added the sequence as a feature-annotated seq, as I already have the bp
> > positions where the TF binds (in 1-1050 numbering), so what I did was to
> > just add graded segments based on the position.
> > I saw that there is a scale function for the arrow glyph; however, it is a
> > multiply function. Can it be hacked to take an offset value (i.e. subtract
> > 1000 from the scale)?
> >
> > cheers
> > kevin
> >
> >
> > Hi,
> >
> > > For some reason I didn't see the first posting on this. In current
> > > bioperl live, the ruler can have negative numberings - I use this
> > > routinely. You need to create a feature that starts in negative
> > > coordinates. What is happening to you when you try this?
> > >
> > > Lincoln
> > >
> > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > Hi
> > > > thanks for the help offered thus far!
> > > > Sigh, I am trying to annotate TFBS on a -1000 to +50 bp promoter seq
> > > > using bioperl, therefore I was asked to make the numberings as such
> > > > (-1000). Is there any way at all to do this in bioperl without
> > > > changing the .pm file?
> > > >
> > > > thanks guys..
> > > > kevin
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: negatives.png
Type: image/png
Size: 1065 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060531/6c5f4137/attachment-0004.png>

From rvosa at sfu.ca  Tue May 30 19:10:17 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 30 May 2006 12:10:17 -0700
Subject: [Bioperl-l] New mailing list for Bio::Phylo
Message-ID: <447C9899.5060102@sfu.ca>

Dear recipients,

The Open Bioinformatics Foundation has been kind enough to host a 
mailing list for Bio::Phylo (http://search.cpan.org/~rvosa/Bio-Phylo/, 
the CPAN distribution for phylogenetic analysis using Perl).

The scope of this list is at present fairly broad, as it is meant both 
for user questions and for development discussion on deeper integration 
with BioPerl.

You are invited to sign up at: 
http://lists.open-bio.org/mailman/listinfo/bio-phylo-l

Best wishes,

Rutger Vos

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++
Rutger Vos, PhD. candidate
Department of Biological Sciences
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A1S6
Phone: 604-291-5625 
Fax: 604-291-3496
Personal site: http://www.sfu.ca/~rvosa
FAB* lab: http://www.sfu.ca/~fabstar
Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
++++++++++++++++++++++++++++++++++++++++++++++++++++